We designed digital signal processing (DSP) circuits at the register-transfer level for 21.8 Gb/s QPSK- and 43.7 Gb/s 16-QAM-encoded optical orthogonal frequency division multiplexing (OFDM) transceivers, and carried out synthesis and simulations to assess performance, power consumption and chip area. The aim of the study is to determine the suitability of OFDM technology for low-cost optical interconnects. Power calculations based on synthesis for a 65 nm standard-cell library showed that the DSP components of the transceiver (FFTs, equalisation, (de)mapping and clipping/scaling circuits) consume 18.2 mW/Gb/s and 12.8 mW/Gb/s for QPSK and 16-QAM respectively.
© 2011 OSA
1. Introduction
Optical orthogonal frequency division multiplexing (OFDM) offers high spectral efficiency, resilience to fibre distortion, and simple equalisation, which make it a suitable technology for next-generation optical communication systems. Moreover, high-order multi-carrier modulation formats such as QPSK- or QAM-OFDM are promising approaches for implementing the high-capacity optical interconnects required in data centres and high-performance computing networks.
Recently, several studies using field programmable gate arrays (FPGAs) have confirmed the practical feasibility of real-time digital signal processing (DSP) for multi-gigabit per second optical OFDM signal generation [3–5] and detection [6,7]. Buchali et al. demonstrated a real-time 12.1 Gb/s FPGA-based transmitter in wavelength division multiplexing (WDM) transmission experiments with coherent detection, while Benlachtar et al. showed real-time direct-detection OFDM signal generation at a sampling rate of 21.4 GS/s and transmission of 8.36 Gb/s over 1600 km of standard single-mode fibre (SMF). Coherent optical OFDM (CO-OFDM) reception was demonstrated by Yang et al. and Kaneda et al., where a real-time receiver operating at 2.5 GS/s was used to receive a sub-band of a 53.3 Gb/s multiband CO-OFDM signal. Giddings et al. demonstrated real-time transceivers running at 1.5 Gb/s, 3 Gb/s, and 6 Gb/s. They also showed an 11.25 Gb/s real-time transceiver using 64-QAM encoding and achieved transmission over 25 km of standard and MetroCor SMF.
Although the above-mentioned work demonstrated the feasibility of optical OFDM, the real-time implementations were all based on FPGAs. Power consumption must be kept as low as possible to avoid high operating costs and ensure scalability, making it necessary to implement the OFDM transceiver DSP and DAC/ADC in application-specific integrated circuit (ASIC) form. In addition to giving accurate estimates of the power consumption, an ASIC implementation gives a better reference for comparing OFDM systems with competing solutions for high-speed optical systems. In earlier work, we carried out register-transfer-level ASIC design studies assessing the feasibility of using OFDM technology for low-cost, low-power optical interconnects based on 21.8 Gb/s QPSK-OFDM transceivers.
In this paper, we extend the study to include a 43.7 Gb/s 16-QAM system design and symbol equalisation at the receiver. Key components of 50-channel QPSK- and 16-QAM-OFDM transceivers were synthesised for a commercial 65 nm standard-cell library using Synopsys Design Compiler. Simulations of the DSP circuits and optical link were performed to assess the dependence of the received signal quality on the resolution of the fast Fourier transform (FFT) used to de-multiplex the channels, and to investigate power consumption and chip area trade-offs for a wide variety of FFT algorithms and implementations.
2. System configuration
The DSP part of the transceiver design is shown in Fig. 1 and the optical interconnect design is shown in Fig. 2. The incoming bit stream is mapped into QPSK or 16-QAM symbols, then fed to an n-bit 128-point inverse FFT. The system uses the discrete multi-tone (DMT) modulation format, in which 50 channels are used to carry data. The remaining 14 channels have zero input to achieve ×1.28 oversampling (64/50). The time-domain signal is then clipped and passed to a 6-bit DAC, which converts it into an analogue signal that, with a DC bias added, drives an optical intensity modulator. A short length of SMF (up to 300 m) is assumed, resulting in negligible signal distortion. The data converters are assumed to have negligible roll-off in their frequency response over the frequency range of the sub-carriers. Following direct detection and a 14 GHz square low-pass filter, the DSP at the receiver converts the incoming serial samples from an 8-bit ADC into parallel form and feeds them to a p-bit FFT, after which the data-carrying channels are equalised with 1-tap equalisers and decoded. The DAC and ADC operate at a sampling rate of 28 GS/s, and the raw data rate of the OFDM signal is 21.8 Gb/s for QPSK and 43.7 Gb/s for 16-QAM. Designs with different clock speeds and FFT algorithms are investigated as described in Section 4. Sources of noise other than quantisation noise were neglected. Currently, our transceiver simulations assume synchronisation with a single clock. Further work is planned to develop receiver synchronisation circuits.
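The frame layout and raw data rates described above can be checked with a short sketch. It is illustrative only: the placement of the 50 data symbols on bins 1 to 50, the helper names, the arbitrary test pattern, and the use of a naive inverse DFT in place of the hardware IFFT are assumptions made here, not details taken from the transceiver design.

```python
import cmath

N_FFT = 128          # IFFT size (from the text)
N_DATA = 50          # data-carrying channels (from the text)
SAMPLE_RATE = 28e9   # DAC/ADC sampling rate in samples/s (from the text)


def raw_bit_rate(bits_per_symbol):
    """Raw OFDM bit rate: N_DATA symbols are carried per N_FFT samples."""
    return SAMPLE_RATE * N_DATA / N_FFT * bits_per_symbol


def dmt_ifft_input(symbols):
    """Build a conjugate-symmetric (Hermitian) IFFT input vector.

    Assumption of this sketch: data symbols occupy bins 1..N_DATA and the
    remaining bins (including DC and Nyquist) are left at zero.
    """
    assert len(symbols) == N_DATA
    bins = [0j] * N_FFT
    for k, s in enumerate(symbols, start=1):
        bins[k] = s
        bins[N_FFT - k] = s.conjugate()
    return bins


def naive_idft(bins):
    """Reference O(N^2) inverse DFT; hardware would use an FFT instead."""
    n = len(bins)
    return [
        sum(b * cmath.exp(2j * cmath.pi * k * t / n) for k, b in enumerate(bins)) / n
        for t in range(n)
    ]


# Arbitrary QPSK-like test pattern (hypothetical data, not from the paper).
qpsk = [complex(1 if k % 2 else -1, 1 if k % 3 else -1) for k in range(N_DATA)]
samples = naive_idft(dmt_ifft_input(qpsk))
max_imag = max(abs(x.imag) for x in samples)   # residual imaginary part

print("QPSK rate (Gb/s):", raw_bit_rate(2) / 1e9)
print("16-QAM rate (Gb/s):", raw_bit_rate(4) / 1e9)
print("oversampling factor:", (N_FFT // 2) / N_DATA)
```

With 2 and 4 bits per symbol the rate function gives 21.875 Gb/s and 43.75 Gb/s, which the text quotes as 21.8 Gb/s and 43.7 Gb/s. The conjugate-symmetric input is what makes the time-domain output real-valued, so that it can drive an intensity modulator.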
We explored a wide space of possible options for FFT algorithms and generated hardware implementations of them together with the other DSP components (those shown as dark grey blocks in Fig. 1) as described in the next section.
3. Transmitter and receiver FFT resolution
We assessed the impact of varying the FFT resolution on the received signal quality using a $2^{15}$ de Bruijn sequence as an input stimulus. The inverse FFT in the transmitter used 10-bit precision for the QPSK modulation format, based on results from our previous work. The bit precisions of the inverse FFT for 16-QAM, the FFT for QPSK, and the FFT for 16-QAM were varied over the ranges 6–32, 10–32, and 8–32 respectively. The register-transfer-level transceiver designs were tested in a Verilog simulation test bench in a digital back-to-back configuration, i.e. the output from the clipper at the transmitter was directly connected to the scaling block at the receiver and no data converters were used. The received signal quality was assessed using the error vector magnitude (EVM), which is a measure of the distances between the ideal constellation points and the received symbol positions, normalized to the peak constellation symbol magnitude $v_{max}$. The EVM is given by

\[ \mathrm{EVM} = \sqrt{\frac{\frac{1}{N}\sum_{k=1}^{N}\left[\left(I_k - \bar{I}_k\right)^2 + \left(Q_k - \bar{Q}_k\right)^2\right]}{v_{max}^2}} \]

where $I_k$ and $Q_k$ are the components of the k-th received symbol, $\bar{I}_k$ and $\bar{Q}_k$ are the components of the k-th ideal symbol, and $N$ is the number of symbols.
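As a minimal sketch of the EVM measure described above, the following function computes an RMS error vector magnitude normalised to the peak constellation magnitude. The function name, the conversion to decibels as 20 log10 of the amplitude ratio, and the example symbols are assumptions of this sketch rather than details of the authors' test bench.

```python
import math


def evm_db(received, ideal, v_max):
    """RMS error vector magnitude in dB, normalised to v_max.

    received, ideal: equal-length sequences of complex symbols.
    Note: identical sequences give zero error, for which the logarithm
    is undefined; this sketch does not handle that case.
    """
    assert len(received) == len(ideal) and len(received) > 0
    err_power = sum(abs(r - s) ** 2 for r, s in zip(received, ideal)) / len(received)
    return 20 * math.log10(math.sqrt(err_power) / v_max)


# Hypothetical example: four QPSK points, each received with a 0.1 offset.
ideal = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
received = [s + 0.1 for s in ideal]
example_evm = evm_db(received, ideal, v_max=abs(1 + 1j))
print(round(example_evm, 2))
```

For QPSK all constellation points share the same magnitude, so the peak and average normalisations coincide; for 16-QAM they differ, which is worth keeping in mind when comparing EVM figures between sources.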
Figure 3 shows the EVM of the channels at the input to the QPSK symbol de-mapping block for different FFT resolutions. From Fig. 3a, it can be seen that the EVM is not constant over all subcarriers. In particular, low-frequency subcarriers suffer a higher penalty than the other subcarriers. Our experiments suggest that this behaviour is caused by biased rounding of fixed-point data. As can be seen in Fig. 3a, this penalty at low frequencies is removed when a floating-point data format is used. Increasing the bit precision reduces the average EVM from −11 dB for 10 bits to −35 dB for 32 bits. Based on Fig. 3b, it was decided that the 14-bit FFT gives a good trade-off between performance (EVM < −30 dB) and resources. The results for the 16-QAM system are shown in Fig. 4. Figure 4a shows the average EVM over all subcarriers when the bit precision of the transmitter IFFT was varied from 6 to 32 and a floating-point FFT was used at the receiver. For the results shown in Fig. 4b, the IFFT used a floating-point representation while the bit precision of the fixed-point FFT was varied from 8 to 32. From these plots, it was decided that, for 16-QAM, the best trade-off is given by IFFT and FFT bit precisions of 12 and 16 bits respectively.
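To illustrate what biased rounding of fixed-point data can mean in general, the sketch below compares the mean quantisation error of truncation with that of round-to-nearest. It is not the authors' experiment: the choice of truncation as the biased mode, the function names, the word length, and the random test data are all assumptions made for illustration.

```python
import math
import random


def quantise(x, frac_bits, mode):
    """Quantise x to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    if mode == "truncate":        # round towards minus infinity (biased)
        q = math.floor(x * scale)
    elif mode == "nearest":       # round half up (close to unbiased)
        q = math.floor(x * scale + 0.5)
    else:
        raise ValueError("unknown rounding mode: " + mode)
    return q / scale


def mean_error_lsb(mode, frac_bits=8, n=20000, seed=1):
    """Average quantisation error in units of one LSB, over random inputs."""
    rng = random.Random(seed)
    scale = 1 << frac_bits
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        total += (quantise(x, frac_bits, mode) - x) * scale
    return total / n


print("truncate:", mean_error_lsb("truncate"))
print("nearest: ", mean_error_lsb("nearest"))
```

Truncation leaves an average error of about half an LSB in one direction, while round-to-nearest averages out close to zero. A constant mean error of this kind behaves like a small DC offset, which is one plausible way for rounding bias to concentrate its effect at low frequencies.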
4. ASIC power consumption and area
Once the optimum IFFT/FFT precisions had been identified, the designs employing them were used for synthesis and area/power analysis. When implementing the FFT and its inverse (IFFT) in hardware, there are many algorithmic and architectural options to choose from, and hence many feasible implementations, each with different cost and throughput characteristics. We used the Spiral hardware generation framework to automatically explore a range of register-transfer-level IFFT and FFT implementations in Verilog, and we evaluated each in the context of the proposed transceiver.
For this study, we generated seventeen different IFFT and FFT implementations for each system (with 10-bit and 14-bit fixed-point data types, respectively, for the QPSK design and 12-bit and 16-bit fixed-point data types, respectively, for the 16-QAM design). All designs are based on the iterative Cooley-Tukey FFT algorithm with varying radices. The designs we consider process either 32, 64, or 128 samples per cycle, and thus must be clocked at 875, 437.5, or 218.75 MHz (respectively) in order to meet the throughput requirement of 28 GS/s. FFT radices 2, 4, 8, 16, and 32 were used for each frequency; radix 64 was additionally used in the 437.5 and 218.75 MHz designs. To meet our OFDM performance target, each design must perform approximately $10^{12}$ fixed-point operations per second. The designs that process more samples per cycle have a larger area but are clocked at a lower frequency and thus consume less power. This allows a trade-off between power consumption and area.
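The clock frequencies and the count of seventeen designs follow directly from the numbers above, as the sketch below shows. The final function is a rough operation count for a textbook radix-2 FFT (ten real operations per butterfly), included only to show that the order of magnitude of $10^{12}$ operations per second is plausible; that count and all names here are assumptions of this sketch, not figures from the authors' designs.

```python
import math

SAMPLE_RATE = 28e9   # required throughput in samples/s (from the text)

# Radices used at each level of parallelism (samples per cycle), per the text.
RADICES = {
    32: [2, 4, 8, 16, 32],
    64: [2, 4, 8, 16, 32, 64],
    128: [2, 4, 8, 16, 32, 64],
}


def clock_mhz(samples_per_cycle):
    """Clock frequency needed to sustain SAMPLE_RATE at a given width."""
    return SAMPLE_RATE / samples_per_cycle / 1e6


# One entry per generated design: (samples per cycle, clock in MHz, radix).
designs = [(w, clock_mhz(w), r) for w, radices in RADICES.items() for r in radices]


def radix2_ops_per_second(n_fft=128):
    """Rough real-operation rate of a radix-2 FFT (illustrative estimate).

    Assumes (n/2)*log2(n) butterflies, each with one complex multiply
    (4 multiplies + 2 adds) and two complex adds (4 adds): 10 real ops.
    """
    stages = int(math.log2(n_fft))
    butterflies = (n_fft // 2) * stages
    transforms_per_second = SAMPLE_RATE / n_fft
    return butterflies * 10 * transforms_per_second


for width in RADICES:
    print(width, "samples/cycle ->", clock_mhz(width), "MHz")
print("designs:", len(designs))
print("approx. ops/s:", radix2_ops_per_second())
```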
We used Synopsys Design Compiler Ultra to synthesize each design and to characterize the trade-off between power and area across the transmitter and receiver design space. We constructed transceivers from each generated IFFT and FFT implementation by integrating appropriate modules for QPSK/16-QAM mapping, de-mapping, clipping/scaling, and equalisation. Then, we synthesized each design for a commercial 65 nm standard-cell library.
Figures 5 and 6 show Design Compiler’s reported area and power consumption for each design in the QPSK and 16-QAM systems respectively. Each point on the graphs indicates the power and area of a single implementation of the transmitter or receiver.
In both the QPSK and 16-QAM systems, the most power-efficient transmitters and receivers process 128 samples per cycle at 218.75 MHz, while the most area-efficient designs process 32 samples per cycle at 875 MHz. For these architectures, the 128-point mixed-radix Cooley-Tukey FFT algorithm has the lowest cost when the radix is 16, which minimises the number of operations at this problem size under these architectural assumptions. Simpler algorithms, such as the commonly seen radix-2 FFT, have a higher computational cost and correspond to the least efficient points on the graphs.
The solid lines indicate the Pareto-optimal set of designs (the set of designs that give the best trade-off between power and area). Typically, power is the most critical constraint within the system, so one would choose the lowest/rightmost design from the Pareto-optimal set. The most power-efficient QPSK transmitter and receiver consume 6.3 and 11.9 mW/Gb/s respectively, while the 16-QAM transmitter and receiver consume 4.3 and 8.5 mW/Gb/s respectively. These values are for the DSP components considered in the study, i.e. the FFTs, equalisation, (de)mapping and clipping/scaling circuits. In the receivers, the equalisation multipliers consume approximately 20–30% of the area and power.
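The selection of a Pareto-optimal set from a cloud of (area, power) points can be sketched as follows. The data points are invented for illustration and do not correspond to the synthesis results in Figs. 5 and 6; the function name is likewise hypothetical.

```python
def pareto_front(points):
    """Return the (area, power) points that no other point dominates.

    Lower is better for both area and power. After sorting by area
    (then power), a point is kept only if its power is strictly lower
    than that of every point with smaller or equal area.
    """
    front = []
    for point in sorted(points):
        if not front or point[1] < front[-1][1]:
            front.append(point)
    return front


# Hypothetical (area, power) pairs in arbitrary units.
points = [(1.0, 9.0), (1.5, 7.0), (1.2, 9.5), (2.0, 6.5), (2.5, 6.8), (3.0, 6.0)]
front = pareto_front(points)
lowest_power = min(front, key=lambda p: p[1])

print("Pareto-optimal set:", front)
print("lowest-power design:", lowest_power)
```

When power is the critical constraint, the design of interest is the last point on the front, i.e. the one with the largest area and lowest power, which matches the lowest/rightmost choice described above.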
5. Optical interconnect simulation
The transmitter and receiver DSP designs using the lowest-power FFT pairs (mixed-radix Cooley-Tukey with radices 8 and 16) were tested in simulations of a 300 m SMF-based interconnect, with the design shown in Fig. 2. The DSP designs were simulated at the register-transfer level using ModelSim, and their input/output data were interfaced with Matlab functions that modelled the operation of the DAC, ADC, S/P and P/S blocks, and the optical components (CW laser, linear intensity modulator, fibre, and square-law photodetector). The clipping ratio was optimised to minimise EVM in each case. DAC/ADC and FFT quantisation noise were assumed to be the dominant sources of noise. Figure 7 shows the received signal scatter diagrams and EVM values for both the QPSK and 16-QAM systems. The EVM was lower than −20 dB for all 50 channels in the received signal, indicating that error-free operation would be expected with these system designs. The QPSK system exhibited higher EVM in the low-frequency channels, especially the first five. This is caused in part by the fixed-point IFFT/FFT implementations, as indicated in Fig. 3, and in part by the inter-modulation products generated by the nonlinear photodetector. It is less prominent in the 16-QAM case because higher-precision IFFT/FFT cores were used.
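The clipping and data-converter modelling mentioned above can be approximated by a simple sketch of a clipper followed by an ideal uniform quantiser. This is written in Python rather than the Matlab used in the study, and it rests on several assumptions: the clipping ratio is taken as clip level divided by RMS signal amplitude, the quantiser is an ideal mid-rise type spanning the clip range, and all function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of real samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def clip(samples, clip_level):
    """Hard-limit each sample to the range [-clip_level, +clip_level]."""
    return [max(-clip_level, min(clip_level, x)) for x in samples]


def quantise_uniform(samples, bits, full_scale):
    """Ideal mid-rise quantiser with 2**bits levels over +/- full_scale."""
    levels = 1 << bits
    step = 2.0 * full_scale / levels
    out = []
    for x in samples:
        idx = math.floor((x + full_scale) / step)
        idx = max(0, min(levels - 1, idx))       # saturate at the end codes
        out.append(-full_scale + (idx + 0.5) * step)
    return out


def dac_model(samples, clipping_ratio, bits=6):
    """Clip at clipping_ratio * RMS, then quantise to the given resolution."""
    level = clipping_ratio * rms(samples)
    return quantise_uniform(clip(samples, level), bits, level)


# Hypothetical test signal: a linear ramp between -1 and +1.
ramp = [i / 100 - 1 for i in range(201)]
dac_out = dac_model(ramp, clipping_ratio=1.5)
print("distinct output levels:", len(set(dac_out)))
```

A lower clipping ratio uses the available quantiser levels more finely but distorts more of the signal peaks, which is why an optimum ratio exists for a given converter resolution.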
We designed and synthesized 28 GS/s DSP circuits for 21.8 Gb/s QPSK- and 43.7 Gb/s 16-QAM-OFDM transceivers, and carried out simulations assessing performance, power consumption and chip area, to determine their suitability for optical interconnects in data centre and high-performance computing applications. Based on synthesis for commercial 65 nm standard-cell libraries, the minimum power consumption of the FFTs, equalisation, (de)mapping and clipping/scaling circuits was determined to be 0.40 W in total for the QPSK system and 0.56 W for the 16-QAM system. This corresponds to approximately 18.2 mW/Gb/s and 12.8 mW/Gb/s for the two OFDM transceivers respectively. This is in addition to the power requirements of the synchronization circuits and of the DAC and ADC, which are discussed in detail elsewhere. Optical interconnect simulations using this transceiver design predicted average error vector magnitude values lower than −26 dB.
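The relationship between the per-bit figures and the total power quoted above can be verified with a few lines of arithmetic. The function name is hypothetical; the numerical inputs are the transmitter and receiver values and data rates stated in the text.

```python
def total_power_w(tx_mw_per_gbps, rx_mw_per_gbps, rate_gbps):
    """Total DSP power in watts from per-bit energy figures and data rate."""
    return (tx_mw_per_gbps + rx_mw_per_gbps) * rate_gbps / 1000.0


qpsk_power = total_power_w(6.3, 11.9, 21.8)    # QPSK: Tx + Rx at 21.8 Gb/s
qam16_power = total_power_w(4.3, 8.5, 43.7)    # 16-QAM: Tx + Rx at 43.7 Gb/s

print("QPSK total (W):", round(qpsk_power, 2))
print("16-QAM total (W):", round(qam16_power, 2))
```

The 16-QAM system draws more total power but carries twice the data, so its energy per bit is lower.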
Based on the promising results of these studies, further work based on post-synthesis simulations is currently underway to assess optical OFDM transceivers with higher-order formats (up to 256-QAM).
References and links
1. W. Shieh, X. Yi, Y. Ma, and Q. Yang, “Coherent optical OFDM: has its time come? [Invited],” J. Opt. Netw. 7(3), 234–255 (2008).
2. B. J. C. Schmidt, Z. Zan, L. B. Du, and A. J. Lowery, “100Gbit/s transmission using single-band direct-detection optical OFDM,” in Proc. Optical Fiber Comm. (OFC), paper PDPC3 (2009).
3. Y. Benlachtar, P. M. Watts, R. Bouziane, P. Milder, D. Rangaraj, A. Cartolano, R. Koutsoyannis, J. C. Hoe, M. Püschel, M. Glick, and R. I. Killey, “Generation of optical OFDM signals using 21.4 GS/s real time digital signal processing,” Opt. Express 17(20), 17658–17668 (2009).
4. F. Buchali, R. Dischler, A. Klekamp, M. Bernhard, and D. Efinger, “Realization of a real-time 12.1 Gb/s optical OFDM transmitter and its application in a 109 Gb/s transmission system with coherent reception,” in Proc. European Conference on Optical Communication (ECOC), (Vienna, 2009), PD paper 2.1.
5. S. C. J. Lee, F. Breyer, D. Cardenas, S. Randel, and A. M. J. Koonen, “Real-time gigabit DMT transmission over plastic optical fibre,” Electron. Lett. 45(25), 1342–1343 (2009).
7. N. Kaneda, Q. Yang, X. Liu, S. Chandrasekhar, W. Shieh, and Y. Chen, “Real-Time 2.5 GS/s Coherent Optical Receiver for 53.3-Gb/s Sub-Banded OFDM,” J. Lightwave Technol. 28(4), 494–501 (2010).
8. Y. Benlachtar, P. M. Watts, R. Bouziane, P. Milder, R. Koutsoyannis, J. C. Hoe, M. Püschel, M. Glick, and R. I. Killey, “21.4 GS/s real-time DSP-based optical OFDM signal generation and transmission over 1600km of uncompensated fibre,” in Proc. European Conference on Optical Communication (ECOC), (Vienna, 2009), PD paper 2.4.
9. R. P. Giddings, X. Q. Jin, H. H. Kee, X. L. Yang, and J. M. Tang, “Real-time implementation of optical OFDM transmitters and receivers for practical end-to-end optical transmission systems,” Electron. Lett. 45(15), 800–802 (2009).
10. X. Q. Jin, R. P. Giddings, and J. M. Tang, “Real-time transmission of 3 Gb/s 16-QAM encoded optical OFDM signals over 75 km SMFs with negative power penalties,” Opt. Express 17(17), 14574–14585 (2009).
11. R. P. Giddings, X. Q. Jin, and J. M. Tang, “First experimental demonstration of 6Gb/s real-time optical OFDM transceivers incorporating channel estimation and variable power loading,” Opt. Express 17(22), 19727–19738 (2009).
12. R. P. Giddings, X. Q. Jin, E. Hugues-Salas, E. Giacoumidis, J. L. Wei, and J. M. Tang, “Experimental demonstration of a record high 11.25Gb/s real-time optical OFDM transceiver supporting 25km SMF end-to-end transmission in simple IMDD systems,” Opt. Express 18(6), 5541–5555 (2010).
13. R. S. Tucker, “The role of optics and electronics in high capacity routers,” IEEE/OSA J. Lightwave Technol. 24(12), 4655–4673 (2006).
14. R. Bouziane, P. Milder, R. Koutsoyannis, Y. Benlachtar, C. R. Berger, J. C. Hoe, M. Püschel, M. Glick, and R. I. Killey, “Design Studies for an ASIC Implementation of an Optical OFDM Transceiver,” in Proc. European Conference on Optical Communications (ECOC), Torino, 19–23 September 2010, paper Tu.5.A.4.
15. Synopsys Design Compiler, http://www.synopsys.com/Tools/Implementation/RTLSynthesis/Pages/DCUltra.aspx
16. Y. Benlachtar, P. M. Watts, R. Bouziane, P. Milder, R. Koutsoyannis, J. C. Hoe, M. Püschel, M. Glick, and R. I. Killey, “Real-Time Digital Signal Processing for the Generation of Optical Orthogonal Frequency-Division-Multiplexed Signals,” IEEE J. Sel. Top. Quantum Electron. 16(5), 1235–1244 (2010).
17. R. A. Shafik, M. S. Rahman, and A. H. M. R. Islam, “On the extended relationships among EVM, BER and SNR as performance metrics,” in Proc. ICECE’06, Bangladesh, Dec. 2006, pp. 408–411.
18. P. Milder, F. Franchetti, J. C. Hoe, and M. Püschel, “Formal datapath representation and manipulation for implementing DSP transforms,” in Proc. ACM/IEEE Design Automation Conference (DAC), pp. 385–390, 8–13 June 2008.
19. C. Van Loan, Computational Frameworks for the Fast Fourier Transform, SIAM (1992)
20. I. Dedic, “56Gs/s ADC: Enabling 100GbE,” in Proc. OFC/NFOEC, 2010, paper OThT6.
21. P. Milder, R. Bouziane, R. Koutsoyannis, C. R. Berger, Y. Benlachtar, R. I. Killey, M. Glick, and J. C. Hoe, “Design and Simulation of 25 Gb/s Optical OFDM Transceiver ASICs,” to be presented at European Conference on Optical Communications (ECOC), Geneva, 18–22 September 2011.