Real-time OFDM transmitters breaking the 100 Gbit/s barrier require high-performance, usually FPGA-based digital signal processing. Especially the Fourier transform as a key operation of any OFDM system must be optimized with respect to performance and chip area utilization. Here, we demonstrate an alternative to the widely adopted fast Fourier transform algorithm. Based on an extensive yet optimized use of pre-set look-up tables, our FPGA implementation supports fast reconfigurable channel equalization and switching times in the nanosecond range without re-loading any code. We demonstrate the potential of the concept by realizing the first real-time single polarization OFDM transmitter generating a 101.5 Gbit/s data stream by modulating 58 subcarriers with 16QAM.
©2011 Optical Society of America
1. Introduction
Orthogonal frequency division multiplexing (OFDM) emerged as a candidate for high performance optical communications in both long-haul and access network scenarios [1,2]. First experiments have been shown using offline processing for transmitting and receiving OFDM signals over optical fiber [3,4]. However, practical applications call for transmitters and receivers that perform real-time data processing. Application specific integrated circuits (ASIC) or field programmable gate arrays (FPGA) together with high-speed digital-to-analog (DAC) and analog-to-digital converters (ADC) enable the implementation of high-speed real-time OFDM transmission. While ASICs are compact and efficient, they require extensive development time and budget. Conversely, FPGAs are ideal for fast prototyping. Despite the high complexity of real-time OFDM, experiments demonstrating real-time OFDM transmitters [5–8] and receivers [9,10] have recently been shown.
In this paper we introduce the first single polarization 101.5 Gbit/s real-time OFDM transmitter modulating 58 subcarriers with 16QAM. Contrary to other implementations that rely on the fast Fourier transform (FFT) algorithm by Cooley and Tukey, we concentrated on a 64-point inverse discrete Fourier transform (IDFT), which we implemented in a highly efficient form on an FPGA. The paper is an in-depth report of the first real-time 100 Gbit/s OFDM transmitter demonstration recently presented at OFC [6].
2. Experimental setup
The real-time OFDM transmitter (Tx) comprises two Xilinx Virtex 5 FPGAs, two high-speed Micram DACs with 6 bit resolution, and an optical IQ-modulator. We modulate a continuous wave (CW) external cavity laser (ECL) with in-phase (I) and quadrature-phase (Q) data as shown in Fig. 1. A 28 GHz sinusoidal clock signal is split, phase aligned and fed to the DACs which provide the reference clock signal for the FPGAs. Within the FPGAs, a complex OFDM signal is calculated from a pseudo random bit sequence (PRBS, 2^15 − 1) in real-time and passed to the DACs. The DAC outputs deliver 28 GSa/s and feed the IQ-modulator for generating a 101.5 Gbit/s single polarization coherent optical OFDM signal.
The receiver (Rx) comprises an erbium doped fiber amplifier (EDFA) that boosts the optical OFDM signal in order to set the power to the receiver’s optimum operating point. The signal is then received by an Agilent optical modulation analyzer (OMA) and sampled by a 20 GHz real-time oscilloscope with 80 GSa/s on two channels simultaneously. The received data are processed offline using standard OFDM receiver algorithms. After a fast Fourier transform (FFT), phase drifts and time-linear phase variations are compensated through the phase information provided by the pilot tones. For real-time processing, additional training sequences would facilitate channel estimation and equalization. Finally, we equalize the frequency-dependent amplitude of the subcarriers (SC), decode the SC signals, and evaluate the error-vector magnitude (EVM).
3. Digital signal processing
One of the main challenges in generating single-carrier quadrature amplitude modulated (QAM) signals and OFDM waveforms in real time is the demanding digital signal processing (DSP) that any such transmitter requires. Basic DSP schemes can be found in several publications [11,12], but the actual implementation and optimization of the DSP blocks is crucial for high performance transmitters.
3.1 General processing within the FPGA
In order to generate a complex OFDM waveform, we use two FPGAs to calculate the real and imaginary part of the OFDM signal xn, respectively. Each FPGA generates pseudo-random binary sequences (PRBS) for randomly generating complex spectral data Xk, which then form the OFDM symbol via an inverse Fourier transform, Eq. (1) and Fig. 2. The PRBS generators on both FPGAs are synchronized, i.e., they generate the same random complex data Xk in synchronism. For M-QAM, log2 M consecutive PRBS bits determine Xk within one OFDM symbol for subcarrier k.
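The mapping from PRBS bits to spectral data can be illustrated with a minimal Python sketch. It is not the FPGA code: the generator polynomial x^15 + x^14 + 1, the seed, and the Gray mapping table `GRAY_LEVEL` are assumptions chosen for illustration, since the text only specifies the sequence length 2^15 − 1 and the use of log2 M = 4 bits per 16QAM symbol.

```python
from itertools import islice

def prbs15(seed=0x7FFF):
    """Yield bits of a PRBS with period 2**15 - 1 (polynomial x^15 + x^14 + 1, assumed)."""
    state = seed & 0x7FFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        new_bit = ((state >> 14) ^ (state >> 13)) & 1
        state = ((state << 1) | new_bit) & 0x7FFF
        yield new_bit

# Hypothetical Gray mapping of two bits to one of the four 16QAM amplitude levels.
GRAY_LEVEL = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

def qam16_symbols(bits, count):
    """Consume 4 bits per symbol and return `count` complex 16QAM points X_k."""
    symbols = []
    for _ in range(count):
        b = tuple(islice(bits, 4))
        symbols.append(complex(GRAY_LEVEL[b[:2]], GRAY_LEVEL[b[2:]]))
    return symbols

# Two generators started from the same seed stand in for the two synchronized FPGAs.
fpga_i = qam16_symbols(prbs15(), 58)
fpga_q = qam16_symbols(prbs15(), 58)
```

Because both generators start from the same state, `fpga_i` and `fpga_q` hold identical symbol lists, which mirrors the requirement that both FPGAs see the same Xk while each computes only one quadrature of xn.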
The FPGA-internal DSP blocks are shown in Fig. 2 including the interface to the high-speed DAC. Binary data are fed to the inverse discrete Fourier transform (IDFT) block, which is described in more detail below. After the IDFT block, a clipping and rescaling module trims the IDFT output to the 6 bit physical resolution of the DAC. For our investigations we choose the 16QAM format (4 bit) and 64 subcarriers. The relative amount of signal clipping and scaling is selected such that clipping errors and quantization noise have minimum impact on the output signal [13]. To this end, we employ a 64-point IFFT with a numerical precision of 53 bit (double precision floating point, 64 bit word length), which generates an “ideal” random sequence of OFDM symbols. For various combinations of clipping levels and signal scaling we calculate the minimum error vector magnitude (EVM) for a target DAC resolution of 6 bit. Under these conditions, the optimum EVM amounts to 2.7%, and the probability that the compound OFDM signal’s values are located inside the DAC quantization window is as large as 93%.
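The clipping and scaling search described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the procedure used for the reported numbers: a uniform mid-rise quantizer, an RMS-normalized EVM with one common gain correction, all subcarriers except DC and Nyquist carrying random 16QAM data (no pilots), and a small number of symbols. The helper names (`quantize`, `evm`, `SIGMA`) are hypothetical.

```python
import cmath
import math
import random

N = 64
# Twiddle factors exp(j*2*pi*i/N), indexed modulo N.
TW = [cmath.exp(2j * cmath.pi * i / N) for i in range(N)]
ACTIVE = [k for k in range(N) if k not in (0, N // 2)]  # DC and Nyquist left empty
LEVELS = (-3, -1, 1, 3)
# Expected RMS of the real (or imaginary) part of x_n: E|X_k|^2 = 10 for these levels.
SIGMA = math.sqrt(len(ACTIVE) * 10 / 2)

def idft(X):
    return [sum(X[k] * TW[(k * n) % N] for k in range(N)) for n in range(N)]

def dft(x):
    return [sum(x[n] * TW[(-k * n) % N] for n in range(N)) / N for k in range(N)]

def quantize(v, clip, bits):
    """Clip v to +/-clip and round it to the centre of one of 2**bits uniform bins."""
    step = 2 * clip / 2 ** bits
    half = 2 ** (bits - 1)
    idx = min(half - 1, max(-half, math.floor(v / step)))
    return (idx + 0.5) * step

def evm(clip_ratio, bits, n_symbols=20, seed=1):
    """RMS-normalized EVM after clipping at clip_ratio*SIGMA and quantizing to `bits`."""
    rng = random.Random(seed)
    clip = clip_ratio * SIGMA
    err = ref = 0.0
    for _ in range(n_symbols):
        X = [0j] * N
        for k in ACTIVE:
            X[k] = complex(rng.choice(LEVELS), rng.choice(LEVELS))
        y = [complex(quantize(v.real, clip, bits), quantize(v.imag, clip, bits))
             for v in idft(X)]
        Y = dft(y)
        power = sum(abs(X[k]) ** 2 for k in ACTIVE)
        # One common real gain (least squares) removes the amplitude loss due to clipping.
        gain = sum((Y[k] * X[k].conjugate()).real for k in ACTIVE) / power
        err += sum(abs(Y[k] / gain - X[k]) ** 2 for k in ACTIVE)
        ref += power
    return math.sqrt(err / ref)

# Sweep a few clipping ratios for the 6 bit DAC and keep the best one.
best_evm, best_clip = min((evm(c, 6), c) for c in (2.0, 2.5, 3.0, 3.5, 4.0))
```

The sweep returns the clipping ratio with the lowest EVM for this simplified model. The values it produces are not expected to reproduce the 2.7% quoted above exactly, because the EVM normalization, the quantizer type and the subcarrier loading are assumptions.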
We used Xilinx Virtex 5 FPGAs for the experiments. They provide 24 high-speed transmitters (referred to as GTX) that drive the inputs of the Micram DAC at 7 Gbit/s on each line. Multiplexers enhance the output sampling rate by a factor of 4 resulting in a sampling rate of 28 GSa/s at the DAC’s output. The DAC provides a reference clock to the FPGA which is used for the DSP blocks. Since the FPGA gains its speed through parallelization, a relatively low clock rate suffices to generate the data. Therefore, the full-rate DAC sampling clock of 28 GHz is divided by a factor of 128 resulting in an FPGA reference clock rate of 218.75 MHz.
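The interface figures quoted above are mutually consistent, which the following short Python check makes explicit. The constant names are illustrative. The last line relies on the 64-point transform size and shows that 128 samples, i.e., two OFDM symbols, must be produced per FPGA clock cycle.

```python
DAC_CLOCK_HZ = 28e9      # full-rate DAC sampling clock
DAC_BITS = 6             # physical DAC resolution
GTX_LANES = 24           # high-speed serial transmitters per FPGA
GTX_RATE_BPS = 7e9       # line rate per GTX lane
MUX_FACTOR = 4           # multiplexing in front of the DAC core
CLOCK_DIVIDER = 128
IDFT_SIZE = 64

interface_rate = GTX_LANES * GTX_RATE_BPS         # bits per second leaving the FPGA
dac_input_rate = DAC_BITS * DAC_CLOCK_HZ          # bits per second consumed by the DAC
lanes_per_dac_bit = GTX_LANES // DAC_BITS         # equals the multiplexing factor
fpga_clock_hz = DAC_CLOCK_HZ / CLOCK_DIVIDER      # FPGA reference clock
samples_per_clock = DAC_CLOCK_HZ / fpga_clock_hz  # degree of parallelization
symbols_per_clock = samples_per_clock / IDFT_SIZE

print(interface_rate, dac_input_rate, fpga_clock_hz, samples_per_clock, symbols_per_clock)
```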
3.2 Inverse discrete Fourier transform
The key element of an OFDM transmitter is the transformation of frequency domain data into a time domain waveform. While most real-time OFDM transmitters exploit the strength of the IFFT algorithm, we follow a different approach by directly implementing the inverse discrete Fourier transform (IDFT), see Fig. 3.
Since digitally modulated subcarriers are represented by a discrete set of M coefficients Xk only, all possible variations of modulated subcarriers can be stored using look-up tables (LUT), thereby avoiding complex multiplications at runtime. For simplicity, Fig. 3 shows a selection of 4 × M subcarrier waveform LUTs out of N × M in a waterfall display. Each LUT stores N samples for M possible waveforms of a fixed subcarrier k = 0, 1, …, N−1. The figure displays the LUTs for the real part of a modulated SC only. The calculation of the imaginary part follows an analogous scheme. This arrangement lends itself to parallelization employing two synchronized FPGAs. The OFDM output signal is obtained by summing all N subcarrier samples for each position n.
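The principle of the LUT-based IDFT can be sketched in Python as follows. The sketch assumes an unnormalized transform, xn = Σk Xk exp(j2πkn/N), and keeps complex samples in one table, whereas the hardware splits real and imaginary parts across two FPGAs. The names `build_luts`, `idft_lut` and `idft_direct` are hypothetical.

```python
import cmath
import itertools
import random

N = 64
LEVELS = (-3, -1, 1, 3)
SYMBOLS = [complex(i, q) for i, q in itertools.product(LEVELS, LEVELS)]  # M = 16

def build_luts(n=N, symbols=SYMBOLS):
    """luts[k][m][i] is sample i of subcarrier k modulated with symbol m."""
    return [[[s * cmath.exp(2j * cmath.pi * k * i / n) for i in range(n)]
             for s in symbols] for k in range(n)]

def idft_lut(symbol_idx, luts):
    """Sum the selected waveform of every subcarrier; only look-ups and additions."""
    n = len(luts)
    return [sum(luts[k][symbol_idx[k]][i] for k in range(n)) for i in range(n)]

def idft_direct(X):
    """Reference: direct evaluation of the (unnormalized) IDFT sum."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * i / n) for k in range(n))
            for i in range(n)]

luts = build_luts()
rng = random.Random(0)
idx = [rng.randrange(len(SYMBOLS)) for _ in range(N)]   # one symbol index per subcarrier
x_lut = idft_lut(idx, luts)
x_ref = idft_direct([SYMBOLS[m] for m in idx])
```

All multiplications happen once, when the tables are built. At run time the transmitter only selects one of M stored waveforms per subcarrier and adds them, which is what makes the approach attractive on an FPGA.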
Although this implementation avoids real-time complex multiplications, the extensive use of LUTs and adders could seem inefficient at first sight. However, a closer inspection of Eq. (1) reveals redundancies. This is discussed in the following with reference to Fig. 4 .
First, we exploit the periodicity of the N modulated subcarriers Xk exp(j2πkn/N) for a given k with respect to the time-sample index n. This periodicity is discussed with the help of Fig. 4(a), where all N = 64 coefficients Xk exp(j2πkn/N) for k = 6 and Xk = 1 are plotted. It can be seen that the solid line interpolating the sample dots has a subcarrier period pk = N / k, which need not be an integer. Further, the samples repeat with the periodicity pn. To extract this periodicity, we need to find the greatest common divisor (GCD) of N and k. If N can be factored into two integers such that N = k3 k2, and k can be factored into k = k1 k2 with k1, k2, k3 = 1, 2, …, N, then the maximum number k2 > 1 is the (non-trivial) GCD of N and k, GCD(N, k) = k2, and the quantities N and k are called (non-trivially) commensurable with respect to k2. We find that the sample period is pn = N / k2. For the example of Fig. 4(a), GCD(64, 6) = 2, so that pn = 32. Hence, for each possible data symbol Xk, only the k2-th part of the maximum number of N samples needs to be processed separately. If the GCD of N and k is k itself, see Fig. 4(b), then pn = pk, and only pk values out of N are different. If k = 0, then only one sample has to be processed.
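The relation pn = N / GCD(N, k) can be checked numerically. The following Python sketch compares it with a brute-force search for the smallest shift under which the sampled subcarrier repeats. Both function names are illustrative.

```python
import cmath
from math import gcd

def sample_period(n, k):
    """p_n = N / GCD(N, k): number of distinct samples of subcarrier k."""
    return n // gcd(n, k)

def measured_period(n, k, tol=1e-9):
    """Smallest shift p for which the sampled subcarrier repeats (brute force)."""
    w = [cmath.exp(2j * cmath.pi * k * i / n) for i in range(n)]
    for p in range(1, n + 1):
        if all(abs(w[i] - w[(i + p) % n]) < tol for i in range(n)):
            return p

N = 64
for k in (0, 6, 7, 16):
    print(k, sample_period(N, k), measured_period(N, k))
```

Note that `gcd(n, 0)` returns `n`, so the case k = 0 yields a single sample without special treatment.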
If redundancies are disregarded, each of the N subcarriers contributes N samples, so that the total number P of samples to be processed with the help of the LUTs is, according to Eq. (1),

$$P = N^2. \tag{2}$$
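If sample periodicities are exploited, subcarrier k contributes only pn = N / GCD(N, k) samples, and the total number of samples to be processed can be reduced to

$$P = \sum_{k=0}^{N-1} \frac{N}{\mathrm{GCD}(N,k)}, \qquad \mathrm{GCD}(N,0) = N. \tag{3}$$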
For the case where N = 2^q is a power q of 2, maximum savings of 1/3 can be achieved for N >> 1. This may be derived from Eq. (3) or is directly proved as follows: For half of the subcarriers, i.e., for ½·2^q subcarriers, the carrier index k = ko is odd, so N and ko have the GCD k2 = 1. All 2^q samples have to be processed, because no repetition of the basic interval pk can be found. For the even-numbered subcarriers, one half of them, namely ½·2^(q−1) subcarriers, have the GCD k2 = 2. Therefore, only the k2-th part of the 2^q samples is to be processed, i.e., 2^(q−1) samples. The other half of the even-numbered subcarriers is again split in two. One of these groups, namely ½·2^(q−2) subcarriers, has a GCD of k2 = 4 with respect to N and k, therefore 2^(q−2) different samples have to be processed. If we proceed with splitting into subcarrier groups, we end up with a sequence of 2^(q−0), 2^(q−1), 2^(q−2), …, 2^(q−q) samples which need processing. We find the total number of samples to be processed per group by multiplying with the number of subcarriers in each group, namely ½·2^(q−0), ½·2^(q−1), ½·2^(q−2), …, ½·2^(q−(q−1)), 2^(q−q). In this sequence, the last term deviates from the rest, because in this subcarrier group (as in the last but one group), one subcarrier has to be counted. The total number Pq of samples to be processed for the case where N = 2^q is then

$$P_q = \frac{1}{2}\sum_{i=0}^{q-1} 2^{q-i}\,2^{q-i} + 1 = \frac{2N^2 + 1}{3}. \tag{4}$$
For the limiting case that N = 2^q is large (q >> 1), the number of samples to be processed is well approximated by Pq ≈ 2N²/3. Therefore, a maximum of 33% of the complex adders and LUTs can be saved without compromising computation accuracy.
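The counting argument can be verified with a short Python sketch that sums N / GCD(N, k) over all subcarriers and compares the result with the closed form (2N² + 1)/3, which follows from the group-wise count for N = 2^q. The function names are illustrative.

```python
from math import gcd

def samples_without_reuse(n):
    """Every subcarrier stores all N samples."""
    return n * n

def samples_with_reuse(n):
    """Every subcarrier k stores only its N / GCD(N, k) distinct samples."""
    return sum(n // gcd(n, k) for k in range(n))

def closed_form_pow2(q):
    """Closed form (2 * N**2 + 1) / 3 for N = 2**q."""
    return (2 * 4 ** q + 1) // 3

N = 64
saving = 1 - samples_with_reuse(N) / samples_without_reuse(N)
print(samples_with_reuse(N), samples_without_reuse(N), round(saving, 4))
```

For N = 64 the count drops from 4096 to 2731 samples, i.e., the saving is already very close to the limiting value of one third.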
Second, only data transmitting subcarriers have to be processed using our described IDFT whereas the IFFT must process all N subcarriers at all times. Thus additional savings in complexity are obtained when inserting frequency guard bands as is regularly done in OFDM systems. For instance, we use four pilot tones that have been assigned to the odd subcarriers k = 7, 21, 43, 57 which corresponds to the positions 7, 21, −21, −7 defined in the IEEE 802.11-2007 WLAN standard. Instead of processing 4 × M × N samples for the pilot tones, only 64 samples suffice to represent the full information.
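The subcarrier allocation can be written down explicitly. In the Python sketch below the pilot value `PILOT_VALUE` is a hypothetical placeholder, since the actual pilot symbols are not specified here. The point is only that the four constant pilot tones collapse into one fixed waveform of N = 64 samples.

```python
import cmath

N = 64
M = 16
# Signed pilot positions mapped to IDFT indices 0 ... N-1.
pilots = sorted(k % N for k in (7, 21, -21, -7))
unused = {0, N // 2}                       # DC and Nyquist subcarriers
payload = [k for k in range(N) if k not in unused and k not in pilots]

PILOT_VALUE = 1 + 0j                       # hypothetical constant pilot symbol
pilot_waveform = [sum(PILOT_VALUE * cmath.exp(2j * cmath.pi * k * i / N) for k in pilots)
                  for i in range(N)]

naive_pilot_samples = len(pilots) * M * N  # 4 x M x N entries if treated like data
print(pilots, len(payload), naive_pilot_samples, len(pilot_waveform))
```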
Third, even more LUT storage space can be saved by taking advantage of symmetry relations between the M modulation coefficients Xk. Given that digital modulation formats lead to (in general complex) modulation coefficients Xk with only a discrete number of different phases (e. g., 4 for QPSK), another 75% of LUT storage space can be saved resulting in an overall reduction of 83.5%. The principle of this technique is shown in Fig. 5 for a multilevel phase modulation of the first subcarrier (SC 1). For a better understanding, Fig. 5(b) explicitly shows the periodic continuation of the basic modulated subcarrier. Instead of storing all M waveforms corresponding to the M symbols as shown in Fig. 5(a), only M / 4 waveforms have to be stored since only they consist of different values. A pointer selects an offset within the LUT corresponding to a phase shift of the waveform (e.g., φ = π/4).
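For the first subcarrier (k = 1, as in Fig. 5), the pointer technique can be sketched in Python. A rotation of the symbol by 90 degrees equals an advance of the stored waveform by N/4 = 16 samples, so only the four first-quadrant 16QAM points need a table. The constellation levels and all names are assumptions for illustration. For subcarriers where N/(4k) is not an integer this simple offset does not apply, and how the hardware handles those cases is not covered by this sketch.

```python
import cmath

N, K = 64, 1
BASE = (1 + 1j, 1 + 3j, 3 + 1j, 3 + 3j)       # first-quadrant 16QAM points (M / 4 entries)
BASE_LUT = [[b * cmath.exp(2j * cmath.pi * K * i / N) for i in range(N)] for b in BASE]
QUARTER = N // (4 * K)                         # pointer offset per 90 degree rotation

def decompose(symbol):
    """Return (base index, r) such that symbol = BASE[index] * 1j**r."""
    for r in range(4):
        candidate = symbol / (1j ** r)
        for idx, b in enumerate(BASE):
            if abs(candidate - b) < 1e-9:
                return idx, r
    raise ValueError("not a 16QAM point")

def waveform(symbol):
    """Read the stored base waveform starting at an offset of r quarter periods."""
    idx, r = decompose(symbol)
    return [BASE_LUT[idx][(i + r * QUARTER) % N] for i in range(N)]

w = waveform(-1 + 1j)   # (1 + 1j) rotated by 90 degrees, read with offset 16
```

Four stored waveforms replace sixteen, which corresponds to the 75% reduction stated above.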
Comparisons with other algorithms such as the fast Fourier transform heavily depend on the specific implementation on the FPGA and the hardware platform. Therefore a general estimation of performance and utilized chip area cannot be easily given. This would require implementing competing algorithms and evaluating maximum operation speed, calculation accuracy, and chip area utilization. For higher order IDFTs LUT space becomes an issue. Nevertheless, a 128-point IDFT following the described procedure is not out of reach. Naturally, the storage space doubles. However, in the current design two 64-point IDFTs run in parallel, but could be replaced by a single 128-point IDFT upon need. Hence, the LUT number remains constant whereas the binary adder tree grows by one stage. For a large number of subcarriers an optimized, multiplierless IFFT algorithm could well be superior.
An estimation of complexity comparing our optimized IDFT and the IFFT can be done as follows: If the IDFT of Eq. (1) was performed with the most advanced split-radix fast Fourier transform algorithm [14], the total number of real additions α(N) and multiplications μ(N) required would be:
In order to compare our multiplierless, optimized IDFT to the split-radix IFFT we need to relate the number of real multiplications to the number of real additions. Although there are many different implementations of multipliers, generally we can build a multiplier from an adder tree. Typically, to implement a binary multiplier with a resolution of r bit, a number of r adders and r bit shifters are needed. Bit shifters are not counted for this estimate. The required resolution of the adders varies from r to 2r − 1 depending on their position in the tree. In a simple attempt to relate the effort of 1-bit adders representing a multiplier to the number of 1-bit adders representing an r-bit adder, we therefore introduce a complexity relation c(r), which for the resolution range 4 ≤ r ≤ 64 has been determined for some typical multiplication techniques to be . This way we can now estimate the total of equivalent real adders in an IFFT, namely α(N) + μ(N) c(r), and compare it to the number of real adders 2 × (2/3) (N−5) N used in our optimized IDFT. (Note: the first factor “2” is due to the fact that complex adders require 2 real add operations. Next, in our technique 4 subcarriers comprise constant pilot tones and therefore require a single add operation only. Last, the DC and Nyquist frequencies do not count either. Therefore we require not N² but (N−5) N additions only.) As a result we find that for N = 32 (r ≥ 8), N = 64 (r ≥ 12) and for N = 128 (r ≥ 20) our IDFT is equivalent or superior to one of the most advanced IFFT algorithms for generating OFDM signals.
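The IDFT side of this comparison is easy to evaluate. The Python sketch below computes 2 × (2/3) (N−5) N for the three transform sizes mentioned. The reading of the term N−5 as "N minus 4 pilots minus DC and Nyquist, plus one combined pilot waveform" is our interpretation of the note above, and the function name is hypothetical. The IFFT side is not evaluated here.

```python
def idft_real_adders(n, pilots=4, unused=2):
    """Estimate 2 * (2/3) * (N - 5) * N real adders for the optimized IDFT.

    rows = N - pilots - unused + 1 counts the payload subcarriers plus one
    combined pilot waveform (an interpretation of the N - 5 term).
    """
    rows = n - pilots - unused + 1
    return 4 * rows * n / 3

for n in (32, 64, 128):
    print(n, round(idft_real_adders(n)))
```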
The measured overall chip utilization is found to be 84% of slices, 67% of slice registers, and 62% of occupied LUTs based on the Virtex 5 FPGA (XC5VFX200T). Without optimization, we estimate that from the available resources 194% of slices, 187% of slice registers, and 127% of LUTs would be required. Looking only at the part of the design which we optimized in complexity by removing redundancies, we find that in comparison to a non-optimized design we saved 35% of the slices, which agrees well with the 33% prediction derived from Eq. (4). In addition, by exploiting redundancies in the storage of modulation coefficients Xk, 85% of both the slice registers and the LUTs could be saved. This again agrees with the prediction for symmetric modulation coefficients Xk, where we calculated savings of 83.5%.
FPGAs make very efficient use of LUTs which allows reducing the computational effort, especially if multiplications can be avoided. In the current design we do not take advantage of the onboard block random access memory (BRAM). However, the LUT contents can very well be stored within the BRAM which provides access cycles as fast as 1.8 ns. Further, a flexible response to changing channel properties can be implemented, if the LUTs are reloaded: LUT contents can be overwritten by loading for example a new set of waveforms via an external interface handled by the on-chip microprocessor. The loading time for these waveforms depends strongly on the implementation. However, once the waveforms are received by the respective LUTs, switching from one LUT to another takes only 5 ns. Updating LUT contents can be achieved during runtime without any loss of data.
4. Hardware simulations
For designing FPGA-based DSP systems the use of VHDL (very high speed integrated circuit hardware description language) is common. For debugging complex systems, the simulation platform Modelsim verifies the proper functionality of the developed DSP blocks. We feed the debugged Modelsim output to an offline OFDM decoder for analyzing the expected hardware performance. Spectrum and decoded constellation diagram for simulated data are depicted in Fig. 6 .
Four spectral lines can be clearly identified as pilot tones since they are not broadened by any modulation. The pilot tones help in carrier phase recovery and frequency offset compensation when decoding the received signal. The overlap of all subcarrier constellation diagrams with the pilot tones is also shown in Fig. 6(b). In this simulation distortions are only due to quantization and clipping noise. No analog properties are considered at this point. A residual EVM of 4.8% is found representing the best signal quality that can be theoretically achieved with the described system. This differs from the optimum EVM of 2.7% specified in the previous section. The discrepancy is due to the differences in arithmetic accuracy: The FPGA uses fixed-point effective 8 bit arithmetic, while EVM = 2.7% is a limiting value for an “ideal” computation accuracy.
5. Experimental results
Experimentally, our OFDM Tx works at a symbol rate of 437.5 MBd. A number of 58 subcarriers is modulated in the 16QAM format (4 bit / symbol) resulting in an aggregate line-rate of 101.5 Gbit/s. Figure 7(a) displays the electrical spectrum of the measured signal in red and the simulated spectrum in black. The four pilot tones are essential in performing phase recovery and symbol window synchronization. The DC and Nyquist frequency subcarriers remain un-modulated. Excluding these two and the 4 pilot tones, 58 SC out of 64 remain for the transport of payload data.
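The quoted rates follow directly from the DAC sampling rate and the transform size. The short Python check below assumes that no cyclic prefix is added, which is implied by a symbol rate equal to the sampling rate divided by N. The constant names are illustrative.

```python
SAMPLE_RATE = 28e9        # DAC sampling rate in Sa/s
N = 64                    # IDFT size
PILOTS = 4
UNMODULATED = 2           # DC and Nyquist subcarriers
BITS_PER_SYMBOL = 4       # 16QAM

symbol_rate = SAMPLE_RATE / N                  # also the subcarrier spacing in Hz
payload_subcarriers = N - PILOTS - UNMODULATED
line_rate = payload_subcarriers * BITS_PER_SYMBOL * symbol_rate

print(symbol_rate, payload_subcarriers, line_rate)
```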
Image frequencies arising from digital-to-analog (Tx) and analog-to-digital (Rx) conversions are separated from the signal band by a 437.5 MHz gap. For removing the image frequencies, the slopes of an analog filter must accommodate this narrow gap, a challenging task, but not out of reach. However, we decided to use a broadband Rx in combination with DSP to suppress the image spectra before further processing. In Fig. 7(b) we plot the error vector magnitude (EVM) for each modulated subcarrier.
The measured EVM can be used to reliably estimate bit error ratios of BER < 10^−3 [15,16]. State-of-the-art forward error correction (FEC) algorithms reduce this BER to values below 10^−9 at the cost of some overhead.
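A common way to relate EVM and BER is sketched below in Python. It uses the standard approximation for Gray-coded 16QAM in additive white Gaussian noise and assumes that the quoted EVM is normalized to the outermost constellation point, so that it is first converted to an RMS-normalized value with the 16QAM peak-to-average power ratio of 9/5. Both the normalization and the pure Gaussian noise model are assumptions, and the function name is hypothetical. The estimates in [15,16] may differ in detail.

```python
import math

def ber_16qam_from_evm(evm_max_norm):
    """Approximate BER of Gray-coded 16QAM from a max-normalized EVM (Gaussian noise)."""
    peak_to_avg = 9 / 5                       # |3 + 3j|^2 / mean |X|^2 = 18 / 10
    evm_rms = evm_max_norm * math.sqrt(peak_to_avg)
    snr = 1 / evm_rms ** 2                    # signal-to-noise power ratio
    return 0.375 * math.erfc(math.sqrt(snr / 10))

ber_avg = ber_16qam_from_evm(0.111)           # average EVM of 11.1% reported for this Tx
```

Under these assumptions the average EVM of 11.1% reported for this transmitter maps to a BER on the order of 10^−3.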
Subcarriers far remote from the optical carrier show poorer performance due to a drop in the amplitude transfer function of the transmission system. Figure 7(c) and (d) exemplarily illustrate received constellation diagrams for subcarriers 27 and 51 with an EVM of 7.6% and 9.7%, respectively.
The DAC’s influence on the signal is depicted in Fig. 8(a). Each DAC comprises a sample-and-hold stage at the output, which results in a rectangular impulse response r(t) and translates to a sinc-shaped spectrum R(f). The discrete DAC input signal xn(t) has a spectrum Xk(f) which is periodic. The DAC output spectrum then is Xk(f) R(f) (not drawn in Fig. 8(b)), so that the spectrum shows a frequency roll-off, red line in Fig. 8(b, right). This spectral drop only partially explains the experimental outcome depicted in Fig. 7(a), where higher-frequency subcarriers show less power. The discrepancy is due to the low-pass character of the system, and the total roll-off equals 7 dB / 13.6 GHz. Still, the average EVM over all subcarriers is 11.1% which corresponds to a BER of 10^−3. Pre-equalization of the system’s frequency response can be easily done by changing the pre-computed LUT symbols given that our optimized IDFT algorithm is used for signal generation.
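The share of the roll-off that is caused by the sample-and-hold stage alone can be estimated from the sinc-shaped spectrum R(f). The Python sketch below assumes an ideal zero-order hold of one sample duration and evaluates it at the outermost modulated subcarrier (k = 31). The function name is hypothetical.

```python
import math

def zoh_rolloff_db(f_hz, fs_hz):
    """Magnitude of an ideal sample-and-hold response, sin(pi f/fs) / (pi f/fs), in dB."""
    x = f_hz / fs_hz
    mag = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
    return 20 * math.log10(abs(mag))

FS = 28e9
F_EDGE = 31 * FS / 64                 # 13.5625 GHz, outermost modulated subcarrier
edge_db = zoh_rolloff_db(F_EDGE, FS)
print(round(edge_db, 2))
```

For this idealized model the sample-and-hold stage accounts for roughly 3.7 dB at the band edge, i.e., about half of the measured 7 dB, in line with the statement that it only partially explains the observed roll-off.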
6. Conclusion
We show for the first time a real-time OFDM transmitter achieving line rates of 101.5 Gbit/s. We modulate 58 subcarriers, generated by a 64-point IDFT, with 16QAM. The IDFT uses a reconfigurable and highly optimized algorithm. Four pilot tones help in performing phase recovery and window synchronization at the receiver end. The EVM averaged over all modulated subcarriers amounts to 11.1%, corresponding to a BER of 10^−3. Using state-of-the-art FEC, this error probability level allows error-free reception.
Acknowledgments
This work was supported by the European FP7 Project ACCORDANCE, the European Network of Excellence EuroFOS, the Xilinx University Program (XUP), by Micram Microelectronic GmbH, and by the Agilent University Relations Program and the German BMBF Project CONDOR. We further acknowledge financial support from Karlsruhe School of Optics & Photonics (KSOP).
References and links
2. N. Cvijetic, “Optical OFDM for next-generation PON,” in Signal Processing in Photonic Communications, OSA Technical Digest (CD) (Optical Society of America, 2010), paper SPTuB6. http://www.opticsinfobase.org/abstract.cfm?URI=SPPCom-2010-SPTuB6.
3. X. Liu, S. Chandrasekhar, B. Zhu, P. Winzer, A. Gnauck, and D. Peckham, “448-Gb/s reduced-guard-interval CO-OFDM transmission over 2000 km of ultra-large-area fiber and five 80-GHz-grid ROADMs,” J. Lightwave Technol. 29(4), 483–490 (2011).
4. X. Liu, S. Chandrasekhar, B. Zhu, and D. Peckham, “Efficient digital coherent detection of a 1.2-Tb/s 24-carrier no-guard-interval CO-OFDM signal by simultaneously detecting multiple carriers per sampling,” in Optical Fiber Communication Conference, OSA Technical Digest (CD) (Optical Society of America, 2010), paper OWO2. http://www.opticsinfobase.org/abstract.cfm?URI=OFC-2010-OWO2.
5. F. Buchali, R. Dischler, A. Klekamp, M. Bernhard, and Y. Ma, “Statistical transmission experiments using a real-time 12.1 Gb/s OFDM transmitter,” in Optical Fiber Communication Conference, OSA Technical Digest (CD) (Optical Society of America, 2010), paper OMS3. http://www.opticsinfobase.org/abstract.cfm?URI=OFC-2010-OMS3.
6. R. Schmogrow, M. Winter, B. Nebendahl, D. Hillerkuss, J. Meyer, M. Dreschmann, M. Huebner, J. Becker, C. Koos, W. Freude, and J. Leuthold, “101.5 Gbit/s real-time OFDM transmitter with 16QAM modulated subcarriers,” in Optical Fiber Communication Conference, OSA Technical Digest (CD) (Optical Society of America, 2011), paper OWE5. http://www.opticsinfobase.org/abstract.cfm?URI=OFC-2011-OWE5
7. Y. Benlachtar, P. M. Watts, R. Bouziane, P. Milder, D. Rangaraj, A. Cartolano, R. Koutsoyannis, J. C. Hoe, M. Püschel, M. Glick, and R. I. Killey, “Generation of optical OFDM signals using 21.4 GS/s real time digital signal processing,” Opt. Express 17(20), 17658–17668 (2009).
8. B. Inan, O. Karakaya, P. Kainzmaier, S. Adhikari, S. Calabro, V. Sleiffer, N. Hanik, and S. Jansen, “Realization of a 23.9 Gb/s real time optical-OFDM transmitter with a 1024 Point IFFT,” in Optical Fiber Communication Conference, OSA Technical Digest (CD) (Optical Society of America, 2011), paper OMS2. http://www.opticsinfobase.org/abstract.cfm?URI=OFC-2011-OMS2
9. D. Qian, T. Kwok, N. Cvijetic, J. Hu, and T. Wang, “41.25 Gb/s real-time OFDM receiver for variable rate WDM-OFDMA-PON transmission,” in National Fiber Optic Engineers Conference, OSA Technical Digest (CD) (Optical Society of America, 2010), paper PDPD9. http://www.opticsinfobase.org/abstract.cfm?URI=NFOEC-2010-PDPD9.
10. N. Kaneda, Q. Yang, X. Liu, S. Chandrasekhar, W. Shieh, and Y. Chen, “Real-time 2.5 GS/s coherent optical receiver for 53.3-Gb/s sub-banded OFDM,” J. Lightwave Technol. 28(4), 494–501 (2010).
11. Q. Yang, N. Kaneda, X. Liu, S. Chandrasekhar, W. Shieh, and Y. Chen, “Towards Real-Time Implementation of Optical OFDM Transmission,” in Optical Fiber Communication Conference, OSA Technical Digest (CD) (Optical Society of America, 2010), paper OMS6. http://www.opticsinfobase.org/abstract.cfm?URI=OFC-2010-OMS6.
12. X. Q. Jin, R. P. Giddings, E. Hugues-Salas, and J. M. Tang, “Real-time demonstration of 128-QAM-encoded optical OFDM transmission with a 5.25bit/s/Hz spectral efficiency in simple IMDD systems utilizing directly modulated DFB lasers,” Opt. Express 17(22), 20484–20493 (2009).
13. C. R. Berger, Y. Benlachtar, and R. Killey, “Optimum clipping for optical OFDM with limited resolution DAC/ADC,” in Signal Processing in Photonics Communications, OSA Technical Digest (CD) (Optical Society of America, 2011), paper SPMB5.
14. S. G. Johnson and M. Frigo, “A modified split-radix FFT with fewer arithmetic operations,” IEEE Trans. Signal Process. 55(1), 111–119 (2007).
15. R. Schmogrow, B. Nebendahl, M. Winter, A. Josten, D. Hillerkuss, J. Meyer, M. Dreschmann, M. Huebner, C. Koos, J. Becker, W. Freude, and J. Leuthold, “Error vector magnitude as a performance measure for advanced modulation formats,” (to be submitted).
16. R. A. Shafik, M. S. Rahman, and A. H. M. R. Islam, “On the extended relationships among EVM, BER and SNR as performance metrics”, in Proceedings of 4th International Conference on Electrical and Computer Engineering, (Dhaka, Bangladesh, 2006) pp.408–411.