Hardware-efficient implementation and experimental demonstration of Hermitian-symmetric IFFT for optical DMT transmitter

Ming Chen; Gang Liu; Long Zhang; Xiang Wang; Hui Zhou; Qinghui Chen; Changqing Xiang

doi:10.1364/OE.27.029817

1. Introduction

Recently, optical orthogonal frequency-division multiplexing (OFDM) has been widely considered one of the most promising technologies for future high-speed optical communications, owing to its high spectral efficiency (SE), robustness to optical fiber dispersion and effective mitigation of transmission impairments with powerful digital signal processing (DSP) [1–4]. In general, optical OFDM can be classified into direct-detection optical OFDM (DDO-OFDM) and coherent optical OFDM (CO-OFDM). Its dispersion tolerance and receiver sensitivity make CO-OFDM an excellent candidate for long-haul transmission systems [1–2]. However, it requires more numerous and more complex components and DSP algorithms. By contrast, DDO-OFDM has a simple system configuration and is, therefore, more cost-effective for short-reach applications such as optical access networks and data center optical interconnections [3,5–6]. Optical discrete multi-tone (DMT), a special form of DDO-OFDM, allows direct generation of a real-valued OFDM signal by using an inverse fast Fourier transform (IFFT) with Hermitian symmetric input data [7]. In this way, there is no need to perform digital or analog up-conversion. Therefore, DMT is a more practical solution for hardware implementation of DDO-OFDM transmission systems. The DMT technique has been widely used for wired multicarrier transmission such as asymmetric digital subscriber line (ADSL). In recent years, the optical DMT technique has been extensively studied by offline DSP approaches. Furthermore, field programmable gate array (FPGA) based real-time DMT transmitters and/or receivers have also been implemented and experimentally demonstrated to verify their feasibility for practical applications [8–13]. Because of the high data rate, a highly partial-parallel or fully-parallel IFFT function is required for the real-time optical DMT transmitter.
However, the hardware implementation of such an IFFT function consumes a large on-chip area. In [8], a fully-parallel 128-point complex-valued IFFT required approximately 65% of the FPGA (Virtex-4 XC4VFX100) slices and all on-chip dedicated multipliers to generate a real-time 21.4 GS/s DMT signal. A higher-radix IFFT may seem a straightforward way to address this problem, but the complexity of its butterfly processing element increases, which leads to a non-trivial very-large-scale integration (VLSI) implementation problem [14]. In fact, VLSI implementations of radix-2 and radix-4 IFFTs are dominant owing to their low butterfly complexity. In [15], an efficient in-place radix-2 decimation-in-frequency (DIF) FFT algorithm was developed for Hermitian symmetric data and studied in software. The results indicated that the computation and storage requirements were reduced by half compared to the conventional FFT algorithm. However, its hardware implementation has not been investigated. Moreover, several low-complexity IFFT architectures based on the Hermitian symmetry property have been proposed and implemented with FPGAs [16–17]. However, these FPGA-based Hermitian symmetric IFFT (HS-IFFT) functions have low throughput because of a serial architecture with in-place computation [16] or a low level of parallelism [17]. It is therefore of interest to verify the feasibility of a fully-parallel HS-IFFT for real-time high-speed optical DMT transmitters.

Inspired by the simplification method for the HS-FFT presented in [15], we theoretically analyze the output of each stage of the radix-2 decimation-in-time (DIT) IFFT with Hermitian symmetric input data, and then propose a hardware-efficient implementation structure for the HS-IFFT. The implementation complexity of the proposed fully-parallel pipelined HS-IFFT is also analyzed and compared with that of the conventional complex-valued IFFT (CC-IFFT). A fully-parallel pipelined 128-point HS-IFFT is implemented on a single FPGA, together with two real-time DMT transmitters based on the proposed HS-IFFT and the CC-IFFT. Both the on-chip resource usage and the power consumption of the two DMT transmitters are analyzed. Furthermore, the real-time HS-IFFT and CC-IFFT enabled DMT transmitters are experimentally demonstrated in a directly-modulated laser (DML)-based short-reach direct-detection system. The bit error rate (BER) and error vector magnitude (EVM) performance in the electrical back-to-back (EB2B) and optical back-to-back (OB2B) cases and after 20-km single-mode fiber (SMF) transmission are investigated and compared.

The contribution of this work is twofold. Firstly, we theoretically analyzed the principle of the radix-2 DIT HS-IFFT algorithm and proposed a hardware-efficient fully-parallel pipelined HS-IFFT structure. Secondly, the proposed HS-IFFT and the corresponding DMT transmitter were implemented with a single FPGA. Besides, the experimental demonstration of the feasibility of the proposed HS-IFFT was performed in a short-reach optical DMT system for the first time.

The rest of this paper is organized as follows. Section 2 describes the principle of the proposed HS-IFFT in detail and proposes a hardware-efficient implementation of the fully-parallel HS-IFFT structure. The hardware implementation complexity of the proposed HS-IFFT is theoretically analyzed and compared with that of the CC-IFFT in Section 3. In Section 4, the proposed HS-IFFT and the CC-IFFT as well as the associated real-time DMT transmitters are implemented on a single FPGA, and the on-chip resource usage and power consumption of the two real-time DMT transmitters are analyzed. The experimental setup and results are provided and discussed in Section 5. Conclusions are drawn in Section 6.

2. The principle of the proposed hardware-efficient fully parallel HS-IFFT

Mathematically, the (n + 1)-th output of N-point inverse discrete Fourier transform (IDFT) can be written in the following form [18]

(1)$$x(n )= \sum\limits_{k = 0}^{N - 1} {X(k ){e^{j\frac{{2\pi kn}}{N}}}} = \sum\limits_{k = 0}^{N - 1} {X(k )W_N^{ - nk}} $$

where $W_N^{-nk} = e^{j2\pi nk/N}$ denotes the twiddle factor and n ranges from 0 to N-1. According to the radix-2 DIT algorithm, after splitting the IDFT summation into two sums, one over the first N/2 data points and the other over the last N/2 data points, Eq. (1) can be rewritten as

(2)$$x(n )= \sum\limits_{k = 0}^{N/2 - 1} {\left[ {X(k )+ {{({ - 1} )}^n}X\left( {k + \frac{N}{2}} \right)} \right]W_N^{ - nk}} $$

Now, let us decimate x(n) into its even- and odd-numbered samples, which can be expressed as

(3)$$\left\{ \begin{array}{l} x(2n^{\prime}) = \sum\limits_{k = 0}^{N/2 - 1} \left[ X(k) + X\left( k + \frac{N}{2} \right) \right] W_{N/2}^{-n^{\prime}k} \\ x(2n^{\prime} + 1) = \sum\limits_{k = 0}^{N/2 - 1} \left[ \left( X(k) - X\left( k + \frac{N}{2} \right) \right) W_N^{-k} \right] W_{N/2}^{-n^{\prime}k} \end{array} \right.$$

where $n^{\prime} \in \left[ 0, \frac{N}{2} - 1 \right]$. The computational procedure presented in Eq. (3) can be repeated by decimating the two N/2-point IDFTs x(2n’) and x(2n’+1). The entire process involves L = log₂(N) stages of decimation, where each stage involves N/2 butterflies. Consequently, an N-point IDFT via the DIT IFFT requires (N/2)log₂(N) complex-valued multiplications and Nlog₂(N) complex-valued additions.
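
The split in Eq. (3) can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation: the helper names `idft` and `idft_one_stage` are hypothetical, the test vector is arbitrary, and the IDFT is left unnormalized to match Eq. (1).

```python
import cmath

def idft(X):
    """Direct evaluation of Eq. (1): x(n) = sum_k X(k) exp(j*2*pi*k*n/N), no 1/N factor."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N))
            for n in range(N)]

def idft_one_stage(X):
    """One decimation stage per Eq. (3): two N/2-point IDFTs give the even and odd outputs."""
    N = len(X)
    half = N // 2
    # Inputs of the N/2-point IDFT that produces x(2n')
    sums = [X[k] + X[k + half] for k in range(half)]
    # Inputs of the N/2-point IDFT that produces x(2n'+1), including W_N^{-k}
    diffs = [(X[k] - X[k + half]) * cmath.exp(2j * cmath.pi * k / N)
             for k in range(half)]
    x = [0j] * N
    x[0::2] = idft(sums)
    x[1::2] = idft(diffs)
    return x

X = [complex(k + 1, -k) for k in range(8)]   # arbitrary complex test input
direct = idft(X)
staged = idft_one_stage(X)
max_err = max(abs(a - b) for a, b in zip(direct, staged))
print(max_err)
```

Applying `idft_one_stage` recursively to the two half-size transforms gives the full L-stage radix-2 structure.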

From Eq. (3), we know that the output data from the N/2 butterflies of the 1-st stage are split into two groups, each containing N/2 data points. In general, the N/2 butterflies of the l-th stage produce $2^l$ groups, each containing $N/2^l$ data points. Here, we denote the k-th data point of the (m + 1)-th group in the l-th stage as $X_{l,m}(k)$, where $l \in [1, L]$, $m \in [0, 2^l - 1]$, $k \in \left[ 0, \frac{N}{2^l} - 1 \right]$ and $X_{0,0}(k) = X(k)$. From the $N/2^{l-1}$ data points of the (m + 1)-th group in the (l-1)-th stage, $\{X_{l-1,m}(k)\}$, a butterfly operation yields two groups in the l-th stage, $\{X_{l,2m}(k)\}$ and $\{X_{l,2m+1}(k)\}$, each containing $N/2^l$ data points. The relationships between $X_{l,2m}(k)$, $X_{l,2m+1}(k)$ and $X_{l-1,m}(k)$ are defined as

(4)$$\left\{ \begin{array}{l} {X_{l,2m}}(k )= {X_{l - 1,m}}(k )+ {X_{l - 1,m}}\left( {k + \frac{N}{{{2^l}}}} \right)\\ {X_{l,2m + 1}}(k )= \left[ {{X_{l - 1,m}}(k )- {X_{l - 1,m}}\left( {k + \frac{N}{{{2^l}}}} \right)} \right]W_N^{ - k{2^{l - 1}}} \end{array} \right.$$

Given the input data X(k) for N-point IDFT are constrained to have Hermitian symmetry, i.e., X(N-k) = X*(k), where k ≠ 0 and N/2, and X(0) and X(N/2) are real numbers, we analyze N/2^l data points of (m + 1)-th group in the l-th stage. From Eq. (4), we can obtain

(5)$$\begin{aligned} X_{l,2m}\left( \frac{N}{2^l} - k \right) &= X_{l - 1,m}\left( \frac{N}{2^l} - k \right) + X_{l - 1,m}\left( \frac{N}{2^l} - k + \frac{N}{2^l} \right)\\ &= X_{l - 1,m}\left( \frac{N}{2^{l - 1}} - \left( k + \frac{N}{2^l} \right) \right) + X_{l - 1,m}\left( \frac{N}{2^{l - 1}} - k \right)\\ &= X^\ast_{l - 1,m}\left( k + \frac{N}{2^l} \right) + X^\ast_{l - 1,m}(k)\\ &= X^\ast_{l,2m}(k) \end{aligned}$$

Similarly, we can prove that

(6)$$X_{l,2m + 1}\left( \frac{N}{2^l} - k \right) = X^\ast_{l,2m + 1}(k)$$

Based on the above derivation, we know that the $N/2^l$ data points of each group in the l-th stage are also Hermitian symmetric. It should be pointed out that each group in the (L-1)-th and L-th stages contains only 2 and 1 data points, respectively, so Eqs. (5) and (6) no longer apply to them. To illustrate this point, a data-flow graph of the 16-point radix-2 DIT HS-IFFT is presented in Fig. 1.

Fig. 1. Data-flow graph of 16-point radix-2 DIT HS-IFFT.
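
The symmetry stated by Eqs. (5) and (6) can be observed by running the recursion of Eq. (4) on a Hermitian symmetric input. The sketch below is illustrative only; the function names, the 16-point size and the test values are assumptions made for this example. It also checks that groups of 2 or 1 points hold purely real values, which is how the symmetry degenerates in the last two stages.

```python
import cmath

def stage(groups, stage_no, N):
    """Apply Eq. (4): group m of stage l-1 yields groups 2m and 2m+1 of stage l."""
    size = N >> stage_no                       # N / 2^l points per output group
    out = []
    for g in groups:
        even = [g[k] + g[k + size] for k in range(size)]
        odd = [(g[k] - g[k + size])
               * cmath.exp(2j * cmath.pi * k * 2 ** (stage_no - 1) / N)   # W_N^{-k 2^(l-1)}
               for k in range(size)]
        out += [even, odd]
    return out

def is_hermitian(g, tol=1e-9):
    """Check g(M-k) = conj(g(k)) for k = 1..M-1 and that g(0) is real."""
    M = len(g)
    return abs(g[0].imag) < tol and all(
        abs(g[M - k] - g[k].conjugate()) < tol for k in range(1, M))

N, L = 16, 4
half = [complex(k, k + 1) for k in range(1, N // 2)]      # X(1)..X(N/2-1), arbitrary values
X = [1 + 0j] + half + [2 + 0j] + [v.conjugate() for v in reversed(half)]

groups = [X]
symmetric = []
for stage_no in range(1, L + 1):
    groups = stage(groups, stage_no, N)
    symmetric.append(all(is_hermitian(g) for g in groups))

outputs = [g[0] for g in groups]       # x(n) in bit-reversed order, one point per group
max_imag = max(abs(v.imag) for v in outputs)
print(symmetric, max_imag)
```

Because every group stays Hermitian symmetric, roughly half of each group is redundant, which is the property the butterfly removal relies on.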

We can see clearly from Fig. 1 that 3 and 2 butterflies (blue dotted lines) in the 1-st and 2-nd stages, respectively, can be removed because the output data in the same group of butterflies are Hermitian symmetric. In general, $N/4 - 2^{l-1}$ butterflies can be removed in the l-th stage, where l ranges from 1 to L-2. Hence, $(L-2)N/4 - 2^{L-2} + 1$ complex-valued multiplications and $(L-2)N/2 - 2^{L-1} + 2$ additions can be saved for the N-point HS-IFFT. Based on the nature of Hermitian symmetry, Eq. (4) can be modified as

(7)$$\left\{ \begin{array}{l} {X_{l,2m}}(k )= {X_{l - 1,m}}(k )+ {X^\ast }_{l - 1,m}\left( {\frac{N}{{{2^l}}} - k} \right)\\ {X_{l,2m + 1}}(k )= \left[ {{X_{l - 1,m}}(k )- {X^\ast }_{l - 1,m}\left( {\frac{N}{{{2^l}}} - k} \right)} \right]W_N^{ - k{2^{l - 1}}} \end{array} \right.$$

where $l \in [1, L - 2]$, $k \in \left[ 0, \frac{N}{2^{l + 1}} \right]$ and $m \in [0, 2^l - 1]$. In this way, the on-chip resources used for imposing HS on the input data of the N-point IFFT and for storing the conjugate data points $\{X^\ast_{l,m}(k)\}$ in each stage can be further saved. According to Eq. (7), two modified butterflies and the corresponding symbols are given in Fig. 2. Here, we call the two modified butterflies computing unit A (CUA) and computing unit B (CUB). It should be noted that CUA has the same hardware implementation complexity as the conventional butterfly, and CUB is a special case of CUA in which the two inputs are conjugates of each other, namely $X_{l-1,m}(N/2^{l+1})$ and $X^\ast_{l-1,m}(N/2^{l+1})$. In particular, the implementation complexity of the CUAs in the (L-1)-th and L-th stages can be further reduced, because their inputs are purely real or purely imaginary.

Fig. 2. Modified butterflies: (a,c) CUA and (b,d) CUB
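
To make Eq. (7) concrete, the sketch below compares it with Eq. (4) on a single Hermitian symmetric group of size M (M plays the role of $N/2^{l-1}$, so the twiddle factor is written as $W_M^{-k}$). The function names and test values are hypothetical, and the code is a numerical illustration rather than a model of the hardware.

```python
import cmath

def butterfly_eq4(g):
    """Eq. (4) on one full group g of size M; returns both output groups of size M/2."""
    M = len(g)
    h = M // 2
    even = [g[k] + g[k + h] for k in range(h)]
    odd = [(g[k] - g[k + h]) * cmath.exp(2j * cmath.pi * k / M) for k in range(h)]
    return even, odd

def butterfly_eq7(g_half, M):
    """Eq. (7): uses only g(0..M/2) and produces output points k = 0..M/4 of each group."""
    h = M // 2
    even, odd = [], []
    for k in range(M // 4 + 1):
        partner = g_half[h - k].conjugate()    # stands in for g(k + M/2) by Hermitian symmetry
        even.append(g_half[k] + partner)
        odd.append((g_half[k] - partner) * cmath.exp(2j * cmath.pi * k / M))
    return even, odd

M = 16
inner = [complex(k, 2 * k - 3) for k in range(1, M // 2)]   # arbitrary values for g(1)..g(M/2-1)
g = [4 + 0j] + inner + [-1 + 0j] + [v.conjugate() for v in reversed(inner)]

ref_even, ref_odd = butterfly_eq4(g)
hs_even, hs_odd = butterfly_eq7(g[: M // 2 + 1], M)
err = max(max(abs(ref_even[k] - hs_even[k]), abs(ref_odd[k] - hs_odd[k]))
          for k in range(M // 4 + 1))
print(err)
```

The last loop iteration, k = M/4, pairs a data point with its own conjugate, which corresponds to the CUB case described above.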

As an example, the data-flow graph of a 16-point hardware-efficient radix-2 DIT HS-IFFT structure using CUA and CUB is illustrated in Fig. 3. The CrossWire modules are only used to realize cross-connections in space ($X_{l,m}(k)$ to $X_{l,m}(k)$) and do not consume on-chip resources such as registers and LUTs.

Fig. 3. Data-flow graph of the proposed 16-point hardware-efficient DIT HS-IFFT structure.
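
The butterfly savings quoted for the 16-point example can be reproduced from the per-stage expression $N/4 - 2^{l-1}$ and the closed-form totals given in Section 2. The helper names below are hypothetical; each removed butterfly is counted as one complex-valued multiplication and two additions, which is what the closed-form totals imply.

```python
def removed_butterflies(N):
    """Butterflies removable in stages l = 1..L-2, using N/4 - 2^(l-1) per stage."""
    L = N.bit_length() - 1                     # L = log2(N) for a power-of-two N
    return [N // 4 - 2 ** (l - 1) for l in range(1, L - 1)]

def saved_operations(N):
    """Closed-form totals from the text: (multiplications, additions) saved."""
    L = N.bit_length() - 1
    mults = (L - 2) * N // 4 - 2 ** (L - 2) + 1
    adds = (L - 2) * N // 2 - 2 ** (L - 1) + 2
    return mults, adds

per_stage_16 = removed_butterflies(16)
totals_16 = saved_operations(16)
totals_128 = saved_operations(128)
print(per_stage_16, totals_16, totals_128)
```

For N = 16 the per-stage list matches the 3 and 2 removable butterflies identified in Fig. 1, and the per-stage sum agrees with the closed-form total.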

3. Complexity comparison between the proposed HS-IFFT and CC-IFFT

It is necessary to analyze the real-valued multiplications and additions of CUA and CUB for some special cases to further improve the hardware efficiency of the proposed HS-IFFT. In general, four real multipliers and six real adders are required to realize the CUA. However, this number can be reduced when the twiddle factor in CUA equals $W_N^0$ or $W_N^{-N/8}$. In the $W_N^0$ case, CUA requires no real multipliers and only 4 real adders, while two real multipliers can be saved in the $W_N^{-N/8}$ case because the product of a complex-valued number a + jb and $W_N^{-N/8}$ is $\frac{\sqrt 2}{2}(a - b) + j\frac{\sqrt 2}{2}(a + b)$. For CUB, multiplying by 2 can be achieved by shifting left by 1 bit. Therefore, the complex-valued multiplication can be simplified to a two’s complement operation. The complexity analysis for the N-point radix-2 DIT HS-IFFT is shown in Table 1.

Table 1. Complexity analysis of the proposed N-point HS-IFFT
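
The special cases discussed in this section can be illustrated with a few lines of arithmetic. The sketch below is an assumption-laden illustration, not the authors' Verilog: the function names are hypothetical, floating point stands in for fixed point in the $W_N^{-N/8}$ case, and the CUB twiddle factor is taken as j, which follows from Eq. (7) at $k = N/2^{l+1}$ since $W_N^{-N/4} = j$.

```python
import math

SQRT_HALF = math.sqrt(2) / 2

def mul_general(a, b, c, d):
    """(a + jb)(c + jd) with four real multiplications and two real additions."""
    return a * c - b * d, a * d + b * c

def mul_w_n8(a, b):
    """(a + jb) * W_N^{-N/8} = (sqrt(2)/2)(a - b) + j(sqrt(2)/2)(a + b): two multiplications."""
    return SQRT_HALF * (a - b), SQRT_HALF * (a + b)

def cub(a, b):
    """CUB on integer inputs X = a + jb and X* = a - jb.

    X + X* = 2a and (X - X*) * j = -2b, so a left shift by one bit and a
    negation (two's complement) replace the complex multiplication.
    """
    return a << 1, -(b << 1)

re_fast, im_fast = mul_w_n8(3.0, -5.0)
re_ref, im_ref = mul_general(3.0, -5.0, SQRT_HALF, SQRT_HALF)
cub_sum, cub_diff = cub(7, -3)
print(re_fast - re_ref, im_fast - im_ref, cub_sum, cub_diff)
```

Both CUB outputs are purely real, so no multiplier is needed for this unit.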

We also analyze the hardware implementation complexity of the N-point radix-2 DIT CC-IFFT for comparative purposes, as shown in Table 2. The main reason why we make a comparison between the proposed HS-IFFT and CC-IFFT is that the CC-IFFT has been widely used in real-time optical DMT transmitters [8–13,19].

Table 2. Complexity analysis of the N-point Radix-2 DIT CC-IFFT

It shows that the proposed HS-IFFT requires half as many real multipliers as the CC-IFFT. Also, up to ∼50% of the real adders can be saved when the IFFT size is large enough. The numbers of real multipliers and adders required for different IFFT sizes are given in Fig. 4, where the theoretical numbers of real multiplications and additions without taking the special twiddle factors into account are 2Nlog₂(N) and 3Nlog₂(N), respectively.

Fig. 4. Required real multipliers and adders versus different IFFT sizes.

4. Hardware implementation, on-chip resources usage and power estimation

Based on the proposed hardware-efficient HS-IFFT structure and the complexity analysis above, we design a 128-point radix-2 fully-parallel HS-IFFT with 16-bit fixed-point inputs/outputs in the Verilog hardware description language (HDL). It is implemented on a single FPGA chip (XC7VX485T-2FFG1761) by using the Xilinx ISE software tool. Furthermore, a real-time DMT transmitter with the HS-IFFT is also developed on the FPGA. For the purpose of comparison, a 128-point fully-parallel Spiral CC-IFFT core in Verilog HDL is also generated online [20]. The detailed parameters are as follows: 16-bit fixed-point inputs/outputs, unscaled, radix-2, fully streaming, and natural input/natural output data ordering. The online generated report shows that the CC-IFFT requires 908 real multipliers and 2308 real adders, which is consistent with the complexity analysis shown in Table 2. Besides, the corresponding real-time baseband DMT transmitter is also implemented on the FPGA. The detailed DSP flows in the DMT transmitters are shown in Fig. 5.

Fig. 5. DSP flows in the real-time DMT transmitters and OFDM frame structure.
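
The reported 908 real multipliers and 2308 real adders for the 128-point CC-IFFT can be recovered with a simple counting sketch. The cost model is an assumption made for this example: butterflies whose twiddle factor is $W^0$ or $\pm j$ cost 0 multipliers and 4 adders, those whose twiddle factor is an odd multiple of $W_N^{-N/8}$ cost 2 multipliers and 6 adders, and all others cost 4 multipliers and 6 adders. The function name `count_real_ops` is hypothetical.

```python
def count_real_ops(n):
    """Count real multipliers and adders of a fully-parallel radix-2 n-point IFFT."""
    stages = n.bit_length() - 1
    mults = adds = 0
    for stage_no in range(1, stages + 1):
        group_size = n >> (stage_no - 1)       # points per input group of this stage
        groups = 1 << (stage_no - 1)
        for k in range(group_size // 2):       # twiddle W_{group_size}^{-k}
            if (4 * k) % group_size == 0:      # W^0 or j: no multiplier
                m, a = 0, 4
            elif (8 * k) % group_size == 0:    # (sqrt(2)/2)(1 +/- j) type: two multipliers
                m, a = 2, 6
            else:                              # general twiddle factor
                m, a = 4, 6
            mults += groups * m
            adds += groups * a
    return mults, adds

cc_ifft_128 = count_real_ops(128)
print(cc_ifft_128)
```

Under this cost model the count agrees with the figures in the Spiral report quoted above; other multiplier-sharing schemes would give different numbers.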

In the DMT transmitter, a pseudo-random binary sequence (PRBS) with a length of 2¹⁵-1 is generated offline and stored in the on-chip read-only memory (ROM) of the FPGA. A control finite-state machine (FSM) is utilized to control the read and/or write operations of the PRBS ROM and the first-in first-out (FIFO) modules. This buffering is needed because a DAC with a low sampling rate (5 GS/s) is employed in our experiment. Besides, the FSM module also controls the Mapper module to generate two types of training symbols (TSs) and the data-carrying OFDM symbols. An OFDM frame consists of 1 TS for timing synchronization and 4 TSs for channel estimation, followed by 400 data-carrying OFDM symbols, as shown in the inset of Fig. 5. The time-domain structure of the 5 TSs with the BPSK modulation format is the same as that of the data-carrying OFDM symbols with the 16-QAM modulation format. In generating each OFDM symbol, 60 subcarriers at the low-frequency bins are used for data transmission; the direct current (DC) subcarrier and four high-frequency subcarriers are filled with zeros. Thus, every data-carrying OFDM symbol requires 240 bits of PRBS data. When the number of data words stored in the first FIFO module, which has a width of 240 bits, reaches 400, the random BPSK symbols for the generation of the five TSs are first transmitted by the Mapper module under the control of the FSM module. After that, the data from the FIFO are mapped to 60 complex-valued 16-QAM symbols per clock cycle (156.25 MHz). The mapped symbols are fed into the proposed HS-IFFT to generate real-valued time-domain OFDM data. If the CC-IFFT is used instead, the mapped symbols must first be constrained to have HS. The real-valued outputs are digitally clipped and scaled to reduce both the peak-to-average power ratio (PAPR) and the quantization noise. The digital clipping ratio is 10.5 dB. Besides, the scaled data are converted to 12-bit data for the DAC.
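
The Hermitian symmetry constraint needed by the CC-IFFT path can be sketched as follows. The bin assignment is an assumption made for this example (data on bins 1 to 60, zeros on DC and bins 61 to 64), since the text does not give the exact indices; the function names and the test symbols are likewise hypothetical.

```python
import cmath

N = 128
DATA_BINS = range(1, 61)          # assumption: the 60 data subcarriers occupy bins 1..60

def hermitian_frame(symbols):
    """Place 60 symbols on bins 1..60 and their conjugates on the mirrored bins N-k."""
    if len(symbols) != len(DATA_BINS):
        raise ValueError("expected 60 symbols")
    X = [0j] * N                  # DC, bins 61..64 and their mirrors stay zero
    for k, s in zip(DATA_BINS, symbols):
        X[k] = s
        X[N - k] = s.conjugate()
    return X

def idft(X):
    """Unnormalized IDFT as in Eq. (1), used here only to check the output is real."""
    n_pts = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / n_pts) for k in range(n_pts))
            for n in range(n_pts)]

levels = (-3, -1, 1, 3)           # 16-QAM amplitude levels
symbols = [complex(levels[i % 4], levels[(i // 4) % 4]) for i in range(60)]
frame = hermitian_frame(symbols)
x = idft(frame)
max_imag = max(abs(v.imag) for v in x)
print(max_imag)
```

The imaginary part of the output vanishes up to rounding error, so only the real part needs to be passed to the DAC.
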
The 128 parallel data points are stored in the second FIFO module, which consists of three FIFOs with a 768 (384)-bit write (read) width. When a full OFDM frame has been stored in this FIFO module, four 384-bit words, each containing 32 12-bit samples, are read out in the first four of every five consecutive clock cycles, and the fifth clock cycle is used for inserting a 32-point cyclic prefix (CP). Once CP insertion is done, thirty-two samples in parallel are fed to the DAC interface module for signed-to-unsigned conversion, bit reordering, and parallel-to-serial conversion. Subsequently, the serialized data are sent to a 5 GS/s DAC via a low-voltage differential signaling (LVDS) interface with 4 LVDS groups, each comprising 12 LVDS pairs. The DAC is clocked by a 5-GHz external clock source and outputs a 625 MHz clock for the FPGA. The DSP modules on the FPGA are clocked by a 156.25 MHz clock output from the DAC interface module. Thus, the real-time DMT transmitters enabled by both the HS-IFFT and the CC-IFFT operate at 20 GS/s without taking the CP into account. According to the synthesis reports of the Xilinx ISE software tool, the maximum operating frequency can reach up to 300 MHz, which means that the proposed HS-IFFT and the CC-IFFT can support a real-time DMT transmitter working at up to 38.4 GS/s.
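
The sample rates quoted above follow from the number of samples processed per clock cycle. The short sketch below redoes that arithmetic with integer values in Hz; the constant and function names are hypothetical.

```python
IFFT_POINTS = 128                 # samples produced by the fully-parallel IFFT per clock
DAC_SAMPLES_PER_CLOCK = 32        # samples handed to the DAC interface per clock
DSP_CLOCK_HZ = 156_250_000        # 156.25 MHz DSP clock
MAX_CLOCK_HZ = 300_000_000        # maximum operating frequency from the synthesis report

def sample_rate(samples_per_clock, clock_hz):
    """Samples per second for a datapath handling samples_per_clock each cycle."""
    return samples_per_clock * clock_hz

ifft_rate = sample_rate(IFFT_POINTS, DSP_CLOCK_HZ)            # IFFT throughput, CP excluded
max_rate = sample_rate(IFFT_POINTS, MAX_CLOCK_HZ)             # throughput at 300 MHz
dac_rate = sample_rate(DAC_SAMPLES_PER_CLOCK, DSP_CLOCK_HZ)   # rate delivered to the DAC
print(ifft_rate / 1e9, max_rate / 1e9, dac_rate / 1e9)
```

The IFFT therefore produces samples four times faster than the 5 GS/s DAC consumes them, which is why the FIFO modules are placed between the DSP chain and the DAC interface.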

The on-chip resource usage of the DSP modules in the HS-IFFT and CC-IFFT based real-time DMT transmitters is shown in Tables 3 and 4, respectively.

Table 3. FPGA chip resource usage of the HS-IFFT based DMT transmitter

As we can see from Tables 3 and 4, the numbers of on-chip dedicated multipliers (DSP48E1) required for the proposed HS-IFFT and the CC-IFFT are 454 and 700, respectively. For the HS-IFFT, this agrees well with the theoretical complexity analysis. For the CC-IFFT, however, 208 fewer multipliers are used than the theoretical analysis predicts (908 multipliers). This is because only the real parts of the CC-IFFT outputs are used, so the Xilinx ISE software tool automatically removes the signals associated with the imaginary parts during implementation. Therefore, the related on-chip resources, including registers, LUTs and multipliers, are saved. Even though this optimization saves some on-chip resources for the CC-IFFT, the proposed HS-IFFT still has a lower hardware implementation complexity: it uses more than 49% fewer registers, 43% fewer LUTs and 35% fewer multipliers than the software-optimized CC-IFFT. The total numbers of registers, LUTs and multipliers of the HS-IFFT enabled DMT transmitter are reduced by more than 47%, 38% and 35%, respectively, compared to the CC-IFFT enabled one. Moreover, the on-chip power consumption of the two DMT transmitters is also estimated by using the Xilinx ISE tool XPower Analyzer, and the corresponding power estimates are shown in Table 5.

Table 4. FPGA chip resource usage of the CC-IFFT based DMT transmitter

Table 5. Power consumption estimates of the HS-IFFT and CC-IFFT based DMT transmitters

As we can see from Table 5, the total power consumption of the on-chip clocks, logic, signals, DSPs and BRAMs is reduced by up to 59% by using the proposed HS-IFFT. The IOs have a high power consumption because 48 LVDS pairs are used for interfacing the 5 GS/s DAC. Two mixed-mode clock managers (MMCMs) are employed in the DAC interface module to generate the 200 MHz and 156.25 MHz clocks. The power consumption of the MMCMs, IOs and leakage is similar in the two transmitters. By using the proposed HS-IFFT, the total power of the real-time transmitter can be reduced by 32% compared to that of the CC-IFFT enabled one.

5. Experimental setup, results and discussion

To investigate the quality of the real-time generated DMT signals and the transmission performance, the FPGA-based DMT transmitters are experimentally demonstrated in a short-reach direct-detection system. The experimental setup and its photograph are illustrated in Figs. 6(a) and 6(b), respectively. In the real-time DMT transmitter, a Xilinx VC707 FPGA evaluation board equipped with the FPGA chip XC7VX485T-2FFG1761 is used to implement the HS/CC-IFFT and other DSP algorithms. In addition, the FPGA board is also used as an FPGA mezzanine card (FMC) carrier to drive a Euvis 5 GS/s DAC FMC module. The DAC is clocked by an external 5 GHz clock and generates the baseband DMT signal with a peak-to-peak voltage (Vpp) of ∼500 mV. A variable electrical attenuator (ATT) is employed before the electrical amplifier (EA, Mini-Circuits ZX60-14012L-S+) to reduce the nonlinear distortion. The 1 Vpp amplified signal drives a 10 GHz DML operated at 1556.3 nm to perform the electrical-to-optical conversion. The optical DMT signal from the DML is coupled into a transmission span of 20 km SMF (ITU-T G.652D). The launch power is 2 dBm. In the optical back-to-back case, the SMF is removed. At the receiver end, a variable optical attenuator (VOA) is used to adjust the received optical power (ROP). To measure the ROP conveniently, a 9:1 optical coupler (OC) is placed in front of a power meter (PM) and a 10 GHz PIN photodiode with a transimpedance amplifier (TIA). The 90% output from the OC is directly detected by the PIN-TIA with AC-coupled output. Subsequently, the detected signals are sampled by a digital storage oscilloscope (DSO, Teledyne LeCroy WaveMaster 820Zi-A) for offline DSP. The vertical resolution, sampling rate and bandwidth of the DSO are 8 bits, 20 GS/s and 20 GHz, respectively.

Fig. 6. Experimental setup (a) and its photograph (b).

The offline receiver DSP flow includes TS-based timing synchronization, CP removal, FFT, joint channel estimation based on TS-based inter-symbol frequency averaging (Inter-SFA) and intra-symbol frequency averaging (ISFA) [21], channel equalization and QAM demapping. The number of ISFA taps is 5 in our experiments. Finally, the EVM and BER performance is analyzed.

We measure the EVM and BER performance of the received signals in the electrical back-to-back (EB2B), optical back-to-back (OB2B) and post-20 km SMF configurations at an ROP of -4 dBm. The EVM and BER values along with the corresponding 16-QAM constellation diagrams are presented in Fig. 7. They show that the HS-IFFT enabled DMT transmitter has EVM and BER performance similar to that of the CC-IFFT enabled one in all three cases. In addition, the EVM performance as a function of subcarrier (SC) index is shown in Fig. 8.

Fig. 7. Constellation diagrams: (a-c)/(d-f) EB2B, OB2B and post-20 km SMF for HS/CC-IFFT.

Fig. 8. The measured EVM performance over different data-carrying SCs

In the EB2B case, the EVM performance gradually degrades as the SC index increases. This is mainly due to the decreased effective number of bits (ENOB) of the DAC and the greater power fading on the high-frequency SCs caused by the limited output bandwidth. The nonlinear distortions of the EA and DML are the main reasons for the degraded EVM performance in the OB2B case. After 20 km SMF transmission, the EVM performance on the high-frequency SCs is further degraded compared to that of the OB2B case, which is mainly attributed to optical fiber dispersion. Nevertheless, similar EVM performance over the data-carrying SCs is clearly observed for the HS-IFFT and CC-IFFT enabled transmitters. It should be pointed out that, according to Eqs. (4) and (7), the proposed DIT HS-IFFT structure can also be used, after minor modifications, to realize a DIF FFT with real-valued inputs for the DMT receiver.

6. Conclusion

We have theoretically analyzed the radix-2 decimation-in-time HS-IFFT algorithm. Based on the theoretical analysis, a hardware-efficient fully-parallel HS-IFFT structure was proposed. The hardware implementation complexity of the proposed HS-IFFT was also theoretically analyzed and compared with that of the CC-IFFT. The analysis showed that the real-valued multiplications and additions could be reduced by up to about 50% compared to the fully-parallel CC-IFFT. Real-time DMT transmitters using the proposed HS-IFFT and the CC-IFFT were implemented on a single FPGA chip. The synthesis results indicated that the proposed HS-IFFT could support a DMT transmitter working at up to 38.4 GS/s. Meanwhile, the on-chip resource usage and power consumption were analyzed and discussed. The proposed 128-point fully-parallel HS-IFFT saves 49% of the registers, 43% of the LUTs and 35% of the multipliers compared to the CC-IFFT. As a result, the on-chip power consumption of the HS-IFFT enabled DMT transmitter can be reduced by up to 32% compared to that of the CC-IFFT enabled one. Furthermore, the real-time DMT transmitters were experimentally demonstrated in a short-reach direct-detection system. The results indicated that the proposed HS-IFFT enabled DMT transmitter has EVM and BER performance similar to that of the CC-IFFT enabled one in the EB2B, OB2B and post-20 km SMF transmission cases. It is expected that the proposed hardware-efficient HS-IFFT can be applied to cost- and power-sensitive DMT-based applications.

Funding

National Natural Science Foundation of China (61701180, 61805079); Natural Science Foundation of Hunan Province (2016JJ6097, 2017JJ3212); Scientific Research Foundation of Hunan Provincial Education Department (17C0957, 18B026, 18C0520, 18C0588).

Disclosures

The authors declare no conflicts of interest.

References

1. W. Shieh, Q. Yang, and Y. Ma, “107 Gb/s coherent optical OFDM transmission over 1000-km SSMF fiber using orthogonal band multiplexing,” Opt. Express 16(9), 6378–6386 (2008). [CrossRef]

2. N. Kaneda, T. Pfau, H. Zhang, J. Lee, Y. K. Chen, C. J. Youn, Y. H. Kwon, E. S. Num, and S. Chandrasekhar, “Field demonstration of 100-Gb/s real-time coherent optical OFDM detection,” J. Lightwave Technol. 33(7), 1365–1372 (2015). [CrossRef]

3. F. Li, Z. Cao, M. Chen, X. Li, J. Yu, S. Shi, Y. Xia, and Y. Chen, “Demonstration of four channel CWDM 560 Gbit/s 128QAM-OFDM for optical inter-connection,” in Proc. OFC 2016, paper W4J.2.

4. S. You, Y. Wang, W. Liu, Y. Shen, J. Pang, X. Li, and M. Luo, “400-Gb/s Single-sideband direct detection over 7-Core fiber with SSBI cancellation,” IEEE Photonics Technol. Lett. 31(9), 669–672 (2019). [CrossRef]

5. D. Nesset, “NG-PON2 technology and standards,” J. Lightwave Technol. 33(5), 1136–1143 (2015). [CrossRef]

6. R. Giddings, “Real-time digital signal processing for optical OFDM-based future optical access networks,” J. Lightwave Technol. 32(4), 553–570 (2014). [CrossRef]

7. L. Nadal, M. Svaluto Moreolo, J. M. Fabrega, A. Dochhan, H. Griesser, M. Eiselt, and J. P. Elbers, “DMT modulation with adaptive loading for high bit rate transmission over directly detected optical channels,” J. Lightwave Technol. 32(21), 4143–4153 (2014). [CrossRef]

8. Y. Benlachtar, P. M. Watts, R. Bouziane, P. Milder, D. Rangaraj, A. Cartolano, R. Koutsoyannis, J. C. Hoe, M. Püschel, M. Glick, and R. I. Killey, “Generation of optical OFDM signals using 21.4 GS/s real time digital signal processing,” Opt. Express 17(20), 17658–17668 (2009). [CrossRef]

9. R. P. Giddings, E. Hugues-Salas, and J. M. Tang, “Experimental demonstration of record high 19.125Gb/s real-time end-to-end dual-band optical OFDM transmission over 25 km SMF in a simple EML-based IMDD system,” Opt. Express 20(18), 20666–20679 (2012). [CrossRef]

10. S.-H. Cho, K. W. Doo, J. H. Lee, J. Lee, S. I. Myong, and S. S. Lee, “Demonstration of a real-time 16 QAM encoded 11.52 Gb/s OFDM transceiver for IM/DD OFDMA-PON systems,” Proc. OECC, 2013, paper WP2-3.

11. X. Xiao, F. Li, J. Yu, Y. Xia, and Y. Chen, “Real-time demonstration of 100Gbps class dual-carrier DDO-16QAM-DMT transmission with directly modulated laser,” in Proc. OFC 2014, paper M2E.6.

12. M. Chen, J. He, Q. Fan, Z. Dong, and L. Chen, “Experimental demonstration of real-time high-level QAM-encoded direct-detection optical OFDM systems,” J. Lightwave Technol. 33(22), 4632–4639 (2015).

13. Y. Wu, C. He, Q. Zhang, Y. Sun, and T. Wang, “Low-complexity recombined SLM scheme for PAPR reduction in IM/DD optical OFDM systems,” Opt. Express 26(24), 32237–32247 (2018).

14. M. Jaber and D. Massicotte, “A new FFT concept for efficient VLSI implementation: Part II-parallel pipelined processing,” in Proc. IEEE Int. Conf. Digital Signal Processing, pp. 1–6 (2009).

15. J. Chen, Q. Cao, and S. Shen, “DFT, FFT Algorithm for a complex conjugate-symmetric sequence,” Journal of Electronics and Information Technology 23(2), 197–202 (2001). (in Chinese)

16. H. F. Chi and Z. H. Lai, “A cost-effective memory-based real-valued FFT and Hermitian symmetric IFFT processor for DMT-based wire-line transmission systems,” Proc. IEEE Int. Symp. Circuits Syst. 6, 6006–6009 (2005).

17. S. A. Salehi, R. Amirfattahi, and K. K. Parhi, “Pipelined architectures for real-valued FFT and Hermitian-symmetric IFFT with real datapaths,” IEEE Trans. Circuits Syst. II 60(8), 507–511 (2013).

18. R. N. Bracewell, The Fourier transform and its applications, Third Edition (McGraw-Hill, 2000), Chap. 11.

19. R. P. Giddings, X. Q. Jin, and J. M. Tang, “Experimental demonstration of real-time 3Gb/s optical OFDM transceivers,” Opt. Express 17(19), 16654–16665 (2009).

20. Spiral DFT/FFT IP Core Generator, http://www.spiral.net/hardware/dftgen.html.

21. X. Liu and F. Buchali, “Intra-symbol frequency-domain averaging based channel estimation for coherent optical OFDM,” Opt. Express 16(26), 21944–21957 (2008).

Computing Unit (CU)	CUs in the $l$-th stage	CUs in total	Multipliers	Adders
CUA w/ $W_{N}^{0}$	$2^{l-1}$, $l \in [1, L]$	$2^{L}-1$	0	$2^{L+2}-4$
CUA w/ $W_{N}^{- N / 8}$	$2^{l-1}$, $l \in [1, L - 2]$	$2^{L-2}-1$	$2^{L-1}-2$	$3 \cdot 2^{L-1}-6$
Other CUAs	$2^{L-2}-2^{l}$, $l \in [1, L - 3]$	$(L-4) \cdot 2^{L-2}+2$	$(L-4) \cdot 2^{L}+8$	$3 (L-4) \cdot 2^{L-1}+12$
CUB	$2^{l-1}$, $l \in [1, L - 1]$	$2^{L-1}-1$	0	$2^{L-1}-1$
Total			$(L-4) \cdot 2^{L}+2^{L-1}+6$	$3L \cdot 2^{L-1}+1$
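
The closed-form totals in the computing-unit table can be cross-checked numerically. The following is a minimal Python sketch, not part of the original paper: the function names are hypothetical, and the only inputs are the row formulas of the table, with $L$ treated as the stage parameter used there. It sums the per-stage CU counts and the per-row multiplier and adder counts, then compares them with the "CUs in total" column and the "Total" row.

```python
def cu_unit_totals_from_stages(L):
    """Sum the per-stage CU counts of each row (second column of the table)."""
    cua_w0 = sum(2 ** (l - 1) for l in range(1, L + 1))            # l in [1, L]
    cua_w8 = sum(2 ** (l - 1) for l in range(1, L - 1))            # l in [1, L-2]
    cua_other = sum(2 ** (L - 2) - 2 ** l for l in range(1, L - 2))  # l in [1, L-3]
    cub = sum(2 ** (l - 1) for l in range(1, L))                   # l in [1, L-1]
    return cua_w0, cua_w8, cua_other, cub


def cu_unit_totals_closed_form(L):
    """'CUs in total' column of the table."""
    return (2 ** L - 1, 2 ** (L - 2) - 1,
            (L - 4) * 2 ** (L - 2) + 2, 2 ** (L - 1) - 1)


def cu_totals_from_rows(L):
    """Sum the (multipliers, adders) entries of the four rows."""
    rows = [
        (0, 2 ** (L + 2) - 4),                                   # CUA w/ W_N^0
        (2 ** (L - 1) - 2, 3 * 2 ** (L - 1) - 6),                # CUA w/ W_N^(-N/8)
        ((L - 4) * 2 ** L + 8, 3 * (L - 4) * 2 ** (L - 1) + 12),  # other CUAs
        (0, 2 ** (L - 1) - 1),                                   # CUB
    ]
    return sum(m for m, _ in rows), sum(a for _, a in rows)


def cu_totals_closed_form(L):
    """'Total' row of the table: (multipliers, adders)."""
    return (L - 4) * 2 ** L + 2 ** (L - 1) + 6, 3 * L * 2 ** (L - 1) + 1


if __name__ == "__main__":
    for L in range(4, 11):
        assert cu_unit_totals_from_stages(L) == cu_unit_totals_closed_form(L)
        assert cu_totals_from_rows(L) == cu_totals_closed_form(L)
        print(L, cu_totals_closed_form(L))
```

For the values of $L$ tried here, the row entries and the totals agree, which suggests the table is internally consistent.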

Butterfly (BF)	BFs in the $l$-th stage	BFs in total	Multipliers	Adders
BF w/ $W_{N}^{0}$	$2^{l-1}$, $l \in [1, L]$	$2^{L}-1$	0	$2^{L+2}-4$
BF w/ $W_{N}^{- N / 8}, W_{N}^{- 3 N / 8}$	$2^{l}$, $l \in [1, L - 2]$	$2^{L-1}-2$	$2^{L}-4$	$3 \cdot 2^{L}-12$
BF w/ $W_{N}^{- N / 4}$	$2^{l-1}$, $l \in [1, L - 1]$	$2^{L-1}-1$	0	$2^{L+1}-4$
Other BFs	$2^{L-1}-2^{l+1}$, $l \in [1, L - 3]$	$(L-4) \cdot 2^{L-1}+4$	$(L-4) \cdot 2^{L+1}+16$	$3 (L-4) \cdot 2^{L}+24$
Total			$(L-4) \cdot 2^{L+1}+2^{L}+12$	$(3L-3) \cdot 2^{L}+4$
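
The two complexity tables can be compared directly. The sketch below is illustrative Python with hypothetical function names. It assumes that the computing-unit table describes the proposed HS-IFFT, that the butterfly table describes the CC-IFFT, and that $L$ denotes the same quantity in both; the tables as extracted carry no captions, so these are assumptions. It checks the butterfly row sums against the "Total" row and then evaluates the relative savings in multipliers and adders.

```python
def bf_totals_from_rows(L):
    """Sum the (multipliers, adders) entries of the four butterfly rows."""
    rows = [
        (0, 2 ** (L + 2) - 4),                                     # BF w/ W_N^0
        (2 ** L - 4, 3 * 2 ** L - 12),                             # W_N^(-N/8), W_N^(-3N/8)
        (0, 2 ** (L + 1) - 4),                                     # BF w/ W_N^(-N/4)
        ((L - 4) * 2 ** (L + 1) + 16, 3 * (L - 4) * 2 ** L + 24),  # other BFs
    ]
    return sum(m for m, _ in rows), sum(a for _, a in rows)


def bf_totals_closed_form(L):
    """'Total' row of the butterfly table: (multipliers, adders)."""
    return (L - 4) * 2 ** (L + 1) + 2 ** L + 12, (3 * L - 3) * 2 ** L + 4


def cu_totals_closed_form(L):
    """'Total' row of the computing-unit table: (multipliers, adders)."""
    return (L - 4) * 2 ** L + 2 ** (L - 1) + 6, 3 * L * 2 ** (L - 1) + 1


def relative_savings(L):
    """Fraction of multipliers and adders saved by the CU-based structure."""
    cu_mul, cu_add = cu_totals_closed_form(L)
    bf_mul, bf_add = bf_totals_closed_form(L)
    return 1 - cu_mul / bf_mul, 1 - cu_add / bf_add


if __name__ == "__main__":
    for L in range(4, 11):
        assert bf_totals_from_rows(L) == bf_totals_closed_form(L)
        mul_saving, add_saving = relative_savings(L)
        print(f"L={L}: multipliers saved {mul_saving:.1%}, adders saved {add_saving:.1%}")
```

Under these assumptions the multiplier total of the computing-unit table is exactly half that of the butterfly table for every $L$, since $(L-4) \cdot 2^{L}+2^{L-1}+6 = \tfrac{1}{2}\left[(L-4) \cdot 2^{L+1}+2^{L}+12\right]$, while the adder saving approaches one half only as $L$ grows.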

Module Name	Slices	Slice Registers	Slice LUTs	RAMB18/36E1	DSP48E1
PRBS ROM&1st FIFO	86	320	172	8	0
Mapper	111	452	227	0	0
HS-IFFT	9,790	33,665	24,488	0	454
Clip&Scale	2,036	0	5,751	0	0
CP&2nd FIFO	317	1,196	641	66	0
DAC Interface	152	387	192	0	0
Control FSM	8	19	20	0	0
Total Used	12,501	36,039	31,492	74	454

Module Name	Slices	Slice Registers	Slice LUTs	RAMB18/36E1	DSP48E1
PRBS ROM&1st FIFO	90	320	172	8	0
Mapper	115	452	227	0	0
HS	240	0	960	0	0
CC-IFFT	16,323	66,416	42,373	0	700
Clip&Scale	3,468	0	6,421	0	0
CP&2nd FIFO	320	1,196	654	66	0
DAC Interface	152	387	192	0	0
Control FSM	9	19	20	0	0
Total Used	20,718	68,790	51,020	74	700
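
The relative resource savings follow from the "Total Used" rows of the two resource tables. The short Python sketch below is an illustration only: the dictionary and function names are hypothetical, the numbers are copied from the tables, and it assumes that the first table (containing the HS-IFFT module) is the HS-IFFT-based transmitter and the second (containing the HS and CC-IFFT modules) is the CC-IFFT-based transmitter.

```python
# "Total Used" rows copied from the two resource tables.
HS_TOTAL_USED = {
    "Slices": 12501,
    "Slice Registers": 36039,
    "Slice LUTs": 31492,
    "RAMB18/36E1": 74,
    "DSP48E1": 454,
}
CC_TOTAL_USED = {
    "Slices": 20718,
    "Slice Registers": 68790,
    "Slice LUTs": 51020,
    "RAMB18/36E1": 74,
    "DSP48E1": 700,
}


def reduction_percent(hs, cc):
    """Percentage of each resource saved relative to the CC-IFFT design."""
    return {name: 100.0 * (cc[name] - hs[name]) / cc[name] for name in cc}


if __name__ == "__main__":
    for name, pct in reduction_percent(HS_TOTAL_USED, CC_TOTAL_USED).items():
        print(f"{name:16s} {pct:5.1f} % fewer")
```

With these figures the savings are roughly 40% in slices, 48% in slice registers, 38% in slice LUTs and 35% in DSP48E1 blocks, while block RAM usage is unchanged.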

On-Chip	Estimated Power Consumption of HS-IFFT-DMT Transmitter (W)	Estimated Power Consumption of CC-IFFT-DMT Transmitter (W)
Clocks	0.250	0.490
Logic	0.073	0.267
Signals	0.140	0.498
DSPs	0.118	0.211
BRAMs	0.098	0.184
MMCMs	0.240	0.240
IOs	0.972	0.972
Leakage	0.222	0.230
Total	2.112	3.092
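
The power figures can be checked in the same way. The following Python sketch (names hypothetical, values copied from the table above) sums the per-component estimates and computes the saving implied by the listed totals. The HS-IFFT column sums to 2.113 W against a listed total of 2.112 W; the 0.001 W difference is presumably due to rounding of the individual entries.

```python
# (HS-IFFT-DMT, CC-IFFT-DMT) estimated on-chip power in watts, from the table.
POWER_W = {
    "Clocks": (0.250, 0.490),
    "Logic": (0.073, 0.267),
    "Signals": (0.140, 0.498),
    "DSPs": (0.118, 0.211),
    "BRAMs": (0.098, 0.184),
    "MMCMs": (0.240, 0.240),
    "IOs": (0.972, 0.972),
    "Leakage": (0.222, 0.230),
}
LISTED_TOTAL_W = (2.112, 3.092)  # "Total" row of the table


def component_sums():
    """Sum the per-component power of each transmitter."""
    hs = sum(hs_w for hs_w, _ in POWER_W.values())
    cc = sum(cc_w for _, cc_w in POWER_W.values())
    return hs, cc


def total_saving():
    """Absolute (W) and relative (%) saving based on the listed totals."""
    hs, cc = LISTED_TOTAL_W
    return cc - hs, 100.0 * (cc - hs) / cc


if __name__ == "__main__":
    hs_sum, cc_sum = component_sums()
    saved_w, saved_pct = total_saving()
    print(f"component sums: HS {hs_sum:.3f} W, CC {cc_sum:.3f} W")
    print(f"saving from listed totals: {saved_w:.3f} W ({saved_pct:.1f} %)")
```

Based on the listed totals, the HS-IFFT-based transmitter is estimated to consume about 0.98 W (roughly 32%) less than the CC-IFFT-based one. The MMCM and IO entries are identical in both columns, so the difference comes from the clock, logic, signal, DSP, BRAM and leakage components.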

Optics Express