Multi-format carrier recovery for coherent real-time reception with processing in polar coordinates

Benedikt Baeuerle; Arne Josten; Felix Abrecht; Marco Eppenberger; Edwin Dornbierer; David Hillerkuss; Juerg Leuthold

doi:10.1364/OE.24.025629

1. Introduction

An efficient multi-format carrier recovery is an essential processing block in coherent receivers for future elastic optical networks (EON) [1,2]. A particular challenge in such networks is that transceivers need to adapt flexibly to actual traffic demands. Thus, coherent receivers need to dynamically follow the modulation format and signal bandwidth of the transmitters. For this, transceivers need efficient, universal, and flexible digital signal processing (DSP) units.

Two vital blocks in such DSP units are the carrier frequency recovery (CFR) and carrier phase recovery (CPR) [3]. The CFR detects, tracks, and corrects for the carrier frequency offset (CFO) between transmitter and receiver laser. The CPR compensates for laser phase noise and phase offsets. The phase noise originating from the lasers' finite linewidth varies much faster than the frequency drift. For the CFR, only slow changes have to be tracked, which simplifies the implementation. Here, we confine the discussion to non-data-aided algorithms, as they achieve higher spectral efficiencies because they avoid additional training symbols or pilot tones.

There are mainly two categories of CFR algorithms: the M-th power algorithm, which can be implemented either in the time [4] or in the frequency domain [5], and the blind frequency search (BFS) [6]. CPR algorithms can also be subdivided into two main groups. The most common algorithm for QPSK is the Viterbi-Viterbi phase estimation (VVPE) algorithm [7]. The VVPE algorithm has been adapted with QPSK-partitioning [8], multi-stage approaches [9], and for higher order modulation formats [10]. A second group of CPR algorithms uses the blind phase search (BPS) [11]. The BPS typically suffers from a large complexity. Meanwhile, the complexity has been reduced by multi-stage approaches [12,13] and with a simplified cost function [14]. Both approaches have already been demonstrated in real-time at low symbol rates [15]. Another group of CPR algorithms takes advantage of a nonlinear transformation of the received signal. They use either a harmonic decomposition [16,17] or the nonlinear least square [18]. Recently, an additional approach has been shown for a real-time implementation at 25 GBd [19] where a multi-symbol delay detection (MSDD) scheme is used [20].

However, a flexible, blind, and joint CFR and CPR that operates in real-time with low hardware complexity at highest data rates has not yet been shown. This is in part because such hardware implementations in coherent optical communication links with >100 Gbit/s are challenging: the hardware has to process several 100 Gbit/s to Tbit/s of raw data at DSP clock frequencies that are typically below 1 GHz. Therefore, massive parallel processing is required [21], and large blocks of data need to be processed simultaneously. Yet, due to the large number of parallel operations, a real-time capable hardware implementation can only be realized if the processing complexity is minimized. Minimum complexity is also necessary to achieve a reasonably low power consumption.

In this paper, we present a real-time, flexible, blind, and joint CPR [22] and CFR [23] algorithm that operates in polar coordinates for operation with low complexity. The algorithms are based on the BFS and BPS techniques. We explain the operation principle, show the performance through simulations and demonstrate a real-time hardware implementation for the 4QAM, 8QAM, and 16QAM modulation formats. The implementation is realized without using hardware multiplications on the FPGA. The resilience of the algorithms to impairments such as laser phase noise (LPN), low signal-to-noise ratio (SNR), and a static and varying carrier frequency offset (CFO) has been analyzed using MATLAB and ModelSim simulations. The complexity and hardware requirements have been studied for an FPGA-based prototyping platform with Vivado design tools (Xilinx).

2. Operation principle and implementation

The multi-format carrier recovery algorithm comprises two main blocks. The first part is a CFR algorithm that compensates for CFO and CFO drifts between transmitter and receiver. The second part is a CPR algorithm that compensates for the laser phase noise. The CFR is adapted in parts from [6] and has been modified for lowest hardware complexity as introduced in [23]. The CPR is partially based on [11], and a real-time implementation with a modified metric is presented in [22]. Key to the lowest processing complexity, and therefore to a hardware implementation of our carrier recovery (CR), are processing in polar coordinates and new simplified metrics.

The general principle for the CFR and CPR algorithms is shown in Fig. 1(a). For both algorithms, the input data is needed in polar coordinates. If the data is provided in Cartesian coordinates, a coordinate transformation block is needed. Such a coordinate transformation can be implemented with little hardware resources using the CORDIC algorithm [24]. Both algorithms consist of three steps. In the first step, test frequencies or test phases are used to blindly correct the received signal. In the second step, the corrected signal is evaluated with a cost function to judge the quality of the applied correction. The cost function becomes minimal when the optimum correction frequency or phase has been found. For different modulation formats like 4QAM, 8QAM, and 16QAM, see Fig. 1(b), different cost functions are used. The cost function can also be adapted for other modulation formats especially if their symbols can adequately be described in polar coordinates. This applies for modulation formats such as phase shift keying (m-PSK) and amplitude and phase shift keying (m-APSK). In the last step, the selected correction frequency or phase is applied to the signal. For the CFR, the frequency error changes slowly. Therefore, the test frequencies can be applied in sequence over a large number of clock cycles. However, the phase error varies quickly so that the test phases in the CPR have to be applied and evaluated in parallel. In the following sections, we describe the implementations of CFR and CPR in detail.

Fig. 1 (a) Schematic of the carrier recovery operation principle. The two important blocks are the carrier frequency recovery (CFR) and the carrier phase recovery (CPR). The CFO varies slowly with time so that testing of the proper frequency may be performed in a serial manner, i.e. with one test frequency at a time. The CFR algorithm then evaluates a cost function with the defined test frequency to select the correct frequency offset and corrects the CFO. The CPR adds test phases in parallel, evaluates the cost function for each and corrects the phase. (b) The three modulation formats (4QAM, 8QAM, and 16QAM) that are subsequently recovered by the proposed algorithm. The different constellation points are subdivided into groups with identical magnitude (r₀, r₁, and r₂).


2.1 Multi-format frequency recovery implementation

The schematic of the CFR is depicted in Fig. 2. As the frequency drift of typically 0.2 MHz/μs [25] is significantly smaller than the target symbol rate, the CFR can operate at a lower speed. Therefore, it is sufficient for our implementation of the CFR to evaluate only one test frequency per clock cycle and select the correct frequency offset once all test frequencies have been evaluated.

Fig. 2 The proposed carrier frequency recovery corrects the frequency offset in four steps. To provide the samples in polar coordinates, the CORDIC algorithm [24] transforms L_CFR complex time samples r_l from Cartesian to polar coordinates. (1) In the first step, K test frequencies f_k are applied sequentially to the incoming signal by adding the corresponding linear phase ramp (φ_k,1,…, φ_k,L). (2) In the second step, the cost function J_CFR(f_k) is calculated for the current test frequency f_k. (3) In the third step, the results are stored in a buffer and evaluated by the min(∙) block, which selects the correct CFO among the K test frequencies. An exponential average function is applied to improve the estimation. (4) The frequency offset is corrected by applying a vector of linearly increasing phase derived from the selected CFO.


If the signal is not available in polar coordinates, it is first converted from Cartesian coordinates to polar coordinates to reduce the complexity of the subsequent steps. Here, we exploit the CORDIC algorithm [24] that converts the incoming complex time samples $r_{l}$ (sampled at times $t_{l}$ with $l = 1, \dots, L_{CFR}$) to polar coordinates with amplitude ($| r_{l} |$) and phase ($∠ r_{l}$) without the use of multiplications.
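The conversion step can be illustrated with a minimal sketch of a vectoring-mode CORDIC. This is not the authors' VHDL implementation: the function name `cordic_to_polar`, the 12 iterations, and the 8 guard bits are assumptions chosen for illustration, and the phase accumulator is kept as a floating-point number for readability, whereas a hardware design would use a fixed-point arctangent table. The point of the sketch is that the loop uses only shifts, additions, and table look-ups.

```python
import math

def cordic_to_polar(x, y, iterations=12, guard_bits=8):
    """Vectoring-mode CORDIC on integer I/Q samples (illustrative sketch).

    Returns (magnitude, phase). The magnitude is still scaled by the CORDIC
    gain (about 1.6468); the phase is in radians.
    """
    # Angle look-up table atan(2^-i); in hardware this would be a small ROM.
    atan_table = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Guard bits (a left shift, not a multiplication) limit truncation error.
    x <<= guard_bits
    y <<= guard_bits
    phase = 0.0
    # Pre-rotate by +/-90 degrees so that the vector lies in the right half-plane.
    if x < 0:
        if y >= 0:
            x, y, phase = y, -x, math.pi / 2
        else:
            x, y, phase = -y, x, -math.pi / 2
    # Micro-rotations drive y towards zero using shifts and additions only.
    for i in range(iterations):
        if y > 0:
            x, y = x + (y >> i), y - (x >> i)
            phase += atan_table[i]
        else:
            x, y = x - (y >> i), y + (x >> i)
            phase -= atan_table[i]
    return x >> guard_bits, phase

# Constant gain of the micro-rotations; it can be absorbed into thresholds.
CORDIC_GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(12))

mag, phase = cordic_to_polar(100, 100)
print(round(mag / CORDIC_GAIN, 1), round(phase, 3))
```

Because the gain is a known constant, the amplitude thresholds used later for ring partitioning could simply be pre-scaled instead of dividing each magnitude.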

The CFR itself operates on a block of $L_{CFR}$ samples per clock cycle and comprises four steps: A test frequency is applied to the signal, a cost function is evaluated, the optimal test frequency is detected, and the CFO is corrected. In the following, we describe the respective steps in more detail.

We first apply one of $K$ test frequencies $f_{k}$ ( $k = 1, ..., K$ ) per clock cycle to the received signal. This is implemented by adding to the signal phase ( $∠ r_{l}$ ) a linear phase ramp $φ_{k, 1}, \dots, φ_{k, L}$ that corresponds to the test frequency $f_{k} = d φ_{k} / d t$ . For the correct test frequency, this results in a sequence of $L_{CFR}$ samples with a minimal frequency offset.

In the second step, we calculate the cost function $J_{CFR} (f_{k})$ to obtain a measure for the residual frequency offset. For the minimal frequency offset that corresponds to the correct test frequency, the phase of the samples should now be nearly constant. As a result, the variance of the phase, which is a superposition of the added noise of the signal and the phase drift due to the frequency offset, will be minimal as well. To calculate the variance of the phase, additional processing steps are required to remove the symbol information. Depending on the received format (4QAM, 8QAM, or 16QAM), we remove the symbol information differently. For 4QAM, we map the four symbols to the first quadrant by performing a modulo $π / 2$ operation on the phase, see Fig. 3(a):

ϑ_{test, k, l} = \mod (∠ r_{l} + φ_{k, l}, \frac{π}{2}) .

In a hardware implementation, this is easily realized by neglecting the two most significant bits of the phase information of the signal. For 8QAM, we perform the same modulo $π / 2$ operation but additionally rotate the outer ring symbols on the radius $r_{1}$ by $π / 4$, see Fig. 3(b). For 16QAM, only symbols on the inner ring $r_{0}$ and the outer ring $r_{2}$ are selected through amplitude partitioning and subsequently mapped to the first quadrant with the modulo $π / 2$ operation, see Fig. 3(c). In [23], we additionally used the middle ring $r_{1}$ for frequency estimation. Here, the middle ring $r_{1}$ is neglected to reduce the hardware complexity.
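The format-dependent removal of the symbol information can be sketched on a fixed-point phase word. The sketch assumes an 8 bit phase word that covers one full turn (256 states per $2π$), an 8QAM constellation whose outer ring sits on the axes, and amplitude thresholds passed in by the caller; the function name `fold_phase` and these thresholds are hypothetical and not taken from the paper.

```python
PHASE_BITS = 8                                 # assumed phase word width
QUADRANT_MASK = (1 << (PHASE_BITS - 2)) - 1    # drops the two MSBs -> modulo pi/2
PI_OVER_4 = 1 << (PHASE_BITS - 3)              # pi/4 in phase-word units (32)

def fold_phase(phase_word, amplitude, fmt, thresholds=()):
    """Map a symbol phase to the first quadrant; return None if it is discarded."""
    if fmt == "4QAM":
        # All four symbols collapse onto pi/4 after the modulo operation.
        return phase_word & QUADRANT_MASK
    if fmt == "8QAM":
        (t01,) = thresholds                    # boundary between r0 and r1
        if amplitude > t01:
            phase_word += PI_OVER_4            # rotate the outer ring by pi/4
        return phase_word & QUADRANT_MASK
    if fmt == "16QAM":
        t01, t12 = thresholds                  # boundaries r0/r1 and r1/r2
        if t01 < amplitude <= t12:
            return None                        # middle ring r1 is neglected
        return phase_word & QUADRANT_MASK
    raise ValueError("unsupported format: " + fmt)

# Ideal 4QAM phases pi/4, 3pi/4, 5pi/4, 7pi/4 expressed as phase words.
print([fold_phase(p, 1.0, "4QAM") for p in (32, 96, 160, 224)])
```

With this representation the modulo operation costs nothing in logic, since it only selects the lower six bits of the phase word.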

Fig. 3 Partitioning and remapping of the symbols to remove data information from the signal and therefore align the symbols to an axis with identical relative phase φ_ref. Here, the reference phase φ_ref is π / 4. The process is adapted for (a) 4QAM, (b) 8QAM, and (c) 16QAM. For 4QAM, a modulo π / 2 suffices to align all symbols to one phase. For 8QAM and 16QAM, the constellation points have to be separated in groups with identical magnitude (r₀, r₁, and r₂). Subsequently, the grayed out constellation points in (b) and (c) are either shifted by π / 4 for 8QAM (b) or neglected for 16QAM (c).


We then calculate the phase difference $Δ φ_{k, l}$ to the reference phase $φ_{ref} = π / 4$

Δ φ_{k, l} = | ϑ_{test, k, l} - φ_{ref} |,

where $φ_{ref} = π / 4$ is the angle at which an ideal symbol should be found after performing the modulo $π / 2$ operation.

Finally, the variance of the phase differences $Δ φ_{k, l}$ has to be calculated

J_{CFR} (f_{k}) = \frac{1}{L_{CFR}^{2}} \sum_{i = 1}^{L_{CFR}} \sum_{l = 1}^{L_{CFR}} \frac{1}{2} {(Δ φ_{k, i} - Δ φ_{k, l})}^{2} .

To save hardware resources for an efficient implementation we replace the demanding variance operation by a simpler expression

J_{CFR} (f_{k}) = \sum_{i = 1}^{I < L_{CFR}} \sum_{l = 1}^{L_{CFR}} | Δ φ_{k, i} - Δ φ_{k, l} | .

Here, the complexity is reduced by reducing the number of outer-sum terms $I$ and by neglecting squaring and normalization.
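The relation between the two expressions can be checked numerically. The pairwise form of $J_{CFR}$ with squaring and normalization equals the population variance of the phase differences, while the simplified form sums absolute differences over only $I$ rows. The sketch below uses hypothetical function names and a toy list of phase differences; it is meant as an illustration, not as the hardware data path.

```python
from statistics import pvariance

def j_cfr_variance(dphi):
    """Pairwise form: (1/L^2) * sum_i sum_l 0.5 * (dphi_i - dphi_l)^2."""
    n = len(dphi)
    return sum(0.5 * (a - b) ** 2 for a in dphi for b in dphi) / n ** 2

def j_cfr_simplified(dphi, num_rows):
    """Simplified form: sum of |dphi_i - dphi_l| over i < num_rows, all l."""
    return sum(abs(dphi[i] - b) for i in range(num_rows) for b in dphi)

dphi = [0.1, 0.3, 0.2, 0.4]            # toy phase differences in radians
print(j_cfr_variance(dphi), pvariance(dphi))
print(j_cfr_simplified(dphi, num_rows=2))
```

Both metrics vanish for a constant sequence and grow with the spread of the phase differences, which is the property the minimum search relies on; the absolute values of the two metrics are not comparable.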

In the third step, all results of the cost function are evaluated to determine the optimum test frequency and therefore the frequency offset. As only one test frequency is evaluated per clock cycle, the results are buffered. After calculating the result of the cost function for all $K$ test frequencies, the buffer contains results of $J_{CFR} (f_{k})$ for all test frequencies and will be continuously updated. The estimated optimal offset frequency is found by the minimum of the cost function $J_{CFR} (f_{k})$. The shape of the cost function for 4QAM, 8QAM, and 16QAM is displayed in Fig. 4(a). Here, we neglected the laser phase noise to clearly show the unperturbed cost function. It can be observed that a larger block of samples leads to a steeper cost function. In our implementation, an exponential average on the selected CFO reduces the impact of noise and of the decreased precision due to approximations in the implementation. The exponential average is implemented by the update $i_{\min, t} \leftarrow α \cdot i_{\min, t} + (1 - α) \cdot i_{\min, t - 1}$, where $i_{\min, t}$ on the right-hand side is the recent and $i_{\min, t - 1}$ the previous minimum index of the cost function. The variable $α$ is the weighting factor and is set to $α = 0.125$. Figure 4(b) shows the CFR without (blue) and with exponential averaging (green) tracking the CFO of a 16QAM signal (red, drift 1 MHz/μs).
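Since $α = 0.125 = 2^{-3}$, the exponential average can be computed with a subtraction, a right shift by three bits, and an addition. The sketch below assumes that the previous value entering the update is the already averaged index (the usual recursive form) and that four fractional bits are kept; both choices, like the function name `exp_average`, are assumptions for illustration.

```python
def exp_average(indices, alpha_shift=3, frac_bits=4):
    """Recursive average avg += (new - avg) * 2^-alpha_shift in fixed point.

    Returns the averaged index after each update, in units of 2^-frac_bits.
    """
    avg = indices[0] << frac_bits          # assumed initialisation: first index
    history = []
    for idx in indices:
        # Equivalent to alpha * new + (1 - alpha) * avg with alpha = 2^-alpha_shift.
        avg += ((idx << frac_bits) - avg) >> alpha_shift
        history.append(avg)
    return history

# A step of the raw minimum index from 0 to 16 is followed smoothly.
print(exp_average([0, 16, 16, 16]))
```

A smaller $α$ suppresses outliers in the minimum search more strongly but also slows down the tracking of a drifting CFO.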

Fig. 4 Performance of the CFR for different formats. (a) Normalized J_CFR as a function of test frequencies f_k for 4QAM, 8QAM, and 16QAM and for L_CFR = 64 and L_CFR = 128. (b) The estimated carrier frequency offset (CFO) as a function over time for 16QAM. The red curve shows the actual CFO which drifts by 1 MHz/μs. The blue dots represent the estimated CFO values without exponential average and the green dots represent the estimated CFO value with exponential average.


In the last step, the CFO is corrected by adding the respective phase vector of the selected test frequency to the signal. After CFO compensation, the CPR stage described in the following section corrects the remaining phase offset.
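The four CFR steps can be put together in a short floating-point sketch for 4QAM. It is a behavioural illustration under stated assumptions, not the hardware design: the test ramp is defined with a negative sign so that the best test frequency equals the CFO, the $K = 51$ test frequencies are assumed to lie on a uniform grid over ±150 MHz, the channel is noise-free, and the constant phase offset of 0.2 rad is arbitrary. The function name `cfr_sweep` is hypothetical.

```python
import math
import random

HALF_PI = math.pi / 2

def cfr_sweep(rx_phases, symbol_rate, test_freqs, num_rows=24, phi_ref=math.pi / 4):
    """Evaluate the simplified J_CFR for every test frequency and pick the minimum."""
    costs = []
    for f_k in test_freqs:
        dphi = []
        for l, phase in enumerate(rx_phases):
            # Step 1: add the linear phase ramp of the test frequency.
            ramp = -2.0 * math.pi * f_k * l / symbol_rate
            # Step 2a: remove the 4QAM symbol information (modulo pi/2).
            theta = (phase + ramp) % HALF_PI
            dphi.append(abs(theta - phi_ref))
        # Step 2b: simplified cost, sum of absolute pairwise differences.
        costs.append(sum(abs(dphi[i] - d) for i in range(num_rows) for d in dphi))
    # Step 3: the minimum of the cost function marks the estimated CFO.
    k_min = min(range(len(costs)), key=costs.__getitem__)
    return test_freqs[k_min], costs

rng = random.Random(1)
symbol_rate = 32e9
cfo = 60e6
# Random 4QAM symbols with a constant phase offset and a static CFO.
rx = [math.pi / 4 + HALF_PI * rng.randrange(4) + 0.2
      + 2.0 * math.pi * cfo * l / symbol_rate for l in range(128)]
test_freqs = [-150e6 + 6e6 * k for k in range(51)]
f_hat, costs = cfr_sweep(rx, symbol_rate, test_freqs)
print(f_hat / 1e6, "MHz")
```

In this sketch all test frequencies are evaluated in one call, whereas the implementation described above spreads them over consecutive clock cycles and buffers the results.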

2.2 Multiplier-free phase recovery implementation

Figure 5 shows the block diagram of the multi-format carrier phase recovery (CPR). The CPR algorithm uses the same processing structure as the CFR, with the main difference that all test phases have to be applied in parallel to track the quickly changing carrier phase. Similar to the CFR, we use polar instead of Cartesian coordinates in order to implement a multiplier-free system. Since the received samples $r_{l}$ have already been converted to polar coordinates for the CFR block, no further CORDIC stage is needed in the CPR block. While the phase information is used for processing in all of the following steps, the amplitude is only needed for the partitioning in case of 8QAM and 16QAM.

Fig. 5 Block diagram of the proposed carrier phase recovery (CPR) algorithm consisting of four steps. (1) Parallel summation of B test phases φ_b to the received phase ∠r_l, (2) computation of the cost function J_CPR(φ_b), (3) selection of the smallest cost function value to determine the optimum test phase, and (4) correction of the received phase ∠r_l with the optimal test phase φ_b.


The CPR algorithm corrects the phase offset in four steps. First, test phases are added. Second, the quality of the test phases is calculated with the cost function. Third, the optimal phase is detected by minimizing the cost function. Finally, the signal phase is corrected. We will now describe these processing blocks in more detail.

First, we add a number of $B$ test phases ( $φ_{b} = φ_{1}, \dots φ_{B}$ ) to $B$ copies of the received phase values ( $∠ r_{l}$ ) in a block of the length L_CPR. This results in B blocks of L_CPR estimated phases $ϑ_{test}$ , out of which the correct sampling phase has to be selected.

Second, we analyze the B blocks of samples with a cost function $J_{CPR} (φ_{b})$ to detect the correct phase offset. The cost function performs an averaging over the phase difference to one or multiple reference phases $φ_{ref}$ :

\begin{array}{l} J_{CPR} (φ_{b}) = \frac{1}{L_{CPR}} \sum_{l = 1}^{L_{CPR}} | ϑ_{test, l, b} - φ_{ref} (| r_{l} |, ϑ_{test, l, b}) | \\ \text{with} \quad ϑ_{test, l, b} = \mod (∠ r_{l} + φ_{b}, \frac{π}{2}) \end{array}

The reference phases $φ_{ref}$ are selected differently depending on the modulation format. For 4QAM, no partitioning is needed and the reference phase is $φ_{ref} = π / 4$. For 8QAM, the outer ring ($r_{1}$) is shifted by a phase of $π / 4$. Thus, the reference phase can also be chosen as $φ_{ref} = π / 4$. In case of 16QAM, three reference phases are needed: a reference phase of $φ_{ref} = π / 4$ for the inner and the outer ring (Fig. 1(b)) and two reference phases of $φ_{ref} = atan (1 / 3), atan (3 / 1)$ for the middle ring. The reference phases are selected by an amplitude and subsequent phase partitioning as illustrated in Fig. 6(b). First, a decision is made whether the symbols are part of the inner ($r_{0}$), middle ($r_{1}$), or outer ring ($r_{2}$). For the middle ring, a second-stage phase decision with a phase threshold of $π / 4$ is needed. As we use a block length of $L_{CPR} = 2^{n}$, the phase estimation can be calculated by additions and bit-shifting only. For a correct estimate, an average over a sufficiently long sequence of symbols is needed. Even though averaging using a Wiener filter would promise optimal performance [26, 27], we chose to implement a simple block-wise averaging, which allows for lowest hardware complexity.
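The two-stage selection of the reference phase for 16QAM can be sketched as follows. The constellation is assumed to be the standard square 16QAM with points at ±1 and ±3, and the amplitude thresholds are placed midway between neighbouring ring radii; these thresholds and the function names are assumptions for illustration and are not specified in the text.

```python
import math

HALF_PI = math.pi / 2
# Ring radii of square 16QAM with points at +/-1 and +/-3 (assumed scaling).
R0, R1, R2 = math.sqrt(2.0), math.sqrt(10.0), math.sqrt(18.0)
T01, T12 = (R0 + R1) / 2, (R1 + R2) / 2     # assumed midpoint thresholds

def phi_ref_16qam(amplitude, theta, t01=T01, t12=T12):
    """Amplitude partitioning first, then a phase decision for the middle ring."""
    if t01 < amplitude <= t12:
        return math.atan(1.0 / 3.0) if theta < math.pi / 4 else math.atan(3.0)
    return math.pi / 4                       # inner and outer ring

def j_cpr_16qam(amplitudes, phases, phi_b, t01=T01, t12=T12):
    """Block average of |theta_test - phi_ref| for one test phase phi_b."""
    total = 0.0
    for amp, phase in zip(amplitudes, phases):
        theta = (phase + phi_b) % HALF_PI    # fold into the first quadrant
        total += abs(theta - phi_ref_16qam(amp, theta, t01, t12))
    return total / len(phases)

points = [(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
amps = [math.hypot(i, q) for i, q in points]
phs = [math.atan2(q, i) for i, q in points]
print(j_cpr_16qam(amps, phs, 0.0), j_cpr_16qam(amps, phs, 0.1))
```

For a block length that is a power of two, the final division reduces to a bit shift, which is the property exploited in the implementation described above.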

Fig. 6 Performance of the CPR for different formats and the partitioning for 16QAM. (a) Normalized J_CPR as a function of the test phases φ_b for 4QAM, 8QAM and 16QAM. (b) Partitioning concept for 16QAM to calculate the cost function J_CPR(φ_b). Symbols are mapped to the first quadrant and are partitioned according to their amplitude and phase (dotted red lines).


In the third step, we select the correct phase out of all test phases. For the correct carrier phase, the cost function $J_{CPR} (φ_{b})$ becomes minimal as depicted in Fig. 6(a) with a block size of $L_{CPR} = 128$ and a number of test angles of $B = 99$ .

In the final step, the carrier phase is corrected by adding the selected test phase to the received phase value. As all values are available in polar coordinates, this can be realized with minimum effort.
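The four CPR steps can be summarized in a compact floating-point sketch for 4QAM. It assumes that the $B$ test phases are spread uniformly over an interval of width $π / 2$ centred on zero and uses a noise-free block with an arbitrary phase offset of 0.3 rad; the value $B = 32$ and the function name `cpr_block` are chosen for illustration only. In hardware the loop over the test phases would be unrolled into parallel branches.

```python
import math
import random

HALF_PI = math.pi / 2

def cpr_block(rx_phases, num_test_phases=32, phi_ref=math.pi / 4):
    """Blind phase search on one block: returns (selected test phase, corrected phases)."""
    # Step 1: candidate test phases covering [-pi/4, pi/4).
    test_phases = [b * HALF_PI / num_test_phases - HALF_PI / 2
                   for b in range(num_test_phases)]
    # Step 2: cost function J_CPR for every test phase (4QAM: phi_ref = pi/4).
    costs = [sum(abs((p + phi_b) % HALF_PI - phi_ref) for p in rx_phases) / len(rx_phases)
             for phi_b in test_phases]
    # Step 3: the smallest cost marks the optimum test phase.
    b_min = min(range(num_test_phases), key=costs.__getitem__)
    # Step 4: correction is a plain addition in polar coordinates.
    corrected = [p + test_phases[b_min] for p in rx_phases]
    return test_phases[b_min], corrected

rng = random.Random(0)
offset = 0.3
rx = [math.pi / 4 + HALF_PI * rng.randrange(4) + offset for _ in range(32)]
phi_hat, corrected = cpr_block(rx)
print(round(phi_hat, 4))
```

The residual phase error after correction is bounded by half the test-phase spacing in this noise-free case, which is why the number of test phases trades accuracy against parallel hardware.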

3. Complexity and hardware utilization

We implemented the proposed multi-format carrier recovery in VHDL and synthesized the design in order to evaluate the hardware consumption on an FPGA chip. The implementation includes the CORDIC algorithm for conversion from Cartesian to polar coordinates, the CFR algorithm, and the CPR algorithm. The implementations of CFR and CPR are multi-format capable and operate for 4QAM, 8QAM, and 16QAM. The design was evaluated with the Vivado design tools. The hardware design processes 128 samples in parallel with a clock frequency of 250 MHz, which results in a symbol rate of 32 GBd. Each sample represents one symbol and has a resolution of 8 bit per in-phase and quadrature component. For the CFR we choose L_CFR = 128, I = 24, and K = 51. The CPR is implemented with L_CPR = 32, which results in four parallel CPR blocks in order to process 128 symbols per clock cycle. The size of L_CPR has no significant influence on the hardware complexity as long as the overall amount of processed symbols remains the same. We use a Xilinx Virtex 7 FPGA chip for our design considerations. The hardware requirements for designs with different numbers of test phases are presented in Table 1. As expected, the implementations utilize no DSP units.

Table 1. FPGA chip utilization (% of Xilinx xc7vx690t).


4. Simulations and performance evaluation

We studied the performance of our algorithm by numerical simulations. As impairment factors, we modeled additive white Gaussian noise (AWGN), combined laser phase noise (LPN) from the transmitter and receiver, carrier frequency offset (CFO), and a CFO drift. The laser phase noise was modeled as a Wiener process as suggested in [11]. To focus on the limiting factors of the CR, other impairments like timing offset, chromatic dispersion, and polarization mode dispersion were excluded from the simulation. The simulation was performed with a PRBS sequence of length 2¹⁵ – 1, which was repeated to generate a sequence with more than 10⁶ symbols. The algorithm was analyzed for 4QAM, 8QAM, and 16QAM. We considered a symbol rate of 32 GBd and we used differential encoding [26]. We studied the performance in two steps. First, we investigated the performance of the CPR and in the second step the performance of the combined CFR and CPR algorithm.
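A Wiener phase-noise process of this kind is commonly generated as a running sum of independent Gaussian increments with variance $2π Δν T_{s}$, where $Δν$ is the combined linewidth and $T_{s}$ the symbol duration. The sketch below follows this standard model; the function names and the seed are hypothetical, and the code is not taken from the authors' simulation environment.

```python
import math
import random

def increment_std(linewidth_hz, symbol_rate):
    """Standard deviation of the per-symbol phase increment of a Wiener process."""
    return math.sqrt(2.0 * math.pi * linewidth_hz / symbol_rate)

def wiener_phase_noise(num_symbols, linewidth_hz, symbol_rate, rng):
    """Accumulate independent Gaussian increments into a random-walk phase."""
    sigma = increment_std(linewidth_hz, symbol_rate)
    phase = 0.0
    trace = []
    for _ in range(num_symbols):
        phase += rng.gauss(0.0, sigma)
        trace.append(phase)
    return trace

rng = random.Random(7)
trace = wiener_phase_noise(20000, linewidth_hz=100e3, symbol_rate=32e9, rng=rng)
print(round(increment_std(100e3, 32e9), 5))
```

At 32 GBd and a combined linewidth of 100 kHz the increment per symbol is only a few milliradians, which is why block-wise averaging over tens of symbols remains effective.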

The simulation results of the CPR without CFR and neglected impairments like CFO and CFO drift are presented in Fig. 7. In each case, we show a penalty for the required SNR to achieve a BER of 10^-3. The penalty is calculated in relation to the theoretical limit. The theoretical limits assumed for differentially encoded 4QAM, 8QAM, and 16QAM are 10.35 dB, 14.08 dB, and 16.97 dB, respectively [26]. The amount of test phases was chosen to be B_4QAM = 31 for 4QAM and B_8QAM,16QAM = 51 for 8QAM and 16QAM to avoid the influence of a limited resolution of the test phases.

Fig. 7 Simulation results for the CPR processing 32 GBd signals with 4QAM, 8QAM, and 16QAM. SNR penalty at a BER of 10^-3 as a function of (a) the block size L_CPR under influence of a laser linewidth of 100 kHz and 1 MHz, (b) the laser linewidth with block sizes of L_CPR=32, 64, (c) the number of test angles B under the influence of a laser linewidth of 100 kHz, (d) the ADC word width in number of bits under the influence of a laser linewidth of 100 kHz.


Figure 7(a) shows the influence of the processing block size $L_{CPR}$ of the CPR. For different LPN, different block sizes offer advantages in performance. For low LPN, long window lengths $L_{CPR}$ are beneficial as AWGN related errors average out. For larger LPN, shorter window lengths $L_{CPR}$ are required to avoid phase fluctuations within the window length $L_{CPR}$ . For smallest window lengths, the estimate of the algorithm degrades significantly, as the AWGN cannot be reduced by averaging anymore. For the following simulation, we considered a window length of 32 and 64. Figure 7(b) shows the algorithm’s performance under influence of an increasing LPN. The shorter window length $L_{CPR} = 32$ shows better tolerance to LPN, as is expected from Fig. 7(a). Considering an SNR penalty smaller than 0.5 dB, the algorithm tolerates a laser linewidth of up to 4 MHz, 3 MHz, and 800 kHz for 4QAM, 8QAM, and 16QAM, respectively. Figure 7(c) presents the SNR penalty for different numbers of test angles B. The block size was fixed to $L_{CPR} = 32$ and the combined laser linewidth was assumed to be 100 kHz. The amount of test angles has a direct influence on the complexity of the implementation since the different test angles are applied in parallel. Therefore, it is important to investigate the minimal amount of test phases required for a reasonable performance. For 4QAM, 8QAM, and 16QAM the performance degradation is below 0.25 dB for $B \geq 9$ , $B \geq 12$ and $B \geq 24$ test angles, respectively. In Fig. 7(d), we show the performance degradation under influence of a limited ADC word width. The signal is impaired by AWGN and a combined laser phase noise of 100 kHz. The block size is fixed to $L_{CPR} = 32$ . At a limited ADC word width of 6 bit, we observe an SNR penalty of 0.16 dB, 0.22 dB, and 0.36 dB for 4QAM, 8QAM, and 16QAM, respectively. For word width larger than 8 bit, only a minor performance improvement is visible for the three modulation formats.

In the next step, we added our CFR unit to the simulation model. We evaluated the performance of the CFR for varying CFOs and drifting CFOs under the influence of a fixed combined LPN of 100 kHz. The CPR has a fixed processing length of $L_{CPR} = 32$ and $B = 51$ test angles. We used a large number of test angles to neglect the influence of a limited number of test angles. The CFR is implemented with a fixed number of $K = 51$ test frequencies in a range of ± 150 MHz. For larger frequency ranges, the number of test frequencies K can be increased. Since the different test frequencies are applied sequentially, the amount of parallel processing steps does not increase with an increasing K; only a larger memory for the stored test frequencies is needed. The speed of tracking a drifting CFO, however, will decrease with an increasing K since it takes a longer time to apply all the test frequencies sequentially.
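As a rough worked example of this trade-off, assume that the $K = 51$ test frequencies are spaced uniformly over ± 150 MHz including both end points, and that one test frequency is evaluated per clock cycle at the 250 MHz clock of the hardware design in Section 3. Both points are assumptions made here for illustration. The grid spacing $Δ f$ and the duration $T_{sweep}$ of one full sweep are then

Δ f = \frac{2 \cdot 150 \, MHz}{K - 1} = \frac{300 \, MHz}{50} = 6 \, MHz, \quad T_{sweep} = \frac{K}{f_{clk}} = \frac{51}{250 \, MHz} = 204 \, ns .

For a CFO drift of 1 MHz/μs, the CFO moves by about 0.2 MHz during one sweep, which is small compared with the assumed grid spacing. Doubling $K$ at the same clock frequency doubles $T_{sweep}$.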

Figure 8(a) depicts the results of our simulations with different CFO values and a fixed L_CFR = 128. We present the performance difference between processing without and with CFR. Without the CFR stage, a CFO of 37.5 MHz, 25 MHz, and 12.5 MHz for 4QAM, 8QAM, and 16QAM, respectively, is still tolerable with an SNR penalty below 0.5 dB. Thus, the CPR alone is also able to track and correct a phase evolution such as a CFO to a certain extent. With the CFR stage, no performance dependency on the CFO can be observed. The influence of CFO drifts on the proposed CR is presented in Fig. 8(b). The results are presented for a processing length of $L_{CFR} = 128$ and $L_{CFR} = 256$. A larger processing length results in a larger performance penalty with increasing speeds of a drifting CFO. This is associated with the larger number of time samples that are needed to calculate a CFO estimate. Consequently, the CFO estimates are not calculated fast enough to track the drifting CFO.

Fig. 8 Simulation results for the CFR processing 32 GBd signals with 4QAM, 8QAM, and 16QAM. (a) SNR penalty as a function of carrier frequency offsets (CFOs) with CFR and without CFR. (b) SNR penalty as a function of carrier frequency offset (CFO) drifts for processing lengths of L_CFR = 128 and L_CFR = 256.


To investigate the performance of the hardware implementation (HW) and the floating point Matlab simulation (SW) in comparison, we implemented a Matlab-Modelsim co-simulation. We compared the BER performance of the SW and the HW simulation for 24 test angles, a CFO of 80 MHz, a CFO drift of −1 MHz/µs, and a laser linewidth of 300 kHz. We increased the linewidth to 300 kHz to stress test the implementation. The hardware implementation, including the CORDIC algorithm, was designed with a word width of 8 bit for the in-phase and quadrature components or amplitude and phase, respectively. The different block sizes for the CFR and the CPR are set to L_CFR = 128 and L_CPR = 32. We simulated >10⁶ symbols for each BER value and investigated the dependence on the SNR for 4QAM, 8QAM, and 16QAM. The same multi-format capable hardware design was used for all modulation formats. Figure 9(a) shows the results for HW and SW simulation in relation to the differentially coded theoretical limits of the respective modulation format. We observe minimal penalties for the required SNR for a BER of 10⁻³ for SW and HW simulations. For 4QAM and 8QAM, the penalty is below 0.2 dB, for 16QAM, the penalty is below 0.4 dB. The minor SNR penalty between SW and HW originates from the fixed point calculation in case of HW processing. The larger SNR penalty of 16QAM can be attributed to the required partitioning which is not ideal in polar coordinates. Constellation diagrams of the output of the HW simulation for all formats are shown in Fig. 9(b). In each case, the SNR that is theoretically required for a BER of 10⁻³ is used for the simulation. Zooming in, one may observe the 256 quantized phase states allowed by the 8 bit resolution.

Fig. 9 Results of software simulation and hardware realization for 4QAM, 8QAM, and 16QAM. (a) BER performance as a function of SNR of the software (SW) and hardware (HW) implementation compared to the theoretical limit of differential encoding. (b) Constellation diagrams of the signals at the output of the carrier recovery hardware implementation.


5. Conclusion

We have introduced a joint multi-format frequency and phase recovery algorithm relying on processing in polar coordinates for low hardware complexity and demonstrated its operation at 32 GBd. Processing in polar coordinates is especially beneficial for the parallel application of test phases and calculation of the cost function in the BPS based CPR. The performance of our CR algorithm, working for 4QAM, 8QAM, and 16QAM, has been tested under influence of laser linewidths, carrier frequency offsets and carrier frequency offset drifts. For the hardware implementation, we investigated the influence of design parameters like processing block length, resolution of the test phases and limited word widths. The CR algorithm has been implemented in VHDL and the chip utilization of an FPGA implementation shows its feasibility. The algorithm can dynamically switch the modulation format after each clock cycle (250 MHz, 128 Symbols). We compared the BER performance of our hardware implementation with the software simulation under the influence of a 300 kHz linewidth laser, a CFO of 80 MHz and a CFO drift of −1 MHz/µs. The SNR penalty for the hardware implementation when compared to the theoretical limit is negligible (<0.2 dB for 4QAM and 8QAM, <0.4 dB for 16QAM).

Funding

We acknowledge financial support by the European Commission under FP7 program, project FOX-C (grant no. 318415) and by the Xilinx University Program (XUP).

References and links

1. O. Gerstel, M. Jinno, A. Lord, and S. J. B. Yoo, “Elastic optical networking: a new dawn for the optical layer?” IEEE Commun. Mag. 50(2), 12–20 (2012). [CrossRef]

2. A. Lau, Y. Gao, Q. Sui, D. Wang, Q. Zhuge, M. Morsy-Osman, M. Chagnon, X. Xu, C. Lu, and D. Plant, “Advanced DSP techniques enabling high spectral efficiency and flexible transmissions: toward elastic optical networks,” IEEE Signal Process. Mag. 31(2), 82–92 (2014). [CrossRef]

3. S. J. Savory, “Digital coherent optical receivers: algorithms and subsystems,” IEEE J. Sel. Top. Quantum Electron. 16(5), 1164–1179 (2010). [CrossRef]

4. A. Leven, N. Kaneda, U.-V. Koc, and Y.-K. Chen, “Frequency estimation in intradyne reception,” IEEE Photonics Technol. Lett. 19(6), 366–368 (2007). [CrossRef]

5. M. Selmi, Y. Jaouen, and P. Ciblat, “Accurate digital frequency offset estimator for coherent PolMux QAM transmission systems,” in Proc. ECOC (2009), paper P3.08.

6. X. Zhou, J. Yu, M.-F. Huang, Y. Shao, T. Wang, L. Nelson, P. Magill, M. Birk, P. I. Borel, D. W. Peckham, R. Lingle, and B. Zhu, “64-Tb/s, 8 b/s/Hz, PDM-36QAM transmission over 320 km using both pre- and post-transmission digital signal processing,” J. Lightwave Technol. 29(4), 571–577 (2011). [CrossRef]

7. A. M. Viterbi, “Nonlinear estimation of PSK-modulated carrier phase with application to burst digital transmission,” IEEE Trans. Inf. Theory 29(4), 543–551 (1983). [CrossRef]

8. I. Fatadin, D. Ives, and S. J. Savory, “Laser linewidth tolerance for 16-QAM coherent optical systems using QPSK partitioning,” IEEE Photonics Technol. Lett. 22(9), 631–633 (2010). [CrossRef]

9. K. P. Zhong, J. H. Ke, Y. Gao, and J. C. Cartledge, “Linewidth-tolerant and low-complexity two-stage carrier phase estimation based on modified QPSK partitioning for dual-polarization 16-QAM systems,” J. Lightwave Technol. 31(1), 50–57 (2013). [CrossRef]

10. S. M. Bilal, C. R. S. Fludger, V. Curri, and G. Bosco, “Multistage carrier phase estimation algorithms for phase noise mitigation in 64-quadrature amplitude modulation optical systems,” J. Lightwave Technol. 32(17), 2973–2980 (2014).

11. T. Pfau, S. Hoffmann, and R. Noé, “Hardware-efficient coherent digital receiver concept with feedforward carrier recovery for M-QAM constellations,” J. Lightwave Technol. 27(8), 989–999 (2009).

12. X. Zhou, “An improved feed-forward carrier recovery algorithm for coherent receivers with M-QAM modulation format,” IEEE Photonics Technol. Lett. 22(14), 1051–1053 (2010).

13. J. Li, L. Li, Z. Tao, T. Hoshida, and J. C. Rasmussen, “Laser-linewidth-tolerant feed-forward carrier phase estimator with reduced complexity for QAM,” J. Lightwave Technol. 29(16), 2358–2364 (2011).

14. H. Zhou, J. Dong, S. Yan, Y. Zhou, and X. Zhang, “Low-complexity carrier phase recovery for square M-QAM based on S-BPS algorithm,” IEEE Photonics Technol. Lett. 26(18), 1 (2014).

15. A. Al-Bermani, C. Wördehoff, K. Puntsri, O. Jan, U. Rückert, and R. Noé, “Real-time synchronous 16-QAM optical transmission system using blind phase search and QPSK partitioning carrier recovery techniques,” in ITG-Fachtagung Photonische Netze (Leipzig, Germany, 2012).

16. T.-H. Nguyen, M. Joindot, P. Scalart, M. Gay, L. Bramerie, O. Sentieys, J.-C. Simon, and C. Peucheret, “Carrier phase recovery for optical coherent M-QAM communication systems using harmonic decomposition-based maximum loglikelihood estimators,” in Proc. SPPCom, Advanced Photonics (2015), paper SpT4D.3.

17. T.-H. Nguyen, P. Scalart, M. Gay, L. Bramerie, C. Peucheret, O. Sentieys, J.-C. Simon, and M. Joindot, “Bi-harmonic decomposition-based maximum loglikelihood estimator for carrier phase estimation of coherent optical M-QAM,” in Proc. OFC (2016), paper Tu3K.3.

18. N. Argyris, S. Dris, C. Spatharakis, and H. Avramopoulos, “High performance carrier phase recovery for coherent optical QAM,” in Proc. OFC (2015), paper W1E.1.

19. A. Tolmachev, I. Tselniker, M. Meltsin, I. Sigron, D. Dahan, A. Shalom, and M. Nazarathy, “Multiplier-free phase recovery with polar-domain multisymbol-delay-detector,” J. Lightwave Technol. 31(23), 3638–3650 (2013).

20. I. Tselniker, N. Sigron, and M. Nazarathy, “Joint phase noise and frequency offset estimation and mitigation for optically coherent QAM based on adaptive multi-symbol delay detection (MSDD),” Opt. Express 20(10), 10944–10962 (2012).

21. A. Leven, N. Kaneda, and S. Corteselli, “Real-time implementation of digital signal processing for coherent optical digital communication systems,” IEEE J. Sel. Top. Quantum Electron. 16(5), 1227–1234 (2010).

22. B. Baeuerle, A. Josten, F. C. Abrecht, E. Dornbierer, J. Boesser, M. Dreschmann, J. Becker, J. Leuthold, and D. Hillerkuss, “Multiplier-free, carrier-phase recovery for real-time receivers using processing in polar coordinates,” in Proc. OFC (2015), paper W1E.2.

23. B. Baeuerle, A. Josten, F. Abrecht, E. Dornbierer, D. Hillerkuss, and J. Leuthold, “Blind real-time multi-format carrier recovery for flexible optical networks,” in Proc. SPPCom, Advanced Photonics (2015), paper SpT4D.5.

24. J. E. Volder, “The CORDIC trigonometric computing technique,” IRE Trans. Electron. Comput. EC-8(3), 330–334 (1959).

25. S.-H. Fan, J. Yu, D. Qian, and G.-K. Chang, “A fast and efficient frequency offset correction technique for coherent optical orthogonal frequency division multiplexing,” J. Lightwave Technol. 29(13), 1997–2004 (2011).

26. E. Ip and J. M. Kahn, “Feedforward carrier recovery for coherent optical communications,” J. Lightwave Technol. 25(9), 2675–2692 (2007).

27. L. M. Pessoa, H. M. Salgado, and I. Darwazeh, “Performance evaluation of phase estimation algorithms in equalized coherent optical systems,” IEEE Photonics Technol. Lett. 21(17), 1181–1183 (2009).

Test phases	Slice registers (utilization)	Slice LUTs (utilization)
16	175911 (40.61%)	297428 (34.33%)
20	202336 (46.71%)	325634 (37.58%)
24	221647 (51.17%)	349513 (40.34%)
28	244631 (56.47%)	382763 (44.18%)
32	255650 (59.01%)	398574 (46.00%)

Optics Express