## Abstract

A low-complexity feed-forward carrier phase estimation (CPE) technique is presented for dual-polarization (DP)-16-QAM transmission systems. The technique combines QPSK partitioning, maximum likelihood (ML) detection and estimation of the phase offset between signals in different polarizations. Simulation and experimental results for a 200 Gb/s DP-16-QAM system demonstrate a linewidth tolerance similar to that of the best feed-forward CPE reported to date, while the computational complexity is at least three times lower than that of other simplified feed-forward CPE techniques.

© 2011 OSA

## 1. Introduction

The increasing demand for data traffic continues to motivate research on more spectrally efficient optical transmission systems that better utilize the valuable bandwidth resources of the optical fiber [1, 2]. Dual-polarization (DP) quadrature phase shift keying (QPSK) systems operating at 100 Gb/s with receiver digital signal processing (DSP) are now commercially available [3–5]. In addition, 16-ary quadrature amplitude modulation (16-QAM), with its higher spectral efficiency (SE) of 4 bit/s/Hz, has become a natural choice and thus a promising candidate for next-generation optical transmission systems beyond 100 Gb/s per channel [6–8].

Carrier phase estimation (CPE) is an integral part of a DSP-based receiver, through which laser phase noise is compensated. For DSP-based receivers, blind and feed-forward CPE are more desirable due to their algorithmic and implementation simplicity [9]. The tolerance to laser phase noise, and hence the performance of CPE, generally degrades for systems using modulation formats with high spectral efficiency and/or lasers with large linewidths. Consequently, the vast majority of 16-QAM transmission experiments demonstrated in recent years used external cavity lasers (ECL), chosen for their narrow linewidths, instead of the more cost-effective distributed-feedback (DFB) lasers [7]. Linewidth-tolerant and low-complexity CPE is therefore critical for the practical realization of 16-QAM transmission in future optical communication systems. To this end, the various feed-forward CPE algorithms for 16-QAM systems proposed to date stem from two fundamental approaches: 1) QPSK partitioning schemes [10, 11], which were derived from the classical Viterbi and Viterbi phase estimation (VVPE) approach for QPSK signals [12]. However, QPSK partitioning for 16-QAM systems introduces a more stringent linewidth requirement compared to VVPE for QPSK systems; 2) blind phase search (BPS, also called the minimum distance method) [13], which was originally introduced for more general synchronous communication systems [14, 15]. BPS demonstrates higher linewidth tolerance but comes at the expense of high computational complexity. Such complexity can be somewhat lowered by reducing the number of ‘trial phases’ [16–20]. In these papers, two-stage strategies have been reported where BPS is used in only one of the two stages, as the fine [16] or coarse [17] carrier phase estimator, or in both stages [18, 19]. However, the computational complexity of such modified BPS is not reduced significantly [20].

In this paper, we extend our previous work [21] and propose a low-complexity and phase-noise-tolerant feed-forward CPE for DP-16-QAM systems using QPSK partitioning with maximum likelihood (ML) detection. In addition, as signals from both polarizations are impaired by identical laser phase noise (up to a constant phase offset due to the path difference travelled by signals in different polarizations) in a canonical DP system [22, 23], phase information from both polarizations is jointly processed for better carrier phase estimation accuracy and hence improved overall transmission performance. Simulation results for a 200 Gb/s DP-16-QAM system demonstrate similar linewidth tolerance and a computational complexity reduction by a factor of at least three compared with other feed-forward CPE techniques with large linewidth tolerance reported in the literature. The performance of the proposed and other feed-forward CPE techniques is also experimentally verified and compared. Such comparisons also serve as an experimental assessment of various feed-forward CPE techniques for practical implementation of 16-QAM transmission systems.

## 2. Algorithm principle

Consider a DP-16-QAM system where the received signal is sampled and processed in a DSP. After chromatic dispersion (CD) (and possibly nonlinearity) compensation, timing recovery, polarization demultiplexing, re-sampling to one sample per symbol and frequency offset compensation, the *n*^{th} received symbol of the *x*-polarization (*y*-polarization) can be expressed as the transmitted 16-QAM symbol rotated by a phase noise term ${\theta}_{n,x(y)}$ plus an additive noise term ${z}_{n,x(y)}$. Here, ${\theta}_{n,x(y)}$ is the combined phase noise of the transmitter laser and local oscillator (LO) at the receiver, and ${z}_{n,x(y)}$ models the collective amplified spontaneous emission (ASE) noise generated from inline amplifiers, which is a complex circularly symmetric Gaussian random process. Laser phase noise is typically modeled as a Wiener process in which the phase difference between two adjacent symbols ${\theta}_{n+1,x(y)}-{\theta}_{n,x(y)}$ is a zero-mean Gaussian random variable with variance ${\sigma}^{2}=2\pi \Delta v\cdot {T}_{s}$, where ${T}_{s}$ is the symbol period and $\Delta v$ is the combined linewidth of the transmitter laser and LO.

The proposed CPE is a multi-stage algorithm consisting of QPSK partitioning, compensation of the phase offset between signals in different polarizations, and ML detection. The block diagram of the CPE is shown in Fig. 2 and is described below in more detail.
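The signal model at the start of this section can be sketched in a few lines of Python. This is only an illustrative sketch: the unnormalized constellation grid, the per-symbol SNR parameter `snr_db` (a stand-in for OSNR), and the names `wiener_phase` and `dp_received` are assumptions introduced here and do not come from the paper.

```python
import cmath
import math
import random

# Unnormalized 16-QAM alphabet on the {-3, -1, 1, 3} grid (an assumed scaling).
QAM16 = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]


def wiener_phase(num_symbols, linewidth_ts, rng):
    """Phase walk whose increments are N(0, 2*pi*linewidth_ts)."""
    sigma = math.sqrt(2.0 * math.pi * linewidth_ts)
    theta, path = 0.0, []
    for _ in range(num_symbols):
        path.append(theta)
        theta += rng.gauss(0.0, sigma)
    return path


def dp_received(num_symbols, linewidth_ts, snr_db, offset=0.0, seed=0):
    """Return (rx_x, rx_y, theta) with one phase walk shared by both polarizations.

    snr_db is the per-symbol SNR (Es/N0), a hypothetical stand-in for OSNR.
    The y polarization carries an extra constant phase `offset`.
    """
    rng = random.Random(seed)
    theta = wiener_phase(num_symbols, linewidth_ts, rng)
    mean_power = sum(abs(s) ** 2 for s in QAM16) / len(QAM16)
    # Noise standard deviation per quadrature for the requested Es/N0.
    noise_std = math.sqrt(mean_power / (2.0 * 10.0 ** (snr_db / 10.0)))

    def one_pol(extra_phase):
        out = []
        for n in range(num_symbols):
            z = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
            rotation = cmath.exp(1j * (theta[n] + extra_phase))
            out.append(rng.choice(QAM16) * rotation + z)
        return out

    return one_pol(0.0), one_pol(offset), theta
```

For example, `dp_received(2000, 2e-4, 20.0, offset=0.3)` produces two polarization streams that share the same Wiener phase walk and differ by a constant 0.3 rad offset, which is the situation the joint-polarization estimator is designed for.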

#### A. QPSK partitioning for 16-QAM signals

For a 16-QAM constellation, only the symbols whose modulation phases can be eliminated by raising them to the 4th power are suitable for the commonly used Viterbi and Viterbi phase estimation (VVPE) for QPSK systems. The identification of such symbols for CPE is known as QPSK partitioning and is illustrated in Fig. 1 [11]. The symbols are classified into three rings (Class I ($C_1$), Class II ($C_2$) and Class III ($C_3$)) according to their amplitudes. The symbols in the inner and outer rings, i.e. those belonging to $C_1$ or $C_3$, can be viewed as two QPSK constellation sets, so their modulated phase can be eliminated by VVPE and the carrier phase can be estimated [12].

To recover the phase of the *n*^{th} symbol $s_{n,x(y)}$, a vector of $2N+1$ symbols $s_{n-N,x(y)},\ldots,s_{n+N,x(y)}$ is first normalized and partitioned; *N* is referred to as the filter half width for the rest of the paper. Symbols that belong to Class I ($C_1$) or Class III ($C_3$) are selected and processed by the modified VVPE [10]. Since the outcomes of the two VVPEs from both polarizations should suffer from identical phase noise up to a constant phase offset [23], this phase offset can be easily estimated and compensated without much increase in computational complexity. The VVPE results from both polarizations are then summed up to reduce the effect of ASE noise. The first-stage carrier phase estimate ${\theta}_{n}^{est1}$ is then obtained from the argument of this sum, taken over a sliding summing window of filter half width *N* and divided by four. It should be noted that our QPSK partitioning scheme does not require the phase rotations of the Class II symbols suggested in [11]; those symbols are processed in the subsequent ML estimator instead.
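The first stage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ring boundaries are taken as midpoints between the ideal ring radii of an unnormalized {-3, -1, 1, 3} grid (the paper defines them in Fig. 1), the phase offset is applied in the fourth-power domain with a sign convention chosen here, and the function names are hypothetical.

```python
import cmath
import math

QAM16 = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]

# Ideal ring radii and midpoint boundaries (the midpoint rule is an assumption).
R1, R2, R3 = math.sqrt(2.0), math.sqrt(10.0), math.sqrt(18.0)
BOUND_12, BOUND_23 = (R1 + R2) / 2.0, (R2 + R3) / 2.0


def ring_class(sym):
    """Return 1, 2 or 3 for Class I, II or III according to amplitude."""
    amp = abs(sym)
    if amp < BOUND_12:
        return 1
    if amp < BOUND_23:
        return 2
    return 3


def vvpe_term(sym):
    """Unit-modulus fourth power for C1/C3 symbols; C2 symbols contribute 0."""
    if ring_class(sym) == 2:
        return 0j
    p = sym ** 4
    return p / abs(p) if p != 0 else 0j


def first_stage_estimate(rx_x, rx_y, n, half_width, offset=0.0):
    """Sliding-window estimate of the carrier phase at symbol n.

    `offset` is the extra phase of the y polarization relative to x.
    C1/C3 symbols sit at odd multiples of pi/4, so their fourth power has
    angle pi + 4*theta; the minus sign below removes that pi.
    """
    lo = max(0, n - half_width)
    hi = min(len(rx_x), n + half_width + 1)
    align = cmath.exp(-4j * offset)  # rotate y terms onto x in the 4th-power domain
    acc = 0j
    for k in range(lo, hi):
        acc += vvpe_term(rx_x[k]) + align * vvpe_term(rx_y[k])
    if acc == 0:
        return 0.0  # no usable C1/C3 symbol in the window
    return cmath.phase(-acc) / 4.0
```

The estimate is confined to the interval (-π/4, π/4], which is why the appendix includes an unwrapping step after the `arg(.)/4` operation.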

#### B. Phase-offset estimation

The phase offset between the signals in the two polarizations results in catastrophic errors when the signals on both polarizations are summed up to estimate the phase noise [23]. Fortunately, this phase offset can be obtained by simply observing the difference between the VVPE outputs of the *x* and *y* polarizations. However, ASE noise and incorrectly partitioned symbols can severely degrade the phase offset estimates. One way of mitigating these two impairments is to use the recursive estimator of [23]. The estimate requires the symbols in the *x* and *y* polarizations at the same time slot to each belong to either $C_1$ or $C_3$, which means that only 1/4 of the symbols on average can be used to estimate ${\theta}_{offset}$. As pointed out in [23], the phase offset is constant or varies slowly over a time interval much longer than the time (or number of symbols) it takes for the estimation to converge. Therefore, it is unnecessary to continuously estimate the phase offset.

Simulations are conducted, and the residual phase offset and corresponding BER versus convergence time are shown in Fig. 3. The linewidth times symbol duration product ($\Delta v\cdot {T}_{s}$) and the OSNR penalty were set to 2e-4 and 1 dB, respectively, which are the same values as those required to achieve a BER of 1e-3 when there is no phase offset. Fifty independent trials with 2^{18} symbols on each polarization are performed. Both the residual phase offset and the BER have converged well after only 500 symbols. Thus, it is safe to set the convergence length to 1000 symbols. With this convergence length, the slowly time-varying nature of the phase offset is experimentally investigated as shown in Fig. 4, where we use three phase-offset estimation methods: 1. continuously updating ${\theta}_{offset}$ every symbol; 2. periodically updating ${\theta}_{offset}$; 3. only updating ${\theta}_{offset}$ at the beginning. The experimental setup is shown in Fig. 7. In Fig. 4, the OSNR was set to 37.2 dB and $\Delta v\cdot {T}_{s}$ equals 2e-4, which is the largest tested linewidth in our setup. For the periodic updating method 2, ${\theta}_{offset}$ is only estimated using the first 1000-symbol time slot in every 10000-symbol block. When $\Delta v\cdot {T}_{s}$ is less than 2e-4, the residual phase offset is within $\pm $1.5e-2 rad for all three methods, as shown in the enlarged area in Fig. 4. Also, the maximum BER difference is so small (2e-5) that the three methods achieve the same OSNR penalty. Thus, we choose the one-time method for the rest of the paper since it has the lowest complexity.
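To illustrate the idea of a one-time phase-offset estimate, the sketch below averages the difference between the fourth-power terms of the two polarizations over the first 1000 symbols. It is deliberately a plain (non-recursive) average and is not the recursive estimator of [23]; the midpoint ring boundaries, the sign convention of the offset and the function names are assumptions made for this example.

```python
import cmath
import math

QAM16 = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]

# Midpoint ring boundaries on the unnormalized grid (an assumed rule).
BOUND_12 = (math.sqrt(2.0) + math.sqrt(10.0)) / 2.0
BOUND_23 = (math.sqrt(10.0) + math.sqrt(18.0)) / 2.0


def is_qpsk_like(sym):
    """True for symbols falling in the inner (C1) or outer (C3) ring."""
    amp = abs(sym)
    return 0.0 < amp < BOUND_12 or amp >= BOUND_23


def unit_fourth_power(sym):
    p = sym ** 4
    return p / abs(p)


def estimate_offset(rx_x, rx_y, num_symbols=1000):
    """Return (offset, used): y-minus-x phase offset and number of slots used.

    Only time slots where both polarizations carry a C1 or C3 symbol
    contribute. The common phase noise cancels in the product, leaving
    exp(j*4*offset); the result is therefore ambiguous modulo pi/2.
    """
    acc = 0j
    used = 0
    for sx, sy in zip(rx_x[:num_symbols], rx_y[:num_symbols]):
        if is_qpsk_like(sx) and is_qpsk_like(sy):
            acc += unit_fourth_power(sy) * unit_fourth_power(sx).conjugate()
            used += 1
    offset = cmath.phase(acc) / 4.0 if acc != 0 else 0.0
    return offset, used
```

Feeding every ordered pair of ideal constellation points through `estimate_offset` uses 64 of the 256 pairs, which matches the 1/4 usable fraction stated above.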

#### C. ML detection

After all the received symbols are compensated by the estimated phases ${\theta}_{n}^{est1}$ and ${\theta}_{offset}$, they are fed into the second-stage ML phase estimator shown in Fig. 2(b). The ML estimate of the carrier phase ${\theta}_{n}^{est2}$ is computed from the first-stage outputs and their decisions in both polarizations ($p = x, y$): each output is multiplied by the complex conjugate of its decision, the products are summed over a sliding window, and the argument of the sum is taken. *M* is the filter half width used in this 2^{nd}-stage ML estimator. The overall estimated phase noise ${\theta}_{n}^{est}$ is then obtained by combining the first- and second-stage estimates.
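A decision-directed ML phase estimate of this kind can be sketched as below, following the description in the appendix (multiply each first-stage output by the conjugate of its decision, sum, take the argument). The unnormalized grid, the nearest-level slicer and the names `slicer` and `ml_phase` are assumptions for illustration only.

```python
import cmath

LEVELS = (-3, -1, 1, 3)  # assumed unnormalized 16-QAM amplitude levels


def slicer(sym):
    """Hard decision: nearest 16-QAM point on the {-3, -1, 1, 3} grid."""
    def nearest(v):
        return min(LEVELS, key=lambda level: abs(v - level))
    return complex(nearest(sym.real), nearest(sym.imag))


def ml_phase(u_x, u_y, n, half_width):
    """Second-stage residual phase estimate at symbol n.

    u_x and u_y are the first-stage-compensated symbols of the two
    polarizations. All 16-QAM symbols contribute, including Class II.
    """
    lo = max(0, n - half_width)
    hi = min(len(u_x), n + half_width + 1)
    h = 0j
    for k in range(lo, hi):
        for u in (u_x[k], u_y[k]):
            h += u * slicer(u).conjugate()
    return cmath.phase(h) if h != 0 else 0.0
```

Because the decisions are only reliable when the residual phase is small, this stage depends on the first stage having removed most of the phase noise, which is consistent with the short second-stage window (*M* = 3) chosen in Section 3.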

## 3. Simulation results and discussions

Simulations are conducted to study and compare the performance of the proposed CPE with others reported in the literature. In particular, sequences of 2^{18} 16-QAM symbols on each polarization were used to obtain the bit error ratio (BER). The two most significant bits (MSB) of each symbol are differentially encoded to avoid cycle slips [13]. The laser phase noise is modeled as a Wiener process, and different amounts of ASE noise are loaded to realize different OSNRs.

It turns out that for the 1^{st} stage estimator there exists an optimal filter half width that minimizes the BER. The optimal width is determined by a trade-off between additive ASE noise and laser phase noise: a larger filter width is preferred to average out the additive noise, while the de-correlation of laser phase noise over different symbols favors a short filter width. To determine the optimal filter width, we performed extensive Monte Carlo simulations for different combinations of laser linewidth and OSNR, and the results are shown in Fig. 5. The linewidth-symbol duration product $\Delta v\cdot {T}_{s}$ ranges from 1e-6 to 5e-4, which covers the typical range of lasers currently used in long-haul transmission systems. Here, $\Delta v$ denotes the combined linewidth of the transmitter and receiver lasers and ${T}_{s}$ denotes the symbol period.

Figure 5 suggests that the optimum width decreases when linewidths and/or SNR get larger and vice versa, in agreement with theoretical predictions. For $\Delta v\cdot {T}_{s}$ as large as 5e-4, the optimal filter half width *N* is found to be around six and varies only slightly with SNR. On the other hand, the optimal *N* becomes larger and more sensitive to ASE noise when ASE noise is dominant, e.g. the optimal *N* ranges from 46 to 91 when $\Delta v\cdot {T}_{s}$ = 1e-6. For BER = 1e-3 with 1 dB penalty (compared to a system using a perfect laser with zero linewidth and Gray encoding), the filter half width is found to be *N* = 12 when $\Delta v\cdot {T}_{s}$ is as large as 2e-4. Similarly, we can optimize the second-stage filter half width *M* using the same approach. However, since a considerable amount of the phase noise has already been compensated in the first-stage estimator, the optimal half width *M* of the 2^{nd} stage estimator is found to be quite insensitive to $\Delta v\cdot {T}_{s}$ and/or OSNR. Consequently, we set *N* and *M* to 12 and 3, respectively, for all the simulation and experimental results in the rest of the paper.

In Fig. 6, various CPEs with single/dual polarization, sliding/block averaging and Gray/differential encoding are simulated. Unless specifically stated, the performances are for single-polarization, sliding-averaging and differentially encoded approaches. For fair comparison, we mainly focus on sliding-averaging and differentially encoded techniques. As shown in Fig. 6, our proposed algorithm can tolerate $\Delta v\cdot {T}_{s}$ = 2e-4 with an OSNR penalty of 1 dB.

## 4. Experimental results

The experimental setup for the investigation of the proposed CPE for a 200 Gb/s DP-16-QAM system is shown in Fig. 7. External cavity lasers (ECL) or distributed feedback (DFB) lasers with linewidths ranging from 150 kHz to 2.81 MHz are used to investigate the linewidth tolerance of various CPE algorithms. The linewidths of the lasers are measured using the self-heterodyne spectrum measurement technique [24]. The laser source is split and also used as the local oscillator for self-homodyne detection. A 12.5 Gb/s binary pseudo-random bit sequence (PRBS) of length 2^{15}-1 is obtained by driving an Anritsu MP1763B pulse pattern generator (PPG) with one RF synthesizer operating at 12.5 GHz. The signals are then passed through a 3 dB electrical splitter, one delay line and a 2:1 Anritsu MU182020A-013 25 Gbit/s multiplexer (Mux) to generate two 25G two-level PRBS signals *D* and $\overline{D}$, which are further attenuated, relatively delayed and combined to generate two independent four-level signals to drive an integrated LiNbO_{3} Mach-Zehnder (I/Q) modulator.

An EDFA and a frequency-variable band-pass filter with 1 nm bandwidth, followed by a second amplifier and a variable attenuator, were used to generate a variable amount of ASE noise to realize different OSNR values. An OSNR monitoring device comprising one PC, one polarizer and an OSA is used to monitor the OSNR. With the appropriate amount of ASE noise, the single-polarization 16-QAM signal is fed into a polarization multiplexer consisting of a PBS and a PBC with a path length difference of 116.83 ps between polarizations, corresponding to a delay of 2.92 symbol periods. It should be noted that, to resemble a real transmission system, the path length difference between the signals in the two polarizations is chosen to be relatively short, so that one obtains statistically independent information symbols while the laser phase noise remains correlated across the polarizations. The polarization-multiplexed signals are then transmitted through 5 km of SMF to de-correlate the laser phase between the transmitter and local oscillator in our self-homodyne detection scheme.

At the receiver, the dual-polarization signals are sampled by a 50 GSample/s real-time sampling scope at two samples per symbol and then processed offline by DSP. The block diagram of the DSP algorithms is shown in Fig. 8. The samples are first processed with orthogonalization [25] for quadrature imbalance compensation and then with four fractionally spaced (${T}_{s}$/2) 13-tap time-domain finite impulse response (FIR) adaptive filters for timing phase recovery, polarization de-multiplexing and differential group delay (DGD) mitigation, before being down-sampled to one sample per symbol. The FIR taps are updated using the standard constant modulus algorithm (CMA), which is simple, robust, and works independently of the carrier phase. Since we used a self-homodyne scheme, frequency offset compensation can be omitted. The signals are then passed into the various CPE techniques, followed by symbol detection and BER calculation.

We experimentally compare the performance of five feed-forward CPEs including QPSK partitioning [11], single polarization BPS [13], single polarization BPS+ML [17], our proposed single polarization QPSK partitioning+ML and dual polarization QPSK partitioning+ML. For comparison, we utilized one ECL laser with linewidth of 150 kHz and five cost-effective DFB lasers with linewidths measured to be 0.45MHz, 1MHz, 1.5MHz, 2MHz and 2.81MHz [24].

The OSNR penalties at a BER of 1e-3 for lasers with different linewidths are shown in Fig. 9. The OSNR penalties are obtained by varying the amount of ASE noise and recording the OSNR from the OSNR monitoring module when the BER equals 1e-3. The penalty difference between BPS and dual-polarization QPSK partitioning + ML increases from 0.01 dB to 1.21 dB when $\Delta v\cdot {T}_{s}$ increases from 1.2e-5 to 1.7e-4. When $\Delta v\cdot {T}_{s}$ is increased to 2.25e-4 (corresponding to a combined laser linewidth of 5.63 MHz), a BER of 1e-3 can only be achieved using dual-polarization QPSK partitioning + ML and dual-polarization BPS. No other CPE can achieve a BER of 1e-3 even at the highest OSNR (37.2 dB) allowed in our setup.
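As a quick consistency check of the quoted numbers, the symbol rate follows from the 200 Gb/s line rate, two polarizations and 4 bits per 16-QAM symbol (and agrees with the 50 GSample/s scope running at two samples per symbol). The worked values below are simple arithmetic on figures stated in the text; the per-symbol phase-noise standard deviation $\sigma$ is derived here from the Wiener model of Section 2 and is not quoted in the paper.

$$
\frac{200\ \text{Gb/s}}{2 \times 4\ \text{bit}} = 25\ \text{GBaud}, \qquad T_s = \frac{1}{25\ \text{GBaud}} = 40\ \text{ps}, \qquad \frac{116.83\ \text{ps}}{40\ \text{ps}} \approx 2.92
$$

$$
\Delta v \cdot T_s = 5.63\ \text{MHz} \times 40\ \text{ps} \approx 2.25\times 10^{-4}, \qquad \sigma^2 = 2\pi\,\Delta v \cdot T_s \approx 1.41\times 10^{-3}\ \text{rad}^2 \;\; (\sigma \approx 0.038\ \text{rad})
$$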

Figure 10 shows the received signal distributions using various CPE techniques when the combined laser linewidth is 5.63 MHz and the OSNR is 37.2 dB. For QPSK partitioning, BPS, BPS + ML and the single-polarization QPSK partitioning + ML, the received signal distributions show more residual phase noise, as the outermost four distributions are more ellipse-like. In contrast, the proposed dual-polarization QPSK partitioning + ML and the dual-polarization BPS demonstrate better performance and result in a BER of less than 1e-3.

## 5. Computational complexity

The required hardware complexity of dual/single-polarization BPS, BPS + ML, two-stage BPS, QPSK partitioning and the proposed method is compared. Except for the QPSK partitioning method (originally proposed as a block-based algorithm), all the methods are implemented using an interleaving parallel and sliding averaging scheme for ease of comparison. The block-based complexity of our proposed technique can be found in the appendix. Here, *P* represents the number of parallelization paths and *B* is the number of trial phases for the BPS-like methods [26, 27]. In addition, although the second-stage filter half width *M* is always smaller than *N* in the proposed CPE, we still consider the worst-case scenario and set 2*M*+1 = 2*N*+1 = *L*, where *L* denotes the smoothing filter length. For fair comparison, we assume the same number of parallelization paths for all the CPE techniques considered.

Table 1 shows the complexity comparison of the various feed-forward algorithms studied in this paper. When discussing the complexity in comparison with the BPS and BPS + ML schemes, we assume *B* = 32 and 11 test phases, respectively (as suggested by the original publications [13] and [17]), for 1 dB sensitivity penalty at BER = 1e-3. For the two-stage BPS scheme, the numbers of trial phases (*B*) are assumed to be 8 and 4 for the first and second estimation stages, following [20]. From Table 1, it can be seen that the proposed dual-polarization QPSK partitioning + ML algorithm requires less than one third of the real multipliers used in two-stage BPS [20] and BPS + ML, and nearly one ninth of those used in dual-polarization BPS and single-polarization BPS. For the other operations, the proposed technique also requires far fewer computations than the other CPEs. Although the QPSK partitioning method [11] is slightly simpler in terms of the number of real adders, slicers and LUTs, it is evident from the previous section that the performance of QPSK partitioning alone is noticeably worse than that of the other feed-forward CPE techniques studied here.

## 6. Conclusions

In this paper, we proposed a low-complexity and phase noise tolerant feed-forward carrier phase estimation technique for DP-16-QAM systems using QPSK partitioning, estimation of phase offsets between signals in different polarizations, and ML detection. Simulation and experimental results showed that the proposed CPE can tolerate a linewidth times symbol duration product comparable with the best feed-forward CPE techniques while the computational complexity is at least three times lower than the simplest feed-forward CPE reported in the literature. High performance and simple techniques will favor real-time implementation of advanced feed-forward carrier phase estimation techniques in future systems using high spectral efficiency modulation formats.

## APPENDIX

#### A. Computational complexity in interleaving parallelization structure using slide averaging

The required processing complexity for the proposed DP-QPSK partitioning + ML scheme can be derived from Fig. 2 and calculated separately for each functional block. Here, we separate the proposed algorithm into four main parts: partitioning, VVPEs, first-stage compensation and ML estimators. As discussed in the paper, the phase-offset estimate is obtained from the first 1000 symbols, or approximately 0.25% of the symbols in our experiment. Thus, its computational complexity is omitted in our calculation. The computational complexity is evaluated by counting the operations required to process 2*P* parallel symbols from both polarizations, as detailed below:

- 1. To partition the 2*P* symbols, 2*P* amplitude calculations and 2*P* amplitude comparisons with the ring boundaries shown in Fig. 1 are required.
- 2. In the two VVPEs, the *P* $C_1$ or $C_3$ symbols (on average) from both polarizations need to be raised to their fourth power and normalized.
    - (1) Each of the *P* fourth power operations is composed of two cascaded square operations, each requiring 4*P* real multipliers and 2*P* real adders;
    - (2) The *P* normalization operations require *P* absolute value calculations (realized by 2*P* real multipliers, *P* real adders and *P* square-root operations using look-up tables), after which the values are divided by their absolute values using 2*P* real multipliers.
- 3. In the first stage carrier phase compensation, the *P* outcomes from VVPE in the *y* polarization need to be rotated by the phase-offset ${e}^{j{\theta}_{offset}}$. Afterwards, 2*P* symbols in the *x* and *y* polarizations are summed up. *P* ${\theta}_{n}^{est1}$ estimates are calculated, unwrapped and utilized to compensate the 2*P* symbols in the *x* and *y* polarizations:
    - (1) Phase-offset rotation for the *P* outcomes from VVPE in the *y* polarization: 4*P* real multipliers and 2*P* real adders;
    - (2) Summation after phase-offset rotation: (2*L*-1)*P* real adders;
    - (3) *P* ${\theta}_{n}^{est1}$ calculations: *P* ‘arg(.)/4’ operations realized by *P* look-up tables and *P* real multipliers;
    - (4) Unwrapping: *P* comparators and *P* real adders;
    - (5) First stage carrier phase compensation: 8*P* real multipliers and 4*P* real adders.
- 4. In the ML estimator, 2*P* outcomes from the first stage estimation are multiplied with the conjugate of their decisions and summed up to calculate the second stage phase noise estimates. After the 2*P* symbols are compensated, final decisions will be made:
    - (1) First stage decision: 2*P* slicers;
    - (2) Multiply with first stage decision: 8*P* real multipliers and 4*P* real adders;
    - (3) Second stage results summation: (2*L*-1)*P* real adders;
    - (4) ${\theta}_{n}^{est2}$ calculation: *P* arg(.) realized by *P* look-up tables;
    - (5) Second stage carrier phase compensation: 8*P* real multipliers and 4*P* real adders;
    - (6) Final decision: 2*P* slicers.

Finally, the overall computational complexity of the proposed DP-QPSK partitioning + ML CPE is 43*P* real multipliers, 4*LP*+20*P* real adders, 4*P* slicers, 3*P* comparators and 3*P* LUTs.
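The bookkeeping above can be cross-checked with a short tally. The sketch below sums only the sub-steps itemized in items 2 to 4 (in units of *P*) and compares them with the stated totals; the function names are hypothetical. The remainder (2*P* multipliers, 2*P* adders and 2*P* comparators) is what the stated totals imply for the partitioning step of item 1, whose amplitude calculations are not broken down in the list; that attribution is an inference made here, not a statement from the paper.

```python
KEYS = ("multipliers", "adders", "slicers", "comparators", "luts")


def listed_substeps(L):
    """Operation counts per P for the itemized sub-steps of items 2-4."""
    steps = [
        # (multipliers, adders, slicers, comparators, LUTs)
        (8, 4, 0, 0, 0),          # 2(1) fourth power: two cascaded squarings
        (4, 1, 0, 0, 1),          # 2(2) normalization
        (4, 2, 0, 0, 0),          # 3(1) phase-offset rotation
        (0, 2 * L - 1, 0, 0, 0),  # 3(2) summation after rotation
        (1, 0, 0, 0, 1),          # 3(3) arg(.)/4
        (0, 1, 0, 1, 0),          # 3(4) unwrapping
        (8, 4, 0, 0, 0),          # 3(5) first-stage compensation
        (0, 0, 2, 0, 0),          # 4(1) first-stage decision
        (8, 4, 0, 0, 0),          # 4(2) multiply with conjugate decision
        (0, 2 * L - 1, 0, 0, 0),  # 4(3) second-stage summation
        (0, 0, 0, 0, 1),          # 4(4) arg(.)
        (8, 4, 0, 0, 0),          # 4(5) second-stage compensation
        (0, 0, 2, 0, 0),          # 4(6) final decision
    ]
    return dict(zip(KEYS, (sum(col) for col in zip(*steps))))


def stated_totals(L):
    """Totals per P as stated in the text."""
    return dict(zip(KEYS, (43, 4 * L + 20, 4, 3, 3)))


def implied_partitioning_cost(L):
    """Stated totals minus the itemized sub-steps (attributed to item 1)."""
    listed, total = listed_substeps(L), stated_totals(L)
    return {k: total[k] - listed[k] for k in KEYS}
```

With the worst-case window *L* = 2*N*+1 = 25 used in the comparison, the itemized sub-steps account for 41*P* of the 43*P* real multipliers and 118*P* of the 120*P* real adders.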

#### B. Complexity of the proposed CPE in interleaving structure using block averaging

The calculation of complexity for block averaging is almost the same, except for the summing process and phase noise calculations, since block averaging only computes one estimated phase noise value for each block. For a block of 2*P* symbols, calculating ${\theta}_{n}^{est1}$ or ${\theta}_{n}^{est2}$ only requires (2*P*-1) adders and 1 arg(.) operation, realized by a look-up table. Unwrapping is also reduced to 1 comparator and 1 real adder.

The computational complexity of the block-based dual-polarization QPSK partitioning + ML CPE is 40*P* + 1 real multipliers, 23*P* - 1 real adders, 4*P* slicers, 2*P* + 1 comparators and *P* + 2 LUTs.

## Acknowledgments

The authors would like to acknowledge the support of the Hong Kong Government General Research Fund (GRF) under project number PolyU 522009 and Hong Kong Polytechnic University project 4-ZZ7U.

## References and links

**1. **E. Ip and J. M. Kahn, “Feedforward carrier recovery for coherent optical communications,” J. Lightwave Technol. **25**(9), 2675–2692 (2007).

**2. **A. H. Gnauck, R. W. Tkach, A. R. Chraplyvy, and T. Li, “High-capacity optical transmission systems,” J. Lightwave Technol. **26**(9), 1032–1045 (2008).

**3. **E. Ip, A. P. T. Lau, D. J. Barros, and J. M. Kahn, “Coherent detection in optical fiber systems,” Opt. Express **16**(2), 753–791 (2008).

**4. **G. Charlet, J. Renaudier, H. Mardoyan, P. Tran, O. Pardo, F. Verluise, M. Achouche, A. Boutin, F. Blache, J. Dupuy, and S. Bigo, “Transmission of 16.4 Tbit/s capacity over 2550 km using PDM QPSK modulation format and coherent receiver,” in *Proceedings OFC/NFOEC*, San Diego, CA, 2008, PDP3.

**5. **P. J. Winzer, “Beyond 100G Ethernet,” IEEE Commun. Mag. **48**(7), 26–30 (2010).

**6. **P. Andrekson, “Metrology of Complex Optical Modulation Formats,” in *Proceedings OFC/NFOEC*, Los Angeles, CA, 2011, OWN1.

**7. **P. J. Winzer, A. H. Gnauck, C. R. Doerr, M. Magarini, and L. L. Buhl, “Spectrally efficient long-haul optical networking using 112-Gb/s polarization-multiplexed 16-QAM,” J. Lightwave Technol. **28**(4), 547–556 (2010).

**8. **C. Yu, S. Zhang, P. Y. Kam, and J. Chen, “Bit-error rate performance of coherent optical *M*-ary PSK/QAM using decision-aided maximum likelihood phase estimation,” Opt. Express **18**(12), 12088–12103 (2010).

**9. **M. G. Taylor, “Phase estimation methods for optical coherent detection using digital signal processing,” J. Lightwave Technol. **27**(7), 901–914 (2009).

**10. **M. Seimetz, “Laser linewidth limitations for optical systems with high-order modulation employing feed forward digital carrier phase estimation,” in *Proceedings OFC/NFOEC*, San Diego, CA, 2008, OTuM2.

**11. **I. Fatadin, D. Ives, and S. J. Savory, “Laser linewidth tolerance for 16-QAM coherent optical systems using QPSK partitioning,” IEEE Photon. Technol. Lett. **22**(9), 631–633 (2010).

**12. **A. J. Viterbi and A. N. Viterbi, “Nonlinear estimation of PSK-modulated carrier phase with application to burst digital transmission,” IEEE Trans. Inf. Theory **29**(4), 543–551 (1983).

**13. **T. Pfau, S. Hoffmann, and R. Noe, “Hardware-efficient coherent digital receiver concept with feed forward carrier recovery for *M*-QAM constellations,” J. Lightwave Technol. **27**(8), 989–999 (2009).

**14. **S. K. Oh and S. P. Stapleton, “Blind phase recovery using finite alphabet properties in digital communications,” Electron. Lett. **33**(3), 175–176 (1997).

**15. **F. Rice, B. Cowley, B. Moran, and M. Rice, “Cramer-Rao lower bounds for QAM phase and frequency estimation,” IEEE Trans. Commun. **49**(9), 1582–1591 (2001).

**16. **T. Pfau and R. Noe, “Phase-noise-tolerant two-stage carrier recovery concept for higher order QAM formats,” IEEE J. Sel. Top. Quantum Electron. **16**(5), 1210–1216 (2010).

**17. **X. Zhou, “An improved feed-forward carrier recovery algorithm for coherent receivers with M-QAM modulation format,” IEEE Photon. Technol. Lett. **22**(14), 1051–1053 (2010).

**18. **X. Li, Y. Cao, S. Yu, W. Gu, and Y. Ji, “A simplified feed-forward carrier recovery algorithm for coherent optical QAM system,” J. Lightwave Technol. **29**(5), 801–807 (2011).

**19. **Q. Zhuge, C. Chen, and D. V. Plant, “Low computation complexity two-stage feed forward carrier recovery algorithm for M-QAM,” in *Proceedings OFC/NFOEC*, Los Angeles, CA, 2011, Paper OMJ5.

**20. **J. Li, L. Li, Z. Tao, T. Hoshida, and J. C. Rasmussen, “Laser-linewidth-tolerant feed-forward carrier phase estimator with reduced complexity for QAM,” J. Lightwave Technol. **29**(16), 2358–2364 (2011).

**21. **Y. Gao, A. P. T. Lau, C. Lu, Y. Li, J. Wu, K. Xu, W. Li, and J. Lin, “Low-complexity two-stage carrier phase estimation for 16-QAM systems using QPSK partitioning and maximum likelihood detection,” in *Proceedings OFC/NFOEC*, Los Angeles, CA, 2011, Paper OMJ6.

**22. **A. H. Gnauck, G. Charlet, P. Tran, P. J. Winzer, C. R. Doerr, J. C. Centanni, E. C. Burrows, T. Kawanishi, T. Sakamoto, and K. Higuma, “25.6-Tb/s WDM transmission of polarization-multiplexed RZ-DQPSK signals,” J. Lightwave Technol. **26**(1), 79–84 (2008).

**23. **R. R. Muller and D. A. A. Mello, “Phase-offset estimation for joint-polarization phase-recovery in DP-16-QAM systems,” IEEE Photon. Technol. Lett. **22**(20), 1515–1517 (2010).

**24. **T. Okoshi, K. Kikuchi, and A. Nakayama, “Novel method for high resolution measurement of laser output spectrum,” Electron. Lett. **16**(16), 630 (1980).

**25. **I. Fatadin, S. J. Savory, and D. Ives, “Compensation of quadrature imbalance in an optical QPSK coherent receiver,” IEEE Photon. Technol. Lett. **20**(20), 1733–1735 (2008).

**26. **X. Zhou and Y. Sun, “Low-complexity, blind phase recovery for coherent receivers using QAM modulation,” in *Proceedings OFC/NFOEC*, Los Angeles, CA, 2011, Paper OMJ3.

**27. **K. Piyawanno, M. Kuschnerov, B. Spinnler, and B. Lankl, “Low complexity carrier recovery for coherent QAM using superscalar parallelization,” in *Proceedings ECOC* 2010, Torino, Italy, Paper We.7.A.3.