The performance of various blind timing phase estimators (TPEs) for digital coherent receivers is analyzed. The equivalence among four TPE algorithms is demonstrated analytically, and two TPE algorithms that apply squaring pre-filters are shown to be in fact identical. Three TPE algorithms applicable to Nyquist signals are proposed based on the equivalence analysis. In addition, the impact of receiver bandwidth, spectrum weighting bandwidth and signal timing phase on TPE performance is investigated. The definition of sampling diversity and an analysis of the sampling diversity gain for four pulse shapes are presented. The effect of sampling diversity is observed and verified via both simulations and experiments.
© 2014 Optical Society of America
1. Introduction
The digital coherent receiver is a crucial part of an optical coherent transmission system, enabling techniques such as high-order modulation formats and dual-polarization multiplexing [1, 2]. In such receivers, fast and accurate acquisition of the timing phase is critical for both long-haul transmission and short-range access networks [3–6]. Specifically, after chromatic dispersion (CD) compensation, the timing phase error of the sampled signal should be correctly estimated and compensated prior to the subsequent DSP algorithms, in order to produce samples with optimal signal-to-noise ratio (SNR) and phase representation. The most common way to achieve this is to use a timing phase estimator (TPE) in conjunction with a digital interpolator. Various TPE algorithms have been proposed for classical wireless communications and subsequently adopted in optical coherent systems, such as the well-known “square-and-filter” estimator proposed by Oerder & Meyr [7] (O&M, also known as the square law nonlinearity (SLN)), and those proposed by Gardner [8], Godard [9] and Lee [10]. The O&M estimator requires at least three samples per symbol (SPS) in principle (four are usually taken for practical reasons, denoted 4SPS SLN), while the others require two. Constrained by the sampling speed of analog-to-digital converters (ADCs), however, the signal is commonly digitized at a two-fold oversampling rate. Consequently, the signal samples have to be converted from 2SPS to 4SPS via digital interpolation before entering the SLN TPE. The SLN and Lee TPEs can be applied in a feedforward structure as shown in Fig. 1(a), whereas Gardner’s and Godard’s TPEs commonly run within a phase-locked loop (PLL) as shown in Fig. 1(b). Note that Godard’s TPE can also be realized as in Fig. 1(a) if the timing phase estimation is performed in an alternative to its original mode, which is the approach adopted in this paper.
On the other hand, a real-time processor has to be parallelized, because the symbol rate of the signal is normally much higher than the clock speed of silicon devices. Feedforward estimators are therefore intuitively appealing, as they process data block by block and incur no feedback delay. Note that, throughout the paper, an all-digital implementation of timing phase recovery is assumed as shown in Fig. 1, meaning that the estimated timing phases are not used to adjust the ADC but to drive a digital interpolator. Consequently, the ADC runs continuously at a nominal sampling rate without locking to the optimal sampling phase, and the signal is recovered by digital interpolation based on the original signal and the estimated timing phases.
The aforementioned estimators are considered the basic algorithms and are well studied in the literature. Additionally, a variety of modified algorithms and techniques have been reported to improve the TPE performance under certain conditions. For instance, pre-filtering is introduced to the SLN and Lee TPEs to improve their performance at medium and high SNR [11]. An alternative algorithm is reported to remove the asymptotic bias of the Lee TPE [12]. In addition to the SLN, the absolute value nonlinearity (AVN) [13], fourth law nonlinearity (FLN) [14], and logarithm nonlinearity (LOGN) [15] are also utilized to estimate the timing phase. Moreover, Sun et al. improved the TPE tolerance to residual CD and PMD by compensating CD and polarization rotation prior to Godard’s TPE [16].
With so many estimators, it is imperative to clarify their pros and cons and their applicability to different scenarios. In this paper, we limit ourselves to the class of 2SPS TPEs, including Godard’s, Gardner’s and Lee’s TPEs as well as the interpolation-based SLN. We investigate and compare their performance via computer simulations and experiments. By inspecting their frequency domain (FD) expressions, we show analytically that the seemingly different algorithms are all equivalent up to an explicable difference, which allows us to focus on implementation issues with greater confidence. By applying this result, we propose three additional TPE algorithms applicable to Nyquist signals. In addition, the effect of “sampling diversity” is demonstrated and the underlying principle is presented, i.e., a performance gain can be obtained by over-sampling signals with “flat” pulse shapes.
The rest of the paper is organized as follows. In Section 2, the equivalence of and differences among the various TPEs are presented analytically and their computational complexities are compared. In Section 3, computer simulations are carried out to compare the timing jitter performance of these algorithms; the results are discussed and the sampling diversity is investigated. Section 4 presents experimental results, and the paper is concluded in Section 5.
2. Principle of different TPE and their equivalence
The expressions of the TPE algorithms considered in this paper are listed in the left column of Table 1. The parameters are defined as follows: the data length is N during the observation window, $x_n$ denotes the nth sample entering the estimator, $X_k$ is the kth element of the discrete Fourier transform (DFT) of the data block, and $\hat{\varepsilon} = \tau/T \in (-0.5, 0.5]$ is the estimate of the normalized symbol timing offset, where τ is the time delay from the optimal symbol timing instant and T denotes the symbol period. For the SLN, a two-fold oversampling ratio is assumed, so the signal samples $x_n$ are interpolated from 2SPS to the 4SPS sequence $x'_n$ by padding zeros in the frequency domain. Equations (2) and (3) are for down-sampling a T/2-spaced signal to a T-spaced signal in the frequency domain.
(I) Derivation of the frequency domain expression of SLN
[…] where $x'_n$ is the 4SPS sequence and $X'_k$ is its DFT. As 2SPS is assumed for the received sequence $x_n$, frequency domain interpolation is conducted to up-sample the signal. Thus, we have […] Eq. (1) yields […]
(II) Derivation of the frequency domain expression of Gardner
[…]
(III) Derivation of the frequency domain expression of Lee
[…] where […] Eq. (6) and Eq. (8) into Eq. (7), we get […] Equations (5), (6) and (9) are listed in the right column of Table 1.
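The link between the interpolation-based SLN and a Godard-type spectral correlation can be checked numerically. The sketch below is an illustration under stated assumptions, not the simulation code used in this paper: it assumes the SLN statistic is $\sum_n |x'_n|^2 e^{-j\pi n/2}$ on the zero-padded 4SPS sequence and the Godard-type statistic is $\sum_{k=0}^{N-1} X_k X^*_{k+N}$ on the 2N-point DFT of the 2SPS block. The test signal (periodic QPSK with Hann-shaped pulses), the function names, and the sign convention (a positive estimate means a delayed signal) are all assumptions made for this example.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Direct O(M^2) DFT or inverse DFT; adequate for short demo blocks."""
    m = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(m):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * ((k * n) % m) / m)
                  for n, v in enumerate(x))
        out.append(acc / m if inverse else acc)
    return out


def qpsk_2sps(num_symbols, offset, seed=1):
    """Periodic QPSK block with Hann pulses of width 2T (hypothetical test
    signal), sampled at 2SPS and delayed by `offset` symbol periods."""
    rng = random.Random(seed)
    syms = [complex(rng.choice((-1, 1)), rng.choice((-1, 1)))
            for _ in range(num_symbols)]
    x = []
    for n in range(2 * num_symbols):
        t = n / 2 - offset              # time relative to the pulse centres
        m = math.floor(t)
        w = math.sin(math.pi * (t - m) / 2) ** 2
        x.append((1 - w) * syms[m % num_symbols]
                 + w * syms[(m + 1) % num_symbols])
    return x


def godard_estimate(x):
    """Phase of the correlation between spectral bins spaced by the baud rate."""
    n_sym = len(x) // 2
    spec = dft(x)
    corr = sum(spec[k] * spec[k + n_sym].conjugate() for k in range(n_sym))
    return -cmath.phase(corr) / (2 * math.pi)


def sln_estimate(x):
    """Square-law estimator on a 4SPS sequence obtained by FD zero padding."""
    n_sym = len(x) // 2
    spec = dft(x)
    padded = [0j] * (4 * n_sym)
    for k in range(n_sym):
        padded[k] = 2 * spec[k]                          # non-negative bins
        padded[3 * n_sym + k] = 2 * spec[n_sym + k]      # negative bins
    x4 = dft(padded, inverse=True)
    tone = sum(abs(v) ** 2 * cmath.exp(-0.5j * math.pi * n)
               for n, v in enumerate(x4))
    return -cmath.phase(tone) / (2 * math.pi)


block = qpsk_2sps(32, 0.2)
print("Godard-type:", godard_estimate(block))
print("SLN (zero-padded):", sln_estimate(block))
```

Under these assumptions the two printed values agree to floating-point precision, because the zero-padded spectrum has no content outside the original band and the square-law tone therefore reduces to the same spectral correlation. Both may sit slightly off the true offset, since the test pulse is not strictly band-limited.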
It is clear that, unlike the 4SPS SLN [7], 2SPS TPEs exploit the auto-correlation (AC) between the positive and negative signal bands centered at frequencies of $\pm f_b/2$, respectively. From Table 1, the following conclusions can be drawn: (i) the interpolation-based SLN is equivalent to Godard’s TPE, provided that FD zero padding is adopted; (ii) Gardner’s TPE is approximately equivalent to the imaginary part of Godard’s, or to Godard’s original TPE in [9]; (iii) Lee’s TPE is approximately equivalent to Godard’s and contains Gardner’s as its imaginary part; (iv) the difference between Gardner’s and Godard’s original TPEs [9] lies in the weighting of the spectral auto-correlation, i.e., whether it is a sinusoidal or a rectangular multiplicative window, as shown in Fig. 2(a) for a 28 GBaud NRZ-QPSK signal.
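The sinusoidal weighting in conclusion (iv) can be illustrated with a short numerical check. The sketch below assumes one common form of Gardner’s detector, $\sum_{n\ \mathrm{even}} \mathrm{Re}\{x^*_{n+1}(x_n - x_{n+2})\}$, evaluated with circular indexing over a 2SPS block, and compares it with $-\frac{1}{N}\sum_{k=0}^{N-1} \sin(\pi k/N)\,\mathrm{Im}\{X_k X^*_{k+N}\}$. The indexing, sign convention and function names are assumptions for this example and need not match Table 1 exactly.

```python
import cmath
import math
import random


def dft(x):
    """Direct O(M^2) DFT; adequate for short demo blocks."""
    m = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * ((k * n) % m) / m)
                for n, v in enumerate(x))
            for k in range(m)]


def gardner_time(x):
    """Gardner detector summed over a circular 2SPS block (even n only)."""
    m = len(x)
    return sum((x[(n + 1) % m].conjugate() * (x[n] - x[(n + 2) % m])).real
               for n in range(0, m, 2))


def gardner_freq(x):
    """Same quantity written as a sine-weighted spectral correlation."""
    n_sym = len(x) // 2
    spec = dft(x)
    weighted = sum(math.sin(math.pi * k / n_sym)
                   * (spec[k] * spec[k + n_sym].conjugate()).imag
                   for k in range(n_sym))
    return -weighted / n_sym


rng = random.Random(7)
samples = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(32)]
print(gardner_time(samples), gardner_freq(samples))
```

For a circular block the two expressions agree for any input sequence, which is why the example uses arbitrary complex samples rather than a modulated signal. Replacing the sine weight with a constant gives the imaginary part of the unweighted (rectangular window) correlation.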
The estimator characteristics (S-curves) of the various TPEs are shown in Fig. 2(b), using 256 symbols of a 2SPS NRZ-QPSK signal for each point. The timing phase ranges over one unit interval (UI), i.e., the symbol period. It can be seen that the S-curves of the SLN and Godard’s TPE completely overlap and are slightly biased off the true value, similarly to Lee’s, as pointed out by Wang et al. [12]. The S-curve of Gardner’s TPE exhibits a sinusoidal shape, which is consistent with the fact that it is approximately equal to the imaginary part of Godard’s TPE. Based on this, one can expect close estimation accuracy among these algorithms. In addition, any pre-filtering claimed to enhance the estimation performance of one TPE should be equally applicable to the others. One example is the equivalence between the two TPEs put forward in [18, 19]. The proof is as follows.
(IV) Derivation of equivalence of TPEs for Nyquist system
The TPE in [18] applies Gardner’s TPE to the power of the received sequence, […] Eq. (10) can also be expressed in the frequency domain as […]
The other TPE is a 2SPS-based fourth power phase detector (4PPD) put forward in [19]. The TPE is expressed as […] Fig. 2(a), as well as a constant scale factor. Furthermore, by utilizing the equivalence among the TPEs proved above, another three feedforward TPEs that work on Nyquist signals can be deduced by applying Godard’s, Lee’s and the interpolation-based SLN TPEs to the power of the received sequence: […]
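One way to write such power-domain estimators explicitly is sketched below. This is a hedged illustration rather than the paper's own equations: the symbols $p_n$, $P_k$ and $p'_n$, the summation limits, and the sign conventions are assumptions, and any constant timing offset introduced by working on the power sequence is not addressed.

```latex
% Power sequence and its 2N-point DFT (assumed notation)
p_n = |x_n|^2, \qquad P_k = \sum_{n=0}^{2N-1} p_n \, e^{-j 2\pi k n /(2N)}

% Godard-type estimator applied to the power sequence
\hat{\varepsilon}_{\mathrm{G}} = -\frac{1}{2\pi}\,
  \arg\!\left\{ \sum_{k=0}^{N-1} P_k \, P^{*}_{k+N} \right\}

% Lee-type estimator applied to the power sequence (p_n is real)
\hat{\varepsilon}_{\mathrm{L}} = \frac{1}{2\pi}\,
  \arg\!\left\{ \sum_{n=0}^{2N-1} p_n^{2}\, e^{-j n \pi}
  + \sum_{n=0}^{2N-2} p_n\, p_{n+1}\, e^{-j (n - 1/2)\pi} \right\}

% Interpolation-based SLN applied to the power sequence,
% with p'_n the 4SPS sequence obtained from p_n by FD zero padding
\hat{\varepsilon}_{\mathrm{S}} = -\frac{1}{2\pi}\,
  \arg\!\left\{ \sum_{n=0}^{4N-1} \left(p'_n\right)^{2} e^{-j n \pi / 2} \right\}
```

Each expression is obtained by substituting $p_n$ for $x_n$ in the standard form of the corresponding estimator, so the squaring inside the estimator acts on a sequence that is already a power.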
In terms of computational complexity, the exponential functions in the TPE calculations reduce to 1, −1, j and −j, and the cost of additions and subtractions is insignificant relative to that of multiplications. Computational complexity is therefore measured as the number of real multiplications per symbol. Note that Godard’s TPE requires an additional FFT if the sample sequence is in the time domain, and the SLN requires interpolation from 2SPS to 4SPS. Nevertheless, assuming that the corresponding sample sequences are available for each TPE algorithm, the cost per symbol is 4 real multiplications for Godard’s, 16 for the SLN, 2 for Gardner’s and 12 for Lee’s.
3. Simulation and discussion
We conduct simulations to corroborate the theoretical analysis. The simulation system is set up with no frequency offset, laser phase noise or fiber impairments except for random polarization rotation. The receiver bandwidth is 21 GHz. $2^{17}$ symbols are used to calculate the BER for 28 Gbaud PDM-NRZ-QPSK, 16-QAM and 64-QAM systems, and the results are averaged over 10 independent trials. The loop bandwidth of the timing phase recovery (TPR), normalized to the baud rate, is set to 2E-3, i.e., the block size for the feedforward algorithms is 512 samples and the step size for the feedback algorithms is 2.9E-3. The loop bandwidth determines the quality of the timing estimate. A narrowband loop provides more averaging over the additive noise and improves the quality of the estimate, whereas a wider loop bandwidth is needed for better tracking performance if the channel response is changing and/or the transmitter clock is drifting with time. After TPR, CMA equalization is adopted for polarization de-multiplexing and for equalizing the filtering effect caused by the linear interpolation process of resampling. One additional stage of the cascaded multi-modulus algorithm (CMMA) is used for 16-QAM and 64-QAM to achieve better convergence than CMA alone.
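The correspondence between the 512-sample block and the normalized loop bandwidth of 2E-3 can be reproduced with the commonly used equivalence between a feedforward estimator that averages over $L_0$ symbols and a feedback loop of noise bandwidth $B_L$. The relation below is assumed here as the standard rule of thumb; it is not stated explicitly in the text, and the symbols $B_L$ and $L_0$ are introduced only for this illustration.

```latex
B_L T \approx \frac{1}{2 L_0}, \qquad
L_0 = \frac{512\ \text{samples}}{2\ \text{samples/symbol}} = 256\ \text{symbols}
\;\;\Rightarrow\;\;
B_L T \approx \frac{1}{512} \approx 1.95 \times 10^{-3} \approx 2 \times 10^{-3}
```

The matching step size of a feedback loop additionally depends on the detector gain, so it cannot be derived from the block length alone.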
First, we use timing jitter in decibels (dB) as the metric to quantify the TPE accuracy. The normalized jitter variance is defined as the variance of the zero-crossing positions of the TPE S-curve, i.e., $\text{jitter} = 20\log_{10}(\delta_t)$, where $\delta_t$ is the standard deviation of the zero-crossing positions normalized to the symbol period. Figure 3 shows various TPE S-curves produced by scanning the sampling phases on bursts of received NRZ-QPSK signal with a block size of 512 samples at a 10 dB optical signal-to-noise ratio (OSNR). The histograms of the zero-crossing phases of the corresponding algorithms are also plotted. The timing jitter variance is calculated and plotted against OSNR (0.1 nm) in Fig. 4, together with the modified Cramer-Rao bound (MCRB) [21]. The OSNR ranges for QPSK, 16-QAM and 64-QAM are 9–15 dB, 16–22 dB and 16–29 dB, respectively. Clearly, the curves of Godard’s TPE and the SLN overlap for all three modulation formats considered, as do those of Gardner’s and Lee’s, which exhibit about 1 dB lower jitter than the other two algorithms. This should be attributed to their sinusoidal weighting of the spectral auto-correlation, as pointed out in Fig. 2(a). In addition, the jitter performance seems insensitive to the modulation format.
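As a small illustration of the jitter metric defined above, the following sketch converts a list of measured zero-crossing positions (in UI) into a jitter value in dB. The function name is hypothetical, and the use of the population standard deviation rather than the sample standard deviation is an assumption.

```python
import math
import statistics


def jitter_db(zero_crossings):
    """Timing jitter in dB: 20*log10 of the standard deviation of the
    S-curve zero-crossing positions, expressed in symbol periods (UI)."""
    delta_t = statistics.pstdev(zero_crossings)
    return 20 * math.log10(delta_t)


# Zero crossings scattered by +/-0.01 UI correspond to about -40 dB.
print(jitter_db([0.01, -0.01, 0.01, -0.01]))
```

With this definition, a 1 dB reduction in jitter corresponds to a standard deviation smaller by a factor of about 0.89.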
Next, the impact of the receiver bandwidth and of the weighting window bandwidth on the timing estimation jitter is investigated. The results at various OSNRs for the NRZ-QPSK system are shown in Figs. 5(a) and 5(b), respectively. The jitter variance decreases by 1 dB as the normalized receiver bandwidth increases from 0.5 to 1, whereas an optimal value of around 0.7 × baud rate can be identified by varying the rectangular weighting filter bandwidth. Based on Fig. 4, one can expect that substituting a sinusoidal weighting filter for the rectangular one would yield slightly better results, and that an optimal filter bandwidth could likewise be identified. In fact, this is nothing but improved pre-filtering. The optimal pre-filtering for the SLN, Gardner’s and Lee’s TPEs has been studied separately in earlier work, and the conclusions on pre-filter design are essentially the same, which is to be expected because the three TPEs are equivalent according to our previous analysis.
The evolution curves of the timing phase estimation are shown in Figs. 6(a) and 6(b) for the feedback and feedforward schemes, respectively. The step size of the feedback loop is 2.9E-3 and the block size of the feedforward scheme is 512 samples, both at a 10 dB OSNR. No sampling frequency drift is considered for this result. Compared to the feedforward scheme, the feedback scheme has an additional acquisition stage, which consumes about 250 symbols in this case. Note that, instead of locking the ADC to the optimal strobe points, which would lead to a near-zero timing phase upon loop convergence, the digital feedback loop essentially locks the timing phase estimate to its corresponding real value upon convergence, as shown for sampling phases of +0.25, 0 and −0.25 in Fig. 6.
The OSNR penalty at BER = 3.8E-3 (2E-2) with respect to the sampling phase for the PDM-NRZ-QPSK (16-QAM, 64-QAM) system is shown in Fig. 7. The reference OSNR is calculated according to the theoretical equation. “1SPS” stands for the case of down-sampling the received signal directly to the symbol rate without applying any TPE algorithm. Clearly, the impact of the different TPE algorithms on system performance is hardly distinguishable, mostly because the subsequent adaptive equalizer can effectively remove the residual timing error. In fact, Kikuchi has demonstrated that a fractionally spaced adaptive equalizer is capable of compensating the timing phase error even without a TPE. On the other hand, all the two-fold sampling scenarios exhibit a lower OSNR penalty than the optimal condition of the “1SPS” case. We call this the effect of “sampling diversity”. It can be defined as using two or more received samples per symbol to improve the quality and reliability of a communication link. The “sampling diversity gain” is defined as the OSNR penalty reduction relative to the optimal condition of the “1SPS” case, as indicated in Fig. 7.
The sampling diversity is the result of signal oversampling and can be understood by analogy with classical receiver diversity in wireless communications, where two or more antennas are employed to receive multiple copies of the signal with uncorrelated noise at each antenna, leading to a signal-to-noise ratio (SNR) gain for the combined signal. Assuming that the digital coherent receiver applies two-fold oversampling to the signal and that the TPE applies linear interpolation to combine the two samples, the SNR of the received signal can be expressed as: […]
The following conclusions can be drawn from (17): (c1) for μ = 0 and μ = 1, the SNR is the same as in the “1SPS” case with the corresponding sampling phase; (c2) the SNR is maximized when μ is 0.5, provided that $s_1 = s_2 = s$ for all possible sampling phases, i.e., the signal is perfectly “flat”. In this case, two-fold oversampling in conjunction with linear interpolation gains 3 dB in SNR at the corresponding sampling phase for such a signal. The in-phase/quadrature part of an NRZ-QPSK signal with infinite bandwidth suits this case and can be expressed as: […]
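Conclusions (c1) and (c2) can be checked with a short calculation. The sketch below assumes that the interpolated sample is $y = (1-\mu)(s_1 + n_1) + \mu(s_2 + n_2)$ with uncorrelated noise samples of equal variance $\sigma^2$, so that $\mathrm{SNR}(\mu) = \frac{((1-\mu)s_1 + \mu s_2)^2}{((1-\mu)^2 + \mu^2)\,\sigma^2}$. This form is an assumption that is consistent with (c1) and (c2); the symbol $\sigma^2$ and the function names are introduced only for this example.

```python
import math


def interpolated_snr(s1, s2, mu, noise_var=1.0):
    """SNR of y = (1 - mu) * (s1 + n1) + mu * (s2 + n2) for uncorrelated
    noise samples n1, n2 of equal variance noise_var (assumed model)."""
    signal_power = ((1 - mu) * s1 + mu * s2) ** 2
    noise_power = ((1 - mu) ** 2 + mu ** 2) * noise_var
    return signal_power / noise_power


def diversity_gain_db(s1, s2, mu, noise_var=1.0):
    """Gain in dB over using the first sample alone (the mu = 0 case)."""
    reference = interpolated_snr(s1, s2, 0.0, noise_var)
    return 10 * math.log10(interpolated_snr(s1, s2, mu, noise_var) / reference)


# Perfectly flat signal: s1 = s2 = 1
for mu in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"mu = {mu:4.2f}: gain = {diversity_gain_db(1.0, 1.0, mu):5.2f} dB")
```

For a flat signal the gain is zero at μ = 0 and μ = 1 and reaches $10\log_{10} 2 \approx 3$ dB at μ = 0.5, where the noise weighting $(1-\mu)^2 + \mu^2$ is smallest. When $s_1 \neq s_2$, the signal term changes with μ as well, so the gain depends on the pulse shape.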
Figure 8 shows the NRZ waveform containing 3 symbols (−1,1,1) and the calculated signal power, noise power and SNR according to (20). Clearly, the […] have the largest SNR. Figure 8(a) also shows the NRZ waveform filtered with a bandwidth of 0.75 × baud rate for comparison with the perfectly flat pulses. Note that the low-pass filter introduces rising/falling edges of finite length in the waveform, whereas the consecutive 1s and −1s remain. This changes the expression of the signal power and leads to an SNR that depends strongly on the pulse shape and receiver bandwidth. Therefore, we investigate the OSNR penalty with respect to the sampling phase for (a) NRZ, (b) RZ 33%, (c) RZ 50% and (d) RZ 67% with different receiver bandwidths for a 112 Gb/s PDM-QPSK system, and the results are shown in Fig. 9. It can be observed that, for larger duty cycles such as NRZ and RZ 67% with a large receiver bandwidth such as 0.75 × baud rate, the diversity gain is similar to that in Fig. 8(b) since these signals are “nearly flat”, whereas for smaller duty cycles such as RZ 50% and RZ 33% with a small receiver bandwidth, the result is exactly the opposite.
Similar to the analysis of the NRZ signal, the SNR expressions of the RZ 33%, RZ 50% and RZ 67% signals can be written as […]
Figure 10 depicts the RZ 33% waveform containing 3 symbols (−1,1,1) and the calculated signal power, noise power and SNR according to (21). Combining Fig. 8 and Fig. 10, one can see that the minimal noise power is always reached at […], whereas the signal power distribution is related to the pulse shape, leading to a complex diversity gain distribution for various signal conditions (see Fig. 9). In reality, both the optical transmitter and receiver are commonly heavily band-limited, and the overall bandwidth even approaches the Nyquist limit in cases such as super-channels. Also, the ADC sampling clock exhibits both sampling jitter and frequency drift. Thus, the sampling diversity gain and the OSNR penalty of the system will be the result of averaging over all possible sampling phases.
Furthermore, the impact of fiber nonlinearity on the timing estimation performance is investigated. The linewidths of both the transmitter and receiver lasers are set to 100 kHz, and a transmission link of 13 spans of 80 km SSMF is employed. The fiber loss is 0.2 dB/km and the nonlinearity coefficient is 1.26 /W/km. The noise figure of the EDFA is 5 dB. Before TPE, the accumulated CD is compensated by the overlap frequency domain equalization (OFDE) algorithm [24]. Figure 11(a) shows the timing jitter variance as a function of fiber launch power for 28 GBaud NRZ-PM-QPSK. At low launch power, the four TPE algorithms perform similarly and the jitter variance decreases with increasing launch power. As the launch power goes up further, fiber nonlinear effects cause excess jitter variance and the curves of the four TPE algorithms tend to overlap. Figure 11(b) shows the error vector magnitude (EVM) with respect to fiber launch power. A trend similar to that of the jitter variance is observed, and the curves almost overlap for all launch powers. Moreover, the optimal launch power is 0 dBm for both jitter variance and EVM.
4. Experiment and discussion
The simulation results were substantiated by experiments. The experimental setup is depicted in Fig. 12. The 10 Gbaud QPSK signal was generated by an I/Q modulator, driven by electrical signals synthesized via a pulse pattern generator (PPG) with a $2^{15}-1$ PRBS. At the receiver, ASE noise was added to the signal to set the OSNR. The combined signal was then coherently detected and converted to a baseband electrical signal. A digital storage oscilloscope (DSO) was used to capture the signal at 50 GSa/s, followed by offline DSP. The digital signal was first up-sampled to 10SPS to offer 10 effective sampling phases. Then, the data sequence was down-sampled to 2SPS and processed successively by timing phase recovery (TPR), adaptive equalization (AEQ) with the constant modulus algorithm (CMA), carrier phase recovery (CR) with the Viterbi-Viterbi phase estimation algorithm, data recovery (DR) and bit error counting.
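The selection of one of the 10 sampling phases can be sketched as a simple decimation of the 10SPS sequence. The helper below is a hypothetical illustration, assuming that sampling phase index p (0 to 9) corresponds to a shift of p/10 UI and that down-sampling is done by keeping every fifth sample; the offline DSP used in the experiment may differ.

```python
def decimate_to_2sps(x_10sps, phase):
    """Keep every fifth sample of a 10SPS sequence, starting at index
    `phase` (0..9), which yields a 2SPS sequence at that sampling phase."""
    if not 0 <= phase < 10:
        raise ValueError("phase must be in 0..9")
    return x_10sps[phase::5]


# Indices of the 10SPS samples that are kept for sampling phase 3
print(decimate_to_2sps(list(range(20)), 3))
```

Each of the ten resulting 2SPS sequences can then be passed through the same TPR and equalization chain to obtain the performance as a function of sampling phase.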
First, the 10SPS signal is used to produce S-curves and to calculate the jitter variance of the TPEs. The S-curves at 8.3 dB OSNR are shown in Fig. 13(a), and the measured jitter variance is shown in Fig. 13(b). Similar to the simulation results, Gardner’s and Lee’s TPEs exhibit about 1 dB lower jitter variance than Godard’s and the SLN at the same OSNR. The BER curves are plotted in Fig. 14(a), showing that an OSNR gain of ~0.8 dB at BER = 1E-3 can be obtained through 2SPS oversampling diversity. The OSNR penalty at BER = 3.8E-3 with respect to the sampling phase is shown in Fig. 14(b), which agrees well with the simulation.
5. Conclusion
In this paper, the performance of four timing phase estimator algorithms is investigated analytically. By expressing them in the frequency domain, we prove that the four TPEs are all equivalent up to a slight yet explicable difference. Thus, the four algorithms are interchangeable, subject to their applicable structures and computational complexity. The impact of the pre-filter bandwidth on the timing phase estimation accuracy is presented. The dependence of performance on the sampling phase is observed and can be explained by the effect of “sampling diversity”. The analysis has been verified by simulation and experimental results.
Acknowledgments
The work is partially supported by the Program of Zhejiang Leading Team (2010R50007) of Science and Technology Innovation (the Science and Technology Department of Zhejiang Province). The authors would also like to acknowledge the generous support of Hong Kong Polytechnic University under project G-YN28.
References and links
2. M. Kuschnerov, F. Hauske, K. Piyawanno, B. Spinnler, M. Alfiad, A. Napoli, and B. Lankl, “DSP for coherent single-carrier receivers,” J. Lightwave Technol. 27(16), 3614–3622 (2009).
4. H. Sun and K. Wu, “Clock recovery and jitter sources in coherent transmission systems,” in Proceedings of OFC/NFOEC, Los Angeles, CA (2012), paper OTh4C.1.
5. T. Tanimura, T. Hoshida, S. Oda, H. Nakashima, M. Yuki, Z. Tao, L. Liu, and J. C. Rasmussen, “Digital clock recovery algorithm for optical coherent receivers operating independent of laser frequency offset,” in Proceedings of ECOC, Brussels, Belgium (2008), paper Mo.3.D.2.
6. N. Stojanovic, F. N. Hauske, C. Xie, and M. Chen, “Clock recovery in coherent optical receivers,” in Proceedings of Photonic Networks, 12. ITG Symposium, Leipzig, Germany (2011), paper 9.
7. M. Oerder and H. Meyr, “Digital filter and square timing recovery,” IEEE Trans. Commun. 36(5), 605–612 (1988).
8. F. Gardner, “A BPSK/QPSK timing-error detector for sampled receivers,” IEEE Trans. Commun. 34(5), 423–429 (1986).
9. D. Godard, “Passband timing recovery in an all-digital modem receiver,” IEEE Trans. Commun. 26(5), 517–523 (1978).
10. S. Lee, “A new non-data-aided feedforward symbol timing estimator using two samples per symbol,” IEEE Commun. Lett. 6(5), 205–207 (2002).
11. K. Shi, Y. Wang, and E. Serpedin, “On the design of a digital blind feedforward, nearly jitter-free timing-recovery scheme for linear modulations,” IEEE Trans. Commun. 52(9), 1464–1469 (2004).
12. Y. Wang, E. Serpedin, and P. Ciblat, “An alternative blind feedforward symbol timing estimator using two samples per symbol,” IEEE Trans. Commun. 51(9), 1451–1455 (2003).
13. E. Panayirci and E. K. Bar-Ness, “A new approach for evaluating the performance of a symbol timing recovery system employing a general type of nonlinearity,” IEEE Trans. Commun. 44(1), 29–33 (1996).
14. T. T. Fang, “Analysis of self-noise in a fourth-power clock regenerator,” IEEE Trans. Commun. 39(1), 133–140 (1991).
15. M. Morelli, A. N. D’Andrea, and U. Mengali, “Feedforward ML-based timing estimation with PSK signals,” IEEE Commun. Lett. 1(3), 80–82 (1997).
16. H. Sun and K. Wu, “A novel dispersion and PMD tolerant clock phase detector for coherent transmission systems,” in Proceedings of OFC/NFOEC, Los Angeles, CA (2011), paper SPWC5.
17. A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, 2nd ed. (Prentice Hall, 1999).
18. M. Yan, Z. Tao, L. Dou, L. Li, Y. Zhao, T. Hoshida, and J. Rasmussen, “Digital clock recovery algorithm for Nyquist signal,” in Proceedings of OFC/NFOEC, Anaheim, CA (2013), paper OTu2I.7.
19. N. Stojanovic, N. G. Gonzalez, C. Xie, Y. Zhao, B. Mao, J. Qi, and L. M. Binh, “Timing recovery in Nyquist coherent optical systems,” in Proceedings of the 20th Telecommunications Forum (TELFOR), Belgrade, Serbia (2012), pp. 895–898.
20. J. G. Proakis and M. Salehi, Digital Communications, 5th ed. (McGraw Hill, 2007).
21. A. N. D’Andrea, U. Mengali, and R. Reggiannini, “The modified Cramer-Rao bound and its application to synchronization problems,” IEEE Trans. Commun. 42(2–4), 1391–1399 (1994).
22. L. Franks and J. Bubrouski, “Statistical properties of timing jitter in a PAM timing recovery scheme,” IEEE Trans. Commun. 22(7), 913–920 (1974).
23. A. N. D’Andrea and M. Luise, “Design and analysis of a jitter-free clock recovery scheme for QAM systems,” IEEE Trans. Commun. 41(9), 1296–1299 (1993).
24. R. Kudo, T. Kobayashi, K. Ishihara, Y. Takatori, A. Sano, and Y. Miyamoto, “Coherent optical single carrier transmission using overlap frequency domain equalization for long-haul optical systems,” J. Lightwave Technol. 27(16), 3721–3728 (2009).