The MSDD carrier phase estimation technique is derived here for optically coherent QPSK transmission, introducing the principle of operation while providing intuitive insight in terms of a multi-symbol extension of naïve delay-detection. We derive here for the first time Wiener-optimized and LMS-adapted versions of MSDD, introduce simplified hardware realizations, and evaluate complexity and numerical performance tradeoffs of this highly robust and low-complexity carrier phase recovery method. A multiplier-free carrier phase recovery version of the MSDD provides nearly optimal performance for linewidths up to ~0.5 MHz, whereas for wider linewidths, the Wiener or LMS versions provide optimal performance at about 9 taps, using 1 or 2 complex multipliers per tap.
© 2012 OSA
Carrier Recovery (CR) is a critical component in modern DSP-oriented coherent receivers (Rx) for 100-400G transmission and beyond. Multiple carrier phase estimation (CPE) methods have been considered to date for QPSK transmission, among them [1–10]. One of the most popular CPE techniques for QPSK coherent detection is the Viterbi&Viterbi algorithm, which is conceptually elegant, yet suffers from phase wrap-around effects, cycle slips and noise enhancement due to the non-linear M-th power and scaled argument extraction operations.
Our objective is a novel CR technique for QPSK optically coherent links, based on Multi-Symbol-Delay Detection (MSDD). In the wireless literature the technique is called Multi-Symbol-Differential Detection (also abbreviated MSDD); it is alternatively referred to as Multi-Symbol-Phase Estimation (MSPE), a term which has also been used in photonic applications.
Historically, MSDD was introduced in the electrical communications context more than two decades ago [11–14]. More recently, the MSDD method was applied to carrier phase estimation for coherent receivers under the name Maximum Likelihood Phase Estimation, by a group from the National Univ. of Singapore [15–17]. While those works applied the MSDD technique to coherent optical detection, prior applications of MSDD had already been introduced in optical communication by multiple groups since 2005, in the related context of self-coherent detection (coherent-grade incoherent detection without a local oscillator) [18–28], a topic on which a review chapter recently appeared. Our interest here is MSDD for coherent rather than incoherent or self-coherent detection, but it should be mentioned that the mathematics of self-coherent and coherent MSDD are formally similar. The applicability of MSDD to coherent detection was recently previewed in our brief expositions, which explored applications to both QPSK and QAM coherent receivers. N. Kikuchi et al. also recently ported their self-coherent or incoherent detection MSDD approach (called “delay detection” in their terminology) to the realization of a CR sub-system for coherent detection.
It turns out that, beyond QPSK, our MSDD methodology is also applicable to QAM coherent detection, as well as to carrier frequency offset (CFO) estimation in addition to phase estimation. Nevertheless, for ease of exposition of the initial concept, in this work we focus exclusively on thoroughly deriving and explaining the MSDD carrier phase estimation (CPE) principle for QPSK coherent detection, relegating to a future publication the additional MSDD extensions to QAM and to CFO tracking and correction. The MSDD CPE method is theoretically derived and simulated here in the QPSK transmission context; however, we emphasize that our method is actually “QAM-ready”: the block diagrams developed here will function for QAM as well, although the QAM extensions are outside the scope of this QPSK-oriented paper.
We aim to establish MSDD as a preferred alternative for accurate yet simple QPSK carrier phase estimation and correction. Unlike prior methods, our MSDD method is optimal in the Minimum-Mean-Square-Error (MMSE) sense, in the presence of channel statistics consisting of a combination of ASE-induced phase noise (PN) and laser phase noise (LPN), i.e., the MSDD CR will exhibit the best possible OSNR performance and tolerance to laser linewidth (LW). The adaptive LMS version, as derived here in detail for the first time, requires no prior knowledge of channel statistics: it learns the channel whatever the relative strengths of ASE and LPN (OSNR vs. LW) are, automatically adjusting the taps for optimal performance.
Notice that we inevitably require multiple (L) taps in order to suppress the phase noise by an effective averaging effect. The computational complexity of our optimized algorithm is about one complex multiplier (CM) per tap for the Wiener-optimal version with fixed coefficients and about 2 CMs per tap for the LMS-adaptive version. However, at the expense of a slight (or in some cases negligible) reduction in performance, if we give up optimized coefficients and instead make all tap coefficients equal to unity, we obtain an MSDD variant of ultimate simplicity: the CPE becomes multiplier-free. This version has negligible performance penalty relative to a fully optimized MSDD in the prevalent scenario in which coherent-grade lasers with 100 kHz linewidth are used in the transmitter and as the LO, and even up to 0.5 MHz linewidth for a parallelization factor of 16. In addition to performance and complexity metrics, we should also mention that the MSDD CR method is robust, providing uninterrupted operation, as MSDD processing is essentially linear time-varying rather than non-linear; thus cycle slips and other non-linear phase-wrapping artifacts of the competing leading M-power (Viterbi&Viterbi) method for QPSK CR are completely eliminated.
The paper is structured as follows: Section 2 reviews generic CR concepts and discusses the naïve Delay Detector (DD), which is extended in section 3 to the more advanced MSDD concept, explaining the MSDD principle of operation. Section 4 develops a Wiener filtering solution, optimizing the MSDD coefficients for a channel affected by both ASE-induced and laser source phase-noises. In Section 5 we derive an LMS adaptive algorithm for the MSDD coefficients. Section 6 introduces efficient implementations and evaluates computational complexity of the MSDD. Two hardware structures are derived: a very low complexity multiplier-free CPE which is non-adaptive and non-optimized (but displays nearly optimal performance for low linewidths) and a more complex optimally performing Wiener or LMS-adaptive version. Section 7 develops the polyphase hardware parallelization of the MSDD. Section 8 presents numeric simulation performance results and Section 9 concludes the paper.
Appendix A reviews some differential precoding mathematical properties, Appendix B details the derivation of the Wiener optimal solution and Appendix C collects the relevant abbreviations used in this paper.
2. Carrier recovery (CR) concepts – naïve delay detector (DD)
2.1 Differential precoding
Differential precoding is used in Direct Detection Differential Phase Shift Keying (DPSK) systems, yet here we are interested in CR for coherent rather than direct detection. Our motivation for reviewing and expanding the differential precoding concept is that MSDD carrier recovery may be viewed as a generalization of DPSK, retaining some of the DPSK advantages while overcoming the sensitivity disadvantage of DPSK. A coherent QPSK transmitter (Tx) intended to operate with an MSDD-based receiver (Rx) should include a Differential Precoder (DP) (Fig. 1). Each information symbol, $s_k$, as selected out of the QPSK complex alphabet, is mapped into a line symbol from the same alphabet, $A_k$, according to the following recursion, which defines the DP mapping:
$$A_k = s_k A_{k-1} \qquad (1)$$
The DP recursion [Eq. (1)] amounts to an additive accumulator in the phase domain: The QPSK information phase sets the difference between two successive phases of the line symbols, i.e. information is encoded in the phase differences transmitted on the line.
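The accumulator behaviour described above can be illustrated with a minimal sketch. It assumes the QPSK alphabet is the unit-modulus set {1, j, -1, -j} (closed under multiplication, so that line symbols stay in the alphabet) and uses the hypothetical names `s` for information symbols and `A` for line symbols; the function name and the initial state are illustrative choices, not taken from the paper.

```python
# Assumption: QPSK alphabet taken as the unit-modulus set {1, j, -1, -j},
# which is closed under multiplication (line symbols stay in the alphabet).
QPSK = (1 + 0j, 1j, -1 + 0j, -1j)


def differential_precode(info_symbols, initial=1 + 0j):
    """Return line symbols A_k = s_k * A_{k-1}, starting from A_{-1} = initial."""
    line_symbols = []
    previous = initial
    for s in info_symbols:
        previous = s * previous  # product of unit phasors: their phases add
        line_symbols.append(previous)
    return line_symbols


info = [1j, 1j, -1 + 0j, 1 + 0j, -1j]
line = differential_precode(info)

# The information is carried by the ratio of successive line symbols.
recovered = [line[k] * line[k - 1].conjugate() for k in range(1, len(line))]
```

In this toy case `recovered` reproduces the information symbols from the second one onward, which is the sense in which the information rides on phase differences.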
A more mathematically abstract formulation of the DP, amenable to generalizing the current QPSK MSDD to a higher-order QAM constellation, is obtained in terms of the following unimodular normalization operation, referred to as “Uop”,
A modulus-preserving differential precoder (MP-DP) applicable to both QPSK and QAM was proposed by N. Kikuchi. In the polar domain this MP-DP is described as accumulating the phase while preserving the modulus. In the Uop-based complex notation, the MP-DP is compactly represented as:
In this paper our exclusive focus is on QPSK coherent transmission. It is readily verified that the MP-DP transformation, Eq. (3), generally applicable to QAM, reduces to Eq. (1) in the special case of a QPSK constellation, wherein all symbols have unit modulus. Henceforth, for brevity, we use the term DP in the sense of MP-DP (such as in Fig. 1). Resorting to the complex description for the DP will facilitate the derivation of the mean-square optimal MSDD coefficients in section 4.
2.2 Link model including the CR
The QPSK Tx (Fig. 2) comprises a DP, generating the line symbols, differentially precoded as per Eq. (3). In the coherent Rx, our interest is in the carrier recovery module, the role of which is to “clean up” phase noise and frequency offsets. This paper is devoted to the specific MSDD embodiment of the CR. The input complex samples are assumed to have been polarization-demultiplexed and are essentially free of Inter-Symbol-Interference (ISI), but evidently still carry phase noise, which is to be mitigated by the CR. Eq. (4) reduces to the simple memoryless channel model of Eq. (5), whose noise term is also circular Gaussian, with scaled-down variance. The total phase noise effect is compactly encapsulated in the PN multiplicative noise sequence, comprising LPN and ASE additive contributions.
Figure 3 reviews the top-level internal structure of a generic CR module, essentially comprising a Carrier Phase Estimation (CPE) module followed by a demodulator.
2.3 Naïve delay detector
A simple CPE strategy is to use delay-detection (DD). This simple CR is variously referred to as delay detector, differential detector or delay demodulator (all abbreviated as DD). The received signal is conjugate-multiplied with a unit-time delayed version of itself, yielding the CR output, where Eq. (1) was used in the last equality. The rounded hat on the CR output indicates that this is an “analog” estimate of the transmission symbol, attempting to approximate at least its phase as faithfully as possible. This noisy estimate is then sliced (its phase is quantized) in order to extract the decision (a pointed hat denotes the decision, while a rounded hat denotes the CR output, i.e., the noisy estimate to be input into the slicer). As is well known, the naïve DD is too noisy (it approximately doubles the input ASE noise power) and thus fails to provide a useful CR for coherent detection. Nevertheless, the delay-detection concept is the starting point leading to the high-performance MSDD CR realization, interpreted as a generalization of the naïve DD.
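As a rough illustration of the naïve DD, the sketch below conjugate-multiplies each received sample with the previous one and slices the phase of the result. The alphabet {1, j, -1, -j}, the names `r`, `s`, `A`, and the noiseless channel with a constant unknown carrier phase are all assumptions made for the example.

```python
import cmath

QPSK = (1 + 0j, 1j, -1 + 0j, -1j)  # assumed alphabet, as unit phasors


def slice_qpsk(z):
    """Quantize the phase of z to the nearest alphabet point."""
    return max(QPSK, key=lambda a: (z * a.conjugate()).real)


def delay_detect(received):
    """Naive DD: soft estimates r_k * conj(r_{k-1}) for k = 1 .. N-1."""
    return [received[k] * received[k - 1].conjugate()
            for k in range(1, len(received))]


# Differentially precoded line symbols A_k = s_k * A_{k-1} (A_{-1} = 1).
info = [1j, -1 + 0j, -1j, 1j, 1 + 0j, -1 + 0j]
line, previous = [], 1 + 0j
for s in info:
    previous = s * previous
    line.append(previous)

# Unknown but constant carrier phase; no additive noise in this toy case.
carrier = cmath.exp(1j * 0.9)
received = [a * carrier for a in line]

soft = delay_detect(received)
decisions = [slice_qpsk(z) for z in soft]
```

The common carrier phase cancels in each product, so the decisions match the information symbols; with additive noise, both factors of the product are noisy, which is the source of the noise doubling noted above.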
3. From the naïve DD to MSDD carrier recovery
3.1 MSDD principle: Generation of an improved reference from prior received samples
In a naïve DD, the last sample, $r_{k-1}$, is just too noisy a phase reference. Let us then also process the earlier samples, i.e., generate our CPE by acting on a moving window of L past samples, $\{r_{k-1}, r_{k-2}, \ldots, r_{k-L}\}$, in order to form an improved reference, and demodulate the received samples with it, forming an improved decision variable to be presented to the slicer (Fig. 4(a)). Hopefully, the improved reference will be quieter than the original reference, $r_{k-1}$. The problem we face in processing the earlier samples in order to form the improved reference is that the prior samples are not aligned with $r_{k-1}$ due to the data modulation; thus, if we attempt to use the terms $r_{k-2}, r_{k-3}, \ldots$ instead of $r_{k-1}$ in the delay demodulation, the demodulation will not function properly unless we first rotate each of these terms in order to align them with $r_{k-1}$. In Fig. 4(b) we show how this alignment process is applied one prior sample at a time. We already know we can accomplish proper (albeit noisy) delay detection with $r_{k-1}$ as phase reference. Next, let us consider $r_{k-2}$ as potential phase reference. The key is to revisit the DP recursion Eq. (1) for the transmitted line symbols. A similar recursion approximately holds between the received symbols, $r_k \approx s_k r_{k-1}$, as according to Eq. (5) the $r_k$-s are just noisy versions of the line symbols. Shifting the approximate recursion back one time-unit (substituting $k \to k-1$) yields $r_{k-1} \approx s_{k-1} r_{k-2}$, indicating that if $r_{k-2}$ is rotated by the complex information symbol $s_{k-1}$, we then obtain a rotated symbol, $s_{k-1} r_{k-2}$, which is roughly aligned with the DD reference, $r_{k-1}$. Similarly (taking yet another time-unit back), $r_{k-2} \approx s_{k-2} r_{k-3}$; thus compounding the last two equations yields the relation $r_{k-1} \approx s_{k-1} s_{k-2} r_{k-3}$, which indicates that $s_{k-1} s_{k-2} r_{k-3}$ may alternatively be used as phase reference, as it is also nearly aligned with $r_{k-1}$. More generally,
At this point let us clarify the usage of decision feedback (DF). The partial reference Eq. (9) presumes that the transmission symbols are known at the Rx, which is evidently not the case (as then we would just use them directly as decisions). In the absence of a “genie” whispering to us what the transmitted symbols are, the next best approximation is to use the slicer decisions as estimates of the true symbols. Thus, in an actual implementation, we replace the partial reference Eq. (9) by a decision-feedback derived one (just placing hats over the s-es).
Which of the alternative phase references should be used for demodulation? It turns out that no particular one is preferred; however, the question arises whether we can take advantage of them all, combining these partial references into an improved reference generating a higher quality decision. When ASE-induced PN is a significant component of the overall PN (which usually holds when coherent-grade lasers are used), the white ASE noise is dominant and the partial references are essentially mutually independent. In this case it is advantageous to form a linear combination of these partial references (in the simplest case, their sum), generating the improved reference of Eq. (11). The resulting structure, shown in Fig. 5(a), is the MSDD.
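The following is a minimal sketch of this decision-feedback combining, not the paper's exact structure. It assumes the alphabet {1, j, -1, -j}, uniform coefficients by default, a shortened window during start-up, and no magnitude normalization; the names `msdd_detect`, `num_taps` and `coeffs` are hypothetical.

```python
import cmath
import random

QPSK = (1 + 0j, 1j, -1 + 0j, -1j)  # assumed alphabet


def slice_qpsk(z):
    """Quantize the phase of z to the nearest alphabet point."""
    return max(QPSK, key=lambda a: (z * a.conjugate()).real)


def msdd_detect(received, num_taps, coeffs=None):
    """Decision-feedback MSDD sketch; returns decisions for k = 1 .. N-1."""
    coeffs = coeffs or [1.0] * num_taps      # uniform taps by default
    decisions = [None]                       # no decision for k = 0
    for k in range(1, len(received)):
        window = min(num_taps, k)            # shorter window during start-up
        rotation = 1 + 0j                    # running product of fed-back decisions
        reference = 0j
        for i in range(1, window + 1):
            # i-th partial reference: r_{k-i} rotated into alignment with r_{k-1}
            reference += coeffs[i - 1] * received[k - i] * rotation
            if i < window:
                rotation *= decisions[k - i]
        decisions.append(slice_qpsk(received[k] * reference.conjugate()))
    return decisions[1:]


# Toy link: DP, constant carrier phase, weak additive Gaussian noise.
rng = random.Random(1)
info = [rng.choice(QPSK) for _ in range(200)]
line, previous = [], 1 + 0j
for s in info:
    previous = s * previous
    line.append(previous)
received = [a * cmath.exp(0.4j) + complex(rng.gauss(0, 0.05), rng.gauss(0, 0.05))
            for a in line]

decisions = msdd_detect(received, num_taps=4)
```

Setting `num_taps=1` reduces the sketch to the naïve DD, which makes the "generalization of delay detection" view concrete.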
The improved reference Eq. (11) is seen to be formed as a linear combination of L partial references, namely prior samples, phase rotated into alignment. A phasor diagram presenting the rotation (alignment) process of the various past samples is shown in Fig. 5(b). In this figure, for simplicity, all linear combination coefficients are taken equal, $c_i = 1$, such that the linear combination forming the improved reference reduces to a sum of prior rotated samples. If there were no noise, the rotated prior samples would be perfectly aligned with the last sample. If the PN is entirely white (ASE-induced, with no LPN, so that Eq. (5) reduces to the noiseless symbol plus additive noise), then these noisy phasors are no longer perfectly aligned, yet are nearly collinear with the last sample (which is itself perturbed by additive noise). Nevertheless, the additive white noise perturbations, added to the noiseless symbols to form the prior samples, are mutually uncorrelated and thus add up incoherently; in amplitude, the noises add up on an RMS basis, such that the total RMS noise grows as $\sqrt{L}$ as the window size of past samples, L, is increased. In contrast, the noiseless components of the received samples are all collinear and have equal lengths, thus the total signal component of the improved reference grows linearly in $L$; therefore the SNR of the improved reference grows as $\sqrt{L}$ in amplitude, i.e., as $L$ in power.
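The averaging argument can be written out as a short worked derivation. The symbols below are assumptions introduced only for this illustration: $A$ is the common noiseless phasor shared by all aligned partial references, and $n_i$ are uncorrelated zero-mean noise terms of variance $\sigma^2$ each (rotation by unit-modulus symbols leaves their variance unchanged).

```latex
\begin{align*}
R &= \sum_{i=1}^{L} \left( A + n_i \right) = L\,A + \sum_{i=1}^{L} n_i ,\\
\mathrm{E}\left| \sum_{i=1}^{L} n_i \right|^2 &= \sum_{i=1}^{L} \mathrm{E}|n_i|^2 = L\,\sigma^2
  \quad\Rightarrow\quad \text{RMS noise} = \sqrt{L}\,\sigma ,\\
\mathrm{SNR}_R &= \frac{|L\,A|^2}{L\,\sigma^2} = L\,\frac{|A|^2}{\sigma^2} .
\end{align*}
```

Under these assumptions the power SNR of the reference improves by a factor of $L$ relative to a single sample, i.e., by $\sqrt{L}$ in amplitude terms.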
It appears advantageous to accrue the noise averaging effect over arbitrarily long windows (though in practice we would get diminishing returns beyond a certain window size, and the computational complexity must also be taken into account). However, when LPN is present, an opposite effect is at work, namely the longer the record of past samples used in forming the improved reference, the worse the LPN-induced degradation. Thus, a “block length” effect emerges: it does not pay to increase the block length L indefinitely, but there is an optimal block length, as determined by the balance of the ASE and laser phase noises. In this simplified analysis we assumed equal coefficients (taken as unity without loss of generality), but more generally the linear combination coefficients may be arbitrarily selected, in the combined presence of ASE and LPN phase noise sources. In section 4 we apply Wiener filtering theory in order to determine unequal optimal coefficients which yield the best performance for any given block length L, striking the best balance between the opposing effects of ASE and LPN.
3.2 MSDD alternative formulation in terms of partial DD estimators
To derive an alternative point of view of the MSDD demodulation process, let us substitute the improved reference Eq. (11) into the demodulation relation, yielding Eq. (12), where we used Eq. (9) for the i-th partial reference and introduced the i-th partial estimator. It is seen from Eq. (12) that the MSDD estimate of the transmitted symbol may be expressed as a linear combination of partial estimators, each obtained by demodulation with a partial reference, and each of which could by itself provide a valid, albeit noisier, estimate for the information symbol, as described in the block diagram of Fig. 5(a). It turns out that this alternative equivalent realization of the MSDD would be less desirable for efficient computation than the original MSDD block diagram of Fig. 5(a), which first generates the improved reference, Eq. (11), as a linear combination of partial references, then demodulates with it. Nevertheless, the alternative partial-estimators formulation Eq. (12) is more amenable to Wiener optimization, as pursued next.
4. Optimal Wiener-filtering based Minimum Mean Square Error (MMSE) solution
In this section we derive the MMSE optimal solution, which aims at minimizing the Mean Square Error (MSE) between the QPSK or QAM symbols and their estimates, as generated at the MSDD output (slicer input). Introducing the estimation error (the difference between a transmitted symbol and its estimate), we seek the optimal MSDD coefficients minimizing the MSE, i.e., the expected squared modulus of the estimation error.
Note that, for the purpose of QPSK detection, we have so far ignored the magnitude (modulus) of the improved estimate, which is generated by mixing the received symbol with the improved reference; the magnitudes were not normalized in the process of generating that reference. As the QPSK slicer essentially acts on the angle of the estimate, ignoring the magnitude does not pose a problem: a different length of the reference phasor just scales the modulus of the estimate without affecting its phase. However, once QPSK transmission is extended to QAM, the reference magnitudes do become important. Even in the current QPSK context, proper processing of the reference magnitudes becomes essential in the MMSE formulation and derivation. Indeed, although the phase of the slicer input generated by the MSDD tends to be close to that of the actual transmission symbol, if the magnitudes of the estimate and of the transmitted symbol are disparate, then a large MSE deviation may still be generated, defeating the minimization process. Thus, in order to properly optimize the MSDD coefficients, it is imperative to properly scale magnitudes, such that the estimate approaches the transmitted symbol not only in phase but also in modulus, yielding a small residual estimation error. Here we use the Uop normalization Eq. (2) as a key step enabling us to devise a modified MSDD structure for QPSK (also applicable to QAM), suitable for attaining the MMSE condition. To this end, we propose to apply the Uop to the partial references, which are now replaced by Uop-normalized versions (these preserve the original angles of the partial references, i.e., are still nearly aligned with the DD reference, hence are also suitable to form an improved reference):
The resulting MSDD improved reference is then formed by the linear combination
Using the U-notU magnitude normalizations proposed here, the modified “U-notU” MSDD is analyzed in Appendix B in terms of the phase-noisy memoryless channel model, making use of Eq. (34) derived in Appendix A. The resulting Eq. (16) indicates that the partial estimators indeed qualify as estimators for the transmitted symbols, as they essentially coincide with them, apart from multiplicative phase noise perturbations.
Considering the U-notU modified MSDD structure, as introduced above, we now address the problem of optimizing the c-coefficients so as to minimize the Mean Square Error (MSE) between the transmitted symbol and its MSDD estimate. First, we compactly express the MSDD estimate in terms of inner products between a coefficients vector and vectors of partial estimates and partial references (using the conjugate transpose, with the overbar as an alternative notation for the complex conjugate). The correlations of Eq. (20) are evaluated for our optical channel Eq. (5), resulting in the expressions Eq. (48), which are substituted into Eq. (19), reducing to the operational form Eq. (21) of the Wiener-Hopf (W-H) linear system of equations in the L unknown coefficients. The system Eq. (21) of the W-H equations for the U-notU optimal coefficients may be solved numerically offline, provided that the statistical/physical parameters (signal power, ASE noise variance and laser linewidth) have been estimated. A more practical approach, pursued next, is to derive an LMS adaptation scheme for the coefficients, such that the coefficients are iteratively adjusted, approximately converging to the optimal MMSE values mandated by the W-H equation, automatically learning the phase-noise channel statistics.
5. LMS algorithm for the MSDD coefficients
In practice, the channel phase-noise statistics (balance of laser phase noise, ASE, and also nonlinear phase noise contributions) are unknown and may even be time-varying. Therefore, it is advantageous to devise an adaptive method to approach the optimal MSDD coefficients automatically. Here we derive an LMS algorithm for the “U-notU” MSDD coefficients.
Conjugate-transposing the orthogonality relation Eq. (37) yields Eqs. (23) and (24), where Eq. (24) was obtained by substituting Eq. (14). In light of Eq. (23), the update vector has zero expectation whenever the coefficients are MMSE optimal. The elements of the update vector provide the coefficient updates for the LMS algorithm associated with the MMSE problem. When its expectation is not zero, the update vector tells us in which direction to adjust the coefficients in order to advance to zero expectation, i.e., to optimal coefficients. To verify that the proper coefficients update vector for the LMS algorithm is indeed given by Eq. (24), we evaluate the squared error (SE) gradient. Following [33], the SE gradient with respect to the coefficient vector is derived, and combining it with Eq. (24) yields our final result, the LMS coefficients recursion, Eq. (28), in which the error, Eq. (29), is evaluated using Eq. (15). The corresponding block diagrams are shown in Figs. 6 and 7 below, applicable to both QPSK and QAM, though our interest in this paper is in QPSK systems. The performance attainable with this LMS algorithm will be ascertained in section 8.
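For orientation, the sketch below shows a generic textbook complex LMS update, not the paper's specific U-notU recursion. It assumes the estimate is formed as a plain weighted sum of the inputs (standing in for partial estimators), that the desired symbol is available (e.g., as a training symbol or a decision), and that the step size and the target weights are arbitrary illustrative values.

```python
import random

QPSK = (1 + 0j, 1j, -1 + 0j, -1j)  # assumed alphabet


def lms_step(coeffs, inputs, desired, step):
    """One complex LMS update; returns (new_coeffs, error)."""
    estimate = sum(c * x for c, x in zip(coeffs, inputs))
    error = desired - estimate
    # Stochastic-gradient step on |error|^2: c_i += step * error * conj(x_i)
    new_coeffs = [c + step * error * x.conjugate()
                  for c, x in zip(coeffs, inputs)]
    return new_coeffs, error


# Toy identification problem: the desired output is a fixed (hypothetical)
# weighted sum of the inputs, so the taps should converge to those weights.
rng = random.Random(7)
true_weights = [0.5 + 0j, 0.3 + 0.1j, 0.2 - 0.1j]
coeffs = [0j, 0j, 0j]
for _ in range(3000):
    inputs = [rng.choice(QPSK) for _ in true_weights]
    desired = sum(w * x for w, x in zip(true_weights, inputs))
    coeffs, error = lms_step(coeffs, inputs, desired, step=0.05)
```

The update direction, error times the conjugated input, has zero mean exactly when the error is uncorrelated with the inputs, which mirrors the zero-expectation property of the update vector discussed above.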
6. Efficient hardware implementations
In this section we derive an efficient hardware implementation for the MSDD sub-system, as illustrated in Fig. 6 below. In this block diagram, the number of complex multiplications is reduced below that implied by Eq. (11), which indicates that the MSDD must calculate, in every clock cycle, the linear combination Eq. (15) of partial references. The direct evaluation of the i-th partial reference, Eq. (9), seems to require i-1 multiplications by s-symbols to be applied to the corresponding prior sample, per clock cycle. The diagram of Fig. 6 presents a more efficient realization, disclosed previously, wherein just a single complex multiplication with an s-symbol is performed per partial reference per clock cycle, rather than i-1 complex multiplications. This complexity reduction is achieved by a skillful arrangement of multipliers interspersed with a delay line at the top of the figure, used to generate the partial references.
In addition, we modify the block diagram of Fig. 6 to incorporate the Uop required in the modified MSDD structure introduced in section 4, enabling either the MMSE optimal solution derived in section 4 or the LMS-based adaptive solution, as per section 5. Uops are also required for extending QPSK to QAM transmission, to be pursued in a future publication; thus our QPSK MSDD structure is “QAM-ready”.
The block diagram further features a coefficients control module tasked with generating the optimal coefficients, whether by an offline MMSE calculation (solution of the W-H equation as derived in section 4) or alternatively (preferably) by means of the adaptive LMS algorithm Eq. (28). In addition, in order to implement the U-notU MSDD modification, a Uop acting on the received samples is inserted ahead of the partial references delay line at the top of the figure.
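One way such a saving can arise is sketched below: the partial references at the next clock cycle are obtained from the current ones by a single extra rotation each, plus a shift. This is an illustrative reading under assumed notation (`received` for samples, `symbols` for the fed-back decisions, alphabet {1, j, -1, -j}), not a transcription of Fig. 6.

```python
import cmath
import random

QPSK = (1 + 0j, 1j, -1 + 0j, -1j)  # assumed alphabet


def partial_refs_direct(received, symbols, k, num_taps):
    """i-th partial reference at time k: r_{k-i} * s_{k-1} * ... * s_{k-i+1}."""
    refs = []
    for i in range(1, num_taps + 1):
        rotated = received[k - i]
        for m in range(1, i):              # i-1 rotations for the i-th tap
            rotated *= symbols[k - m]
        refs.append(rotated)
    return refs


def partial_refs_next(prev_refs, newest_sample, newest_symbol):
    """Advance one clock: each old reference gets ONE extra rotation by s_k."""
    return [newest_sample] + [ref * newest_symbol for ref in prev_refs[:-1]]


rng = random.Random(3)
N, L = 40, 4
symbols = [rng.choice(QPSK) for _ in range(N)]
received = [cmath.rect(1.0, rng.uniform(-cmath.pi, cmath.pi)) for _ in range(N)]

refs = partial_refs_direct(received, symbols, L, L)
max_mismatch = 0.0
for k in range(L, N - 1):
    refs = partial_refs_next(refs, received[k], symbols[k])
    direct = partial_refs_direct(received, symbols, k + 1, L)
    max_mismatch = max(max_mismatch,
                       max(abs(a - b) for a, b in zip(refs, direct)))
```

The recursive update agrees with the direct products while using one rotation per tap per clock cycle; since the rotations are by QPSK points, they amount to sign changes and real/imaginary swaps.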
6.1 MSDD hardware realization complexity (excluding the adaptive coefficients control)
Inspecting Fig. 6, let us initially ignore the complexity of the coefficients control module. We then count, in the core MSDD system, L complex multipliers (CM) of the partial estimates with the coefficients, as well as L multipliers performing rotations by the QPSK decision symbols. As multiplications by the QPSK constellation points are trivial, those do not contribute complexity. In addition, we have an extra full-fledged CM for the demodulation. We should also account for the Uop complexity, which was estimated in Appendix A to consist of four real-multipliers, expressed in equivalent CM terms. Thus, overall, the complex multiplications to be performed per clock-cycle comprise the L coefficient multiplications, the demodulation CM and the Uop equivalent. We further note that the complexity of the L multiplications with the coefficients may be reduced by quantizing the coefficients to various degrees, setting a tradeoff between complexity and precision (i.e., CR performance).
A simplified system is obtained for uniform (all-equal) coefficients, replacing the L coefficient multiplications by a single scaling multiplication performed prior to the demodulation, as indicated in Fig. 7. In this reduced-complexity MSDD version, as the coefficients are no longer optimized, the Uop may be discarded. Thus, this highly efficient non-optimal QPSK MSDD version, with uniform coefficients, requires just a single heavy complex multiplication (the one used for the demodulation). Evidently, as non-optimal coefficients are used, there is some performance degradation. Nevertheless, if the laser linewidth does not exceed 0.5 MHz, the resulting performance penalty will be quite small or even negligible (see section 8). This reduced-complexity MSDD implementation is preferred for 100G QPSK low-cost applications.
6.2 MSDD with adaptive coefficients control and its total complexity
At the high-end extreme, consider a high performance system with its coefficients LMS-optimized, as described in Fig. 8, which details the inner workings of an adaptive realization of the “coefficients control” module of Fig. 6. This diagram precisely implements the functionality of the adaptive algorithm described in section 5.
Accounting now for the contribution of the coefficients adaptation to complexity, we must consider additional CMs: another full-fledged CM; an easy CM generating a fixed scaling (the scaling factor may be quantized to a convenient value with few one-bits, which is simple to multiply by, so it will not be counted); then another full-fledged CM required for generating the error, Eq. (29); plus L full-fledged CMs generating the coefficient updates. Thus L + 2 extra full-fledged multiplications are required for the adaptive part, which, when added to the non-adaptive multiplications, gives the total number of full-fledged complex multiplications for the high-end adaptive CR realization of Fig. 8 for our high-end L-taps MSDD system. As FIR taps are inevitable for optimized averaging of the noise, it is apparent that we must invest slightly more than 2 CMs per tap (one for the actual tap multiplication, and a second one for the adaptation) in order to attain the noise reduction with optimized coefficients. The alternative is to settle for the non-optimized version of Fig. 7, in which we just have a single CM overall, but performance is slightly degraded.
7. Polyphase parallelization
Due to its usage of decision feedback, the MSDD algorithm poses an implementation challenge for coherent optical receivers operating at tens of GBd rates, given that the fastest multipliers currently available with state-of-the-art ASIC technology operate at the rate of 2 to 3 GHz. As shown previously, decision feedback (DF) based algorithms are not directly amenable to parallelization. Indeed, DF creates a dependency between modules, precluding independent parallel operation of identical processing sub-modules. Thus, a polyphase decomposition, i.e., time-parallelization of the processing using identical processing units operating on the polyphase components, would not be equivalent to (in fact would have reduced performance relative to) our nominal MSDD, hypothetically operating at the full high rate. Nevertheless, realization-wise we adopt such a parallelization strategy, as shown in Fig. 9(b), having P MSDD sub-modules, each operating on one of P polyphases, enabling digital hardware processing at a P times lower clock rate. Thus, by taking the number P of parallel blocks sufficiently large, the processing clock rate per sub-module is sufficiently reduced to accommodate the available ASIC processing speeds. For this parallelization method, as each of the P parallel units operates at a rate a factor of P slower, the total (low) hardware complexity of the MSDD is essentially retained: the number of CMs per second is still the same. However, this CR system will have somewhat worse performance than the nominal full-speed MSDD.
In order to enable MSDD polyphase operation at the Rx, the Tx is modified to also support a polyphase version of differential precoding, comprising P parallel MP-DP modules, each operating at a rate reduced by a factor of P, as shown in Fig. 9(a). The outputs of all P units are interleaved in order to create a continuous stream at the high rate. Each DP module essentially accumulates the prior phases in jumps of P samples (operating on a particular polyphase).
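A minimal sketch of such an interleaved precoder is given below, under the assumptions that the QPSK alphabet is {1, j, -1, -j}, that each polyphase accumulator starts from 1, and that P = 4 is used purely for illustration; the function name is hypothetical.

```python
QPSK = (1 + 0j, 1j, -1 + 0j, -1j)  # assumed alphabet


def polyphase_precode(info_symbols, num_phases):
    """Line symbols A_k = s_k * A_{k-P}: P interleaved differential precoders."""
    state = [1 + 0j] * num_phases          # one accumulator per polyphase
    line_symbols = []
    for k, s in enumerate(info_symbols):
        p = k % num_phases                 # polyphase index of sample k
        state[p] = s * state[p]
        line_symbols.append(state[p])
    return line_symbols


P = 4
info = [1j, -1 + 0j, 1j, -1j, 1 + 0j, -1j,
        -1 + 0j, 1j, 1j, -1 + 0j, -1j, 1 + 0j]
line = polyphase_precode(info, P)

# Information is now carried by ratios of line symbols P samples apart.
recovered = [line[k] * line[k - P].conjugate() for k in range(P, len(line))]
```

Each receiver sub-module would then see an ordinary differentially precoded stream on its own polyphase, with consecutive samples separated by P symbol intervals.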
7.1 The distant feedback (DF) problem in parallelized MSDD processing
When using the polyphase implementation just introduced, the inputs to each MSDD sub-module are in jumps of P. The larger separation between MSDD input samples does not affect the white ASE noise performance, as there is no correlation between distinct ASE samples of white noise no matter how far apart. However, LPN performance is degraded under the polyphase implementation, as samples further away from each other are less correlated, and their relative phase noise is increased. Since the laser phase noise is a Wiener process with independent increments, with variance proportional to the time interval T between samples (processing latency), i.e., inversely proportional to the sampling rate, it follows that a reduction in sampling rate by a factor of P, due to parallelization, increases the variance of the laser phase noise by a factor of P. This amounts to having an effective laser linewidth P times wider. We refer to this laser phase noise tolerance penalty as the distant feedback effect, exacting a penalty due to the multiple parallel processing paths, which are inevitable at current CMOS clock speeds. Thus, the LPN tolerance will be degraded by a factor of P due to the parallelization; nevertheless, as the normalized phase noise tolerance of the MSDD method is very high to begin with (unless the laser phase noise is dominant relative to the ASE), the penalty will be seen to be small.
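The scaling can be checked numerically with the standard Wiener phase-noise relation, in which the phase increment over an interval T has variance 2π·Δν·T. In the sketch below, Δν is assumed to be the effective linewidth governing the phase noise seen at the receiver, and the values 100 kHz, 28 GBd and P = 16 are taken from the simulation settings quoted elsewhere in the paper; the function name is hypothetical.

```python
import math


def lpn_increment_variance(linewidth_hz, symbol_rate_hz, num_phases=1):
    """Variance (rad^2) of the Wiener phase increment between consecutive
    samples seen by one sub-module: 2*pi*linewidth*(P / symbol_rate)."""
    spacing_s = num_phases / symbol_rate_hz   # sample spacing within a polyphase
    return 2.0 * math.pi * linewidth_hz * spacing_s


full_rate = lpn_increment_variance(100e3, 28e9)          # nominal full-rate MSDD
parallel = lpn_increment_variance(100e3, 28e9, 16)       # P = 16 polyphases
equivalent = lpn_increment_variance(16 * 100e3, 28e9)    # 16x wider linewidth
```

The parallelized variance equals that of a full-rate system with a linewidth P times wider, which is the "effective linewidth" statement made above.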
8. Simulation results
The simple channel model of subsection 2.2 is assumed here (Fig. 10). This channel model does not address fiber non-linearities, Tx band-limitation, Rx optical and electrical filtering, or the effect of the Rx CD equalizer (equalization enhanced phase noise); nevertheless, this simple model still captures the salient phase noise features, allowing meaningful comparison of the resulting MSDD performance vs. that of the Viterbi&Viterbi M-power CPE of Fig. 11.
In all Monte-Carlo and LMS simulations we assume a 100G PDM-QPSK system at 28 GBd baudrate per polarization, simulating a single polarization. We also assume a parallelization factor of P = 16, i.e. the DP transmission and MSDD detection is parallelized, as per Fig. 9, into 16 processing sub-modules.
Figure 12 simulates the QPSK MSDD system for zero LPN, in the presence of ASE white noise only, for various window sizes. As L is increased, the MSDD performance is seen to approach the so-called coherent HDD limit, namely coherent QPSK with differential precoding and hard (logic) differential decoding of the complex-valued decisions. Notice that the robustness of HDD is much higher than that of soft differential decoding, which corresponds to L = 1 (i.e., the window of past samples includes just the last sample): the HDD output is in error when either of the two hard decisions it combines is in error, which occurs with about double the probability of a single hard decision being in error. A linear factor of 2 on the BER scale corresponds to about 0.8 dB penalty at BER = 10^-3, which is much smaller than the ~3 dB penalty of the differential decoder, as derived in sub-section 2.4. The ~2.2 dB gap between soft and hard differential decoding is bridged by the MSDD: the higher the window size L, the closer the HDD limit is approached. Here, in the absence of LPN, the white-noise performance improves monotonically with L.
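The trend with window size can be reproduced qualitatively with a minimal Monte-Carlo sketch of a uniform-tap, decision-feedback MSDD for differentially precoded QPSK. This is an illustration under simplifying assumptions, not the simulator used for Fig. 12: the function names, the {1, j, -1, -j} alphabet, the 10 dB SNR and the omission of the Uop normalization are all choices made for this sketch.

```python
import cmath
import math
import random

QPSK = (1, 1j, -1, -1j)  # assumed alphabet of differential (information) symbols

def slice_qpsk(z):
    """Return the point of {1, j, -1, -j} closest in angle to z."""
    if abs(z.real) >= abs(z.imag):
        return complex(1 if z.real >= 0 else -1, 0)
    return complex(0, 1 if z.imag >= 0 else -1)

def count_errors(num_symbols, window, snr_db, lw_times_t=0.0, seed=1):
    """Symbol errors of uniform-tap MSDD using `window` past samples (window=1 is plain delay detection)."""
    rng = random.Random(seed)
    sigma = math.sqrt(0.5 / 10 ** (snr_db / 10))   # noise std per quadrature
    pn_std = math.sqrt(2 * math.pi * lw_times_t)   # Wiener phase step per symbol
    tx, phase = 1 + 0j, 0.0
    # Past received samples, each rotated so that it lines up with the latest transmitted symbol.
    aligned = [tx + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))]
    errors = 0
    for _ in range(num_symbols):
        s = rng.choice(QPSK)
        tx *= s                                    # differential precoding
        phase += rng.gauss(0, pn_std)
        r = tx * cmath.exp(1j * phase) + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        reference = sum(aligned)                   # uniform taps: additions only
        s_hat = slice_qpsk(r * reference.conjugate())
        errors += (s_hat != s)
        # Decision feedback: rotate older samples by the new decision, then push the new sample.
        aligned = [r] + [a * s_hat for a in aligned[:window - 1]]
    return errors

for window in (1, 2, 4, 8):
    print(window, count_errors(20000, window, snr_db=10.0))
```

With ASE noise only, the error count should fall as the window grows. Because the taps are all equal, the reference is formed with additions and rotations by QPSK points only, which is the sense in which the uniform-coefficient version is multiplier-free.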
Figure 13 presents simulated aspects of the adaptive LMS performance. The number L of LMS coefficients is selected in the range of 6 to 15 to get most of the benefit, as is apparent from Fig. 13-left. Figure 13-right illustrates that the adaptive coefficients converge almost perfectly onto the optimal Wiener solution (red sticks vs. blue points). It is also apparent that the coefficient amplitudes decay so as to account for the noise decorrelation due to the LPN.
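The convergence of LMS onto the Wiener solution can be illustrated on a generic linear estimation problem. The sketch below is not the MSDD coefficient loop itself: it assumes white, unit-variance complex Gaussian inputs (for which the Wiener solution equals the generating taps), and the function name, step size and tap values are hypothetical.

```python
import math
import random

def lms_identify(true_taps, num_iters=5000, step=0.01, noise_std=0.1, seed=0):
    """Run complex LMS on y = sum(c_i * x_i) and return the adapted coefficients."""
    rng = random.Random(seed)

    def cgauss(std):
        # Circular complex Gaussian sample with total standard deviation `std`.
        s = std / math.sqrt(2)
        return complex(rng.gauss(0, s), rng.gauss(0, s))

    taps = [0j] * len(true_taps)
    for _ in range(num_iters):
        x = [cgauss(1.0) for _ in true_taps]
        target = sum(w * xi for w, xi in zip(true_taps, x)) + cgauss(noise_std)
        error = target - sum(c * xi for c, xi in zip(taps, x))
        # Stochastic-gradient step on |error|^2 with respect to the coefficients.
        taps = [c + step * error * xi.conjugate() for c, xi in zip(taps, x)]
    return taps

wiener = [0.4, 0.3 + 0.1j, 0.2, 0.1 - 0.05j]  # hypothetical target taps
adapted = lms_identify(wiener)
print(max(abs(a - w) for a, w in zip(adapted, wiener)))
```

The printed maximum deviation should be small, mirroring the near coincidence of the adapted and Wiener-optimal coefficients reported for Fig. 13-right.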
The final performance attainable with uniform (all equal to 1/L) coefficients vs. Wiener-optimal and LMS coefficients is shown in Fig. 14, for linewidths of 0.1, 0.5 and 1 MHz. Two important effects are apparent: (i) For all linewidths, the performances with LMS and Wiener-optimal coefficients are almost indistinguishable. (ii) For 0.1 MHz linewidth, the performance with uniform coefficients (corresponding to multiplier-free CPE) is also indistinguishable from that with LMS and with Wiener-optimal coefficients, whereas for 0.5 MHz linewidth, the penalty due to uniform coefficients is just 0.25 dB. However, for 1 MHz linewidth, the uniform coefficients penalty is ~1 dB. This indicates that the ultra-low complexity multiplier-free CPE (with a single full-fledged multiplier required for demodulation but no non-trivial multipliers required for the CPE) is the best choice up to about 0.5 MHz linewidth, as its performance penalty is very small (negligible for coherent-grade 100 kHz lasers), whereas the complexity savings of the multiplier-free version are very large. For linewidths above 0.5 MHz, the coefficient optimization, whether adaptive or Wiener-based, does improve performance, the more so the wider the linewidth, but then one has to invest either 1 CM (for Wiener) or 2 CMs (for LMS) per tap (6 to 9 taps would suffice for optimal performance). In any case the MSDD performance exceeds that of M-power.
In this paper we introduced the MSDD principle, explaining in detail how a moving window of L prior symbols may be linearly processed in order to generate a cleaner demodulation reference than that of other carrier-recovery methods. Of the two MSDD versions presented here, the multiplier-free one provides the least complex CR system, whereas the optimized one provides the best performance, as borne out by numerical simulations indicating up to 1.9 dB advantage over the Viterbi&Viterbi algorithm, alongside an ultra-low complexity multiplier-free CPE realization.
Moreover, the MSDD features linear (time-varying) processing, and hence is free of cycle slips and other phase unwrapping impairments.
The only weakness of MSDD is its reliance on decision feedback, which exacts a “distant-feedback” linewidth penalty upon polyphase parallelization. Nevertheless, the simulated performance indicates that the resulting degradation is negligible up to ~0.5 MHz linewidth; thus, for practical coherent systems, the limited linewidth tolerance may not be an issue. It is the improved resilience in the lower OSNR regime that makes MSDD the preferred scheme.
This work was devoted to coherent QPSK transmission, yet the MSDD CPE method may be extended to higher-order modulation formats. MSDD QAM operation has been previewed previously; however, this extensive key topic will be fully elaborated in a future publication, covering unique aspects of adaptive MSDD for QAM: consolidation of carrier phase and carrier frequency estimation in a single MSDD system, seamless transition between QAM constellation sizes, and automatic adaptive scaling of the received QAM constellation.
Despite the proliferation of CR techniques, e.g. [1–10], we are convinced that the MSDD approach features the best performance-complexity tradeoffs and will evolve to be increasingly adopted as the carrier recovery method of choice.
Appendix A: Uop and modulus preserving differential encoding math properties
Some Uop properties: The Uop distributes over products, i.e. the Uop of a product is the product of Uops. The Uop is also an idempotent operation, i.e. applying it twice gives the same result as applying it once. The last two relations lead to .
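The two stated properties are easy to confirm numerically. The sketch below assumes that the Uop is the map z → z/|z| for non-zero z, in line with its description as a unimodular normalization; the function name `uop` and the test values are hypothetical.

```python
def uop(z):
    """Unimodular normalization (assumed form): keep the phase of z, force its modulus to 1."""
    return z / abs(z)

z1, z2 = 3 - 4j, -0.5 + 2j

# Distributes over products: Uop(z1 * z2) equals Uop(z1) * Uop(z2).
print(abs(uop(z1 * z2) - uop(z1) * uop(z2)))

# Idempotent: Uop(Uop(z)) equals Uop(z).
print(abs(uop(uop(z1)) - uop(z1)))
```

Both printed differences should be at the level of floating-point rounding.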
Next, let us evaluate the computational complexity of generating the Uop, Eq. (32), as follows:
Appendix B: Derivation of the Wiener-Hopf equations for the optimal coefficients
To derive the MMSE solution, minimizing Eq. (18), we invoke the orthogonality principle of linear estimation. The optimal coefficients vector is obtained from the condition that the estimate be the projection of the estimation target onto the “observations” subspace, i.e. that the estimation error be orthogonal to each of the “observations” (which correspond here to the inputs into the linear estimator). The derivation makes use of Eq. (7) and starts with evaluating the conjugate product of the observations. Following [35], the expected phase noise exponent in Eq. (40) is then evaluated, and Eq. (40) yields the autocorrelation of the PN factors, whereupon Eq. (46) reduces to a simpler form. Substituting Eqs. (44) and (47) of the p-sequence into Eqs. (38) and (39), respectively, yields our final results for the second-order statistics required in formulating the Wiener-Hopf equations.
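For orientation, the generic form of these steps can be written out as below. The symbols are placeholders chosen for this sketch (estimation target d_k, observations x_{k-i}, coefficients c_i, linewidth Δν, symbol interval T) and are not necessarily the notation of Eqs. (38)-(47); the last identity further assumes Wiener phase noise with increment variance 2πΔνT per symbol.

```latex
% Orthogonality principle: the estimation error is orthogonal to every observation
\mathbb{E}\Big[\Big(d_k - \sum_{i=1}^{L} c_i\, x_{k-i}\Big)\, x_{k-m}^{*}\Big] = 0,
\qquad m = 1,\dots,L

% Equivalent Wiener-Hopf (normal) equations for the coefficients
\sum_{i=1}^{L} c_i\, \mathbb{E}\big[x_{k-i}\, x_{k-m}^{*}\big]
  = \mathbb{E}\big[d_k\, x_{k-m}^{*}\big],
\qquad m = 1,\dots,L

% Expected phase-noise exponent for a zero-mean Gaussian phase of variance \sigma_\theta^2
\mathbb{E}\big[e^{j\theta}\big] = e^{-\sigma_\theta^{2}/2}

% Resulting autocorrelation of the phase-noise factors under the assumed Wiener model
\mathbb{E}\big[e^{j(\phi_k - \phi_l)}\big] = e^{-\pi\,\Delta\nu\,T\,|k-l|}
```

The exponential decay of the last expression with the lag |k-l| is what makes the optimal coefficients taper off for older samples when the linewidth is appreciable.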
Appendix C: Abbreviations used in this paper
The two leftmost columns list the 18 abbreviations specific to this paper – the third column contains abbreviations in general use.
|CFO = Carrier Frequency Offset||MMSE = Minimal Mean Square Error||ASE = Amplified Spontaneous Emission|
|CM = Complex Multiplier||MP-DP = Modulus Preserving Diff. Precoder||FIR = Finite Impulse Response|
|CPE = Carrier Phase Estimation||MSDD = Multi-Symbol Delay/Differential Detection||QPSK = Quadrature Phase Shift Keying|
|CR = Carrier Recovery||MSE = Mean Square Error||QAM = Quadrature Amplitude Modulation|
|DD = Delay/Differential Detector/Demodulator||MSPE = Multi-Symbol Phase Estimation||OSNR = Optical Signal to Noise Ratio|
|DP = Differential Precoder||PN = Phase Noise||SNR = Signal to Noise Ratio|
|LMS = Least Mean Squares||SE = Squared Error|
|LPN = Laser Phase Noise||Uop = Unimodular Normalization (Eq. (2))|
|LW = Linewidth||W-H = Wiener-Hopf (Equations)|
This work was supported in part by the Israeli Science Foundation (ISF), and by the Chief Scientist Office of the Israeli Ministry of Industry, Trade and Labor within ‘Tera Santa’ consortium.
References and links
1. A. Viterbi and A. Viterbi, “Nonlinear estimation of PSK-modulated carrier phase with application to burst digital transmission,” IEEE Trans. Inf. Theory 29(4), 543–551 (1983).
2. E. Ip and J. M. Kahn, “Carrier synchronization for 3- and 4-bit-per-symbol optical transmission,” J. Lightwave Technol. 23(12), 4110–4124 (2005).
3. R. Noé, “PLL-free synchronous QPSK polarization multiplex / diversity receiver concept with digital I & Q baseband processing,” IEEE Photon. Technol. Lett. 17(4), 887–889 (2005).
4. M. G. Taylor, “Accurate digital phase estimation process for coherent detection using a parallel digital processor,” in ECOC’05 European Conf. of Optical Communication, Tu 4.2.6 (2005).
5. E. Ip and J. M. Kahn, “Feedforward carrier recovery for coherent optical communications,” J. Lightwave Technol. 25(9), 2675–2692 (2007).
6. S. Hoffmann, S. Bhandare, T. Pfau, O. Adamczyk, C. Wordehoff, R. Peveling, M. Porrmann, and R. Noe, “Frequency and phase estimation for coherent QPSK transmission with unlocked DFB lasers,” IEEE Photon. Technol. Lett. 20(18), 1569–1571 (2008).
7. M. G. Taylor, “Detection using digital signal processing,” J. Lightwave Technol. 27(7), 901–914 (2009).
8. M. G. Taylor, “Algorithms for coherent detection what can we learn from other fields?” in OFC/NFOEC’10, Conf. on Optical Fiber Communication, OThL4 (2010).
9. M. G. Taylor, “Phase estimation methods for optical coherent detection using digital signal processing,” J. Lightwave Technol. 27(7), 901–914 (2009).
10. K. Piyawanno, M. Kuschnerov, B. Spinnler, and B. Lankl, “Low complexity carrier recovery for coherent QAM using superscalar parallelization,” in ECOC’10 European Conf. of Optical Communication, We.7.A.3 (2010).
11. D. Divsalar and M. K. Simon, “Multiple-symbol differential detection of MPSK,” IEEE Trans. Commun. 38(3), 300–308 (1990).
12. F. Edbauer, “Bit error rate of binary and quaternary DPSK signals with multiple differential feedback detection,” IEEE Trans. Commun. 40(3), 457–460 (1992).
13. F. Adachi and M. Sawahashi, “Decision feedback multiple-symbol differential detection for M-ary DPSK,” Electron. Lett. 29(15), 1385–1387 (1993).
14. F. Adachi and M. Sawahashi, “Decision feedback differential phase detection of M-ary DPSK signals,” IEEE Trans. Vehicular Technol. 44(2), 203–210 (1995).
16. C. Yu, S. Zhang, P. Y. Kam, and J. Chen, “Bit-error rate performance of coherent optical M-ary PSK/QAM using decision-aided maximum likelihood phase estimation,” Opt. Express 18(12), 12088–12103 (2010).
17. S. Zhang, P. Y. Kam, C. Yu, and J. Chen, “Decision-aided carrier phase estimation for coherent optical communications,” J. Lightwave Technol. 28(11), 1597–1607 (2010).
18. D. van den Borne, S. Calabro, S. L. Jansen, E. Gottwald, G. D. Khoe, and H. de Waardt, “Differential quadrature phase shift keying with close to homodyne performance based on multi-symbol phase estimation,” in OFC’05 Conference on Optical Fiber Communication (2005).
19. M. Nazarathy and Y. Yadin, “Approaching coherent homodyne performance with direct detection low-complexity advanced modulation formats,” in COTA’06 Coherent Optical Technologies and Applications (2006).
20. X. Liu, “Data-aided multi-symbol phase estimation for receiver sensitivity enhancement in optical DQPSK,” in COTA’06 Coherent Optical Techniques and Applications, CThB4 (2006).
21. M. Nazarathy and Y. Atzmon, “Approaching coherent homodyne performance with direct detection low-complexity advanced modulation formats,” in COTA’08 Coherent Optical Techniques and Applications (2008).
23. M. Nazarathy, X. Liu, L. Christen, Y. K. Lize, and A. E. Willner, “Self-coherent multisymbol detection of optical differential phase-shift keying,” J. Lightwave Technol. 26(13), 1921–1934 (2008).
25. J. Li, R. Schmogrow, D. Hillerkuss, M. Lauermann, M. Winter, K. Worms, C. Schubert, C. Koos, W. Freude, and W. J. Leuthold, “Self-coherent receiver for PolMUX coherent signals,” in OFC’11 Conference on Optical Fiber Communication, OWV5 (2011).
26. N. Kikuchi and S. Sasaki, “Highly sensitive optical multilevel transmission of arbitrary quadrature-amplitude modulation (QAM) signals with direct detection,” J. Lightwave Technol. 28(1), 123–130 (2010).
27. N. Kikuchi, “Chromatic dispersion-tolerant higher-order multilevel transmission with optical delay detection,” in SPPCom’11 Signal Processing in Photonic Communications - OSA Technical Digest (2011).
28. S. Adhikari, S. L. Jansen, M. Alfiad, B. Inan, V. A. J. M. Sleiffer, A. Lobato, P. Leoni, and W. Rosenkranz, “Self-coherent optical OFDM: an interesting alternative to direct or coherent detection,” in ICTON’11 13th International Conference on Transparent Optical Networks (2011).
29. S. Kumar, Impact of Nonlinearities on Fiber Optic Communications (Springer, 2011).
30. N. Sigron, I. Tselniker, M. Nazarathy, A. Gorshtein, D. Sadot, and I. Zelniker, “Ultimate single-carrier recovery for coherent detection,” in OFC’11 Conference on Optical Fiber Communication, OMJ2 (2011).
31. M. Nazarathy, N. Sigron, and I. Tselniker, “Integrated carrier phase and frequency estimation for coherent detection based on multi-symbol differential detection (MSDD),” in SPPCom’11 Signal Processing in Photonic Communications - OSA Technical Digest, Invited paper SPMC1 (2011).
32. N. Kikuchi, S. Sasaki, and T. Uda, “Phase-noise tolerant coherent polarization-multiplexed 16QAM Transmission with digital delay-detection,” in ECOC’11 European Conference of Optical Communication (ECOC), Tu.3.A (2011).
33. T. Adali and S. Haykin, Adaptive Signal Processing—Next Generation Solutions (John Wiley, 2010).
35. Y. Atzmon and M. Nazarathy, “A Gaussian polar model for error rates of differential phase detection impaired by linear, nonlinear, and laser phase noises,” J. Lightwave Technol. 27(21), 4650–4659 (2009).