## Abstract

We exploit pilot-aided (PA) transmission enabled by single-sideband (SSB) subcarrier modulation of both quadrature signals in the DSP domain to achieve fully feedforward carrier recovery (FFCR) in single-carrier (SC) coherent systems with arbitrary *M*-QAM constellations. A thorough mathematical description of the proposed PA-FFCR is presented, and its linewidth tolerance is assessed by simulations and compared to other FFCR schemes in the literature. Implementation and complexity issues of PA-FFCR are also presented and briefly compared with other CR schemes. Simulation results show that PA-FFCR performs close to the best known CR technique in the literature with lower computational complexity. Quantitatively, for a 1 dB optical-signal-to-noise-ratio (OSNR) penalty at BER = $3.8 \times 10^{-3}$, PA-FFCR tolerates linewidth-symbol-duration products ($\Delta f \cdot T_s$) of $1.5 \times 10^{-4}$ (4-QAM), $4 \times 10^{-5}$ (16-QAM) and $1 \times 10^{-5}$ (64-QAM). Finally, we propose the use of maximum likelihood (ML) phase estimation after pilot phase compensation. This significantly improves the tolerable $\Delta f \cdot T_s$ values to $7.5 \times 10^{-4}$ (4-QAM), $1.8 \times 10^{-4}$ (16-QAM) and $3.5 \times 10^{-5}$ (64-QAM). PA-FFCR with ML thus performs at least as well as the other CR techniques known in the literature, at lower complexity. In addition, the pilot information can be exploited for tasks other than CR, e.g., fiber nonlinearity compensation, with no extra complexity.

© 2011 OSA

## 1. Introduction

The ever-increasing data rate demand driven by new Internet applications and the exponential growth of electronic processing speeds are the main drivers behind coherent optical communications research [1,2]. Because application-specific integrated circuits (ASICs) can now operate at speeds commensurate with optical line rates, a DSP-based coherent transceiver can pre-compensate or post-compensate transmission impairments by processing the in-phase and quadrature (I and Q) signals on both polarizations [3–10]. Also, coherent detection combined with *M*-ary quadrature amplitude modulation (*M*-QAM) constellations can meet the growing need for higher spectral efficiencies in future optical transport systems [11]. Quadrature phase shift keying (QPSK) combined with polarization division multiplexing (PDM) has been the prevailing candidate for both 40 Gb/s and 100 Gb/s systems [12]. With 100 Gb/s systems recently commercialized and standardized [13], research is directed at scaling per-channel bit rates beyond 100 Gb/s, with the goal of developing optical interfaces for the next Ethernet standards, likely at 400 Gb/s and 1 Tb/s [11]. One way to support these high bit rates in a single-carrier (SC) scheme while maintaining moderate baud rates is to increase the *M*-QAM constellation order (e.g., 16-QAM, 64-QAM), thus maintaining reasonable bandwidth requirements for all electronic components. To achieve 400 Gb/s and 1 Tb/s, 16-QAM, 32-QAM and 64-QAM are being considered. For example, in [14], Winzer *et al*. transmitted a 448 Gb/s PDM 16-QAM signal (400 Gb/s with 12% forward error correction overhead) over 1200 km of ultra large area fiber (ULAF) at 56 Gbaud. In [15], Zhou *et al*. transmitted 8 *×* 450 Gb/s PDM 32-QAM signals over 400 km of ULAF at 8.37 b/s/Hz spectral efficiency. Also in [16], Gnauck *et al*. transmitted a 21.4 Gbaud PDM 64-QAM signal over 400 km of ULAF.

These highly dense *M*-QAM constellations have inherently stringent laser linewidth requirements because of the small distances between adjacent constellation points and hence, high performance linewidth tolerant carrier recovery (CR) algorithms are needed. As shown in [17], feedback-based CR schemes cannot fulfill these linewidth requirements because of inevitable feedback delays when the algorithm is implemented in a parallelized and pipelined architecture for real-time operation. Thus, the authors in [17] proposed a feed-forward carrier recovery (FFCR) scheme based on a blind phase search (BPS) algorithm for *M*-QAM constellations. BPS provides the best known linewidth tolerance in the literature at the expense of high computation complexity. Moreover, the algorithm's complexity increases as the QAM order (*M*) increases. Other FFCR algorithms with less complexity also exist in the literature (e.g., Viterbi and Viterbi phase estimation (VVPE) for QPSK [18] and QPSK partitioning for 16-QAM [19]), but with reduced linewidth tolerance. Recently, we proposed the use of pilot-aided (PA) transmission to compensate jointly laser phase noise (PN) and fiber nonlinearity (NL) in a SC PDM-QPSK coherent system [20]. We presented preliminary results for improved NL tolerance; however, linewidth tolerance was not investigated.

In this paper, we focus on laser PN compensation and extend the idea of PA transmission for FFCR in SC coherent transmission systems employing *arbitrary M*-QAM constellations. Pilot tone insertion is enabled by single-sideband (SSB) subcarrier modulation of both I and Q baseband signals in the DSP domain. We present the principles of the PA-FFCR scheme aided by a detailed mathematical description. The linewidth tolerance of the proposed PA-FFCR is evaluated through simulations for various *M*-QAM formats and compared to BPS, VVPE and QPSK partitioning. We find that at a 1 dB optical-signal-to-noise-ratio (OSNR) penalty at a BER of $3.8 \times 10^{-3}$, PA-FFCR tolerates linewidth-symbol-duration products $\Delta f \cdot T_s$ of $1.5 \times 10^{-4}$ (4-QAM), $4 \times 10^{-5}$ (16-QAM) and $1 \times 10^{-5}$ (64-QAM), which is close to BPS but with lower computational complexity. Finally, we propose using maximum likelihood (ML) phase estimation after pilot phase compensation. With small additional complexity, this significantly improves the tolerable values of $\Delta f \cdot T_s$ to $7.5 \times 10^{-4}$ (4-QAM), $1.8 \times 10^{-4}$ (16-QAM) and $3.5 \times 10^{-5}$ (64-QAM). Despite the low complexity, these tolerable $\Delta f \cdot T_s$ values are better than, or similar to, the ones provided by other CR schemes in the literature. We note that pilot information can also be exploited for tasks other than CR, including fiber nonlinearity compensation [20], with no additional complexity.

## 2. Principles of the non-PA SC coherent transmission system

#### 2.1 System architecture

A DSP-based SC coherent transmission system is shown in Fig. 1. It comprises a DSP-based transmitter (Tx), an optical channel and a DSP-based receiver (Rx). The architecture in Fig. 1 is fully generic: it can accommodate any modulation format, pulse shape and baud rate, with or without PDM.

The Tx-DSP tasks vary from system to system, and the Tx-DSP can be omitted entirely if the symbols are not pre-processed before transmission. Omitting it reduces the overall system complexity, but it also removes the flexibility, and the associated performance gains, of Tx-side pre-processing. For example, a Tx-side DSP may perform pulse shaping, which allows the transmitted spectrum to be engineered to improve performance. A raised cosine (RC) pulse shape provides a compact spectrum with minimum out-of-band power and zero intersymbol interference (ISI) [21]. Often, the RC pulse shaping filter is split into two matched root-raised cosine (RRC) filters at the Tx and Rx DSPs to maximize the signal-to-noise-ratio (SNR) in an additive white Gaussian noise (AWGN) channel. For illustration, Fig. 2a shows the spectrum of the I-component on one polarization of a 224 Gb/s PDM 16-QAM signal at 28 Gbaud with an RRC pulse shape with a roll-off factor of 1. Other pulse shapes specifically proposed for the fiber channel also exist in the literature, such as the one described in [22], which improves the system nonlinear (NL) tolerance. In addition to pulse shaping, the Tx-DSP may partially pre-compensate some of the fiber channel transmission impairments, e.g., chromatic dispersion (CD), which also improves the system NL tolerance [23]. Finally, other Tx-DSP tasks might include pre-equalization of the frequency response of optical filters, pre-compensation of the nonlinear transfer function of the Mach-Zehnder I-Q modulator, and NL pre-compensation. After the Tx-DSP, I-Q modulator(s) and polarization optics are used to modulate the two orthogonal light polarizations, and the resulting PDM signal is launched into the transmission system.

At the Rx, a coherent front-end integrates polarization beam splitters (PBSs), an optical hybrid, a local oscillator and balanced photodetectors to provide four signals corresponding to the I and Q components on both polarizations. These baseband signals are then sampled at their Nyquist rate by ADCs and processed by the Rx-DSP. The Rx-DSP compensates for any imperfections in the Rx optical front-end, fiber channel transmission impairments (e.g., residual CD, polarization mode dispersion (PMD)) and fiber NL impairments. It can also perform matched filtering if pulse shaping is done at the Tx-DSP. Finally, it will perform frequency offset estimation and carrier phase recovery (CR). A detailed review of the Rx-DSP tasks can be found in [4].

In this paper, we are mainly interested in laser PN and various CR techniques. Since laser PN is caused by the fact that the instantaneous phases of the Tx and local oscillator lasers are not locked to one another, fiber channel impairments will be out of the scope of the paper. Hence, in the following section, we present the system mathematical formulation for the optical back-to-back configuration.

#### 2.2 System mathematical formulation and laser phase noise model

Figure 3 shows the canonical system model in the presence of fiber linear time-invariant (LTI) impairments, laser PN and AWGN. Throughout the paper, we consider only one field polarization since laser PN affects both polarizations similarly. If PDM is employed, CR can be applied independently on each polarization without any loss of performance. In Fig. 3, the $k^{th}$ information symbol ${s}_{k}={x}_{k}+j{y}_{k}$, where ${x}_{k},{y}_{k}\in \left\{-\sqrt{M}+1,\ldots,-1,1,\ldots,\sqrt{M}-1\right\}$ are the real and imaginary parts for square *M*-QAM constellations, is first pulse shaped by a bandlimited pulse shaping filter with impulse response $h_{ps}(t)$. Without loss of generality, an RRC pulse shaping filter with a roll-off factor of 1, whose $h_{ps}(t)$ is given in [21], is assumed in all our simulations; however, the proposed PA-FFCR scheme should work equally well for other pulse shapes. The shaped signal $s(t)$ is then filtered by another filter with impulse response $h_{\text{pre-comp}}(t)$ to pre-compensate any LTI transmission impairments. Next, the transmitted signal $s_{tr}(t)$ is impaired by the transmitter laser PN at the electrical-to-optical (E/O) up-conversion stage, which corresponds to a phase rotation of ${e}^{j{\varphi}_{Tx}(t)}$ in the complex plane. The transmitted signal then passes through the fiber channel with response $h_{f}(t)$, and the AWGN $N(t)$ is added prior to the Rx front-end. At the Rx, optical-to-electrical (O/E) down-conversion is performed, which also impairs the signal with an extra rotation ${e}^{j{\varphi}_{Rx}(t)}$ from the local oscillator laser PN. The filter $h_{\text{post-comp}}(t)$ is used to equalize the residual LTI impairments of the fiber channel $h_{f}(t)$. Next, a filter with response $h_{MF}(t)$ matched to $h_{ps}(t)$ is applied. Finally, the output signal $s_{o}(t)$ is sampled at every symbol duration $T_s$, which produces the received symbol $s'_k$. As mentioned earlier, we assume an optical back-to-back configuration, which is equivalent to removing $h_{f}(t)$ from the system model in Fig. 3; hence, $h_{\text{pre-comp}}(t)$ and $h_{\text{post-comp}}(t)$ can also be removed. Doing so, the signal $r(t)$ after O/E conversion can be expressed as

$$r(t)=\left[s(t)\,{e}^{j{\varphi}_{Tx}(t)}+N(t)\right]{e}^{j{\varphi}_{Rx}(t)} \tag{1}$$

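Then, the output signal $s_{o}(t)$ can be written as

$$s_{o}(t)=r(t)\otimes h_{MF}(t)\approx {e}^{j\left[{\varphi}_{Tx}(t)+{\varphi}_{Rx}(t)\right]}\left[s(t)\otimes h_{MF}(t)\right]+n(t) \tag{2}$$

where $\otimes$ denotes convolution and $n(t)$ is the filtered noise. The approximation assumes that the PN is constant over the duration of $h_{MF}(t)$, which is justified by the short duration of the impulse response $h_{MF}(t)$. Finally, after sampling at the optimum zero-ISI point, the received symbol $s'_k$ can be related to the transmitted one $s_k$ as follows:

$$s'_k=s_k\,{e}^{j\left[{\varphi}_{Tx}(k{T}_{s})+{\varphi}_{Rx}(k{T}_{s})\right]}+n_k \tag{3}$$

In Eq. (3), it is apparent that the Tx and Rx PN contributions are lumped into one phase rotation term that sums their individual phases. It should be noted that this is an approximation of the real scenario, since in the presence of CD the Tx and Rx laser PN cannot be lumped together. From this point on, we define the combined Tx and Rx laser PN as $\varphi (k{T}_{s})={\varphi}_{Tx}(k{T}_{s})+{\varphi}_{Rx}(k{T}_{s})$, which can be modeled as a Wiener process [3]

$$\varphi (k{T}_{s})=\sum_{i=-\infty}^{k}{f}_{i} \tag{4}$$

In Eq. (4), the increments $f_i$ are independent and identically distributed Gaussian random variables with zero mean and variance

$$\sigma_f^2=2\pi \,\Delta f\,{T}_{s} \tag{5}$$

where $\Delta f$ is the sum of the linewidths of the Tx and local oscillator lasers.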
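The Wiener phase-noise model can be sketched numerically as follows. This is a minimal illustration rather than the simulation code used in the paper: it assumes the increment variance $2\pi \Delta f\,T_s$ commonly used with this model, starts the random walk at zero instead of at $-\infty$, and the function name `wiener_phase_noise` is hypothetical.

```python
import numpy as np

def wiener_phase_noise(num_symbols, linewidth_symbol_product, rng=None):
    """Return phi(k*Ts) for k = 0..num_symbols-1 as a discrete Wiener process.

    linewidth_symbol_product is the combined (Tx + LO) linewidth times the
    symbol duration; the walk starts at zero phase (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Standard deviation of each i.i.d. Gaussian increment f_i
    sigma = np.sqrt(2.0 * np.pi * linewidth_symbol_product)
    increments = rng.normal(0.0, sigma, size=num_symbols)
    # Accumulating the increments gives the phase at each symbol instant
    return np.cumsum(increments)

# Example: 16-QAM-like operating point, combined linewidth x Ts = 1e-4
phi = wiener_phase_noise(200_000, 1e-4, np.random.default_rng(0))
empirical_var = np.var(np.diff(phi))   # should be close to 2*pi*1e-4
print(f"increment variance: {empirical_var:.3e} rad^2")
```

Sweeping `linewidth_symbol_product` while keeping the baud rate fixed corresponds to sweeping the linewidth at a constant symbol duration.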

## 3. Principles of the proposed PA-FFCR with and without ML phase estimation in a SC coherent transmission system

#### 3.1 Principles of PA-FFCR

The main idea behind the proposed PA scheme is to insert a pilot tone at the middle of the transmitted spectrum of the SC signal. After transmission, the pilot tone will be impaired by laser PN from both Tx and Rx lasers and will acquire a phase shift that, in a noise-free environment, should be the same as the PN acquired by the data symbols. At the Rx, the pilot is filtered out and its phase is determined. Assuming we know the phase reference of the transmitted pilot tone, the Rx can calculate the extra phase acquired due to laser PN, and hence can correct this phase in the data symbols as well. Prior to the detailed analysis, it should be noted that the proposed PA-FFCR does not require any extra hardware added to the SC system shown in Fig. 1; however, different tasks will be carried out by Tx and Rx DSPs.

Figure 4 shows a block diagram of the tasks performed by the Tx and Rx DSPs in the PA-FFCR scheme. In the figure, a discrete-time index *n* is adopted, while we keep using a continuous-time variable *t* throughout the mathematical formulation. This is justified by the fact that discrete-time processing at the Nyquist rate is equivalent to continuous-time processing [24]. Similar to the non-PA system, the Tx-DSP first applies a pulse shape $h_{ps}(t)$ to the complex transmitted symbols $s_k$. Then, in order to enable pilot insertion, a spectral gap is opened at the middle of the spectrum of the SC signal. First, both quadrature signals $x(t)={\displaystyle {\sum}_{k=-\infty}^{\infty}{x}_{k}{h}_{ps}(t-k{T}_{s})}$ and $y(t)={\displaystyle {\sum}_{k=-\infty}^{\infty}{y}_{k}{h}_{ps}(t-k{T}_{s})}$ are separately SSB modulated on a subcarrier having a frequency $f_{sc}$ as follows:

$$x_{SSB}(t)=\mathrm{Re}\left\{x^{+}(t)\,e^{j2\pi f_{sc}t}\right\},\qquad y_{SSB}(t)=\mathrm{Re}\left\{y^{+}(t)\,e^{j2\pi f_{sc}t}\right\} \tag{6}$$

where $x^{+}(t)$ and $y^{+}(t)$ are the complex analytic signals

$$x^{+}(t)=x(t)+j\widehat{x}(t),\qquad y^{+}(t)=y(t)+j\widehat{y}(t) \tag{7}$$

In Eq. (7), $\widehat{x}(t)$ and $\widehat{y}(t)$ are obtained by passing *x*(*t*) and *y*(*t*), respectively, through a Hilbert transformer with an impulse response ${h}_{Hilbert}(t)=\frac{1}{\pi t},-\infty <t<\infty $, and a frequency response ${H}_{Hilbert}(f)$ given by

$${H}_{Hilbert}(f)=-j\,\mathrm{sgn}(f)=\begin{cases}-j, & f>0\\ 0, & f=0\\ j, & f<0\end{cases} \tag{8}$$

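Then, the spectra of the two SSB modulated signals $x_{SSB}(t)$ and $y_{SSB}(t)$ can be easily written in terms of the baseband spectra $X(f)$ and $Y(f)$, and the unit-step function $u(f)$ as

$$X_{SSB}(f)=X(f-f_{sc})\,u(f-f_{sc})+X(f+f_{sc})\,u(-f-f_{sc})$$

$$Y_{SSB}(f)=Y(f-f_{sc})\,u(f-f_{sc})+Y(f+f_{sc})\,u(-f-f_{sc})$$

so that both spectra are empty for $|f|<f_{sc}$.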

Following SSB subcarrier modulation, the Tx-DSP adds the pilot tone to both $x_{SSB}(t)$ and $y_{SSB}(t)$, which fits in the gap at the middle of their spectra. The two signals after pilot insertion, denoted by $x_{SSB,PA}(t)$ and $y_{SSB,PA}(t)$, are written in terms of $x_{SSB}(t)$ and $y_{SSB}(t)$ as

$$x_{SSB,PA}(t)=x_{SSB}(t)+\sqrt{P_{pilot}/2},\qquad y_{SSB,PA}(t)=y_{SSB}(t)+\sqrt{P_{pilot}/2}$$

where $P_{pilot}$ is the pilot tone power, a design parameter set according to the required pilot-to-signal power ratio (*PSR*), defined as *PSR*(dB) = $10\log_{10}(P_{pilot}/P_{signal})$. For illustration, Fig. 2b shows the spectrum of the 28 Gbaud 16-QAM signal of Fig. 2a after SSB subcarrier modulation with $f_{sc}$ = 500 MHz (1.7% bandwidth overhead) and an inserted pilot with *PSR* = −14 dB. A zoomed version of the spectral region around the pilot is shown in Fig. 2c. Then, $x_{SSB,PA}(t)$ and $y_{SSB,PA}(t)$ drive the two arms of the I-Q modulator, where the complex transmitted signal $s_{tr}(t)=x_{SSB,PA}(t)+jy_{SSB,PA}(t)$ can be written as

$$s_{tr}(t)=\left[x_{SSB}(t)+jy_{SSB}(t)\right]+\sqrt{P_{pilot}}\,e^{j\pi /4}$$

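Hence, $s_{tr}(t)$ comprises data symbols and frequency-multiplexed pilot symbols, as can be seen from the illustrative constellations in Fig. 4a. Referring back to Fig. 3, one can see that both the data and pilot terms in $s_{tr}(t)$ acquire the same combined PN from the Tx and Rx lasers, and the signal $r(t)$ after O/E conversion can be written as

$$r(t)=\left[x_{SSB}(t)+jy_{SSB}(t)\right]e^{j\varphi (t)}+\sqrt{P_{pilot}}\,e^{j\left[\varphi (t)+\pi /4\right]}+N'(t) \tag{13}$$

where the first and second terms are the data and pilot terms, respectively, and $N'(t)=N(t)\,{e}^{j{\varphi}_{Rx}(t)}$ is the AWGN rotated by the local oscillator PN.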

At the Rx, the pilot tone is filtered using a Gaussian low-pass filter (LPF) having a 3 dB bandwidth $B_{LPF}$. As long as $f_{sc}$ and $B_{LPF}$ are carefully chosen, there will be no spectral overlap between the pilot tone and the SSB subcarrier modulated data. Hence, the output of the Gaussian LPF will be equal to the pilot term in Eq. (13) in addition to some filtered noise. Then, the instantaneous pilot phase ${\varphi}_{pilot}(t)$ can be written as follows:

$${\varphi}_{pilot}(t)=\varphi (t)+\pi /4+{\varphi}_{n}(t) \tag{14}$$

where ${\varphi}_{n}(t)$ is the phase deviation caused by the filtered noise. For illustration, the received spectrum for a combined linewidth $\Delta f$ = 2 MHz is shown in Fig. 2d, where noise loading is performed to set the OSNR level to 16.7 dB (corresponding to 1 dB penalty at BER = $3.8 \times 10^{-3}$). To further clarify how PN broadens the spectrum of the pilot, a zoomed version of the pilot spectral gap is shown in Fig. 2e, where we also show the response of a Gaussian LPF with $B_{LPF}$ = 80 MHz. In Fig. 4b, we show the corresponding constellation of the filtered pilot tone. The Rx can now use $({\varphi}_{pilot}(t)-\pi /4)$ as an estimate $\widehat{\varphi}(t)$ for the combined laser PN $\varphi (t)$, where the estimation error depends on ${\varphi}_{n}(t)$. Then, the Rx de-rotates the data term in Eq. (13) according to $\widehat{\varphi}(t)$. The remaining tasks are straightforward: SSB subcarrier demodulation of the pilot-compensated data signal, matched filtering, picking one sample per symbol to produce the received symbols after pilot compensation ${s}_{k}^{PA}$, and symbol decision to produce an estimate ${\widehat{s}}_{k}^{PA}$ according to the *M*-QAM constellation employed. At this point, PA-FFCR is complete and ${\widehat{s}}_{k}^{PA}$ are the final symbol decisions. In the next subsection, we present a possible performance improvement by using ML phase estimation to provide more accurate symbol decisions.
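The PA-FFCR chain described in this subsection can be sketched end to end as follows. This is a noise-free illustration under stated assumptions, not the authors' implementation: sinc pulses (via `scipy.signal.resample`) replace the RRC filters, the pilot is taken as an equal DC offset on I and Q with total power $P_{pilot}$, the Gaussian LPF is a truncated FIR filter whose power response falls to one half at $B_{LPF}$, and all variable names are hypothetical.

```python
import numpy as np
from scipy.signal import hilbert, resample

rng = np.random.default_rng(1)

# --- Illustrative parameters, normalised to the sample rate ---
n_sym, sps = 16384, 2                   # symbols; 2 samples/symbol (56 GSa/s at 28 Gbaud)
n = n_sym * sps
k_sc = round(500e6 / 56e9 * n)          # subcarrier placed on an FFT bin near 500 MHz
f_sc = k_sc / n
b_lpf = 80e6 / 56e9                     # Gaussian LPF 3 dB bandwidth
psr_db = -14.0                          # pilot-to-signal power ratio
df_ts = 4e-5                            # combined linewidth x symbol duration

# --- Tx: band-limited I/Q waveforms from 16-QAM levels (sinc pulses: an assumption) ---
levels = np.array([-3.0, -1.0, 1.0, 3.0])
x = resample(rng.choice(levels, n_sym), n)
y = resample(rng.choice(levels, n_sym), n)

t = np.arange(n)
carrier = np.exp(2j * np.pi * f_sc * t)
x_ssb = np.real(hilbert(x) * carrier)   # Re{x+(t) exp(j 2 pi f_sc t)}
y_ssb = np.real(hilbert(y) * carrier)

p_signal = np.mean(x_ssb**2 + y_ssb**2)
p_pilot = p_signal * 10 ** (psr_db / 10)
pilot = np.sqrt(p_pilot) * np.exp(1j * np.pi / 4)   # equal DC offset on I and Q
s_tr = x_ssb + 1j * y_ssb + pilot

# --- Channel: combined Tx + LO phase noise only (no additive noise in this sketch) ---
phi = np.cumsum(rng.normal(0.0, np.sqrt(2 * np.pi * df_ts / sps), n))
r = s_tr * np.exp(1j * phi)

# --- Rx: a Gaussian FIR low-pass filter isolates the pilot ---
sigma_t = np.sqrt(np.log(2)) / (2 * np.pi * b_lpf)  # impulse-response std in samples
half = int(np.ceil(4 * sigma_t))                    # truncate at +/- 4 sigma
taps = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma_t) ** 2)
taps /= taps.sum()
pilot_rx = np.convolve(r, taps, mode="same")
phi_hat = np.angle(pilot_rx) - np.pi / 4            # pilot phase minus its pi/4 reference

# De-rotate, remove the pilot, then undo the SSB subcarrier modulation
d = r * np.exp(-1j * phi_hat) - pilot
x_rec = np.real(hilbert(d.real) * np.conj(carrier))
y_rec = np.real(hilbert(d.imag) * np.conj(carrier))

core = slice(2 * half, n - 2 * half)                # ignore filter edge effects
phase_err = np.angle(np.exp(1j * (phi_hat - phi)))[core]
rms_phase_err = np.sqrt(np.mean(phase_err**2))
rel_wave_err = np.sqrt(np.mean((x_rec - x)[core] ** 2) / np.mean(x[core] ** 2))
print(f"RMS phase error: {rms_phase_err:.3f} rad, relative I-waveform error: {rel_wave_err:.3f}")
```

Because the pilot sits in the spectral gap, the low-pass filter sees almost no data energy, so the residual error in this sketch is dominated by how fast the phase noise varies within the filter's impulse response. Matched filtering, downsampling and symbol decision would follow `x_rec` and `y_rec`.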

#### 3.2 Principles of PA-FFCR with ML phase estimation

In order to refine the PN estimate $\widehat{\varphi}(t)$ obtained from the pilot phase, the decided symbol ${\widehat{s}}_{k}^{PA}$ after pilot phase compensation, along with the symbol ${s}_{k}^{PA}$ before the decision, are fed into a second stage where ML phase estimation is employed. The ML phase estimate is calculated as shown by the block enclosed by dotted lines in Fig. 4b. It evaluates an estimate ${\varphi}_{k}^{ML}$ for the residual phase noise left uncompensated by the pilot phase as follows:

$$H_k=\sum_{l=k-N_{ML}+1}^{k+N_{ML}}{s}_{l}^{PA}\left({\widehat{s}}_{l}^{PA}\right)^{*} \tag{15}$$

$${\varphi}_{k}^{ML}=\tan^{-1}\left(\frac{\mathrm{Im}\left[H_k\right]}{\mathrm{Re}\left[H_k\right]}\right) \tag{16}$$

Theoretical details of ML phase estimation can be found in [21]. As seen in Eqs. (15) and (16), the ML estimate is calculated based on how much the symbols after pilot phase compensation ${s}_{k}^{PA}$ are rotated from their designated constellation points ${\widehat{s}}_{k}^{PA}$ found after decision, where averaging over $2N_{ML}$ symbols is done to reduce the estimation noise. The ML phase estimation works well because most of the PN has already been compensated by the pilot phase, and hence the decided symbols ${\widehat{s}}_{k}^{PA}$ are most likely correct. As presented in the next section, the ML stage significantly improves the linewidth tolerance over the case where only the pilot phase is compensated (PA-FFCR).
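The second-stage refinement can be sketched as a decision-directed estimator. The snippet below is a minimal illustration, assuming the usual ML form (a sum of $s^{PA}_l (\widehat{s}^{PA}_l)^*$ over $2N_{ML}$ neighbouring symbols followed by an arctangent); the window placement, the value `n_ml=8`, the 16-QAM alphabet and the function names are assumptions made for this example only.

```python
import numpy as np

QAM16_LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])

def decide_square_qam(symbols, levels=QAM16_LEVELS):
    """Nearest-point decision, applied separately to the I and Q parts."""
    def nearest(v):
        return levels[np.argmin(np.abs(v[:, None] - levels[None, :]), axis=1)]
    return nearest(symbols.real) + 1j * nearest(symbols.imag)

def ml_phase_refine(s_pa, n_ml=8, levels=QAM16_LEVELS):
    """Refine pilot-compensated symbols with a decision-directed ML phase estimate."""
    s_hat = decide_square_qam(s_pa, levels)           # first decision
    products = s_pa * np.conj(s_hat)                  # rotation of each symbol from its decision
    window = np.ones(2 * n_ml)                        # average over 2*N_ML symbols
    h = np.convolve(products, window, mode="same")    # sliding sum, roughly centred on k
    phi_ml = np.arctan2(h.imag, h.real)               # residual phase estimate
    s_ml = s_pa * np.exp(-1j * phi_ml)                # second de-rotation
    return s_ml, decide_square_qam(s_ml, levels), phi_ml

# Example: a 0.05 rad residual rotation left over after pilot compensation
rng = np.random.default_rng(0)
tx = rng.choice(QAM16_LEVELS, 2000) + 1j * rng.choice(QAM16_LEVELS, 2000)
noise = 0.02 * (rng.standard_normal(2000) + 1j * rng.standard_normal(2000))
s_pa = tx * np.exp(1j * 0.05) + noise
s_ml, s_hat_ml, phi_ml = ml_phase_refine(s_pa)
print(f"mean residual phase estimate: {phi_ml.mean():.3f} rad")
```

The estimator relies on the first decisions being mostly correct, which is why it is placed after the pilot stage rather than used on its own.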

## 4. Simulation parameters, results and discussion

We divide this section into three subsections. In the first, we present the parameters used in our simulations, which were conducted in MATLAB R2010a. We also show how some design parameters (e.g., *PSR* and $B_{LPF}$) should be optimized to guarantee the best performance from the proposed PA-FFCR scheme. Second, we show the linewidth tolerance results of both PA-FFCR and PA-FFCR with ML compared to other CR schemes. Finally, we investigate how the finite bit resolution of both DACs and ADCs affects the performance.

#### 4.1 Simulation parameters and optimization

In our simulations, a fixed baud rate of 28 Gbaud is used for all *M*-QAM formats, which corresponds to a bit rate of ($28 \times p \times \log_{2}M$) Gb/s, where *p* = 2 or 1 depending on whether or not PDM is used. Hence, when $\Delta f \cdot T_s$ needs to be swept, we sweep $\Delta f$ while keeping the symbol duration constant. All our BER measurements are based on the simulation of 240,000 symbols. Furthermore, noise loading is used at the Rx to control the OSNR level. As mentioned earlier, two RRC filters with roll-off factors of 1 are used at the Tx and Rx. Each RRC filter is implemented in the time domain as a 32-tap FIR filter with $T_s/2$ tap spacing. Finally, OSNR penalties for all *M*-QAM formats are evaluated from the difference between the actual and theoretical required OSNR, obtained as in [17], to achieve a BER = $3.8 \times 10^{-3}$. All OSNR values are evaluated based on a 0.1 nm reference bandwidth.
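As a quick check of the bit-rate expression, the following sketch evaluates $28 \times p \times \log_2 M$ for the formats considered; the helper name `bit_rate_gbps` is hypothetical.

```python
import math

BAUD_RATE_GBAUD = 28  # fixed baud rate used for all M-QAM formats

def bit_rate_gbps(qam_order, pdm=True, baud_rate_gbaud=BAUD_RATE_GBAUD):
    """Bit rate in Gb/s: baud rate x polarizations x bits per symbol."""
    p = 2 if pdm else 1
    return baud_rate_gbaud * p * math.log2(qam_order)

rates = {m: bit_rate_gbps(m) for m in (4, 16, 64)}
print(rates)
```

For PDM 16-QAM this gives 224 Gb/s, the rate quoted for the example spectrum in Fig. 2a.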

For the design parameters of PA-FFCR, we set the subcarrier frequency $f_{sc}$ to 500 MHz in all our simulations, which corresponds to 1.7% bandwidth overhead. $f_{sc}$ is an important parameter that has to be properly set to ensure that PA-FFCR performs well while maintaining a low bandwidth overhead. $f_{sc}$ = 500 MHz is a good value: it is large enough that data and pilot symbols do not spectrally overlap, while it results in a small bandwidth overhead and ensures that the filtered noise after the Gaussian LPF is not high enough to adversely affect the estimate $\widehat{\varphi}(t)$. The two remaining design parameters, namely *PSR* and $B_{LPF}$, are also crucial and should be optimized to guarantee optimum performance. First, *PSR* determines how much power is allocated to the pilot tone. It should be large enough to ensure that noise does not severely mask the pilot, yet small enough to maintain sufficient signal power compared to noise. Second, $B_{LPF}$ needs to be small enough to ensure that the filtered noise does not adversely affect $\widehat{\varphi}(t)$, yet large enough to pass the useful pilot phase information. Figure 5a illustrates how the values of *PSR* and $B_{LPF}$ affect the performance of the system: two surfaces of the BER versus both *PSR* and $B_{LPF}$ are shown for $\Delta f \cdot T_s$ = $3 \times 10^{-5}$ (lower surface) and $1 \times 10^{-4}$ (upper surface). For the two surfaces, 16-QAM is taken as an example and a 16.7 dB OSNR (corresponding to 1 dB penalty) is used. For $\Delta f \cdot T_s$ = $3 \times 10^{-5}$, the optimum values of *PSR* and $B_{LPF}$ are found to be −17 dB and 50 MHz, respectively, resulting in a BER = $2.1 \times 10^{-3}$; for $\Delta f \cdot T_s$ = $1 \times 10^{-4}$, the optimum values of *PSR* and $B_{LPF}$ are −15 dB and 120 MHz, respectively, resulting in a BER = $3 \times 10^{-3}$. The main observation from Fig. 5a is that as $\Delta f \cdot T_s$ increases, the optimum values of *PSR* and $B_{LPF}$ also increase. This is justified by the fact that as $\Delta f \cdot T_s$ increases, the pilot tone carries more useful information about the increased PN and its spectrum is broader; hence, larger *PSR* and $B_{LPF}$ should be chosen. Thus, for all our upcoming results, optimum values of *PSR* and $B_{LPF}$ are always used.
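The two trade-offs can be made concrete with a small calculation. The sketch below rests on two assumptions that are not stated in the text: the total launch power is held fixed, so pilot power is taken away from the data, and the Gaussian LPF has a power response $|H(f)|^2 = 2^{-(f/B_{LPF})^2}$, i.e. exactly −3 dB at $B_{LPF}$. Function names are hypothetical.

```python
import math

def pilot_power_split(psr_db):
    """Fraction of total power spent on the pilot and the resulting data power penalty (dB)."""
    psr = 10 ** (psr_db / 10)                   # linear pilot-to-signal power ratio
    pilot_fraction = psr / (1 + psr)
    data_penalty_db = 10 * math.log10(1 + psr)  # data power lost to the pilot
    return pilot_fraction, data_penalty_db

def gaussian_lpf_gain_db(f_hz, b_lpf_hz):
    """Power gain (dB) of a Gaussian LPF whose 3 dB bandwidth is b_lpf_hz."""
    return -10 * math.log10(2) * (f_hz / b_lpf_hz) ** 2

for psr_db in (-17, -15, -14):
    frac, pen = pilot_power_split(psr_db)
    print(f"PSR {psr_db} dB: pilot takes {100 * frac:.1f}% of power, data penalty {pen:.2f} dB")

for b_lpf in (50e6, 80e6, 120e6):
    print(f"B_LPF {b_lpf / 1e6:.0f} MHz: gain at f_sc = 500 MHz is "
          f"{gaussian_lpf_gain_db(500e6, b_lpf):.0f} dB")
```

Under these assumptions the data power penalty stays below a few tenths of a dB over the PSR range discussed, and even the widest filter considered attenuates the edge of the data band at $f_{sc}$ by more than 50 dB.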

#### 4.2 Phase noise tolerance

In this subsection, the performance of both PA-FFCR and PA-FFCR with ML is assessed and compared to other CR schemes in terms of linewidth tolerance for *M*-QAM constellations. Figures 5b, 5c and 5d show the OSNR penalty as a function of $\Delta f \cdot T_s$ for QPSK, 16-QAM and 64-QAM, respectively. The CR schemes compared are PA-FFCR, PA-FFCR with ML and BPS [17] for all *M*-QAM formats, whereas VVPE [18] and QPSK partitioning [19] apply only to QPSK and 16-QAM, respectively. As seen in all three figures, PA-FFCR with or without ML outperforms the other techniques for small values of $\Delta f \cdot T_s$, since at these PN levels most of the OSNR penalty in the case of BPS, VVPE and QPSK partitioning is imposed by the differential encoding used to remove the four-fold (*π*/2) angle ambiguity of square constellations. On the other hand, there is no need to differentially encode the data in a PA system because there is no angle ambiguity. Also, as $\Delta f \cdot T_s$ increases, it is clear how much the ML phase estimation improves the linewidth tolerance when added to PA-FFCR. Table 1 shows the maximum tolerable $\Delta f \cdot T_s$, defined as the tolerable $\Delta f \cdot T_s$ at 1 dB OSNR penalty, for all techniques, together with the corresponding tolerable linewidth values $\Delta f$ at 28 Gbaud. Clearly, PA-FFCR, PA-FFCR with ML and BPS all allow *M*-QAM formats up to 64-QAM to be implemented with inexpensive DFB lasers having linewidths in the range of 100 kHz < $\Delta f$ < 10 MHz. As also observed from Table 1 or the three figures, PA-FFCR with ML always performs better than BPS (or similarly in the case of 64-QAM), which provides the best known linewidth tolerance in the literature. This excellent performance of PA-FFCR with ML comes with lower computational complexity than BPS, as will be explained in the next section.
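The conversion from a tolerable $\Delta f \cdot T_s$ product to a tolerable combined linewidth at 28 Gbaud is a single multiplication. The sketch below applies it to the products quoted in the text for PA-FFCR with and without ML; the dictionary layout and function name are illustrative only, and the values for the other schemes in Table 1 are not reproduced.

```python
BAUD_RATE_HZ = 28e9  # symbol rate, so Ts = 1 / BAUD_RATE_HZ

# Tolerable linewidth-symbol-duration products at 1 dB OSNR penalty (from the text)
TOLERABLE_DF_TS = {
    "PA-FFCR": {"4-QAM": 1.5e-4, "16-QAM": 4e-5, "64-QAM": 1e-5},
    "PA-FFCR with ML": {"4-QAM": 7.5e-4, "16-QAM": 1.8e-4, "64-QAM": 3.5e-5},
}

def tolerable_linewidth_hz(df_ts, baud_rate_hz=BAUD_RATE_HZ):
    """Combined (Tx + LO) linewidth corresponding to a given df*Ts product."""
    return df_ts * baud_rate_hz

linewidths_mhz = {
    scheme: {fmt: tolerable_linewidth_hz(v) / 1e6 for fmt, v in products.items()}
    for scheme, products in TOLERABLE_DF_TS.items()
}
for scheme, values in linewidths_mhz.items():
    print(scheme, {fmt: f"{mhz:.2f} MHz" for fmt, mhz in values.items()})
```

At 28 Gbaud these products correspond to combined linewidths of 4.2 MHz, 1.12 MHz and 0.28 MHz for PA-FFCR, and 21 MHz, 5.04 MHz and 0.98 MHz for PA-FFCR with ML.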

#### 4.3 Effect of finite resolution of DACs and ADCs

In this subsection, we study the effect of finite bit resolution of both the DACs used after the Tx-DSP to drive the I-Q modulators and the ADCs used after the Rx coherent front-end on the performance of our proposed scheme. We consider only PA-FFCR with ML for our results. Figure 6a shows the OSNR penalty versus $\Delta f \cdot T_s$ for the PA-FFCR with ML scheme for all *M*-QAM formats in two cases: infinite resolution (same as the previous results in Fig. 5) and 6-bit DAC resolution. Similarly, Fig. 6b shows the performance when using a 6-bit ADC compared to the infinite resolution case. As expected, the OSNR penalty imposed by the finite resolution of DACs or ADCs is negligible for QPSK and increases with the QAM order. Quantitatively, a 6-bit DAC or ADC quantizes the analog voltages into 64 levels, which leads to tolerable $\Delta f \cdot T_s$ values of $7.5 \times 10^{-4}$ for QPSK, $1.5 \times 10^{-4}$ for 16-QAM and $1.3 \times 10^{-5}$ for 64-QAM, compared to the values in Table 1 for the infinite resolution case. However, it should be noted that the increasing penalty due to the finite DAC/ADC resolution as the QAM order increases originates from the inherent resolution requirement to represent the multilevel I and Q signals and not from a limitation of our scheme. This is justified by observing that the additional penalty imposed by finite DAC/ADC resolution does not grow with the laser PN, i.e., with $\Delta f \cdot T_s$, and is even lower as $\Delta f \cdot T_s$ increases, which indicates that it is not caused by the CR scheme.
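The effect of a finite-resolution converter can be emulated with a uniform quantizer. The sketch below is one possible model, not the one used for Fig. 6: it assumes a mid-rise characteristic and a full-scale range equal to the peak of the input unless given explicitly, and the function name `quantize` is hypothetical.

```python
import numpy as np

def quantize(signal, bits, full_scale=None):
    """Uniform mid-rise quantizer with 2**bits levels over [-full_scale, full_scale]."""
    full_scale = np.max(np.abs(signal)) if full_scale is None else full_scale
    n_levels = 2 ** bits
    step = 2 * full_scale / n_levels
    idx = np.floor(signal / step)                        # index of the quantization cell
    idx = np.clip(idx, -n_levels // 2, n_levels // 2 - 1)  # clip to the converter range
    return (idx + 0.5) * step                            # output at the cell centre

# Example: 6-bit conversion (64 levels) of a full-scale uniformly distributed input
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, 100_000)
q = quantize(u, bits=6, full_scale=1.0)
sqnr_db = 10 * np.log10(np.mean(u**2) / np.mean((q - u) ** 2))
print(f"levels used: {len(np.unique(q))}, SQNR: {sqnr_db:.1f} dB")
```

In a simulation, such a quantizer would be applied separately to the I and Q drive signals at the Tx (DAC) or to the sampled I and Q signals at the Rx (ADC). Higher-order formats need more of the available levels to separate their amplitude states, which is consistent with the penalty growing with the QAM order.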

## 5. Implementation and complexity issues of PA-FFCR

Figures 4a and 4b are used to highlight some implementation and complexity issues of the proposed PA-FFCR scheme (with or without ML). As shown in Fig. 4, the complexity is distributed between the Tx and Rx sides, and hence we address each of them separately. Finally, we briefly compare the overall complexity order of our proposed algorithm with BPS [17], since it is the only FFCR algorithm in the literature that works for all *M*-QAM formats with a performance comparable to our proposed technique. Importantly, as shown in our work in [20], the filtered pilot phase can also be exploited to mitigate fiber NL impairments in addition to CR, at the same complexity as presented hereafter.

At the Tx in Fig. 4a, the PA-FFCR scheme involves SSB subcarrier modulation and pilot insertion for each quadrature signal *x*[*n*] and *y*[*n*] separately. As indicated by Eq. (6), SSB modulation requires evaluating the complex analytic signals $x^{+}[n]$ and $y^{+}[n]$ and frequency up-converting them to a subcarrier frequency $f_{sc}$. As noted in Eq. (7), $x^{+}[n]$ and $y^{+}[n]$ have real parts equal to *x*[*n*] and *y*[*n*], which do not need to be computed as they are already available; the imaginary parts $\widehat{x}[n]$ and $\widehat{y}[n]$ can be evaluated by applying a Hilbert transformer to *x*[*n*] and *y*[*n*]. A Hilbert transformer can be implemented as a type III or IV FIR filter having a phase response equal to *π*/2 plus a linear portion corresponding to the filter group delay, as given in [24]. Via simulations, we find that an FIR filter of order $N_{Hilbert}$ = 250 to 300 is sufficient for our system. Such a filter is most efficiently implemented in the frequency domain by the overlap-and-add or overlap-and-save methods [24], which require complex multiplications and complex additions to evaluate the fast Fourier transform (FFT) of the signals *x*[*n*] and *y*[*n*] and then multiply their spectra *X*[*k*] and *Y*[*k*] by the frequency response of the filter $H_{Hilbert}[k]$. Since *x*[*n*] and *y*[*n*] are real signals, *X*[*k*] and *Y*[*k*] exhibit conjugate symmetry and hence can be evaluated using FFT algorithms specialized for real signals. As indicated in [25], such real-valued FFT algorithms require roughly half the number of multiplications and additions of FFT algorithms for complex signals. Hence, the complexity of the two required real-valued FFTs is equivalent to that of one complex-valued FFT. Another way to calculate the analytic signals $x^{+}[n]$ and $y^{+}[n]$ is to use the method described in [26], which is based on calculating the one-sided FFT. Next, to perform the frequency up-conversion, the evaluated spectra of the analytic signals $X^{+}[k]$ and $Y^{+}[k]$ are shifted by the appropriate number of frequency bins according to the desired $f_{sc}$. Finally, the pilot is added and any pre-compensation of LTI impairments is performed. The complexity of PA-FFCR at the Tx therefore depends on whether pre-compensation of LTI impairments is already employed in the non-PA system. If frequency-domain pre-compensation of LTI impairments is already used, the FFT/IFFT operations required for the Hilbert filtering do not add to the complexity of the system, as they already exist; if no pre-compensation is used, the required FFT/IFFT adds to the overall system complexity. Hence, for an FFT size of *N*, the additional number of operations (complex multiplications and additions) required to perform SSB modulation and insert the pilot is $O(N\log_{2}N)$, assuming no pre-compensation is used in the non-PA system, whereas the additional number of operations is minimal if pre-compensation has already been used in the non-PA system.
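The frequency-domain route to SSB modulation can be sketched for a single block as follows. This is an illustration of the one-sided FFT idea only: it processes one block circularly, so the overlap-and-add or overlap-and-save bookkeeping needed for streaming data is not shown, and the function names are hypothetical.

```python
import numpy as np

def analytic_spectrum(x):
    """Spectrum of the analytic signal x+[n] of a real block x[n] (one-sided FFT)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin once
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return spectrum * weights             # negative frequencies are zeroed

def ssb_modulate_block(x, shift_bins):
    """Upper-sideband SSB modulation by shifting the analytic spectrum by shift_bins.

    Assumes the top shift_bins bins below Nyquist are empty, so the circular
    shift does not wrap signal energy into negative frequencies.
    """
    shifted = np.roll(analytic_spectrum(x), shift_bins)
    return np.real(np.fft.ifft(shifted))

# Example: a tone in bin 5 moved up by 3 bins becomes a tone in bin 8
n_idx = np.arange(64)
tone = np.cos(2 * np.pi * 5 * n_idx / 64)
shifted_tone = ssb_modulate_block(tone, 3)
print(np.allclose(shifted_tone, np.cos(2 * np.pi * 8 * n_idx / 64)))
```

Since the subcarrier shift is a re-indexing of frequency bins, the cost is dominated by the FFT and IFFT, in line with the $O(N\log_2 N)$ order quoted above.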

At the Rx in Fig. 4b, the pilot tone is first filtered out by a Gaussian LPF. This can be done in either of two ways. First, one can use a separate analog Gaussian LPF to filter out the pilot. Alternatively, the filter can be implemented digitally as an FIR filter with a number of taps that depends on *B _{LPF}* and the sample rate. For *B _{LPF}* around 80 MHz and a 56 GSa/s sampling rate, 300 taps are found to be sufficient. The filtering can be done in the frequency domain by the overlap-and-add or overlap-and-save methods with *O*(*N* log_{2}(*Nβ*)) operations, where the fraction *β* is the ratio between the pilot spectral gap and the overall signal bandwidth; this count takes into account that only the frequency bins around the pilot spectral band are evaluated [27]. The remaining part of the Rx involves Hilbert filtering, frequency down-conversion and post-compensation, all of which can be done in the frequency domain as at the Tx side. However, the situation differs from the Tx, since complex FFT/IFFT operations are certainly used in the non-PA system for frequency domain post-compensation of transmission impairments such as CD. For the PA system, the complex FFT is split into two separate real FFT operations for the I and Q signals with roughly the same complexity [25]. Thus, the additional complexity order at the Rx is the one needed for filtering the pilot, which equals *O*(*N* log_{2}(*Nβ*)). If ML phase estimation is used, an additional *O*(*N*) multiplications and additions and one look-up table to evaluate the tan^{−1} function are needed. Finally, one or two decisions are needed depending on whether or not ML phase estimation is used.
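The operation count quoted for the ML stage can be made concrete with a minimal sketch of a generic decision-directed ML phase estimate applied to one block of symbols after pilot phase compensation. This is an illustration under stated assumptions rather than the authors' implementation: QPSK is chosen only to keep the decision rule short, the block-wise processing and the names `decide_qpsk` and `ml_phase_correct` are hypothetical, and `math.atan2` stands in for the tan^{−1} look-up table.

```python
import cmath
import math


def decide_qpsk(r):
    """Hard decision onto the unit-energy QPSK constellation."""
    s = 1 / math.sqrt(2)
    return complex(s if r.real >= 0 else -s, s if r.imag >= 0 else -s)


def ml_phase_correct(block):
    """Estimate the residual phase of one block and return it with the final decisions."""
    # First decision on the pilot-compensated symbols.
    first = [decide_qpsk(r) for r in block]
    # O(N) multiplications and additions: correlate received symbols with decisions.
    h = sum(r * d.conjugate() for r, d in zip(block, first))
    # Single inverse-tangent evaluation (a look-up table in hardware).
    phi = math.atan2(h.imag, h.real)
    # Remove the estimated phase and take the second decision.
    rotated = [r * cmath.exp(-1j * phi) for r in block]
    return phi, [decide_qpsk(r) for r in rotated]
```

The sketch uses two decisions per symbol, one before and one after the phase correction, which matches the count given in the text for the case with ML phase estimation.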

In comparison with BPS [17], our PA-FFCR (with or without ML) scheme requires multiplications and additions of the same (or even larger) order. The real complexity reduction comes from the massive reduction in the number of decisions, comparators, selectors and look-up tables required. For BPS, 32 decisions are needed per output symbol in case of QPSK and 16-QAM, whereas 64 decisions are required for 64-QAM. Our scheme requires only one (without ML) or two decisions (with ML) per symbol independent of the QAM order. Also, no comparators and selectors are needed in PA-FFCR compared to the large number required in BPS. Taking into account the complexity of implementing one decision block especially for higher order *M*-QAM which requires numerous comparisons to determine the correct decision region, and the extra capability of our PA-FFCR to jointly compensate laser PN and fiber NL [20], the merit of using our technique becomes apparent especially for high order QAM formats.
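The decision counts quoted above can be tabulated directly. The short sketch below only restates the numbers given in the text and computes their ratio; the dictionary and function names are hypothetical, and the ratio ignores the differing cost of a single decision block across QAM orders.

```python
# Decisions per output symbol, as stated in the text.
BPS_DECISIONS = {"QPSK": 32, "16-QAM": 32, "64-QAM": 64}
PA_FFCR_DECISIONS = {"without ML": 1, "with ML": 2}


def reduction_factor(fmt, variant):
    """Ratio of BPS decisions to PA-FFCR decisions per output symbol."""
    return BPS_DECISIONS[fmt] / PA_FFCR_DECISIONS[variant]
```

For example, `reduction_factor("64-QAM", "with ML")` evaluates 64 / 2, i.e. a 32-fold reduction in the number of decisions per symbol.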

## 6. Conclusion

In this paper, we demonstrated how PA transmission can be used for FFCR in SC coherent transmission systems employing arbitrary *M*-QAM constellations. We showed how the pilot tone is inserted at the middle of the spectrum of the SC signal by SSB subcarrier modulation of both I and Q signals at the Tx. We explained with the aid of mathematical analysis how the pilot tone is extracted at the Rx and used for CR. By simulations, the linewidth tolerance of the proposed PA-FFCR was assessed and compared to various CR schemes. In addition, we showed how introducing ML phase estimation next to pilot phase compensation improves the tolerable Δ*f*.*T _{s}* values to 7.5 × 10^{−4} (4-QAM), 1.8 × 10^{−4} (16-QAM) and 3.5 × 10^{−5} (64-QAM), which are better than or at least the same as (for 64-QAM) those of all CR techniques in the literature. Also, the impact of finite DAC/ADC resolution on the performance of our scheme was studied. It was found that with either a 6-bit DAC or a 6-bit ADC, our PA-FFCR with ML scheme provides tolerable Δ*f*.*T _{s}* values of 7.5 × 10^{−4} (4-QAM), 1.5 × 10^{−4} (16-QAM) and 1.3 × 10^{−5} (64-QAM). Finally, we highlighted some implementation and complexity issues of the different DSP tasks required at both Tx and Rx in our scheme. We conclude that our proposed scheme effectively mitigates laser PN and operates for all *M*-QAM formats at the expense of a slight bandwidth overhead.

## Acknowledgments

This research was supported in part by the Natural Sciences and Engineering Research Council (NSERC) Canada via the CREATE program on Next-Generation Optical Networks.

## References and links

**1. **E. Ip and J. M. Kahn, “Fiber impairment compensation using coherent detection and digital signal processing,” J. Lightwave Technol. **28**(4), 502–519 (2010).

**2. **S. J. Savory, “Coherent detection—why is it back?” in *The 20th Annual Meeting of the IEEE Lasers and Electro-Optics Society, 2007. LEOS 2007* (IEEE/LEOS, 2007), pp. 212–213.

**3. **E. Ip, A. P. Lau, D. J. Barros, and J. M. Kahn, “Coherent detection in optical fiber systems,” Opt. Express **16**(2), 753–791 (2008).

**4. **S. J. Savory, “Digital coherent optical receivers: algorithms and subsystems,” IEEE J. Sel. Top. Quantum Electron. **16**(5), 1164–1179 (2010).

**5. **M. G. Taylor, “Coherent detection method using DSP for demodulation of signal and subsequent equalization of propagation impairments,” IEEE Photon. Technol. Lett. **16**(2), 674–676 (2004).

**6. **E. Ip and J. M. Kahn, “Digital equalization of chromatic dispersion and polarization mode dispersion,” J. Lightwave Technol. **25**(8), 2033–2043 (2007).

**7. **E. Ip and J. M. Kahn, “Compensation of dispersion and nonlinear impairments using digital backpropagation,” J. Lightwave Technol. **26**(20), 3416–3425 (2008).

**8. **G. Li, “Recent advances in coherent optical communication,” Adv. Opt. Photonics **1**(2), 279–307 (2009).

**9. **S. J. Savory, “Digital filters for coherent optical receivers,” Opt. Express **16**(2), 804–817 (2008).

**10. **M. G. Taylor, “Phase estimation methods for optical coherent detection using digital signal processing,” J. Lightwave Technol. **27**(7), 901–914 (2009).

**11. **P. J. Winzer, “Beyond 100G Ethernet,” IEEE Commun. Mag. **48**(7), 26–30 (2010).

**12. **K. Roberts, M. O'Sullivan, K.-T. Wu, H. Sun, A. Awadalla, D. Krause, and C. Laperle, “Performance of dual-polarization QPSK for optical transport systems,” J. Lightwave Technol. **27**(16), 3546–3559 (2009).

**13. **IEEE Std 802.3ba-2010, Amendment 4: Media Access Control Parameters, Physical Layers, and Management Parameters for 40 Gb/s and 100 Gb/s Operation.

**14. **P. J. Winzer, A. Gnauck, S. Chandrasekhar, S. Draving, J. Evangelista, and B. Zhu, “Generation and 1200-km transmission of 448-Gb/s ETDM 56-Gbaud PDM 16-QAM using a single I/Q modulator,” in *2010 36th European Conference and Exhibition on Optical Communication (ECOC)* (2010), paper PD2.2.

**15. **X. Zhou, L. E. Nelson, P. Magill, B. Zhu, and D. W. Peckham, “8x450-Gb/s, 50-GHz-spaced, PDM-32QAM transmission over 400km and one 50GHz-grid ROADM,” in *National Fiber Optic Engineers Conference*, OSA Technical Digest (CD) (Optical Society of America, 2011), paper PDPB3.

**16. **A. H. Gnauck, P. Winzer, A. Konczykowska, F. Jorge, J. Dupuy, M. Riet, G. Charlet, B. Zhu, and D. W. Peckham, “Generation and transmission of 21.4-Gbaud PDM 64-QAM using a high-power DAC driving a single I/Q modulator,” in *National Fiber Optic Engineers Conference*, OSA Technical Digest (CD) (Optical Society of America, 2011), paper PDPB2.

**17. **T. Pfau, S. Hoffmann, and R. Noé, “Hardware-efficient coherent digital receiver concept with feedforward carrier recovery for *M*-QAM constellations,” J. Lightwave Technol. **27**(8), 989–999 (2009).

**18. **A. J. Viterbi and A. M. Viterbi, “Nonlinear estimation of PSK-modulated carrier phase with application to burst digital transmission,” IEEE Trans. Inf. Theory **29**(4), 543–551 (1983).

**19. **I. Fatadin, D. Ives, and S. J. Savory, “Laser linewidth tolerance for 16-QAM coherent optical systems using QPSK partitioning,” IEEE Photon. Technol. Lett. **22**(9), 631–633 (2010).

**20. **M. H. Morsy-Osman, L. R. Chen, and D. V. Plant, “Joint mitigation of laser phase noise and fiber nonlinearity using pilot-aided transmission for single-carrier systems,” in *37th European Conference and Exposition on Optical Communications*, OSA Technical Digest (CD) (Optical Society of America, 2011), paper Tu.3.A.3.

**21. **J. G. Proakis, *Digital Communications*, 4th ed. (McGraw-Hill, New York, 2001).

**22. **B. Chatelain, C. Laperle, D. Krause, K. Roberts, M. Chagnon, X. Xu, A. Borowiec, F. Gagnon, J. C. Cartledge, and D. V. Plant, “SPM-tolerant pulse shaping for 40- and 100-Gb/s dual-polarization QPSK systems,” IEEE Photon. Technol. Lett. **22**, 1641–1643 (2010).

**23. **Y. Benlachtar, S. J. Savory, B. C. Thomsen, G. Gavioli, P. Bayvel, and R. I. Killey, “Robust long-haul transmission utilizing electronic precompensation and MLSE equalization,” in *National Fiber Optic Engineers Conference*, OSA Technical Digest Series (CD) (Optical Society of America, 2007), paper JWA52.

**24. **A. Oppenheim and R. Schafer, *Discrete-Time Signal Processing*, 2nd ed. (Prentice-Hall, New Jersey, 1999).

**25. **H. Sorensen, D. Jones, M. Heideman, and C. Burrus, “Real-valued fast Fourier transform algorithms,” IEEE Trans. Acoust. Speech Signal Process. **35**(6), 849–863 (1987).

**26. **S. L. Marple Jr., “Computing the discrete-time ‘analytic’ signal via FFT,” IEEE Trans. Signal Process. **47**(9), 2600–2603 (1999).

**27. **H. Sorensen and C. Burrus, “Efficient computation of the DFT with only a subset of input or output points,” IEEE Trans. Signal Process. **41**(3), 1184–1200 (1993).