## Abstract

Soft-decision forward error correction (SD-FEC) schemes are typically designed for additive white Gaussian noise (AWGN) channels. In a fiber-optic communication system, noise may be neither circularly symmetric nor Gaussian, thus violating an important assumption underlying SD-FEC design. This paper quantifies the impact of non-AWGN noise on SD-FEC performance for such optical channels. We use a conditionally bivariate Gaussian noise (CBGN) model to analyze the impact of correlations among the signal’s two quadrature components, and assess the effect of CBGN on SD-FEC performance using the density evolution of low-density parity-check (LDPC) codes. On a CBGN channel generating severely elliptic noise clouds, it is shown that more than 3 dB of coding gain is attainable by utilizing correlation information. Our analyses also give insights into potential improvements of the detection performance for fiber-optic transmission systems assisted by SD-FEC.

© 2012 Optical Society of America

## 1. Introduction

Forward error correction (FEC) has become an essential element in fiber-optic communication systems [1,2]. It significantly reduces the required signal-to-noise ratio (SNR) and enhances the system’s tolerance to propagation impairments for reliable communication of information [3]. Traditionally, hard-decision (HD) FEC, which relies on a 1-bit HD estimate per received bit, has been employed in optics; more recently, soft-decision (SD) FEC, which is fed by additional multiple-bit information indicating the uncertainty about an HD estimate, is being investigated as a promising alternative, as it provides a greater net coding gain (NCG) [1,2]. While in the FEC design community the performance of FECs is usually characterized by the minimum SNR that allows a decoder to recover all transmitted information without any error, the optical communications community typically characterizes FEC performance by the maximum tolerable pre-FEC bit error ratio (BER), called the ‘FEC threshold’, that still allows the FEC to correct errors down to a very low value. For example, for a 7% overhead HD-FEC with an NCG of 9.19 dB at a corrected BER of 10^{−15}, the pre-FEC BER should be lower than 3.8 × 10^{−3} [4]; a corrected BER of 10^{−15} is regarded as ‘error-free’ in the context of optical transport networks. Consequently, optical systems researchers usually measure the pre-FEC BER and claim error-free system operation when their measured BER is below the assumed FEC threshold. This approach, which has the benefit of capturing not only noise but all signal impairments affecting the easily measured pre-FEC BER, works well for HD-FEC, since HD-FECs can be characterized solely by their pre-FEC BER provided that errors occur statistically independently. The latter can always be ensured through proper interleaving.
However, systems using SD-FEC can no longer be characterized in such a uniparametric way using only the pre-FEC BER, since the probability density function (pdf) of the received signal affects the post-FEC BER; i.e., the same SNR (or the same pre-FEC BER) can yield very different decoder performance, depending on the shape of the noise clouds at the receiver. While some optical transmission systems perform like an AWGN channel [5–8], others do not [9–13], and the traditional FEC characterization method needs to be revisited for SD-FEC. For example, Fig. 1 gives a few simulated constellations of 112-Gb/s polarization-division-multiplexed (PDM) quadrature phase-shift keyed (QPSK) signals after propagation over different fiber links. The first constellation is the constellation in *x*-polarization of a 112-Gb/s PDM-QPSK signal after single-channel transmission over twenty 100-km dispersion-managed LEAF^{®} fiber spans at a launch power of 6-dBm [14]. The constellation of the *y*-polarization is similar. It is evident that self-phase modulation (SPM) generates a large phase distortion in the constellation. The second and third constellations are for *x*- and *y*-polarizations of a PDM-QPSK signal, respectively, co-propagating with 10-Gb/s on-off-keyed (OOK) signals over ten 100-km dispersion-managed standard single-mode fiber (SSMF) spans at a 50-GHz channel spacing and −1-dBm per channel launch power [15]. The large phase spreads in the constellations are generated by cross-phase modulation (XPM) from neighboring 10-Gb/s OOK channels. As the 10-Gb/s OOK channels are in *x*-polarization, the nonlinear phase distortions caused by XPM in *x*-polarization are much larger than those in *y*-polarization. Figure 1 shows that fiber nonlinearities can make a channel in an optical communication system very different from an AWGN channel.

From the above discussion, it is natural to ask fundamental questions about how far the actual SD-FEC performance would deviate from the one predicted under the AWGN assumption. Motivated by this problem, we model the fiber-optic channel as a conditionally bivariate Gaussian noise (CBGN) channel and quantify the effect of bivariate noise on the performance of SD decoding of low-density parity-check (LDPC) codes. First results were presented in [16], where two decoders independently corrected the in-phase (*I*) and quadrature (*Q*) components of a QPSK-modulated signal, respectively. In this paper, we further investigate the CBGN model, including the case where the *I* and *Q* components are processed by a single decoder in an interleaved manner such that the ‘weakest link’ problem of the two independent decoders is eliminated, i.e., such that the worst component of the noise no longer dominates system performance as in [16]. Our analysis of the CBGN channel gives insights on how to improve the performance of fiber-optic transmission systems assisted by SD-FEC.

## 2. Systems with conditionally bivariate Gaussian noise

#### 2.1. Channel model

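The joint pdf of bivariate Gaussian noise (BGN) centered at ($\mu_I$, $\mu_Q$) has the form

$$f(I,Q)=\frac{1}{2\pi\sigma_I\sigma_Q\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(I-\mu_I)^2}{\sigma_I^2}-\frac{2\rho(I-\mu_I)(Q-\mu_Q)}{\sigma_I\sigma_Q}+\frac{(Q-\mu_Q)^2}{\sigma_Q^2}\right]\right\},\tag{1}$$

where $\sigma_I$ and $\sigma_Q$ are the noise standard deviations along standard bases *I* and *Q*, respectively, and $\rho$ is the correlation coefficient. The contours of constant probability densities in the BGN model form sets of ellipses, of which one axis is oriented at an angle $\alpha$, as shown in Fig. 2(a). This *tilt angle* $\alpha$ is computed as [17]

$$\alpha=\frac{1}{2}\arctan\left(\frac{2\rho\sigma_I\sigma_Q}{\sigma_I^2-\sigma_Q^2}\right).\tag{2}$$

Let $\sigma_u$ and $\sigma_v$ denote the noise standard deviations along the two principal axes of the ellipse, and let $\sigma^2=\sigma_u^2+\sigma_v^2$ be the total noise variance. We define an *eccentricity angle* $0\le\beta\le\pi/2$ such that $\sigma_u=\sigma\cos\beta$ and $\sigma_v=\sigma\sin\beta$ (see Fig. 2(b)). Note that the tilt angle $\alpha$ is 45° when the variances $\sigma_I^2$ and $\sigma_Q^2$ are equal, and the difference between the lengths of the major and minor axes becomes smaller as the eccentricity angle $\beta$ approaches 45°. In this work, therefore, by ‘tilted’ and ‘elliptic’ we mean that the angles $\alpha$ and $\beta$ deviate from 45°, respectively. From the above, it follows that

$$\sigma_I^2=\sigma^2\left(\cos^2\alpha\cos^2\beta+\sin^2\alpha\sin^2\beta\right),\qquad \sigma_Q^2=\sigma^2\left(\sin^2\alpha\cos^2\beta+\cos^2\alpha\sin^2\beta\right),\tag{3}$$

$$\rho=\frac{\sigma^2\sin 2\alpha\,\cos 2\beta}{2\sigma_I\sigma_Q},\tag{4}$$

so that $\sigma_I$, $\sigma_Q$ and $\rho$ of the bivariate joint pdf can be represented as functions of $\alpha$, $\beta$ and $\sigma$.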
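The change of parameters described above can be sketched numerically. The following is a minimal illustration, assuming principal-axis standard deviations σ cos β and σ sin β and a rotation of the first principal axis by α from the *I* axis; the function names `cbgn_parameters` and `tilt_angle` are hypothetical and not taken from the paper.

```python
import math


def cbgn_parameters(alpha, beta, sigma):
    """Map (tilt alpha, eccentricity beta, total std sigma) to (sigma_I, sigma_Q, rho).

    Assumption: the principal-axis standard deviations are sigma*cos(beta) and
    sigma*sin(beta), and the first principal axis is rotated by alpha from I.
    """
    var_u = (sigma * math.cos(beta)) ** 2
    var_v = (sigma * math.sin(beta)) ** 2
    ca, sa = math.cos(alpha), math.sin(alpha)
    var_i = var_u * ca ** 2 + var_v * sa ** 2
    var_q = var_u * sa ** 2 + var_v * ca ** 2
    cov_iq = (var_u - var_v) * sa * ca
    sigma_i, sigma_q = math.sqrt(var_i), math.sqrt(var_q)
    return sigma_i, sigma_q, cov_iq / (sigma_i * sigma_q)


def tilt_angle(sigma_i, sigma_q, rho):
    """Recover the tilt angle from the Cartesian parameters.

    atan2 is used instead of a plain arctangent so that equal variances
    (zero denominator) are handled; the result matches alpha for beta < 45 deg.
    """
    return 0.5 * math.atan2(2 * rho * sigma_i * sigma_q, sigma_i ** 2 - sigma_q ** 2)


if __name__ == "__main__":
    s_i, s_q, rho = cbgn_parameters(math.radians(20), math.radians(15), 0.5)
    print(s_i, s_q, rho, math.degrees(tilt_angle(s_i, s_q, rho)))
```

The round trip through `tilt_angle` is only a consistency check; for β above 45° the roles of the two principal axes swap, so the recovered angle refers to the other axis.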

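Let us assume unit transmit signal power for QPSK modulation, as well as statistically identical noisy constellation clouds after a rotation of $k\pi/2$ radians about the origin for any integer *k*. Then the distribution of received QPSK symbols can be formulated as

$$\begin{aligned}
\mathbf{R}_1&\sim\mathcal{N}\left(+\tfrac{1}{\sqrt{2}},+\tfrac{1}{\sqrt{2}},+\rho,\sigma_1^2,\sigma_2^2\right),\\
\mathbf{R}_2&\sim\mathcal{N}\left(-\tfrac{1}{\sqrt{2}},+\tfrac{1}{\sqrt{2}},-\rho,\sigma_2^2,\sigma_1^2\right),\\
\mathbf{R}_3&\sim\mathcal{N}\left(-\tfrac{1}{\sqrt{2}},-\tfrac{1}{\sqrt{2}},+\rho,\sigma_1^2,\sigma_2^2\right),\\
\mathbf{R}_4&\sim\mathcal{N}\left(+\tfrac{1}{\sqrt{2}},-\tfrac{1}{\sqrt{2}},-\rho,\sigma_2^2,\sigma_1^2\right),
\end{aligned}\tag{5}$$

where $\mathbf{R}_i=(I_i,Q_i)$ for $1\le i\le 4$ is the received signal that was mapped to the *i*-th quadrant of the constellation diagram at the transmitter, $\sigma_1$ and $\sigma_2$ are the values of $\sigma_I$ and $\sigma_Q$ for the first-quadrant cloud, and $\mathcal{N}\left(\mu_I,\mu_Q,\rho,\sigma_I^2,\sigma_Q^2\right)$ denotes the bivariate Gaussian distribution according to Eq. (1). This channel can be viewed as the CBGN channel, in which the mean, correlation, and variance of the received signal are conditional on the transmitted signal.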
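To make the rotational symmetry concrete, the sketch below draws received symbols for a chosen quadrant. It is an illustration under stated assumptions, not the authors' simulator: the first-quadrant cloud is taken to have parameters (σ₁, σ₂, ρ), the clouds of the other quadrants are obtained by rotating it in steps of π/2, and the name `sample_cbgn_symbol` is hypothetical.

```python
import math
import random


def sample_cbgn_symbol(quadrant, sigma_1, sigma_2, rho, rng):
    """Draw one received QPSK symbol (I, Q) for a symbol sent in `quadrant` (1..4).

    Assumption: quadrant 1 uses (sigma_I, sigma_Q, rho) = (sigma_1, sigma_2, rho)
    and every further quadrant is the previous cloud rotated by pi/2.
    """
    a = 1 / math.sqrt(2)  # per-component amplitude of unit-power QPSK
    # Correlated noise for the quadrant-1 cloud (2x2 Cholesky factor written out).
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    n_i = sigma_1 * z1
    n_q = sigma_2 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    i, q = a + n_i, a + n_q
    # Each pi/2 rotation about the origin maps (I, Q) to (-Q, I).
    for _ in range(quadrant - 1):
        i, q = -q, i
    return i, q


if __name__ == "__main__":
    rng = random.Random(1)
    print([sample_cbgn_symbol(k, 0.3, 0.1, 0.5, rng) for k in (1, 2, 3, 4)])
```

Rotating by π/2 swaps the two variances and flips the sign of the correlation, which is why the second-quadrant cloud has a negative sample covariance when ρ is positive.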

#### 2.2. Entropy considerations

The entropy, which is a measure of uncertainty, of a bivariate Gaussian pdf *f*(*I*, *Q*) is calculated as [18]

$$h(f)=\log\left(2\pi e\,\sigma_I\sigma_Q\sqrt{1-\rho^2}\right).\tag{6}$$

Substituting $\sigma_I$, $\sigma_Q$, and $\rho$ of Eqs. (3) and (4) into Eq. (6), we can express the entropy as a function of $\alpha$, $\beta$ and $\sigma$; i.e.,

$$h(f)=\log\left(\pi e\,\sigma^2\sin 2\beta\right).\tag{7}$$

The contour plots of $\sigma_I$, $\sigma_Q$, $\rho$, and *h*(*f*), evaluated in the region {0 ≤ $\alpha$, $\beta$ ≤ $\pi$/4}, are depicted in Fig. 3. We notice from Eq. (7) and Fig. 3(d) that the entropy decreases as the noise cloud becomes more elliptic, but it is not affected by the tilt of the ellipse. This is expected, since the amount of uncertainty of a given noise cloud does not depend on its location or orientation in the complex plane. Entropy is an important measure because it is used to derive the bounds on reliable communication; the capacity of an additive noise channel is defined by the supremum of *h*(*g*) − *h*(*f*), where *h*(*g*) denotes the entropy of the received signal and *h*(*f*) is the entropy of the noise calculated in Eq. (7). This definition holds also for the CBGN model considered in this work because the differential entropy of the noise is independent of the transmitted signal even when the noise itself depends on it. Therefore, as a consequence of smaller *h*(*f*), more elliptic noise clouds lead to higher capacity of the CBGN channel. This does not imply that we should intentionally make the noise cloud more elliptic using ‘additional’ DSP algorithms to increase the channel capacity, since by the *data processing inequality* [19] any additional DSP can only preserve or decrease the information contained in the received signal; e.g., it may increase $\sigma$. Rather, it indicates that the DSP algorithms need to preserve possibly existing correlation information in the received signal for exploitation by optimized FEC decoders. Capacity-approaching FEC decoding should perform better for a more elliptic noise cloud but should not be affected by tilt.
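The tilt independence of the entropy can be checked numerically. The sketch below is an illustration only: it uses natural logarithms (nats), assumes principal-axis standard deviations σ cos β and σ sin β rotated by α, and all function names are hypothetical.

```python
import math


def cbgn_parameters(alpha, beta, sigma):
    """(sigma_I, sigma_Q, rho) for a cloud with principal stds sigma*cos(beta),
    sigma*sin(beta) whose first principal axis is rotated by alpha (assumption)."""
    var_u, var_v = (sigma * math.cos(beta)) ** 2, (sigma * math.sin(beta)) ** 2
    ca, sa = math.cos(alpha), math.sin(alpha)
    s_i = math.sqrt(var_u * ca ** 2 + var_v * sa ** 2)
    s_q = math.sqrt(var_u * sa ** 2 + var_v * ca ** 2)
    return s_i, s_q, (var_u - var_v) * sa * ca / (s_i * s_q)


def bivariate_gaussian_entropy(sigma_i, sigma_q, rho):
    """Differential entropy in nats: log(2*pi*e*sqrt(det Sigma))."""
    return math.log(2 * math.pi * math.e * sigma_i * sigma_q * math.sqrt(1 - rho ** 2))


def entropy_from_angles(beta, sigma):
    """Same entropy written with (beta, sigma) only; the tilt does not appear
    because a rotation leaves det Sigma = (sigma^2 sin(beta) cos(beta))^2 unchanged."""
    return math.log(math.pi * math.e * sigma ** 2 * math.sin(2 * beta))


if __name__ == "__main__":
    for alpha_deg in (0, 15, 30, 45):
        params = cbgn_parameters(math.radians(alpha_deg), math.radians(20), 0.4)
        print(alpha_deg, bivariate_gaussian_entropy(*params))
```

All four printed values agree with `entropy_from_angles(math.radians(20), 0.4)` up to rounding, while lowering β (a more elliptic cloud) lowers the entropy.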

#### 2.3. System model

In our previous work [16], we assumed two LDPC encoders generating codewords independently for the *I* and *Q* components of QPSK symbols. With that approach, the CBGN induced up to 3-dB SNR penalty in the decoding process compared to the AWGN. The worst case occurred when the noise was extremely tilted and elliptic such that its uncertainty concentrated on one of the two quadrature axes, and hence half of the signal power was wasted. Here, we investigate another coding scheme to resolve the ‘weakest link’ problem of [16]; we now use a single LDPC encoder to produce codewords and modulate two consecutive (odd and even) codeword bits onto a QPSK symbol in an interleaved manner, as illustrated in Fig. 4(a).

In this system model, upon observing the impaired signal constellation, the receiver performs digital signal processing (DSP) to isolate the four separable clouds (see Fig. 4(b)). The log-likelihood ratio (LLR) is then computed for each symbol based on its location and pdf, where the sign of the LLR indicates the HD estimate and the magnitude is the reliability of this HD estimate. In the following, two different approaches of LLR computation are studied: one computes the LLR based on the marginal pdfs (i.e., the pdfs integrated along the I and Q axes), and the other computes the LLR based on the joint pdf in the *I*–*Q* plane.

## 3. Performance analysis based on density evolution on the AWGN channel

At a fixed transmission rate, the Shannon limit of the AWGN channel is often expressed by the maximum noise power allowed for error-free transmission when unit signal power is provided. This ‘noise threshold’ *σ*^{*}, which is equivalent to the inverse SNR, can be obtained by the *density evolution* algorithm [20] when SD iterative message-passing decoding is used. The *belief-propagation* (BP) algorithm is one of the most powerful SD decoding algorithms to date [21–23], and its performance can be analyzed by density evolution. In this section, the influence of bivariate noise on SD-FEC is investigated using density evolution of LDPC codes under BP decoding.

As a first simple example illustrating density evolution, we consider a unit-power, binary phase-shift keyed (BPSK) signal with a symbol alphabet *c* ∈ {−1, +1}, transmitted over an AWGN channel with a zero-mean Gaussian noise distribution 𝒩(0, *σ*^{2}). Given that an amplitude *I* is observed at the receiver, the likelihood that *c* = +1 was transmitted is represented as the conditional probability *f*(*c* = +1|*I*). The LLR $\mathcal{L}(c|I)=\log\left[\frac{f(c=+1|I)}{f(c=-1|I)}\right]$ quantifies the log-ratio of probabilities that *c* = +1 was transmitted rather than *c* = −1, given the observation *I*. After computing the LLR for the received value *I*, we go with the (most likely) hypothesis that *c* = +1 was transmitted if $\mathcal{L}(c|I)$ is positive, and that *c* = −1 was transmitted otherwise. The reliability of this HD estimate is proportional to the absolute value of the LLR, $|\mathcal{L}(c|I)|$, and this is the key ingredient of the SD-FEC. By Bayes’ theorem, if *c* = +1 and *c* = −1 are equally probable,

$$\mathcal{L}(c|I)=\log\left[\frac{f(I|c=+1)}{f(I|c=-1)}\right]\tag{8}$$

$$\phantom{\mathcal{L}(c|I)}=\log\left[\frac{f(I;+1,\sigma^2)}{f(I;-1,\sigma^2)}\right]=\frac{2}{\sigma^2}I,\tag{9}$$

i.e., the LLR is linearly proportional to the observed amplitude *I* by a noise-dependent constant, where $f(I;\mu,\sigma^2)$ denotes the pdf of the Gaussian distribution with mean $\mu$ and variance $\sigma^2$. Likewise, the LLRs of a QPSK-modulated signal, for the two bits $c_{\text{odd}}$ and $c_{\text{even}}$ carried by the *I* and *Q* components of a unit-power symbol, are obtained as

$$\mathcal{L}(c_{\text{odd}}|I)=\frac{\sqrt{2}}{\sigma_I^2}I,\qquad \mathcal{L}(c_{\text{even}}|Q)=\frac{\sqrt{2}}{\sigma_Q^2}Q.\tag{10}$$

In a BP decoder, the estimation of a received signal is updated iteratively by interchanging LLR messages between bits and parity-checks within the code structure [24]. The decoding iterations are repeated until the updated bits constitute a legitimate codeword or the iteration limit is reached. The initial messages are computed purely based on the observation and therefore they are linearly proportional to the signal amplitude in the AWGN, as previously described. The *extrinsic information* is then generated in each round of iterations based on parity-check constraints and is used as intermediate messages that are combined with the *a priori information* (i.e., initial messages). Since this information combining leads to *a posteriori information*, the BP decoding is optimal if the code length is sufficiently large [25].
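The linear dependence of the initial messages on the received amplitude in AWGN can be verified directly from the ratio of the two Gaussian likelihoods. The snippet is a minimal sketch for equiprobable unit-power BPSK; the helper names `gaussian_pdf` and `bpsk_llr` are hypothetical.

```python
import math


def gaussian_pdf(x, mu, var):
    """Gaussian pdf with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def bpsk_llr(i, var):
    """LLR of c = +1 versus c = -1 for equiprobable BPSK over AWGN."""
    return math.log(gaussian_pdf(i, +1.0, var) / gaussian_pdf(i, -1.0, var))


def hard_decision(llr):
    """HD estimate from the sign of the LLR; |llr| is its reliability."""
    return +1 if llr > 0 else -1


if __name__ == "__main__":
    var = 0.5
    for i in (-1.5, -0.2, 0.3, 1.0):
        print(i, bpsk_llr(i, var), 2 * i / var, hard_decision(bpsk_llr(i, var)))
```

The second and third printed columns coincide, which is the closed form 2*I*/σ² for this channel.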

The density evolution algorithm assumes that only all-zero codewords are transmitted; i.e., all symbol alphabets are *c* = +1 for BPSK modulation, and hence the LLRs of all codeword bits should have positive sign in the absence of errors. For identical pdfs on both constellation points of BPSK, this is a legitimate approach due to the symmetry conditions of the underlying channel and decoding algorithm. As the codewords are corrupted by AWGN on the channel, the density evolution algorithm initializes the LLR messages with a Gaussian pdf centered at 2/*σ*^{2}. Then, by proper calculations imposed by the BP decoding rule, the algorithm keeps track of the evolution of the pdf of LLR messages passed during the BP decoding process as illustrated in Fig. 5(a). If the fraction of negative LLR messages *ε* approaches zero (e.g., less than 10^{−15}) after sufficiently many iterations, the density evolution algorithm claims that the transmitted codeword is recovered with probability 1, and thus the current noise variance is considered to allow for error-free decoding. The same procedure continues with increasing noise variance, and eventually the threshold *σ*^{*} is obtained when the first non-zero error probability is encountered. Figure 5(b) shows the flowchart of density evolution, where a sufficiently small *σ* should initially be input to the algorithm such that at least one trial satisfies the condition *ε* → 0. For the AWGN channel, the noise thresholds of various LDPC codes have been obtained in this way [22, 23].
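The threshold search can be illustrated with a much simpler channel than the one analyzed here. The sketch below deliberately swaps in the binary erasure channel (BEC), for which density evolution of a regular LDPC ensemble reduces to a scalar recursion, and it uses bisection rather than the step-wise increase of the noise level described above. It is a stand-in for the structure of the procedure only; the AWGN and CBGN cases require tracking full LLR densities as in [20]. Function names are hypothetical.

```python
def bec_density_evolution_converges(eps, dv, dc, max_iters=20000, target=1e-12):
    """Scalar density evolution on a BEC with erasure probability eps for a
    (dv, dc)-regular LDPC ensemble: x tracks the erasure probability of
    variable-to-check messages."""
    x = eps
    for _ in range(max_iters):
        x = eps * (1 - (1 - x) ** (dc - 1)) ** (dv - 1)
        if x < target:
            return True   # messages become error-free: decoding succeeds
    return False          # stuck at a non-zero fixed point (or too slow)


def find_threshold(dv, dc, lo=0.0, hi=1.0, tol=1e-4):
    """Bisect on the channel parameter; `lo` always converges, `hi` never does."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bec_density_evolution_converges(mid, dv, dc):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(find_threshold(3, 6))
    print(find_threshold(4, 20))
```

The role of the erasure probability here corresponds to the role of the noise standard deviation in the AWGN procedure: the threshold is the largest channel parameter for which the tracked error fraction still vanishes.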

## 4. SD-FEC for CBGN channels

#### 4.1. Decoding with marginal and joint probabilities

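The LLR results in a much more complicated expression in the CBGN channel than in the AWGN channel. If $f_i(I,Q)$ denotes the joint pdf of $\mathbf{R}_i$ in Eq. (5), the marginal pdf $f_i(I)$ (or $f_i(Q)$, respectively) is obtained by integrating $f_i(I,Q)$ over *Q* (or over *I*, respectively). Assuming Gray labeling for the QPSK modulation, with a 0 bit mapped to the positive half of the corresponding axis, the LLRs are derived from the single-argument marginal pdfs as

$$\mathcal{L}(c_{\text{odd}}|I)=\log\left[\frac{f_1(I)+f_4(I)}{f_2(I)+f_3(I)}\right],\qquad \mathcal{L}(c_{\text{even}}|Q)=\log\left[\frac{f_1(Q)+f_2(Q)}{f_3(Q)+f_4(Q)}\right],\tag{11}$$

where $c_{\text{odd}}$ and $c_{\text{even}}$ denote the odd and even bits of the transmitted codeword sequence, respectively. If we utilize the joint pdf, the LLRs are computed with two quadrature components as

$$\mathcal{L}(c_{\text{odd}}|I,Q)=\log\left[\frac{f_1(I,Q)+f_4(I,Q)}{f_2(I,Q)+f_3(I,Q)}\right],\qquad \mathcal{L}(c_{\text{even}}|I,Q)=\log\left[\frac{f_1(I,Q)+f_2(I,Q)}{f_3(I,Q)+f_4(I,Q)}\right],\tag{12}$$

in terms of the two-argument joint pdfs $f_i$. Note that if we use the conventional AWGN model, the marginal pdf-based and the joint pdf-based LLRs show no difference because both of them are eventually reduced to Eq. (10).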
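The two LLR computations can be compared in a few lines. The sketch below is illustrative and rests on assumptions that are not spelled out in the text: quadrants are numbered counter-clockwise starting from (+, +), a 0 bit maps to the positive half-axis, the four symbols are equiprobable, and the four clouds follow from the first-quadrant parameters (σ₁, σ₂, ρ) by π/2 rotations. All names are hypothetical.

```python
import math

A = 1 / math.sqrt(2)  # per-component amplitude of unit-power QPSK


def gauss(x, mu, s):
    return math.exp(-(x - mu) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))


def bigauss(i, q, mu_i, mu_q, s_i, s_q, rho):
    """Bivariate Gaussian pdf with means, standard deviations and correlation."""
    di, dq = (i - mu_i) / s_i, (q - mu_q) / s_q
    expo = -(di * di - 2 * rho * di * dq + dq * dq) / (2 * (1 - rho * rho))
    return math.exp(expo) / (2 * math.pi * s_i * s_q * math.sqrt(1 - rho * rho))


def clouds(s1, s2, rho):
    """(mu_I, mu_Q, sigma_I, sigma_Q, rho) for quadrants 1..4 (assumed symmetry)."""
    return [(+A, +A, s1, s2, +rho), (-A, +A, s2, s1, -rho),
            (-A, -A, s1, s2, +rho), (+A, -A, s2, s1, -rho)]


def marginal_llr_odd(i, s1, s2):
    """LLR of the odd bit from the I marginals only (correlation is ignored)."""
    f = [gauss(i, mu_i, s_i) for (mu_i, _, s_i, _, _) in clouds(s1, s2, 0.0)]
    return math.log((f[0] + f[3]) / (f[1] + f[2]))  # quadrants with I > 0 vs I < 0


def joint_llr_odd(i, q, s1, s2, rho):
    """LLR of the odd bit from the joint pdfs; depends on both I and Q.
    Far in the tails the pdfs can underflow; a log-domain version avoids this."""
    f = [bigauss(i, q, *p) for p in clouds(s1, s2, rho)]
    return math.log((f[0] + f[3]) / (f[1] + f[2]))


if __name__ == "__main__":
    print(marginal_llr_odd(0.3, 0.45, 0.15))
    print(joint_llr_odd(0.3, 0.9, 0.45, 0.15, 0.6), joint_llr_odd(0.3, 0.1, 0.45, 0.15, 0.6))
```

With equal variances and zero correlation both functions collapse to the linear AWGN expression; with correlated noise the joint LLR for the same *I* changes with *Q*, which is the information the marginal computation discards.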

#### 4.2. Density evolution on the CBGN channel

The threshold of the BP decoding on the CBGN channel is obtained using density evolution in this section, where only marginal pdf decoding is considered due to the complexity of joint pdf decoding. Assume FEC encoding with a code rate *R*, i.e., with a coding overhead (1 − *R*)/*R*. Then, given unit transmit signal power, the energy per information symbol is $E_s=1/R$ and the energy per information bit is $E_b=1/(2R)$ for QPSK. The two-sided noise power spectral density is given by ${N}_{0}/2={\sigma}_{I}^{2}={\sigma}_{Q}^{2}$, and the two-dimensional noise variance is ${\sigma}^{2}={\sigma}_{I}^{2}+{\sigma}_{Q}^{2}={N}_{0}$. We therefore have the SNR per bit as $E_b/N_0=1/(2R\sigma^2)$, and equivalently

$$\sigma^2=\frac{1}{2R\,(E_b/N_0)}.\tag{13}$$

Then $\sigma_1$ and $\sigma_2$ in Eq. (5) can be calculated from Eqs. (3) and (13) for given $E_b/N_0$ and *R*. Finally, the marginal pdf-based LLRs in Eq. (11) can be obtained using $\sigma_1$ and $\sigma_2$. Figure 6(a) shows these LLRs for several values of $\beta$ and at fixed $\alpha$ = 5.6°. It can clearly be seen that the LLRs are not proportional to the signal’s amplitude, as is the case for AWGN. The LLRs can even decrease with increasing signal amplitude for small $\beta$ in some regions to produce local minima. In this case, the function $\mathcal{L}(c|I)$ is a many-to-one function; e.g., when $\beta$ = 11.3°, multiple values of *I* ∈ [−1.11, −0.69] ∪ [0.69, 1.11] yield the same $\mathcal{L}(c|I)$ (the interval endpoints are subject to some quantization error). Given the pdf of the received amplitude *I*, the pdf of the many-to-one function $\mathcal{L}(c|I)$ is computed by [17]

$$f(\mathcal{L})=\sum_{k}\frac{f(I_k)}{\left|\mathrm{d}\mathcal{L}(c|I)/\mathrm{d}I\right|_{I=I_k}},\tag{14}$$

where the sum runs over all amplitudes $I_k$ that satisfy $\mathcal{L}(c|I_k)=\mathcal{L}$. Depicted in Fig. 6(b) are the resulting pdfs, where some abrupt peaks are observed for small $\beta$. The curve for $\beta$ = 11.3° shows two small spikes at $\mathcal{L}$ = −5.27 and −4.98 because multiple values of *I* contribute to $\mathcal{L}$ ∈ [−5.27, −4.98] but only one *I* contributes to $\mathcal{L}$ = −5.27^{−} and −4.98^{+}. For the same reason, the curve has another two spikes in the positive region at $\mathcal{L}$ = 4.98 and 5.27, where $f(\mathcal{L})$ = 3.86 and 8.17, respectively, which are not shown in the figure. When such a peaky pdf is initially fed to the density evolution algorithm, numerical problems occur and the procedure fails to provide an accurate noise threshold. This is because the density evolution operates not in a continuous space but in a discretized space [20] for implementation feasibility. Therefore, in this work, we only analyze angles that do not cause numerical problems; i.e., clouds having $\alpha$ and $\beta$ both smaller than 21° (extremely elliptic and tilted clouds are not considered).
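The conversion from SNR per bit to noise level, and the loss of monotonicity of the marginal LLR, can both be reproduced with a short script. This is a sketch under assumptions: the *I* marginal of a given odd bit is modeled as an equal mixture of two Gaussians with standard deviations σ₁ and σ₂ (which follows from π/2 rotational symmetry of the clouds), and the numerical values used below are arbitrary illustrations, not the parameters of Fig. 6. Function names are hypothetical.

```python
import math


def noise_std_from_ebn0(ebn0_db, rate):
    """Two-dimensional noise std for unit-power QPSK: sigma^2 = 1 / (2 R Eb/N0)."""
    ebn0 = 10 ** (ebn0_db / 10)
    return math.sqrt(1 / (2 * rate * ebn0))


def marginal_llr(i, s1, s2):
    """Marginal-pdf LLR of the odd bit for an equal mixture of stds s1 and s2."""
    a = 1 / math.sqrt(2)

    def g(x, mu, s):
        return math.exp(-(x - mu) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

    return math.log((g(i, a, s1) + g(i, a, s2)) / (g(i, -a, s1) + g(i, -a, s2)))


def count_decreasing_steps(s1, s2, lo=0.0, hi=1.5, n=600):
    """Number of grid steps on which the LLR decreases while I increases."""
    values = [marginal_llr(lo + (hi - lo) * k / n, s1, s2) for k in range(n + 1)]
    return sum(1 for u, v in zip(values, values[1:]) if v < u)


if __name__ == "__main__":
    print(noise_std_from_ebn0(4.0, 0.8))
    print(count_decreasing_steps(0.3, 0.3), count_decreasing_steps(0.05, 0.5))
```

Equal standard deviations give a strictly increasing (linear) LLR, whereas a strongly unequal pair produces a range of amplitudes over which the LLR falls, so that several amplitudes map to the same LLR value.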

The channel symmetry condition of BPSK-AWGN needs to be slightly modified to perform the density evolution algorithm for the QPSK-CBGN as follows: the channel is *output-symmetric*, i.e., *f*(*I* = *q*|*c*_{odd} = 0) = *f*(*I* = −*q*|*c*_{odd} = 1) = *f*(*Q* = *q*|*c*_{even} = 0) = *f*(*Q* = −*q*|*c*_{even} = 1). With this modification the behavior of the decoder can be predicted by transmission of all-zero codewords for the QPSK-CBGN. The other symmetry assumptions such as variable node symmetry and check node symmetry, and the ideal conditions including infinite code length, infinite message representation and infinite decoding time [22] are applicable without modification.

Figure 7 shows the noise threshold *σ*^{*} and the corresponding SNR per bit threshold ${E}_{b}/{N}_{0}^{*}$ obtained using discretized density evolution for the rate-0.8 (4, 20) regular LDPC code ensemble. The noise threshold varies depending on the degree distribution of the underlying code, and thus it is possible to find and use the optimal degree distribution having the maximum noise threshold. In this work, however, we exclude the effect of code choice and investigate only the threshold variation induced by the CBGN channel parameters, using the (4, 20)-regular degree distribution. In our experiments, an $E_b/N_0$ penalty of approximately 0.56 dB is observed to achieve error-free transmission under very elliptic and tilted noise distributions. Compared to the result presented in [16], in which penalties of more than 1.9 dB occur within the same range of $\alpha$ and $\beta$ given the same noise power, the interleaved transmission of codeword bits effectively relieves the weakest link problem. The channel parameters indicated by circle markers in Fig. 7 are used for Monte Carlo simulation, as will be described in Section 4.3 in more detail, to assess the performance of practical BP decoders; the corresponding noise clouds are shown in Fig. 8 for reference.

It can be seen from Fig. 7 that with marginal pdf decoding, the performance of SD-FEC is reduced as the noise clouds are tilted, while the entropy is independent of tilt as shown in Fig. 3(d). This is because the correlation information between the *I* and *Q* components of the signal is not utilized in the marginal pdf decoding. In order to approach capacity, the entire information contained in the observations must be exploited in the decoding process. Joint pdf decoding (i.e., decoding based on Eq. (12)) utilizes all this information, including the correlation of elliptic noise; e.g., consider a signal observed in the first quadrant of the leftmost constellation in Fig. 8(a). Here, the joint pdf *f*(*I*, *Q*) implies that *I* is more likely to be small when *Q* is large than when *Q* is small, whereas the marginal pdf *f*(*I*) produces the same probability of *I* regardless of *Q*. For the case of joint pdf decoding, the density evolution is even more complicated than that for marginal pdf decoding since the pdf of the LLR is now a two-argument function. Therefore, instead of using a complicated density evolution analysis, we demonstrate the superiority of joint pdf decoding empirically based on Monte Carlo simulations.

#### 4.3. Monte Carlo simulations

A rate-0.8 (4, 20) regular LDPC code is used for BP decoding Monte Carlo simulations. As in the density evolution, the all-zero codeword transmission facilitates performance estimation in our simulations. Given a unit-power all-zero codeword, noise is added randomly such that the received QPSK symbols form the proper elliptic clouds. The decoding simulations are repeated until 10^{7} bits are decoded at low SNR and until at least 100 post-FEC bit errors are gathered at high SNR.
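The stopping rule of such a simulation can be sketched independently of the decoder. The example below is a stand-in only: it measures the uncoded hard-decision BER of unit-power QPSK with independent Gaussian noise per component instead of running an LDPC decoder, and it interprets the rule as "keep simulating until both a minimum number of bits and a minimum number of errors have been reached", which is an assumption about the procedure. The name `estimate_uncoded_ber` and its parameters are hypothetical.

```python
import math
import random


def estimate_uncoded_ber(sigma_component, min_bits, min_errors, max_bits=10 ** 8, seed=0):
    """Monte Carlo BER estimate with a combined stopping rule.

    The loop runs until at least `min_bits` bits were simulated AND at least
    `min_errors` errors were counted; `max_bits` guards against endless runs
    when errors are extremely rare.
    """
    rng = random.Random(seed)
    a = 1 / math.sqrt(2)  # all-zero codeword: every bit is sent as +a
    bits = errors = 0
    while (bits < min_bits or errors < min_errors) and bits < max_bits:
        for _ in range(2):  # two bits (I and Q) per QPSK symbol
            bits += 1
            if a + rng.gauss(0.0, sigma_component) < 0:
                errors += 1
    ber = errors / bits if bits else 0.0
    return ber, bits, errors


if __name__ == "__main__":
    print(estimate_uncoded_ber(0.5, min_bits=20000, min_errors=100))
    print(estimate_uncoded_ber(0.5, min_bits=100, min_errors=100))
```

In the first call the bit budget determines the run length, as at low SNR; in the second the run continues beyond the bit budget until enough errors are collected, as at high SNR.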

The BER performance of marginal pdf BP decoding influenced by bivariate noise is illustrated in Fig. 9. It can be seen that marginal pdf decoding leads to poorer BER performance as *β* decreases (i.e., as the noise becomes more elliptic), as predicted by the density evolution. The SNR loss due to ellipticity exceeds 1 dB at a BER of 10^{−6}. Recall that the SNR loss predicted by density evolution is approximately 0.56 dB. The difference between analysis and simulation may stem from the fact that the code length is limited in practice, while the analysis assumes infinite code length. Tilted noise clouds also show a similar tendency of performance degradation. In Fig. 10, we compare the results of joint pdf BP decoding to those of marginal pdf decoding. For a tilt angle of *α* = 21°, more elliptic clouds (smaller *β*) lead to better joint pdf decoding performance, in contrast to the gradually worse performance of marginal pdf decoding. For the two decoding schemes, the gap between the SNRs required to attain a BER of 10^{−6} exceeds 3 dB in the extreme cases considered here. When the eccentricity *β* is fixed, a greater tilt does not deteriorate the joint pdf decoding performance. The simulation results of joint pdf decoding in Fig. 10 generally agree with the entropy contours in Fig. 3(d). This indicates that joint pdf decoding fully utilizes the information contained in the received signal. The post-FEC BER is shown in Fig. 11 as a function of pre-FEC BER. For the case of marginal pdf decoding, it can be seen that the post-FEC BER is (almost) a function of only the pre-FEC BER and is independent of tilt and eccentricity even if we employ SD-FEC. The joint pdf decoding, however, leads to significantly different post-FEC BER from the same pre-FEC BER depending on tilt and eccentricity. More specifically, AWGN-like noise clouds result in worse joint pdf decoding performance than the tilted elliptic noise clouds.
From this, we can deduce that the conventional system requirements specified by the pre-FEC BER are still valid for SD-FEC if the ‘FEC threshold’ is calculated for the AWGN channel and the noise is distributed with a family of Gaussian pdfs. But if the noise has correlation between its two quadrature components (or between amplitude and phase), this requirement is excessive and the fiber-optic system can be overdesigned.

## 5. Conclusion

The performance of fiber-optic communication systems has traditionally been characterized by measuring a certain level of pre-FEC BER and deducing error-free post-FEC performance under the assumption of the characteristics of certain FECs on the AWGN channel. This simple technique proved to be a powerful tool under the HD-FEC assumption, but as FECs are moving to soft decision, it can be insufficient. In this paper, we analyzed the SD-FEC for non-AWGN channels, specifically for bivariate Gaussian noise having tilted elliptical contours representing the noise pdf. We showed that the shape of the noise clouds significantly affects the system’s detection performance for SD decoding. We further gave some insights on how to improve SD-FEC performance for non-AWGN channels. In our system model, the encoded codeword bits are mapped to a modulated symbol in an interleaved manner such that the worse component of the observation does not dominate the system performance. With these noise and system models, we analyzed two different decoding schemes, one based on marginal pdfs and the other based on the full joint pdf of the received signal. Analyses and simulations showed that the decoding performance is significantly influenced by noise shapes and that the joint pdf BP decoding is vastly superior for elliptic and tilted noise; e.g., the joint pdf led to greater than 3 dB of coding gain in extreme cases compared to the marginal pdfs. A precise computation of the LLR is key to maximally utilizing information and thereby approaching channel capacity. Therefore, if an adaptive look-up table can be constructed to provide accurate LLRs for arbitrary noise clouds without relying on theoretic assumptions such as AWGN and CBGN, SD-FEC will perform best under any actual channel impairments. 
Our work suggests the need to redesign conventional DSP algorithms, which have been optimized for HD criteria, in a maximally information-preserving direction; conventional DSP algorithms may unintentionally shape the noise clouds into AWGN-like clouds, which destroys information useful for SD-FEC.

## Acknowledgments

This work was partially supported by the Seoul R&BD Program (WR080951) funded by Seoul Metropolitan Government. We acknowledge valuable discussions with S. Ten Brink, Xiang Liu, and S. Chandrasekhar.

## References and links

**1. **F. Chang, K. Onohara, and T. Mizuochi, “Forward error correction for 100 G transport networks,” IEEE Comm. Mag. **48**, S48–S55 (2010).

**2. **T. Mizuochi, Y. Miyata, K. Kubo, T. Sugihara, K. Onohara, and H. Yoshida, “Progress in soft-decision FEC,” in Proc. Optical Fiber Communication Conference (OFC’11), NWC2 (2011).

**3. **P. J. Winzer, M. Pfennigbauer, and R.-J. Essiambre, “Coherent crosstalk in ultra-dense WDM systems,” IEEE J. Lightwave Technol. **23**, 1734–1744 (2005).

**4. **ITU-T Recommendation G.975.1, 2004, Appendix I.9.

**5. **A. Carena, G. Bosco, V. Curri, P. Poggiolini, M. T. Taiba, and F. Forghieri, “Statistical characterization of PM-QPSK signals after propagation in uncompensated fiber links,” in Proc. European Conference on Optical Communication (ECOC’10), P4.07 (2010).

**6. **P. Poggiolini, A. Carena, V. Curri, G. Bosco, and F. Forghieri, “Analytical modeling of nonlinear propagation in uncompensated optical transmission links,” IEEE Photon. Technol. Lett. **23**, 742–744 (2011).

**7. **X. Liu, S. Chandrasekhar, P. J. Winzer, S. Draving, J. Evangelista, N. Hoffman, B. Zhu, and D. W. Peckham, “Single coherent detection of a 606-Gb/s CO-OFDM signal with 32-QAM subcarrier modulation using 4×480-Gsamples/s ADCs,” in Proc. European Conference on Optical Communication (ECOC’10), PD2.06 (2010).

**8. **X. Liu, S. Chandrasekhar, P. J. Winzer, B. Zhu, D. W. Peckham, S. Draving, J. Evangelista, N. Hoffman, C. J. Youn, Y. H. Kwon, and E. S. Nam, “3×485-Gb/s WDM transmission over 4800 km of ULAF and 12×4100-GHz WSSs using CO-OFDM and single coherent detection with 80-GS/s ADCs,” in Proc. Optical Fiber Communication Conference (OFC’11), JThA37 (2011).

**9. **C. Dorrer, “High-speed measurements for optical telecommunication systems,” IEEE J. Sel. Top. Quantum Electron. **12**, 843–858 (2006).

**10. **C. Xie, “Suppression of inter-channel nonlinearities in WDM coherent PDM-QPSK systems using periodic-group-delay dispersion compensators,” in Proc. European Conference on Optical Communication (ECOC’09), P4.08 (2009).

**11. **C. R. Doerr, L. Zhang, and P. J. Winzer, “Monolithic InP multiwavelength coherent receiver using a chirped arrayed waveguide grating,” J. Lightwave Technol. **29**, 536–541 (2011).

**12. **P. J. Winzer, A. H. Gnauck, C. R. Doerr, M. Magarini, and L. L. Buhl, “Spectrally efficient long-haul optical networking using 112-Gb/s polarization-multiplexed 16-QAM,” J. Lightwave Technol. **28**, 547–556 (2010).

**13. **A. H. Gnauck, P. J. Winzer, A. Konczykowska, F. Jorge, J.-Y. Dupuy, M. Riet, G. Charlet, B. Zhu, and D. W. Peckham, “Generation and transmission of 21.4-Gbaud PDM 64-QAM using a novel high-power DAC driving a single I/Q modulator,” J. Lightwave Technol. (to be published).

**14. **C. Xie and R.-J. Essiambre, “Electronic nonlinearity compensation in 112-Gb/s PDM-QPSK optical coherent transmission systems,” in Proc. European Conference on Optical Communication (ECOC’10), Mo.1.C.1 (2010).

**15. **C. Xie, “Impact of nonlinear and polarization effects on coherent systems,” in Proc. European Conference on Optical Communication (ECOC’11), We.8.B.1 (2011).

**16. **J. Cho, C. Xie, and P. J. Winzer, “Performance of soft-decision FEC in systems with bivariate Gaussian noise distributions,” in Proc. European Conference on Optical Communication (ECOC’11), to appear (2011).

**17. **A. Leon-Garcia, *Probability and random processes for electrical engineering*, 2nd ed. (Addison-Wesley, 1993).

**18. **N. A. Ahmed and D. V. Gokhale, “Entropy expressions and their estimators for multivariate distributions,” IEEE Trans. Inf. Theory **35**, 688–692 (1989).

**19. **T. M. Cover and J. A. Thomas, *Elements of Information Theory*, 2nd ed. (Wiley-Interscience, 2005).

**20. **S.-Y. Chung, G. D. Forney, Jr., T. Richardson, and R. Urbanke, “On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit,” IEEE Commun. Lett. **5**, 58–60 (2001).

**21. **R. G. Gallager, “Low-density parity-check codes,” IRE Trans. Inf. Theory **IT-8**, 21–28 (1962).

**22. **T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Trans. Inf. Theory **47**, 599–618 (2001).

**23. **T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes,” IEEE Trans. Inf. Theory **47**, 619–637 (2001).

**24. **F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inf. Theory **47**, 498–519 (2001).

**25. **R. M. Tanner, “A recursive approach to low complexity codes,” IEEE Trans. Inf. Theory **IT-27**, 533–547 (1981).