## Abstract

We study the performance of a simple optical receiver for use with polarization multiplexed transmission in the presence of polarization dependent loss (PDL). The receiver is based upon filtering of each of the channels with a polarizer that is orthogonal to the other channel, such that interference due to loss of orthogonality is avoided at the expense of a reduction in the detected signal to noise ratio. In spite of its simplicity, this receiver is shown to perform almost as well as the optimal maximum likelihood receiver, and much better than receivers that are based on conventional polarization splitting.

© 2009 Optical Society of America

## 1. Introduction

In modern optical communication systems, the combination of coherent transmission with advanced signal processing methods allows, in principle, complete elimination of distortions that are induced by unitary optical effects [1]. Indeed, in recent experiments, excellent suppression of the effects of chromatic dispersion and of polarization mode dispersion (PMD) using electronic post-processing, has been demonstrated [2],[3]. Polarization dependent loss (PDL) is a non-unitary phenomenon and therefore it cannot be eliminated by signal processing, even in principle. The impairments that this phenomenon introduces into optical systems are unavoidable and they set an ultimate limit to performance. The effect of PDL is particularly important in polarization multiplexed systems, where in addition to the distortion of the signal to noise ratio [4],[5], the orthogonality between the two launched optical signals is lost at the end of the system [6]. This loss of orthogonality leads to a severe increase in the complexity of the optimal receiver.

In a recent publication [7], the reduction in performance of coherent polarization multiplexed systems as a result of PDL was evaluated. There, the goal was to find the ultimate (i.e. smallest possible) performance penalty in the presence of PDL, and for that purpose an optimal receiver (one in which a maximum likelihood decision is made based on the joint detection of signals in both polarizations) had to be assumed. In practice, the implementation of such joint-detection receivers can be quite complicated. In the case of M-ary modulation on each of the two polarizations (one of *M* possible symbols is transmitted in each symbol duration), the overall number of constellation points is *M*^{2} and they reside in a 4-dimensional space (consisting of two quadratures in two polarizations). The optimal receiver needs to check the Euclidean distance between each of the *M*^{2} constellation points and the received sample in order to decide upon the symbol that is most likely to have been transmitted. At the symbol rates that are common in optical communications, such computational complexity at the receiver becomes prohibitive even for the moderate values of *M* that have recently become feasible in optical communications [8]. Therefore, more practical schemes for the detection of PDL distorted symbols are required.
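The difference in complexity between joint and disjoint detection can be illustrated with a minimal sketch. It assumes a generic 2×2 channel matrix `H` and PSK symbols; the function names are hypothetical and the noise is taken to be white, so that the Euclidean distance is the relevant metric. It is not the receiver implementation of [7].

```python
import itertools
import numpy as np

def psk(M):
    """Unit-energy M-ary PSK constellation."""
    return np.exp(2j * np.pi * np.arange(M) / M)

def joint_ml_detect(r, H, symbols):
    """Exhaustive joint decision for r = H @ a + white Gaussian noise.

    All M**2 candidate pairs (a1, a2) are tested and the pair whose
    noiseless image H @ a is closest to r (Euclidean distance) is returned.
    """
    best, best_dist = None, np.inf
    for a1, a2 in itertools.product(symbols, repeat=2):
        dist = np.linalg.norm(r - H @ np.array([a1, a2]))
        if dist < best_dist:
            best, best_dist = (a1, a2), dist
    return best

def disjoint_detect(r_i, h_i, symbols):
    """Per-channel decision on a scalar sample r_i = h_i * a_i + noise.

    Only M candidates are tested for each polarization channel.
    """
    return symbols[np.argmin(np.abs(r_i - h_i * symbols))]
```

For QPSK the joint search visits 16 candidates per symbol pair, while two disjoint receivers visit 4 each; for 64-ary modulation the corresponding numbers are 4096 and 64.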

In this paper we consider the performance of a practical detection scheme for polarization multiplexed systems in the presence of PDL, [9],[10]. This scheme is illustrated in Fig. 1 and it is based on disjoint detection of the two polarization signals. Thus, the complexity of each receiver scales like *M* as opposed to *M*^{2} in the optimal receiver case. The considered scheme is based on splitting the received light with an ordinary 3dB splitter, such that each of the identical splitter outputs reaches its own receiver after passing through an appropriately aligned polarizer. In order to detect signal #1 in the top receiver, the polarizer preceding it is aligned to be orthogonal to signal #2 so that interference caused by loss of orthogonality is avoided. Similarly, the bottom polarizer is aligned so as to reject signal #1. Since one may legitimately assume that the system is limited by optical noise and not by the noise of the electronic receivers, the 3dB splitting has no effect on performance. In what follows we assess the PDL induced penalty in systems using the detection scheme of Fig. 1 and show that its inferiority to the optimal (joint detection) receiver is almost unnoticeable with relevant levels of average link PDL. We also illustrate the large advantage of the scheme in Fig. 1 relative to the brute-force detection approach exercised in systems today. There, interference between the two polarizations is not prevented and the penalties are considerably higher than what can be achieved with the type of detection considered in this paper. Note that the implementation of the scheme in Fig. 1 can also be done with advanced signal processing as described in [11].

## 2. Theory

#### 2.1. The general case

Two independent streams of digital information are launched into the fiber over two orthogonal polarizations of light. The complex envelope of the electric field entering the fiber is given by

$$\underline{s}(t)=\sqrt{\xi}\sum_{k}\left(a_{k,1}\,\underline{e}_{1}+a_{k,2}\,\underline{e}_{2}\right)f(t-kT),\tag{1}$$

where $\underline{e}_{1}$ and $\underline{e}_{2}$ are two orthogonal unit Jones vectors representing the launch states of polarization. The terms $a_{k,1}$ and $a_{k,2}$ are complex numbers that represent the digital information carried by the $k$-th symbol in each of the two polarization channels. The pulse shape is given by $f(t)$, $T$ is the symbol duration and $\xi$ is the pulse energy. In order to avoid overburdening the notation, we ignore for now possible differences in optical phase and timing between the two streams of data, as such differences are immaterial in digital systems of the kind considered here. The signal emanating from the receiver end of the fiber-optic link is given by

$$\underline{r}(t)=\mathbf{T}_{0}\,\underline{s}(t)+\sum_{j=1}^{N_{s}}\mathbf{T}_{j}\,\underline{n}^{(j)}(t),\tag{2}$$

where $\mathbf{T}_{j}$ is a $2\times 2$ complex matrix that describes the transmission from the output of the $j$-th amplifier to the end of the last span. Consequently, $\mathbf{T}_{0}$ is the transmission matrix from the transmitter side to the end of the entire link. In Eq. (2) and throughout this manuscript, we use underlined letters to denote Jones vectors (as in $\underline{s}(t)$) and boldface letters to denote transfer matrices. The terms $\underline{n}^{(j)}$ denote Jones vectors representing the noise of the inline optical amplifiers. The two components of $\underline{n}^{(j)}$ are statistically independent complex, circular white Gaussian noise processes, whose power densities are equal to $P_{n}/(2N_{s})$, with $N_{s}$ being the overall number of spans. Thus, the total accumulated noise power density at the end of the entire link, in the absence of PDL, would be equal to $P_{n}$. Note that each noise contribution is propagated through the remaining part of the system, as indicated by the multiplication of the noise vectors by the matrices $\mathbf{T}_{j}$. In the process of being transmitted through the system, the polarization states of the two data-carrying channels change from $\underline{e}_{1}$ and $\underline{e}_{2}$ into the unit vectors

$$\underline{e}_{i,out}=\frac{\mathbf{T}_{0}\,\underline{e}_{i}}{\left|\mathbf{T}_{0}\,\underline{e}_{i}\right|},\qquad i=1,2,\tag{3}$$

which, owing to PDL, are in general no longer orthogonal to each other. The top polarizer in Fig. 1 is aligned along $\underline{p}_{1}=\underline{e}_{2,out}^{\perp}$ (i.e. orthogonal to the received polarization of signal #2), whereas the bottom polarizer is aligned along $\underline{p}_{2}=\underline{e}_{1,out}^{\perp}$, so that each polarizer blocks the interfering waveform from reaching the receiver. Denoting, as in [7], the receiver filter by $h(t)$, and assuming synchronous sampling at the instants $t=kT$, we find that the decision variables obtained by the two receivers are the two components of the vector $\underline{r}_{k}=(r_{k,1},r_{k,2})^{T}$, with $r_{k,i}=\int_{T}\left[\underline{p}_{i}^{\dagger}\,\underline{r}(t)\right]h(-t)\,\mathrm{d}t$, which after substitution of $\underline{r}(t)$ from Eq. (2), are given by

$$r_{k,i}=\sqrt{\xi}\,\kappa\,a_{k,i}\,\underline{p}_{i}^{\dagger}\mathbf{T}_{0}\,\underline{e}_{i}+\underline{p}_{i}^{\dagger}\sum_{j=1}^{N_{s}}\mathbf{T}_{j}\,\underline{n}^{(j)},\tag{4}$$

with $i=1$ and $2$ corresponding to the top and bottom receivers of Fig. 1, respectively. Additionally in Eq. (4), $\underline{n}^{(j)}=\int_{T}\underline{n}^{(j)}(t)\,h(-t)\,\mathrm{d}t$ is a Gaussian vector whose components are independent identically distributed circular Gaussian random variables with variance equal to $P_{n}/(2N_{s})$. The term $\kappa$ is the overlap integral $\kappa=\int_{T}f(t)\,h(-t)\,\mathrm{d}t$ and it assumes its maximum value of 1 for the case of matched filtering, i.e. when $h(t)=f(-t)$. In order to simplify the notation, the index $k$, denoting the number of the symbol that is being detected, will be omitted in what follows. The statistical properties of the noise are represented by its coherency matrix, which can be expressed in the form

$$\Lambda_{n}=\frac{P_{n}}{2N_{s}}\sum_{j=1}^{N_{s}}\mathbf{T}_{j}\mathbf{T}_{j}^{\dagger}=\frac{P_{n}}{2}\,g'\left(1+\vec{\Gamma}'\cdot\vec{\sigma}\right),\tag{5}$$

where $\vec{\sigma}$ is the vector of Pauli matrices.

This has the form of the coherency matrix of unpolarized noise of power density $P_{n}$, passed through a system with polarization independent loss $g'$ and with a (backward [4]) PDL vector $\vec{\Gamma}'$. Thus, these two quantities $g'$ and $\vec{\Gamma}'$ are the effective PDL parameters experienced by the overall noise contribution. It is easy to show that $\vec{\Gamma}'$ is what would be registered as the Stokes vector of the noise when measured by a standard polarization analyzer, and that its modulus is the noise degree of polarization. The details of how $g'$ and $\vec{\Gamma}'$ are related to the local PDL parameters are given in [7]. Note that for all practical considerations, the noise vector $\sum_{j=1}^{N_{s}}\mathbf{T}_{j}\,\underline{n}^{(j)}$ is exactly equivalent to $\Lambda_{n}^{1/2}\,\underline{\tilde{n}}$, with $\underline{\tilde{n}}$ denoting a normalized random Gaussian vector with two independent components whose variances are equal to 1. In order to assess the performance of the system, one needs to determine the variance of the noise term appearing in Eq. (4). Straightforward substitution suggests that the noise variance in (4) is given by $\langle\underline{\tilde{n}}^{\dagger}\Lambda_{n}^{1/2}\,\underline{p}_{i}\,\underline{p}_{i}^{\dagger}\Lambda_{n}^{1/2}\,\underline{\tilde{n}}\rangle$, where the angled brackets denote ensemble averaging. After some algebraic manipulation, whose details are given in Appendix B, we find that

$$\left\langle\underline{\tilde{n}}^{\dagger}\Lambda_{n}^{1/2}\,\underline{p}_{i}\,\underline{p}_{i}^{\dagger}\Lambda_{n}^{1/2}\,\underline{\tilde{n}}\right\rangle=\frac{P_{n}}{2}\,g'\left(1+\vec{\Gamma}'\cdot\hat{p}_{i}\right),\tag{6}$$

where $\hat{p}_{i}$ is the unit Stokes vector corresponding to the transmitting axis of the polarizer, whose unit Jones vector was earlier denoted by $\underline{p}_{i}$. In the present scheme, using the definitions following Eq. (3), $\hat{p}_{1}=-\hat{e}_{2,out}$ and $\hat{p}_{2}=-\hat{e}_{1,out}$. The normalized decision vector can now be expressed as in Eq. (7), with $\underline{\tilde{n}}$ being a standard Gaussian vector with statistically independent zero-mean and unit-variance components, and with the components of $\underline{\tilde{a}}=(\tilde{a}_{1},\tilde{a}_{2})^{T}$ being given by Eq. (8). The PDL induced penalty is quantified by the reduction in the pulse energy $\xi$ (equivalent to a reduction in the OSNR) to which it is equivalent. For each of the two receivers ($i=1,2$), this quantity is given by the absolute square value of the term appearing in parentheses in Eq. (8). Following some algebraic manipulations, the equivalent OSNR penalty in the $i$-th receiver, $\eta_{i}$, is given by Eq. (9),

where $\hat{e}_{i}$ is the unit Stokes vector that corresponds to the unit Jones vector $\underline{e}_{i}$. The transition to the right-most form on the right-hand side of Eq. (9) involves some intense use of Pauli matrix algebra, but its correctness can be tested on an explicit choice of transfer matrix $\mathbf{T}_{0}$ (yielding $\vec{\Gamma}_{0}$) and launch polarizations $\underline{e}_{1,2}$. A reasonable definition for the penalty of the entire link is the worse of the penalties of the two channels, namely $\eta=\min\{\eta_{1},\eta_{2}\}$. The presence of PDL is manifested in the value of $\eta$ being smaller than unity. To see that, note that when there is no PDL, $g'=g_{0}=1$, $\vec{\Gamma}_{0}=\vec{\Gamma}'=0$ and $\hat{p}_{i}=\hat{e}_{i,out}$. The PDL induced penalty in decibels is given by $\eta_{dB}=-10\log_{10}(\eta)$. Equation (9) is very general in the sense that it does not assume anything regarding the magnitude of PDL in the link, the distribution of the PDL and the way in which it accumulates. Nor does it assume anything about the modulation format that is being transmitted. It does not even rely on whether coherent or incoherent detection is used. When modeling a system numerically, a set of local PDL vectors is generated, leading to a unique set of transfer matrices $\mathbf{T}_{j}$ with $j=0$ to $j=N_{s}$, from which the overall PDL parameters $\vec{\Gamma}_{0}$, $\vec{\Gamma}'$, $g'$ and $g_{0}$ can be readily obtained as described in [7].
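The two-polarizer alignment can be tested numerically on an explicit transfer matrix. The following is a minimal sketch, assuming a single Hermitian PDL element of the form $\exp(-0.5\,\vec{\alpha}\cdot\vec{\sigma})$ as the link matrix; the function names are hypothetical. It checks that each polarizer nulls the interfering channel, and compares the detected signal power with the closed form $\det(\mathbf{T}_{0}^{\dagger}\mathbf{T}_{0})/(\underline{e}^{\dagger}\mathbf{T}_{0}^{\dagger}\mathbf{T}_{0}\,\underline{e})$, where $\underline{e}$ is the launch vector of the rejected channel. That closed form follows from elementary two-dimensional linear algebra and is not quoted from Eq. (9).

```python
import numpy as np

# Pauli matrices in the ordering commonly used with Stokes vectors
SIGMA = np.array([[[1, 0], [0, -1]],
                  [[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]]], dtype=complex)

def pdl_element(alpha):
    """Matrix exp(-0.5 * alpha . sigma) for a real 3-vector alpha."""
    a = np.linalg.norm(alpha)
    if a == 0.0:
        return np.eye(2, dtype=complex)
    n_dot_sigma = np.tensordot(np.asarray(alpha) / a, SIGMA, axes=1)
    # (n . sigma)^2 = 1, so the exponential reduces to cosh/sinh terms
    return np.cosh(a / 2) * np.eye(2) - np.sinh(a / 2) * n_dot_sigma

def orthogonal(v):
    """Unit Jones vector orthogonal to v."""
    w = np.array([-np.conj(v[1]), np.conj(v[0])])
    return w / np.linalg.norm(w)

def two_polarizer_gains(T0, e1, e2):
    """Signal power and residual leakage amplitude in each receiver."""
    p1 = orthogonal(T0 @ e2)  # top polarizer rejects channel 2
    p2 = orthogonal(T0 @ e1)  # bottom polarizer rejects channel 1
    signal = (abs(np.vdot(p1, T0 @ e1)) ** 2, abs(np.vdot(p2, T0 @ e2)) ** 2)
    leak = (abs(np.vdot(p1, T0 @ e2)), abs(np.vdot(p2, T0 @ e1)))
    return signal, leak

def closed_form_signal(T0, e_rejected):
    """det(T0^† T0) / (e^† T0^† T0 e), with e the rejected channel's launch vector."""
    G = T0.conj().T @ T0
    return np.linalg.det(G).real / np.vdot(e_rejected, G @ e_rejected).real
```

The leakage vanishes to machine precision for any invertible `T0`, which is the sense in which interference is traded for a reduction in the detected signal power.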

The scheme represented in Fig. 1 is implemented entirely in the optical domain and, apart from polarization tracking for the identification of signal polarizations, which is done in all polarization multiplexed schemes, it requires no particular electronic processing. This is in contrast to the case of optimal (joint) detection, which can only be implemented in the electronic domain. Nevertheless, it is quite obvious that this very scheme (of Fig. 1) can be readily implemented in the electrical domain, whenever the hardware of coherent detection is in place. Then, it can be naturally integrated with the powerful signal processing tools that are used for the elimination of accumulated chromatic dispersion and PMD.

#### 2.2. Practical range of average PDL values

A significant simplification of the general result (9) follows once we concentrate on typical values of average PDL in links. Reasonable, or acceptable, PDL values are those for which the occurrence of performance (Q-factor, or eye) penalties much in excess of ∼2 dB is unlikely. More specifically, "unlikely" means that its probability is lower than the allowed outage probability, which is usually quoted as being equal to $4\times 10^{-5}$. As was shown in [7], this requirement is flagrantly violated once the average link PDL exceeds quite moderate values of the order of 2 dB. Below this level, using the small PDL approximation (see appendix A), we expand (9) to second order in the PDL vectors and obtain Eq. (10), in which $(\vec{\Gamma}'-\vec{\Gamma}_{0})^{\perp}$ is the component of the Stokes vector $(\vec{\Gamma}'-\vec{\Gamma}_{0})$ that is orthogonal to the Stokes vector $\hat{e}_{i}$ representing the launched state of polarization. It is also shown in appendix A that $(1-g')$ is of second order in the values of the local vectors of PDL. Notice that apart from the term $\left((\vec{\Gamma}'-\vec{\Gamma}_{0})^{\perp}\right)^{2}$, Eq. (10) is identical to the equivalent expression for the penalty in the optimal detection case [7],[13]. Since this term is always positive, $\eta$ in our sub-optimal case is always smaller than it is in the optimal detection case [7], as expected. Notice however that this difference is only of second order in the PDL parameters. We may therefore conclude that the scheme presented in Fig. 1 constitutes optimal detection to first order in the PDL parameters. As we show in the simulations that follow, the lack of optimality is practically unnoticed except for very (unrealistically) high values of average PDL.
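The structure of the second-order expansion can be illustrated with a short worked derivation. It is a sketch under stated assumptions and not a reproduction of Eqs. (9) and (10): it takes $g_{0}=1$ and $\mathbf{T}_{0}^{\dagger}\mathbf{T}_{0}=1+\vec{\Gamma}_{0}\cdot\vec{\sigma}$ (as in appendix A), a noise variance proportional to $g'(1+\vec{\Gamma}'\cdot\hat{p}_{i})$, and the first-order relation $\hat{e}_{i,out}\simeq\hat{e}_{i}+\vec{\Gamma}_{0}^{\perp}$ in the rotating reference frame. The normalization is chosen such that $\eta_{1}=1$ in the absence of PDL.

$$
\begin{aligned}
\eta_{1}&=\frac{\left|\underline{p}_{1}^{\dagger}\mathbf{T}_{0}\,\underline{e}_{1}\right|^{2}}{g'\left(1+\vec{\Gamma}'\cdot\hat{p}_{1}\right)}
=\frac{1-\Gamma_{0}^{2}}{g'\left(1-\vec{\Gamma}_{0}\cdot\hat{e}_{1}\right)\left(1+\vec{\Gamma}'\cdot\hat{p}_{1}\right)},\\
\hat{p}_{1}&=-\hat{e}_{2,out}\simeq\hat{e}_{1}-\vec{\Gamma}_{0}^{\perp},\\
\eta_{1}&\simeq 1+(1-g')+(\vec{\Gamma}_{0}-\vec{\Gamma}')\cdot\hat{e}_{1}+\vec{\Gamma}'\cdot(\vec{\Gamma}'-\vec{\Gamma}_{0})-\left[(\vec{\Gamma}'-\vec{\Gamma}_{0})^{\perp}\right]^{2}.
\end{aligned}
$$

Under these assumptions the only first-order term is $(\vec{\Gamma}_{0}-\vec{\Gamma}')\cdot\hat{e}_{1}$, and the last term enters with a negative sign, in line with the statement that the two-polarizer scheme departs from optimal detection only at second order.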

In what follows we approximate the distribution of $\eta$ based on the first order expansion with respect to PDL, but similarly to [7], we assess the center of the distribution based on the zeroth and second orders in (10). As we demonstrate later by comparison with simulations, this approximation for the distribution of $\eta$ gives very accurate results in the relevant range of parameters. Also, starting from this point in our derivation, and consistent with most of the existing analytical PDL literature, we will assume that many independent elements contribute to the overall PDL of the link [7]. In this case the local PDL vectors, and to first order also the vectors $\vec{\Gamma}_{0}$ and $\vec{\Gamma}'$, are Gaussian distributed, and the distribution of $\eta$ looks like the left wing of a Gaussian function (see appendix A and [7] for details). This distribution is given by Eq. (11), with $m_{r}$ and $\sigma_{r}^{2}$ being given by Eqs. (12) and (13), and with $\langle\rho\rangle$ denoting the total average PDL of the link in decibels. The probability for any prescribed PDL induced penalty is then given in terms of the Gaussian Q-function.

#### 2.3. Comparison with the brute force approach

In order to appreciate the scheme presented in Fig. 1, we compare it with the brute force approach where a single polarization beam splitter (PBS) is used to separate the incoming light into two receivers that decide (separately) on the identity of the incoming symbols. Since the two incoming channels are no longer orthogonally polarized, they leak into each other's receivers, causing interference. The amount of interference depends on the way in which the PBS is aligned. In a practical situation, a polarization controller preceding the PBS will be used in order to optimize the performance of the two channels. In the analysis that follows we will assume that the PBS is aligned symmetrically with respect to the two channels, namely, such that they suffer from the same amount of interference [10]. We claim this to be a characteristic, though not necessarily the absolutely optimal, orientation in all scenarios. The unit Stokes vectors corresponding to the two orthogonal output states of polarization from the PBS are $\pm\hat{p}$, where the symmetry condition presented above implies that $\hat{p}\cdot\hat{e}_{1,out}=(-\hat{p})\cdot\hat{e}_{2,out}$, so that the same level of interference is observed in the two channels. The vector $\hat{p}$ can be expressed explicitly as

$$\hat{p}=\frac{\hat{e}_{1,out}-\hat{e}_{2,out}}{\left|\hat{e}_{1,out}-\hat{e}_{2,out}\right|}.\tag{15}$$

This expression is well defined unless $\hat{e}_{1,out}=\hat{e}_{2,out}$, namely, when the communications link functions as a perfect polarizer. Using the same methods as we used in section 2.1, we may write the normalized (to the standard deviation of the noise) decision variables of the two receivers as in Eq. (16), where $\tilde{a}_{j}$ is the signal term and $\tilde{b}_{j}$ is the interference from the other channel. When the interference is small ($|\tilde{b}_{j}|\ll|\tilde{a}_{j}|$), the penalty that it causes is linearly dependent upon the ratio $(\tilde{b}_{j}/\tilde{a}_{j})$. The term $\tilde{a}_{j}$ is given by expression (8), and using similar methods to those that we used before, we obtain Eq. (17) for the interference affecting the first receiver. Notice first that the interference is of first order in the component of $\vec{\Gamma}_{0}$ that is orthogonal to the launch polarizations $\hat{e}_{1,2}$. As a result, the expected power penalty (due to the interference) should also be of first order in the magnitude of the PDL vector. This is in contrast to the two-polarizers scheme of Fig. 1, where we found that the penalty with respect to optimal detection is of second order in $\vec{\Gamma}_{0}$. The consequences of this difference will be seen in the numerical study in the section that follows. Next, notice the presence of the ratio $(a_{2}/a_{1})$, which represents the dependence of the interference on the data symbols that are transmitted in the two channels. Clearly, this term is also responsible for the dependence of the interference penalty on the kind of modulation format that is used. For example, in the case of phase modulated transmission, $(a_{2}/a_{1})$ is a unit modulus pure phase factor. It may assume various phases depending on the constellation that is used. For example, in the case of QPSK, this term is given by $\exp(ik\pi/2)$ with $k=0,1,2$ or $3$. Finally, the term $\exp(i\Delta\phi)$ was added in order to account for a possible difference in the optical phase that is acquired by the two channels. This phase can be caused by various factors in the transmitters, line and receiver and it should be treated as a random variable uniformly distributed between 0 and $2\pi$. Note that for rich M-ary PSK constellations, $\Delta\phi$ plays no role in determining the performance because almost all possible phases follow from the data-dependent part $(a_{2}/a_{1})$.
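The symmetric PBS alignment can be sketched in Stokes space. The example below assumes that the PBS axis is taken along the normalized difference of the two received Stokes vectors, which satisfies the symmetry condition $\hat{p}\cdot\hat{e}_{1,out}=(-\hat{p})\cdot\hat{e}_{2,out}$; the function names are hypothetical, and the transmitted power fraction $\tfrac{1}{2}(1+\hat{p}\cdot\hat{s})$ is the standard expression for a fully polarized field passing an ideal polarizer.

```python
import numpy as np

def stokes(e):
    """Unit Stokes vector of a Jones vector e (normalized internally)."""
    e = np.asarray(e, dtype=complex)
    e = e / np.linalg.norm(e)
    cross = np.conj(e[0]) * e[1]
    return np.array([abs(e[0]) ** 2 - abs(e[1]) ** 2,
                     2 * cross.real,
                     2 * cross.imag])

def symmetric_pbs_axis(s1_out, s2_out):
    """PBS axis p such that p . s1_out == (-p) . s2_out."""
    d = s1_out - s2_out
    norm = np.linalg.norm(d)
    if norm == 0.0:
        raise ValueError("both channels arrive in the same polarization")
    return d / norm

def port_fraction(p_hat, s_hat):
    """Fraction of a fully polarized field's power passed by a polarizer along p_hat."""
    return 0.5 * (1.0 + np.dot(p_hat, s_hat))
```

With this alignment, channel 1 couples into the $+\hat{p}$ port with `port_fraction(p, s1)`, and channel 2 leaks into the same port with `port_fraction(p, s2)`; the latter vanishes only when the received polarizations remain orthogonal.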

## 3. Simulations

We model a link of $N_{s}=10$ amplified spans of fiber, each represented by a local PDL vector $\vec{\alpha}$, whose three components are taken from a zero mean Gaussian distribution. The variance of these components is taken to be such that the overall average PDL of the link equals the prescribed value according to Eq. (14). The assumption of Gaussianity is a common one in analytical studies of PDL as well as in those of PMD and it is justified when the overall PDL phenomenon consists of multiple independent contributions in each span. While this assumption affects the absolute numerical values of penalties appearing in our results, it is immaterial for the comparison between the various schemes. The local PDL vectors are used to generate the transfer matrices of the individual spans $\mathbf{M}_{j}=\exp(-0.5\,\vec{\alpha}_{j}\cdot\vec{\sigma})$ ($j=0$ to $N_{s}-1$). Since we assume the negligibility of dispersive effects and concentrate on the evaluation of penalty due to the noise and interference effects of PDL, we need not consider the spectral dependence of the transfer matrices in our simulations. The matrices $\mathbf{T}_{j}$ of Eq. (4) are given by $\mathbf{T}_{j}=\prod_{k=j}^{N_{s}-1}\mathbf{M}_{k}$, with $\mathbf{T}_{N_{s}}$ being the identity matrix (because the last amplifier is adjacent to the receiver and hence the noise generated by it sees no PDL), from which the vectors $\vec{\Gamma}_{0}$ and $\vec{\Gamma}'$ are obtained as described in appendix A. In addition to the transfer matrices, we need to define a pattern of power equalization that is consistent with actual systems. We assume that the inline amplifiers operate in the constant total output power mode. In this case, assuming that a large number of wavelength multiplexed channels is transmitted in the system, fixing the overall power to a constant value at every span is nearly equivalent to stating that the polarization averaged loss from the link input to the output of any given amplifier is equal to unity. This form of equalization sets $g_{0}$ to unity and uniquely defines the value of $g'$. Since we are dealing with additive Gaussian noise channels, the exact symbol-error probability can be extracted rigorously for every given realization and launched symbol combination. The overall symbol-error probability $P_{e}$ is obtained, for each link realization, by averaging the error probabilities over all the possible symbol identities in the transmitted channels. From the error probability, we obtain the Q-factor, as usual, by inverting the relation $P_{e}=(2\pi)^{-1/2}\int_{Q}^{\infty}\exp(-x^{2}/2)\,\mathrm{d}x$. The Q-penalty is defined by the ratio between the value of Q in the presence of PDL and the value of Q in its absence, expressed in dB. This penalty value is what we compare with the quantity $\eta_{dB}$ introduced in the analysis. We choose the signal to noise ratio such that in the absence of PDL the symbol error probability is $P_{e}=10^{-4}$, which is a typical value for the raw error probability in modern communications schemes. Whenever there is a visible difference between modulation formats (as in the brute-force implementation) we display the results for both 2-PSK and QPSK modulation. Otherwise, only a single curve is shown.
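The link model described above can be sketched as follows. This is a minimal illustration rather than the simulation code behind the figures. It assumes that span $k$ acts after span $k-1$, that equalization is applied by rescaling the cumulative matrix after every span to unit polarization-averaged gain, and that the penalty of each channel is the ratio of detected signal power to noise variance, normalized to 1 in the absence of PDL. The parameter `sigma_alpha` and all function names are hypothetical; the mapping from `sigma_alpha` to the average link PDL is not reproduced here.

```python
import numpy as np
from statistics import NormalDist

SIGMA = np.array([[[1, 0], [0, -1]],
                  [[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]]], dtype=complex)

def pdl_element(alpha):
    """Span matrix exp(-0.5 * alpha . sigma)."""
    a = np.linalg.norm(alpha)
    if a == 0.0:
        return np.eye(2, dtype=complex)
    n_dot_sigma = np.tensordot(np.asarray(alpha) / a, SIGMA, axes=1)
    return np.cosh(a / 2) * np.eye(2) - np.sinh(a / 2) * n_dot_sigma

def orthogonal(v):
    """Unit Jones vector orthogonal to v."""
    w = np.array([-np.conj(v[1]), np.conj(v[0])])
    return w / np.linalg.norm(w)

def equalize(J):
    """Rescale J so that 0.5 * Tr(J^† J), the polarization-averaged gain, is 1."""
    return J / np.sqrt(0.5 * np.trace(J.conj().T @ J).real)

def link_penalties(alphas, e1, e2):
    """Two-polarizer penalties (eta_1, eta_2) for one link realization."""
    n_spans = len(alphas)
    J = [np.eye(2, dtype=complex)]  # J[j]: link input -> output of amplifier j
    for alpha in alphas:
        J.append(equalize(pdl_element(alpha) @ J[-1]))
    T0 = J[n_spans]
    # T_j: output of amplifier j -> end of link (the last one is the identity)
    T = [T0 @ np.linalg.inv(J[j]) for j in range(1, n_spans + 1)]
    # Noise coherency matrix in units of P_n / 2
    coherency = sum(Tj @ Tj.conj().T for Tj in T) / n_spans
    etas = []
    for e_sig, e_int in ((e1, e2), (e2, e1)):
        p = orthogonal(T0 @ e_int)  # polarizer rejecting the interferer
        signal = abs(np.vdot(p, T0 @ e_sig)) ** 2
        noise = np.vdot(p, coherency @ p).real
        etas.append(signal / noise)
    return tuple(etas)

def q_factor(pe):
    """Invert P_e = (2*pi)^(-1/2) * integral_Q^inf exp(-x^2/2) dx."""
    return NormalDist().inv_cdf(1.0 - pe)
```

A realization could be drawn with `alphas = sigma_alpha * np.random.default_rng().standard_normal((10, 3))`, and the link penalty taken as `min(link_penalties(alphas, e1, e2))`; repeating this over many realizations gives an empirical penalty distribution.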

Figure 2(a) shows the cumulative distribution of the PDL induced penalty corresponding to the two-polarizers scheme of Fig. 1, together with the penalty in the case of the optimal receiver obtained in [7]. The results correspond to an average PDL of 1 dB, 2 dB and 3 dB. The horizontal line shown in the figure indicates the (commonly quoted) maximum outage probability of $4\times 10^{-5}$. The penalty values at which the various curves intersect that horizontal line indicate the largest relevant penalty for which a system designer must provision a margin during system design. Penalties greater than that value occur with a lower probability than $4\times 10^{-5}$ and are therefore irrelevant. The results here do not depend on modulation format, and the difference between the performance of the simple receiver scheme considered in Fig. 1 and the performance of the optimal receiver [7] is hardly noticeable at these average PDL values of 1 dB to 3 dB. Consideration of higher average PDL values is unnecessary in this context since the required margins reach unrealistic values of many decibels and are unlikely to be encountered in properly working systems. The analytical results are also shown in the figure and they represent the integral of the Gaussian tail function according to Eq. (11). The analytical curves are very accurate in describing the PDL induced penalty in the cases of 1 dB and 2 dB of average PDL, and even in the 3 dB case, the deviation between the numerical and analytical results is only within a fraction of a dB. From the practical, system design, standpoint, perhaps the most relevant quantity to be extracted from Fig. 2(a) is the amount of average PDL that can be tolerated by an optical system using this type of receiver for a given margin that one agrees to allocate to PDL. This information is collected by finding the intersection points between the cumulative probability curves corresponding to the relevant range of average PDL values (some of which are shown in Fig. 2(a)) and the horizontal line at $4\times 10^{-5}$. The results are presented in Fig. 2(b). The numerical results overlap with the analytical curve for average PDL values within 3 dB and are only marginally worse than those corresponding to the optimal detection case. The analytical expression for the relation between average PDL and the margin is given by Margin $=m_{r}+3.89\,\sigma_{r}$, where $m_{r}$ and $\sigma_{r}$ are given by Eqs. (12) and (13), respectively. The other curves in Fig. 2(b) correspond to the brute-force detection method as described in what follows.

In Fig. 3 we compare the nearly optimal scheme of Fig. 1 with the performance of the brute force approach of section 2.3. Figure 3(a) was obtained for an average PDL of 1 dB, whereas in Fig. 3(b) the average PDL is 2 dB. In both figures the advantage of the two-polarizers scheme is quite evident. In the case of brute-force detection, the results depend strongly on the modulation format. The performance curve of QPSK is noticeably worse than that of 2-PSK modulation. The reason is quite intuitive. The effect of interference is worst when the phase of the interferer is reversed by $\pi$ relative to the phase of the detected channel (destructive interference). The phase difference between the two is the phase of the term $(a_{2}/a_{1})\exp(i\Delta\phi)$ in Eq. (17). The term $\Delta\phi$, which denotes the optical phase difference that is accumulated between channels, is fixed in every realization. The possible values of $(a_{2}/a_{1})$ in the case of 2-PSK are only $(a_{2}/a_{1})=\pm 1$, namely only phases of $\Delta\phi$ and $\Delta\phi+\pi$ are possible for the interferer. In the case of QPSK, there are 4 possible phases, $\Delta\phi$, $\Delta\phi\pm\pi/2$ and $\Delta\phi+\pi$, one of which is bound to be close (within 45 degrees) to the "destructive interference" angle of $\pi$. Since the overall penalty is always dominated by the "worst-case" with respect to the identity of the transmitted data symbols, QPSK shows noticeably inferior performance. For the tolerable PDL versus margin curves in the brute-force implementation, we return to Fig. 2(b), where the cases of 2-PSK and QPSK are shown by the stars and the squares, respectively. In the case of the brute-force implementation the solid lines simply connect the points for visualization purposes and they do not represent an analytical solution. The inferiority with respect to the two-polarizers case is self evident.
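The phase argument can be checked with a few lines of code. The sketch below scans the fixed optical phase offset $\Delta\phi$ over a grid (the grid size is an arbitrary choice and the function name is hypothetical) and reports, for M-ary PSK, the largest possible angular distance between the destructive-interference angle $\pi$ and the nearest interferer phase.

```python
import numpy as np

def worst_phase_gap_deg(M, n_grid=3600):
    """Largest (over the offset) distance from pi to the nearest interferer phase.

    The interferer phases are offset + 2*pi*k/M, k = 0..M-1, for M-ary PSK.
    """
    offsets = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    data_phases = 2.0 * np.pi * np.arange(M) / M
    phases = offsets[:, None] + data_phases[None, :]
    # Wrapped angular distance of every interferer phase from pi
    gap = np.abs(np.angle(np.exp(1j * (phases - np.pi))))
    return np.degrees(gap.min(axis=1).max())
```

For 2-PSK the nearest interferer phase can be as far as 90 degrees from $\pi$, whereas for QPSK it is never farther than 45 degrees, so some symbol combination always comes close to destructive interference.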

## 4. Conclusions

We have studied the potential performance of a simple optical receiver scheme for separate (disjoint) detection of polarization multiplexed data. The scheme is based on detecting each polarization channel along a polarization that is orthogonal to the other, interfering channel. Using this scheme, the penalty due to loss of orthogonality is avoided at the expense of a reduction in the power of the channel that one wishes to receive. The advantage is that this scheme is followed by two independent receivers, whose implementation is trivial. This is in contrast to the implementation of the optimal maximum likelihood receiver of [7] that recovers the transmitted symbols with the lowest possible error probability, but requires joint and therefore rather complex, processing. As we showed, the simplified receiver of Fig. 1 performs almost as well as the optimal receiver, within the range of relevant PDL values. In order to be able to appreciate the advantage due to the simplified scheme, we have also compared it with the usual form of detection, where the two polarization channels are separated with the help of a single polarization splitter followed by disjoint detection. The latter scheme has been shown to be significantly inferior to the former. In addition to its optical implementation, the scheme of Fig. 1 can be equivalently implemented in the totally electronic domain, when coherent detection is used. While the analysis that we performed concentrated on the coherent detection regime, where error probabilities could be handled analytically, thereby considerably facilitating the estimation of penalties, we believe that the advantage of the scheme of Fig. 1 is not limited to the coherent communications case.

## 5. Appendix A

This is the place to explain the calculations leading to Eqs. (10) to (12). We start from the derivation of Eq. (10) from the full expression of $\eta_{i}$, Eq. (9). In order to do that, we first note that the relation between the Stokes vectors representing the states of polarization before and after the effect of PDL is given by Eq. (18), where $\vec{\Gamma}_{0}^{\perp}$ is the component of $\vec{\Gamma}_{0}$ that is orthogonal to the input vector $\hat{e}_{j}$. This relation is expressed in a reference frame that rotates with the background birefringence of the link [4], so that unitary rotations are factored out. The final results only depend on dot products and thus the use of a rotating reference frame is immaterial. Using (18) and noting that $\hat{p}_{1}=-\hat{e}_{2,out}=\hat{e}_{1}+\mathbf{O}(\vec{\Gamma}_{0})$, we expand the right-most expression for $\eta_{i}$ in (9) up to the second order in the PDL vectors $\vec{\Gamma}_{0}$ and $\vec{\Gamma}'$. The result is Eq. (19), in which $g_{0}$, the polarization averaged loss over the entire link, is set to unity, as would be the case in typical high channel-count WDM systems in which the amplifiers operate in the constant output power mode. Expression (19) is identical to Eq. (10) appearing in the main text. Since, as we show in what follows, the term $(1-g')$ is of second order in the PDL vectors, $\eta_{j}$ is identical to first order to the equivalent quantity calculated in [7] for the maximum likelihood receiver. Therefore, the expression for $\sigma_{r}^{2}$ is identical to the one provided there. Yet due to the difference in the second-order dependence, the expression for $m_{r}$ in [7] needs to be corrected here by subtracting from it the quantity $\langle(\vec{\Gamma}_{0}^{\perp}-\vec{\Gamma}'^{\perp})^{2}\rangle$ and by adding to it the quantity $\langle 1-g'\rangle$. The first contribution is easily found using the isotropy of the problem: $\langle(\vec{\Gamma}_{0}^{\perp}-\vec{\Gamma}'^{\perp})^{2}\rangle=2\langle[(\vec{\Gamma}_{0}-\vec{\Gamma}')\cdot\hat{e}_{j}]^{2}\rangle=2\sigma_{r}^{2}$. The second contribution vanishes since $\langle 1-g'\rangle=0$, as we show in what follows.

Starting from Eq. (4), it is evident that $g'=\frac{1}{N_{s}}\sum_{j=1}^{N_{s}}g_{j}^{(0)}$. In order to evaluate $g'$ we first consider the structure of $g_{j}^{(0)}$. The term $g_{j}^{(0)}$ is half the trace of $\mathbf{T}_{j}\mathbf{T}_{j}^{\dagger}$, which is the same as half the trace of $\mathbf{T}_{j}^{\dagger}\mathbf{T}_{j}$. Let us denote the matrix describing transmission from the link input to the output of the $j$-th amplifier by $\mathbf{J}_{j}=\mathbf{T}_{j}^{-1}\mathbf{T}_{0}$. This implies that $\mathbf{T}_{j}=\mathbf{T}_{0}\mathbf{J}_{j}^{-1}$ and that $g_{j}^{(0)}$ can be expressed through $\mathbf{J}_{j}^{\dagger}\mathbf{J}_{j}=1+\vec{\Gamma}_{j}\cdot\vec{\sigma}$, where $\vec{\Gamma}_{j}$ is the PDL vector of the link between input and amplifier number $j$, and ${\mathbf{T}}_{0}^{\dagger}{\mathbf{T}}_{0}=1+\vec{\Gamma}_{0}\cdot\vec{\sigma}$. Using these relations we find that, to second order, $1-g_{j}^{(0)}\simeq\vec{\Gamma}_{j}\cdot(\vec{\Gamma}_{0}-\vec{\Gamma}_{j})$, where the PDL vectors are determined by the local vectors of PDL $\vec{\alpha}_{k}$ [12]. Thus, while $\vec{\Gamma}_{j}$ is the PDL vector between the link input and the $j$-th amplifier, $(\vec{\Gamma}_{0}-\vec{\Gamma}_{j})$ represents the PDL vector between the $j$-th amplifier and the end of the link, and the two are statistically independent of each other. Hence, the average of the above expression is $\langle 1-g'\rangle=0$ to second order in the PDL.
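The decomposition $\mathbf{T}^{\dagger}\mathbf{T}=g\,(1+\vec{\Gamma}\cdot\vec{\sigma})$ used throughout this appendix can be extracted numerically from any transfer matrix. The following is a minimal sketch with hypothetical function names; the conversion to decibels uses the standard definition of PDL as the ratio between the largest and smallest power transmission, which is an assumption of the sketch and not a formula quoted from the text.

```python
import numpy as np

SIGMA = np.array([[[1, 0], [0, -1]],
                  [[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]]], dtype=complex)

def pdl_element(alpha):
    """Local element exp(-0.5 * alpha . sigma)."""
    a = np.linalg.norm(alpha)
    if a == 0.0:
        return np.eye(2, dtype=complex)
    n_dot_sigma = np.tensordot(np.asarray(alpha) / a, SIGMA, axes=1)
    return np.cosh(a / 2) * np.eye(2) - np.sinh(a / 2) * n_dot_sigma

def pdl_parameters(T):
    """Return (g, Gamma) such that T^† T = g * (1 + Gamma . sigma)."""
    G = T.conj().T @ T
    g = 0.5 * np.trace(G).real  # polarization-averaged power gain
    Gamma = np.array([0.5 * np.trace(G @ s).real for s in SIGMA]) / g
    return g, Gamma

def pdl_db(Gamma):
    """PDL in dB: ratio of maximum to minimum power transmission."""
    m = np.linalg.norm(Gamma)
    return 10.0 * np.log10((1.0 + m) / (1.0 - m))
```

For a single element $\exp(-0.5\,\vec{\alpha}\cdot\vec{\sigma})$ this gives $g=\cosh|\vec{\alpha}|$ and $\vec{\Gamma}=-\tanh(|\vec{\alpha}|)\,\vec{\alpha}/|\vec{\alpha}|$, so that to first order the PDL vector is linear in the local vector.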

## 6. Appendix B

The goal is to describe the derivation of Eq. (6). We start by introducing the quantities $l$ and $\vec{q}$. From here, the derivation proceeds as follows,

## References and links

**1. **J. Renaudier, G. Charlet, M. Salsi, O. B. Pardo, H. Mardoyan, P. Tran, and S. Bigo, “Linear Fiber Impairments Mitigation of 40-Gbit/s Polarization-Multiplexed QPSK by Digital Processing in a Coherent Receiver,” J. Lightwave Technol. **26**, 36–42 (2008).

**2. **H. Sun, K.-T. Wu, and K. Roberts, “Real-time measurements of a 40 Gb/s coherent system,” Opt. Express **16**, 873–879 (2008).

**3. **L. E. Nelson, S. L. Woodward, M. D. Feuer, X. Zhou, P. D. Magill, S. Foo, D. Hanson, D. McGhan, H. Sun, M. Moyer, and M. O’Sullivan, “Performance of a 46Gbps dual polarization QPSK transceiver in a high-PMD fiber transmission experiment,” Optical Fiber Communications conference, Paper PDP9, OFC San Diego (2008).

**4. **A. Mecozzi and M. Shtaif, “Signal-to-noise-ratio degradation caused by polarization-dependent loss and the effect of dynamic gain equalization,” J. Lightwave Technol. **22**, 1856–1871 (2004).

**5. **C. Xie and L. F. Mollenauer, “Performance degradation induced by polarization dependent loss in optical fiber transmission systems with and without polarization mode dispersion,” J. Lightwave Technol. **21**, 1953–1957 (2003).

**6. **T. Duthel, C. R. S. Fludger, J. Geyer, and C. Schulien, “Impact of polarization dependent loss on coherent POLMUX-NRZ-DQPSK,” Optical Fiber Communications conference, Paper OThU5, OFC San Diego (2008).

**7. **M. Shtaif, “Performance degradation in coherent polarization multiplexed systems as a result of polarization dependent loss,” Opt. Express **16**, 13918–13932 (2008).

**8. **M. Yoshida, H. Goto, K. Kasai, and M. Nakazawa, “64 and 128 coherent QAM optical transmission over 150 km using frequency-stabilized laser and heterodyne PLL detection,” Opt. Express **16**, 829–840 (2008).

**9. **A. R. Chraplyvy, A. H. Gnauck, R. W. Tkach, J. L. Zyskind, J. W. Sulhoff, A. J. Lucero, Y. Sun, R. M. Jopson, F. Forghieri, R. M. Derosier, C. Wolf, and A. R. McCormick, “1-Tb/s transmission experiment,” IEEE Photon. Technol. Lett. **8**, 1264–1266 (1996).

**10. **Z. Wang and C. Xie, “PMD and PDL Tolerance of Polarization Division Multiplexed Signals with Direct Detection,” European Conf. on Opt. Comm., Paper We.3.E.2, ECOC 2008, Brussels, Belgium.

**11. **S. J. Savory, “Digital filters for coherent optical receivers,” Opt. Express **16**, 804–817 (2008).

**12. **A. Mecozzi and M. Shtaif, “The statistics of polarization dependent loss in optical communication systems,” IEEE Photon. Technol. Lett. **14**, 313–315 (2002).

**13. **The term (1 – *g*′) was omitted mistakenly from the optimal detection case in [7]. Since its average is 0 (see appendix A), it does not affect the final result.