## Abstract

We study the nonlinear interference noise (NLIN) generated in space-division multiplexed (SDM) systems, and generalize the NLIN model introduced in the context of single-mode fibers to the multi-mode case. The generalized model accounts for the modulation-format dependence of the NLIN, and gives the scaling of the NLIN power with the number of transmitted modes. It also provides the tools for extending the results of the NLIN Wizard to SDM. Unlike in the case of single-mode systems, the effect of modal dispersion (MD) cannot in general be ignored in the SDM case. We show that inclusion of MD erases the contribution of four-wave mixing (FWM) effects, and significantly suppresses the effect of cross-phase modulation (XPM).

© 2017 Optical Society of America

## 1. Introduction

Space-division multiplexed transmission on multi-mode and multi-core fibers is viewed as a promising approach for scaling the capacity of fiber communication systems. Numerous experimental demonstrations indicate the feasibility of the involved technology (record experiments have reported spectral efficiencies as high as 58 bits/second/Hertz [1] over fibers supporting up to 15 spatial modes [2]), thus making space-division multiplexing (SDM) a leading candidate for the future transport network. Similarly to the case of single-mode fiber based systems, the ultimate limit to the fiber information throughput is set by the fiber’s nonlinearity. In the absence of full knowledge of the information transmitted through the fiber (in all channels), the nonlinear distortions must be treated as noise that penalizes transmission. In what follows we refer to this noise as nonlinear interference noise, or NLIN. In the context of single-mode systems it has been shown that a good estimate of the effects of nonlinearity on performance can be obtained based on the second-order statistics of the NLIN, primarily the variances and correlations of the various noise components [3–7], and secondly their temporal correlation properties [8]. A significant amount of work on this topic has been carried out in the context of single-mode fiber (SMF) systems, culminating in the well-known Gaussian Noise (GN) model [3], and its extensions [4, 9]. One attempt to characterize the NLIN in SDM fibers has been recently reported by Rademacher and Petermann [10], who essentially extended the formulation of the basic GN model [3] to the multi-mode case. The limitations of this approach lie in the fact that the GN model does not account for modulation format dependence and tends to be inaccurate in the case of short links [9, 11].
In addition, it does not account for the presence of modal dispersion (MD) within quasi-degenerate mode-groups [12, 13], whose impact on the NLIN can be substantial, as has been pointed out in [14]. A purely numerical characterization of the NLIN power and its scaling with the number of modes was reported in [15], but in that study as well the effect of intra-group MD was not taken into account. In [16] the variance of the NLIN induced by inter-channel nonlinearities in SDM fibers was evaluated for the case of a single group of quasi-degenerate modes in the high MD regime, where cross-phase modulation (XPM) is the dominant inter-channel nonlinearity [17]. That work extended the analysis of [4], where the modulation format dependence is rigorously taken into account.

In this paper we conduct a comprehensive study of the NLIN power in SDM systems where the information is encoded into multiple groups of quasi-degenerate modes, and characterize all the relevant nonlinear processes. The major outcome of our study is a set of expressions for the scaling of the NLIN power with the number of modes. These expressions allow the extraction of the powers of the NLIN contributions from well-established results that were derived and validated in the context of single-mode transmission [9, 18]. We note that, given the assumptions on the SDM fiber characteristics that will be discussed in detail at a later stage, the derived expressions are exact within the same perturbation approach underpinning the treatment of the fiber Kerr nonlinearity introduced in the study of single-mode transmission [4].

The paper is organized as follows. In Section 3 we introduce the formalism that is necessary for the subsequent study of the NLIN, and review the coupled Manakov equations, which are the starting point of the analysis. In Section 4 we derive the expressions for the powers of the NLIN produced by intra-group nonlinear processes. These are self-phase modulation (SPM), XPM, and four-wave mixing (FWM) involving either three or four wavelength-division multiplexed (WDM) channels. In the same section we discuss the beneficial effect of intra-group MD, which is shown to notably reduce the contribution of XPM to the NLIN power, and to produce de-phasing that eliminates the contribution of FWM altogether. Section 5 is devoted to the study of NLIN produced by inter-group interactions. These are shown to be of two types: XPM and non-degenerate FWM. We derive the expressions of the NLIN powers for the two interactions, and show that intra-group MD does not affect cross-group XPM, whereas it averages out cross-group FWM. The related phase-matching conditions are discussed. The paper ends with an appendix. In the first part of the appendix, we present a detailed derivation of the NLIN power expressions discussed in Sections 4 and 5. These expressions involve coefficients consisting of infinite sums of triple integrals. In the second part of the appendix, we express these coefficients in a form that is suitable for efficient numerical evaluation.

## 2. A note on terminology

The terms SPM, XPM, and FWM, which will be used extensively in what follows, need some clarification. We consider an SDM system where different mode groups do not couple strongly with one another, and hence they are received and processed separately. Conversely, all modes within the same mode group couple strongly, and hence they need to be detected and processed jointly, just like the two field polarizations in a single-mode system. Consistently with this notion, the signals transmitted in the various modes of the same group and at the same central wavelength are considered a single WDM channel, so that the total number of WDM channels transmitted in an SDM system equals the number of WDM wavelengths multiplied by the number of mode groups. We refer to one of the WDM channels as the channel of interest (COI), whose performance degradation due to NLIN is to be evaluated. Each of the other WDM channels is referred to as an interfering channel (IC). With these notions in mind, we use the terms SPM, XPM, and FWM as follows. SPM is a process through which the COI interferes nonlinearly with itself. XPM is a process where a single IC imposes a nonlinear interference on the COI. Finally, FWM is a process in which either two or three ICs impose a nonlinear interference on the COI.
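The bookkeeping implied by this terminology can be summarized in a few lines of code. The following is a minimal sketch; the function names `total_wdm_channels` and `classify_interaction` are hypothetical and introduced only for illustration.

```python
def total_wdm_channels(num_wavelengths: int, num_mode_groups: int) -> int:
    """Each (wavelength, mode group) pair counts as one WDM channel."""
    return num_wavelengths * num_mode_groups


def classify_interaction(num_interfering_channels: int) -> str:
    """Name the process by the number of distinct ICs acting on the COI.

    The Kerr term is cubic in the field, so at most three distinct
    ICs can take part in a single interaction.
    """
    if num_interfering_channels == 0:
        return "SPM"
    if num_interfering_channels == 1:
        return "XPM"
    if num_interfering_channels in (2, 3):
        return "FWM"
    raise ValueError("a cubic nonlinearity involves at most three ICs")


# Example: 80 wavelengths on two mode groups give 160 WDM channels.
print(total_wdm_channels(80, 2))
print([classify_interaction(n) for n in range(4)])
```

With this convention, the three-channel and four-channel interactions of Section 4.3 correspond to two and three ICs, respectively.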

## 3. Formalism and propagation equations

For the simplicity of notation and with no loss of generality, we consider propagation of two quasi-degenerate mode groups, which we denote as group *a* and group *b*. In this work we assume that the linear coupling between the mode groups is small, as should be the case in short- to medium-reach links, in which case its contribution to the nonlinear distortion is negligible. When this assumption is not satisfied, meaning that the inter-group coupling is large, then its inclusion in the analysis is per se a problem that requires a dedicated effort, which is left for future work. In any case, the inclusion of linear coupling is straightforward for the purpose of performing numerical studies [16, 19]. The number of modes in each group is $2N_a$ and $2N_b$, where the factor of two accounts for the existence of two degenerate orthogonal polarization modes in each spatial mode (as appropriate in the weakly guiding approximation [20]). The complex envelopes of the electric fields in the modes of groups *a* and *b* are stacked into two column vectors ${\overrightarrow{E}}_{a}$ and ${\overrightarrow{E}}_{b}$, containing $2N_a$ and $2N_b$ components, respectively. The evolution of the electric field vectors along the SDM fiber obeys the coupled multi-component Manakov equations [21, 22], Eqs. (1) and (2), supplemented with the terms accounting for MD within the groups (this type of MD should not be confused with the fact that the group velocities of two mode groups are different. On the contrary, it originates from the fact that what we refer to as *degenerate modes* consist of true modes characterized by somewhat different group velocities propagating in the regime of strong random coupling [23]).

A detailed derivation of Eqs. (1) and (2) from Maxwell’s equations can be found in [16]. Here $\alpha$ is the fiber loss coefficient (in this work we neglect mode-dependent loss), $\beta_a$ and $\beta_b$ are the propagation constants of the two groups, $\beta_a'$ and $\beta_b'$ are the inverse group velocities, and $\beta_a''$ and $\beta_b''$ are the chromatic dispersion coefficients. The terms $\mathbf{B}_a(z)$ and $\mathbf{B}_b(z)$ are $2N_a \times 2N_a$ and $2N_b \times 2N_b$ traceless matrices, respectively, which describe the local generalized birefringence within the two groups [24]. By $\gamma$ we denote the usual nonlinearity coefficient appearing in the scalar nonlinear Schrödinger equation (NLSE), namely $\gamma = n_2\omega_0/(cA_{\text{eff}})$, where $n_2$ is the glass nonlinearity coefficient, $\omega_0$ is the center frequency of the optical signal, $c$ is the speed of light in vacuum, and $A_{\text{eff}}$ is the effective area of the fundamental mode. The coefficients $\kappa_{aa}$, $\kappa_{ab}$, $\kappa_{ba}$, and $\kappa_{bb}$ are given by Eq. (3), where $\delta_{uv}$ is the Kronecker delta function, and each of the indices $u$ and $v$ takes the values $a$ and $b$ depending on which of the coefficients is evaluated. For example, if $u = a$ and $v = b$, then in the summation the index $j$ runs over all values corresponding to the modes in group $a$ and the index $h$ runs over all the indices corresponding to the modes in group $b$. The indices $k$ and $m$ in the summation span all the modes. The coefficients $C_{jhkm}$ involve overlap integrals between the various mode profiles and their expressions can be found in [22]. Owing to the symmetry $C_{jhkm} = C_{hjkm}$ that characterizes them, the coefficients that nonlinearly couple the two groups are identical, $\kappa_{uv} = \kappa_{vu} = \mu$. Finally, it should be stressed that both $\gamma$ and $\kappa_{uv}$ depend on the number of modes, although this dependence is omitted in our notation for the sake of simplicity.

In order to describe the scaling, in what follows we also refer to the scalar NLSE, Eq. (4), and to the Manakov equation that describes the single-mode case, Eq. (5). The field vectors of the two groups are expanded into their WDM channel components ${\overrightarrow{E}}_{a,k}$ and ${\overrightarrow{E}}_{b,k}$ according to Eqs. (6) and (7), where $\Omega/2\pi$ is the frequency spacing between adjacent channels. In what follows, we will assume that the vector ${\overrightarrow{E}}_{a,0}$ corresponds to the COI, whereas the vectors ${\overrightarrow{E}}_{a,k\ne 0}$ and ${\overrightarrow{E}}_{b,k}$ describe the ICs. By substituting Eqs. (6) and (7) into Eq. (1), the nonlinear terms that fall into group $a$ and are within the bandwidth of the COI can be expressed as in Eqs. (8) and (9), where $\eta_{l,m}$ is 0 when $l = 2m$ (which includes the case $l = m = 0$), and it is equal to 1 in all other cases. Equation (8) represents intra-group nonlinear interference, whereas Eq. (9) represents the nonlinear interference between the two groups. Note that these terms do not account for the interference caused by the nonlinear spectral broadening of the nearest ICs [25].
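The frequency-matching argument behind this substitution can be sketched as follows. The channel expansion and its sign convention are assumptions made for illustration (the index $k$ labels the component centered at the frequency offset $k\Omega$), and constant prefactors as well as the bookkeeping factor $\eta_{l,m}$ are left out.

```latex
% Assumed channel expansion (sign convention chosen for illustration)
\vec{E}_a(z,t) = \sum_k \vec{E}_{a,k}(z,t)\, e^{-ik\Omega t}, \qquad
\vec{E}_b(z,t) = \sum_k \vec{E}_{b,k}(z,t)\, e^{-ik\Omega t}

% Cubic term of group a: each triple (l, m, n) beats at the offset (m + n - l)\Omega
|\vec{E}_a|^2 \vec{E}_a
  = \sum_{l,m,n} \left( \vec{E}_{a,l}^{\dagger} \vec{E}_{a,m} \right) \vec{E}_{a,n}\,
    e^{-i(m+n-l)\Omega t}

% Only the triples with n = l - m fall within the band of the COI (zero offset)
\left. |\vec{E}_a|^2 \vec{E}_a \right|_{\mathrm{COI}}
  = \sum_{l,m} \left( \vec{E}_{a,l}^{\dagger} \vec{E}_{a,m} \right) \vec{E}_{a,l-m}

% The same argument applied to the cross-group term
\left. |\vec{E}_b|^2 \vec{E}_a \right|_{\mathrm{COI}}
  = \sum_{l,m} \left( \vec{E}_{b,l}^{\dagger} \vec{E}_{b,m} \right) \vec{E}_{a,l-m}
```

In the intra-group sum, the pair $l = m = 0$ involves the COI alone, the pairs with $l = m \ne 0$ or $m = 0$ involve a single IC, and all remaining pairs involve two or three ICs.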

In the following section we start by characterizing intra-group interference, and proceed to inter-group interference in the subsequent section. Throughout this paper, we assume that the various space and polarization modes are modulated with statistically independent and identically distributed data streams. We note, however, that the assumption of identical distribution can easily be removed, at the expense of the simplicity of the final results. Another important assumption concerns the effect of MD. In order to be able to analytically account for this phenomenon in the context of nonlinear interference, we assume that MD is sufficiently large to imply that different WDM channels undergo independent mode coupling processes, while at the same time the MD-induced distortion of the individual channels is negligible. For reference, we also consider the limit in which MD is negligible altogether, a situation of practical relevance for two-fold degenerate mode groups. Concerning the role of the modal dispersion terms proportional to the matrices $\mathbf{B}_a$ and $\mathbf{B}_b$ in Eqs. (1) and (2), we note that MD is assumed not to imply intra-channel distortions, which corresponds to setting $\mathbf{B}_a$ and $\mathbf{B}_b$ to zero in the analysis of the intra-channel dynamics. As for the inter-channel dynamics, we stress that the analysis focuses on the two limiting cases where MD is either absent or large, where the term *large* is used by assuming the spacing between adjacent WDM channels as a reference, so that MD is considered to be large when the MD bandwidth [24] is smaller than the channel spacing, but larger than the channel bandwidth. In the former case the terms $\mathbf{B}_a$ and $\mathbf{B}_b$ disappear, whereas in the second case their effect is to render the orientations of the hyper-polarizations of different WDM channels belonging to the same mode group mutually uncorrelated. The more general case where MD implies intra-channel distortions goes beyond the scope of this paper and is left for future work.

## 4. Intra-group nonlinear interference

The nonlinear interactions that are addressed in this section are sketched in Fig. 1. A detailed description of the various interactions is provided in the respective subsections below.

#### 4.1. Intra-channel interaction

The first case that we consider is the one where $l = m = 0$. As will be seen in what follows, it is convenient to look at one component of the vector in Eq. (8), which can be expressed as in Eq. (10), where by $E_{a,0,j}$ we denote the $j$-th component of vector ${\overrightarrow{E}}_{a,0}$. The first term at the right-hand side of the equation is identical to the self-phase modulation term that one would encounter in the analysis of the scalar NLSE (4), provided that $\gamma$ is replaced by $\gamma\kappa_{aa}$. Each of the remaining $(2N_a - 1)$ terms is equivalent to the term that accounts for cross-polarization phase modulation encountered in the single-mode case, as follows from Eq. (5), where $8\gamma/9$ is to be replaced by $\gamma\kappa_{aa}$. In addition, the terms in the summation are statistically independent of each other, and hence their contributions to the NLIN power in the $j$-th mode are additive. For the sake of simplicity we refer to all the terms appearing in Eq. (10) as SPM, although this definition should be intended as a generalization of the standard definition of SPM. Based on the observations described above, the variance of the NLIN contribution due to intra-group SPM in each of the modes belonging to group $a$ can be expressed as in Eq. (11), which depends on the number of modes also through $\kappa_{aa}$ and through $A_{\text{eff}}$, which determines $\gamma$.
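Written out from the description above, the $j$-th component of the intra-channel term and the resulting additive structure of the variance take the following form. This is a sketch rather than a reproduction of the paper's equations: the summation index $k$ and the symbols $\sigma^2_{\mathrm{SPM}}$, $\sigma^2_{\mathrm{SPM,scalar}}$ and $\sigma^2_{\mathrm{XPolM}}$ are hypothetical names introduced for illustration.

```latex
% j-th component of i \gamma \kappa_{aa} |E_{a,0}|^2 E_{a,0}  (case l = m = 0)
i\gamma\kappa_{aa}\, |\vec{E}_{a,0}|^2 E_{a,0,j}
  = i\gamma\kappa_{aa}\, |E_{a,0,j}|^2 E_{a,0,j}
  + i\gamma\kappa_{aa} \sum_{k \neq j} |E_{a,0,k}|^2 E_{a,0,j}

% One scalar-like term plus (2 N_a - 1) statistically independent cross-mode terms:
% \sigma^2_{SPM,scalar} is the scalar-NLSE variance with \gamma -> \gamma \kappa_{aa},
% \sigma^2_{XPolM} is the variance of a single cross-mode term (hypothetical symbol)
\sigma^2_{\mathrm{SPM}}
  = \sigma^2_{\mathrm{SPM,scalar}} + (2N_a - 1)\, \sigma^2_{\mathrm{XPolM}}
```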

#### 4.2. Two-channel interaction

The second case that we consider is the one where $l = m$, with $l, m \ne 0$. This term describes XPM and hyper-polarization rotation (the latter coincides with the rotation of the standard polarization state vector in the single-mode case). In this case, the sum of the two contributions from Eq. (8) can be expressed as in Eq. (12), where $\mathbf{I}_a$ is the $2N_a \times 2N_a$ identity matrix.

In the limit of large MD, the nonlinear interference vector can be averaged with respect to the relative orientation of the two state vectors. In other words, the nonlinear interference vector is to be averaged with respect to the orientation of ${\overrightarrow{E}}_{a,l}$, in a reference frame rotating with the orientation of ${\overrightarrow{E}}_{a,0}$. As a result of this procedure (which has also been discussed in [16]), Eq. (12) simplifies to Eq. (13). This expression is obtained by averaging the matrix ${\overrightarrow{E}}_{a,l}{\overrightarrow{E}}_{a,l}^{\dagger}$ with respect to the isotropically distributed orientation of the state vector ${\overrightarrow{E}}_{a,l}$, which yields ${\mathbf{I}}_{a}{|{\overrightarrow{E}}_{a,l}|}^{2}/2N_a$. The form of Eq. (13) indicates that the terms considered here contribute to NLIN through XPM. It is now convenient to expand the expression of the $j$-th component of the vector in Eq. (13) as in Eq. (14). This form shows that the modes at the frequency $l\Omega$ provide $2N_a$ statistically independent contributions to NLIN, and hence the corresponding variances can be added to each other. The XPM-induced NLIN variance can in this case be expressed as in Eq. (15), where ${\sigma}_{\text{XPM},\text{scalar}}^{2}$ is the XPM-induced NLIN variance that follows from the scalar NLSE (4) with $\gamma$ replaced by $\gamma\kappa_{aa}$, and where the factor of 1/4 accounts for the fact that the XPM term in the scalar NLSE has a factor of 2 in front. We note that Eq. (15) was first presented in [16].

In the opposite limit of small MD, the various WDM channels do not rotate relative to each other, and the contribution to the NLIN variance is calculated as follows. We first express explicitly the $j$-th component of the nonlinear interference vector as in Eq. (16). The first term of Eq. (16) is identical to the XPM term given by the scalar NLSE (4), with $\gamma$ replaced by $\gamma\kappa_{aa}$. The second and third terms introduce $(2N_a - 1)$ statistically independent contributions, which are individually equivalent to the contribution of the second polarization in the single-mode case. Based on this argument, the contribution of Eq. (16) to the NLIN variance can be expressed as in Eq. (17).

Using the results of [26], Eqs. (15) and (17) can be expressed as in Eqs. (18) and (19), respectively, where $P$ is the average signal power per WDM channel in each scalar mode, $b$ stands for the constellation symbols used in each independent data stream (e.g. in the case of QPSK $b$ takes the values $\pm 1 \pm i$ with equal probabilities), and angled brackets denote ensemble average. Using the terminology of [4, 26], the first term in the two expressions accounts for second-order noise (SON), which coincides with the standard GN model [3, 10], whereas the second term is referred to as fourth-order noise (FON), and it accounts for the dependence on modulation format. The coefficients $\chi_1$ and $\chi_2$ are the SON and FON coefficients, respectively, and their expressions are given in the appendix of [26], except that the factor $8\gamma/9$ therein needs to be replaced by $\gamma\kappa_{aa}$. It is worth noting that the dependence of the NLIN power on the modulation format in the multi-mode case is very similar to that seen in single-mode transmission. Indeed, the ratio between the FON and the SON terms is equal to 6/5 in the case of negligible MD, whereas it reduces from 1 for $N_a = 1$ to 3/5 for large mode counts, in the case of large MD. The situation is similar in the other nonlinear interference processes which are studied in what follows.

As will be discussed in more detail at a later stage, we note that MD yields a reduction of the SON coefficient. The ratio between the SON coefficients appearing in Eqs. (19) and (18) can be expressed as $2[1 - 1/(2N_a + 1)]$, which corresponds to a reduction by a factor of about 1.6 for $N_a = 2$, and approaches the value of 2, as the number of modes increases.

#### 4.3. Three and four-channel interactions

Three and four-channel interactions involve either two or three different ICs whose nonlinear interaction has an effect on the COI. In the limit of large MD, the various channels undergo uncorrelated mode coupling processes and hence these interactions do not build up coherently, with the consequence that they can be safely neglected. It is only in the limit of low MD that the contribution of these terms needs to be considered. To the best of the authors’ knowledge, this point has eluded previous work, where intra-group modal dispersion was neglected within the master model assumptions [10, 15].
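The averaging argument used here can be illustrated with a toy Monte Carlo experiment. The sketch below is not the derivation used in the paper: it only assumes that large MD makes the hyper-polarization orientation of an IC isotropic and independent of the other channels, and all function names are hypothetical. The first moment of such an orientation vanishes, so products in which an independently rotated channel enters only once (as in the three- and four-channel terms) average to zero, whereas the second moment equals the identity matrix divided by $2N_a$, which is the average that survives in the XPM term.

```python
import random


def random_unit_vector(dim, rng):
    """Isotropic unit vector in C^dim (normalized complex Gaussian)."""
    v = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(dim)]
    norm = sum(abs(x) ** 2 for x in v) ** 0.5
    return [x / norm for x in v]


def orientation_averages(n_modes, samples=20000, seed=1):
    """Estimate <e> and <e e^dagger> for an isotropic state in 2*n_modes dimensions."""
    rng = random.Random(seed)
    dim = 2 * n_modes
    mean_vec = [0j] * dim
    mean_outer = [[0j] * dim for _ in range(dim)]
    for _ in range(samples):
        e = random_unit_vector(dim, rng)
        for j in range(dim):
            mean_vec[j] += e[j] / samples
            for k in range(dim):
                mean_outer[j][k] += e[j] * e[k].conjugate() / samples
    return mean_vec, mean_outer


mean_vec, mean_outer = orientation_averages(n_modes=2)
# First moment: close to zero in every component.
print(max(abs(x) for x in mean_vec))
# Second moment: diagonal close to 1/(2*N_a) = 0.25, off-diagonal close to zero.
print([round(mean_outer[j][j].real, 3) for j in range(4)])
```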

We start by considering the case of three-channel interactions. There are two types of such interactions. One is degenerate with respect to the COI, and is of the form $i\gamma {\kappa}_{aa}{\overrightarrow{E}}_{a,0}^{\dagger}{\overrightarrow{E}}_{a,m}{\overrightarrow{E}}_{a,-m}+i\gamma {\kappa}_{aa}{\overrightarrow{E}}_{a,0}^{\dagger}{\overrightarrow{E}}_{a,-m}{\overrightarrow{E}}_{a,m}$, with $m \ne 0$. The other is degenerate with respect to one of the ICs, and is of the form $i\gamma {\kappa}_{aa}{\overrightarrow{E}}_{a,2m}^{\dagger}{\overrightarrow{E}}_{a,m}{\overrightarrow{E}}_{a,m}$, with $m \ne 0$. The $j$-th component of the nonlinear interference vector in the first case reads as in Eq. (20), and the corresponding NLIN power is given in Eq. (21). In the second case, where $l = 2m$, the $j$-th component of the nonlinear interference vector reads as in Eq. (22).

The first term in the expansion is the same as would follow from the scalar NLSE, whereas each of the other $(2N_a - 1)$ terms is equivalent to the second polarization in the single-mode fiber case described by the Manakov equation (5). Since they are all uncorrelated with each other, the overall NLIN power can be expressed as in Eq. (23).

Finally, we consider the contribution of four-channel interactions, which are accounted for by the nonlinear interference vector $i\gamma {\kappa}_{aa}{\overrightarrow{E}}_{a,l}^{\dagger}{\overrightarrow{E}}_{a,m}{\overrightarrow{E}}_{a,l-m}+i\gamma {\kappa}_{aa}{\overrightarrow{E}}_{a,l}^{\dagger}{\overrightarrow{E}}_{a,l-m}{\overrightarrow{E}}_{a,m}$, with $l \ne m$, $l \ne 2m$, and $l, m \ne 0$. Its $j$-th component is given in Eq. (24).

Here also all terms are identically distributed and uncorrelated with each other, and hence their contribution to the NLIN power reads as in Eq. (25), where by ${\sigma}_{4\text{ch},\text{scalar}}^{2}$ we denote the corresponding contribution in the scalar case, as follows from the scalar NLSE (4). Its expression, first derived in [9], is also given in the appendix.

#### 4.4. Numerical validation: intra-group nonlinear interference

We now proceed to validate the formulae derived in Section 4, which describe the scaling of the NLIN power with the number of strongly coupled modes. In this case the simulations are based on the numerical integration of Eq. (1), where the inter-group nonlinear coupling term $\gamma {\kappa}_{ab}{|{\overrightarrow{E}}_{b}|}^{2}{\overrightarrow{E}}_{a}$ is suppressed by setting $\kappa_{ab} = 0$. We considered a 5 × 100 km SDM system, with a fiber loss coefficient of 0.2 dB/km, and a nonlinearity coefficient equal to $\gamma\kappa_{aa} = 1.3/N_a$ W$^{-1}$ km$^{-1}$ [27], consistently with the scaling discussed in [16, 28]. We transmitted a 16-QAM modulated signal using a square-root raised-cosine fundamental waveform with a roll-off factor of 0.01, and assumed coherent reception with a matched electrical filter. The signal power was set to −2 dBm per scalar (space and polarization) mode. Given the focus on nonlinear interference, amplification noise was not added to the propagating field. In Fig. 2 we plot the NLIN power as a function of the number of strongly coupled modes for various system settings, as discussed in what follows. All simulations were performed in the regime of negligible MD, which is the case where theory needs to be validated numerically (in the opposite regime of large inter-channel MD, the scaling of XPM has already been tested numerically in [28] and [16], whereas FWM terms would simply vanish, as discussed in Section 4.3).
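To give a feeling for the operating point, the quoted link parameters can be converted into linear units. The snippet below is a minimal sketch; the effective length and the per-span nonlinear phase are standard textbook quantities that are not part of the analysis above, and all function names are hypothetical.

```python
import math


def dbm_to_mw(power_dbm):
    """Convert a power level from dBm to mW."""
    return 10.0 ** (power_dbm / 10.0)


def effective_length_km(loss_db_per_km, span_km):
    """Textbook effective length (1 - exp(-alpha * L)) / alpha for power attenuation alpha."""
    alpha = loss_db_per_km * math.log(10.0) / 10.0  # 1/km
    return (1.0 - math.exp(-alpha * span_km)) / alpha


def gamma_kappa(n_a, single_mode_value=1.3):
    """Nonlinearity coefficient 1.3 / N_a in 1/(W km), as quoted in the text."""
    return single_mode_value / n_a


def nonlinear_phase_per_span(n_a, power_dbm=-2.0, loss_db_per_km=0.2, span_km=100.0):
    """Rough nonlinear phase (rad) of one scalar mode over one span."""
    power_w = dbm_to_mw(power_dbm) * 1e-3
    return gamma_kappa(n_a) * power_w * effective_length_km(loss_db_per_km, span_km)


print(dbm_to_mw(-2.0))                   # about 0.63 mW per scalar mode
print(effective_length_km(0.2, 100.0))   # about 21.5 km
print(nonlinear_phase_per_span(1))       # about 0.018 rad
```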

In Fig. 2(a) we plot the results obtained for SPM and XPM. We first extracted the SPM contribution by propagating the COI alone. Then, the XPM contribution was evaluated by propagating the COI along with the nearest IC ($l = m = 1$), and subtracting the SPM contribution from the overall NLIN power. In both runs the same set of $2^{15}$ pseudo-random symbols was used. The simulations of Fig. 2(a) were performed with a typical modulation rate of 32 Gbaud and a channel spacing of 50 GHz, with the chromatic dispersion coefficient set to the usual value $\beta_a'' = -21\ \text{ps}^2/\text{km}$. Squares and dots show the simulation results for SPM and XPM, respectively. Solid and dashed lines represent Eqs. (11) and (17), respectively.

Figures 2(b) – 2(d) show the results for three- and four-channel interactions. In order to resolve the NLIN contribution of these interactions, it was necessary to reduce the baud rate and the dispersion coefficient to the extent that SPM and XPM are not dominant. Hence we assumed a baud rate of 3.2 Gbaud, and set the chromatic dispersion coefficient to $\beta_a'' = -5\ \text{ps}^2/\text{km}$. In order to avoid the interference which is caused by the spectral broadening of the individual channels, we set the channel spacing to 6 GHz, so that the ratio between the channel separation and the baud rate is slightly larger than it is in Fig. 2(a). Figure 2(b) refers to three-channel interaction of type I, which was produced by transmitting the COI along with the two nearest ICs ($l = 0$, $m = 1$ in Eq. (8)). The NLIN power is plotted after removal of the SPM contribution. The solid curve was obtained by summing Eqs. (17) and (21), whereas the dashed curve shows only the XPM contribution (Eq. (17)). The appreciable difference between the two curves confirms that the three-channel interaction of type I was tested in a regime where its contribution to the NLIN power is substantial. Figure 2(c) shows the results obtained for the three-channel interaction of type II, produced by transmitting the COI along with the first and second neighboring channels ($l = 2$, $m = 1$ in Eq. (8)). The solid curve was obtained by summing Eqs. (17) and (23), whereas the dashed curve shows the XPM contribution alone. Finally, Fig. 2(d) refers to four-channel interaction, and was obtained by transmitting the first, third, and fourth neighboring channels along with the COI ($l = 4$, $m = 1$ in Eq. (8)). The agreement between simulation results and theory is excellent in all cases.

#### 4.5. Effect of modal dispersion on intra-group NLIN

While the beneficial effect of mode coupling has been noted since early numerical studies [29], MD has been argued to reduce the strength of the nonlinear interference in SDM systems only recently [14, 16, 30]. This effect follows from the fact that MD imposes uncorrelated rotations of different WDM channels in the process of propagation. The immediate consequence is that three- and four-channel interactions are averaged out, as discussed in the previous section. In addition, as we explain below, MD also suppresses the NLIN caused by XPM. In order to quantify this effect, we define the parameter *ρ* in Eq. (26) as the ratio between the powers of the NLIN in the two cases of large and negligible MD, which are given in Eqs. (15) and (17), respectively.

Interestingly, *ρ* is independent of the nonlinearity coefficient $\gamma\kappa_{aa}$, and the system parameters affect its value only through the ratio ${\sigma}_{\text{XPM},1}^{2}/{\sigma}_{\text{XPM},\text{scalar}}^{2}$, which can be conveniently computed by means of the NLIN Wizard [18]. In the limit of large mode count, $\rho \sim 0.25/({\sigma}_{\text{XPM},1}^{2}/{\sigma}_{\text{XPM},\text{scalar}}^{2}-1)$.
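The dependence of *ρ* on the number of modes can be explored numerically. The sketch below rests on an assumption: the two closed forms are a reading of the verbal description of the large-MD and negligible-MD XPM variances in Section 4.2 (there are $2N_a$ independent terms, each weighted by $(1 + 1/2N_a)^2/4$, in the first case, and one scalar term plus $(2N_a - 1)$ second-polarization-like terms in the second). They reproduce the large-mode-count limit quoted above, but the function names are hypothetical and the ratio value used in the example is a placeholder, not a result from the NLIN Wizard.

```python
def xpm_variance_large_md(n_a, sigma2_scalar):
    """Large MD: 2*N_a independent terms, each (1/4)*(1 + 1/(2*N_a))**2 times the scalar variance."""
    return 2 * n_a * 0.25 * (1.0 + 1.0 / (2 * n_a)) ** 2 * sigma2_scalar


def xpm_variance_negligible_md(n_a, sigma2_scalar, sigma2_single_mode):
    """Negligible MD: scalar term plus (2*N_a - 1) second-polarization-like terms."""
    return sigma2_scalar + (2 * n_a - 1) * (sigma2_single_mode - sigma2_scalar)


def rho(n_a, ratio):
    """Ratio of large-MD to negligible-MD XPM variance.

    `ratio` stands for sigma2_XPM,1 / sigma2_XPM,scalar and must be supplied
    by the user (e.g. from a single-mode NLIN calculation).
    """
    return xpm_variance_large_md(n_a, 1.0) / xpm_variance_negligible_md(n_a, 1.0, ratio)


placeholder_ratio = 1.5  # illustrative value only
for n_a in (1, 2, 3, 6, 15):
    print(n_a, round(rho(n_a, placeholder_ratio), 3))
# For large N_a the values approach 0.25 / (placeholder_ratio - 1).
```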

Figure 3 shows a plot of *ρ* as a function of the number of modes $N_a$ for the same system considered in Fig. 2(a). The four curves were obtained by increasing the number of spans from two to thirty, whereas from the leftmost to the rightmost panel, the number of interfering channels was increased from one to four, as is sketched in the insets. The plots show that MD yields a substantial reduction in the NLIN power, implying that failure to account for it (e.g. as in [10, 15]) leads to exaggerated estimates of the nonlinear distortion due to inter-channel interactions.

## 5. Inter-group nonlinear interference

In this section we consider the nonlinear interference that channels propagating in mode group *b* impose on the COI, which is in mode group *a*, and whose central frequency is set to zero. We stress that in our assumptions the receiver processes separately the fields in group *a* and *b*, so that the COI receiver does not have access to the zero-frequency channel in group *b*. This assumption is consistent with the neglect of linear coupling between different mode groups.

Here we are concerned with nonlinear terms of the form $i\gamma {\kappa}_{ab}{\overrightarrow{E}}_{b,l}^{\dagger}{\overrightarrow{E}}_{b,m}{\overrightarrow{E}}_{a,l-m}$, and two cases need to be addressed separately. The first is *l* = *m* and the second *l* ≠ *m*. In the first case, which is illustrated in Fig. 4(a), the nonlinear interference vector can be expressed in the form of Eq. (27), where the terms in the summation are independent of each other when conditioning on the signal transmitted in the channel of interest. This conclusion is valid in both limits of small and large MD. The effect of these terms is identical to that of the terms in Eq. (14), as they produce cross-group cross-phase modulation (XGXPM), and their contribution to the NLIN variance is given by Eq. (28).

The quantity ${\zeta}_{\text{XPM},\text{scalar}}^{2}$ is very similar, yet not identical, to what we have denoted previously as ${\sigma}_{\text{XPM},\text{scalar}}^{2}$ in Eq. (15). The difference is in how this quantity is evaluated from the scalar NLSE (4), while accounting for the difference in group velocities and chromatic dispersion between the mode groups. The expression for ${\zeta}_{\text{XPM},\text{scalar}}^{2}$ is provided in the appendix. The dominant contribution to XGXPM comes from the WDM channel for which group velocity dispersion compensates for the group velocity mismatch between the two mode groups to the largest extent, that is, the channel that fulfills Eq. (29).

We note that, when Eq. (29) is fulfilled, XGXPM could in principle prevail over intra-group XPM.
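The walk-off argument above can be made concrete with a small calculation. The sketch below is an assumption-laden reading of that argument, not a reproduction of Eq. (29): it takes the inverse group velocity of the channel at offset $l\Omega$ in group *b* to be $\beta_b' + \beta_b'' l\Omega$ and looks for the index at which it equals $\beta_a'$, the inverse group velocity of the COI. The function name and the numerical value of the mismatch are hypothetical.

```python
import math


def walkoff_free_channel(delta_beta1_ps_per_km, beta2_b_ps2_per_km, spacing_ghz):
    """Channel index l in group b whose group delay matches that of the COI.

    delta_beta1_ps_per_km: beta'_a - beta'_b (ps/km)
    beta2_b_ps2_per_km:    beta''_b (ps^2/km)
    spacing_ghz:           channel spacing Omega / (2*pi) (GHz)
    """
    omega_rad_per_ps = 2.0 * math.pi * spacing_ghz * 1e-3
    return delta_beta1_ps_per_km / (beta2_b_ps2_per_km * omega_rad_per_ps)


# Illustrative numbers: beta''_b = -25 ps^2/km, 50 GHz spacing, and a
# hypothetical mismatch of 15.708 ps/km between the two mode groups.
l_match = walkoff_free_channel(15.708, -25.0, 50.0)
print(round(l_match, 3))  # close to -2: the IC two slots below the COI walks off the least
```

In practice the result is generally not an integer, and the dominant IC is the one whose index is closest to it.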

The second case that needs to be addressed is *l* ≠ *m*, and it is illustrated in Fig. 4(b). The *j*-th component of the nonlinear interference vector in this case can be expressed in the form of the summation

The quantity ${\zeta}_{4\text{ch},\text{scalar}}^{2}$ is very similar to the quantity ${\sigma}_{4\text{ch},\text{scalar}}^{2}$ in Eq. (25), except that its evaluation from the scalar NLSE also requires accounting for the difference in group velocities and chromatic dispersion between the mode groups, similarly to the calculation of ${\zeta}_{\text{XPM},\text{scalar}}^{2}$. The expression for ${\zeta}_{4\text{ch},\text{scalar}}^{2}$ is provided in the appendix. This type of nonlinear interference is most effective when the ICs fulfill the following phase matching condition,

*m*=

*l*.

#### 5.1. Numerical validation: inter-group nonlinear interference

In this section we validate the formulae derived in Section 5. The transmission settings are identical to those used in Fig. 2(a), and in order to isolate the NLIN imposed by group *b* on group *a*, all undesired nonlinear interference processes were suppressed by setting $\kappa_{aa} = \kappa_{bb} = \kappa_{ba} = 0$. The only nonzero nonlinearity coefficient was $\gamma\kappa_{ab} = 1.3/N_b$ W$^{-1}$ km$^{-1}$ (here too, the inverse dependence on $N_b$ is justified in [16]). The chromatic dispersion coefficients were set to $\beta_a'' = -10\ \text{ps}^2/\text{km}$ and $\beta_b'' = -25\ \text{ps}^2/\text{km}$, while the inverse-group-velocity difference $\beta_a' - \beta_b'$ was set accordingly, so as to fulfill the phase-matching condition (33).

The contribution of XGXPM to the NLIN power was extracted by transmitting only the COI in group *a*, and two ICs in group *b* at $\pm\Omega$ ($l = -1$ and $m = 1$ in Eq. (9)). The results are shown in Fig. 5(a), where the power of the NLIN imposed by group *b* on group *a* is plotted versus the number of strongly coupled modes in group *b*. Markers refer to simulations, while the solid curve is the plot of Eq. (28).

The contribution of cross-group FWM (XGFWM) to the NLIN power was computed for the same system configuration of Fig. 5(a), except that an additional IC at frequency $-2\Omega$ ($l - m = -2$ in Eq. (9)) was also transmitted in group *a* along with the COI. The solid curve and markers in Fig. 5(b) show the total NLIN power, whereas the dashed curve shows the contribution of XGXPM only.

In both numerical examples considered in this section we assumed that mode group *a* consisted of a single spatial mode (*N _{a}* = 1).

## 6. Mitigation of the nonlinear distortions in SDM links

The nonlinear interference in SDM systems is strongly influenced by the mechanisms that are responsible for linear mode coupling. As has been extensively argued in [16], strong mode mixing implies a considerable reduction of the NLIN power. This effect follows from the fact that because of the strong mode mixing, the power transmitted in each spatial mode divides equally across the individual modes, with the consequence that the overall NLIN results from independent sources, each distributed over a wider effective area.

Intra-group MD yields an additional reduction of the NLIN power, by preventing the coherent build-up of the nonlinear distortions. This reduction follows from the fact that different spectral components of the SDM signal undergo uncorrelated rotations of their hyper-polarization states. The downside of this mechanism is that use of backward propagation techniques for removing the nonlinear noise becomes highly impractical, as it would require detailed knowledge of the linear SDM fiber channel characteristics across the entire signal spectrum at every position along the fiber. Nonetheless, the MD-induced NLIN reduction can be even more effective than backward propagation techniques whose implementation is conceivable in the foreseeable future. Moreover, the benefit of MD-induced NLIN reduction comes at essentially no cost (given that MD is unavoidable anyway). This is in sharp contrast to backward propagation, which requires detailed knowledge of the received signal in the entire bandwidth which is to be back-propagated, and is only effective when the WDM channels are mutually coherent [32].

Similar considerations apply to the nonlinear interaction between non-degenerate mode groups.

## 7. Conclusions

We evaluated the power of the nonlinear interference noise (NLIN) generated in SDM systems, focusing on the relevant situation where the various fiber modes can be classified into groups such that modes belonging to the same group are degenerate and strongly coupled, whereas modes in different groups are non-degenerate and their coupling is negligible. This work generalizes the model introduced in [4, 9] to the multi-mode case, and gives the scaling of the NLIN power with the number of transmitted modes. Similarly to the single-mode case, the generalized model accounts for the modulation-format dependence of the NLIN, and provides tools for extending the results of the NLIN Wizard [18] to SDM. Unlike in the case of single-mode systems, the effect of MD cannot in general be ignored in the SDM case. We show that inclusion of MD erases the contribution of FWM effects, and significantly suppresses the effect of XPM.

## 8. Appendix

This appendix is devoted to providing the expressions for the variances used in the main body of the paper. These are ${\sigma}_{\text{SPM},\text{scalar}}^{2}$, ${\sigma}_{\text{SPM},1}^{2}$, ${\sigma}_{\text{XPM},\text{scalar}}^{2}$, ${\sigma}_{3\text{ch}-\mathrm{I},\text{scalar}}^{2}$, ${\sigma}_{3\text{ch}-\text{II},\text{scalar}}^{2}$, ${\sigma}_{3\text{ch}-\mathrm{I},1}^{2}$, ${\sigma}_{4\text{ch},\text{scalar}}^{2}$, ${\zeta}_{\text{XPM}}^{2}$, and ${\zeta}_{\text{FWM}}^{2}$. All quantities, except for ${\zeta}_{\text{XPM}}^{2}$ and ${\zeta}_{\text{FWM}}^{2}$, are available in the existing literature, and many of them can be conveniently extracted from the single-mode NLIN Wizard [18], as will be explained in what follows. We proceed by evaluating the quantities ${\zeta}_{\text{XPM}}^{2}$ and ${\zeta}_{\text{FWM}}^{2}$, and will rely on their derivation to extract the others.

#### 8.1. Derivation of the NLIN power expressions

Inter-group, or cross-group XPM (XGXPM) involves only a single IC in group *b*, whereas cross-group FWM (XGFWM) involves one IC in group *a* and two ICs in group *b*. Using the indexes *l* and *m* as in Eq. (9) (meaning that group *a* contains the COI and the (*l* − *m*)-th IC, whereas group *b* contains the *l*-th and the *m*-th ICs) we can express the linearly propagating fields in groups *a* and *b* as

where ${\overrightarrow{a}}_{n}$ represents the data transmitted in the *n*-th symbol slot in the channel of interest, whereas ${\overrightarrow{b}}_{n}$, ${\overrightarrow{c}}_{n}$, ${\overrightarrow{d}}_{n}$ represent the data transmitted in the various ICs. The vectors ${\overrightarrow{a}}_{n}$ and ${\overrightarrow{b}}_{n}$ are of length $2N_a$, whereas the vectors ${\overrightarrow{c}}_{n}$ and ${\overrightarrow{d}}_{n}$ are of length $2N_b$. The case $l = m$ corresponds to XGXPM, and the case $l \neq m$ corresponds to XGFWM. The functions $g_{a,k}(z,t)$ and $g_{b,k}(z,t)$ are the propagated versions of the fundamental waveform $g(t)$ in the $k$-th channel of groups $a$ and $b$, respectively. The fundamental waveform is chosen so as to ensure ISI-free reception in the back-to-back configuration, that is $\int \mathrm{d}t\, g^{*}(t-mT)\,g(t-nT) = \delta_{n,m}$, where the energy of $g(t)$ is normalized to 1 for convenience. Thus

The NLIN-induced error in the received symbol vector of the COI in the 0-th time slot, after chromatic dispersion compensation and matched filtering, is given by

with $f(z)$ describing the $z$-dependent loss/gain profile [4]. By expanding Eqs. (38) and (39) with respect to the scalar products therein, the $j$-th component of $\mathrm{\Delta}{\overrightarrow{a}}_{0}$ is seen to consist of $2N_b$ independent and identically distributed terms of the kind

where by $c_{k,q}$ we denote the $q$-th component of ${\overrightarrow{c}}_{k}$. The variance of each of these terms can be extracted by analogy with XPM and four-channel FWM contributions to NLIN in the scalar case. To do so, we express the error in the reception of the symbol $a_0$ in the scalar case

where $X_{h,k,p}$ is obtained from Eq. (40) by replacing the subscript $b$ with $a$. The XPM variance of $\Delta a_0$ has been calculated in [4], and it can be expressed in the time-domain formalism as,

where by $b$ we denote a symbol of the signal constellation used to modulate the various modes. Note that Eq. (45) contains coefficients that differ from those contained in Eq. (25) of [4], where the $\chi_1$ and $\chi_2$ were defined for the scalar case. Here we adopt the definition used in subsequent literature, where $\chi_1$ and $\chi_2$ correspond to the case of polarization-multiplexed transmission [26]. In addition, unlike in [26], the NLIN power and the signal power are intended to be both *per polarization* (as opposed to per WDM channel), and the factor of 8/9 is absorbed into the definition of $\gamma$. Namely, we use the following expression for ${\sigma}_{\text{XPM},1}^{2}$,

The quantity ${\zeta}_{\text{XPM},\text{scalar}}^{2}$ is given by Eqs. (45)–(47), provided that the coefficients $X_{h,k,p}$ are replaced with $Y_{h,k,p}$ and $4\gamma$ is replaced with ${\gamma}^{2}{\kappa}_{ab}^{2}$, namely [33]

Expressions for *ξ*_{1} and *ξ*_{2} which are suitable for efficient numerical evaluation are provided in the second part of this appendix.

We now switch to the extraction of the NLIN variance due to inter-group four-channel FWM, which we denoted as ${\zeta}_{4\text{ch},\text{scalar}}^{2}$. To this end we evaluate the four-channel FWM variance of *∆a*_{0} in the (single-mode) scalar case. The various FWM contributions have been evaluated previously (e.g. in [9]), but those derivations were performed in the frequency domain, and their relation with the coefficients $X_{h,k,p}$ is not obvious. The contribution of the four-channel FWM in the scalar case, which is used in Eq. (25), is

Inter-group XPM and FWM are most effective when the phase-matching condition is fulfilled. Inspection of Eq. (40) shows that inter-group XPM is strongest when $\Delta\beta_{g} = 0$, that is for

On the other hand, inspection of Eqs. (34) and (35) shows that inter-group FWM is most effective when the following phase-matching condition is met

We now proceed to evaluating the variance ${\sigma}_{3\text{ch}-\mathrm{I},\text{scalar}}^{2}$, which results from the three-channel FWM interaction which is degenerate with respect to the COI. In this case we consider the scalar field

$a_0$

The variance of *∆a*_{0} is

In order to compute the NLIN variance in the scalar case due to the three-channel FWM interaction that is degenerate with respect to one of the ICs, we consider the field

$a_0$,

The variance of $\Delta a_0$ is

where $\mu_1 + \mu_3$, and $\mu_2$ are the SON and FON coefficients, respectively, in this type of three-channel interaction. In order to obtain ${\sigma}_{3\text{ch}-\text{II},1}^{2}$, the same procedure needs to be repeated for the single-mode case with two polarizations. The field expression is

$a_0$,

Its *x* component is given by

We now conclude this part by evaluating the single-channel NLIN contributions ${\sigma}_{\text{SPM},\text{scalar}}^{2}$ and ${\sigma}_{\text{SPM},1}^{2}$. We start with the scalar case, where we may write

Like in the case of XPM [4], the NLIN power is obtained as the mean square value of ∆*a*_{0} − *iθ*_{RX}*a*_{0}, where *θ*_{RX} is the rotation of the symbol constellation removed by the receiver, hence ${\sigma}_{\text{scalar}}^{2}=\langle|\mathrm{\Delta}{a}_{0}{|}^{2}\rangle+\langle|b{|}^{2}\rangle{\theta}_{\text{RX}}^{2}-2{\theta}_{\text{RX}}\mathrm{Im}\left\{\langle{a}_{0}^{*}\mathrm{\Delta}{a}_{0}\rangle\right\}$. We start by computing 〈|∆*a*_{0}|^{2}〉, which can be expanded as

There are three cases that need to be distinguished. In the first case all the indexes are identical, with the result

The second case is the one with a quadruplet and a pair of identical indexes. In this case the result is proportional to 〈|*b*|^{4}〉〈|*b*|^{2}〉, and there are nine possible combinations, with the result

where we used $\sum_{k} S_{h,k,k} = \delta_{h,0}\sum_{k} S_{0,k,k}$. The third case is the one with three different pairs of indexes. In this case there are six possible combinations (all possible triplets of pairs, where each pair contains one conjugated symbol and one non-conjugated symbol). The total contribution of these terms is

In a similar way one can compute

In order to finalize this calculation we need to specify the expression of *θ*_{RX}, based on what the receiver does.

The simplest assumption is that the receiver removes the average rotation produced by SPM. This can be computed by averaging all the triplets of the form $|a_k|^2 a_0$ with respect to $|a_k|^2$, with the result

This type of phase recovery is rather simple (it does not imply any adaptive procedure), but at the same time it is clearly sub-optimal. More accurate approaches to removing the nonlinear phase shift imply adaptive capabilities of the receiver [8], or the use of pilots. In the ideal case, the rotation removed by the receiver follows from minimizing the mean-square distance between the received symbols and the transmitted ones. By straightforward algebra one can find the following expression for the rotation,

$\theta_{\text{RX}}$ becomes ${\theta}_{\text{RX}}=\mathrm{Im}\left\{\langle{a}_{0}^{*}\mathrm{\Delta}{a}_{0}\rangle\right\}/\langle|{a}_{0}{|}^{2}\rangle$, with the result

The difference between the two expressions is in the first term of the above. This term depends on the modulation format, and it vanishes for Gaussian modulation.
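The relation between the two receiver models can be made explicit by treating the NLIN power as a quadratic function of the removed rotation. The following is a short sketch of that step, assuming the small-rotation form ∆*a*_{0} − *iθ*_{RX}*a*_{0} used earlier in this appendix; the notation $\sigma^2(\theta)$ is introduced here only for illustration.

```latex
\begin{align}
\sigma^2(\theta) &= \langle |\Delta a_0 - i\theta a_0|^2 \rangle
  = \langle |\Delta a_0|^2 \rangle + \theta^2 \langle |a_0|^2 \rangle
    - 2\theta\,\mathrm{Im}\{\langle a_0^{*}\Delta a_0 \rangle\}, \\
\frac{\partial \sigma^2}{\partial \theta} = 0
  &\;\Rightarrow\;
  \theta_{\text{RX}} = \frac{\mathrm{Im}\{\langle a_0^{*}\Delta a_0 \rangle\}}{\langle |a_0|^2 \rangle}, \\
\sigma^2(\theta_{\text{RX}}) &= \langle |\Delta a_0|^2 \rangle
  - \frac{\left(\mathrm{Im}\{\langle a_0^{*}\Delta a_0 \rangle\}\right)^2}{\langle |a_0|^2 \rangle}.
\end{align}
```

Since $\sigma^2(\theta)$ is a convex parabola in $\theta$, any other choice of rotation, including the average SPM rotation, yields an NLIN power that is at least as large as the last line.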

In the case of two polarizations the error vector can be expressed as

From this point it is convenient to look at the two components of $\mathrm{\Delta}{\overrightarrow{a}}_{0}$ separately. In fact, the variance of the vector is the sum of the variances of its two components, which by symmetry arguments are identical. We hence express the *x*-component of the vector $\mathrm{\Delta}{\overrightarrow{a}}_{0}$ as

where $\Delta x_0$ represents the interactions within the $x$-polarized components of the data symbols, whereas $\Delta y_0$ accounts for cross-polarization effects. We denote again by $\theta_{\text{RX}}$ the phase shift removed by the receiver. The NLIN power can hence be expressed as

The terms 〈|*∆x*_{0}|^{2}〉 and 〈|*∆y*_{0}|^{2}〉 are identical to those evaluated for SPM and XPM in the scalar case, respectively. The term $\langle\mathrm{\Delta}{x}_{0}^{*}\mathrm{\Delta}{y}_{0}\rangle$ is new. After some straightforward algebra, it can be expressed in the following form,

The remaining term is

The calculation ends by specifying the expression of *θ*_{RX}. The first definition of *θ*_{RX} discussed in the scalar case yields

As in the scalar case, the two rotation angles are identical for Gaussian modulation.
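The special role of Gaussian modulation can be illustrated numerically. The sketch below assumes, as in the XPM result of [4], that the modulation-format dependence enters through the normalized fourth moment 〈|*b*|^{4}〉/〈|*b*|^{2}〉^{2} − 2; whether exactly this combination multiplies every format-dependent term of the present appendix is an assumption, and the function name `kurtosis_factor` is hypothetical.

```python
import numpy as np

def kurtosis_factor(symbols):
    """Return <|b|^4> / <|b|^2>^2 for a set of equiprobable symbols."""
    p2 = np.mean(np.abs(symbols) ** 2)
    p4 = np.mean(np.abs(symbols) ** 4)
    return p4 / p2 ** 2

# Two standard constellations (scaling is irrelevant for the ratio).
levels = np.array([-3, -1, 1, 3])
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
qam16 = (levels[:, None] + 1j * levels[None, :]).ravel()

# Circular complex Gaussian symbols, sampled with a fixed seed.
rng = np.random.default_rng(0)
gauss = rng.standard_normal(200_000) + 1j * rng.standard_normal(200_000)

for name, c in [("QPSK", qpsk), ("16-QAM", qam16), ("Gaussian", gauss)]:
    k = kurtosis_factor(c)
    print(f"{name:9s} kurtosis factor = {k:.3f}, k - 2 = {k - 2:+.3f}")
```

The factor equals 1 for QPSK and 1.32 for 16-QAM, while the sampled Gaussian case approaches 2, so that the combination *k* − 2 vanishes only in the Gaussian limit.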

#### 8.2. Efficient calculation of the coefficients involved in the NLIN power expressions

The NLIN power expressions derived in the previous section involve the following coefficients,

By using the following definition

where $x \in \{a, b\}$, the pulse-collision coefficients can be expressed in terms of $C_{h,k,p}(a, x, l, m)$ as follows: $S_{h,k,p} = C_{h,k,p}(a, a, 0, 0)$, $W_{h,k,p} = C_{p,h,k}(a, a, 0, m)$, $X_{h,k,p} = C_{h,k,p}(a, a, l, m)$, $Y_{h,k,p} = C_{h,k,p}(a, b, l, m)$, $Z_{h,k,p} = C_{k,h,p}(a, b, 2m, m)$. Given this, the different types of integrals that need to be computed reduce to

The numerical extraction of Eqs. (97)–(102) is highly nontrivial. A great simplification can be achieved by following the procedure introduced in [8]. Using the formalism of [8], the interaction coefficients $C_{h,k,p}(a, x, l, m)$ may be expressed as frequency-domain integrals,

$g(t, z = 0)$, and the *link function* $\phi(\omega, \omega', \omega'')$ is given by

Using these definitions, the coefficients ${\mathcal{C}}_{n}(n=1\dots 6)$ may be expressed as 3rd order integrals. Their manipulation involves a number of straightforward, yet cumbersome steps, which are either identical or very similar to those detailed in [8]. As a result, the coefficient ${\mathcal{C}}_{1}$ is found to be given by

The coefficient ${\mathcal{C}}_{2}$ is given by

The coefficient ${\mathcal{C}}_{3}$ is given by

The coefficient ${\mathcal{C}}_{4}$ is given by

The coefficient ${\mathcal{C}}_{5}$ is given by

Lastly, the coefficient ${\mathcal{C}}_{6}$ is given by

The most important feature of this representation is that the summations are no longer infinite. Assuming that the pulse spectrum vanishes outside the range [−1/*T*, 1/*T*], each summation is restricted to the terms for which *h*, *k*, *p* ∈ {−1, 0, 1}. For the specific case of Nyquist pulses, the only nonzero term is *h* = *k* = *p* = 0.
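For the Nyquist pulses mentioned above, the ISI-free condition of Section 8.1, ∫ d*t* *g*^{*}(*t* − *mT*) *g*(*t* − *nT*) = *δ*_{n,m}, can be checked numerically. The following is a minimal sketch assuming an ideal sinc pulse *g*(*t*) = sinc(*t*/*T*)/√*T*; the pulse shape, the truncated integration window, and the grid spacing are illustrative choices and are not taken from the simulations reported in the paper.

```python
import numpy as np

T = 1.0                                 # symbol duration (arbitrary units, assumed)
dt = T / 8                              # grid spacing, finer than the Nyquist interval
t = np.arange(-500 * T, 500 * T, dt)    # truncated integration window (assumed)

def g(t):
    """Unit-energy ideal Nyquist pulse: sinc(t/T)/sqrt(T)."""
    return np.sinc(t / T) / np.sqrt(T)

def overlap(m, n):
    """Riemann-sum estimate of the integral of g*(t - mT) g(t - nT) dt."""
    return np.sum(np.conj(g(t - m * T)) * g(t - n * T)) * dt

# Gram matrix of the shifted pulses for m, n in {-2, ..., 2}.
gram = np.array([[overlap(m, n) for n in range(-2, 3)] for m in range(-2, 3)])
print(np.round(gram, 3))
```

Up to the small error caused by truncating the slowly decaying sinc tails, the printed matrix is the identity, which is the orthonormality property that the matched-filter analysis relies on.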

## Funding

Italian Government; Innovating City Planning through Information and Communication Technologies (INCIPICT); Israel Science Foundation (grant 1401/16).

## Acknowledgments

C. Antonelli and A. Mecozzi acknowledge financial support by the Italian Government under Cipe resolution n. 135 (Dec. 21, 2012), project Innovating City Planning through Information and Communication Technologies (INCIPICT). O. Golani and M. Shtaif acknowledge financial support from Israel Science Foundation (grant 1401/16). Discussions with Ronen Dar are acknowledged.

## References and links

**1.** H. Chen, R. Ryf, N. K. Fontaine, A. M. Velazquez-Benitez, J. Antonio-Lopez, C. Jin, B. Huang, M. Bigot-Astruc, D. Molin, F. Achten, and P. Sillard, “High Spectral Efficiency Mode-Multiplexed Transmission over 87-km 10-Mode Fiber,” OFC 2016, paper Th4C.2 (2016).

**2.** N. K. Fontaine, R. Ryf, H. Chen, A. V. Benitez, B. Guan, R. Scott, B. Ercan, S. J. B. Yoo, L. E. Gruner-Nielsen, Y. Sun, R. Lingle, E. Antonio-Lopez, and R. Amezcua-Correa, “30×30 MIMO Transmission over 15 Spatial Modes,” OFC 2015, paper PDP Th5C.1 (2015).

**3.** P. Poggiolini, “The GN Model of Non-Linear Propagation in Uncompensated Coherent Optical Systems,” J. Lightwave Technol. **30**, 3857–3879 (2012).

**4.** R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “Properties of nonlinear noise in long, dispersion-uncompensated fiber links,” Opt. Express **21**, 25685–25699 (2013).

**5.** P. Serena and A. Bononi, “An Alternative Approach to the Gaussian Noise Model and its System Implications,” J. Lightwave Technol. **31**, 3489–3499 (2013).

**6.** M. Secondini and E. Forestieri, “Analytical fiber-optic channel model in the presence of cross-phase modulations,” IEEE Photon. Technol. Lett. **24**, 2016–2019 (2012).

**7.** P. Johannisson and M. Karlsson, “Perturbation analysis of nonlinear propagation in a strongly dispersive optical communication system,” J. Lightwave Technol. **31**, 1273–1282 (2013).

**8.** O. Golani, R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “Modeling the bit-error-rate performance of nonlinear fiber-optic systems,” J. Lightwave Technol. **34**, 3482–3489 (2016).

**9.** A. Carena, G. Bosco, V. Curri, Y. Jiang, P. Poggiolini, and F. Forghieri, “EGN model of non-linear fiber propagation,” Opt. Express **22**, 16335–16362 (2014).

**10.** G. Rademacher and K. Petermann, “Nonlinear Gaussian Noise Model for Multimode Fibers With Space-Division Multiplexing,” J. Lightwave Technol. **34**, 2280–2287 (2016).

**11.** R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “Accumulation of nonlinear interference noise in fiber-optic systems,” Opt. Express **22**, 14199–14211 (2014).

**12.** R. Ryf, R.-J. Essiambre, A. H. Gnauck, S. Randel, M. A. Mestre, C. Schmidt, P. J. Winzer, R. Delbue, P. Pupalaikis, A. Sureka, T. Hayashi, T. Taru, and T. Sasaki, “SDM transmission over 4200-km 3-core microstructured fiber,” OFC 2011, Paper PDP5C.2 (2011).

**13.** R. Ryf, N.K. Fontaine, B. Guan, R.-J. Essiambre, S. Randel, A. H. Gnauck, S. Chandrasekhar, A. Adamiecki, G. Raybon, B. Ercan, R. P. Scott, S.J. Ben Yoo, T. Hayashi, T. Nagashima, and T. Sasaki, “1705-km Transmission over Coupled-Core Fibre Supporting 6 Spatial Modes,” ECOC 2014, PD.3.2 (2014).

**14.** C. Antonelli, A. Mecozzi, and M. Shtaif, “Nonlinear propagation in Space-Division Multiplexed fiber-optic transmission,” IEEE Photonics Conference 2015, Paper MG2.1 (2015).

**15.** M. Brehler, D. Ronnenberg, and P. M. Krummrich, “Scaling of Nonlinear Effects in Multimode Fibers with the Number of Propagating Modes,” OFC 2016, Paper W4I.3 (2016).

**16.** C. Antonelli, M. Shtaif, and A. Mecozzi, “Modeling of Nonlinear Propagation in Space-Division Multiplexed Fiber-Optic Transmission,” J. Lightwave Technol. **34**, 36–54 (2016).

**17.** In the regime of high MD, FWM is negligible. Other terms, that are related to the nonlinear spectral broadening of the nearest neighboring channels are also negligible in the high MD case.

**18.** NLIN wizard website at http://nlinwizard.eng.tau.ac.il/

**19.** C. Antonelli, A. Mecozzi, M. Shtaif, and P. J. Winzer, “Random coupling between groups of degenerate fiber modes in mode multiplexed transmission,” Opt. Express **21**, 9484–9490 (2013).

**20.** D. Gloge, “Weakly guiding fibers,” Appl. Opt. **10**, 2252–2258 (1971).

**21.** A. Mecozzi, C. Antonelli, and M. Shtaif, “Nonlinear propagation in multi-mode fibers in the strong coupling regime,” Opt. Express **20**, 11673–11678 (2012).

**22.** A. Mecozzi, C. Antonelli, and M. Shtaif, “Coupled Manakov equations in multimode fibers with strongly coupled groups of modes,” Opt. Express **20**, 23436–23441 (2012).

**23.** C. Antonelli, A. Mecozzi, and M. Shtaif, “The delay spread in fibers for SDM transmission: dependence on fiber parameters and perturbations,” Opt. Express **23**, 2196–2202 (2015).

**24.** C. Antonelli, A. Mecozzi, M. Shtaif, and P. J. Winzer, “Stokes-space analysis of modal dispersion in fibers with multiple mode transmission,” Opt. Express **20**, 11718–11733 (2012).

**25.** Here we are referring to spectral broadening of the nearest ICs that is caused by either self-phase modulation, cross-phase modulation with any of the other channels, or four-wave mixing contributions, whose central frequency coincides with that of the nearest ICs, and whose spectrum leaks into the bandwidth of the COI.

**26.** R. Dar, M. Feder, A. Mecozzi, and M. Shtaif, “Inter-Channel Nonlinear Interference Noise in WDM Systems: Modeling and Mitigation,” J. Lightwave Technol. **33**, 1044–1053 (2015).

**27.** We stress that since our goal is to validate the above presented analysis against numerical solutions, the particular choice of γκ_{aa} is of minor importance.

**28.** C. Antonelli, A. Mecozzi, and M. Shtaif, “Scaling of inter-channel nonlinear interference noise and capacity with the number of strongly coupled modes in SDM systems,” OFC 2016, Paper W4I.2 (2016).

**29.** S. Mumtaz, R.-J. Essiambre, and G. P. Agrawal, “Reduction of Nonlinear Penalties Due to Linear Coupling in Multicore Optical Fibers,” IEEE Photon. Technol. Lett. **24**, 1574–1576 (2012).

**30.** C. Antonelli, A. Mecozzi, O. Golani, and M. Shtaif, “Inter-modal nonlinear interference in SDM systems and its impact on information capacity,” IEEE Summer Topicals 2016, Paper ME1.1 (2016).

**31.** G. P. Agrawal, “*Nonlinear Fiber Optics*,” 4th ed. (Elsevier, 2006).

**32.** E. Temprana, E. Myslivets, B.P.-P. Kuo, L. Liu, V. Ataie, N. Alic, and S. Radic, “Overcoming Kerr-induced capacity limit in optical fiber transmission,” Science **348**, 1445–1448 (2015).

**33.** The derivation of Eq. (45) makes use of the property Σ_{k} X_{h, k, k} = 0 for h ≠ 0. It can be shown that the same property holds for Σ_{k} Y_{h, k, k}.