Equalization in dispersion-managed systems using learned digital back-propagation

Mohannad Abu-romoh; Nelson Costa; Yves Jaouën; Antonio Napoli; João Pedro; Bernhard Spinnler; Mansoor Yousefi

doi:10.1364/OPTCON.497117

1. Introduction

Before the advent of coherent detection in optical fiber communication systems, chromatic dispersion (CD) was compensated in the optical domain through in-line dispersion management. In such systems, an optical link typically comprises a cascade of spans, each consisting of a single-mode fiber (SMF) followed by a dual-stage Erbium-doped fiber amplifier (EDFA) with a mid-stage dispersion-compensating fiber (DCF). In this scenario, the residual CD in the link forms a dispersion map that requires careful optimization. These systems have typically been used to transmit data rates of around 10 Gb/s per wavelength, with a few instances of upgrades to 40 Gb/s [1].

With the introduction of the coherent receiver (RX), the accumulated CD can now be fully compensated in the electrical domain using digital signal processing (DSP). As a result, dispersion management is no longer a necessity, leading to the evolution of optical links to non-dispersion-managed (NDM) systems. However, commercial dispersion-managed (DM) links continue to be in operation today, since upgrading deployed solutions to state-of-the-art systems can be particularly costly, especially in submarine fiber links. Furthermore, advances in nonlinear mitigation in recent decades may address some of the limitations of the DM systems.

Coherent systems depend on DSP at the receiver to equalize transmission impairments in the electrical domain [2]. While these systems effectively compensate for linear channel impairments, nonlinearity mitigation remains challenging. The pursuit of nonlinearity compensation (NLC) and associated DSP algorithms has led to the proposal of several equalization techniques to mitigate nonlinear impairments [3,4], including Volterra series-based equalization [5] and digital back-propagation (DBP) [6–9].

DBP compensates for deterministic nonlinear effects by digitally simulating propagation at the receiver using negated fiber parameters [6]. However, the effectiveness of DBP is hindered by both hardware limitations, which impose restrictions on the complexity of DBP, and signal interactions between adjacent channels. Thus, advanced equalization techniques are essential for improving system performance and reducing complexity.

Research has focused on optimizing DBP algorithms for lower complexity or better performance in various transmission scenarios [10,11]. For instance, channel parameters used in DBP were numerically optimized for DM and NDM systems in [12]. Combining transmitter-side DBP with frequency-referenced carriers has been shown to double the transmission reach [13]. Optimal DBP step sizes for polarization-division-multiplexed transmission systems were studied in [14]. Furthermore, several DBP variants have been proposed, such as correlated-DBP [15], which accounts for the correlation between neighboring signal samples. Dispersion folded-DBP [16–19] was first proposed for zero-residual dispersion but later extended to any dispersion map. Filtered DBP (FDBP) [20,21] introduces a parameterized low-pass filter (LPF) in the nonlinear step to improve phase tracking. Enhanced DBP [22,23] is an extension of FDBP that considers interactions between channel signals and adjacent channels. Coupled-Channel Enhanced DBP [24] achieves optimal cross-phase modulation (XPM) equalization in WDM systems. Recently, neural networks (NNs) have been employed to enhance DBP. Learned-DBP (LDBP) [25–29] uses a deep NN inspired by the split-step Fourier method (SSFM) computational graph, optimized using stochastic gradient descent (SGD). LDBP treats the LPF taps in DBP as free parameters optimized by the NN. For NDM systems, LDBP was extended to dual-polarization transmission [25] and experimentally demonstrated with one layer per span (LpS) [26]. Generalized DBP (GDBP) [30] employs a deep NN to parameterize DBP, specializing the system by combining NN training with adaptive DSP.

The main objective of this paper is to investigate the potential of repurposing legacy DM links for use in coherent transmission systems. We do not claim that replacing NDM with DM systems is superior; rather, we aim to explore the possibility of adapting already-deployed DM systems, typically used for intensity modulation and direct detection (IM/DD), to enable coherent transmission with higher-order modulation (DP-QPSK, DP-$16$-QAM and even DP-$64$-QAM). This approach yields bit rates typically ranging from 100 Gb/s to 400 Gb/s, clearly outperforming legacy systems operating at around 10 Gb/s. To achieve this goal, we use machine learning tools to optimize DBP, specifically the Learned-DBP method. We start by developing a DBP variant that is suitable for DM systems with a fractional number of steps per span (StpS), which can be applied to DM links with arbitrary dispersion maps. The resulting DBP is then used as the blueprint for LDBP, and we evaluate the performance of both DBP and LDBP in a realistically simulated WDM DM long-haul transmission system using dual-polarization (DP) $M$-ary quadrature amplitude modulation ($M$-QAM), considering various channel effects such as loss, CD, Kerr nonlinearity, polarization-mode dispersion (PMD), amplified spontaneous emission (ASE) noise, and laser phase noise (PN).

In single-channel transmission at 32 GBaud using the DP-$16$-QAM format, over a distance of ${\sim }2000$ km, i.e., $28\times 72$ km, LDBP improves the effective signal-to-noise ratio (SNR$_{\text {eff}}$) by 6.3 dB and 2.5 dB compared to linear equalization (LE) and conventional DBP, respectively. In WDM transmission at 32 GBaud per channel over the same distance, LDBP enhances the Q-factor by 1.1 dB and 0.4 dB compared to LE and DBP, respectively. We present both time- and frequency-domain implementations of LDBP (TD- and FD-LDBP) and examine the impact of fiber parameter variations due to aging, as well as laser PN, on LDBP performance. Our analysis shows that PN results in a Q-factor penalty of less than 0.2 dB when the laser linewidth is below 200 kHz. Furthermore, we demonstrate that LDBP’s performance gains over DBP are approximately maintained even when the model is retrained with updated fiber parameters post-aging. This work extends the findings in [29] by providing further valuable insights into improving data rates in DM systems utilizing coherent detection and LDBP.

The paper is structured as follows: In Section 2, we describe the optical fiber signal propagation model. In Section 3, we present the DM-adapted DBP, while Section 4 introduces its learned version. In Section 5, we compare the performance of LE, DBP, and LDBP in four different setups and provide a complexity analysis in Section 6. Finally, Section 7 offers concluding remarks.

Fig. 1. The DM-WDM optical fiber system.


2. Dual-polarization optical fiber system model

2.1 Transmitter and channel

Figure 1 depicts the complete optical fiber transmission system. Within the context of the polarization-multiplexed fiber transmission, two random bit streams with $N_b$ bits, $\mathbf {b_x}=(b_x^{(1)},b_x^{(2)},\dots,b_x^{(N_b)})$ and $\mathbf {b_y}=(b_y^{(1)},b_y^{(2)},\dots,b_y^{(N_b)})$, $b_{x/y}^{(i)}\in \{0,1\}$, are generated at the transmitter (TX), then each is mapped into a sequence of $N_s$ symbols $\mathbf {s_x} = (s_x^{(1)},s_x^{(2)},\dots,s_x^{(N_s)})$ and $\mathbf {s_y} = (s_y^{(1)},s_y^{(2)},\dots,s_y^{(N_s)})$, where $s_{x/y}^{(i)}$ are drawn from the constellation $\mathcal {S}$. The Gray mapping is used to map the bit stream to the symbols in the constellation. The complex envelope of the optical signal of the polarization $x$ is obtained by modulating $\mathbf {s_x}$ into the waveform

(1)$$q_{x,n}(z=0,t) = \sum_{i=1}^{N_s}s_{x,n}^{(i)}\,p(t-iT_s),$$

where $q_{x,n}$ represents the complex envelope of the signal in the $n$-th WDM channel as a function of distance $z$ and time $t$, $p(\cdot )$ is the pulse shape, and $T_s$ is the symbol period. $z=0$ indicates that the signal is located at the TX. The equations for $q_{y,n}(0,t)$ are identical upon replacing $x$ with $y$. The waveforms $q_{x,n}$ and $q_{y,n}$ are multiplexed into an optical signal $\mathbf {q}_n(0,t)=[q_{x,n},q_{y,n}]$. The complex envelope of the WDM signal $\mathbf {q}(0,t)$ launched in the fiber link is generated by adding the signals of the different WDM channels

(2)$$\mathbf{q}(0,t) = \sum_n \mathbf{q}_n(0, t) e^{{-}jn\Delta \omega t},$$

where $\Delta \omega$ is the angular frequency spacing between adjacent WDM channels. The WDM signal is launched into an optical fiber link consisting of multiple spans, where each span includes an SMF followed by a dual-stage EDFA whose two stages are separated by a DCF of suitable length to compensate for dispersion. The DCF has a higher nonlinearity than the SMF, so the first-stage EDFA provides pre-DCF gain only up to a power level that does not generate excessive nonlinear effects. The second EDFA stage then amplifies the signal back to its original power level. The end-to-end channel with all components can be described by the interplay between CD, PMD and Kerr nonlinearity effects. The propagation of signals inside each span in the presence of PMD is governed by the coupled nonlinear Schrödinger equations (CNLSE), which model the interaction between the two states of polarization

(3a)$$\frac{\partial q_x}{\partial z} = \left[-\frac{\alpha}{2}-\beta_{1x}\frac{\partial}{\partial t} - \frac{j\beta_2}{2}\frac{\partial^2}{\partial t^2} + j\gamma (|q_x|^2+\frac{2}{3}|q_y|^2) \right] q_x,$$

(3b)$$\frac{\partial q_y}{\partial z} = \big[-\frac{\alpha}{2}-\beta_{1y}\frac{\partial}{\partial t} - \frac{j\beta_2}{2}\frac{\partial^2}{\partial t^2} + j\gamma (|q_y|^2+\frac{2}{3}|q_x|^2) \big] q_y,$$

where $\alpha$ is the loss parameter, $\beta _{1x/y}$ are the first-order dispersion coefficients, $\beta _{2}$ is the CD coefficient and $\gamma$ is the Kerr nonlinearity parameter. The fiber length $\mathcal {L}$ considered in our simulations is significantly larger than the beat length $L_B$, allowing us to neglect the coherent cross-polarization terms [31, Eq. 6.1.11–12]. The effect of the rapidly changing state of polarization (SOP) is described separately in the numerical simulation of the propagation equation in Section 2.2. At the end of the span, an EDFA with gain $G$ compensates for the fiber attenuation and introduces ASE noise. The noise $n(t)$ is a band-limited white circularly-symmetric complex Gaussian process with the auto-correlation function $E\bigl (n(t)n^*(t')\bigr ) = \sigma _0^2 \delta _B(t-t')$, where $\delta _B(x)=B\, {\rm sinc}(Bx)$ and $\sigma _0^2= \frac {1}{2} (G-1) B h\nu _0 {\rm NF}$. Here, $\nu _0$ is the carrier frequency, $B$ is the signal bandwidth, $h$ is Planck's constant, and NF is the amplifier’s noise figure.
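As an illustration of Eqs. (1) and (2), the following minimal NumPy sketch builds the dual-polarization waveform of each WDM channel and combines the channels into one signal. It is not the simulator used in this work: the function names, the three-channel example, the QPSK symbols, and the sinc pulse (a stand-in for the actual pulse shape $p(t)$) are assumptions made for illustration, and the block is treated as periodic so that pulse shaping can be done by circular convolution.

```python
import numpy as np

def shape_pulses(symbols, sps, pulse):
    """Eq. (1) on a sampled grid: q[m] = sum_i s[i] * p[m - i*sps] (circular)."""
    up = np.zeros(len(symbols) * sps, dtype=complex)
    up[::sps] = symbols  # impulse train carrying one symbol every sps samples
    # Circular convolution via FFT; `pulse` must have its peak at index 0.
    return np.fft.ifft(np.fft.fft(up) * np.fft.fft(pulse))

def wdm_multiplex(channels, channel_indices, spacing_hz, fs_hz):
    """Eq. (2): q(0,t) = sum_n q_n(0,t) * exp(-j*n*dw*t), dw = 2*pi*spacing_hz."""
    t = np.arange(channels.shape[-1]) / fs_hz
    q = np.zeros(channels.shape[1:], dtype=complex)
    for q_n, n in zip(channels, channel_indices):
        q += q_n * np.exp(-1j * n * 2 * np.pi * spacing_hz * t)
    return q

# Illustrative numbers only: 3 channels, 64 QPSK symbols per polarization.
rng = np.random.default_rng(0)
n_sym, sps, baud, spacing = 64, 16, 32e9, 37.5e9
t_sym = (np.arange(n_sym * sps) - n_sym * sps // 2) / sps   # time in symbol periods
sinc_pulse = np.fft.ifftshift(np.sinc(t_sym))               # assumed Nyquist pulse
symbols = (rng.choice([-1, 1], (3, 2, n_sym))
           + 1j * rng.choice([-1, 1], (3, 2, n_sym))) / np.sqrt(2)
# channels[n, pol, :] holds q_{x,n} (pol = 0) and q_{y,n} (pol = 1).
channels = np.array([[shape_pulses(s, sps, sinc_pulse) for s in ch] for ch in symbols])
q_wdm = wdm_multiplex(channels, [-1, 0, 1], spacing, fs_hz=sps * baud)
```

The channel index $n$ is taken symmetric around the central channel ($n=0$), which is an assumption consistent with processing only the central channel at the receiver.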

2.2 Split-step Fourier method

In this section, we describe the numerical approach for solving the CNLSE using the SSFM. All fiber effects occur simultaneously and accumulate along the length of the fiber. In SSFM, a fiber of length $\mathcal {L}$ is divided into $N_{seg}$ segments of short length $\delta _s=\mathcal {L}/N_{seg}$, which we refer to as the PMD correlation length, for which the SOP at the end of the segment is uncorrelated with its initial state. The channel effects can be assumed to occur individually and consecutively in each segment. Each step of the SSFM involves evaluation of three sub-steps: a linear step, a PMD step and a nonlinear step. Below, we describe the steps in the SSFM.

(1) Linear step: Solves for the signal loss and CD in the frequency domain. Considering only the terms which include $\alpha$ and $\beta _2$ in Eq. (3), we obtain

(4)$$\hat q_{x/y}(z,\omega) \rightarrow \exp \left(-\frac{\alpha}{2}\delta_s + \frac{j\beta_2}{2} \omega^2 \delta_s\right) \hat q_{x/y}(z,\omega),$$

where $\hat q_{x/y}$ denotes the Fourier transform of the time domain signal $q_{x/y}$.

(2) PMD step: The PMD can be modeled by applying the unitary Jones matrix $\mathbf {J}^{(i)}(\omega )$ to the signal vector $\mathbf {\hat q}(z,\omega )=[\hat q_x(z,\omega ),\hat q_y(z,\omega )]^\top$

(5)$$\mathbf{\hat q}(z,\omega) \rightarrow \mathbf{J}^{(i)}(\omega) \mathbf{\hat q}(z,\omega),$$

where $\mathbf {J}^{(i)}(\omega ) = \mathbf {R}^{(i)}\mathbf {D}^{(i)}(\omega )$. Here, $\mathbf {R}^{(i)}$ is a unitary matrix of the form

(6)$$\mathbf{R}^{(i)} = \begin{bmatrix} \cos{\theta_i} & e^{j\frac{\phi_i}{2}} \sin{\theta_i} \\ -e^{{-}j\frac{\phi_i}{2}} \sin{\theta_i} & \cos{\theta_i} \end{bmatrix},$$

where $\phi _i$ and $\theta _i$, $i\in \{1,2,\ldots,N_{seg}\}$, are sequences of independent identically distributed (i.i.d.) random variables drawn from a uniform distribution on $[-\pi,\pi )$. Furthermore, $\mathbf {D}^{(i)}(\omega )$ is the differential group delay (DGD) matrix

(7)$$\mathbf{D}^{(i)}(\omega) = \begin{bmatrix} e^{{-}j\omega\frac{\tau_i}{2}} & 0 \\ 0 & e^{j\omega\frac{\tau_i}{2}} \end{bmatrix},$$

where the DGD parameters $(\tau _i)_{i=1}^{N_{seg}}$ are taken to be i.i.d. random variables drawn from the normal probability distribution $\mathcal {N}(0,\tau \sqrt {\delta _s})$, where $\tau$ is the characteristic constant of the channel called the PMD coefficient.

(3) Nonlinear step: Solves for the signal nonlinear effects by only considering the terms which include $\gamma$ in Eq. (3). For the $x$ polarization,

(8)$$q_x(z,t) \rightarrow\exp\left(j\gamma\delta_s\big(|q_x|^2 + \frac{2}{3}|q_y|^2\big)\right)q_x(z,t).$$

In our simulations, we consider a symmetric SSFM, where the nonlinear step is applied in the middle of the two linear half-steps.
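The symmetric step can be sketched as follows. This is a minimal NumPy illustration rather than the simulator used in this work: the PMD step of Eqs. (5)–(7) is omitted for brevity, the function name `ssfm_step` is hypothetical, and $\alpha$ is assumed to be given in linear units (1/km) rather than dB/km. NumPy's FFT sign convention maps $\partial/\partial t$ to $j\omega$, which matches the sign of Eq. (4).

```python
import numpy as np

def ssfm_step(qx, qy, omega, ds, alpha, beta2, gamma):
    """One symmetric SSFM step of length ds: half linear, nonlinear, half linear.

    omega is the angular-frequency grid, e.g. 2*pi*np.fft.fftfreq(N, d=1/fs).
    PMD is not modeled in this sketch.
    """
    # Eq. (4) evaluated over half a segment (loss and CD).
    half = np.exp((-alpha / 2 + 1j * beta2 / 2 * omega**2) * ds / 2)
    qx = np.fft.ifft(half * np.fft.fft(qx))
    qy = np.fft.ifft(half * np.fft.fft(qy))

    # Eq. (8) and its y-polarization counterpart, applied in the time domain.
    px, py = np.abs(qx)**2, np.abs(qy)**2
    qx = qx * np.exp(1j * gamma * ds * (px + 2 / 3 * py))
    qy = qy * np.exp(1j * gamma * ds * (py + 2 / 3 * px))

    # Second linear half-step.
    qx = np.fft.ifft(half * np.fft.fft(qx))
    qy = np.fft.ifft(half * np.fft.fft(qy))
    return qx, qy
```

A full span would be simulated by calling this function once per segment and applying the Jones matrix of Eq. (5) between the calls.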

2.3 Receiver

At the RX, a polarization-diversity coherent receiver converts the optical signal to the electrical domain. A low-pass filter with the same bandwidth as the central WDM channel is applied to the signal, obtaining $\mathbf {q}_{0}(z=\mathcal {L},t)$. The resulting signal is sampled at 2 samples/symbol, and translated into four electrical signals corresponding to the $I$ and $Q$ components of each polarization, which are then processed by the classical linear DSP chain [2]. The DSP chain consists of the following components:

(1) CD compensator (CDC), which reverses the CD as

(9)$$\hat q_x(\mathcal{L},\omega) \rightarrow \exp \Bigl({-}j\frac{\bar\beta_2}{2}\omega^2\mathcal{L}\Bigr)\hat q_x(\mathcal{L},\omega),$$

where $\bar \beta _2$ is the average group velocity dispersion in the link. According to Eq. (9), CDC requires processing the signal in the frequency domain using the fast Fourier transform (FFT), followed by the inverse FFT (IFFT) to transform the signal back to the time domain. An alternative implementation of the CDC in the time domain can be realized using finite impulse response (FIR) filters, which we will discuss in Section 6.

(2) MIMO equalizer, which compensates the time-varying PMD and the random SOP in the channel. The adaptive equalization of both effects requires a set of four complex-valued FIR filters, which together perform the inverse of the Jones matrix of the dynamic channel in Eq. (5). The outputs of the MIMO equalizer are given by

(10a)$$x_{\text{MIMO,out}}[k] = \mathbf{h}_{xx}^H\,\mathbf{x}_\text{in}[k]\, + \mathbf{h}_{xy}^H\,\mathbf{y}_\text{in}[k]$$

(10b)$$y_{\text{MIMO,out}}[k] = \mathbf{h}_{yx}^H\,\mathbf{x}_\text{in}[k]\, + \mathbf{h}_{yy}^H\,\mathbf{y}_\text{in}[k],$$

where $\mathbf {h}_{xx}$, $\mathbf {h}_{xy}$, $\mathbf {h}_{yx}$, and $\mathbf {h}_{yy}$ are vectors of size $\xi$, representing the taps of the MIMO filter, and $\mathbf {x}_{\text {in}}$ and $\mathbf {y}_{\text {in}}$ are sliding windows of the signal after CD compensation. These windows have a length of $\xi$, and they are defined as $\mathbf {x}_{\text {in}}[k] = [x_\text {in}[k]$, $x_\text {in}[k-1]$, $\dots$, $x_\text {in}[k-\xi +1]]$ and $\mathbf {y}_{\text {in}}[k] = [y_\text {in}[k]$, $y_\text {in}[k-1]$, $\dots$, $y_\text {in}[k-\xi +1]]$. A popular method for optimizing the FIR taps is the constant modulus algorithm (CMA), which is typically used for phase modulations. For the higher-order QAM modulations we use in our simulations, the radially directed equalizer proposed in [2] is used jointly with the CMA. Following the MIMO equalizer, the output sequences are sampled at 1 sample/symbol.

(3) Carrier phase estimation (CPE): The last step in the DSP is to estimate the PN, $\phi _N$, resulting from the phase fluctuations of the lasers at the TX and RX ends [32]. The PN of a single-frequency laser exhibits gradual and continuous phase shifts, resembling a quasi-continuous frequency drift. For the CPE, we use the two-stage algorithm proposed in [33]. A measure of the duration over which the laser phase remains stable is provided by the coherence time, which is inversely related to the laser linewidth [34]. Within a single coherence time, the PN varies slowly compared to the signal, and can thus be assumed to be constant. The output of the CPE, denoted as $x_\text {CPE,out}$, is expressed as

(11)$$x_\text{CPE,out}[k] = x_{\text{CPE,in}}[k]\exp({-}j\phi_N),$$

where $x_{\text {CPE,in}}$ is the same as the output of the MIMO equalizer, $x_{\text {MIMO,out}}$.

The receiver DSP chain described above equalizes only the linear impairments of the signal, namely those resulting from the dispersive channel and PMD.
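The first two blocks of the linear DSP chain can be sketched as below. This is a minimal NumPy illustration under stated assumptions, not the receiver used in this work: the function names are hypothetical, the MIMO taps are fixed rather than adapted by the CMA or the radially directed equalizer, and each window is assumed to contain the $\xi$ most recent samples.

```python
import numpy as np

def cd_compensate(q, omega, beta2_avg, length):
    """Frequency-domain CDC, Eq. (9): FFT, all-pass phase filter, IFFT."""
    h = np.exp(-1j * beta2_avg / 2 * omega**2 * length)
    return np.fft.ifft(h * np.fft.fft(q))

def mimo_output(x_in, y_in, h_xx, h_xy, h_yx, h_yy):
    """Butterfly FIR structure of Eq. (10) with fixed taps (no adaptation)."""
    xi = len(h_xx)
    n_out = len(x_in) - xi + 1
    x_out = np.zeros(n_out, dtype=complex)
    y_out = np.zeros(n_out, dtype=complex)
    for m, k in enumerate(range(xi - 1, len(x_in))):
        # Windows ordered as [u[k], u[k-1], ..., u[k-xi+1]].
        xw = x_in[k - xi + 1:k + 1][::-1]
        yw = y_in[k - xi + 1:k + 1][::-1]
        # np.vdot conjugates its first argument, which gives h^H w.
        x_out[m] = np.vdot(h_xx, xw) + np.vdot(h_xy, yw)
        y_out[m] = np.vdot(h_yx, xw) + np.vdot(h_yy, yw)
    return x_out, y_out
```

In an adaptive receiver, the four tap vectors would be updated after every output sample; only the filtering operation itself is shown here.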

3. Digital back-propagation

The signal propagation in the reverse direction through the optical fiber is numerically approximated using DBP. In Fig. 1, the DBP block takes the place of the CDC in the receiver and operates at the same sampling rate of 2 samples/symbol. DBP employs the SSFM with negated propagation parameters and larger spatial segments than those used in the forward SSFM simulation. This reduces the high complexity associated with a large number of segments, at the cost of a coarser approximation of the reverse propagation. In practice, DBP is limited to 3 or fewer StpS.

3.1 Mathematical model

The DBP block is placed before the MIMO equalizer, which means that the input signals are subject to PMD variations in the optical fiber. However, the standard DBP algorithm is not designed to equalize random effects like PMD. Instead, it primarily addresses deterministic effects such as CD and nonlinear effects. Moreover, the changing SOP can cause the $x$-polarization and $y$-polarization signals to be indistinguishable due to rapid changes in the orientation of the axes of birefringence $\theta$ along the fiber. When the PMD is small but the birefringence varies rapidly and randomly, the Manakov equation can be used to describe signal propagation. The propagation model we consider for DBP is based on Eq. (3) averaged over the rapidly-varying SOP along the fiber, which corresponds to the vector Manakov equation [35]:

(12)$$\frac{\partial \mathbf{q}}{\partial z} ={-} \frac{j\beta_2}{2}\frac{\partial^2 \mathbf{q}}{\partial t^2} + \Bigl[ -\frac{\alpha}{2}+ j\gamma \frac{8}{9}||\mathbf{q}||^2 \Bigr]\mathbf{q}.$$

Here, $\mathbf {q}(z,t) = [q_x(z,t), q_y(z,t)]^T$ is the Jones vector containing the propagating signals in both polarizations, and the factor $8/9$ is a characteristic property of the PMD, introduced by the averaging process. In DBP, the optical channel is divided into $N_d$ spatial segments of equal lengths, denoted as $\delta _d = \mathcal {L}/N_d$. Within each segment, the dispersive and nonlinear channel effects are assumed to act independently. These effects correspond to the first and second terms on the right side of Eq. (12), representing the linear and nonlinear effects, respectively. To solve for each of these terms, we use negated parameters. The linear part is solved in the frequency domain, while the nonlinear part is solved in the time domain, resulting in two partial solutions:

(13)$$\mathbf{\hat q}(z+\delta_d,\omega) = \underbrace{\exp\left({-}j\beta_2 \frac{1}{2} \omega^2 \delta_d \right)}_{L(\delta_d,\omega)} \mathbf{\hat q}(z,\omega),$$

(14)$$\mathbf{q}(z+\delta_d,t) = \underbrace{\exp\left(- j\frac{8}{9}\gamma ||\mathbf{q}(z,t)||^2 \delta_\text{eff}\right)}_{N(\delta_d,t)} \mathbf{q}(z,t),$$

where $\delta _\text {eff} = (1-\exp {(-\alpha \delta _d}))/\alpha$ represents the effective nonlinear step length, and the location $z$ is measured relative to the receiver, with $z=0$ corresponding to the receiver’s location. The effect of attenuation is accounted for in the effective length $\delta _\text {eff}$ in the nonlinear step, and therefore does not appear in Eqs. (13) and (14). The DBP is hence characterized by two sets of operators: the linear operator $L(\delta _d,\omega )$ and the nonlinear operator $N(\delta _d,t)$. The equalized signal can be obtained from the received signal by alternating between these two solutions along the length of the fiber in the backward direction (from the receiver to the transmitter).

We assume a discretization of $\mathbf {q}(z,t)$ into the time-sampled vector $\mathbf {U}^{(n)} = [\mathbf {X}^{(n)},\mathbf {Y}^{(n)}]^T$, where $\mathbf {X}^{(n)}\in \mathbb {C}^N$ and $\mathbf {Y}^{(n)}\in \mathbb {C}^N$, with $N$ representing the size of the window over which DBP is applied. The superscript $(n)$ refers to the DBP step, with $n=0$ representing the input to the DBP, and $n=N_d$ representing the output of the DBP. The linear step is represented by a matrix multiplication, given by

(15)$$\mathbf{U}^{(n)} \rightarrow \mathbf{B}\mathbf{U}^{(n-1)} = \mathbf{W}^{{-}1}\text{diag}(e^{\delta_dH_1},\ldots,e^{\delta_dH_N})\mathbf{W}\mathbf{U}^{(n-1)},$$

where $\mathbf {B} \in \mathbb {C}^{N\times N}$ is a matrix representing the linear operator, $\mathbf {W}$ denotes the discrete Fourier transform matrix, $H_k = - j\beta _2 \omega _k^2/2$, and $\omega _k=2\pi f_k$, where $f_k$ corresponds to the $k$-th discrete frequency. Additionally, the nonlinear step can be represented by the nonlinear transformation $K(\cdot )$

(16)$$\mathbf{U}^{(n)} \rightarrow K\left(\mathbf{U}^{(n-1)}\right) = \mathbf{U}^{(n-1)} \exp\left({-}j\gamma \varepsilon \delta_\text{eff}\frac{8}{9}\Bigl(\mathbf{X}^{(n-1)} \odot \mathbf{X}^{(n-1)*}+\mathbf{Y}^{(n-1)}\odot\mathbf{Y}^{(n-1)*}\Bigr)\right),$$

where $\odot$ denotes the Hadamard product. Additionally, a real-valued parameter $\varepsilon \in [0,1]$ is introduced, to accurately model the nonlinear effects. The optimization of this parameter will be discussed in the results section.
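Equations (13)–(16) can be sketched in a few lines of NumPy. The following is an illustrative implementation under stated assumptions, not the one used in this work: the function name `dbp` is hypothetical, the linear step is applied before the nonlinear step within each segment (the ordering is our assumption), $\alpha$ is in linear units, and per-span power rescaling for the amplifiers is left out.

```python
import numpy as np

def dbp(U, omega, n_steps, length, alpha, beta2, gamma, eps=1.0):
    """Back-propagate U (shape (2, N): rows X and Y) over `length` in n_steps.

    Each step applies the linear operator of Eq. (13) in the frequency domain
    and the nonlinear transformation of Eq. (16) in the time domain.
    """
    dd = length / n_steps
    # Effective nonlinear length; equals dd in the lossless limit.
    d_eff = dd if alpha == 0 else (1 - np.exp(-alpha * dd)) / alpha
    lin = np.exp(-1j * beta2 / 2 * omega**2 * dd)           # L(dd, omega)
    for _ in range(n_steps):
        U = np.fft.ifft(lin * np.fft.fft(U, axis=-1), axis=-1)
        power = np.sum(np.abs(U)**2, axis=0)                # |X|^2 + |Y|^2 per sample
        U = U * np.exp(-1j * gamma * eps * d_eff * 8 / 9 * power)
    return U
```

For a lossless link simulated with the same step size and the opposite operator order, this routine inverts the forward propagation exactly, which is a convenient sanity check before introducing loss, noise, and coarser steps.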

3.2 DBP adaptation to DM systems

In NDM systems, CD is introduced by a single type of fiber, and the residual CD grows at a constant rate along the link. However, in DM systems, the compensation of CD using DBP becomes more involved as the linear step in DBP must account for the total dispersion generated by both the DCF and the standard SMF. The DBP algorithm we use in our work allows for flexible selection of StpS values less than 1, enabling us to include multiple spans in a single DBP step, similar to what is considered in [1,23,36]. To model the accumulated dispersion in the fiber, we denote by $\mathcal {D}_c(z)$ the total accumulated dispersion inside the fiber as a function of distance $z$; see Fig. 2. The linear step is adjusted as follows: Let us assume $N_d+1$ spatial steps, dividing a fiber of length $\mathcal {L}$ into spatially equal segments with step size $\delta _d = \mathcal {L}/N_d$, with the exception of the first and last steps where each has length $\delta _d/2$. Each step spans $[z_k,z_{k+1}]$, where $z_k=(k-\frac {1}{2})\delta _d$, $k \in \{1,\dots,N_d\}$, $z_0=0$, $z_{N_d+1}= \mathcal {L}$. This configuration is similar to the Wiener-Hammerstein model in [7]. Within each step, we calculate the weighted-average dispersion coefficient, which is described by the following equation

(17)$$\bar D = \frac{\mathcal{D}_c(z_{k+1}) - \mathcal{D}_c(z_{k})}{\delta_d}.$$

Fig. 2. Dispersion map of the propagating optical signal over the first 7 spans, with the last two DBP steps. A complete DBP step is performed over 4 spans, from points $A$ to $B$, while a half-step is performed over 2 spans, between points $B$ and $C$. Nonlinear steps are applied at points $A$ and $B$.


Equation (17) approximates the dispersion map between points $z_k$ and $z_{k+1}$ with a constant average dispersion $\bar D$ that lies between the values of $D$ for the SMF and the DCF. The power injected at the input of the DCF is set small enough to guarantee a quasi-linear transmission regime in the DCF. As a consequence, the nonlinear step is performed with the coefficient $\bar \gamma =\gamma _{\text {SMF}}$ determined by the SMF. The numerical simulations presented in Section 5 show that this approximation is accurate. The linear and nonlinear steps in the proposed DBP alternate until the algorithm has covered the entire optical link.
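The step boundaries and the dispersion assigned to each linear step can be illustrated with the following NumPy sketch. It rests on several assumptions that are ours rather than the paper's: the function names are hypothetical, distance is measured along the SMF only with the DCF treated as a lumped element at the end of each span, and the example value $D_\text{SMF}=17$ ps/(nm·km) is a typical figure, not one taken from Table 1. The average dispersion is obtained by dividing by the actual length of each step, so that the two half-steps are handled as well.

```python
import numpy as np

def accumulated_dispersion(z, span_len, d_smf, dcf_len, d_dcf):
    """D_c(z) in ps/nm; z is measured along the SMF, DCF lumped at each span end."""
    z = np.asarray(z, dtype=float)
    full_spans = np.floor(z / span_len)
    residual = d_smf * span_len + d_dcf * dcf_len   # net dispersion left per span
    return full_spans * residual + d_smf * (z - full_spans * span_len)

def dbp_step_dispersion(n_d, n_spans, span_len, d_smf, dcf_len, d_dcf):
    """Boundaries z_0..z_{N_d+1} and the dispersion covered by each linear step."""
    total = n_spans * span_len
    dd = total / n_d
    # z_0 = 0, z_k = (k - 1/2) * dd for k = 1..N_d, z_{N_d+1} = total.
    edges = np.concatenate(([0.0], (np.arange(1, n_d + 1) - 0.5) * dd, [total]))
    d_c = accumulated_dispersion(edges, span_len, d_smf, dcf_len, d_dcf)
    d_acc = np.diff(d_c)                    # accumulated dispersion per step, ps/nm
    d_avg = d_acc / np.diff(edges)          # average dispersion per step, ps/(nm km)
    return edges, d_acc, d_avg

# Example: 28 spans of 72 km SMF + 13 km DCF, N_d = 7 (four spans per full step).
edges, d_acc, d_avg = dbp_step_dispersion(7, 28, 72.0, 17.0, 13.0, -80.0)
```

With these example values each full step covers four spans and each half-step two spans, which matches the configuration drawn in Fig. 2.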

3.3 Time domain and frequency domain implementation of DBP

The conventional method for implementing DBP uses an FFT and an IFFT in each step, which can be computationally demanding. However, considering the relatively low accumulated dispersion at the receiver in DM systems, we are interested in investigating whether a time-domain implementation of DBP could provide a complexity advantage over the frequency-domain approach. In this time-domain implementation, we replace the matrix $\mathbf {B}$ in Eq. (15) with an FIR filter with complex-valued taps, denoted as $h_\text {CDC}(\delta _d)$. This filter performs circular convolution with the backpropagating signal to compensate for the dispersion introduced within a step of length $\delta _d$. The DBP step in this case can be represented as [37]

(18)$$\mathbf{q}(z+\delta_d,t) = (\mathbf{q}(z,t)*h_\text{CDC}(\delta_d))\cdot \exp(\alpha\delta_d/2)\cdot\exp({-}j\delta_\text{eff}\varepsilon\gamma||\mathbf{q}||^2),$$

where $h_\text {CDC} = (h_{-F},\dots,h_{-1},h_0,h_1,\dots,h_F)$, and each $h_i$ for $i=-F,\dots,F$ represents an individual tap of the filter, which has a total of $2F+1$ taps. It is worth noting that the filter taps exhibit symmetry, such that $h_i=h_{-i}$. The minimum number of taps needed in the FIR filter to compensate for dispersion within a DBP step is mainly determined by the length of the impulse response, approximated using the formula provided in [38]

(19)$$\tau_{CD}(\delta_d) = \frac{\lambda_c^2}{c}|D_{acc}|\Delta f,$$

where $|D_{acc}| = |\bar D|\,\delta _d$ is the magnitude of the total accumulated dispersion inside a step of length $\delta _d$, $\Delta f$ is the signal spectral width for a single channel, $\lambda _c=c/f_c$ is the carrier wavelength, and $c$ is the speed of light. The channel impulse response length $\tau _{CD}(\delta _d)$ is measured in seconds, so the minimum number of filter taps required to perform CDC for a segment of length $\delta _d$ is

(20)$$N_{CDC,\delta_d} = \left\lceil \frac{\tau_{CD}(\delta_d)}{T_{s}}n_s\right\rceil,$$

where $\lceil x\rceil$ denotes the smallest integer larger than or equal to $x$, and $n_s$ is the oversampling ratio.
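Equations (19) and (20) amount to a short calculation, sketched below in Python. The function names, the default carrier wavelength of 1550 nm, and the choice $\Delta f$ equal to the symbol rate when no spectral width is given are assumptions made for illustration. The accumulated dispersion is entered in ps/nm and converted to SI units (1 ps/nm = $10^{-3}$ s/m).

```python
import math

C = 299_792_458.0  # speed of light in m/s

def cd_impulse_length(d_acc_ps_nm, delta_f_hz, wavelength_m=1550e-9):
    """Eq. (19): impulse response length in seconds for |D_acc| given in ps/nm."""
    d_acc_si = abs(d_acc_ps_nm) * 1e-3   # ps/nm -> s/m
    return wavelength_m**2 / C * d_acc_si * delta_f_hz

def min_cdc_taps(d_acc_ps_nm, symbol_rate_hz, n_s=2, delta_f_hz=None,
                 wavelength_m=1550e-9):
    """Eq. (20): ceil(tau_CD / T_s * n_s), with T_s = 1 / symbol_rate_hz."""
    if delta_f_hz is None:
        delta_f_hz = symbol_rate_hz      # assumed spectral width
    tau = cd_impulse_length(d_acc_ps_nm, delta_f_hz, wavelength_m)
    return math.ceil(tau * symbol_rate_hz * n_s)

# Example: taps for a step that accumulates 736 ps/nm at 32 GBaud, 2 samples/symbol.
taps_per_step = min_cdc_taps(736.0, 32e9, n_s=2)
```

Because the tap count grows linearly with $|D_{acc}|$, the low residual dispersion of a DM link translates directly into short per-step filters.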

4. Learned digital back-propagation

When designing NNs, incorporating prior knowledge of the system model can significantly speed up training and help it converge to a lower local minimum than black-box approaches [39]. The similarity between the SSFM and deep feed-forward NNs has been pointed out in the literature [25,28]: both alternate between linear matrix multiplications and nonlinear element-wise operators. This similarity can be exploited to design a model-based NN with DBP as a blueprint, which allows for optimizing the LDBP filter taps using the SGD algorithm [28].

4.1 Mathematical model of NNs

Deep feed-forward NNs consist of an input layer, output layer, and a cascade of hidden layers each performing a nonlinear transformation on the input of the layer, then passing the output to the next layer. Deep feed-forward NNs can be mathematically represented as a series of alternating linear operations, denoted as $\mathbf {A}^{(k)}$, and element-wise activation function, denoted as $\Phi ^{(k)}$. These operations create a mapping from an input vector $\mathbf {u}\in \mathbb {C}^{N}$ to an output vector $\mathbf {v}\in \mathbb {C}^{N}$, as follows:

(21)$$\mathbf{v} = \Phi^{(N_l)}(\mathbf{A}^{(N_l)}(\Phi^{(N_l-1)}(\dots \mathbf{A}^{(0)}(\mathbf{u})))),$$

where $N_l$ represents the number of layers in the NN, and the superscript $(k)$ denotes the index of the $k$-th layer of the NN. The function performed by the linear operator $\mathbf {A}^{(k)}$ varies depending on the NN architecture. In fully-connected NNs, it performs matrix multiplication, whereas in convolutional NNs, it performs convolution.

In our application for signal equalization, we specifically utilize the convolutional NN model. Within this model, the linear operator $\mathbf {A}^{(k)}(\mathbf {c})$ in the $k$-th layer is defined as $\mathbf {A}^{(k)}(\mathbf {c}) = \mathbf {c} \ast \mathbf {\Omega }^{(k)} + \mathbf {b}^{(k)}$, where $\mathbf {\Omega }^{(k)}\in \mathbb {C}^{2F+1}$ represents the convolutional filter, $\mathbf {c} \in \mathbb {C}^{N}$ is the layer’s input, and $\mathbf {b}^{(k)}\in \mathbb {C}^{N}$ is the bias vector.
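The composition in Eq. (21) with a convolutional linear operator can be sketched as follows. This is a generic NumPy illustration, not the LDBP model itself: the function names are hypothetical, the activations are passed in as ordinary Python callables, and "same"-size convolution is assumed so that the output keeps the input length $N$.

```python
import numpy as np

def conv_layer(c, omega_k, b_k=0.0):
    """Linear operator A^(k)(c) = c * Omega^(k) + b^(k) with a length-preserving convolution."""
    return np.convolve(c, omega_k, mode="same") + b_k

def feed_forward(u, filters, activations, biases=None):
    """Eq. (21): apply A^(k) followed by the element-wise Phi^(k), layer by layer."""
    v = np.asarray(u, dtype=complex)
    for k, (omega_k, phi_k) in enumerate(zip(filters, activations)):
        b_k = 0.0 if biases is None else biases[k]
        v = phi_k(conv_layer(v, omega_k, b_k))
    return v
```

In a model-based design, each `phi_k` would be the nonlinear phase rotation of the DBP step rather than a generic activation such as ReLU or tanh.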

4.2 Architecture of LDBP

Fig. 3. Block diagram of DBP (upper branch) and LDBP (lower branch) structures. The following symbols represent: $\mathrm {IX}\;:=\;\text {Re}\{\mathbf {X}\}$, $\mathrm {QX}\;:=\;\text {Im}\{\mathbf {X}\}$, $\mathrm {IY}\;:=\;\text {Re}\{\mathbf {Y}\}$, and $\mathrm {QY}\;:=\;\text {Im}\{\mathbf {Y}\}$.


The block diagrams of LDBP and its blueprint DBP are depicted in Fig. 3. The input dimension of LDBP is $[N_{ex},N,4]$, and its output dimension is $[N_{ex},N-2N_{CDC,\mathcal {L}}\;,4]$. The first dimension, $N_{ex}$, represents the total number of training examples, the second dimension represents the input and output widths, denoted as $N$ and $N-2N_{CDC,\mathcal {L}}$ respectively, and the last dimension corresponds to the four signals representing the real and imaginary parts of both the $X$- and $Y$-polarizations of the signal. The LDBP is a complex-valued NN that comprises two real-valued networks operating jointly. Each of these networks contains $N_l=N_d$ layers and accepts four input vectors ($IX$, $QX$, $IY$, and $QY$). The $k$-th layer of the network consists of two parallel real-valued convolutional filters, $\Omega _R^{(k)}$ and $\Omega _I^{(k)}$, corresponding respectively to the real and imaginary parts of the filter, and a nonlinear function $\Phi$, which takes four input vectors and generates four output vectors. The filters $\Omega _R^{(k)}$ and $\Omega _I^{(k)}$ are initialized with the real and imaginary parts of $h_\text {CDC}(\delta _d)$, respectively, such that the layers perform the real-valued equivalent of the operation described in Eq. (18). The input of each LDBP layer is zero-padded using the "same" padding option in the TensorFlow convolution function. The number of non-zero weights in the convolutional filters is determined numerically, as will be described in the simulation setup section later. Biases are not utilized in our model and are set to zero.
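The real-valued equivalent of one layer can be written out explicitly. The sketch below uses NumPy rather than TensorFlow and is only an illustration of the arithmetic: the function names are hypothetical, `kappa` is an assumed name for the lumped nonlinear coefficient multiplying $||\mathbf{q}||^2$ in the phase, the gain factor of Eq. (18) is omitted, and the same filter pair is assumed to act on both polarizations. Note that NumPy convolves while TensorFlow's convolution layers compute a cross-correlation; for symmetric taps ($h_i=h_{-i}$) the two coincide.

```python
import numpy as np

def complex_conv_real(i_in, q_in, w_re, w_im):
    """(I + jQ) * (W_R + jW_I) computed with four real "same"-size convolutions."""
    i_out = np.convolve(i_in, w_re, "same") - np.convolve(q_in, w_im, "same")
    q_out = np.convolve(i_in, w_im, "same") + np.convolve(q_in, w_re, "same")
    return i_out, q_out

def ldbp_layer(ix, qx, iy, qy, w_re, w_im, kappa):
    """One layer: complex FIR filtering, then the nonlinear phase rotation."""
    ix, qx = complex_conv_real(ix, qx, w_re, w_im)
    iy, qy = complex_conv_real(iy, qy, w_re, w_im)
    # Phase -kappa * (|X|^2 + |Y|^2), applied as a real-valued 2x2 rotation.
    phase = -kappa * (ix**2 + qx**2 + iy**2 + qy**2)
    c, s = np.cos(phase), np.sin(phase)
    return ix * c - qx * s, ix * s + qx * c, iy * c - qy * s, iy * s + qy * c
```

Writing the layer in this form makes it clear why the network takes four real inputs and produces four real outputs per layer.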

To train the Learned-DBP, we simulated a five-channel WDM PMD-free transmission of a block of $2^{15}$ symbols at various launch powers. The signal was initially sampled at 16 samples per symbol duration for forward propagation using SSFM. However, after the WDM demultiplexer, the signal from the central channel was sampled at a sampling ratio of $n_s = 2$ samples per symbol, resulting in a received block consisting of $2^{16}$ samples. The NN operates in a sliding window fashion, with a window of size $N$ sliding over the transmission block and advancing by $L\times n_s$, where the sliding factor $L$ is an integer that determines the number of symbols shifted between examples. For LDBP training, we generated input-output pairs, each with a size of $N$=1024 samples (512 symbols) for the input, and the corresponding output was 852 samples (426 symbols) long due to the dispersive channel effects (43 symbols on each side). We set the sliding factor to $L=8$, generating $N_{ex}=4021$ input-output pairs for training. To test LDBP, we generated 8 transmission blocks with PMD using the same overlapping and shifting technique as for training. However, this time, the sliding factor was set to $L = 426$ to ensure that each symbol in the transmission block was detected exactly once. The output of LDBP was then passed to the DSP to equalize PMD and dynamic channel effects.
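The windowing used for training and testing can be sketched as follows. This is a minimal NumPy illustration with hypothetical function names; the margin of 86 samples per side corresponds to the 43 symbols lost on each side at 2 samples/symbol, and the stride is $L\times n_s$ samples.

```python
import numpy as np

def sliding_windows(block, window, stride):
    """Cut a 1-D block into overlapping windows of `window` samples."""
    starts = range(0, len(block) - window + 1, stride)
    return np.stack([block[s:s + window] for s in starts])

def crop_valid(windows, margin):
    """Keep the central part of each window, dropping `margin` samples per side."""
    return windows[:, margin:windows.shape[1] - margin]

n_s, window, margin = 2, 1024, 86
block = np.arange(2**16)                                     # stand-in for received samples
train_in = sliding_windows(block, window, stride=8 * n_s)    # L = 8: heavily overlapping
test_in = sliding_windows(block, window, stride=426 * n_s)   # L = 426: disjoint outputs
test_out = crop_valid(test_in, margin)                       # 852 samples per window
```

With $L=426$ the cropped outputs of consecutive windows tile the block without gaps or overlap, which is why each symbol is detected exactly once during testing.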

We implement a symmetric DBP as the blueprint of LDBP, such that all layers are initialized with the corresponding parameters in a linear step of DBP at the full-step size $\delta _d$, except for the first and last layers which correspond to a half-step $\delta _d/2$. The NN is trained by minimizing the mean squared error (MSE) loss function, using the Adam optimizer with a learning rate of 0.001. During training, 20% of the training examples were used for validation to monitor the LDBP’s progress. The LDBP was trained for up to 75 epochs, with an early-stop condition triggered if the validation error did not decrease within 5 epochs. The best-performing epoch’s weights were used in the final LDBP. After training at each launch power, we evaluate the performance of the LDBP by calculating the Q-factor using independently generated testing data.
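The early-stop rule described above can be sketched in a few lines, independent of any training framework. This is an illustration under assumptions: the function name `early_stop` is hypothetical, "did not decrease" is interpreted as no new minimum of the validation loss, and the loss values are made up.

```python
def early_stop(val_losses, patience=5, max_epochs=75):
    """Return (best_epoch, last_epoch_run), both 0-based.

    Training stops once `patience` epochs have passed without a new minimum
    of the validation loss; the weights of `best_epoch` would be kept.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, min(len(val_losses), max_epochs) - 1


# Made-up validation losses: the minimum is at epoch 2, then no improvement.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.6]
best, last = early_stop(losses)
```

In this toy trace the run stops before reaching the final, lower value, which shows the usual trade-off of a patience-based rule: a short patience saves training time but can miss late improvements.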

5. Simulated system setup and performance results

Table 1. Description of the SMF parameters and PMD coefficient for simulated setups.

The performance results are based on the simulation of the transmission system shown in Fig. 1. All elements of the transmission system, including the transmitter, receiver, and channel, are simulated in Python, while the NN is implemented using the TensorFlow library. In this section, we present the performance results of DBP and LDBP for four different setups: (A), (B), (C), and (D). Setup (A) represents a single-channel transmission system with DP-16-QAM modulation, while setups (B) and (C) are WDM transmission systems with DP-16-QAM and DP-64-QAM modulation formats, respectively. Setup (D) is also a WDM transmission using DP-16-QAM modulation, but with fiber parameters that model an aged link. The aging study is described in detail when we present the performance results for this setup. Table 1 provides a detailed description of each setup.

For all setups (A)–(D), the symbol rate is $B=32$ GBaud per channel, using a root-raised-cosine (RRC) pulse shape with a roll-off factor $\rho =0.06$. The optical fiber link consists of $N_{sp}=28$ spans, each comprising an SMF of $72$ km followed by a DCF of $13$ km. The length of the DCF is chosen such that it compensates for $85$% of the CD accumulated in the SMF of each span. An amplifier with gain $G_{\text {SMF}} = 6.5$ dB is applied at the end of the SMF, and a second amplifier with gain $G_{\text {DCF}} = 14.4$ dB is applied after the DCF. The SMF parameters and PMD value for all setups are provided in Table 1. The DCF parameters are $\alpha _\text {DCF}=0.5\,\text {dB/km}$, $D_\text {DCF} = -80\,\text {ps/(nm.km)}$, and $\gamma _\text {DCF} = 2.8\,\text {W}^{-1}\text {km}^{-1}$. For WDM setups (B)–(D), the WDM channels are separated by a frequency spacing of $37.5$ GHz, resulting in a guard band of 5.5 GHz between adjacent channels. The lasers used for these setups had a linewidth of 50 kHz. To avoid overestimation of nonlinear crosstalk, the data symbols of all WDM channels were intentionally made unsynchronized in terms of time, polarization state, and phase. At the receiver, an RRC filter with a bandwidth of $(1+\rho )B$ is applied to filter out the adjacent channels, such that only the central channel is processed by the DBP or LDBP algorithms. The forward signal propagation using SSFM is simulated with 72 steps for each SMF and 13 steps for each DCF, corresponding to 1 km spatial resolution. The signals are sampled at a rate of 16 samples per symbol. At the receiver, the signal is downsampled to twice the symbol rate before being processed by either the CDC, DBP, or LDBP algorithms. Finally, the output of the CDC, DBP, or LDBP is downsampled to one sample per symbol and processed by the conventional DSP chain, which equalizes PMD effects and polarization mixing.
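The per-span dispersion map and power budget implied by these parameters can be checked with simple arithmetic. The sketch below uses the SMF values of setups (A)–(C) from Table 1; the variable names are illustrative.

```python
# Span parameters for setups (A)-(C), taken from the text and Table 1.
L_smf, L_dcf = 72.0, 13.0          # fiber lengths in km
D_smf, D_dcf = 17.0, -80.0         # dispersion in ps/(nm km)
alpha_smf, alpha_dcf = 0.2, 0.5    # attenuation in dB/km
gain_after_smf, gain_after_dcf = 6.5, 14.4  # amplifier gains in dB
n_spans = 28

cd_smf = D_smf * L_smf                        # CD accumulated in the SMF, ps/nm
cd_dcf = D_dcf * L_dcf                        # CD accumulated in the DCF, ps/nm
compensation_ratio = -cd_dcf / cd_smf         # fraction of SMF CD removed by the DCF
residual_per_span = cd_smf + cd_dcf           # residual CD per span, ps/nm
residual_link = n_spans * residual_per_span   # residual CD over the whole link, ps/nm

span_loss_db = alpha_smf * L_smf + alpha_dcf * L_dcf
total_gain_db = gain_after_smf + gain_after_dcf
```

Under these numbers the DCF removes about 85% of the SMF dispersion, leaving 184 ps/nm of residual CD per span, and the two amplifier gains together equal the combined SMF and DCF loss of 20.9 dB.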

5.1 DBP parameters optimization

To optimize the performance of the DBP for all transmission scenarios (A)–(D), we select the parameter $\varepsilon$ in Eq. (16) such that each DBP achieves the highest $Q$-factor at the optimal launch power. The $Q$-factor, which is based on the bit-error rate (BER), is defined as follows

(22)$${Q\text{-factor}= 20\log_{10}[\sqrt{2}\text{erfc}^{{-}1}(2 \text{BER})],}$$

where $\text {erfc}(\cdot )$ represents the complementary error function. For single-channel transmission, we find that the optimal value of $\varepsilon$ is 1 for all values of StpS. For the WDM transmission of 5 channels (setups (B), (C), and (D)), we find that the optimal values are $\varepsilon = 0.85$ for DBP with 1 StpS, $\varepsilon = 0.75$ for DBP with 0.5 StpS, and $\varepsilon = 0.64$ for DBP with 0.25 StpS. However, the optimal value of $\varepsilon$ might differ for fibers whose characteristics, lengths, or residual dispersion values differ from those considered in our simulations. Furthermore, $\varepsilon = 0$ corresponds to performing LE; in this case, all DBP configurations perform similarly to LE, regardless of the number of StpS. The filter size in each DBP step varies depending on the number of StpS. The impact of filter width on performance is shown in Fig. 4. A good trade-off between the number of filter taps and performance across different DBP configurations was determined through numerical optimization. Specifically, $F=24$ was found to be optimal for both 1 StpS and 0.5 StpS DBP, $F=32$ for 0.25 StpS, $F=36$ for 1/7 StpS, and $F=42$ for 1/14 StpS. These results suggest that a larger number of taps, specifically $2F+1 > N_{CDC,\delta _d}$, is required. We attribute this to the accumulation of truncation errors in each DBP step, which leads to a significant degradation in performance. This finding agrees with previous literature, specifically [30], which reported that the filter width has a significant impact on the performance of DBP-based equalizers (see Fig. 9 in [30]).
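Equation (22) can be evaluated with the Python standard library alone. The sketch below relies on the identity $\sqrt{2}\,\text{erfc}^{-1}(2\,\text{BER}) = -\Phi^{-1}(\text{BER})$, where $\Phi^{-1}$ is the inverse CDF of the standard normal distribution; the function name `q_factor_db` is hypothetical.

```python
import math
from statistics import NormalDist


def q_factor_db(ber):
    """Q-factor in dB from a bit-error rate, following Eq. (22).

    Valid for 0 < ber < 0.5, where the linear Q value is positive.
    """
    q_linear = -NormalDist().inv_cdf(ber)  # equals sqrt(2) * erfcinv(2 * ber)
    return 20.0 * math.log10(q_linear)


# Round-trip check: a linear Q of 3 corresponds to BER = 0.5 * erfc(3 / sqrt(2)).
ber_example = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
q_example_db = q_factor_db(ber_example)
```

Using the inverse normal CDF on the BER directly, rather than on $1-\text{BER}$, avoids a loss of precision when the BER is very small.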

Fig. 4. Achieved Q-factor by DBP at the optimal launch power (−3 dBm) for different values of StpS as a function of filter taps per step for (a) DP-16-QAM and (b) DP-64-QAM.

5.2 Performance comparison of DBP and LDBP in single channel transmission

A single-channel transmission is simulated in setup (A). We choose the effective SNR ($\text {SNR}_{\text {eff}}$) as the performance metric for single-channel transmission, which is defined as

(23)$$\text{SNR}_{\text{eff}} = \frac{|| \mathbf{\hat{s}_x} ||^2+|| \mathbf{\hat{s}_y} ||^2} {|| \mathbf{s_x} - \mathbf{\hat{s}_x} ||^2 + || \mathbf{s_y}-\mathbf{\hat{s}_y} ||^2},$$

where $\mathbf {\hat {s}_x}$ and $\mathbf {\hat {s}_y}$ denote the estimated symbols at the output of the DSP, with true values $\mathbf {s_x}$ and $\mathbf {s_y}$, respectively. The reason for choosing $\text {SNR}_{\text {eff}}$ is that the BER is very low in the single-channel scenario. This makes accurate measurement of BER difficult, since it requires a large number of simulated symbols. The $\text {SNR}_\text {eff}$ provides a more practical measure of performance under these conditions. The performance of the DM-adapted DBP and LDBP in setup (A) is depicted in Fig. 5 for varying signal launch powers. The PMD-aware DBP in Fig. 5 is a genie-aided model that assumes perfect knowledge of the PMD rotation matrices, DGD parameters, and polarization phase shifts throughout the fiber, specifically affecting the central channel. This information is saved during forward propagation and used during backpropagation with the same step size ($\delta _s$) used in SSFM. Despite its impracticality, the PMD-aware DBP model provides a useful upper bound on the performance that can be achieved by PMD-agnostic DBP and LDBP algorithms. Our simulations consider DBP and LDBP equalizers with a fractional number of StpS or LpS that is less than or equal to 1.
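A direct evaluation of Eq. (23) is sketched below for symbol sequences stored as Python lists of complex numbers. The conversion to dB is an addition made here to match the dB values quoted in the text; the function name and the toy symbols are illustrative.

```python
import math


def snr_eff_db(s_x, s_hat_x, s_y, s_hat_y):
    """Effective SNR of Eq. (23), converted to dB.

    s_* are the transmitted symbols, s_hat_* the estimates after the DSP.
    """
    signal = sum(abs(v) ** 2 for v in s_hat_x) + sum(abs(v) ** 2 for v in s_hat_y)
    error = (sum(abs(a - b) ** 2 for a, b in zip(s_x, s_hat_x))
             + sum(abs(a - b) ** 2 for a, b in zip(s_y, s_hat_y)))
    return 10.0 * math.log10(signal / error)


# Toy example with small estimation errors on both polarizations.
tx = [1 + 1j, -1 - 1j]
rx = [1.1 + 1j, -1 - 0.9j]
snr_example_db = snr_eff_db(tx, rx, tx, rx)
```

Unlike a BER estimate, this quantity converges with relatively few symbols because every symbol contributes to the error term, not only the rare ones that are detected incorrectly.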

Fig. 5. DBP performance for different StpS values in single-channel transmission over a $28\times 72$ km fiber with 32 Gbaud using DP-16-QAM format (setup A). The performance of LDBP is also shown for 3 launch powers around the peak $\text {SNR}_{\text {eff}}$.

At the optimal launch power, LDBP with 1 LpS outperforms DBP with the same complexity, providing an $\text {SNR}_\text {eff}$ of 22.3 dB compared to 19.8 dB, with the optimal launch powers of the two algorithms differing by about 1 dB. The LE achieves its best $\text {SNR}_{\text {eff}}$ of 16 dB at a launch power of −4 dBm. Both DBP and LDBP outperform the LE with varying gains: LDBP with 1 LpS achieves the highest gain of 6.3 dB, and DBP with 1 StpS achieves a gain of 3.8 dB. The simulated LDBP with the least complexity has 2 layers (1 full-step and 2 half-steps) and 2 activation functions, and outperforms DBP with similar complexity by 1.4 dB and the LE by 1.8 dB.

5.3 Performance comparison of DBP and LDBP in multi-channel transmission

Setups (B)–(D) present multiple WDM transmission scenarios. In such cases, the nonlinearity affecting the received signal is dominated by the nonlinear interference introduced by the adjacent channels via XPM. Since only the signal from a single channel is fed to the receiver, the information in adjacent channels is unknown to the receiver. Therefore, the nonlinearity generated by adjacent channels impacts all equalizers and limits their performance in the nonlinear regime. The performance of the PMD-aware DBP equalizer in these setups can be characterized by a bell-shaped curve, as seen in Figs. 6 and 7(a), in contrast to the straight line observed in the single-channel scenario (A) shown in Fig. 5.

Fig. 6. Achieved $Q$-factors for DBP and LDBP with different values of StpS and LpS for WDM transmissions over a $28\times 72$ km fiber with 32 Gbaud using (a) DP-16-QAM modulation, and (b) DP-64-QAM modulation.

Fig. 7. $Q$-factor results for set-up (D) simulating transmission over a $28\times 72$ km aged fiber at 32 Gbaud. (a) Shows $Q$-factors across launch powers, and (b) displays the impact of laser PN on the peak $Q$-factor of LDBP with varying LpS values for the same set-up.

In setups (B)–(D), we employ the $Q$-factor defined in Eq. (22). The simulation results for setups (B) and (C) are presented in Fig. 6. In these setups, the DBP and LDBP algorithms were simulated with varying numbers of total steps or layers ($N_d \in \{7,14,28\}$).

In setup (B), LDBP with 1 StpS achieved a peak $Q$-factor of 11 dB at the optimal launch power of $P=-3$ dBm. Notably, the peak performance of LDBP with 1 StpS is comparable to that achieved by the PMD-aware DBP at the same launch power. When comparing LDBP to DBP with 1 StpS, LDBP outperformed DBP by 0.3 dB at the same launch power. Furthermore, LDBP with 0.5 StpS and 0.25 StpS achieved a $Q$-factor of $10.8$ dB and $10.7$ dB, respectively, both outperforming DBP with similar numbers of StpS by 0.4 dB.

In setup (C), the LDBP with 1 StpS achieved a peak $Q$-factor of 5.8 dB, which is 0.3 dB higher than the peak performance achieved by the DBP with a similar number of StpS. The LDBP with 0.5 StpS and 0.25 StpS achieved a peak $Q$-factor of 5.8 dB and 5.6 dB, respectively, both outperforming the DBP with a similar number of StpS by 0.5 dB. On the other hand, the LE achieved a peak $Q$-factor of 4.7 dB. Lastly, upon comparing the single-channel scenario in setup (A) with the multi-channel scenario in setup (B), we observe a performance degradation in the WDM case, seen as a drop in the optimal operation power, due to the effect of XPM generated by the adjacent channels, as explained in [40].

5.4 Performance of LDBP in legacy systems

Legacy optical fiber networks deployed for IM/DD communications, especially those utilizing dispersion management, may exhibit higher PMD and attenuation coefficients due to outdated fabrication methods. Additionally, as the fiber ages and undergoes maintenance, these coefficients can further be affected [41,42]. To reflect these changes in a more realistic simulation transmission scenario, we consider the impact of aging and maintenance-induced changes on the fiber link. It is worth noting that the CD coefficient $\beta _2$ and the nonlinearity coefficient $\gamma$ are not affected by aging effects. While only these two coefficients are used to initialize the LDBP, the new attenuation coefficient $\alpha$ in aged fibers can cause a reduction in the effective length $L_\text {eff}$. Thus, retraining the NN with the updated parameters is necessary. Our goal is to examine how these changes affect the LDBP’s performance relative to DBP, and compare the two methods in a realistically simulated transmission scenario that incorporates aging and maintenance-induced changes in the fiber link.
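The effect of a higher attenuation coefficient on the effective length can be quantified with a short calculation. The sketch assumes the standard textbook definition $L_\text{eff} = (1-e^{-\alpha L})/\alpha$ with $\alpha$ in linear units, which may differ in detail from the definition used earlier in the paper, and applies it to the 72 km SMF with the two attenuation values listed in Table 1.

```python
import math


def effective_length_km(alpha_db_per_km, length_km):
    """Effective nonlinear length, assuming L_eff = (1 - exp(-alpha * L)) / alpha."""
    alpha_lin = alpha_db_per_km * math.log(10.0) / 10.0  # power attenuation in 1/km
    return (1.0 - math.exp(-alpha_lin * length_km)) / alpha_lin


l_eff_new = effective_length_km(0.20, 72.0)   # SMF of setups (A)-(C)
l_eff_aged = effective_length_km(0.24, 72.0)  # aged SMF of setup (D)
```

Under this definition the effective length drops from about 20.9 km to about 17.8 km, so the nonlinear phase accumulated per span changes even though $\beta_2$ and $\gamma$ do not.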

The fiber parameters selected for setup (D) aim to simulate aging effects and consist of an attenuation coefficient of $\alpha = 0.24$ dB/km and a PMD coefficient of $0.3$ ps/$\sqrt{\text{km}}$. The value of the attenuation coefficient after degradation is based on experimental findings reported in [42]. The simulation results for this particular setup are presented in Fig. 7. To account for the changes in the fiber link, we retrained the model using data generated from simulations with the new parameters. The LDBP with 1 StpS achieved a peak $Q$-factor of $8.8$ dB at a launch power of $P=-2$ dBm, which is $0.6$ dB higher than the peak $Q$-factor of the DBP with similar complexity. Furthermore, we compared setup (D) with setup (B), which simulates a similar transmission scenario without aging effects. We observed an average drop in $Q$-factor of 2.2 dB across all graphs, but the $Q$-factor gain of LDBP over DBP was still maintained. This suggests that deploying the LDBP long-term is feasible with retraining of the model. In fact, when comparing the performance of LDBP before and after retraining, an improvement of 0.2 dB in the peak achieved $Q$-factor is observed when the model is retrained with the new parameters.

Fig. 8. Trade-off between complexity and performance for different equalizers in both FD-LDBP (represented by triangles) and TD-LDBP (represented by circles).

The existing DM systems deployed for IM/DD may not be optimized for coherent transmission. In general, laser phase noise does not have a significant impact on the system, as the time interval over which the laser phase noise changes is much longer than the symbol period. However, to investigate its impact on the performance of LDBP for setup (D), we retrain the neural network using data that incorporates varying degrees of laser PN. While it is the primary role of the DSP following the LDBP to mitigate dynamic effects like laser PN, when the LDBP is exposed to examples of data affected by PN, it may be possible for the neural network to learn and reduce the impact of PN to some extent. The peak $Q$-factor achieved for different values of LpS and a range of laser linewidths is shown in Fig. 7(b). Since the input vector for LDBP consists of 512 symbols, the simulated laser with a linewidth of 50 kHz exhibits a PN variance of approximately $5\times 10^{-3}$ rad$^2$ over one input window. This PN variance has a negligible impact on the performance. The obtained results demonstrate an average $Q$-factor drop of 0.1 dB for a laser linewidth of 100 kHz and 0.2 dB for a laser linewidth of 200 kHz.
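The quoted phase-noise variance can be reproduced under the common Wiener-process model, in which the phase variance grows as $2\pi \Delta\nu\, t$ for a linewidth $\Delta\nu$. That model, and the treatment of 50 kHz as the total linewidth, are assumptions of this sketch rather than statements from the simulation code.

```python
import math


def pn_variance_rad2(linewidth_hz, symbol_rate_baud, n_symbols):
    """Phase variance accumulated over n_symbols, assuming a Wiener phase-noise model."""
    duration_s = n_symbols / symbol_rate_baud
    return 2.0 * math.pi * linewidth_hz * duration_s


symbol_rate = 32e9   # 32 GBaud
window_symbols = 512  # LDBP input width in symbols

var_50k = pn_variance_rad2(50e3, symbol_rate, window_symbols)
var_200k = pn_variance_rad2(200e3, symbol_rate, window_symbols)
```

Because the variance scales linearly with linewidth in this model, the 200 kHz case accumulates four times the variance of the 50 kHz case over the same 512-symbol window.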

6. Complexity

We measure the complexity of each equalizer in terms of real multiplications (RMs) per detected symbol (RMpS), excluding additions. The complexity calculation does not include the training process of the LDBP model, which occurs only once during model deployment. This is because the LDBP model is trained on PMD-free signals, focusing solely on mitigating nonlinear effects. As a result, both DBP and LDBP have the same complexity when considering the same number of StpS and LpS, and the complexity formulas derived for LDBP in this section are identical to those for DBP.

To efficiently compute the exponential function in the activation function, various approximation algorithms can be employed, such as the CORDIC algorithm [43,44]. These algorithms utilize look-up tables and bit-shifts, eliminating the need for multiplications. By employing similar algorithms, the activation function can be computed efficiently, with each computation of the activation requiring only 9 RMs. During LDBP inference, the linear step of LDBP, which involves a convolution in the time domain, can be alternatively performed using FFT/IFFT, referred to as FD-LDBP. However, it is important to note that the filter taps are always trained in the time domain. Below, we discuss the complexities of FD-LDBP and TD-LDBP and provide their respective complexity formulas.

FD-LDBP complexity. The total complexity of FD-LDBP per detected symbol, measured in RMpS for a signal with input size $N$ sampled at $n_s$ samples/symbol, is given by [39]

(24)$$C_{\text{FD-LDBP}} = \left(N_d+1\right)\left(4\frac{N(\log_2(N)+1)n_s}{N-2N_{CDC,\mathcal{L}}+1}\right) + \frac{9}{2}N_dn_s.$$

TD-LDBP complexity. The complexity of TD-LDBP for each detected symbol is determined by the convolution of a filter of size $2F+1$ with an input of size $N$, which involves $4(2F+1)N$ RMs. This computation assumes a dilation of 1, stride of 1, and padding of $2F$, resulting in an output size that is the same as the input size for each convolutional layer. Therefore, the total complexity per detected symbol for TD-LDBP can be calculated as

(25)$$C_{\text{TD-LDBP}} = \left(N_d+1\right)\left(4\frac{(2F+1)Nn_s}{N-2N_{CDC,\mathcal{L}}+1}\right) + \frac{9}{2}N_dn_s.$$

Figure 8 depicts the RMpS complexity of DBP and LDBP using both TD and FD implementations. The complexity of FD-LDBP (or FD-DBP) is primarily dependent on the number of steps involved in the algorithm, and can be approximated by 114 times the number of FFT/IFFT uses. The complexity of 1 LpS FD-LDBP, which uses 29 FFT/IFFT pairs, is approximately 3300 RMpS. On the other hand, TD-LDBP with the same number of LpS has a complexity of around 14000 RMpS. In fact, FD-LDBP is less computationally complex compared to TD-LDBP across all values of LpS. We initially hypothesized that, due to the lower accumulated dispersion in the DM system compared to NDM, smaller filters would be sufficient for each linear step in the DM system. However, our results indicate that TD implementation did not provide any complexity advantage over FD implementation. We hypothesize that this is due to the accumulation of truncation errors with each subsequent step in the DBP, resulting in a high overall error. As a result, larger filters are required to mitigate the truncation error and reduce its impact. We determined the number of filter taps required in each step through numerical simulations, as demonstrated by Fig. 4. Nevertheless, TD implementation restricts the neural network to a lower number of trainable parameters, which simplifies the training process and enhances training convergence even with smaller training datasets.
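The figures quoted above can be checked by evaluating Eqs. (24) and (25) directly. The sketch assumes $N=1024$, $n_s=2$, $N_d=28$ for 1 LpS over 28 spans, $F=24$, and $N_{CDC,\mathcal{L}}=86$ samples; the last value is inferred from the reduction of the window from 1024 to 852 samples and is not stated explicitly in the text.

```python
import math


def c_fd_ldbp(n_d, n, n_cdc, n_s):
    """RMs per detected symbol for FD-LDBP, Eq. (24)."""
    linear = 4.0 * n * (math.log2(n) + 1.0) * n_s / (n - 2 * n_cdc + 1)
    return (n_d + 1) * linear + 4.5 * n_d * n_s


def c_td_ldbp(n_d, n, n_cdc, n_s, f):
    """RMs per detected symbol for TD-LDBP, Eq. (25)."""
    linear = 4.0 * (2 * f + 1) * n * n_s / (n - 2 * n_cdc + 1)
    return (n_d + 1) * linear + 4.5 * n_d * n_s


# Assumed parameters for the 1 LpS configuration (see the lead-in above).
N, N_CDC, n_s, N_d, F = 1024, 86, 2, 28, 24

fd_rmps = c_fd_ldbp(N_d, N, N_CDC, n_s)
td_rmps = c_td_ldbp(N_d, N, N_CDC, n_s, F)
fd_per_linear_step = fd_rmps / (N_d + 1)
```

With these assumptions the FD result is roughly 3300 RMpS, or about 114 RMpS per linear step once the activation cost is included, and the TD result is roughly 13900 RMpS, in line with the approximate values quoted in the text.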

7. Conclusions

In this paper, we present the LDBP approach for mitigating nonlinear effects in DM optical fiber transmission systems. LDBP leverages NN’s training algorithms to optimize DBP. Our comparative study has shown that LDBP outperforms DBP, providing a significant gain in Q-factor, with an average improvement of 0.4 dB. The application of NNs to DM links is an important new development, as it demonstrates the possibility of repurposing DM systems for coherent transmission. The results have significant implications for the fiber-optics industry, suggesting that data rates in conventional DM optical links can be substantially improved using modern, simple coherent receivers. Additionally, we demonstrated the complexity of both DBP and LDBP in the TD and FD implementations. TD-implementation was found to be more complex across all values of LpS. Overall, our findings highlight the potential of LDBP as an effective method for mitigating nonlinear effects in DM optical fiber transmission systems.

Funding

H2020 Marie Skłodowska-Curie Actions (813144); H2020 European Research Council (805195); H2020 LEIT Information and Communication Technologies (101016663).

Acknowledgments

This work has received funding from the EU Horizon 2020 program under the Marie Skłodowska-Curie grant agreements 813144, and the European Research Council (ERC) research and innovation programme, Grant Agreement No. 805195. A. Napoli, N. Costa, J. Pedro, and B. Spinnler would like to thank the European Commission for funding their activities through the H2020 B5G-OPEN (G.A. 101016663).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. A. Napoli, Z. Maalej, V. A. Sleiffer, M. Kuschnerov, D. Rafique, E. Timmers, B. Spinnler, T. Rahman, L. D. Coelho, and N. Hanik, “Reduced Complexity Digital Back-propagation Methods for Optical Communication Systems,” J. Lightwave Technol. 32(7), 1351–1362 (2014).

2. S. J. Savory, “Digital Coherent Optical Receivers: Algorithms and Subsystems,” IEEE J. Sel. Top. Quantum Electron. 16(5), 1164–1179 (2010).

3. C. Catanese, A. Triki, E. Pincemin, and Y. Jaouën, “A Survey of Neural Network Applications in Fiber Nonlinearity Mitigation,” in Proc. 21st Int. Conf. Transp. Opt. Netw., (Angers, France, 2019), pp. 1–4.

4. O. Sidelnikov, A. Redyuk, and S. Sygletos, “Equalization Performance and Complexity Analysis of Dynamic Deep Neural Networks in Long Haul Transmission Systems,” Opt. Express 26(25), 32765–32776 (2018).

5. Y. Gao, F. Zhang, L. Dou, Z. Chen, and A. Xu, “Intra-channel Nonlinearities Mitigation in Pseudo-linear Coherent QPSK Transmission Systems via Nonlinear Electrical Equalizer,” Opt. Commun. 282(12), 2421–2425 (2009).

6. E. Ip and J. M. Kahn, “Compensation of Dispersion and Nonlinear Impairments Using Digital Backpropagation,” J. Lightwave Technol. 26(20), 3416–3425 (2008).

7. D. S. Millar, S. Makovejs, C. Behrens, S. Hellerbrand, R. I. Killey, P. Bayvel, and S. J. Savory, “Mitigation of Fiber Nonlinearity Using a Digital Coherent Receiver,” IEEE J. Sel. Top. Quantum Electron. 16(5), 1217–1226 (2010).

8. E. F. Mateo, F. Yaman, and G. Li, “Efficient Compensation of Inter-channel Nonlinear Effects via Digital Backward Propagation in WDM Optical Transmission,” Opt. Express 18(14), 15144–15154 (2010).

9. C. B. Czegledi, G. Liga, D. Lavery, M. Karlsson, E. Agrell, S. J. Savory, and P. Bayvel, “Digital Backpropagation Accounting for Polarization-Mode Dispersion,” Opt. Express 25(3), 1903–1915 (2017).

10. B. Schmauss, C.-Y. Lin, and R. Asif, “Progress in Digital Backward Propagation,” in Proc. Eur. Conf. Opt. Commun., (Amsterdam, Netherlands, 2012), pp. 1–3.

11. K. Goroshko, H. Louchet, and A. Richter, “Overcoming Performance Limitations of Digital Back Propagation Due to Polarization Mode Dispersion,” in Proc. 18th Int. Conf. Transp. Opt. Netw., (Trento, Italy, 2016), pp. 1–4.

12. C.-Y. Lin, M. Holtmannspoetter, M. R. Asif, and B. Schmauss, “Compensation of Transmission Impairments by Digital Backward Propagation for Different Link Designs,” in Proc. Eur. Conf. Opt. Commun., (Turin, Italy, 2010), pp. 1–3.

13. E. Temprana, E. Myslivets, L. Liu, V. Ataie, A. Wiberg, B. Kuo, N. Alic, and S. Radic, “Two-Fold Transmission Reach Enhancement Enabled by Transmitter-Side Digital Backpropagation and Optical Frequency Comb-Derived Information Carriers,” Opt. Express 23(16), 20774–20783 (2015).

14. J. Shao, S. Kumar, and X. Liang, “Digital Back Propagation With Optimal Step Size for Polarization Multiplexed Transmission,” IEEE Photonics Technol. Lett. 25(23), 2327–2330 (2013).

15. X. Liang and S. Kumar, “Correlated Digital Back Propagation Based on Perturbation Theory,” Opt. Express 23(11), 14655–14665 (2015).

16. L. Zhu and G. Li, “Folded Digital Backward Propagation for Dispersion-Managed Fiber-Optic Transmission,” Opt. Express 19(7), 5953–5959 (2011).

17. J. K. Fischer, C.-A. Bunge, and K. Petermann, “Equivalent Single-Span Model for Dispersion-Managed Fiber-Optic Transmission Systems,” J. Lightwave Technol. 27(16), 3425–3432 (2009).

18. L. Zhu and G. Li, “Nonlinearity Compensation Using Dispersion-Folded Digital Backward Propagation,” Opt. Express 20(13), 14362–14370 (2012).

19. X. Liu, S. Chandrasekhar, P. J. Winzer, B. Maheux-L, G. Brochu, and F. Trepanier, “Efficient Fiber Nonlinearity Mitigation in 50-GHz-DWDM Transmission of 256-Gb/s PDM-16QAM Signals by Folded Digital-back-propagation and Channelized FBG-DCMs,” in Proc. Opt. Fiber Conf., (San Francisco, CA, USA, 2014), pp. 1–3.

20. L. B. Du and A. J. Lowery, “Improved Single Channel Backpropagation for Intra-Channel Fiber Nonlinearity Compensation in Long-Haul Optical Communication Systems,” Opt. Express 18(16), 17075–17088 (2010).

21. D. Rafique, M. Mussolin, M. Forzati, J. Mårtensson, M. N. Chugtai, and A. D. Ellis, “Compensation of Intra-Channel Nonlinear Fibre Impairments Using Simplified Digital Back-Propagation Algorithm,” Opt. Express 19(10), 9453–9460 (2011).

22. M. Secondini, D. Marsella, and E. Forestieri, “Enhanced Split-Step Fourier Method for Digital Backpropagation,” in Proc. Eur. Conf. Opt. Commun., (Cannes, France, 2014), pp. 1–3.

23. M. Secondini, S. Rommel, G. Meloni, F. Fresi, E. Forestieri, and L. Poti, “Single-Step Digital Backpropagation for Nonlinearity Mitigation,” Photonics Netw. Commun. 31(3), 493–502 (2016).

24. S. Civelli, E. Forestieri, A. Lotsmanov, D. Razdoburdin, and M. Secondini, “Coupled-Channel Enhanced SSFM for Digital Backpropagation in WDM Systems,” in Proc. Opt. Fiber Conf., (San Francisco, CA, USA, 2021), pp. 1–3.

25. R. M. Bätler, C. Häger, H. D. Pfister, G. Liga, and A. Alvarado, “Model-Based Machine Learning for Joint Digital Backpropagation and PMD Compensation,” J. Lightwave Technol. 39(4), 949–959 (2021).

26. C. Häger and H. D. Pfister, “Physics-Based Deep Learning for Fiber-Optic Communication Systems,” IEEE J. Select. Areas Commun. 39(1), 280–294 (2021).

27. T. Inoue, R. Matsumoto, and S. Namiki, “Learning-based Digital Back Propagation to Compensate for Fiber Nonlinearity Considering Self-phase and Cross-phase Modulation for Wavelength-Division Multiplexed Systems,” Opt. Express 30(9), 14851–14872 (2022).

28. C. Häger and H. D. Pfister, “Nonlinear Interference Mitigation via Deep Neural Networks,” in Proc. Opt. Fiber Conf., (San Diego, CA, USA, 2018), pp. 1–3.

29. M. Abu-romoh, N. Costa, A. Napoli, B. Spinnler, Y. Jaouën, and M. Yousefi, “Learned Digital Back-Propagation for Dual-Polarization Dispersion Managed Systems,” in Proc. Eur. Conf. Opt. Commun., (Basel, Switzerland, 2022), p. We1C.6.

30. Q. Fan, C. Lu, and A. P. T. Lau, “Combined Neural Network and Adaptive DSP Training for Long-Haul Optical Communications,” J. Lightwave Technol. 39(22), 7083–7091 (2021).

31. G. P. Agrawal, “Chapter 6 - Polarization Effects,” in Nonlinear Fiber Optics (Fifth Edition), G. Agrawal, ed. (Academic Press, Boston, 2013), Optics and Photonics, pp. 193–244.

32. T. Pfau, S. Hoffmann, and R. Noe, “Hardware-Efficient Coherent Digital Receiver Concept With Feedforward Carrier Recovery for m-QAM Constellations,” J. Lightwave Technol. 27(8), 989–999 (2009).

33. T. Pfau and R. Noé, “Phase-Noise-Tolerant Two-Stage Carrier Recovery Concept for Higher Order QAM Formats,” IEEE J. Sel. Top. Quantum Electron. 16(5), 1210–1216 (2010).

34. G. Agrawal, Fiber-Optic Communication Systems (Wiley, 2021).

35. D. Marcuse, C. Manyuk, and P. Wai, “Application of the Manakov-PMD Equation to Studies of Signal Propagation in Optical Fibers with Randomly Varying Birefringence,” J. Lightwave Technol. 15(9), 1735–1746 (1997).

36. Y. Gao, J. H. Ke, K. P. Zhong, J. C. Cartledge, and S. S.-H. Yam, “Assessment of Intrachannel Nonlinear Compensation for 112 Gb/s Dual-Polarization 16QAM Systems,” J. Lightwave Technol. 30(24), 3902–3910 (2012).

37. C. Fougstedt, M. Mazur, L. Svensson, H. Eliasson, M. Karlsson, and P. Larsson-Edefors, “Time-domain Digital Back Propagation: Algorithm and Finite-precision Implementation Aspects,” in Proc. Opt. Fiber Conf., (Los Angeles, CA, USA, 2017), pp. 1–3.

38. B. Spinnler, “Equalizer Design and Complexity for Digital Coherent Receivers,” IEEE J. Sel. Top. Quantum Electron. 16(5), 1180–1192 (2010).

39. O. Sidelnikov, A. Redyuk, S. Sygletos, M. Fedoruk, and S. Turitsyn, “Advanced Convolutional Neural Networks for Nonlinearity Mitigation in Long-Haul WDM Transmission Systems,” J. Lightwave Technol. 39(8), 2397–2406 (2021).

40. R.-J. Essiambre, G. Kramer, P. J. Winzer, G. J. Foschini, and B. Goebel, “Capacity Limits of Optical Fiber Networks,” J. Lightwave Technol. 28(4), 662–701 (2010).

41. B. Bakhshi, L. Rahman, G. Mohs, M. Vaa, W. Patterson, J. Cai, A. Lucero, E. Golovchenko, and S. Abbott, “Impact of Fiber Aging and Cable Repair in an Installed 28-nm Transatlantic 96 x 10 Gb/s DWDM System,” in OFC/NFOEC Technical Digest. Opt. Fiber Commun. Conf., 2005., vol. 1 (2005), pp. 169–171.

42. J. Bohata, J. Jaros, S. Pisarik, S. Zvanovec, and M. Komanec, “Long-Term Polarization Mode Dispersion Evolution and Accelerated Aging in Old Optical Cables,” IEEE Photonics Technol. Lett. 29(6), 519–522 (2017).

43. P. K. Meher, J. Valls, T.-B. Juang, K. Sridharan, and K. Maharatna, “50 Years of CORDIC: Algorithms, Architectures, and Applications,” IEEE Trans. Circuits Syst. I 56(9), 1893–1907 (2009).

44. M. Garrido, P. Källström, M. Kumm, and O. Gustafsson, “CORDIC II: A New Improved CORDIC Algorithm,” IEEE Trans. Circuits Syst. II 63(2), 186–190 (2016).

Setup	# of channels	Modulation format	Fiber coefficients
Setup (A)	1	DP-16-QAM	$\alpha$ = 0.2 dB/km, PMD = 0.05 ps/$\sqrt{\text{km}}$, $D$ = 17 ps/(nm.km), $\gamma$ = 1.4 /W/km
Setup (B)	5	DP-16-QAM	Same as setup (A)
Setup (C)	5	DP-64-QAM	Same as setup (A)
Setup (D)	5	DP-16-QAM	$\alpha$ = 0.24 dB/km, PMD = 0.3 ps/$\sqrt{\text{km}}$, $D$ = 17 ps/(nm.km), $\gamma$ = 1.4 /W/km

Optics Continuum