## Abstract

A pruning method for artificial neural network based nonlinear equalizers (ANN-NLE) is proposed and validated for single-sideband 4-ary pulse amplitude modulation (SSB-PAM4) in an IM/DD system. As a classifier, an ANN is capable of forming complex nonlinear boundaries among different classes, which makes it an appropriate way to mitigate the nonlinear impairments in optical communication systems. In this paper, we first introduce the operation principles of the traditional linear equalizer (LE) and of NLEs such as the Volterra equalizer (VE). We then draw an analogy among the LE, VE and ANN-NLE. After that, a novel pruning method is applied to reduce the complexity of the ANN. The ANN-NLE outperforms the VE in BER after fiber transmission: after 60 km, it decreases the BER by about one order of magnitude compared to the VE. With the proposed pruning method, the number of ANN connections is reduced by a factor of 10 while keeping the BER under the threshold of 3.8×10^{−3}.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

The emergence of big data, the internet of things and artificial intelligence-intensive services has led to explosive growth of cloud-based IP traffic, which results in dramatically higher capacity requirements in data centers. The sixth Cisco Global Cloud Index (2015-2020) shows that annual data center traffic will roughly triple, from 4.7 zettabytes (ZB) in 2015 to 15.3 ZB per year by 2020 [1]. For short-reach 100 Gb/s applications, such as inter- and intra-data center interconnects (DCIs), solutions that use intensity modulation with direct detection (IM/DD) are seen as more practical [2]. To reduce the baud rate and the bandwidth requirements for optical and electronic components, many advanced modulation formats, such as pulse amplitude modulation (PAM), discrete multi-tone (DMT) modulation and IM/DD-based quadrature amplitude modulation (QAM) using a Kramers-Kronig (KK) coherent receiver, have gained significant interest [3–9]. Considering system implementation and power consumption, 4-ary pulse amplitude modulation (PAM4) is a more attractive advanced modulation format for 100 Gb/s, 200 Gb/s and 400 Gb/s short-reach applications [10]. For C-band PAM4 systems, the signal distortion induced by chromatic dispersion (CD) is more serious than in the O-band, which significantly degrades the system performance. In view of flexible deployment for both single-lane and wavelength division multiplexed (WDM) systems, single-sideband (SSB) modulation is more suitable than dispersion compensation fiber (DCF). Since a commercially available dual-drive Mach-Zehnder modulator (DD-MZM) can be used to generate an SSB signal without much complexity, many experiments on DD-MZM-based SSB transmission over the C-band have been reported [11–14].

For high-speed PAM4 transmission, limited bandwidth and nonlinear impairments of the fiber and electro-optical components can severely limit the transmission distance as well as the achievable capacity [15]. Therefore, it is crucial to compensate for the bandwidth limitation and mitigate the nonlinearity of the system. In our previous work [12], linear pre-equalization was applied to compensate for the bandwidth limitation. For the nonlinear impairments of the system, nonlinear equalizers (NLEs) such as the Volterra equalizer (VE) have been proposed [11]. However, the complexity of a third-order VE is proportional to O(N^{3}), where N is the memory length of the input samples, so it requires costly computation power for signal processing [16]. Recently, with the rise of machine learning (ML), many ML algorithms have been applied in optical communication. In [17–19], support vector machines (SVMs) and artificial neural networks (ANNs) were experimentally demonstrated for in-band optical signal-to-noise ratio (OSNR) estimation and modulation format classification. In [20], an SVM is used for performance monitoring and failure prediction in optical networks. Since an ANN is capable of forming complex nonlinear boundaries among different classes, it is considered an appropriate way to mitigate the nonlinear impairments in optical communication systems. In recent studies [16, 21–25], ANNs have been widely used as NLEs in radio over fiber (RoF) systems, coherent reception systems and IM/DD systems. However, all of the ANNs applied in these works are fully connected, which produces a heavy computation load.

In this paper, we design an ANN-based NLE (ANN-NLE) for an IM/DD system. Since a fully connected ANN produces a heavy computation load, a novel pruning method is applied to reduce the complexity of the ANN. An experiment of 112 Gb/s SSB-PAM4 signal transmission over 80 km of dispersion-uncompensated standard single-mode fiber (SSMF) is designed and implemented to evaluate the performance of the proposed method. For comparison, the LE and VE are applied as alternative digital signal processing (DSP) schemes. Our experimental results show that the ANN-NLE outperforms the VE after fiber transmission, and that the improvement grows with the transmission distance. In addition, the ANN-NLE effectively combats the nonlinearity induced by the fiber, which results in the same BER performance for different fiber reaches at the same OSNR. With the pruning method, the number of ANN connections is reduced by a factor of 10 while keeping the bit error rate (BER) under the 7% overhead hard-decision forward-error correction (FEC) threshold of 3.8×10^{−3}.

## 2. Principles of equalizer

#### A. Principles of LE, VE and ANN-NLE

The LE, such as the feedforward equalizer shown in Fig. 1(a), is commonly used to combat linear impairments in optical systems. However, it provides only limited capability in overcoming the nonlinearity induced by the fiber, electro-optic modulation and signal-signal beating interference (SSBI) after square-law detection [11]. The Volterra series is considered an appropriate way to describe the nonlinearity in optical communication systems [26]. The outputs of the LE with memory length *L* and of the *k*-th order VE are expressed as Eq. (1) and Eq. (2), respectively.

Here, $x(n)$ is the $n$-th sample of the received signal, $h_w(n)$ is the $n$-th sample of the output signal after the equalizer, $w_i(\cdot)$ is the $i$-th order kernel, and $L_i$ is the $i$-th order memory length. As can be seen from the equations, the VE includes higher-order products of the input samples rather than only the first-degree terms used in the LE. In this way, a complex nonlinear function can be fitted to model the transmission system. Although the nonlinearity can be modeled accurately when $k$ is large enough, the computational complexity grows rapidly at the same time. As a tradeoff between computational complexity and accuracy, a third-order VE is sufficient to model the transmission system [26]. However, the computational complexity of a third-order VE with a long memory length $N$ (proportional to $O(N^{3})$) is still too large for practical applications.
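For reference, a standard way to write the LE and the $k$-th order VE that agrees with the symbol definitions above is sketched below. The delay indices $l$ and $l_1, \dots, l_i$ are names introduced here for illustration, and the exact indexing convention is an assumption.

$$
h_w(n) = \sum_{l=0}^{L-1} w_1(l)\, x(n-l) \tag{1}
$$

$$
h_w(n) = \sum_{i=1}^{k} \sum_{l_1=0}^{L_i-1} \cdots \sum_{l_i=0}^{L_i-1} w_i(l_1, \dots, l_i) \prod_{j=1}^{i} x(n-l_j) \tag{2}
$$

Written this way, the $i$-th order term carries $L_i^{\,i}$ coefficients before any kernel symmetry is exploited, which is the origin of the $O(N^{3})$ growth for a third-order VE.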

Recently, with the rise of AI, the ANN has proved to be a high-accuracy classifier in image recognition and natural language processing [27, 28]. Figure 1(b) shows the schematic of the ANN-NLE. As shown in Fig. 1(b), each circle represents a neuron, which can be modeled as a logistic unit. The output of a neuron can be expressed as:

Here, $x_i$ is the input to the neuron, $w_i$ is the corresponding weight and $f(x)$ is the activation function of the neuron. Since the input features of the ANN only contain the first degree of the input samples (the same as for the LE), the activation function is what introduces nonlinearity into the ANN. In comparison, the LE can be regarded as a single neuron with a pure-linear activation function, shown in Fig. 1(c). For the hidden layers of an ANN, the rectified linear unit (ReLU), shown in Fig. 1(d), is commonly applied as the activation function. It more closely resembles a biological neuron and results in much faster training than the sigmoid or tanh function. For the output layer, the softmax function, shown in Fig. 1(e), is commonly selected as the activation function. The output of the softmax function is a categorical probability distribution, which gives the probability that each class is the true one [29]. For example, a four-output ANN with the column vector $h_w(x) = [0.7, 0.15, 0.05, 0.1]^{T}$ assigns a 70 percent probability to the first class.
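The forward pass described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: it assumes the neuron output takes the usual form $h_w(x) = f(\sum_i w_i x_i)$, omits bias terms, uses random weights, and assumes four output neurons (one per PAM4 level). The function names `relu`, `softmax` and `ann_forward` are hypothetical.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(0, z) element-wise."""
    return np.maximum(0.0, z)

def softmax(z):
    """Softmax over the last axis; subtracting the max avoids overflow."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def ann_forward(x, w_hidden, w_out):
    """One hidden ReLU layer followed by a softmax output layer (no biases)."""
    hidden = relu(x @ w_hidden)
    return softmax(hidden @ w_out)

rng = np.random.default_rng(0)
n_in, n_hidden, n_classes = 51, 51, 4    # n_classes = 4 is an assumption (PAM4 levels)
w_hidden = rng.normal(scale=0.1, size=(n_in, n_hidden))
w_out = rng.normal(scale=0.1, size=(n_hidden, n_classes))

x = rng.normal(size=n_in)                # stand-in for a window of received samples
probs = ann_forward(x, w_hidden, w_out)  # categorical distribution over the classes
decision = int(np.argmax(probs))         # decision stage: pick the most probable class

# The worked example from the text: 70 percent probability for the first class.
example = np.array([0.7, 0.15, 0.05, 0.1])
example_class = int(np.argmax(example))  # index 0, i.e. the first class
```

Taking the `argmax` of the softmax output is one simple way to turn the class probabilities into a symbol decision.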

For the adaptation of the LE and VE, the decision-directed least-mean-square (DD-LMS) algorithm or the recursive least squares (RLS) algorithm can be selected. In our system, DD-LMS is used due to its low complexity and high numerical stability. The DD-LMS algorithm aims to minimize the mean square error (MSE) by using the gradient descent method. The cost function for the LE/VE at the *n*-th iteration can be expressed as:

Here, $h_w(x(n))$ is the output of the equalizer and $y(n)$ is the reference value of $x(n)$. The weight vector of DD-LMS at the $n$-th iteration is expressed as:

where $\mu$ is the step size, $\mathbf{w}(n)$ is the weight vector, $\mathbf{e}(n)$ is the error signal vector, $\mathbf{x}(n)$ is the input signal vector and $(\cdot)^{*}$ denotes the complex conjugate of $(\cdot)$. For the training of the ANN-NLE in our system, MSE is also selected as the optimization criterion. The cost function for the ANN-NLE at the $n$-th iteration can be expressed as:

where $K$ is the number of classes, $[h_w(x(n))]_k$ is the output of the ANN-NLE for class $k$ and $y_k(n)$ is the reference value of $x(n)$ for class $k$. In our system, the gradient descent method is also used to minimize the cost function for the ANN-NLE. The expression is:

To compute the partial derivative term for the ANN, the back-propagation (BP) algorithm is applied [30]. From Eqs. (4) to (7), we can see that the adaptation of the LE/VE and the training of the ANN-NLE are similar.
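For reference, one common way to write Eqs. (4) to (7) that is consistent with the symbols defined above is sketched below. The cost symbol $J$, the factor $1/2$, the explicit definition of $\mathbf{e}(n)$ and the learning rate $\alpha$ are assumptions introduced here, since normalization conventions vary between texts.

$$
J(n) = \tfrac{1}{2}\left| h_w(x(n)) - y(n) \right|^{2} \tag{4}
$$

$$
\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, \mathbf{e}(n)\, \mathbf{x}^{*}(n), \qquad \mathbf{e}(n) = y(n) - h_w(x(n)) \tag{5}
$$

$$
J(n) = \tfrac{1}{2}\sum_{k=1}^{K} \left( [h_w(x(n))]_k - y_k(n) \right)^{2} \tag{6}
$$

$$
w := w - \alpha\, \frac{\partial J(n)}{\partial w} \tag{7}
$$

For a real-valued LE with $h_w(x(n)) = \mathbf{w}^{T}(n)\,\mathbf{x}(n)$, the gradient of the cost in Eq. (4) is $\partial J / \partial \mathbf{w} = -\mathbf{e}(n)\,\mathbf{x}(n)$, so the gradient step of Eq. (7) reduces to the update of Eq. (5) with $\mu = \alpha$. This is the sense in which the two procedures are similar.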

#### B. Principle of ANN Pruning

In a fully connected ANN, each neuron is connected to every neuron in the previous layer and the following layer, as shown in Fig. 2(b). This is the most common ANN structure. However, a fully connected ANN contains significant redundancy, which wastes both computation and memory [31]. Therefore, pruning the network connections is important when applying an ANN as an NLE.

The pipeline for ANN pruning is shown in Fig. 2(a). First, an initial training process is carried out to obtain the fully connected ANN. Then, a weight threshold λ is set and the connections whose weights fall below this threshold are pruned. In this way, the important connections are kept and the unimportant connections are removed. However, the system's performance decreases rapidly as λ increases. In [32], a retraining process is employed to solve this problem. After retraining, the remaining connections can compensate for the connections that have been removed. Since we retrain starting from the weights that survived pruning (rather than re-initializing the weights), the retraining process is much faster than the initial training process. By iteratively applying the pruning and retraining processes, we obtain a sparsely connected ANN that retains the important connections. A sketch of the sparsely connected ANN is shown in Fig. 2(c).
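The prune-and-retrain loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the ANN is replaced by a single linear layer fitted to a toy regression problem so the example stays short, pruning compares the weight magnitude against λ, and all names (`prune_mask`, `iterative_prune`, `make_retrain`) and threshold values are hypothetical.

```python
import numpy as np

def prune_mask(weights, threshold):
    """Keep connections whose |weight| is at least `threshold`."""
    return (np.abs(weights) >= threshold).astype(weights.dtype)

def iterative_prune(weights, thresholds, retrain):
    """Alternate pruning and retraining over an increasing list of thresholds.

    `retrain(weights, mask)` continues training from the surviving weights
    (no re-initialization) and returns the updated weights.
    """
    mask = np.ones_like(weights)
    for lam in thresholds:
        mask = mask * prune_mask(weights, lam)      # pruned connections stay pruned
        weights = retrain(weights * mask, mask) * mask
    return weights, mask

def make_retrain(x, y, lr=0.1, steps=200):
    """Build a toy retraining function: masked gradient descent on the MSE."""
    def retrain(weights, mask):
        w = weights.copy()
        for _ in range(steps):
            grad = x.T @ (x @ w - y) / len(x)
            w -= lr * grad * mask                   # masked entries receive no update
        return w
    return retrain

rng = np.random.default_rng(1)
x = rng.normal(size=(500, 20))
w_true = np.zeros(20)
w_true[:3] = [1.5, -2.0, 1.0]                       # only 3 of 20 connections matter
y = x @ w_true

retrain = make_retrain(x, y)
w_dense = retrain(rng.normal(scale=0.1, size=20), np.ones(20))   # initial training
w_sparse, mask = iterative_prune(w_dense, thresholds=[0.05, 0.2, 0.5], retrain=retrain)
n_connections = int(mask.sum())                     # surviving connections
```

Keeping a binary mask alongside the weights is one simple way to make sure that removed connections do not reappear during retraining.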

## 3. Experimental setup

The experimental setup is shown in Fig. 3. A continuous-wave (CW) optical carrier at 1551.76 nm generated by a laser diode (LD) is injected into a 25 GHz DD-MZM (FUJITSU FTM7937EZ202) biased at its quadrature point. The single-drive half-wave voltage V_{π} of the DD-MZM is roughly 7 V. The PAM4 signal with a pattern length of 2^{13} symbols is generated by a 65 GSa/s arbitrary waveform generator (AWG, Keysight M8195A) with a 20 GHz analog bandwidth. Two PAM4 signals with a peak-to-peak voltage of ~1 V drive the two arms of the DD-MZM. The generated SSB-PAM4 optical signal is launched into a spool of dispersion-uncompensated SSMF with up to 80 km reach. Since the output optical power of the DD-MZM is low, an erbium-doped fiber amplifier (EDFA, Amonics AEDFA-23-B-FA) is used to adjust the launched optical power. After fiber transmission, another EDFA and a variable optical attenuator (VOA) are employed to load optical noise and adjust the optical power to the desired level. A 1-nm optical band-pass filter (OBPF) is inserted after the EDFA to suppress the amplified spontaneous emission (ASE) noise. The resulting OSNR is measured by an optical spectrum analyzer (OSA). After a 50 GHz PIN photodetector (PD, u2t XPDV2120R), the electrical signal is sampled by a 160 GSa/s digital sampling oscilloscope (DSO, Lecroy LM10-59Zi-A) with a 59 GHz analog bandwidth for offline processing.
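As a quick sanity check on the sampling rates quoted above, the symbol rate and the samples per symbol can be worked out directly. The sketch assumes that the 112 Gb/s line rate is carried entirely by PAM4 symbols at 2 bits each, and that the receiver targets two samples per symbol as in the DSP description; the variable names are hypothetical.

```python
from fractions import Fraction

bit_rate_gbps = 112          # line rate of the SSB-PAM4 signal
bits_per_symbol = 2          # PAM4 carries log2(4) = 2 bits per symbol
baud_rate_gbaud = Fraction(bit_rate_gbps, bits_per_symbol)          # 56 GBaud

awg_rate_gsa = 65            # AWG sampling rate in GSa/s
dso_rate_gsa = 160           # DSO sampling rate in GSa/s

awg_samples_per_symbol = Fraction(awg_rate_gsa) / baud_rate_gbaud   # 65/56
dso_samples_per_symbol = Fraction(dso_rate_gsa) / baud_rate_gbaud   # 20/7

# Rational resampling ratio needed to reach 2 samples per symbol at the receiver.
resample_ratio = Fraction(2) / dso_samples_per_symbol               # 7/10
```

Under these assumptions the AWG runs at slightly more than one sample per symbol, and the captured waveform would be resampled by a factor of 7/10 before equalization.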

To investigate the effectiveness of the ANN-NLE in our system, three different DSP schemes are compared. DSP-1 and DSP-2 deploy the LMS-based LE and VE, respectively. First, the data stream is resampled to two samples per symbol for the equalizer. After that, the equalizer, operating in training mode, is applied to equalize the signal. Since the decision part is integrated in the equalizer as shown in Fig. 1(a), Gray demapping and bit error rate (BER) calculation directly follow the equalizer. In DSP-3, the data stream is first resampled to two samples per symbol and preprocessed by the LE. After that, the ANN-NLE is employed. Since the output of the ANN-NLE gives the probability that each class is the true one, a decision stage is used to select the output class.
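A chain of the DSP-1 kind (LMS-trained LE, decision, Gray demapping, BER counting) can be sketched on synthetic data as follows. This is an illustration under simplifying assumptions rather than the experimental processing: it uses one sample per symbol instead of two, an 11-tap equalizer instead of 91 taps, a made-up three-tap channel with additive Gaussian noise, and an assumed Gray mapping. All names are hypothetical.

```python
import numpy as np

LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])
# Gray mapping: adjacent levels differ in one bit (an assumed, common choice).
GRAY_BITS = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])

def lms_equalize(rx, ref, n_taps=11, mu=1e-3):
    """Train a symbol-spaced feedforward equalizer with LMS against known symbols."""
    w = np.zeros(n_taps)
    w[n_taps // 2] = 1.0                          # centre-spike initialization
    pad = n_taps // 2
    rx_padded = np.concatenate([np.zeros(pad), rx, np.zeros(pad)])
    out = np.empty(len(rx))
    for n in range(len(rx)):
        window = rx_padded[n:n + n_taps][::-1]    # samples centred on symbol n
        out[n] = w @ window
        err = ref[n] - out[n]                     # training mode: reference is known
        w += mu * err * window                    # real-valued LMS update
    return out, w

def decide(samples):
    """Map each sample to the index of the nearest PAM4 level."""
    return np.argmin(np.abs(samples[:, None] - LEVELS[None, :]), axis=1)

def bit_error_rate(tx_idx, rx_idx):
    """Gray-demap symbol indices to bits and return the fraction that differ."""
    return float(np.mean(GRAY_BITS[tx_idx] != GRAY_BITS[rx_idx]))

rng = np.random.default_rng(2)
n_sym = 20000
tx_idx = rng.integers(0, 4, size=n_sym)
tx = LEVELS[tx_idx]
channel = np.array([0.1, 1.0, 0.25])              # toy ISI channel (assumption)
rx = np.convolve(tx, channel, mode="same") + rng.normal(scale=0.1, size=n_sym)

eq_out, taps = lms_equalize(rx, tx)
half = n_sym // 2                                 # skip the convergence transient
ber_raw = bit_error_rate(tx_idx[half:], decide(rx[half:]))
ber_eq = bit_error_rate(tx_idx[half:], decide(eq_out[half:]))
```

On this toy channel the equalized BER should fall well below the unequalized one; the numbers carry no meaning for the experimental link.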

In our experiment, linear pre-equalization is applied to compensate for the bandwidth limitation. To this end, in the optical back-to-back (Opt-B2B) case, we first transmitted an impulse probe signal with a repetition rate of 100 MHz through the AWG. Inset (a) of Fig. 3 shows the impulse probe signal of the AWG and inset (b) of Fig. 3 shows the signal received by the DSO. The envelope of inset (b) is the channel frequency response. After obtaining the channel response, we pre-equalized the transmitted signal using its inverse. In the following experiments, the pre-equalized signal is transmitted through the system.
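Pre-equalization with the inverse of a measured response can be sketched in the frequency domain as below. This is an illustrative sketch, not the procedure used in the experiment: the channel is a made-up three-tap impulse response, filtering is modeled as circular convolution via the FFT, and the magnitude floor that limits the boost in deep notches is an assumption added here. The name `pre_equalize` is hypothetical.

```python
import numpy as np

def pre_equalize(signal, channel_response, floor=0.1):
    """Apply the inverse of a measured frequency response to `signal`.

    `channel_response` holds one complex gain per FFT bin of `signal`.
    Magnitudes below `floor` are clipped before inversion so that deep
    notches are not boosted without bound (an assumption of this sketch).
    """
    mag = np.maximum(np.abs(channel_response), floor)
    phase = np.exp(1j * np.angle(channel_response))
    inverse = 1.0 / (mag * phase)
    return np.real(np.fft.ifft(np.fft.fft(signal) * inverse))

rng = np.random.default_rng(3)
n = 256
tx = rng.choice([-3.0, -1.0, 1.0, 3.0], size=n)         # toy PAM4 sequence
impulse_response = np.array([1.0, 0.5, 0.2])             # hypothetical low-pass channel
channel_response = np.fft.fft(impulse_response, n)       # stands in for the measured response

tx_pre = pre_equalize(tx, channel_response)
# Pass both signals through the same (circular) channel model.
rx_plain = np.real(np.fft.ifft(np.fft.fft(tx) * channel_response))
rx_pre = np.real(np.fft.ifft(np.fft.fft(tx_pre) * channel_response))
```

In this idealized, noise-free model the pre-equalized signal arrives undistorted, whereas the plain signal shows inter-symbol interference.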

## 4. Results and discussion

The measured frequency response of the system is shown in Fig. 4(a). Since the frequency response is obtained by transmitting an impulse probe signal, the bandwidth limitations of both the electrical and optical components are included in the measurement. As can be seen from Fig. 4(a), the 10 dB bandwidth is roughly 20 GHz. The optical spectra of the SSB-PAM4 signal with and without pre-equalization are plotted in Fig. 4(b). A sideband suppression ratio of roughly 20 dB is achieved.

After optimizing the equalizer parameters, the memory lengths of the LE in DSP-1 and DSP-3 are both set to 91, and the memory lengths L_{1}, L_{2}, L_{3} of the VE are set to 91, 21 and 11, respectively. For the ANN-NLE, the input layer has 51 neurons and the single hidden layer also has 51 neurons. ReLU and softmax are chosen as the activation functions for the hidden layer and output layer, respectively.
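To put these parameter choices side by side, the number of VE kernel coefficients and ANN connections can be counted as follows. This is a rough sketch: it counts coefficients rather than multiplications, ignores bias terms, and assumes four output neurons (one per PAM4 level), which the text does not state explicitly. The function name `volterra_terms` is hypothetical.

```python
from math import comb

def volterra_terms(memory, order, symmetric=True):
    """Number of kernel coefficients for one Volterra order.

    With symmetric kernels, permutations of the same delays share one
    coefficient, giving C(memory + order - 1, order) unique terms.
    """
    return comb(memory + order - 1, order) if symmetric else memory ** order

ve_memory = {1: 91, 2: 21, 3: 11}                 # orders 1..3 -> L1, L2, L3
ve_unique = sum(volterra_terms(m, i) for i, m in ve_memory.items())
ve_full = sum(volterra_terms(m, i, symmetric=False) for i, m in ve_memory.items())

n_in, n_hidden, n_out = 51, 51, 4                 # n_out = 4 is an assumption
ann_connections = n_in * n_hidden + n_hidden * n_out
ann_pruned = ann_connections // 10                # after the reported tenfold reduction
```

Under these assumptions the VE has 608 unique coefficients (1863 without kernel symmetry), the fully connected ANN has 2805 connections, and a tenfold pruning leaves about 280.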

Figure 5 shows the BER as a function of the received optical power at the PD for the Opt-B2B case. The performance improvement provided by the VE/ANN-NLE indicates successful compensation of the nonlinearity, mainly from SSBI and electro-optic modulation [11]. The further improvement provided by pre-equalization indicates successful compensation of the bandwidth limitation. With the ANN-NLE and pre-equalization, a BER below 1×10^{−5} is achieved when the optical power injected into the PD is 7 dBm. Figure 6 shows the BER performance after 80 km dispersion-uncompensated SSMF transmission as a function of the launched optical power. The ANN-NLE clearly outperforms the VE after 80 km SSMF transmission. The optimum optical power launched into the fiber is 9 dBm for 80 km SSMF transmission. Therefore, in the following experiments, we keep the optical power injected into the PD at 7 dBm and the launched power at 9 dBm for each case.

Furthermore, we carry out a series of experiments for different fiber reaches with the optimized launched optical power. As can be seen from Fig. 7, the BER of the ANN-NLE is flatter than that of the VE as the fiber reach increases, and the ANN-NLE outperforms the VE after fiber transmission. In particular, the ANN-NLE decreases the BER by about one order of magnitude compared to the VE after 60 km fiber transmission. Due to the interaction of fiber nonlinearity and dispersion, as well as stochastic linear and nonlinear effects, the Volterra series cannot describe the system accurately. The ANN, however, can build a more complex model of the system with the help of multiple neurons. Therefore, the ANN-NLE is more suitable for longer fiber reach transmission with complex nonlinearity.

Since optical amplification is required in our system, OSNR is an important concern. Figure 8 shows the BER performance versus OSNR after 80 km SSMF. Without an NLE, the BER cannot reach the 7% hard-decision forward-error correction (FEC) threshold. The required OSNRs to reach the FEC threshold for the VE and ANN-NLE are 39 dB and 35.5 dB, respectively. Figure 9 shows the BER performance versus OSNR for different fiber reaches. The required OSNRs of the VE to reach the FEC threshold with fiber reaches of 0, 40 and 80 km are 36, 37 and 39 dB, respectively. For the ANN-NLE, the required OSNRs to reach the FEC threshold for different fiber reaches are roughly the same, about 35 dB. At the same OSNR, the BER performance of the VE degrades as the fiber reach increases. The ANN-NLE, however, effectively combats the nonlinearity induced by the fiber, which results in the same BER performance for different fiber reaches at the same OSNR. Thus, we can conclude that the performance of the VE is limited mainly by both OSNR and fiber nonlinearity, whereas the ANN-NLE is limited mainly by OSNR.

After comparing the fully connected ANN-NLE with the VE, we apply the proposed pruning method to reduce the complexity of the ANN. Figure 10 shows the BER performance and the number of connections as functions of the pruning weight threshold λ for different fiber reaches. The retraining process improves the BER performance significantly while the number of connections is reduced. By selecting different weight thresholds, a tradeoff between BER performance and computational complexity can be made. Figure 11 shows the BER performance and the number of connections of the fully connected and sparsely connected ANN-NLE. The sparsely connected ANN-NLE reduces the number of connections by a factor of 10 while keeping the BER under the same threshold.

## 5. Conclusion

An ANN-based NLE is proposed to compensate for the nonlinear impairments in IM/DD systems. Compared to a traditional NLE such as the VE, the ANN-NLE achieves better BER performance after fiber transmission, where the nonlinearity cannot be accurately described by the Volterra series. As a result, the performance of the ANN-NLE is mainly limited by OSNR. To reduce the computational complexity and redundancy of the fully connected ANN, a novel pruning method is proposed. With this method, the number of ANN connections is reduced by a factor of 10 while keeping the BER under the FEC threshold. Using the proposed scheme, 112 Gb/s SSB-PAM4 signal transmission over 80 km of dispersion-uncompensated SSMF was experimentally demonstrated.

## Funding

National Natural Science Foundation of China (NSFC) Project (No. 61431003, 61601049, 61625104), Beijing University of Posts and Telecommunications (IPOC2017ZT08), BUPT Excellent Ph.D. Students Foundation, Shanghai Jiao Tong University.

## References and links

**1. **Cisco, “Cisco global cloud index: forecast and methodology, 2015-2020”, (Cisco White Paper, 2016). http://www.cisco.com/c/dam/en/us/solutions/collateral/service-provider/global-cloud-index-gci/white-paper-c11-738085.pdf.

**2. **J. C. Cartledge and A. S. Karar, “100 Gb/s intensity modulation and direct detection,” J. Lightwave Technol. **32**(16), 2809–2814 (2014).

**3. **M. Zhu, J. Zhang, X. Yi, Y. Song, B. Xu, X. Li, X. Du, and K. Qiu, “Hilbert superposition and modified signal-to-signal beating interference cancellation for single side-band optical NPAM-4 direct-detection system,” Opt. Express **25**(11), 12622–12631 (2017).

**4. **L. Shu, J. Li, Z. Wan, F. Gao, S. Fu, X. Li, Q. Yang, and K. Xu, “Single-lane 112-Gbit/s SSB-PAM4 transmission with dual-drive MZM and Kramers–Kronig detection over 80-km SSMF,” IEEE Photonics J. **9**(6), 7204509 (2017).

**5. **Q. Zhang, N. Stojanovic, J. Wei, and C. Xie, “Single-lane 180 Gb/s DB-PAM-4-signal transmission over an 80 km DCF-free SSMF link,” Opt. Lett. **42**(4), 883–886 (2017).

**6. **L. Zhang, T. Zuo, Y. Mao, Q. Zhang, E. Zhou, G. N. Liu, and X. Xu, “Beyond 100-Gb/s transmission over 80-km SMF using direct-detection SSB-DMT at c-band,” J. Lightwave Technol. **34**(2), 723–729 (2016).

**7. **A. Mecozzi, C. Antonelli, and M. Shtaif, “Kramers–Kronig coherent receiver,” Optica **3**(11), 1220–1227 (2016).

**8. **Z. Li, M. S. Erkılınç, K. Shi, E. Sillekens, L. Galdino, B. C. Thomsen, P. Bayvel, and R. I. Killey, “SSBI mitigation and the Kramers–Kronig scheme in single-sideband direct-detection transmission with receiver-based electronic dispersion compensation,” J. Lightwave Technol. **35**(10), 1887–1893 (2017).

**9. **S. Liang, J. Li, Z. Wan, Y. Fan, F. Yin, Y. Zhou, Y. Dai, and K. Xu, “56-Gb/s single-photodiode 16QAM transmission over 140-km SSMF using Kramers-Kronig detection,” in *Asia Communications and Photonics Conference*, OSA Technical Digest (online) (Optical Society of America, 2017), paper M2B.2.

**10. **K. Zhong, X. Zhou, T. Gui, L. Tao, Y. Gao, W. Chen, J. Man, L. Zeng, A. P. T. Lau, and C. Lu, “Experimental study of PAM-4, CAP-16, and DMT for 100 Gb/s short reach optical transmission systems,” Opt. Express **23**(2), 1176–1189 (2015).

**11. **Z. Wan, J. Li, L. Shu, S. Fu, Y. Fan, F. Yin, Y. Zhou, Y. Dai, and K. Xu, “64-Gb/s SSB-PAM4 transmission over 120-km dispersion-uncompensated SSMF with blind nonlinear equalization, adaptive noise-whitening postfilter and MLSD,” J. Lightwave Technol. **35**(23), 5193–5200 (2017).

**12. **Z. Wan, J. Li, L. Shu, S. Fu, Y. Fan, F. Yin, Y. Zhou, Y. Dai, and K. Xu, “56-Gb/s SSB-PAM4 transmission over 100-km dispersion-uncompensated SSMF with linear pre-equalization and blindly adaptive nonlinear post-equalization,” in *Asia Communications and Photonics Conference*, OSA Technical Digest (online) (Optical Society of America, 2017), paper M2G.5.

**13. **N. Kaneda, J. Lee, and Y. Chen, “Nonlinear equalizer for 112-Gb/s SSB-PAM4 in 80-km dispersion uncompensated link,” in *Optical Fiber Communication Conference*, OSA Technical Digest (online) (Optical Society of America, 2017), paper Tu2D.5.

**14. **Q. Zhang, N. Stojanovic, T. Zuo, L. Zhang, C. Prodaniuc, F. Karinou, C. Xie, and E. Zhou, “Single-lane 180 Gb/s SSB-duobinary-PAM-4 signal transmission over 13 km SSMF,” in *Optical Fiber Communication Conference*, OSA Technical Digest (online) (Optical Society of America, 2017), paper Tu2D.2.

**15. **D. Zibar, O. Winther, N. Franceschi, R. Borkowski, A. Caballero, V. Arlunno, M. N. Schmidt, N. G. Gonzales, B. Mao, Y. Ye, K. J. Larsen, and I. T. Monroy, “Nonlinear impairment compensation using expectation maximization for dispersion managed and unmanaged PDM 16-QAM transmission,” Opt. Express **20**(26), B181–B196 (2012).

**16. **C. Chuang, C. Wei, T. Lin, K. Chi, L. Liu, J. Shi, Y. Chen, and J. Chen, “Employing deep neural network for high speed 4-PAM optical interconnect,” in *Proceedings of European Conference on Optical Communication (ECOC)* (2017), paper W.2.D.

**17. **J. Thrane, J. Wass, M. Piels, J. C. M. Diniz, R. Jones, and D. Zibar, “Machine learning techniques for optical performance monitoring from directly detected PDM-QAM signals,” J. Lightwave Technol. **35**(4), 868–875 (2017).

**18. **F. N. Khan, K. Zhong, X. Zhou, W. H. Al-Arashi, C. Yu, C. Lu, and A. P. T. Lau, “Joint OSNR monitoring and modulation format identification in digital coherent receivers using deep neural networks,” Opt. Express **25**(15), 17767–17776 (2017).

**19. **Y. Cui, M. Zhang, D. Wang, S. Liu, Z. Li, and G. K. Chang, “Bit-based support vector machine nonlinear detector for millimeter-wave radio-over-fiber mobile fronthaul systems,” Opt. Express **25**(21), 26186–26197 (2017).

**20. **Z. Wang, M. Zhang, D. Wang, C. Song, M. Liu, J. Li, L. Lou, and Z. Liu, “Failure prediction using machine learning and time series in optical network,” Opt. Express **25**(16), 18553–18565 (2017).

**21. **S. Liu, M. Xu, J. Wang, F. Lu, W. Zhang, H. Tian, and G. Chang, “A multilevel artificial neural network nonlinear equalizer for millimeter-wave mobile fronthaul systems,” J. Lightwave Technol. **35**(20), 4406–4417 (2017).

**22. **E. Giacoumidis, S. Mhatli, J. Wei, S. T. Le, I. Aldaya, M. F. Stephens, M. McCarthy, A. Ellis, N. J. Doran, and B. Eggleton, “Intra and inter-channel nonlinearity compensation in WDM coherent optical OFDM using artificial neural network based nonlinear equalization,” in *Optical Fiber Communication Conference*, OSA Technical Digest (online) (Optical Society of America, 2017), paper Th2A.62.

**23. **R. Rios-Müller, J. M. Estaran, and J. Renaudier, “Experimental estimation of optical nonlinear memory channel conditional distribution using deep neural networks,” in *Optical Fiber Communication Conference*, OSA Technical Digest (online) (Optical Society of America, 2017), paper W2A.51.

**24. **J. Estaran, R. Rios-Müller, M. A. Mestre, F. Jorge, H. Mardoyan, A. Konczykowska, J.-Y. Dupuy, and S. Bigo, “Artificial neural networks for linear and non-linear impairment mitigation in high-baudrate IM/DD systems,” in *Proceedings of European Conference on Optical Communication (ECOC)* (2016), paper M.2.B.2.

**25. **C. Ye, D. Zhang, X. Huang, H. Feng, and K. Zhang, “Demonstration of 50Gbps IM/DD PAM4 PON over 10GHz class optics using neural network based nonlinear equalization,” in *Proceedings of European Conference on Optical Communication (ECOC)* (2017), paper W.2.B.

**26. **K. V. Peddanarappagari and M. Brandt-Pearce, “Volterra series approach for optimizing fiber-optic communications system designs,” J. Lightwave Technol. **16**(11), 2046–2055 (1998).

**27. **A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Commun. ACM **60**(6), 84–90 (2017).

**28. **M. M. Lopez and J. Kalita, “Deep Learning applied to NLP,” ArXiv170303091 Cs (2017).

**29. **J. Yang, “ReLU and softmax activation functions”, (2017). https://github.com/Kulbear/deep-learning-nano-foundation/wiki/ReLU-and-Softmax-Activation-Functions.

**30. **D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature **323**, 533–536 (1986).

**31. **M. Denil, B. Shakibi, L. Dinh, M. A. Ranzato, and N. Freitas, “Predicting parameters in deep learning,” in *Advances in Neural Information Processing Systems*, (2013), pp. 2148–2156.

**32. **S. Han, J. Pool, J. Tran, and W. J. Dally, “Learning both weights and connections for efficient neural networks,” in *International Conference on Neural Information Processing Systems* (Massachusetts Institute of Technology, 2015), pp. 1135–1143.