## Abstract

Volterra equalization (VE) provides substantial performance enhancement for high-speed optical signals but suffers from a high computation complexity that limits its physical implementation. To address this limitation, we propose and experimentally demonstrate an elastic net regularization-based pruned Volterra equalization (ENPVE) that reduces the computation complexity while maintaining the system performance. Our proposed scheme prunes redundant weight coefficients in a three-phase configuration. First, we pre-train the VE with an adaptive EN regularizer to identify the significant weights. Next, we prune the insignificant weights away. Finally, we retrain the equalizer by fine-tuning the remaining weight coefficients. Our proposed ENPVE achieves superior performance with reduced computation complexity. Compared with the conventional VE and the L1 regularization-based Volterra equalizer (L1VE), our approach shows complexity reductions of 97.4% and 20.2%, respectively, for an O-band 80-Gbps PAM4 signal at a received optical power of −4 dBm after 40-km SMF transmission.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## Corrections

13 July 2021: A typographical correction was made to Fig. 7.

## 1. Introduction

A study conducted by Cisco projects that the global number of Internet users will increase from 3.9 billion (51% of the population) in 2018 to about 5.3 billion (66% of the population) by 2023 [1]. This continuing growth in Internet access is fueled by various bandwidth-hungry applications, such as video streaming, cloud-based computing, data storage, and virtual and augmented reality, and it drives the scale of data centers. Therefore, optical interconnections are crucial in inter- and intra-data center communications. Although remarkable advances have been achieved with coherent detection, recent technology trends in inter-data center interconnects still focus on intensity modulation with direct detection (IM/DD) signal formats for their simple transmitter and receiver configurations over 40-km transmission distances [2,3]. Several advanced IM/DD techniques, such as pulse amplitude modulation (PAM), discrete multi-tone modulation (DMT), and carrier-less amplitude and phase modulation (CAP), have been proposed [4]. Among them, PAM is considered the better solution for real-time applications. High-speed IM-DD PAM-4 systems in the O-band suffer from several linear and nonlinear degrading effects, such as band-limited electro-optical devices, modulation nonlinearity, SOA nonlinearity, and fiber nonlinearity, which need to be overcome. Many digital signal processing (DSP) based nonlinear equalizers, including Volterra equalizers (VEs), have been proposed to mitigate these degrading effects for high-speed optical interconnects [5–8]. With their remarkable ability to approximate the input-output relationship of nonlinear channel responses, VEs have been widely employed to cope with nonlinear deformations in optical communication systems. However, their high-quality performance relies on a high computation complexity, which limits their feasibility for real-time implementation [9].
Many research efforts have been devoted to reducing the computation complexity of VEs [10–14]. Among them, L1-regularization can find the equalizer's sparse structure by minimizing the residual sum of squares with an L1 penalty term, so the insignificant weights can be identified and forced to zero. Thus, only the remaining weights perform the equalization process, which can greatly reduce the computation complexity. However, this complexity reduction comes at the cost of system performance [15] and is still far from real-time deployment, which impedes its application in inter- or intra-data center interconnects. In image processing, L2-regularization has been employed in machine learning schemes to improve image quality [16]. However, it cannot simplify the equalizer's structure, so the computation complexity remains too high for implementation.

To justify the trade-off between performance and complexity, in this work we propose and experimentally demonstrate an elastic net regularization-based pruned Volterra equalization (ENPVE), which combines the L1- and L2-regularizations, to implement an efficient Volterra equalizer for 80-Gbps PAM-4 IM-DD systems. The proposed ENPVE scheme employs a pruning algorithm with elastic net regularization to remarkably reduce the computation complexity while maintaining excellent system performance. The complexity reduction is aided by a simple sparsity metric that calculates the overall sparsity of the weight coefficients and precisely identifies the insignificant weight coefficients to be pruned away without significantly degrading the system performance. Meanwhile, the loss of information due to heavy pruning can be recovered in the retraining phase to further uphold the desired accuracy. We set up an 80-Gbps PAM-4 IM-DD optical transmission link over 40 km of standard single-mode fiber (SSMF) in the O-band to demonstrate the feasibility of the proposed ENPVE, and compare its performance with L1VE and the conventional VE. The experimental outcomes show that, compared with L1VE, ENPVE attains better sparse characteristics and achieves lower complexity while keeping the performance below the HD-FEC limit. Without retraining, compared with the conventional VE and L1VE, ENPVE shows up to 96% and 31% reductions in complexity, respectively, at a received optical power (ROP) of −4 dBm for back-to-back (BTB) operation; in the same scenario, 90% and 27.5% reductions are obtained after 40-km SSMF transmission. With retraining, ENPVE attains 98% and 97.4% complexity reductions over the conventional VE for BTB and 40-km SSMF transmission, respectively, while 16.8% and 20.2% reductions are obtained over L1VE, all at an ROP of −4 dBm.

## 2. Principles of an equalizer

#### 2.1 Principle and complexity of the Volterra equalizer

The Volterra series [17] is an efficient technique to mitigate joint linear and nonlinear channel distortions. A *K*^{th}-order Volterra series expansion is stated in Eq. (1) as:

$$y(k) = \sum_{r = 1}^{K} \sum_{k_1 = 0}^{N - 1} \sum_{k_2 = k_1}^{N - 1} \cdots \sum_{k_r = k_{r - 1}}^{N - 1} w_r(k_1, k_2, \ldots, k_r) \prod_{i = 1}^{r} x(k - k_i), \tag{1}$$

where $r$ is the order of the Volterra equalizer, $N$ is the memory length of each order, $x(k)$ and $y(k)$ are respectively the ${k^{th}}$ samples of the input and output of the Volterra series, and ${w_r}({{k_1},{k_2}, \ldots {k_r}})$ denotes the ${r^{th}}$-order Volterra coefficients. It is noted that the Volterra coefficients have a symmetric property, i.e., ${w_2}({{k_1},\;{k_2}}) = {w_2}({{k_2},\;{k_1}})$, which is exploited in Eq. (1) to avoid redundancy [18]. The number of coefficients, $NC({N,\;K})$, in the *K*^{th}-order Volterra equalizer is written as

$$NC(N, K) = \sum_{r = 1}^{K} \binom{N + r - 1}{r}. \tag{2}$$

From Eq. (2), it can be deduced that the number of weight coefficients rises rapidly with growing filter order (*K*) and memory length (*N*). To implement an equalizer that compensates for both the linear and nonlinear distortions accumulated in the communication channel, it is necessary to select a proper equalizer order and memory lengths to balance the performance against the computation complexity. In accordance with [7] and [9], our investigation employs the third-order Volterra equalizer to jointly compensate for most of the linear and nonlinear distortions, as stated in Eq. (3):

$$\begin{aligned} y(k) ={} & \sum_{k_1 = 0}^{N_1 - 1} w_1(k_1)\,x(k - k_1) + \sum_{k_1 = 0}^{N_2 - 1} \sum_{k_2 = k_1}^{N_2 - 1} w_2(k_1, k_2)\,x(k - k_1)\,x(k - k_2) \\ & + \sum_{k_1 = 0}^{N_3 - 1} \sum_{k_2 = k_1}^{N_3 - 1} \sum_{k_3 = k_2}^{N_3 - 1} w_3(k_1, k_2, k_3)\,x(k - k_1)\,x(k - k_2)\,x(k - k_3). \end{aligned} \tag{3}$$

In Eq. (3), the 3^{rd}-order Volterra equalizer has three memory lengths, ${N_1}$, ${N_2}$ and ${N_3}$, representing the 1^{st}-, 2^{nd}-, and 3^{rd}-order parts of the VE, respectively. The complexity, in terms of the number of real multiplications, $NM({N,\;K})$, is stated as Eq. (4) [9]:

$$NM(N, K) = \sum_{r = 1}^{K} r \binom{N_r + r - 1}{r}, \tag{4}$$

since each ${r^{th}}$-order product term requires $r$ real multiplications ($r - 1$ for the signal products and one for the weight coefficient).

Equation (4) gives the complexity of the *K*^{th}-order Volterra equalizer with different memory lengths *N* = [${N_1}$, ${N_2}$, …, ${N_K}$] corresponding to the ${1^{st}}$ to ${K^{th}}$ orders of the Volterra equalizer. The total number of multiplications to implement the Volterra equalizer increases as the memory length increases. Thus, the implementation of such an equalizer may not be feasible because of the huge complexity resulting from long memory lengths and a high filter order. Such difficulty becomes even more severe in adaptive applications or adaptive algorithms, such as recursive least-squares (RLS) [19] and affine projection [20].
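To make Eqs. (2) and (4) concrete, the short Python sketch below (the function names are our own) counts the symmetric-kernel coefficients and real multiplications for per-order memory lengths; for the VE(25, 19, 13) configuration adopted later, it reproduces the complexity of 1770 real multiplications quoted in Section 4.

```python
from math import comb

def nc(mem_lengths):
    """Eq. (2) generalized to per-order memory lengths: the number of
    symmetric Volterra coefficients is sum over r of C(N_r + r - 1, r)."""
    return sum(comb(n + r - 1, r) for r, n in enumerate(mem_lengths, start=1))

def nm(mem_lengths):
    """Eq. (4): each r-th-order product term costs r real multiplications
    (r - 1 signal products plus one multiplication by the weight)."""
    return sum(r * comb(n + r - 1, r) for r, n in enumerate(mem_lengths, start=1))

print(nc((25, 19, 13)))  # 25 + 190 + 455  = 670 coefficients
print(nm((25, 19, 13)))  # 25 + 380 + 1365 = 1770 real multiplications
```

The cubic growth of the third-order term dominates: of the 1770 multiplications, 1365 come from the 3^{rd}-order kernel alone, which is exactly the redundancy the pruning scheme targets.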

#### 2.2 Sparsity metric

The implementation of the third-order VE is limited by its huge computation complexity. To make it more practical, it is important to find the sparsity of the VE coefficients with an algorithm of low computation cost. Several works have reported effective approaches to find sparsity [21,22]. In this work, we employ a blind technique that does not require examining individual VE coefficients [23]. We therefore utilize the idea of overall sparsity to determine the sparsity of the VE coefficients without complex calculations. We introduce a sparsity metric (ξ), which varies between 0 and 1 and is a crucial parameter in dealing with the sparsity of the coefficients. The sparsity metric ξ is expressed as

$$\xi = 1 - \frac{\|w\|_0}{NC(N, K)}, \tag{5}$$

where ${\|w\|_0}$ represents the number of non-zero weight coefficients, obtained by simply counting the non-zero entries in the weight vector, and $NC({N,K})$ denotes the number of coefficients in the third-order equalizer, as expressed in Eq. (2). Since Eq. (5) does not handle the individual values or locations of the VE coefficients, it estimates the overall sparsity without searching or sorting the coefficients in complex calculations.

#### 2.3 Principles of Volterra pruning and retraining

As described in [7], conventional Volterra equalization methods consist of equalization, training data, target data, error signal, and weight coefficient updating blocks. After the equalization, most of the obtained weight coefficients are non-zero, so such adaptive Volterra equalizers demand very high computation complexity. To estimate the percentage of non-zero weight coefficients, sparsity estimation is employed. Meanwhile, to compact the equalization structure and reduce the computation complexity, the L1-regularization-based Volterra equalizer (L1VE) was proposed as an attractive solution [13]. In Fig. 1(a), we depict the block diagram of the L1VE and use it as a benchmark for our proposed scheme, which is illustrated in Fig. 1(b). In Fig. 1(a), the Volterra equalizer is trained with the L1-adaptive algorithm, which adds the L1-norm penalty to the cost function when updating the weight coefficients. This algorithm, along with the separate sparsity estimation of Eq. (5), can effectively control the sparsity of the coefficients.

The proposed ENPVE scheme consists of three phases: pre-training, pruning, and retraining. Figure 1(b) demonstrates the illustrative framework of the proposed ENPVE scheme. The first phase is pre-training, consisting of the equalization, training data, target data, error signal, the adaptive algorithm with the elastic-net (EN) regularization block, and the weight coefficient updating block. This phase utilizes a supervised learning process to train the model with the training data (input features) embedded in the transmission signal. In our model, *X* is a design matrix that comprises the 1^{st}-, 2^{nd}-, and 3^{rd}-order signal beating terms, $y$ is the target data, and the trainable weight vector *w* consists of the 1^{st}-, 2^{nd}-, and 3^{rd}-order weight coefficients of the equalizer, ${w_1}$, ${w_2}$ and ${w_3}$, respectively. The pre-training phase can be solved under the sparsity constraint condition:

$$\hat{w} = \mathop{\arg\min}\limits_{w} \; \frac{1}{2}\|y - Xw\|_2^2 + \lambda\, h(w), \tag{6}$$

where $h(w) = \alpha \|w\|_1 + \frac{1 - \alpha}{2}\|w\|_2^2$ is the EN penalty, $\lambda$ is the weight regularization control parameter, and $\alpha$ is the mixing parameter, ranged between 0 and 1. When α is set to 1, the L2 penalty term becomes zero, and only the L1 penalty is present. When α = 0, the L1 penalty term becomes zero and only the L2 penalty exists, which means it does not deliver sparse weight coefficients. In L1VE, the overall sparsity ξ increases as the weight regularization control parameter λ increases, which reduces the computation complexity but at the price of losing signal integrity. For EN regularization, the L1 and L2 regularizations are combined by the mixing parameter α. In this model, the L1 portion of the penalty produces a sparse model, while the L2 portion eliminates the limitation of variable selection and stabilizes the L1 regularization path, which results in better model accuracy while keeping the magnitudes of the weight coefficients small. With this EN regularization, we can easily identify the significant and insignificant weight coefficients. A detailed analysis of the accuracy and complexity of the pre-trained model is given in the results and discussion section.
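The pre-training phase can be sketched in Python as follows. This is an illustrative sketch, not the authors' exact implementation: the step size, λ, α, the toy memory lengths, and the function names are our own assumptions. It builds the design matrix *X* of symmetric beating terms and runs an LMS update with the EN penalty gradient added.

```python
import numpy as np
from itertools import combinations_with_replacement

def volterra_features(x, mem_lengths=(7, 5, 3)):
    """Design matrix X: one column per symmetric beating term
    x(k-k1)...x(k-kr) with k1 <= k2 <= ... <= kr, for r = 1, 2, 3."""
    x = np.asarray(x, dtype=float)
    n_max = max(mem_lengths)
    cols = []
    for r, n in enumerate(mem_lengths, start=1):
        for taps in combinations_with_replacement(range(n), r):
            col = np.ones(len(x) - n_max + 1)
            for k in taps:
                col = col * x[n_max - 1 - k : len(x) - k]
            cols.append(col)
    return np.column_stack(cols)

def en_lms_pretrain(X, y, mu=1e-2, lam=1e-4, alpha=0.25, epochs=10):
    """LMS pre-training with the EN penalty gradient added: the L1 part
    (alpha * sign(w)) attracts insignificant weights toward zero, while
    the L2 part ((1 - alpha) * w) shrinks all weights slightly and
    stabilizes the L1 regularization path."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xk, yk in zip(X, y):
            e = yk - xk @ w
            w += mu * e * xk - mu * lam * (alpha * np.sign(w) + (1 - alpha) * w)
    return w
```

With `mem_lengths=(7, 5, 3)`, *X* has 7 + 15 + 10 = 32 columns; after pre-training, the small-magnitude entries of *w* mark the insignificant coefficients that the next phase prunes away.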

The second phase is pruning, where we prune away the small-valued weight coefficients that do not have a significant impact on the system performance, with the following steps:

- 1) Select a pruning sparsity to prune away a fraction of the weight coefficients from the model.
- 2) Sort the weight coefficients in descending order using histogram analysis as *w* = [${w_1} \ge {w_2} \ge \ldots {w_j} \ldots \ge {w_M}$], where ${w_j}$ represents the ${j^{th}}$ largest weight coefficient.
- 3) Calculate the pruning index $l = ({1 - s}) \times {\|w\|_0}$, where ${\|w\|_0}$ denotes the number of non-zero weight coefficients in vector *w*, as in Eq. (5), and $s$ is a pre-determined pruning sparsity.
- 4) Find the trimming threshold ${\beta_s} = w(l)$ corresponding to the specified sparsity $s$ through the magnitude pruning approach.
- 5) Prune away the weight coefficients below the threshold found in Step 4, and then estimate the overall sparsity ξ of the weight coefficients.

In the pruning phase, each weight coefficient's significance is measured to determine whether it should be dropped or kept. The trimming threshold is set using a pre-defined pruning sparsity of the weight coefficients. When the pruning sparsity is high, more weight coefficients are removed, which reduces the size of the training data (input features) and hence the computation complexity. However, the accuracy of the received data drops sharply after pruning. The third phase is retraining, which recovers the accuracy loss: the zero-weight coefficients are frozen after pruning, the input-feature and target elements that correspond to zero-weight coefficients are dropped, and the remaining non-zero parameters are fine-tuned to recover the accuracy loss. Fine-tuning with EN allows better recovery even at higher pruning sparsity.
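The five pruning steps and the retraining phase above can be sketched as follows (a minimal numpy illustration; the helper names and the LMS fine-tuning settings are our own, not the paper's implementation):

```python
import numpy as np

def prune(w, s):
    """Steps 1-5: magnitude pruning at a pre-determined pruning sparsity s.
    Returns the pruned weights and the overall sparsity xi of Eq. (5)."""
    w = w.copy()
    nonzero = np.flatnonzero(w)
    mags = np.sort(np.abs(w[nonzero]))[::-1]     # step 2: descending sort
    l = int((1 - s) * len(nonzero))              # step 3: pruning index
    beta = mags[l - 1] if l > 0 else np.inf      # step 4: trimming threshold
    w[np.abs(w) < beta] = 0.0                    # step 5: prune below threshold
    xi = 1 - np.count_nonzero(w) / w.size        # Eq. (5): overall sparsity
    return w, xi

def retrain(X, y, w, mu=0.5, epochs=20):
    """Retraining phase: freeze the zeroed weights, drop the matching
    input-feature columns, and fine-tune the survivors with plain LMS."""
    mask = w != 0
    for _ in range(epochs):
        for xk, yk in zip(X, y):
            e = yk - xk[mask] @ w[mask]
            w[mask] += mu * e * xk[mask]
    return w
```

For example, with `w = [0.5, -0.01, 0.2, 0.001, -0.3, 0]` and `s = 0.6`, only the two largest-magnitude weights survive and ξ = 1 − 2/6 ≈ 0.67; the retraining step then compensates for the response lost to the pruned terms.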

## 3. Experimental setup

Figure 2 illustrates our experimental setup. Four independent 20-Gbit/s OOK signals with PRBS 2^{15} − 1 patterns are generated by Anritsu pulse pattern generators. These four data channels are divided into two pairs. Each pair is fed into an Anritsu G0374A DAC to double the bit rate to a 40-Gbit/s OOK signal. Then, these two 40-Gbit/s OOK signals are amplified with different amplitudes and combined into a 40-Gbaud PAM-4, i.e., 80-Gb/s, signal. Next, this PAM-4 signal is fed into a 1290-nm electro-absorption modulated laser (EML). To meet the limited receiving sensitivity of the employed photo-detector, a semiconductor optical amplifier (SOA) is introduced to increase the optical power to 11 dBm, supporting a longer transmission distance at the O-band wavelength. Before the SOA, a variable optical attenuator (VOA) is utilized to set the laser's output power at −2 dBm to optimize the amplified signal performance. After 40-km SSMF transmission, another VOA is employed to control the received optical power, and the signals are detected by a fast photo-diode (PD) with 30-GHz bandwidth. The received electrical signal is recorded by an oscilloscope with 30-GHz bandwidth (Tektronix DPO73304DX) and then processed by the offline DSP to mitigate the distortions caused by the transmission system. In the proposed ENPVE DSP module, each data frame comprises 260000 transmitted PAM-4 symbols, of which 20800 random PAM-4 symbols are chosen for the training process and the remaining 239200 PAM-4 symbols are the payload symbols used for the BER evaluation and the computation complexity analysis.

## 4. Results and discussion

In our experimental demonstration, DSP is performed offline using MATLAB. The third-order Volterra equalizer is trained with the least-mean-squares (LMS) algorithm, which has a low computation cost and is more robust than RLS [7]. We begin the analysis of the equalizer by increasing the memory length of each order. The memory lengths *N*_{1}, *N*_{2}, and *N*_{3} in the 3^{rd}-order VE (*N*_{1}, *N*_{2}, *N*_{3}) are increased with step sizes of 4, 3, and 2 from VE (1, 1, 1) to VE (33, 25, 17). Figure 3(a) shows the BER as a function of the complexity for both the BTB and 40-km transmission cases with the conventional VE. The BER values gradually decrease as the complexity increases. However, such third-order memory lengths demand a heavy computation burden, which is a major obstacle of the conventional VE. Therefore, to relieve the computation burden yet keep the BER below the HD-FEC limit of BER = 1E-3 [25,26], we select VE(25, 19, 13) for the rest of this study for both the BTB and 40-km transmission. The eye diagrams of the received PAM-4 signal for BTB and 40-km transmission with and without the conventional VE are depicted in the insets of Fig. 3(a). For comparison, a linear equalizer (LE) is also implemented with an optimum tap number of 25. The BER curves for both BTB and 40-km transmission plotted against the received optical power (ROP) using the LE and VE are shown in Fig. 3(b). Apparently, the LE fails to meet the HD-FEC threshold after 40-km transmission, even when the memory length is extended to 50 (not shown here). On the other hand, the Volterra equalizer VE(25, 19, 13) can bring the BER below the desired HD-FEC limit for 40-km transmission at an ROP of −4 dBm. The eye diagrams of the received signal for BTB and 40-km transmission using the LE and VE at an ROP of −4 dBm are illustrated in the insets of Fig. 3(b). However, the total complexity associated with VE(25, 19, 13) is 1770 real multiplications, which is still a big barrier for real-time implementations.

The detailed analysis of the accuracy and complexity of the model pre-trained via the EN regularization is given in Fig. 4. After 40-km transmission, the BER performance and complexity at an ROP of −4 dBm are outlined in Fig. 4(a) and (b), respectively, with respect to λ under different values of α. Figures 4(a) and (b) show that, for values of α from 0 to 1 in steps of 0.25, the computation complexity reduces at the cost of BER performance as λ increases from 1E-5 to 3.31E-3. That is to say, α = 0 yields the best BER performance but demands the most computations, whereas α = 1 provides the largest complexity reduction with the worst BER performance. We further evaluate the computation complexity in terms of the weight coefficient distribution by the histograms at α = 0.25, as illustrated in Fig. 4(c) with insets (i), (ii), (iii), and (iv) for λ = 0, 5E-5, 1E-3, and 2E-3, respectively. Our results show that the number of non-zero weight coefficients shrinks, i.e., more zero-weight coefficients appear, as λ increases, which reduces the complexity, Fig. 4(b), but degrades the BER performance, Fig. 4(a).

As the sparsity increases, the computation complexity decreases [13,14]. When the pruning sparsity is small, there is no significant reduction in complexity but the best performance is retained, whereas a high pruning sparsity leads to a large complexity reduction but a decline in BER performance. Therefore, the main concern here is to determine the sparsity that balances BER performance against complexity. Figures 5(a) and (b) show the BER curves versus the sparsity of the weight coefficients for both the BTB and 40-km transmission at an ROP of −4 dBm. For L1VE, the sparsity ξ increases as the weight threshold λ increases from 2.4E-4 to 9.6E-3 with a step of 2.4E-4. Hence, the BER performance gradually degrades as the sparsity ξ increases for both the BTB and 40-km scenarios. From Fig. 5(a) and (b), to meet the HD-FEC limit, L1VE can reach a sparsity ξ of up to 0.81 and 0.65 with and without retraining for BTB, and 0.77 and 0.63 with and without retraining after 40-km transmission, respectively, at an ROP of −4 dBm. These results indicate that retraining in L1VE can effectively relieve the information loss due to high-percentage pruning. Figures 5(a) and (b) also show that the complexity decreases as the sparsity increases from 0 to 1 for both the BTB and 40-km transmission cases. Figures 6(a) and (b) exhibit the BER curves plotted as a function of ROP at different sparsity values, ξ = 0.45, 0.60, and 0.75, for BTB and 40-km transmission. Under these sparsities, the BER gradually decreases as the ROP increases from −7 dBm to −2 dBm in all scenarios. The conventional VE achieves the best BER performance at all ROPs for both the BTB and 40-km transmission. Without retraining, as the sparsity ξ increases, the BER performance drastically degrades; when ξ = 0.75, the performance cannot meet the HD-FEC threshold at any ROP, even at BTB, as shown in Fig. 6(a). We would have to reduce the sparsity ξ, i.e., sacrifice complexity, to restore the signal's integrity after L1VE. On the other hand, when we apply retraining, since the weight coefficients are redistributed and updated after pruning, the BER performance is greatly enhanced. Even when the sparsity is as high as 0.75, after 40-km transmission the performance can still meet the HD-FEC criterion at an ROP of −4 dBm, as depicted in Fig. 6(b).

For the EN regularization, α values between 0 and 1 are employed to find the balance between the BER performance and complexity as λ increases. To resolve this dilemma between performance and complexity, we study the EN regularization for various combinations of λ and α and apply the pruning and retraining algorithms to reduce the complexity without degrading the performance. In the proposed ENPVE, the received signal is trained with fixed values of α = 0.25 and λ = 1.5E-4, and then the pruning algorithm is applied to prune away the pre-defined fraction of weight coefficients from the model before measuring the BER performance and complexity. From Fig. 7(a) and (b), to meet the HD-FEC limit, the sparsity ξ using ENPVE can reach as high as 0.98 and 0.96 for BTB with and without retraining, while for 40-km transmission it reaches 0.97 and 0.90 with and without retraining, respectively, at an ROP of −4 dBm. It may also be noted that the number of multiplications decreases as the sparsity increases from 0 to 1 for both the BTB and 40-km transmissions. Meanwhile, for both the BTB and 40-km transmission, the BER performances with and without retraining remain almost the same for sparsity values ξ below 0.7, which indicates that the proposed EN regularization can efficiently differentiate the significant and insignificant weights.

Figures 8(a) and (b) exhibit the BER curves as a function of ROP at different sparsity values, ξ = 0.75, 0.90, and 0.98, for the BTB and 40-km transmission cases. When the sparsity ξ equals 0.75, the BER performances are almost the same for the conventional VE and for ENPVE with and without retraining at all ROPs in both the BTB and 40-km transmission. Thus, at this sparsity, retraining does not improve the performance much. This is because, when ξ = 0.75, the significant weight coefficients are not heavily pruned away in the pruning phase, so the retrained weight coefficients do not need to recover much information loss. Meanwhile, the similar BER performance of ENPVE indicates that there are numerous redundant weight coefficients in the conventional VE, which greatly increase the computation complexity. As the sparsity ξ increases to 0.90, the performance without retraining exhibits a significant loss compared with the retrained one in both the BTB and 40-km transmission, and the BER of ENPVE without retraining can barely meet the HD-FEC limit at −4 dBm after 40-km transmission. However, after retraining, the performance in both scenarios is greatly enhanced and is even slightly better than that at ξ = 0.75. Such BER improvement demonstrates that the information loss due to heavy pruning can be recovered after retraining. When the sparsity rises to 0.98, the BER performance without retraining can no longer meet the desired HD-FEC threshold in either case. For the 40-km transmission case, only retraining the weight coefficients after pruning can barely approach the HD-FEC criterion; at such a high sparsity, the BER almost cannot be recovered even after retraining because too much information is lost. Thus, the proposed ENPVE can achieve a high sparsity without significantly degrading the system performance after retraining.

Figures 9(a) and (b) show the plots of the complexity measures versus ROP for BTB and 40-km transmission, respectively. The results clearly demonstrate that, at a lower ROP, the Volterra equalizer requires more complexity to handle the equalization process in order to meet the HD-FEC limit, owing to the lower received SNR. Compared with the conventional VE, at an ROP of −4 dBm, ENPVE offers 98% and 96% reductions in complexity for BTB with and without retraining, respectively, as shown in Fig. 9(a), while 97.4% and 90% reductions are observed for 40-km transmission in Fig. 9(b). Furthermore, under the same operating conditions, ENPVE outperforms L1VE by 16.8% and 31% reductions in complexity with and without retraining, respectively, for BTB, Fig. 9(a), and by 20.2% and 27.5% after 40 km, Fig. 9(b). Moreover, for the 40-km transmission case in Fig. 9(b), the complexity cannot be reduced by L1VE at an ROP of −6 dBm, but our scheme can cut the complexity by 91% and 75% with and without retraining, respectively. Therefore, the proposed ENPVE outperforms L1VE.

To see the computation complexity reduction achieved by ENPVE, we also investigate the multiplication counts in each polynomial order of the 3^{rd}-order Volterra equalizer. In Table 1, we summarize them for VE(25, 19, 13) at ξ = 0.75 for both the conventional VE and the proposed ENPVE for BTB at an ROP of −4 dBm. The table clearly shows that the computation complexity increases with the polynomial order, and the redundancy soars accordingly in the conventional VE. In the proposed ENPVE, however, we can identify the insignificant weight coefficients and prune them away, so the computation complexity is greatly reduced. As can be seen, almost 86% of the computation complexity in the 3^{rd} order is saved by the proposed ENPVE.

## 5. Summary

To realize a highly efficient and low-cost high-speed IM-DD PAM-4 transmission system, the EN regularization-based pruned Volterra equalizer has been implemented to mitigate nonlinear distortions. Compared with L1VE and the fully connected VE, we successfully reduce the computation complexity, while still maintaining a satisfactory BER performance, using ENPVE in an experimental study of 80-Gbps PAM-4 IM-DD transmission over 40-km SSMF with an EML and SOA in the O-band. Our results show that ENPVE attains better sparse characteristics, and thus lower complexity, than L1VE while meeting the HD-FEC limit. Besides, with ENPVE, retraining the weight coefficients of the equalizer delivers significant performance improvements and yields weight coefficients with better accuracy. Without retraining, ENPVE achieves up to 96% and 90% complexity reductions for BTB and 40-km transmission at an ROP of −4 dBm, respectively, and shows 31% and 27.5% additional reductions compared with L1VE for BTB and 40-km transmission, respectively. With retraining, under the same ROP, the complexity reduction is enhanced to 98% and 97.4% for BTB and 40-km transmission, respectively, and 16.8% and 20.2% further reductions over L1VE are achieved for the BTB and 40-km transmission cases. The obtained results demonstrate the feasibility of ENPVE for high-speed IM-DD PAM-4 transmission systems.

## Funding

Ministry of Science and Technology, Taiwan (108-2221-E-007-088, 109-2221-E-007-109-MY3).

## Disclosures

The authors declare no conflicts of interest.

## References

**1. **Cisco Annual Internet Report, [Online]: https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html.

**2. **N. Bitar, S. Gringeri, and T. J. Xia, “Technologies and protocols for data center and cloud networking,” IEEE Commun. Mag. **51**(9), 24–31 (2013). [CrossRef]

**3. **H. Mardoyan, M. A. Mestre, J. M. Estarán, F. Jorge, F. Blache, P. Angelini, A. Konczykowska, M. Riet, V. Nodjiadjim, and J.-Y. Dupuy, “84-, 100-, and 107-GBd PAM-4 intensity-modulation direct-detection transceiver for datacenter interconnects,” J. Lightwave Technol. **35**(6), 1253–1259 (2017). [CrossRef]

**4. **K. Zhong, X. Zhou, T. Gui, L. Tao, Y. Gao, W. Chen, J. Man, L. Zeng, A. P. Lau, and C. Lu, “Experimental study of PAM-4, CAP-16, and DMT for 100 Gb/s short reach optical transmission systems,” Opt. Express **23**(2), 1176 (2015). [CrossRef]

**5. **M. S. Alfiad, D. Van den Borne, F. N. Hauske, A. Napoli, A. M. J. Koonen, and H. de Waardt, “Maximum-likelihood sequence estimation for optical phase-shift keyed modulation formats,” J. Lightwave Technol. **27**(20), 4583–4594 (2009). [CrossRef]

**6. **D. Maiti and M. Brandt-Pearce, “Modified nonlinear decision feedback equalizer for long-haul fiber-optic communications,” J. Lightwave Technol. **33**(18), 3763–3772 (2015). [CrossRef]

**7. **N. Stojanovic, F. Karinou, Z. Qiang, and C. Prodaniuc, “Volterra and Wiener equalizers for short-reach 100G PAM-4 applications,” J. Lightwave Technol. **35**(21), 4583–4594 (2017). [CrossRef]

**8. **R. D. Nowak and B. D. Van Veen, “Volterra filter equalization: A fixed point approach,” IEEE Trans. Signal Process. **45**(2), 377–388 (1997). [CrossRef]

**9. **J. Tsimbinos and K. V. Lever, “Computational complexity of Volterra based nonlinear compensators,” Electron. Lett. **32**(9), 852–854 (1996). [CrossRef]

**10. **Y. Yu, M. R. Choi, T. Bo, Z. He, Y. Che, and H. Kim, “Low-Complexity Second-Order Volterra equalizer for DML-Based IM/DD Transmission System,” J. Lightwave Technol. **38**(7), 1735–1746 (2020). [CrossRef]

**11. **N.-P. Diamantopoulos, H. Nishi, W. Kobayashi, K. Takeda, T. Kakitsuka, and S. Matsuo, “On the complexity reduction of the second-order Volterra nonlinear equalizer for IM/DD systems,” J. Lightwave Technol. **37**(4), 1214–1224 (2019). [CrossRef]

**12. **Y. Yu, T. Bo, Y. Che, D. Kim, and H. Kim, “Low-complexity nonlinear equalizer based on absolute operation for C-band IM/DD systems,” Opt. Express **28**(13), 19617–19628 (2020). [CrossRef]

**13. **W.-J. Huang, W.-F. Chang, C.-C. Wei, J.-J. Liu, Y.-C. Chen, K.-L. Chi, C.-L. Wang, J.-W. Shi, and J. Chen, “93% complexity reduction of Volterra nonlinear equalizer by ℓ1-regularization for 112-Gbps PAM-4 850-nm VCSEL optical interconnect,” in *Optical Fiber Communications Conference and Exposition (OFC)*, OSA Technical Digest (Optical Society of America, 2018), San Diego, CA, USA, paper M2D.7.

**14. **S.-Y. Lu, C.-C. Wei, C.-Y. Chuang, Y.-K. Chen, and J. Chen, “81.7% complexity reduction of Volterra nonlinear equalizer by adopting L1 regularization penalty in an OFDM long-reach PON,” in European Conference on Optical Communication (ECOC), Gothenburg, Sweden, 2017, pp. 1–3.

**15. **L. Shu, J. Li, Z. Wan, Z. Yu, X. Li, M. Luo, S. Fu, and K. Xu, “Single-photodiode 112-Gbit/s 16-QAM transmission over 960-km SSMF enabled by Kramers-Kronig detection and sparse I/Q Volterra filter,” Opt. Express **26**(19), 24564–24576 (2018). [CrossRef]

**16. **S. Zeng, J. Gou, and L. Deng, “An antinoise sparse representation method for robust face recognition via joint l1 and l2 regularization,” Expert Syst. Appl. **82**(11), 1–9 (2017). [CrossRef]

**17. **V. J. Mathews and G. Sicuranza, *Polynomial signal processing* (John Wiley & Sons, Inc., 2000).

**18. **M. Schetzen, *The Volterra and Wiener theories of nonlinear systems* (Wiley, 1980).

**19. **B. Farhang-Boroujeny, *Adaptive filters: theory and applications* (John Wiley & Sons, 2013).

**20. **K. Ozeki and T. Umeda, “An adaptive filtering algorithm using an orthogonal projection to an affine subspace and its properties,” Electron. Comm. Jpn. Pt. I **67**(5), 19–27 (1984). [CrossRef]

**21. **N. Hurley and S. Rickard, “Comparing measures of sparsity,” IEEE Trans. Inf. Theory **55**(10), 4723–4741 (2009). [CrossRef]

**22. **G. Hirano and T. Shimamura, “A modified IPNLMS algorithm using system sparseness,” in 2012 International Symposium on Intelligent Signal Processing and Communications Systems, Taipei, Taiwan, pp. 876–879.

**23. **E. J. Candès and M. B. Wakin, “An introduction to compressive sampling,” IEEE Signal Process. Mag. **25**(2), 21–30 (2008). [CrossRef]

**24. **Q. Lin, X. Chen, and J. Peña, “A sparsity preserving stochastic gradient methods for sparse regression,” Comput. Optim. Appl. **58**(2), 455–482 (2014). [CrossRef]

**25. **L. Liu, S. Xiao, J. Fang, L. Zhang, Y. Zhang, M. Bi, and W. Hu, “High performance and cost effective CO-OFDM system aided by polar code,” Opt. Express **25**(3), 2763–2770 (2017). [CrossRef]

**26. **J. Luo, J. Li, Q. Sui, Z. Li, and C. Lu, “40 Gb/s mode-division multiplexed DD-OFDM transmission over standard multi-mode fiber,” IEEE Photonics J. **8**(3), 1–7 (2016). [CrossRef]