## Abstract

The transmission performance of a 112 Gbit/s PAM-4 signal using a commercial 25 G-class EML and APD is experimentally studied with advanced digital signal processing (DSP) algorithms, i.e., pre-equalization (Pre-EQ), error-table-based pre-correction (ETC), least-mean-square (LMS) based equalization, and the direct detection faster-than-Nyquist (DD-FTN) algorithm. Among them, Pre-EQ and ETC are implemented at the transmitter, and ETC is a symbol-pattern-dependent pre-compensation algorithm based on a look-up-table approach. To obtain these pre-compensation parameters readily, a joint equalization and error table generation (JEEG) module is proposed. Employing the combination of ETC, LMS, and DD-FTN, single-lane 112 Gbit/s PAM-4 transmission over 40 km without optical amplification is experimentally demonstrated with a record receiver sensitivity of −16.6 dBm (at the 7% HD-FEC threshold). In addition, the computational complexities of different DSP schemes are analyzed and discussed in detail. The receiver computational complexity can be effectively reduced by employing appropriate ETC and Pre-EQ in the transmitter.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

The Internet of things (IoT), cloud computing, storage and other advanced applications have driven demand for high-speed short-haul optical transmission, including 5G fronthaul, datacenter interconnects and metro access [1]. To satisfy the low-cost and low power-per-bit requirements of short-haul optical applications, intensity-modulation direct-detection (IM-DD) systems are the most practical because they are technologically simple, cost-effective and have small form factors [2]. Single-channel rates above 100 Gbit/s using IM-DD transceivers with various advanced modulation schemes have been widely demonstrated, including pulse amplitude modulation (PAM) [3,4], discrete multi-tone (DMT) modulation [5,6], and carrier-less amplitude and phase (CAP) modulation [7]. However, IM-DD systems are inherently sensitive to fiber dispersion, especially at high baud rates. For this reason, IM-DD systems have been proposed to operate in the O-band (1310 nm) over up to 20 km of standard single-mode fiber (SSMF), where chromatic dispersion (CD) is negligible and the power loss is acceptable. However, once the transmission distance is extended to 40 km for extended-range (ER) applications, one faces a dilemma between significant CD distortion in the C-band and large power loss in the O-band. If the C-band is selected, either CD pre-compensation [8] or a single-sideband (SSB) [9] technique is required to deal with CD under direct detection. In this case, complex electrical-field modulation is required instead of intensity modulation, so a dual-drive Mach-Zehnder modulator (DDMZM) or an IQ modulator has to be used instead of an electro-absorption modulated laser (EML) or a directly modulated laser (DML). These systems are highly dispersion tolerant, but a transmitter with a DDMZM or IQ modulator may not be cost- or power-effective [10].
In addition, an optical amplifier then becomes necessary even for these short-haul links. Alternatively, IM-DD systems operating in the O-band must overcome the intrinsic problem of large fiber loss: the transmission loss is approximately 14 dB over 40 km of SSMF. In this case, a high-power transmitter such as a DML can be adopted to increase the transmission distance. W. Wang et al. demonstrated 112 Gbit/s PAM-4 amplifier-free transmission over 40 km SSMF using a 1.3 μm high-power DML [11]. Another option is to employ an avalanche photodiode (APD), which has a higher responsivity than a conventional photodiode and thus enables longer-reach IM-DD transmission. However, high-bandwidth APDs were unavailable for a long time. In 2012, M. Nada et al. reported an APD at 1310 nm with a 3-dB bandwidth of 18 GHz for 25 Gbit/s operation [12]. Recently, M. Nada et al. experimentally demonstrated optical amplifier-less transmission of 106 Gbit/s PAM-4 over 40 km at 1310 nm using a fabricated APD with a 42 GHz 3-dB bandwidth [13]. In 2017, we experimentally demonstrated, for the first time, 40 km amplifier-less transmission of a single-lane 112 Gbit/s PAM-4 signal using a 25 G-class EML and APD, which also achieved a record receiver sensitivity of −14.8 dBm at a BER of 3.8E-3 [14].

In this paper, we experimentally study and compare the performance of 112 Gbit/s PAM-4 amplifier-less transmission with different DSP algorithms and their combinations. The studied algorithms include least-mean-square (LMS) based equalization, the direct detection faster-than-Nyquist (DD-FTN) algorithm, pre-equalization (Pre-EQ) and error-table-based pre-correction (ETC). To obtain the error table simply, a joint equalization and error table generation (JEEG) module is proposed. Combining the ETC, LMS, and DD-FTN algorithms, the best receiver sensitivity is improved to −16.6 dBm. To the best of our knowledge, this is a new record receiver sensitivity for single-lane 112 Gbit/s PAM-4 amplifier-less transmission over a 40 km SSMF link in the O-band. In addition, employing both ETC and Pre-EQ in the transmitter effectively reduces the computational complexity of the receiver: the number of receiver equalizer taps can be reduced by about 80%, and DD-FTN is no longer required in the receiver. Finally, detailed computational complexity analyses are given for the different DSP schemes.

## 2. Experimental setup

The experimental setup for a single-lane PAM-4 amplifier-less transmission system with 25 G-class commercial EML and APD is depicted in Fig. 1.

At the transmitter side, a pseudo-random binary sequence (PRBS) of length $2^{16}-1$ is mapped into PAM-4 format. After the DSP steps described in Sec. 3, the digital PAM-4 signal is uploaded into an arbitrary waveform generator (AWG) (Keysight M8196A) operating at 90 GSa/s for digital-to-analog (D/A) conversion. A linear electrical amplifier (SHF 807) boosts the electrical signal to a peak-to-peak voltage of 2.1 V, and the amplified signal drives a commercial 25 Gbps EML (Mitsubishi FU-412REA) with a 17 GHz 3-dB bandwidth. The center wavelength is 1304 nm. To determine the optimum bias point, the P-V curve was measured as shown in Fig. 2(a), and the bias point was optimized to −1.74 V. The output optical power is ~0.5 dBm.
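As an illustration of the first transmitter step, the sketch below generates a maximal-length sequence of period $2^{16}-1$ with a 16-bit Fibonacci LFSR and maps bit pairs onto PAM-4 levels. The feedback polynomial $x^{16}+x^{14}+x^{13}+x^{11}+1$, the seed, and the Gray mapping 00→−3, 01→−1, 11→+1, 10→+3 are assumptions for illustration only; the PRBS polynomial and bit-to-symbol mapping used in the experiment are not stated.

```python
def prbs16(seed=0xACE1, n_bits=2**16 - 1):
    """Bits from a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11
    (assumed polynomial x^16 + x^14 + x^13 + x^11 + 1, period 2^16 - 1)."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")
    bits = []
    for _ in range(n_bits):
        bits.append(state & 1)                                   # output the LSB
        fb = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (fb << 15)                        # shift in feedback bit
    return bits

# Hypothetical Gray mapping of bit pairs (first bit, second bit) to PAM-4 levels.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

def map_pam4(bits):
    """Map consecutive bit pairs to PAM-4 levels; a trailing odd bit is dropped."""
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits) - 1, 2)]
```

Because $2^{16}-1$ is odd, a hardware or AWG pattern would typically repeat the sequence twice before pairing bits; the sketch simply drops the last bit.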

At the receiver side, a variable optical attenuator (VOA) is placed after the SSMF to adjust the received optical power (ROP). The total transmission loss in the O-band is 14 dB for the 40 km SSMF link. The transmitted signal is detected using an APD + TIA receiver (SiFotonics, RLNS200-400) with a 3-dB bandwidth of 17 GHz and then sampled by a real-time oscilloscope (RTO) at 80 GSa/s. The bandwidth of the RTO is internally limited to 30 GHz to suppress high-frequency noise. To investigate the bandwidth limitation of the whole system, the end-to-end channel frequency response was measured by performing a discrete frequency sweep with the AWG, as shown in Fig. 2(b). The end-to-end 3-dB bandwidth is reduced to 16 GHz owing to the cascaded filtering of the individual components.

## 3. Experimental results

Using the same experimental setup with a commercial 25 G-class EML and APD, we experimentally study and compare the performance of the 112 Gbit/s PAM-4 signal with different DSP configurations. Each DSP scheme employs one or more advanced algorithms, i.e., ETC, Pre-EQ, and DD-FTN. In Sec. 3.2, a JEEG module is proposed to obtain the pre-compensation parameters for ETC and Pre-EQ.

#### 3.1 Transmission performance with DD-FTN

Figures 3(a) and 3(b) show the DSP procedures in the transmitter and the receiver, respectively. At the transmitter side, the PAM-4 symbol sequence is up-sampled to 2 samples per symbol and passed through a root-raised-cosine (RRC) filter with a roll-off factor of 0.1 for Nyquist pulse shaping. Then, pre-emphasis is applied using a 31-tap finite impulse response (FIR) filter that approximates an inverse digital Gaussian filter with a 3-dB bandwidth of 17 GHz, partially compensating the EML-induced low-pass filtering. The output signal is then re-sampled to the AWG sampling rate (90 GSa/s) and amplitude-clipped to reduce the peak-to-average power ratio (PAPR) increase caused by Nyquist pulse shaping and fractional up-sampling. The clipping ratio [15] is kept at 2.2.
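The pulse-shaping and clipping steps can be sketched as follows. The code builds unit-energy RRC taps (roll-off 0.1, 2 samples per symbol) from the textbook closed-form impulse response, shapes a PAM-4 sequence by zero-insertion up-sampling and convolution, and hard-clips the result. It assumes the clipping ratio is the clipping level divided by the RMS value of the mean-removed signal (a common convention; the exact definition used in the experiment is not stated), and the 64-symbol filter span is a hypothetical choice. Pre-emphasis and fractional resampling to 90 GSa/s are omitted.

```python
import math
import random

def rrc_taps(beta=0.1, sps=2, span=64):
    """Unit-energy root-raised-cosine taps over `span` symbols at `sps` samples/symbol."""
    half = span * sps // 2
    taps = []
    for k in range(-half, half + 1):
        t = k / sps                                      # time in symbol periods
        if k == 0:
            h = 1.0 - beta + 4.0 * beta / math.pi
        elif abs(abs(4.0 * beta * t) - 1.0) < 1e-9:      # removable singularity at t = ±T/(4β)
            a = math.pi / (4.0 * beta)
            h = beta / math.sqrt(2.0) * ((1 + 2 / math.pi) * math.sin(a)
                                         + (1 - 2 / math.pi) * math.cos(a))
        else:
            num = (math.sin(math.pi * t * (1 - beta))
                   + 4.0 * beta * t * math.cos(math.pi * t * (1 + beta)))
            h = num / (math.pi * t * (1 - (4.0 * beta * t) ** 2))
        taps.append(h)
    norm = math.sqrt(sum(h * h for h in taps))
    return [h / norm for h in taps]

def convolve(x, h):
    """Full linear convolution of two lists."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi != 0.0:
            for j, hj in enumerate(h):
                out[i + j] += xi * hj
    return out

def pulse_shape(symbols, taps, sps=2):
    """Zero-insertion up-sampling followed by RRC filtering."""
    up = []
    for s in symbols:
        up.append(float(s))
        up.extend([0.0] * (sps - 1))
    return convolve(up, taps)

def clip(x, ratio=2.2):
    """Hard-clip x around its mean at ±ratio × RMS (assumed clipping-ratio convention)."""
    mean = sum(x) / len(x)
    rms = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    limit = ratio * rms
    return [mean + max(-limit, min(limit, v - mean)) for v in x]

if __name__ == "__main__":
    rng = random.Random(1)
    sym = [rng.choice((-3, -1, 1, 3)) for _ in range(1000)]
    tx = clip(pulse_shape(sym, rrc_taps()), ratio=2.2)
    print(len(tx), max(tx), min(tx))
```

The RRC is self-checking: convolving the taps with themselves should give a raised-cosine pulse whose samples at nonzero multiples of the symbol period (every second sample here) are close to zero.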

In the receiver DSP, the detected digital signal is normalized and re-sampled to 2 samples per symbol. The digital filter-and-square algorithm [16] is used for timing recovery. Then, an adaptive equalizer based on a T/2-spaced FIR filter compensates the channel distortion. A training-symbol-aided LMS algorithm is first used to initialize the equalizer taps; after the equalizer has fully converged, it is switched to decision-directed mode. The down-sampling from two samples to one sample per symbol is also performed within the LMS equalization. Then, a DD-FTN [4] module is employed, which consists of a digital post filter with transfer function $H(z) = 1 + \alpha z^{-1}$ and a maximum likelihood sequence estimation (MLSE) with a memory length of 3 (corresponding to 64 states). In DD-FTN, the post filter suppresses the in-band high-frequency noise enhanced by the LMS-based linear equalization, and the MLSE removes the strong inter-symbol interference (ISI) that the post filter itself introduces. Finally, the BER is measured by a bit error counter after de-mapping.
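The training-then-decision-directed operation of a T/2-spaced LMS equalizer can be sketched as below. Each output is computed from a window of samples taken at 2 samples per symbol, and the window advances by two samples per output, so the down-sampling to 1 sample per symbol happens inside the equalizer. The toy channel, tap count, step size, training length and centre-spike initialization are hypothetical values chosen only so that the example converges; timing recovery and the DD-FTN stage are omitted.

```python
import random

LEVELS = (-3.0, -1.0, 1.0, 3.0)

def slicer(y):
    """Nearest PAM-4 level (hard decision)."""
    return min(LEVELS, key=lambda a: abs(a - y))

def toy_channel(symbols, h=(0.3, 0.6, 1.0, 0.6, 0.3)):
    """Hypothetical band-limited channel: zero-insertion to 2 samples/symbol, then FIR h."""
    up = [0.0] * (2 * len(symbols))
    up[0::2] = symbols
    return [sum(h[j] * up[m - j] for j in range(len(h)) if m - j >= 0)
            for m in range(len(up))]

def t2_lms_equalizer(rx, train, n_taps=21, mu=2e-3, n_train=8000, cursor=2):
    """T/2-spaced LMS equalizer: training mode first, then decision-directed mode.

    rx     : received samples at 2 samples/symbol
    train  : known symbols, used as the reference for the first n_train outputs
    cursor : sample offset of the channel's main cursor (channel-dependent)
    Returns {symbol index: hard decision}, i.e. one output per symbol.
    """
    half = n_taps // 2
    w = [0.0] * n_taps
    w[half] = 1.0                                   # centre-spike initialisation
    decisions = {}
    for n in range(len(rx) // 2):
        c = 2 * n + cursor                          # window moves by 2 samples per symbol
        if c - half < 0 or c + half >= len(rx):
            continue
        x = rx[c - half:c + half + 1]
        y = sum(wi * xi for wi, xi in zip(w, x))
        d_hat = slicer(y)
        ref = train[n] if n < n_train else d_hat    # training, then decision-directed
        e = ref - y                                 # LMS error
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]
        decisions[n] = d_hat
    return decisions

if __name__ == "__main__":
    rng = random.Random(3)
    d = [rng.choice(LEVELS) for _ in range(12000)]
    dec = t2_lms_equalizer(toy_channel(d), d)
    print(sum(dec[n] != d[n] for n in dec if n >= 10000), "errors after convergence")
```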

First, the influence of the key parameters of the LMS-based equalizer and DD-FTN on BER performance is investigated for the 112 Gbit/s PAM-4 signal using the 25 G-class EML and APD. Figure 4(a) shows the BER as a function of the number of LMS equalizer taps in the back-to-back (BTB) scenario with a ROP of −15 dBm. For comparison, BER results are measured before and after DD-FTN. As expected, the BER improves as the number of equalizer taps increases, and the two cases show essentially the same trend. Due to the severe narrowband filtering, a stable BER is reached only when the number of taps exceeds 251. With the aid of DD-FTN, the PAM-4 system performs much better than with LMS-based equalization alone. When DD-FTN is used, 161 taps are sufficient to reach a BER below 3.8E-3 (the 7% hard-decision forward error correction (HD-FEC) limit) at a ROP of −15 dBm. Thus, the number of taps is fixed to 161 for the rest of the paper unless otherwise noted. The computational complexity of an equalizer with such a large tap length can be effectively reduced by using frequency domain equalization (FDE); the detailed complexity analysis is given in Sec. 4. Regarding DD-FTN, the post filter coefficient $\alpha$ is important because it defines the filter's shape and bandwidth. The optimum value of $\alpha$ is investigated for the 112 Gbit/s PAM-4 system at BTB and after 20/40 km transmission, as shown in Fig. 4(b), again with a ROP of −15 dBm. The optimal performance is obtained at $\alpha = 0.5$ for all transmission scenarios, and the optimum is insensitive to the transmission length. In the following, $\alpha$ is fixed at 0.5.
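A minimal sketch of the DD-FTN stage is given below. It applies the post filter $H(z)=1+\alpha z^{-1}$ with $\alpha = 0.5$ to the equalizer output and then runs a Viterbi MLSE that assumes the post-filtered channel is exactly $1+\alpha z^{-1}$. Under that simplifying assumption the trellis needs only $M = 4$ states (memory 1), whereas the experiment uses a memory length of 3 with 64 states; the squared-Euclidean branch metric and the zero initial state are also assumptions of this sketch.

```python
import random

LEVELS = (-3.0, -1.0, 1.0, 3.0)

def post_filter(y, alpha=0.5):
    """Digital post filter H(z) = 1 + alpha z^-1 (the sample before y[0] is taken as 0)."""
    out, prev = [], 0.0
    for v in y:
        out.append(v + alpha * prev)
        prev = v
    return out

def mlse(z, alpha=0.5):
    """Viterbi MLSE for the partial-response channel 1 + alpha z^-1.

    State = previous symbol (M = 4 states); branch metric = squared distance
    between the observation and the noiseless channel output.
    """
    M = len(LEVELS)
    cost = [(z[0] - s) ** 2 for s in LEVELS]          # previous symbol assumed 0
    back = []                                         # survivor pointers per step
    for v in z[1:]:
        new_cost, ptr = [], []
        for s in LEVELS:
            metrics = [cost[p] + (v - s - alpha * LEVELS[p]) ** 2 for p in range(M)]
            best = min(range(M), key=metrics.__getitem__)
            new_cost.append(metrics[best])
            ptr.append(best)
        cost = new_cost
        back.append(ptr)
    state = min(range(M), key=cost.__getitem__)       # best final state
    path = [state]
    for ptr in reversed(back):                        # trace back the survivor path
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [LEVELS[i] for i in path]

if __name__ == "__main__":
    rng = random.Random(0)
    d = [rng.choice(LEVELS) for _ in range(1000)]
    y = [v + rng.gauss(0.0, 0.2) for v in d]          # equalizer output with noise
    print(sum(a != b for a, b in zip(mlse(post_filter(y)), d)), "symbol errors")
```

The post filter deliberately introduces a known ISI term; the MLSE then resolves it jointly over the sequence instead of slicing each sample independently.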

Figure 5 presents the BER as a function of the ROP at BTB and after 20 and 40 km SSMF transmission. Significant performance improvements are achieved when the DD-FTN algorithm is employed. Using the optimized parameters, the receiver sensitivities at a BER of 3.8E-3 are −15 dBm, −15.5 dBm and −16 dBm for the BTB, 20 km and 40 km scenarios, respectively. Since the EML-induced chirp is counteracted by the fiber chromatic dispersion [17], the receiver sensitivities after 20 km and 40 km transmission are better than in the BTB case. After 40 km amplifier-less SSMF transmission, the 112 Gbit/s PAM-4 system has a 2 dB power margin.

#### 3.2 Transmission performance with ETC + DD-FTN

For high-speed PAM-4 signals, the linearity and bandwidth limitations of the driver amplifier, modulator and detector can cause pattern-dependent distortion, i.e., nonlinear inter-symbol interference with memory. A pattern-dependent look-up table approach can mitigate this impairment [18–21], and it is therefore employed in the 112 Gbit/s PAM-4 system to further improve performance. To obtain the error table simply, a JEEG module is proposed, as shown in Fig. 6.

In the joint module, $d(n)$ denotes the reference pilot symbol. The output error $\varepsilon(n)$ of the LMS error function is used not only to update the tap weights $[w_0, w_1, w_2, \ldots, w_n]$ of the FIR filter, but also to generate a pattern-dependent error table for ETC. After the equalizer has converged, the output error $\varepsilon(n)$ is accumulated in the error table $ET$. The storage address $i$ of the $n$th symbol corresponds to the symbol pattern $d(n-k), \ldots, d(n), \ldots, d(n+k)$ of length $2k+1$, so each pattern combination uniquely corresponds to one index $i$ of $ET$. Therefore, the memory size of the error table is determined by the modulation level $M$ ($M = 4$ for PAM-4 signals) and the pattern length $2k+1$, and equals $M^{2k+1}$. In addition, a counting table $CT$ records the number of occurrences of each pattern combination $i$. Finally, the noise on the pattern-dependent error is averaged out as $\overline{ET}(i)=ET(i)/CT(i)$.
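The table bookkeeping of the JEEG module can be sketched as follows: the index $i$ is formed by reading the $2k+1$ reference symbols as a base-$M$ number, the error is accumulated in $ET$, occurrences are counted in $CT$, and the averaged table is $ET(i)/CT(i)$. The sketch assumes the LMS error is $\varepsilon(n) = d(n) - y(n)$, with $y(n)$ the equalizer output, and skips the first and last $k$ symbols whose patterns are incomplete. The function names and the toy distortion in `toy_data` are hypothetical.

```python
import random

LEVELS = (-3, -1, 1, 3)                         # PAM-4 levels, M = 4
LEVEL_INDEX = {a: m for m, a in enumerate(LEVELS)}

def pattern_index(d, n, k):
    """Base-M address i of the pattern d(n-k), ..., d(n), ..., d(n+k); needs n >= k."""
    i = 0
    for s in d[n - k:n + k + 1]:
        i = i * len(LEVELS) + LEVEL_INDEX[s]
    return i

def build_error_table(d, y, k=1):
    """Averaged error table ET(i)/CT(i) with M**(2k+1) entries.

    d : reference (pilot) symbols
    y : equalizer outputs after convergence, aligned with d
    Patterns that never occur get a zero correction.
    """
    size = len(LEVELS) ** (2 * k + 1)
    et = [0.0] * size                   # ET: accumulated error per pattern
    ct = [0] * size                     # CT: occurrence count per pattern
    for n in range(k, len(d) - k):      # only complete patterns
        i = pattern_index(d, n, k)
        et[i] += d[n] - y[n]            # assumed convention: eps(n) = d(n) - y(n)
        ct[i] += 1
    return [e / c if c else 0.0 for e, c in zip(et, ct)]

def toy_data(n_sym=20000, leak=0.1, seed=11):
    """Hypothetical data: each output leaks a fraction of the previous symbol."""
    rng = random.Random(seed)
    d = [rng.choice(LEVELS) for _ in range(n_sym)]
    y = [d[n] + leak * (d[n - 1] if n else 0) for n in range(n_sym)]
    return d, y
```

With this toy distortion the averaged entry for any pattern whose first symbol is +3 converges to −0.3, which is the pattern-dependent error the transmitter would later pre-compensate.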

Figures 7(a) and 7(b) show the DSP procedures in the transmitter and the receiver, respectively, when the pattern-dependent ETC based on the proposed JEEG module is used. In the experiment, the error table $\overline{ET}$ is generated from 26768 symbols after equalizer convergence in the BTB scenario at the maximum allowable ROP, which is −11 dBm due to the input optical power limit of the APD. For *k* = 1, a 3-symbol *ET* is created, containing 64 error correction values for the $4^{3}$ pattern combinations, as shown in Fig. 8(a). Likewise, for *k* = 2 a 5-symbol *ET* with 1024 error correction values (corresponding to $4^{5}$ pattern combinations) is created, as shown in Fig. 8(b). As *k* increases, each (2*k* + 1)-symbol pattern occurs fewer times, so less averaging of the noise in the error values is possible (with 26768 symbols, about 418 occurrences per pattern on average for *k* = 1, but only about 26 for *k* = 2). In our experiment, only amplitude correction with 3- or 5-symbol *ET*s is considered, and the generated error table is fixed and used for all scenarios with different ROPs and fiber lengths. In the transmitter DSP, the PAM-4 symbols after bit-to-symbol mapping are first pre-compensated for the pattern-dependent error using the generated error table, as shown in Fig. 7(a). The error correction value for a given PAM-4 symbol is selected from the error table using the same indexing rule for *i*. Figures 9(a), 9(b) and 9(c) show the amplitude distributions of the PAM-4 signal after bit-to-symbol mapping, after 3-symbol ETC, and after 5-symbol ETC, respectively. After pattern-dependent correction, the four levels of the PAM-4 signal are no longer symmetric and equally spaced. With ETC, the two middle levels are compressed as expected, which is consistent with the compensation rule for the modulation nonlinearity.
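Applying the table at the transmitter then costs one look-up and one addition per symbol. The sketch below assumes the same base-$M$ indexing and the error convention $\varepsilon = d - y$ used in the JEEG sketch, so the averaged error is added to the symbol. It also treats the (PRBS-derived, hence periodic) symbol block as circular, so that the first and last $k$ symbols get a full pattern; this wrap-around choice and the function name are illustrative assumptions.

```python
LEVELS = (-3, -1, 1, 3)
LEVEL_INDEX = {a: m for m, a in enumerate(LEVELS)}

def etc_precorrect(d, et_avg, k=1):
    """Pattern-dependent pre-correction x(n) = d(n) + ET_avg(i(n)).

    d      : PAM-4 symbols after bit-to-symbol mapping
    et_avg : averaged error table with M**(2k+1) entries
    The symbol block is treated as periodic (pattern indices wrap around).
    """
    M, L = len(LEVELS), len(d)
    if len(et_avg) != M ** (2 * k + 1):
        raise ValueError("error table size does not match k")
    out = []
    for n in range(L):
        i = 0
        for j in range(n - k, n + k + 1):
            i = i * M + LEVEL_INDEX[d[j % L]]       # wrap-around pattern address
        out.append(d[n] + et_avg[i])                # 1 LUT access + 1 real addition
    return out
```

For example, with a table that is zero except for −0.5 at the all-(−3) pattern and +0.25 at the all-(+3) pattern, only the symbols whose neighbours match those patterns are moved.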

After applying ETC in the transmitter, the BER versus ROP is investigated for 112 Gbit/s PAM-4 amplifier-less transmission in the BTB, 20 km and 40 km scenarios. Figures 10(a) and 10(b) show the measured BER results using 3-symbol and 5-symbol ETC, respectively. Compared with the results in Fig. 5, ETC further improves the BER performance. Without DD-FTN, 5-symbol ETC performs better than 3-symbol ETC. However, when DD-FTN is used, there is no obvious performance difference between them, since the residual pattern-dependent distortion can also be mitigated by the MLSE. Furthermore, it is worth noting that DD-FTN is still very useful for improving system performance after ETC is adopted in the transmitter DSP. With 3-symbol ETC, 161-tap LMS equalization and 64-state DD-FTN, the optimum receiver sensitivity at the 7% HD-FEC limit (BER of 3.8E-3) is improved from −16 dBm to −16.6 dBm for 112 Gbit/s PAM-4 transmission over 40 km SSMF without an optical amplifier. To the best of our knowledge, this is the best receiver sensitivity for 112 Gbit/s transmission over 40 km of amplifier-less SSMF with commercial 25 G-class electro-optical devices.

#### 3.3 Transmission performance with ETC + Pre-EQ + DD-FTN

For ER access and metro links, the downstream receiver is located close to the client side, where computational complexity and power consumption matter more than at the transmitter. An effective way to reduce receiver cost and complexity is therefore to transfer part of the signal processing from the receiver to the transmitter. For this reason, the digital inverse of the channel impulse response, i.e., the converged tap weights of the LMS-based equalizer in JEEG, is also stored at the receiver (see Fig. 11(b)) and implemented as a static FIR filter in the transmitter (see Fig. 11(a)).
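Reusing the converged receiver taps as a static transmitter FIR relies on the linearity of the channel model: if $w * h \approx \delta(n - D)$, filtering the symbols with $w$ before the channel gives the same result as filtering after it. The symbol-spaced sketch below trains hypothetical LMS taps on a toy channel, moves them to the transmitter, and checks that the received samples approximate the delayed symbols. The channel `H`, the delay, tap count and step size are assumptions; a real link adds noise, oversampling and nonlinearity, which is why the paper still keeps a short adaptive equalizer at the receiver.

```python
import random

LEVELS = (-3.0, -1.0, 1.0, 3.0)
H = (0.2, 1.0, 0.3)          # hypothetical symbol-spaced channel impulse response

def fir(x, taps):
    """Causal FIR filter; output has the same length as the input."""
    return [sum(t * x[n - j] for j, t in enumerate(taps) if n - j >= 0)
            for n in range(len(x))]

def train_lms(n_sym=20000, n_taps=15, delay=8, mu=1e-3, seed=0):
    """Converge receiver-side LMS taps w so that fir(fir(d, H), w)[n] ~ d[n - delay]."""
    rng = random.Random(seed)
    d = [rng.choice(LEVELS) for _ in range(n_sym)]
    r = fir(d, H)
    w = [0.0] * n_taps
    for n in range(n_taps, n_sym):
        x = [r[n - j] for j in range(n_taps)]           # newest sample first
        e = d[n - delay] - sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]
    return w

def pre_eq_error(w, n_sym=2000, delay=8, seed=1):
    """Use w as a static Pre-EQ at the transmitter; return the worst residual error."""
    rng = random.Random(seed)
    d = [rng.choice(LEVELS) for _ in range(n_sym)]
    received = fir(fir(d, w), H)                        # Pre-EQ first, channel second
    return max(abs(received[n] - d[n - delay])
               for n in range(len(w) + len(H), n_sym))
```

Without the Pre-EQ, the same toy channel leaves up to 1.5 of residual ISI on a ±3 symbol scale, so the small residual with the transferred taps comes entirely from moving the equalizer in front of the channel.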

Notably, since the Pre-EQ module compensates the bandwidth limitation, pre-emphasis is no longer required at the transmitter, as sketched in Fig. 11(a). The resulting change in BER performance and the potential reduction of receiver complexity are then investigated. First, the BER as a function of the number of receiver equalizer taps is measured in the BTB scenario at the maximum allowable ROP of −11 dBm, with a Pre-EQ of 101, 131, or 161 taps at the transmitter, as depicted in Fig. 12. All Pre-EQ tap weights are obtained in the BTB scenario with a ROP of −11 dBm and stored after the equalizer has converged; ETC and DD-FTN are not employed here. Figure 12 shows that, with Pre-EQ, the BER quickly reaches a steady state as the number of receiver equalizer taps increases, and for all Pre-EQ configurations a steady BER is achieved with only 31 receiver taps. However, the tap length of the Pre-EQ has a significant impact on system performance, and the best performance among the three options is obtained with 161 Pre-EQ taps. For the following tests, a fixed equalization configuration is used, consisting of a 161-tap Pre-EQ at the transmitter and a 31-tap LMS-based equalizer at the receiver.

Next, the BER as a function of ROP with this equalization configuration is measured using 3-symbol and 5-symbol ETC, as shown in Figs. 13(a) and 13(b), respectively. The error tables are created in the BTB scenario with a ROP of −11 dBm under the equalization scheme chosen above. Note that in this case the DD-FTN algorithm no longer improves system performance, because the high-frequency noise is not significantly enhanced when Pre-EQ is employed at the transmitter. On the contrary, DD-FTN may even be detrimental as an overly aggressive compensation. The best receiver sensitivity in this case is −15 dBm at a BER of 3.8E-3 after 40 km amplifier-less transmission, using 3-symbol ETC and 161-tap Pre-EQ without DD-FTN. This is a 1.6 dB sensitivity penalty relative to the best result of Sec. 3.2 (−16.6 dBm), which is attributed to the increased PAPR. To clarify this, the complementary cumulative distribution function (CCDF) of the PAPR is measured, as shown in Fig. 14. The PAPR increases when ETC with a larger memory is used, and Pre-EQ also increases the PAPR. A higher PAPR inevitably increases the quantization noise and reduces power efficiency, which degrades system performance to some extent.
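PAPR statistics like those in Fig. 14 can be estimated along the lines of the sketch below: the waveform is cut into frames, the PAPR of each frame is computed as peak power over mean power of the mean-removed signal, and the CCDF is the fraction of frames whose PAPR exceeds each threshold. The frame length, the mean removal and the helper names are assumptions; how the CCDF in the experiment was computed is not stated.

```python
import math

def papr_db(x):
    """PAPR in dB of a real-valued frame after removing its mean."""
    mean = sum(x) / len(x)
    p = [(v - mean) ** 2 for v in x]
    return 10.0 * math.log10(max(p) / (sum(p) / len(p)))

def papr_ccdf(signal, frame_len, thresholds_db):
    """P(PAPR > PAPR0): fraction of frames whose PAPR exceeds each threshold."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    paprs = [papr_db(f) for f in frames]
    return [sum(p > t for p in paprs) / len(paprs) for t in thresholds_db]
```

As a sanity check, an unshaped PAM-4 sequence with equally likely levels {−3, −1, 1, 3} has mean power 5 and peak power 9, i.e. a PAPR of $10\log_{10}(9/5) \approx 2.55$ dB; pulse shaping, ETC and Pre-EQ all push the curve to the right.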

Since most of the computational effort is transferred from the receiver to the transmitter, the ability to adapt flexibly to the channel is inevitably reduced, and more accurate pre-compensation becomes more beneficial to system performance. In Fig. 15, the BER is measured again for 112 Gb/s PAM-4 40 km amplifier-less transmission with optimized pre-compensation coefficients. Here, the 3-symbol error table and the 161 Pre-EQ tap weights are generated in the 40 km amplifier-less transmission scenario at the maximum achievable ROP of −14 dBm. These parameters are fixed and used for all ROPs. At the receiver, only a 31-tap LMS-based equalizer is used, without DD-FTN. For comparison, two reference lines are also shown in Fig. 15. Using the pre-compensation parameters generated at 40 km instead of at BTB yields a 1 dB receiver sensitivity improvement, because these parameters better match the actual channel state. The receiver sensitivity at a BER of 3.8E-3 is improved from −15 dBm to −16 dBm with 3-symbol ETC and 161-tap Pre-EQ, which is only a 0.6 dB penalty for the simplified receiver compared with the best sensitivity result. Finally, Table 1 summarizes the properties of the 112 Gbit/s PAM-4 system using a 25 G-class commercial EML and APD over 40 km of unamplified transmission with the different DSP schemes introduced in this section.

## 4. Computational complexity analysis

Complexity is one of the most important issues for short-reach optical communications. In this section, the computational complexities of the DSP algorithms described above are analyzed in detail. The required numbers of mathematical operations, including real multiplications (RMs), real additions (RAs), comparisons and look-up-table (LUT) accesses, are evaluated per symbol.

FIR filtering is a common computational module in the implementations of pre-emphasis, Pre-EQ, and LMS equalization. A static real-valued FIR filter with *N* taps requires *N* RMs and *N*−1 RAs per output sample; for a twofold-oversampled sequence, this amounts to 2*N* RMs and 2*N*−2 RAs per symbol. For the T/2-spaced LMS equalizer, tap updating, error calculation and equalization are performed once every two samples, so it requires 2*N* + 1 RMs and 2*N* RAs per adaptive iteration, i.e., per symbol. For ETC, the complexity of error table generation is ignored here, because a fixed error table is employed for all transmission scenarios; ETC then requires only 1 RA and 1 LUT access per symbol. The DD-FTN module consists of two DSP blocks. The post filter $H(z) = 1 + \alpha z^{-1}$ requires 1 RM and 1 RA per symbol. The subsequent MLSE with a 3-symbol pattern memory requires $M^{2}$ RMs, $3M^{2}$ RAs, and $M(M-1)$ comparisons for *M*-ary PAM signals [22], i.e., 16 RMs, 48 RAs and 12 comparisons for PAM-4. Table 2 summarizes the computational complexity of each DSP module.
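The per-symbol counts above can be collected into a small helper for checking module and strategy totals. The sketch only encodes the formulas stated in this section; the class, function and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Ops:
    rm: int = 0      # real multiplications
    ra: int = 0      # real additions
    cmp: int = 0     # comparisons
    lut: int = 0     # look-up-table accesses

    def total(self):
        return self.rm + self.ra + self.cmp + self.lut

    def __add__(self, other):
        return Ops(self.rm + other.rm, self.ra + other.ra,
                   self.cmp + other.cmp, self.lut + other.lut)

def static_fir_t2(n):
    """Static N-tap FIR on a twofold-oversampled sequence: 2N RMs, 2N-2 RAs per symbol."""
    return Ops(rm=2 * n, ra=2 * n - 2)

def lms_t2(n):
    """T/2-spaced LMS equalizer: 2N+1 RMs, 2N RAs per symbol."""
    return Ops(rm=2 * n + 1, ra=2 * n)

def etc():
    """Fixed-table ETC: 1 RA and 1 LUT access per symbol."""
    return Ops(ra=1, lut=1)

def dd_ftn(m=4):
    """Post filter (1 RM, 1 RA) plus MLSE with 3-symbol memory for M-ary PAM."""
    return Ops(rm=1, ra=1) + Ops(rm=m * m, ra=3 * m * m, cmp=m * (m - 1))
```

For PAM-4 this gives 78 operations per symbol for DD-FTN and 645 for a 161-tap T/2-spaced LMS equalizer, the two receiver terms that appear in the complexity comparison of the DSP strategies.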

In addition, FDE is a useful approach to reduce the complexity of FIR filters with a large number of taps. An overlap-save method with 50% overlap is used to convert circular convolution into linear convolution [23]. To obtain *N*/2 equalized symbols from a static equalizer in the frequency domain, one block requires 2*N* complex multiplications for the compensation, plus two 2*N*-point FFT/IFFT operations: one 2*N*-point FFT of the input signal and one 2*N*-point IFFT for the output [24]. The 2*N*-point FFT of the tap coefficients is not counted because the taps are fixed. Each complex multiplication requires 4 RMs and 2 RAs, and for real-valued data a 2*N*-point FFT/IFFT based on the classical radix-2 algorithm requires $2N\log_2 N$ RMs and $3N\log_2 N$ RAs. One block therefore costs $8N + 4N\log_2 N$ RMs and $4N + 6N\log_2 N$ RAs, and dividing by the *N*/2 symbols per block gives a total computational complexity for a static FIR filter implemented in the frequency domain of $16 + 8\log_2 N$ RMs and $8 + 12\log_2 N$ RAs per symbol.
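The static FDE count and its comparison with the time-domain count ($2N$ RMs plus $2N-2$ RAs per symbol) can be sketched directly from the formulas in this section. Under these formulas the first tap length at which FDE needs fewer total operations is $N = 32$ (at $N = 31$ the time domain still needs 122 versus about 123 operations), in line with the threshold of about 30 taps read from Fig. 16. The function names are hypothetical.

```python
import math

def fde_static_ops(n):
    """(RMs, RAs) per symbol for a static N-tap filter via 50%-overlap-save FDE."""
    per_block_rm = 2 * n * 4 + 2 * (2 * n * math.log2(n))   # 2N complex mults + FFT + IFFT
    per_block_ra = 2 * n * 2 + 2 * (3 * n * math.log2(n))
    symbols_per_block = n / 2
    return per_block_rm / symbols_per_block, per_block_ra / symbols_per_block

def td_static_ops(n):
    """(RMs, RAs) per symbol for the same filter in the time domain at 2 samples/symbol."""
    return 2 * n, 2 * n - 2

def crossover(max_taps=512):
    """Smallest tap length for which FDE needs fewer total operations than the time domain."""
    for n in range(2, max_taps + 1):
        if sum(fde_static_ops(n)) < sum(td_static_ops(n)):
            return n
    return None
```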

When the standard LMS algorithm is implemented in the frequency domain to update the tap weights, each block additionally requires one 2*N*-point FFT, *N* RMs and *N*−1 RAs to construct the gradient, and 4*N* RMs and 4*N* RAs to update the tap weights. The per-symbol complexity of the adaptive LMS-based FDE is obtained by adding these per-block costs to those of the static FDE and dividing by the *N*/2 symbols per block.

Figure 16 depicts the total number of computational operations per symbol as a function of the tap length for the different equalization approaches. Implementing a fixed FIR filter in the frequency domain becomes more efficient when the number of taps exceeds about 30. For adaptive LMS equalization, the crossover point is at a tap length of about 70. Therefore, when the number of taps exceeds these values, FDE can be employed to reduce the computational complexity.

Table 3 counts the total number of computational operations per symbol for the DSP strategies of Table 1. With the DSP strategy of Sec. 3.2, achieving the best receiver sensitivity requires 847 operations in total, which can be reduced to 543 by implementing the 161-tap LMS equalizer in the frequency domain, i.e., FDE reduces the total complexity by about 36%. For the preferred DSP strategy of Sec. 3.3, namely employing ETC and Pre-EQ at the transmitter without DD-FTN at the receiver, the receiver computational complexity is reduced by 83% ($1 - 125/(645 + 78)$), and the total complexity of the whole transceiver also decreases slightly (from 847 to 769). In addition, if the Pre-EQ is implemented in the frequency domain, the total number of computational operations can be reduced to 297.
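The time-domain totals quoted above can be reproduced from the per-module counts of this section. The module breakdown used below (Sec. 3.2: ETC, 31-tap pre-emphasis, 161-tap LMS and DD-FTN; Sec. 3.3: ETC, 161-tap Pre-EQ and 31-tap LMS) is inferred from the DSP descriptions in Sec. 3 rather than copied from Table 1, and the FDE-based LMS figure of 543 is not reproduced because its formula is not given explicitly. Dividing by 2 bits per PAM-4 symbol gives the operations-per-bit figures used in the power discussion.

```python
import math

def fir_td(n):      # static N-tap FIR at 2 samples/symbol: 2N RMs + (2N - 2) RAs
    return 2 * n + (2 * n - 2)

def lms_td(n):      # T/2-spaced LMS equalizer: (2N + 1) RMs + 2N RAs
    return (2 * n + 1) + 2 * n

def fir_fde(n):     # static FIR via overlap-save FDE: (16 + 8 log2 N) + (8 + 12 log2 N)
    return (16 + 8 * math.log2(n)) + (8 + 12 * math.log2(n))

ETC, DD_FTN = 2, 78  # per-symbol operations of ETC and DD-FTN (PAM-4)

# Sec. 3.2: ETC + 31-tap pre-emphasis (Tx); 161-tap LMS + DD-FTN (Rx)
sec32 = ETC + fir_td(31) + lms_td(161) + DD_FTN
# Sec. 3.3: ETC + 161-tap Pre-EQ (Tx); 31-tap LMS (Rx)
sec33 = ETC + fir_td(161) + lms_td(31)
# Sec. 3.3 with the Pre-EQ implemented in the frequency domain
sec33_fde = ETC + fir_fde(161) + lms_td(31)

rx_reduction = 1 - lms_td(31) / (lms_td(161) + DD_FTN)
bits_per_symbol = 2  # PAM-4

print(sec32, sec33, round(sec33_fde, 1), f"{rx_reduction:.0%}")
print(sec33_fde / bits_per_symbol, sec32 / bits_per_symbol)
```

Because $\log_2 161$ is not an integer, the FDE variant evaluates to roughly 297.6 operations per symbol, which matches the 297 quoted above up to rounding; the per-bit values come out at about 149 and 424.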

Power consumption and latency are important factors for short-reach optical transmission systems. In [25], the energy consumption was analyzed in detail in terms of the number of digital operations performed on each bit of input data; the power varies only slightly as long as the number of digital operations per bit is within 1000. For the three DSP strategies above, the required number of digital operations per bit ranges from 149 to 424, so their energy consumption will differ little. In terms of latency, FDE with FFT sizes between 64 and 256 has a latency of 1 to 10 ns depending on the baud rate [26], whereas the latency of HD-FEC for 100 G links exceeds 100 ns [27]. Therefore, the DSP latency is still dominated by FEC. Moreover, with the exponential increase in the capability of complementary metal-oxide-semiconductor (CMOS) integrated circuits, such powerful DSP will become increasingly easy to implement.

## 5. Conclusion

In this paper, we have experimentally studied the performance of single-lane 112 Gbit/s PAM-4 amplifier-less transmission in the O-band with a commercial 25 G-class EML and APD using different DSP strategies, including ETC, Pre-EQ, LMS-based equalization and DD-FTN. All the pre-compensation parameters were created by the proposed JEEG module. With 3-symbol ETC, 161-tap LMS and 64-state DD-FTN, the best receiver sensitivity at a BER of 3.8E-3 reaches −16.6 dBm after transmission over a 40 km amplifier-less link. To the best of our knowledge, this is a new record receiver sensitivity for 112 Gbit/s amplifier-less transmission in the O-band over 40 km SSMF. Furthermore, we have analyzed the computational complexity of the different DSP modules in detail. Notably, employing ETC and Pre-EQ in the transmitter can effectively reduce the computational complexity of the receiver at the cost of a small receiver sensitivity penalty.

## Funding

National Natural Science Foundation of China (61671053, 61435006, 61871030); Fundamental Research Funds for the Central Universities (FRF-BD-17-015A); Foundation of Beijing Engineering and Technology Center for Convergence Networks and Ubiquitous Services; State Key Laboratory of Advanced Optical Communication Systems Networks, China; Hong Kong Polytechnic University (1-ZVGB, G-SB65, and 4-BCCK); RGC of Hong Kong SAR government (152248/15E).

## References

**1.** K. Zhong, X. Zhou, J. Huo, C. Yu, C. Lu, and A. P. T. Lau, “Digital signal processing for short-reach optical communications: A review of current technologies and future trends,” J. Lightwave Technol. **36**(2), 377–400 (2018).

**2.** J. Wei, Q. Cheng, R. V. Penty, I. H. White, and D. G. Cunningham, “400 Gigabit Ethernet using advanced modulation formats: performance, complexity, and power dissipation,” Commun. Mag. **53**(2), 182–189 (2015).

**3.** E. El-Fiky, M. Chagnon, M. Sowailem, A. Samani, M. Morsy-Osman, and D. V. Plant, “168-Gb/s single carrier PAM4 transmission for intra-data center optical interconnects,” Photon. Technol. Lett. **29**(3), 314–317 (2017).

**4.** K. Zhong, X. Zhou, Y. Gao, W. Chen, J. Man, L. Zeng, A. P. T. Lau, and C. Lu, “140Gb/s 20km transmission of PAM-4 signal at 1.3um for short reach communications,” Photon. Technol. Lett. **27**(16), 1757–1761 (2015).

**5.** W. Yan, T. Tanaka, B. Liu, M. Nishihara, L. Li, T. Takahara, Z. Tao, J. C. Rasmussen, and T. Drenski, “100 Gb/s optical IM-DD transmission with 10G-class devices enabled by 65 Gsamples/s CMOS DAC core,” in Optical Fiber Communication Conference, 2013, paper OM3H.1.

**6.** K. Zhong, X. Zhou, T. Gui, L. Tao, Y. Gao, W. Chen, J. Man, L. Zeng, A. P. T. Lau, and C. Lu, “Experimental study of PAM-4, CAP-16, and DMT for 100 Gb/s short reach optical transmission systems,” Opt. Express **23**(2), 1176–1189 (2015).

**7.** M. I. Olmedo, T. Zuo, J. B. Jensen, Q. Zhong, X. Xu, S. Popov, and I. T. Monroy, “Multiband carrier-less amplitude phase modulation for high capacity optical data links,” J. Lightwave Technol. **32**(4), 798–804 (2014).

**8.** J. Shi, J. Zhang, Y. Zhou, Y. Wang, N. Chi, and J. Yu, “Transmission performance comparison for 100-Gb/s PAM-4, CAP-16, and DFT-S OFDM with direct detection,” J. Lightwave Technol. **35**(23), 5127–5133 (2017).

**9.** S. Fan, Q. Zhuge, M. Sowailem, M. Osman, T. Hoang, F. Zhang, M. Qiu, Y. Li, J. Wu, and D. V. Plant, “Twin-SSB direct detection transmission over 80 km SSMF using Kramers-Kronig receiver,” in European Conference of Optical Communication, 2017, Paper W.2.D.

**10.** X. Liu and F. Effenberger, “Emerging optical access network technologies for 5G wireless,” J. Opt. Commun. Netw. **8**(12), B70–B79 (2016).

**11.** W. Wang, P. Zhao, Z. Zhang, H. Li, D. Zang, N. Zhu, and Y. Lu, “First demonstration of 112 Gb/s PAM-4 amplifier-free transmission over a record reach of 40 km using 1.3 μm directly modulated laser,” in Optical Fiber Communication Conference, 2018, Paper Th4B.8.

**12.** M. Nada, Y. Muramoto, H. Yokoyama, T. Ishibashi, and S. Kodama, “High-sensitivity 25 Gbit/s avalanche photodiode receiver optical sub-assembly for 40 km transmission,” Electron. Lett. **48**(13), 777–778 (2012).

**13.** M. Nada, T. Yoshimatsu, Y. Muramoto, T. Ohno, F. Nakajima, and H. Matsuzaki, “106-Gbit/s PAM4 40-km transmission using an avalanche photodiode with 42-GHz bandwidth,” in Optical Fiber Communication Conference, 2018, Paper W4D.2.

**14.** K. Zhong, X. Zhou, J. Huo, H. Zhang, J. Yuan, Y. Yang, C. Yu, A. P. T. Lau, and C. Lu, “Amplifier-less transmission of single channel 112 Gb/s PAM4 signal over 40km using 25G EML and APD at O band,” in European Conference of Optical Communication, 2017, Paper P2.SC6.21.

**15.** X. Li and J. L. J. Cimini, “Effects of clipping and filtering on the performance of OFDM,” in Vehicular Technology Conference, IEEE 47th. 1997, vol. 3, 1634–1638.

**16.** M. Oerder and H. Meyr, “Digital filter and square timing recovery,” Trans. Commun. **36**(5), 605–612 (1988).

**17.** R. A. Salvatore, R. T. Sahara, M. A. Bock, and I. Libenzon, “Electroabsorption modulated laser for long transmission spans,” J. Quantum Electron. **38**(5), 464–476 (2002).

**18.** A. Rezania and J. C. Cartledge, “Transmission performance of 448 Gb/s single-carrier and 1.2 Tb/s three-carrier superchannel using dual-polarization 16-QAM with fixed LUT based MAP detection,” J. Lightwave Technol. **33**(23), 4738–4745 (2015).

**19.** J. H. Ke, Y. Gao, and J. C. Cartledge, “400 Gbit/s single-carrier and 1 Tbit/s three-carrier superchannel signals using dual polarization 16-QAM with look-up table correction and optical pulse shaping,” Opt. Express **22**(1), 71–83 (2014).

**20.** P. Gou, L. Zhao, K. Wang, W. Zhou, and J. Yu, “Nonlinear look-up table predistortion and chromatic dispersion precompensation for IM/DD PAM-4 Transmission,” Photonics J. **9**(5), 1–7 (2017).

**21.** C. Chen, X. Tang, and Z. Zhang, “Transmission of 56-Gb/s PAM-4 over 26-km single mode fiber using maximum likelihood sequence estimation,” in Optical Fiber Communication Conference, 2015, Paper Th4A–5.

**22.** J. Li, E. Tipsuwannakul, T. Eriksson, M. Karlsson, and P. A. Andrekson, “Approaching Nyquist limit in WDM systems by low complexity receiver-side duobinary shaping,” J. Lightwave Technol. **30**(11), 1664–1676 (2012).

**23.** J. J. Shynk, “Frequency-domain and multirate adaptive filtering,” IEEE Signal Process. Mag. **9**(1), 14–37 (1992).

**24.** N. Benvenuto and G. Cherubini, *Algorithms for Communications Systems and Their Applications* (John Wiley & Sons, 2002), Chap. 3.

**25.** R. S. Tucker and K. Hinton, “Energy consumption and energy density in optical and electronic signal processing,” Photonics J. **3**(5), 821–833 (2011).

**26.** M. Sharif, J. K. Perin, and J. M. Kahn, “Modulation schemes for single-laser 100 Gb/s links: single-carrier,” J. Lightwave Technol. **33**(20), 4268–4277 (2015).

**27.** S. Bates, M. Gustlin, and J. Slavick, “FEC options,” IEEE P802.3bj, Newport Beach, 2011.