
Adaptive parallel decision deep neural network for high-speed equalization

Open Access

Abstract

Equalization plays a pivotal role in modern high-speed optical and wire-line transmission. Taking advantage of the digital signal processing architecture, the deep neural network (DNN) is introduced to realize feedback-free signaling, which has no processing speed ceiling imposed by the timing constraint on a feedback path. To save the hardware resources of a DNN equalizer, a parallel decision DNN is proposed in this paper. By replacing the soft-max decision layer with a hard decision layer, multiple symbols can be processed within one neural network. The neuron increment during parallelization is only linear in the layer count, rather than in the neuron count as in the case of duplication. Simulation results show that the optimized architecture is competitive with a traditional 2-tap decision feedback equalizer combined with a 15-tap feed-forward equalizer for a 28GBd, or even 56GBd, four-level pulse amplitude modulation signal over a channel with 30dB loss. Moreover, the training convergence of the proposed equalizer is much faster than that of its traditional counterpart. An adaptive mechanism that updates the network parameters based on forward error correction is also studied.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The explosive growth of Internet Protocol (IP) traffic is driving data centers (DC) into the so-called "Zettabyte (ZB) Era": the Cisco report [1] expects annual global IP traffic to exceed 4.8ZB by 2022. This pushes the bandwidth of optical data center interconnect (DCI) to rapidly upgrade towards 800Gb/s, to ensure sufficient connectivity between servers and improve resource utilization [2]. Generally, there are three approaches to enhance the bandwidth: raising the symbol rate, adopting higher-order modulation, and increasing parallelism. The first two both require stronger equalization to recover signal integrity, as a higher symbol rate leads to stronger inter-symbol interference (ISI), and advanced modulation such as multi-level pulse amplitude modulation (PAM) compresses the eye height.

The nonlinear decision feedback equalizer (DFE), which enabled transmission beyond 14Gb/s, plays a pivotal role in traditional signaling systems. However, besides misidentification-induced burst errors, the DFE has another intrinsic defect caused by its feedback mechanism. As each decision relies on the preceding decision, the feedback timing must be confined to a single unit interval (UI), the reciprocal of the symbol rate. This timing constraint limits the applicability of the DFE in future high-speed scenarios. To meet the feedback timing in the 25GBd era, the DFE tap count has already been reduced to the minimum at the cost of redundant equalization capability. Circuit design tricks such as unrolling the preceding tap can relax the timing [3], but the prediction logic is doubled for non-return-to-zero (NRZ) signals and even quadrupled for four-level pulse amplitude modulation (PAM-4) signals. A reduced-state sequence detector (RSSD) has also been developed to break the feedback chain [4,5]. Tackling the challenge from another direction, a novel low-power implementation of the feed-forward equalizer targeted at wire-line receivers has been presented for high-speed serializer/deserializer (SerDes) applications [6].

Meanwhile, SerDes designs are migrating to the ADC (analog-to-digital converter)-DSP (digital signal processor) architecture, which performs equalization in digital rather than analogue circuits. The introduction of the ADC-DSP architecture enables more complex and efficient equalization algorithms, such as the deep neural network (DNN). DNN-based channel equalization was already studied in the 1990s [7,8]. In recent years, inspired by advances in machine learning (ML), DNN-based nonlinear equalization has found applications in long reach interconnects [9,10], high-frequency optical fibre communication systems [11–17], and visible light communication systems [18], where stronger equalization capability is required.

The operating mechanisms of mainstream ML-based equalizers can be divided into nonlinear classification and regression. Several nonlinear equalization methods, such as Volterra [13,15], MLSE [16], the Gaussian kernel-aided deep neural network (GK-DNN) [14,18], and the functional-link neural network (FLNN) [17], have also been employed. These applications, mainly in long-haul communication, pay less attention to power consumption and hardware resources. However, power consumption is also of significant importance in DC systems, and future DCIs will be expected to carry more data while consuming less power. If the existing technology is simply inherited, the power consumption at the front panel of a future switch board with 51.2Tb/s switching capacity will exceed 500W, mainly contributed by the DSP chips inside optical transceivers that equalize inter-board data communication. The energy efficiency of data transmission will have to be reduced to around 1pJ/b from several tens of pJ/b today [19]. That is quite a challenge for the data movement system, and the DSP-based multi-layer neural network is too complex and too power-hungry for resource-restricted hardware.

Some low complexity algorithms, such as a light-weight polynomial activation neural network (PANN)-based equalizer [20], have been investigated to mitigate ISI and nonlinear distortion for short-range links in DCIs. The parallel equalizer, which shares one DNN to equalize multiple symbols, is another proposal to improve hardware resource utilization [21,22]. These designs feed the parallel symbols output by the last layer directly into DNN training and inference, and adopt a soft decision method to acquire performance gain at the cost of decision logic complexity, which grows exponentially, or at least super-linearly, with the sequence length. However, parallel processing distributes the continuous symbol stream into several groups, and thus weakens the ISI correlation between symbols in each group. As a result, the performance gain of the soft decision mechanism vanishes at high parallelism when transmitting approximately random data. Parallel equalizers based on low-complexity recurrent neural networks (RNN) have been built [23,24] to exploit their effective use of time-series information. To solve the problems of gradient vanishing and gradient explosion in long sequence training, a multi-symbol output long short-term memory (LSTM) equalizer has been designed for a single-channel 212Gbps IM/DD PAM-4 system [25]. Both of these networks introduce feedback loops to enhance performance.

In this paper, a parallel DNN equalizer based on hard decision (HD) is proposed to simplify equalization processing. The design balances performance against hardware overhead. The network is trained with data generated from a pseudo-random binary sequence (PRBS); the forward error correction (FEC) algorithm is then employed to offer expected labels for further adaptation in the inference phase with real-world transmitted data, and to rectify the overestimation caused by PRBS-based training [26,27].

The remainder of the paper is organized as follows. Section 2 presents the parallel modification and the FEC-based adaptation framework. Section 3 presents simulation results, and Section 4 concludes the paper.

2. Parallelization and adaptivity

2.1 Parallel neural network

The traditional DNN equalizer is shown in Fig. 1 (top). One way to parallelize the network is to expand the output layer for multi-symbol decision. However, with the soft-max decision algorithm, the width of the output layer grows exponentially, as the number of estimation categories expands exponentially. For instance, for a parallelism degree of $n$, it needs $2^n$ output neurons in an NRZ system, or even $4^n$ in a PAM-4 system. Such an expansion of the output layer dramatically decreases estimation accuracy unless the hidden layers expand correspondingly. As a result, the hardware resource consumption may exceed even that of $n$ duplicated networks.
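As a rough illustration of this scaling, the snippet below (a minimal sketch; the helper names are ours, not the paper's) compares the output-layer width required by a soft-max decision with that of the proposed hard decision for PAM-4:

```python
# Output-layer width as a function of parallelism n (illustrative only).
def softmax_width(levels: int, n: int) -> int:
    """Soft-max decision: one output neuron per joint symbol pattern."""
    return levels ** n

def hard_decision_width(levels: int, n: int) -> int:
    """Proposed interception hard decision: one output neuron per symbol."""
    return n

for n in (1, 2, 4, 8):
    # PAM-4 (4 levels): 4, 16, 256, 65536 vs. 1, 2, 4, 8 output neurons
    print(n, softmax_width(4, n), hard_decision_width(4, n))
```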

Fig. 1. Structure of the parallel DNN equalizer (bottom) and its traditional counterpart (top). In the top figure, the equalizer processes one symbol with its $n_\mathrm {-}$ previous taps and $n_\mathrm {+}$ post taps. At the output layer (Layer $M$), the outputs $\boldsymbol {a}^M$, of which each element corresponds to one estimation category, are fed to the decision layer to determine the symbol following the soft-max principle, and generate the cost $C$ along with the expected labels $\boldsymbol {y}_{\mathrm {exp}}$. In the bottom figure, the equalizer processes $n$ sequenced symbols with $n_\mathrm {-}$ previous taps and $n_\mathrm {+}$ post taps. At the output layer (Layer $M$), the normalized outputs $\boldsymbol {a}^M$, of which each element corresponds to one decision symbol, are fed to the decision layer to determine the middle $n$ symbols following the proximity principle, and generate the cost $C$ along with the expected labels $\boldsymbol {y}_{\mathrm {exp}}$. In both scenarios, the stochastic gradient descent (SGD) algorithm is employed to train the network.


An improved parallelization is to decide the parallel symbols output by the last layer directly. As shown in Fig. 1 (bottom), to restrict the scale of the neural network, interception HD is employed for category determination instead of the soft-max decision algorithm of the traditional DNN equalizer. Each symbol is determined from one element $a^M_k$ of the DNN output $\boldsymbol {a}^M$, simply following the proximity principle. Correspondingly, at the input layer, denoted Layer 0, the inputs of the network, $\boldsymbol {a}^0=\{a^0_k\}$, are quantized values of the sampled symbols and their $n_\mathrm {-}$ pre-taps and $n_\mathrm {+}$ post-taps from the ADCs. They are 8-bit unsigned integers mapped from 7-bit signed digital values.

Specifically, for the 8-bit DNN system, since all element values lie between 0 and 255, the algorithm takes 128 as the boundary between symbols 0 and 1 for NRZ, and takes 43, 128, and 213 as the boundaries between symbols 0, 1, 2, and 3 for PAM-4. In hardware, the implementation simply intercepts the first one or two bits at the big end of the output value as the decision result for NRZ or PAM-4 modulation, respectively.
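A minimal sketch of this decision rule is given below (our helper names; note that the two-bit interception implies boundaries 64/128/192, which approximate the nearest-level boundaries 43/128/213 stated above):

```python
def hd_nrz(a: int) -> int:
    """NRZ hard decision: boundary 128, i.e. just the most significant bit."""
    return a >> 7                      # 0..127 -> 0, 128..255 -> 1

def hd_pam4_nearest(a: int) -> int:
    """PAM-4 nearest-level decision with the boundaries 43, 128, 213."""
    if a < 43:
        return 0
    if a < 128:
        return 1
    if a < 213:
        return 2
    return 3

def hd_pam4_intercept(a: int) -> int:
    """Hardware shortcut: keep the two bits at the big end of the 8-bit value."""
    return a >> 6                      # implicit boundaries 64, 128, 192
```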

The precondition for the interception HD implementation is normalization preprocessing, which is completed by the activation function. Modified from the popular rectified linear unit (ReLU), the following activation function is employed for both the output layer and the hidden layers.

$$\sigma(x)= \left\{\begin{matrix} 255, & x>255 \\ x, & 0\leq x\leq 255 \\ 0, & x<0 \end{matrix}\right.$$

This nonlinear transformation is easy to implement with an 8-bit register, and normalizes intermediate values with the utmost precision. The activation function is simple but effective, and its derivative is simply

$$\sigma'(x)= \left\{\begin{matrix} 1, & 0\leq x\leq 255 \\ 0, & \textrm{elsewhere} \end{matrix}\right.$$
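In a floating-point simulation, this activation and its derivative can be expressed compactly (a sketch with NumPy; the paper's hardware version is just an 8-bit register clamp):

```python
import numpy as np

def sigma(x: np.ndarray) -> np.ndarray:
    """Clipped-ReLU activation of Eq. (1): clamp to the 8-bit range [0, 255]."""
    return np.clip(x, 0.0, 255.0)

def sigma_prime(x: np.ndarray) -> np.ndarray:
    """Derivative of Eq. (2): 1 inside [0, 255], 0 elsewhere."""
    return ((x >= 0.0) & (x <= 255.0)).astype(float)
```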

Taking advantage of the modified decision mechanism, an $M$-layer neural network needs only $M(n-1)$ additional neurons when the parallelism degree increases to $n$. The neuron increment during parallelization is thus linear in the layer count, rather than in the neuron count as in the case of duplication.

The back-propagation algorithm [28] with the stochastic gradient descent (SGD) method is used to train the network and minimize the cost $C$, defined to measure the discrepancy between the network output $\boldsymbol {a}^M$ and the expected output $\boldsymbol {y}_{\mathrm {exp}}$.
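For concreteness, a single SGD training step of a 3-layer parallel network might look as follows (a minimal NumPy sketch reusing `sigma` and `sigma_prime` from above; the quadratic cost, the random initialization, and the float arithmetic in place of the paper's 8-bit fixed point are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_minus, n, n_plus, n_h = 5, 5, 3, 10           # a 5-5-3-10 network (see Sec. 3.4)
n_in = n_minus + n + n_plus                     # number of input taps
W1 = rng.normal(0.0, 0.1, (n_h, n_in)); b1 = np.zeros(n_h)
W2 = rng.normal(0.0, 0.1, (n, n_h));    b2 = np.zeros(n)

def sgd_step(a0, y_exp, lr=1e-4):
    """One forward/backward pass minimizing the quadratic cost C (untuned sketch)."""
    global W1, b1, W2, b2
    z1 = W1 @ a0 + b1; a1 = sigma(z1)           # hidden layer
    z2 = W2 @ a1 + b2; a2 = sigma(z2)           # output a^M: one value per symbol
    d2 = (a2 - y_exp) * sigma_prime(z2)         # output-layer error
    d1 = (W2.T @ d2) * sigma_prime(z1)          # back-propagated hidden error
    W2 = W2 - lr * np.outer(d2, a1); b2 = b2 - lr * d2
    W1 = W1 - lr * np.outer(d1, a0); b1 = b1 - lr * d1
    return 0.5 * np.sum((a2 - y_exp) ** 2)      # cost C
```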

2.2 Symbol distribution

In a practical implementation, the parallelism of a DNN equalizer depends on the width $N$ of the front-end ADCs, which is commonly a power of 2, such as 32 or 64. A single parallel neural network can hardly cover such a requirement, since balancing the complexity and the performance of the network is a matter of delicacy, as discussed in the following section. Duplicating the parallel neural network therefore remains a necessary supplementary means. Practically, $P$ copies of the parallel neural network equalizer are instantiated to equalize the deserialized input symbols from the front-end ADCs in parallel. The parallel input symbols are distributed into $P$ groups, each containing $n_\mathrm {-}$ pre-tap symbols, $n$ target symbols, and $n_\mathrm {+}$ post-tap symbols, as shown in Fig. 2. Target symbols in adjacent groups are piped from the front end head to tail. Pre-tap symbols in each group are not equalized by the present network but by the previous one, and post-tap symbols are not equalized by the present network either, but by the next one. The last $n_\mathrm {+}$ symbols piped in from the front ends cannot be processed in the present cycle, as their post-taps have not arrived yet, so they are buffered and processed in the next cycle. To process these $n_\mathrm {+}$ symbols received in the previous cycle, the $n_\mathrm {-}$ symbols before them also need to be buffered as their pre-taps. As a result, the $n_\mathrm {+}+n_\mathrm {-}$ symbols at the tail of each deserialized symbol sequence are buffered to become the first $n_\mathrm {+}+n_\mathrm {-}$ symbols processed by Network 0 in the next cycle, followed by the symbols arriving in that cycle. The choice of network parameters must then satisfy the following restriction.

$$N = nP+n_{\mathrm{+}}+n_{\mathrm{-}}$$

As mentioned previously, the first several taps of the first DNN equalizer have no valid inputs at the very beginning. This does not matter, because the training procedure takes thousands of cycles to converge, so the effect of these invalid inputs vanishes during training. Any random values, or even all zeros, can be assigned to them via the intermediate buffers at initialization.
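The group slicing implied by Eq. (3) can be sketched as follows (our indexing; the cross-cycle tail buffering described above is assumed to have already prepended the buffered symbols to the window):

```python
def slice_groups(window, n_minus, n, n_plus, P):
    """Slice a window of N = n*P + n_minus + n_plus symbols into P overlapping
    groups; group p equalizes targets window[n_minus + p*n : n_minus + (p+1)*n],
    while its pre- and post-taps are targets of the neighbouring groups."""
    assert len(window) == n * P + n_minus + n_plus    # Eq. (3)
    return [window[p * n : p * n + n_minus + n + n_plus] for p in range(P)]

# Example: 7 copies of a 5-5-3 network consume a 43-symbol window per cycle.
groups = slice_groups(list(range(5 * 7 + 5 + 3)), n_minus=5, n=5, n_plus=3, P=7)
# groups[0] == [0, ..., 12]; its targets are symbols 5..9
```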

Fig. 2. Symbol distribution of the parallel DNN equalizer. Buffers are inserted to provide pre- or post-taps to DNN equalizers at boundaries.


2.3 FEC-based adaptation

Labels are indispensable for training a DNN equalizer. As mentioned in the preceding section, both traditional label sources, the estimation result and the PRBS prediction, have their own imperfections, namely the small probability of misidentification and the overestimation, respectively. They are adequate for the coarse adjustment of network parameters during the training phase. However, a succeeding mechanism is essential to finely adjust the network parameters based on credible expectations from true random bit sequences or real-world transmission data. On the other hand, no channel is immutable, whether electrical or optical: its transmission properties are affected by ambient temperature, humidity, and physical deformation. The equalization DNN should therefore be finely and adaptively adjusted to maintain performance without interruption.

In this section, a parameter feedback architecture based on the FEC mechanism, shown in Fig. 3, is introduced to finely and adaptively adjust the DNN equalizer after the training procedure, based on received real-world data. In this architecture, the commonly used FEC algorithm is employed to generate real-time labels for unpredictable business data during the inference procedure. As the DNN equalizer has previously been coarsely trained, the bit error rate (BER) of the equalization system is low enough for normal operation of the FEC mechanism, so the FEC decoder can offer reliable labels with few errors. Based on these recovered labels, the neural network parameters are continuously adjusted.

Fig. 3. Parameter feedback architecture for adaptive training of the DNN equalizer. The data stream is pushed forward along the main data path with no loops, and only parameters are fed back to update the neural network.


Note that only the labels generated from the data, not the data itself, are fed back to the SGD algorithm to update the DNN parameters. The data stream is pushed forward along the main data path without loops, and thus no timing constraint problem is caused by this feedback.

Another point worthy of attention is the FEC decoding latency. For the Reed-Solomon (RS) FEC mechanisms in commercial Ethernet systems, such as RS(528,514) and RS(544,514), the decoding module takes 55 to 71 cycles to finish the correction. The 3-way interleaved error checking and correcting (ECC) FEC employed by the latest PCI Express Base Specification reduces the correction latency, though to no fewer than 5 cycles, at the expense of correction capability. Such a significant label-generation delay would require unacceptable memory resources to buffer the inference results at the output layer for cost computation until the corresponding labels arrive. A cluster sampling method is employed to tackle this challenge. The data stream is divided into sequential clusters. Only the first output group $\boldsymbol {a}^M$ in a cluster is buffered to wait for its own label for cost computation; the rest are all discarded after decision. The decision of the first output group is corrected by the FEC decoder to generate the reliable label, which is then fed back to compute the cost along with the original outputs waiting in the buffers. By properly choosing the cluster size, the first output group of the next cluster is generated at the exact cycle when the last feedback label arrives and the cost is computed, so the buffers are just ready to store the new output of the next cluster. Thanks to the cluster sampling method, memory for the outputs of only one cycle is sufficient, no matter how long the FEC decoder takes to generate reliable labels. The drawback is that the adjustment of the network takes more time, as a large amount of effective information is discarded. However, the negative impact is not significant, because the proposed FEC-based adaptation only provides fine adjustment of the neural network, and the parameter offset is subtle.
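The bookkeeping of the cluster sampling can be sketched as follows (a toy Python model; the callbacks `decide` and `fec_label_for` are hypothetical stand-ins for the hard decision and the FEC decoder feedback):

```python
from collections import deque

CLUSTER = 64                  # output groups per cluster, assumed >= FEC latency
pending = deque()             # sampled outputs a^M awaiting their corrected labels

def on_output_group(idx, a_M, decide, fec_label_for):
    """Process one output group; only one group per cluster is kept for training."""
    symbols = decide(a_M)                 # hard decision, always forwarded downstream
    if idx % CLUSTER == 0:                # sample the first group of each cluster
        pending.append(a_M)
    label = fec_label_for()               # corrected label of an earlier sample, or None
    if label is not None and pending:
        a_sampled = pending.popleft()     # a single-cycle buffer suffices by design
        cost = 0.5 * sum((a - y) ** 2 for a, y in zip(a_sampled, label))
        # ... feed the cost gradient to the SGD parameter update ...
    return symbols
```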

3. Evaluation

3.1 System setup

The evaluation system is set up as follows. The signal generated from a source is transformed by a channel and then sampled into the equalizer under evaluation. To evaluate the equalization performance, the BER is calculated by comparing the equalization result with the original signal at the source.

The signaling performance of different equalizers is evaluated with a PAM-4 signal source over the amplitude set $\{-1, -1/3, 1/3, 1\}$, mapped from a random 2-bit binary data sequence. PAM-4 modulation is chosen because it is the industry mainstream in the 56Gb/s and 112Gb/s era, improving the spectral efficiency of a high-speed serial link and gaining better channel performance. The channel used in the simulation is a model with $S$-parameters extracted from a channel comprising a 317mm Megtron7N PCB trace and two connectors, as characterized in Fig. 4 (left). The large fluctuation of the amplitude-frequency characteristic around 28GHz shows that the channel introduces nonlinear impairments. The transfer function of the channel is a 48th-order function fitted from its $S$-parameters. Figure 4 (right) shows the sampled data points of the PAM-4 signal over time after the channel; the signal-to-noise ratio is clearly very low.

Fig. 4. The amplitude-frequency and phase-frequency characteristics of the simulated channel (left) and the received sample distribution (right). The channel comprises a 317mm-long PCB trace and two connectors, and the sampled signal is a PAM-4 signal at 112Gbps. The solid line shows the magnitude and the dotted line the detrended phase.


A traditional $B$-tap DFE with a $K$-tap feed-forward equalizer (FFE) is evaluated as a reference. The tap weights of both the FFE and the DFE are determined by the least mean square (LMS) algorithm. The timing constraint is deliberately ignored in the evaluation, so multi-tap DFEs are also simulated, even though practical equalizers in the 56Gb/s era generally have only one DFE tap. The evaluation criterion is the BER rather than the symbol error rate (SER), because the BER is what the upper layers of a transmission system really care about. Accordingly, the PAM-4 symbol decisions are decoded into a binary bit stream before comparison with the original source. The choice of coding method for PAM-4 modulation influences the final BER for a given SER. The widely used Gray coding is employed, which maps PAM-4 symbols to binary bit pairs from the set $\{-1, -1/3, 1/3, 1\}$ to the set $\{00, 01, 11, 10\}$.
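A small sketch of the Gray decoding and BER computation used here (our helper names):

```python
import numpy as np

# Gray map: adjacent PAM-4 levels differ in exactly one bit, so a single-level
# decision error costs one bit, not two.
GRAY = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}

def ber(tx_levels, rx_levels):
    """BER after Gray-decoding PAM-4 level indices (0..3) into bit pairs."""
    tx = np.array([b for s in tx_levels for b in GRAY[s]])
    rx = np.array([b for s in rx_levels for b in GRAY[s]])
    return np.mean(tx != rx)
```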

3.2 Equalization performance

First, the DFE architecture is evaluated for reference. A 28GBd PAM-4 signal, with 56Gb/s bit rate and 14GHz Nyquist frequency, is transmitted, suffering about 15dB attenuation. Figure 5 (left) illustrates the training convergence for different FFE tap numbers $K$ and DFE tap numbers $B$. Clearly, increasing the number of FFE taps improves the equalization performance and accelerates convergence, and the DFE mechanism significantly enhances the BER performance of an FFE equalizer: the BER gap between equalizers with and without a DFE is remarkable. Contrary to our expectation, however, the DFE tap number has little influence on the BER performance; one additional DFE tap improves little upon a 15-tap FFE. The situation is completely different when the input signal quality worsens, where even one additional DFE tap brings considerable improvement. When the input symbol rate is raised to 56GBd, with about 30dB transmission attenuation, the equalizer nearly fails to converge even with one DFE tap.

Fig. 5. Training convergence of equalizers with different FFE-DFE taps for 28GBd (left) and 56GBd (right) PAM-4 signals over the channel, with about 15dB and 30dB attenuation, respectively.


As shown in Fig. 5 (right), the final BER is as high as $1/4$ even with as many as 15 FFE taps. Only the 2-tap DFE successfully completes the training and converges, with a final simulated BER of about 8E-3. For the 28GBd PAM-4 signal over the channel with about 15dB attenuation, the DNN equalization performance is significantly better than that of a DFE equalizer. The optimized BER reaches as low as 7E-4 with only one hidden layer, as shown in Fig. 6 (left), while the DFE equalizer, even with a 15-tap FFE, reaches no lower than 1E-3, as shown in Fig. 5 (left). For the 30dB-loss 56GBd PAM-4 channel, as shown in Fig. 6 (right), an optimized DNN equalizer reaches a BER level of 8E-3, which rivals a 2-tap DFE equalizer with a 15-tap FFE, as shown in Fig. 5 (right).

Fig. 6. Training convergence of different DNN equalizers for 28GBd (left) and 56GBd (right) PAM-4 signals over the channel, with about 15dB and 30dB attenuation, respectively.


On the other hand, the DNN equalizer has better training efficiency. For the traditional DFE architecture, no matter what combination of FFE and DFE taps is configured, the coefficients require no less than $200\mu s$ to converge, even for a 56GBd symbol stream. As shown in Fig. 5, no DFE equalizer completes its convergence by the end of the $180\mu s$ simulation; all curves still trend downward at the tail. The convergence of a DNN equalizer is much faster. For a compact network with parallelism $n=5$, the network coefficients converge in no more than $100\mu s$, as shown in Fig. 6 (left). When the symbol rate is raised to 56GBd, as shown in Fig. 6 (right), the convergence time even shrinks to $50\mu s$.

3.3 Adaptation performance

To evaluate the adaptation performance of the DNN equalizer, the channel characteristics are slightly modified after the network has been well trained and switched to the adaptive FEC-tuning mode. As shown in Fig. 7, the BER of the transmission system jumps to a higher level at $60\mu s$ and $120\mu s$ because of channel disturbances. The first disturbance at $60\mu s$ is quite weak: the BER only slightly jumps to the 2E-2 level and quickly drops below 1E-2. The second disturbance, however, produces a plateau on the converging curve, and the BER drops much more slowly. Analysis reveals that the plateau appears because the raw (pre-FEC) BER from the equalizer exceeds the correction threshold of the FEC mechanism. In this condition, the decoder occasionally fails to correct some input FEC codewords and generates unreliable labels. All these unreliable labels, marked by the FEC decoder, are discarded. As a result, the adaptation slows down due to the interruption of the adaptation procedure caused by the lost labels. Furthermore, the higher the raw BER, the slower the adaptation curve drops, since the uncorrected codeword rate grows with the raw BER.

Fig. 7. The adaptive BER convergence under channel disturbances. The DNN equalizer quickly adapts to the disturbed channel after the disturbance at $60\mu s$. After the severe disturbance at $120\mu s$, the adaptation efficiency is reduced.


3.4 Network optimization

The first consideration in designing a DNN architecture is how many hidden layers are needed. The number of hidden layers is optimized for parallelism 5 with 5 pre-symbols and 5 post-symbols. The BER of the DNN equalizer without a hidden layer is only 3.2E-2; one 10-neuron hidden layer improves the performance to 8.6E-3, while the effect of an additional 10-neuron hidden layer is less remarkable, with the BER only slightly dropping to 7.9E-3. Adding further hidden layers may contribute no positive effect: a DNN with three 10-neuron hidden layers only provides a BER of 3.0E-1, even worse than the equalizer with no hidden layer. The reason why more hidden layers degrade the DNN performance is probably as follows: more neurons expand the optimization space, so the convergence of the algorithm depends more strongly on the initial point, and the algorithm may fail to converge to the optimal point if the initial point is not properly set.

On the other hand, implementing more hidden layers requires more hardware resources, so from a hardware-implementation standpoint the better choice is fewer hidden layers. In conclusion, for the equalization application scenario, one hidden layer is sufficient.

Besides the number of neurons in the hidden layer, $n_{\mathrm {h}}$, three parameters determine how many neurons the input and output layers consist of: the parallelism $n$, and the numbers of pre-taps and post-taps, $n_\mathrm {-}$ and $n_\mathrm {+}$. In the following, the expression '$n_\mathrm {-}$-$n$-$n_\mathrm {+}$-$n_{\mathrm {h}}$ network' denotes a specific 3-layer DNN. As mentioned above, a compact network with fewer neurons converges better than a complex one, so more neurons per layer are not always better.

The parallelism $n$ determines the size of a DNN equalizer. The evaluation results show that once the network scale is sufficient, expanding it proportionally yields essentially the same BER performance as the original lightweight network. For instance, a 6-10-6-20 network has the same 1.1E-2 BER as a 3-5-3-10 network. In other words, once the DNN has sufficient scale, copying and expanding have the same effect. Though the two options require the same number of neurons, expanding a network makes the interconnects between layers much more complex and thus needs more multiply-accumulate operators, as illustrated below. In conclusion, copying a primary network of sufficient scale is a better choice than expanding it.
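A back-of-the-envelope count of multiply-accumulate (MAC) operations supports this (a sketch counting only the two weight matrices of a 3-layer network):

```python
def macs(n_minus, n, n_plus, n_h):
    """MACs per cycle of an n_- - n - n_+ - n_h network: input->hidden + hidden->output."""
    n_in = n_minus + n + n_plus
    return n_in * n_h + n_h * n

print(2 * macs(3, 5, 3, 10))   # two copies of a 3-5-3-10 network: 2 * 160 = 320 MACs
print(macs(6, 10, 6, 20))      # one expanded 6-10-6-20 network:   640 MACs
```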

During transmission, each symbol has a diminishing effect on the several symbols following it. As a result, each received symbol is affected by its previous symbols, and the following received symbols also contain information about the current symbol. This is why pre- and post-taps are introduced into the system. Both parameters, $n_\mathrm {-}$ and $n_\mathrm {+}$, are determined by the reach of this inter-symbol interference. The evaluation results reveal that in the 56GBd PAM-4 experiment, 5 pre-taps and 3 post-taps are sufficient for a DNN with parallelism $n=5$: the BER of such a 5-5-3-10 network reaches 8.5E-3. More pre- or post-taps may enhance the BER performance, but the improvement is marginal.

The DNN performance increases as the hidden layer gets more neurons, as shown in Fig. 6 (right), but the performance growth slows down. As a rule of thumb, setting $n_{\mathrm {h}}$ to double $n$ is a good compromise between BER performance and hardware resources.

4. Conclusion

In this paper, taking advantage of HD, a parallel deep neural network architecture is proposed to cut down hardware resources. For both 28GBd and 56GBd PAM-4 long reach transmission, an optimized 3-layer DNN equalizer provides a level of BER performance that rivals a traditional 2-tap DFE with a 15-tap FFE, with a faster training process. Assisted by the FEC mechanism, the DNN equalizer is finely tuned and further adapted to channel disturbances using real-world transmission data. The evaluation results show great implementation potential of the DNN architecture for equalization applications.

Funding

National Key Research and Development Program of China (2021YFB2206600).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Cisco, “Cisco visual networking index: Forecast and trends, 2017-2022,” White Paper (2018).

2. A. Singh, J. Ong, A. Agarwal, G. Anderson, A. Armistead, R. Bannon, S. Boving, G. Desai, B. Felderman, P. Germano, A. Kanagala, H. Liu, J. Provost, J. Simmons, E. Tanda, J. Wanderer, U. Hölzle, S. Stuart, and A. Vahdat, “Jupiter rising: a decade of clos topologies and centralized control in google’s datacenter network,” Commun. ACM 59(9), 88–97 (2016). [CrossRef]  

3. H. Yueksel, G. Cherubini, R. D. Cideciyan, A. Burg, and T. Toifl, “Design considerations on sliding-block viterbi detectors for high-speed data transmission,” in 10th International Conference on Signal Processing and Communication Systems, T. A. Wysocki and B. J. Wysocki, eds. (IEEE, 2016), pp. 1–6.

4. H. Yueksel, M. Braendli, A. Burg, G. Cherubini, R. D. Cideciyan, P. A. Francese, S. Furrer, M. A. Kossel, L. Kull, D. Luu, C. Menolfi, T. Morf, and T. Toifl, “A 4.1 pj/b 25.6 gb/s 4-pam reduced-state sliding-block viterbi detector in 14 nm CMOS,” in 42nd European Solid-State Circuits Conference (IEEE, 2016), pp. 309–312.

5. M. V. Eyuboglu and S. U. H. Qureshi, “Reduced-state sequence estimation with set partitioning and decision feedback,” IEEE Trans. Commun. 36(1), 13–20 (1988). [CrossRef]  

6. R. L. Munagala and U. K. Vijay, “A novel 3-tap adaptive feed forward equalizer for high speed wireline receivers,” in International Symposium on Circuits and Systems (IEEE, 2017), pp. 1–4.

7. G. Kechriotis, E. Zervas, and E. S. Manolakos, “Using recurrent neural networks for adaptive communication channel equalization,” IEEE Trans. Neural Netw. 5(2), 267–278 (1994). [CrossRef]  

8. G. Kechriotis, E. Zervas, and E. S. Manolakos, "Using recurrent neural networks for adaptive communication channel equalization," IEEE Trans. Neural Netw. 5(2), 267–278 (1994).

9. J. He, J.-W. Li, and A.-y. Yang, "High speed channel modeling based on machine learning," Computer Engineering & Science 43, 984–988 (2021).

10. X.-t. Guo, Y.-w. Lei, and Y. Guo, “Design and implementation of dual-channel serial rapidio for multiple transmission modes,” Computer Engineering & Science 41, 233–239 (2019).

11. W. Zhiquan, L. Jianqiang, S. Liang, L. Ming, L. Xiang, F. Songnian, and X. Kun, “Nonlinear equalization based on pruned artificial neural networks for 112-gb/s ssb-pam4 transmission over 80-km ssmf,” Opt. Express 26(8), 10631–10642 (2018). [CrossRef]  

12. F. N. Khan, Q. Fan, C. Lu, and A. Lau, “An optical communication’s perspective on machine learning and its applications,” J. Lightwave Technol. 37(2), 493–516 (2019). [CrossRef]  

13. L. Zhang, X. Hong, X. Pang, O. Ozolins, and J. Chen, “Nonlinearity-aware 200 gbit/s dmt transmission for c-band short-reach optical interconnects with a single packaged electro-absorption modulated laser,” Opt. Lett. 43(2), 182–185 (2018). [CrossRef]  

14. L. Zhang, X. Pang, A. Udalcovs, O. Ozolins, and J. Chen, “Kernel mapping for mitigating nonlinear impairments in optical short-reach communications,” Opt. Express 27(21), 29567–29580 (2019). [CrossRef]  

15. H. Yamazaki, M. Nagatani, H. Wakita, Y. Ogiso, and Y. Miyamoto, “Transmission of 400-gbps discrete multi-tone signal using >100-ghz-bandwidth analog multiplexer and inp mach-zehnder modulator,” in European Conference on Optical Communication (2018).

16. N. Stojanovic, C. Prodaniuc, L. Zhang, and J. Wei, “210/225 gbit/s pam-6 transmission with ber below kp4-fec/efec and at least 14 db link budget,” in European Conference on Optical Communication (2018).

17. J. Zhang, P. Lei, S. Hu, M. Zhu, Z. Yu, B. Xu, and K. Qiu, “Functional-link neural network for nonlinear equalizer in coherent optical fiber communications,” IEEE Access 7, 149900–149907 (2019). [CrossRef]  

18. N. Chi, Y. Zhao, M. Shi, P. Zou, and X. Lu, “Gaussian kernel-aided deep neural network equalizer utilized in underwater pam8 visible light communication system,” Opt. Express 26(20), 26700–26712 (2018). [CrossRef]  

19. S. Rumley, M. Bahadori, R. P. Polster, S. D. Hammond, D. M. Calhoun, K. Wen, A. Rodrigues, and K. Bergman, “Optical interconnects for extreme scale computing systems,” Parallel Comput. 64, 65–80 (2017). [CrossRef]  

20. J. Zhou, H. Qian, X. Lu, Z. Duan, H. Huang, and Z. Shao, “Polynomial activation neural networks: Modeling, stability analysis and coverage bp-training,” Neurocomputing 359, 227–240 (2019). [CrossRef]  

21. N. Kaneda, Z. Zhu, C.-Y. Chuang, A. Mahadevan, B. Farah, K. Bergman, D. Van Veen, and V. Houtsma, “Fpga implementation of deep neural network based equalizers for high-speed pon,” in Optical Fiber Communications Conference and Exhibition (2020), pp. 1–3.

22. Z. Xu, C. Sun, S. Dong, J. H. Manton, and W. Shieh, “Towards low computational complexity for neural network-based equalization in pam4 short-reach direct detection systems by multi-symbol prediction,” in Optical Fiber Communications Conference and Exhibition (2021), pp. 1–3.

23. X. Huang, D. Zhang, X. Hu, C. Ye, and K. Zhang, “Low-complexity recurrent neural network based equalizer with embedded parallelization for 100-gbit/s/λ pon,” J. Lightwave Technol. 40(5), 1353–1359 (2022). [CrossRef]  

24. X. Huang, D. Zhang, X. Hu, C. Ye, and K. Zhang, “Recurrent neural network based equalizer with embedded parallelization for 100gbps/λ pon,” in Optical Fiber Communications Conference and Exhibition (2021), pp. 1–3.

25. B. Sang, J. Zhang, C. Wang, M. Kong, Y. Tan, L. Zhao, W. Zhou, D. Shang, Y. Zhu, H. Yi, and J. Yu, “Multi-symbol output long short-term memory neural network equalizer for 200+ gbps im/dd system,” in European Conference on Optical Communication (2021), pp. 1–4.

26. T. A. Eriksson, H. Bülow, and A. Leven, “Applying neural networks in optical communication systems: Possible pitfalls,” IEEE Photonics Technol. Lett. 29(23), 2091–2094 (2017). [CrossRef]  

27. I. Kai, Y. Otsuka, Y. Fukumoto, and M. Nakamura, “Overestimation problem with ann and vstf in optical communication systems,” Electron. Lett. 55(19), 1051–1053 (2019). [CrossRef]  

28. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back propagating errors,” Nature 323(6088), 533–536 (1986). [CrossRef]  
