## Abstract

Deep neural networks have been used to compensate for nonlinear distortion in underwater visible light communication (UVLC) systems. In practical applications, the tradeoff between equalization performance and network complexity is the priority. In this paper, we propose a novel hybrid frequency domain aided temporal convolutional neural network (TFCNN) with an attention scheme as the post-equalizer in a carrierless amplitude and phase (CAP) modulated UVLC system. Experiments show that the proposed TFCNN achieves better equalization performance and keeps the bit error rate (BER) below the 7% hard-decision forward error correction (HD-FEC) limit of 3.8×10^{−3} when other equalizers lose effectiveness under severe distortion. Compared with the standard deep neural network, TFCNN reduces the number of network parameters by 76.4%.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Underwater visible light communication (UVLC) has emerged in recent years as a technology for underwater wireless communication. UVLC uses blue/green visible light as the transmission medium, whose wavelengths (450∼550 nm) are less absorbed by water [1], and it supports higher speed and lower latency than the widely applied underwater acoustic and radio frequency communication. UVLC is expected to become an indispensable complement to underwater wireless communication, addressing the ever-increasing demand of future maritime applications such as ocean environmental monitoring, offshore oil field exploration, and underwater vehicle surveillance [2–4].

Laser diodes (LDs) and light emitting diodes (LEDs) are two widely used transmitters in UVLC systems. LDs offer high light density and achieve longer transmission distances and higher data rates in point-to-point scenarios. However, they need accurate alignment of the transceivers, which requires extra effort to ensure normal operation. Unlike LDs, LEDs have the advantages of eye safety, low cost and wide divergence [5]. Transmission speeds of tens of gigabits per second have been reported using LEDs [6]. Generally, due to the complex underwater channel and the intrinsic nonlinear characteristics of the light source, among other factors, nonlinear distortion is a non-trivial obstacle in LED-based UVLC systems [7]. It distorts the received signal and becomes particularly harmful when high-order modulation formats are applied for high-speed transmission.

To alleviate this problem, nonlinear post-equalizers have been proposed to compensate for the impairments. The Volterra series model [8] is a traditional equalizer that uses high-order kernels but has limited ability. The transfer model of the UVLC channel is very complicated and is difficult to express as an analytical function [9]. Thanks to their ability to learn to approximate complicated functions [10], deep neural networks (DNNs) have recently been used to greatly improve nonlinear distortion compensation in UVLC systems. The basic DNN structure normally consists of one input layer, multiple hidden layers and one output layer, and is also called a multi-layer perceptron (MLP). Different post-equalization methods based on MLP have been proposed to improve system performance. Ref. [11] utilized a Gaussian kernel-aided MLP to reduce the training iterations. Ref. [12] proposed a dual-branch MLP (DBMLP) as a post-equalizer in a UVLC system. In [13], a joint time-frequency post-equalizer based on a deep neural network (TFDNet) was proposed. The total number of network parameters in TFDNet is more than twenty thousand. Such a large spatial complexity would limit its application scenarios, especially in resource-constrained systems. For MLP-based networks, it is difficult to balance system performance and network complexity. With enough neural nodes or network parameters, the learning ability of the MLP improves [10], which raises performance at the cost of high storage requirements. With a small MLP, the insufficient neural nodes in the hidden layers limit the learning ability, which can degrade system performance. Moreover, the above-mentioned MLP-based methods all use densely connected layers, which can introduce large computational redundancy. Ref. [14] illustrated the redundant parameters in the densely connected layers of DBMLP [12] and proposed pruning the network to achieve similar performance with about half of the original network parameters. However, pruning requires network re-training, which is very time consuming compared with non-pruning methods.

To enhance the applicability of neural network based post-equalizers in practical applications, we focus on reducing the network size while retaining acceptable performance. Convolutional neural networks (CNNs) have drawn our attention because they have proven very useful for feature extraction not only in images [15] but also in one-dimensional tasks such as natural language processing (NLP) [16]. Through parameter sharing, a CNN has lower network complexity than an MLP. However, the use of CNNs for equalization in UVLC is still in its infancy in the literature, and their equalization ability has not been established. In addition, the signal has properties beyond the temporal domain: information from the frequency domain could also be valuable. Combining information from the temporal and frequency domains remains a problem, because the two domains might contribute differently to the final result.

Motivated by the observations above, we propose, for the first time, a hybrid frequency domain aided temporal convolutional neural network (TFCNN) to compensate for nonlinear distortion in UVLC systems. Furthermore, an attention scheme is employed to assign, through self-learning, different weights to the information extracted from the two domains. With the extra information extracted from the frequency domain, TFCNN gains more understanding of the signal than a purely temporal network. Such a hybrid structure with an attention scheme, exploiting the time and frequency domains simultaneously, can not only outperform MLP-based equalizers that process the signal only in the temporal domain, but also keep a relatively low number of network parameters by using parameter-sharing convolutional layers without performance degradation. Experiments conducted in a 1.2 m underwater channel using 64QAM-CAP modulation show that the proposed TFCNN has better equalization performance than Volterra [8], MLP [11], and SDBMLP [14], with 76.4% fewer network parameters than MLP. TFCNN also keeps the bit error rate (BER) below the 7% hard-decision forward error correction (HD-FEC) limit of 3.8×10^{−3} at 2.85 Gbps when other equalizers lose effectiveness under severe distortion, which verifies the practical feasibility of TFCNN in UVLC systems.

## 2. Principle

In an LED-based UVLC system, complex underwater effects such as turbulence, scattering and absorption distort both the amplitude and the phase of the output signal [7]. Worse, nonlinear distortion caused by the intrinsic nonlinear characteristics of the devices, such as the nonlinear electro-optic conversion of the LED at the transmitter, the response of the electronic amplifier (EA), and the square-law detection and saturation effect of the photodiode (PD) at the receiver [7], seriously affects the recovery of the original signal from the received signal. For an original signal $x(n)$ to be transmitted in the UVLC system, the received signal $y(n)$ can be expressed as [13]

$$y(n) = H(x(n)) + \mathrm{noise}$$

where the function $H$ is the channel response, which includes nonlinear distortion, and noise denotes the additive noise generated in the system. The recovery process tries to obtain a signal $\bar{x}(n)$ from the received signal $y(n)$. The closer $\bar{x}(n)$ is to $x(n)$, the better the equalization performance. The proposed TFCNN, which combines hybrid time-frequency domain information with a parameter-sharing CNN, is illustrated in Fig. 1. The details are described below.

First, the distorted real-valued time series $y(n) = [{y_1},{y_2}, \cdots ,{y_n}]$, containing $n$ samples in total, is taken as the input data. The input data are then converted from serial to parallel with a stride of one to form a dataset $T = \{{[{y_1},{y_2}, \cdots ,{y_k}],[{y_2},{y_3}, \cdots ,{y_{k + 1}}], \cdots } \}$, where $k$ denotes the number of samples in a subset and is determined by the extent of inter-symbol interference (ISI). To obtain frequency domain information, we resort to the short-time Fourier transform (STFT) [17], which is a better choice than the discrete Fourier transform (DFT) for our parallel dataset. Ref. [11] verified the ISI and nonlinearity in UVLC systems and proposed preprocessing the input signal with a Gaussian window function before network training, which improved performance by reducing the influence of ISI. In other words, the interference of the other samples with the middle sample decreases as their distance from the middle sample increases. The main difference between STFT and DFT for our parallel dataset is that STFT applies a window function before the DFT; the default Hamming window used in the experiments has an effect similar to that of the Gaussian window. Given the better performance of the STFT-based TFCNN, we use STFT instead of applying DFT directly in the following experiments.
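
The serial-to-parallel conversion and the windowed DFT described above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the implementation used in the experiments: the function names, the sample values and the symmetric form of the Hamming window are assumptions, and the windowing convention of the STFT routine actually used may differ.

```python
import cmath
import math


def sliding_windows(y, k):
    """Serial-to-parallel conversion: every length-k subset of y, stride one."""
    return [y[i:i + k] for i in range(len(y) - k + 1)]


def hamming(k):
    """Symmetric Hamming window of length k (assumed form of the window)."""
    if k == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (k - 1)) for n in range(k)]


def windowed_dft(subset, window):
    """Apply the window, then a D-point DFT with D equal to the subset size."""
    d = len(subset)
    xw = [s * w for s, w in zip(subset, window)]
    return [
        sum(xw[n] * cmath.exp(-2j * math.pi * m * n / d) for n in range(d))
        for m in range(d)
    ]


# Hypothetical received samples; k = 5 is chosen only to keep the example small.
y = [0.1, 0.4, -0.2, 0.3, 0.0, -0.5, 0.2]
k = 5
T = sliding_windows(y, k)                       # temporal dataset
F = [windowed_dft(sub, hamming(k)) for sub in T]  # one complex spectrum per subset
```

Each subset of `T` yields `k` complex DFT bins, so the frequency representation carries `2k` real numbers per subset once real and imaginary parts are separated.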
In each sliding window of the STFT, the DFT is computed on the windowed data with $D$ points, where $D$ equals the subset size $k$. The sliding window traverses the adjacent subsets in the dataset $T$ with a stride of one. After the frequency transformation, the resulting dataset $F$, which contains the frequency domain information, is denoted as $F = \{{{F_1},{F_2}, \cdots } \}$. Each subset of $F$ can be expressed as

where $g(n)$ is the window function and $R$ is the stride. Since ${F_c}$ is complex-valued, we interleave its real parts (denoted as $Re$) and imaginary parts (denoted as ${\mathop{\rm Im}\nolimits}$) as ${F_c} = [{Re _c},{{\mathop{\rm Im}\nolimits} _c},{Re _{c + 1}},{{\mathop{\rm Im}\nolimits} _{c + 1}}, \cdots ,{Re _{c + k-1}},{{\mathop{\rm Im}\nolimits} _{c + k-1}}]$ for convenience.

After the temporal domain and frequency domain data are prepared, they are fed into the hybrid time-frequency neural network, which has two branches: a temporal branch and a frequency branch. In the temporal branch, the time dataset $T$ is processed by two sub-branches simultaneously. One sub-branch consists of parameter-sharing convolutional layers. Unlike a densely connected layer, which has parameter redundancy, a convolutional layer includes several filters, whose number sets the depth dimension of the output feature maps. Each filter within a layer has a predefined size, which corresponds to the local receptive field of a single unit. For one-dimensional data, only the width of the filters is considered. The parameter-sharing scheme makes the parameters within one layer more compact, reducing the number of network parameters. ReLU (Rectified Linear Unit) is chosen as the nonlinear activation function. Two convolutional layers are used here. The other sub-branch is a skip mapping layer from the input to the output, inspired by the identity mappings of ResNet [18]. The outputs of the two sub-branches are then added together to form the temporal branch output ${\bar{x}_T}$, which can be expressed as

where $W_{j,i}^c$ denotes the weight matrix from layer $i$ to layer $j$ in the convolutional layers, $W_1^s$ denotes the weight matrix of the skip mapping, and ${B_T}$ denotes the bias weights.

The frequency branch has a structure similar to that of the temporal branch, except that it has twice as many input nodes. Notably, the DFT output of a real-valued signal satisfies Hermitian symmetry [19], which means that half of the output could be used as the input of the frequency branch. However, in experiments we found that the network with the full-size frequency branch input outperforms the one with the halved input by a non-negligible margin. A possible reason is that, in TFCNN, the frequency domain information is transformed into temporal domain information directly through the network mapping, and the correlations of the input signal may be incomplete for this remapping when only half of the input is used in the frequency branch. Thus, we use the whole DFT output as the input of the frequency branch for the sake of the overall performance of TFCNN. Hermitian symmetry combined with other prior knowledge is worth exploring in future work to further simplify the network structure of TFCNN. The output of the frequency branch ${\bar{x}_F}$ can be expressed as

To let the network decide by itself the contributions of the information from the different domains, we use an attention layer to assign different learned weights to the outputs of the temporal branch and the frequency branch. The final output of the network $\bar{x}$ is expressed as

$$\bar{x} = \alpha {\bar{x}_T} + \beta {\bar{x}_F}$$

where $\alpha$ denotes the learned weight of the temporal branch, $\beta$ denotes the learned weight of the frequency branch, and $\bar{x} = [{\bar{x}_1},{\bar{x}_2}, \cdots ,{\bar{x}_n}]$.

The network is optimized with Adam [20]. The loss function is the mean square error (MSE). The original transmitted signal $x(n)$ serves as the labels to aid the learning of the network.
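
The attention-weighted combination of the two branch outputs and the MSE loss can be illustrated with a short sketch. It assumes that the attention layer reduces to a weighted sum with one scalar weight per branch. In the network these weights are learned, whereas the values of `alpha` and `beta` below, like the branch outputs and labels, are hypothetical numbers chosen only for illustration.

```python
def fuse_branches(x_t, x_f, alpha, beta):
    """Weighted sum of temporal-branch and frequency-branch outputs."""
    if len(x_t) != len(x_f):
        raise ValueError("branch outputs must have the same length")
    return [alpha * t + beta * f for t, f in zip(x_t, x_f)]


def mse(pred, label):
    """Mean square error between predictions and labels."""
    return sum((p - l) ** 2 for p, l in zip(pred, label)) / len(label)


# Hypothetical branch outputs and transmitted labels.
x_t = [0.9, -1.1, 0.5]      # temporal branch
x_f = [1.2, -0.8, 0.4]      # frequency branch
labels = [1.0, -1.0, 0.5]   # original transmitted samples

x_hat = fuse_branches(x_t, x_f, alpha=0.7, beta=0.3)
loss = mse(x_hat, labels)
```

With these particular numbers the fused output has a lower MSE than either branch alone, which is the kind of complementary gain the attention scheme is meant to exploit. It is not a general guarantee.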

## 3. Experimental setup

The experimental setup is illustrated in Fig. 2. The LED-based UVLC system uses 64QAM-CAP modulation. First, the original serial data are mapped into 64QAM symbols by the QAM mapping function. After 4-times up-sampling, the data are divided into in-phase (I) and quadrature (Q) parts by IQ separation using digital pulse shaping filters. CAP modulation is then applied, followed by normalization, and the resulting data are ready to be transmitted through the hardware circuits and the underwater channel. In the hardware, the data are converted from a digital signal to an analog electrical signal by an arbitrary waveform generator (Tektronix AWG710B) with a sampling frequency varying from 1.7 GSa/s to 2.0 GSa/s according to the data rate. The electrical signal then passes through the pre-equalization circuit and the power amplifier circuit (PA, Mini-Circuits ZHL-6A-S+) in sequence, and is coupled through a bias tee (Mini-Circuits ZFBT-4R2GW-FT+) to drive a silicon substrate blue LED, whose light travels through the 1.2 m water tank. As this is a preliminary experiment, static water is used. Other disturbances of the underwater channel will be considered in future experiments.
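
As a rough illustration of the first transmitter steps, the sketch below maps bits to 64QAM symbols, up-samples by 4, and relates the AWG sampling rate to the bit rate. Several points are assumptions made for the example and are not stated in the setup: the natural-binary (non-Gray) bit-to-level mapping, zero-insertion up-sampling, and a bit rate equal to the sampling rate divided by the up-sampling factor times 6 bits per symbol with no overhead. The pulse shaping filters are omitted, and all function names are hypothetical.

```python
LEVELS = [-7, -5, -3, -1, 1, 3, 5, 7]   # amplitude levels per 64QAM axis


def map_64qam(bits):
    """Map groups of 6 bits to (I, Q) pairs; natural binary mapping (assumed)."""
    if len(bits) % 6:
        raise ValueError("bit count must be a multiple of 6")
    symbols = []
    for i in range(0, len(bits), 6):
        b = bits[i:i + 6]
        i_idx = b[0] * 4 + b[1] * 2 + b[2]
        q_idx = b[3] * 4 + b[4] * 2 + b[5]
        symbols.append((LEVELS[i_idx], LEVELS[q_idx]))
    return symbols


def upsample(samples, factor=4):
    """Zero-insertion up-sampling: factor - 1 zeros after every sample."""
    out = []
    for s in samples:
        out.append(s)
        out.extend([0] * (factor - 1))
    return out


def bit_rate(sample_rate, factor=4, bits_per_symbol=6):
    """Bit rate when the symbol rate is sample_rate / factor (assumed relation)."""
    return sample_rate / factor * bits_per_symbol


symbols = map_64qam([0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0])
i_up = upsample([s[0] for s in symbols])
q_up = upsample([s[1] for s in symbols])
rates = [bit_rate(fs) for fs in (1.7e9, 1.9e9, 2.0e9)]
```

Under the assumed rate relation, a sampling rate of 1.9 GSa/s corresponds to 2.85 Gbit/s, which is consistent with the 2.85 Gbps operating point quoted in the Introduction.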

At the receiver, the received signal passes through hardware circuits and is then processed in software to recover the original signal. In the hardware, the received light signal is converted into an electrical signal by a PIN photodiode (Hamamatsu S10784) with a differential power amplifier circuit. The received signal is then sampled by an oscilloscope (Keysight DSO9404A) at 2 GSa/s for software processing. In software, the signal is first synchronized. The proposed TFCNN is then used as the post-equalizer; other post-equalization methods are also applied at this stage for comparison. Down-conversion and down-sampling follow, during which the down-sampled points may not be the optimal points with the minimum MSE, which introduces non-negligible linear distortion. The residual linear distortion is compensated by the LMS algorithm, which serves as the second-stage equalizer in the experiments [21]. The use of LMS does not affect the relative performance of the different methods (Volterra, MLP, SDBMLP, TFCNN), as all of them are evaluated under the same conditions. Finally, the original data are recovered by QAM de-mapping.
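
The second-stage LMS equalizer can be sketched as a generic trained LMS filter. The version below is a simplified, real-valued illustration. The tap count, the step size `mu`, the center-spike initialization and the use of known training symbols are assumptions, and the equalizer in the experiments may instead operate on complex symbols or in a decision-directed mode.

```python
def lms_equalize(received, desired, num_taps=5, mu=0.01):
    """Adapt a real-valued FIR filter with LMS; returns (outputs, weights)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    if len(received) != len(desired):
        raise ValueError("received and desired must have the same length")
    half = num_taps // 2
    weights = [0.0] * num_taps
    weights[half] = 1.0                      # center-spike initialization
    padded = [0.0] * half + list(received) + [0.0] * half
    outputs = []
    for n, target in enumerate(desired):
        window = padded[n:n + num_taps]      # samples centered on position n
        estimate = sum(w * v for w, v in zip(weights, window))
        error = target - estimate
        # LMS update: move each tap along the instantaneous error gradient.
        weights = [w + mu * error * v for w, v in zip(weights, window)]
        outputs.append(estimate)
    return outputs, weights


# Toy check: a channel that only halves the amplitude (purely linear distortion).
symbols = [1.0 if i % 3 else -1.0 for i in range(1000)]
received = [0.5 * s for s in symbols]
outputs, weights = lms_equalize(received, symbols, num_taps=1, mu=0.1)
```

In this toy case the single tap converges to a gain of 2, undoing the attenuation. A purely linear filter of this kind cannot undo a nonlinear mapping, which is why it is used only for the residual linear distortion after the nonlinear post-equalizer.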

## 4. Results and discussion

We first conduct experiments on choosing the optimal hyperparameters of the proposed TFCNN. Then we evaluate different parts of TFCNN. Finally, we compare TFCNN with other post equalization methods.

The hyperparameters of TFCNN, including the number of input nodes in the input layer, the number of filters and the size of the filter kernels, determine the equalization performance. To choose values suitable for this UVLC system, the bit error rate (BER) under different hyperparameter settings is shown in Fig. 3. BER is a measurable indicator of system performance; a lower BER indicates better performance. Figure 3(a) shows BER versus the number of input nodes. The number of input nodes is influenced by the level of ISI. BER decreases as the number of input nodes increases, which corresponds to the coverage of ISI. The lowest BER is achieved when the number of input nodes is 35, and increasing it beyond 35 does not improve BER. When the number of input nodes exceeds the coverage of ISI, the extra input nodes bring no more valid information. Worse, the noise brought by the extra input data disturbs the performance of the network. Thus, the number of input nodes of TFCNN is set to 35, which means that the temporal branch has 35 input nodes and the frequency branch has 70. For a one-dimensional convolutional layer, the number of filters and the size of the filter kernels are mainly responsible for its feature extraction ability. In TFCNN, the temporal branch and the frequency branch use the same hyperparameter values. As a tradeoff between performance and complexity, two convolutional layers are used. For the first convolutional layer in both branches, BER versus the number of filters and the kernel size is shown in Fig. 3(b). Three kernel sizes are tested: 1, 3 and 5. For a fixed kernel size, BER decreases as the number of filters increases, which is reasonable as more filters can extract more features. Figure 3(b) shows that, for a kernel size of 5, BER remains relatively stable without further decrease once the number of filters reaches 9. Therefore, the first convolutional layer uses 9 filters with a kernel size of 5. For the second convolutional layer, the number of filters is preset to 1 to reduce the network parameters. For kernel sizes of 1, 3 and 5, BER equals 0.003742, 0.003615 and 0.002979, respectively. Thus, the kernel size of the second convolutional layer is 5. It is noteworthy that more filters within one layer and more convolutional layers could be added to the proposed TFCNN to further improve system performance. The hyperparameter values discussed above are chosen to maintain equalization performance while keeping the spatial complexity low.

Figure 4(a) shows the weight heatmap obtained from the weight matrix between the input layer and the first hidden layer of the MLP. Figure 4(b) shows the weight heatmap between the input layer and the first convolutional layer in the temporal branch of TFCNN. The color depth represents the normalized weight values. The MLP has many small weight values (light color blocks). In contrast, dark color blocks are the majority in the heatmap of the CNN. This suggests that the CNN has less redundancy than the MLP thanks to its parameter-sharing scheme, which reduces the number of network parameters.
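
Parameter sharing can be made concrete by writing a one-dimensional convolution as the dense matrix it is equivalent to. The sketch below assumes no padding and a stride of one, neither of which is specified for TFCNN, and uses a hypothetical width-5 filter and hypothetical input samples. Like most deep learning frameworks, it computes a cross-correlation, with no kernel flip.

```python
def conv1d_valid(x, kernel):
    """Slide one shared kernel over x (no padding, stride one)."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]


def equivalent_dense_rows(kernel, n_in):
    """Dense weight matrix reproducing conv1d_valid: every row holds the
    same kernel shifted by one position, with zeros elsewhere."""
    k = len(kernel)
    rows = []
    for i in range(n_in - k + 1):
        row = [0.0] * n_in
        row[i:i + k] = kernel
        rows.append(row)
    return rows


n_in = 35                                    # input nodes of the temporal branch
kernel = [0.1, -0.2, 0.4, -0.2, 0.1]         # hypothetical width-5 filter
x = [((7 * i) % 11 - 5) / 5.0 for i in range(n_in)]   # hypothetical samples

rows = equivalent_dense_rows(kernel, n_in)
dense_out = [sum(w * v for w, v in zip(row, x)) for row in rows]
conv_out = conv1d_valid(x, kernel)

shared_weights = len(kernel)                 # free weights in the filter
dense_entries = len(rows) * n_in             # entries in the equivalent matrix
```

The two outputs coincide, but the convolution stores 5 weights, whereas an unconstrained dense layer of the same shape would store 31 × 35 = 1085.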

To illustrate the advantage of combining the temporal and frequency domains over considering either domain alone, the temporal branch and the frequency branch of TFCNN are trained as separate networks, and their performance is evaluated at different peak-to-peak voltages (Vpp) with a fixed bias current in Fig. 5(a). The temporal branch achieves a BER below the forward error correction limit of 3.8×10^{−3}, which indicates that convolutional layers can be applied for equalization. The frequency branch also achieves a BER under the error threshold, which means that frequency domain information can be transformed into temporal domain information directly through the network mapping. This experiment illustrates that the neural network has a strong ability to approximate complex mappings. The BER of the frequency branch is higher than that of the temporal branch. The main reason is that the frequency branch is functionally equivalent to frequency-domain equalization, and the remapping from the frequency domain to the temporal domain may lose information. When the temporal branch and the frequency branch are combined, the network learns complementary information from both domains rather than from one domain alone. An attention scheme learns to assign different weights to the outputs of the temporal branch and the frequency branch in TFCNN. The learned weights of the two branches differ, which means that the network pays different attention to the features extracted from the temporal and frequency domains. The trend of the weights learned by the neural network in Fig. 5(b) is consistent with the results in Fig. 5(a) to some extent. In Fig. 5(a), the BER gap between the temporal branch and the frequency branch widens as Vpp increases and reaches its maximum at 0.8 Vpp. The gap then narrows gradually as Vpp increases further. Therefore, one reasonable explanation for Fig. 5(b) is that, as Vpp becomes larger, the neural network pays more attention to the information learned by the temporal branch while the BER gap between the two branches widens. The BER gap denotes the difference in equalization performance between the temporal branch and the frequency branch when trained individually. The trend of the weights (e.g. $\alpha$ for the temporal branch) is closely related to the change of the BER gap. In other words, the network focuses on the relative rather than the absolute performance of the two branches. When the BER gap widens, the network gains more by assigning a larger weight to the temporal branch, which is reflected in the increase of $\alpha$. On the contrary, when the BER gap narrows, the relative performance of the temporal branch becomes less appealing to the network, which is reflected in the decrease of $\alpha$. Learning different weights helps the network combine the information from the two domains in a better way.

We compare the proposed TFCNN with other post-equalization methods, including Volterra [8], MLP [11] and simplified DBMLP (SDBMLP) [14]. The structures and the numbers of network parameters of the different algorithms are summarized in Table 1. The MLP has two hidden layers activated by the ReLU function; the numbers of nodes in the input layer, the two hidden layers and the output layer are 35, 32, 16 and 1, respectively. SDBMLP has one node in the first branch layer and 10 nodes in the second branch layer, activated by the Tanh function as in [12], and its number of network parameters is similar to that of TFCNN. In the first convolutional layer of the temporal branch of TFCNN, the number of filters and the kernel size are 9 and 5, denoted as (9,5) in Table 1. The second convolutional layer has 1 filter with a kernel size of 5. The frequency branch uses the same settings as the temporal branch. The MLP has the most network parameters among the algorithms, and the number of network parameters in TFCNN is only 23.6% of that of the MLP. The training and test datasets are generated following [12], using much larger pseudo-random sequence patterns to avoid overfitting of the neural network. The dataset used in the experiment contains 131072 samples; we use 80% for training, within which 10% is used for validation, and the remaining 20% for testing. The batch size is 128. The neural networks are implemented in Keras with TensorFlow [22] as the backend. The learning rate of the Adam algorithm is 0.001. In addition, we use the Xavier normal initializer [23], an efficient algorithm for initializing the weights, and we initialize the biases to zero. To ensure the robustness of the networks, we normalize the signal with the ‘max absolute value’ function ${N_{Max\_Abs}}(x) = x/\max (abs(x))$, following [9]. The training and validation loss curves for MLP, SDBMLP and TFCNN are illustrated in Fig. 6. The validation loss is slightly higher than the training loss, which indicates good fitting of the networks. For MLP, SDBMLP and TFCNN, the network converges gradually as the number of training epochs increases. Since the training and validation losses in Fig. 6 reach a plateau at around 50 epochs, the number of training epochs is set to 50 in the following experiments. For Volterra, we use 5000 training symbols for 5000 training iterations in one training epoch.
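
The parameter figures quoted above can be cross-checked with a short calculation. The sketch assumes that every dense node and every convolutional filter carries one bias term, that the first convolutional layer sees a single input channel, and that the second sees the 9 feature maps of the first. The TFCNN total is not recomputed from its structure, because padding and the output mapping are not specified. It is only estimated from the stated 23.6% ratio.

```python
def mlp_params(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(a * b + b for a, b in zip(layer_sizes, layer_sizes[1:]))


def conv1d_params(in_channels, filters, kernel_size):
    """Weights plus one bias per filter of a 1-D convolutional layer."""
    return in_channels * filters * kernel_size + filters


mlp_total = mlp_params([35, 32, 16, 1])   # 1152 + 528 + 17 = 1697
conv_first = conv1d_params(1, 9, 5)       # (9,5) layer: 45 weights + 9 biases
conv_second = conv1d_params(9, 1, 5)      # 1 filter over 9 maps: 45 weights + 1 bias

tfcnn_estimate = 0.236 * mlp_total        # implied by the stated 23.6% ratio
reduction = 1.0 - 0.236                   # the 76.4% reduction quoted in the abstract
```

Under these assumptions the MLP has 1697 trainable parameters, so the stated ratio implies roughly 400 parameters for TFCNN, of which the four convolutional layers of the two branches would account for 2 × (54 + 46) = 200.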

To evaluate the equalization ability of these methods, BER versus Vpp for the different equalizers is presented in Fig. 7. The 7% forward error correction (FEC) limit of 3.8×10^{−3} is taken as the BER threshold and is drawn as a dashed line. Overall, TFCNN has better equalization performance than the other equalizers. In Fig. 7(a), Volterra performs worst, which shows its limited compensation ability; its optimal operating point is 0.7 Vpp. Generally, the nonlinear effect becomes stronger at higher Vpp in LED-based systems [24]. By learning, the MLP resists nonlinear distortion better than Volterra as Vpp increases, and it achieves its lowest BER at 0.8 Vpp. However, the MLP becomes ineffective and exceeds the error threshold when Vpp is too high, which brings serious nonlinear distortion and noise. Owing to its insufficient network parameters, SDBMLP is limited in reducing BER. TFCNN, which has a similar number of network parameters to SDBMLP, outperforms SDBMLP with a Q factor that is 0.48 dB higher. The constellations A, B and C shown in Fig. 7(a) are obtained at 1.0 Vpp after LMS only, after TFCNN only, and after TFCNN with LMS as the second-stage equalizer, respectively. Both the nonlinear and the linear distortion are serious in the system. Constellation A, which is visibly warped, confirms that LMS is powerless against nonlinear distortion. In B, TFCNN corrects the constellation points so that they align on a rectangular grid. In C, the residual linear distortion caused by down-sampling is further mitigated by LMS, which illustrates the performance gain from the second-stage LMS equalizer. BER versus Vpp at a higher data rate is shown in Fig. 7(b). More interference is introduced into the UVLC system when the data rate is high. Only TFCNN achieves a BER below the error threshold, with a valid operating Vpp range of 0.2 V, and it increases the Q factor by 0.37 dB over SDBMLP. These experiments demonstrate the effectiveness of the proposed TFCNN and indicate that convolutional layers, the active constituent of TFCNN, are feasible for nonlinear compensation. Therefore, convolutional layers can be used to deal with the nonlinear distortion encountered in practical applications.
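
The Q-factor gains quoted above are derived from measured BER values. The exact conversion used in the experiments is not stated, so the sketch below assumes the common Gaussian-noise relation BER = ½·erfc(Q/√2) and expresses Q in dB as 20·log10(Q). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def q_factor_db(ber):
    """Q factor in dB, assuming BER = 0.5 * erfc(Q / sqrt(2))."""
    if not 0.0 < ber < 0.5:
        raise ValueError("BER must lie strictly between 0 and 0.5")
    # 0.5 * erfc(Q / sqrt(2)) is the Gaussian tail probability beyond Q,
    # so Q is the negated standard normal quantile of the BER.
    q = -NormalDist().inv_cdf(ber)
    return 20.0 * math.log10(q)


fec_limit_db = q_factor_db(3.8e-3)   # Q factor at the 7% HD-FEC threshold
gain_db = q_factor_db(0.002979) - q_factor_db(0.003742)
```

Under this relation the 7% HD-FEC limit corresponds to a Q factor of about 8.5 dB. The variable `gain_db` illustrates how a BER difference, here between two of the BER values reported for the kernel-size test, translates into a Q-factor difference.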

Figure 8 illustrates the frequency spectra and the spectral mismatches of the received signal after compensation with the different methods. Least mean square (LMS) is used as the linear equalizer. Under severe nonlinear distortion, the high-frequency components of the signal are more susceptible to distortion. Compared with the original transmitted spectrum in Fig. 8(a), the spectrum of the received signal without equalization is distorted, with fading at high frequencies. LMS lacks the ability to compensate for nonlinear distortion and has the worst performance. Volterra outperforms LMS thanks to its limited nonlinear compensation ability. The neural network based methods outperform Volterra. The spectrum obtained with TFCNN is closer to the original spectrum than those of the other equalizers. Figure 8(b) gives a clear view of the spectral mismatches between the original transmitted signal and the signals compensated by the different equalizers: the signal compensated by TFCNN is much closer to the original signal, which is the purpose of equalization. From Fig. 8, we can observe the improvement from linear to nonlinear equalization, and from traditional nonlinear equalization to neural network based nonlinear equalization.

BER versus data rate is evaluated in Fig. 9. The equalization effect of TFCNN varies with the data rate, because the nonlinear distortion and noise introduced by the system differ at different data rates. Nevertheless, TFCNN outperforms the other equalizers, which is consistent with the experimental results above. The number of network parameters of TFCNN is only 1.6% of that of TFDNet [13]. Furthermore, unlike TFDNet, which needs an IFFT to transform the frequency domain data into temporal domain data, the proposed TFCNN maps the frequency domain directly to the temporal domain through the network, without adding extra computational complexity. Therefore, TFCNN is a competitive candidate for equalization in UVLC systems in terms of the tradeoff between performance and network spatial complexity.

We also evaluate the BER performance of MLP and TFCNN with different amounts of training data. The test dataset contains 13107 samples, which are unseen by the networks. The MLP has about four times as many trainable parameters as TFCNN, so it is more likely to under-fit because its parameters are not updated adequately, and more training data are needed for the MLP to perform better. This is the case shown in Fig. 10: as the amount of training data increases, the BER of the MLP decreases gradually. In contrast, TFCNN benefits from the parameter-sharing scheme and achieves a lower BER than the MLP when trained with less data. The BER gap between MLP and TFCNN in Fig. 7 and Fig. 9 would be even wider if a small amount of training data were used. The BER gap between MLP and TFCNN also narrows as the amount of training data increases. However, the amount of labeled training data is limited in practical scenarios. Therefore, TFCNN can be used when limited training data are available in practice. In future work, we will consider data augmentation, unsupervised learning or transfer learning to alleviate the requirement for large training datasets.

## 5. Conclusion

In this paper, we proposed, for the first time, a hybrid frequency domain aided temporal convolutional neural network with an attention scheme to compensate for nonlinear distortion in UVLC systems. Experimental results demonstrate that the proposed TFCNN achieves better equalization performance with a small network structure. In addition, TFCNN performs better than MLP when trained with less training data, which shows the effectiveness and practical feasibility of TFCNN in UVLC systems. Future research will focus on further simplifying and accelerating TFCNN.

## Funding

National Natural Science Foundation of China (No.61925104, No.62031011).

## Disclosures

The authors declare no conflicts of interest.

## References

**1.** S. Duntley, “Light in the sea,” J. Opt. Soc. Am. **53**(2), 214–233 (1963).

**2.** N. Chi, Y. Zhou, Y. Ran, and F. Hu, “Visible light communication in 6G: Advances, Challenges, and Prospects,” IEEE Veh. Technol. Mag. **15**(4), 93–102 (2020).

**3.** N. Chi, H. Haas, M. Kavehrad, T. Little, and X. Huang, “Visible light communications: demand factors, benefits and opportunities,” IEEE Wireless Commun. **22**(2), 5–7 (2015).

**4.** C. Fei, X. Hong, J. Du, G. Zhang, Y. Wang, X. Shen, Y. Lu, Y. Guo, and S. He, “High-speed underwater wireless optical communications: from a perspective of advanced modulation formats,” Chin. Opt. Lett. **17**(10), 100012 (2019).

**5.** Y. Wang, Y. Wang, N. Chi, J. Yu, and H. Shang, “Demonstration of 575-Mb/s downlink and 225-Mb/s uplink bi-directional SCM-WDM visible light communication using RGB LED and phosphor-based LED,” Opt. Express **21**(1), 1203–1208 (2013).

**6.** Y. Zhou, X. Zhu, F. Hu, J. Shi, F. Wang, P. Zou, J. Liu, F. Jiang, and N. Chi, “Common-anode LED on a Si substrate for beyond 15 Gbit/s underwater visible light communication,” Photonics Res. **7**(9), 1019–1029 (2019).

**7.** N. Chi and F. Hu, “Nonlinear adaptive filters for high-speed LED based underwater visible light communication,” Chin. Opt. Lett. **17**(10), 100011 (2019).

**8.** G. Stepniak, J. Siuzdak, and P. Zwierko, “Compensation of a VLC phosphorescent white LED nonlinearity by means of Volterra DFE,” IEEE Photonics Technol. Lett. **25**(16), 1597–1600 (2013).

**9.** Y. Zhao, P. Zou, W. Yu, and N. Chi, “Two tributaries heterogeneous neural network based channel emulator for underwater visible light communication systems,” Opt. Express **27**(16), 22532–22541 (2019).

**10.** K. Hornik, M. Stinchcombe, and H. White, “Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks,” Neural Networks **3**(5), 551–560 (1990).

**11.** N. Chi, Y. Zhao, M. Shi, P. Zou, and X. Lu, “Gaussian kernel-aided deep neural network equalizer utilized in underwater PAM8 visible light communication system,” Opt. Express **26**(20), 26700–26712 (2018).

**12.** Y. Zhao, P. Zou, and N. Chi, “3.2 Gbps underwater visible light communication system utilizing dual-branch multi-layer perceptron based post-equalizer,” Opt. Commun. **460**, 125197 (2020).

**13.** H. Chen, Y. Zhao, F. Hu, and N. Chi, “Nonlinear Resilient Learning Method Based on Joint Time-Frequency Image Analysis in Underwater Visible Light Communication,” IEEE Photonics J. **12**(2), 1–10 (2020).

**14.** Y. Zhao and N. Chi, “Partial pruning strategy for a dual-branch multilayer perceptron-based post-equalizer in underwater visible light communication systems,” Opt. Express **28**(10), 15562–15572 (2020).

**15.** A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” Commun. ACM **60**(6), 84–90 (2017).

**16.** Y. Shen and X. Huang, “Attention-based convolutional neural network for semantic relation extraction,” in the 26th International Conference on Computational Linguistics: Technical Papers, 2526–2536 (2016).

**17.** D. W. Griffin and J. S. Lim, “Signal Estimation from Modified Short-Time Fourier Transform,” IEEE Trans. Acoust., Speech, Signal Process. **32**(2), 236–243 (1984).

**18.** K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 9908 LNCS, 630–645 (2016).

**19.** J. G. Proakis and M. Salehi, *Digital Communications* (McGraw-Hill, 2008).

**20.** D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” 3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc. 1–15 (2015).

**21.** P. Zou, Y. Zhao, F. Hu, and N. Chi, “Enhanced performance of MIMO multi-branch hybrid neural network in single receiver MIMO visible light communication system,” Opt. Express **28**(19), 28017–28032 (2020).

**22.** M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, and M. Kudlur, “Tensorflow: A system for large-scale machine learning,” 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (2016).

**23.** X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” J. Mach. Learn. Res. **9**, 249–256 (2010).

**24.** Y. Zhou, Y. Wei, F. Hu, J. Hu, Y. Zhao, J. Zhang, F. Jiang, and N. Chi, “Comparison of nonlinear equalizers for high-speed visible light communication utilizing silicon substrate phosphorescent white LED,” Opt. Express **28**(2), 2302–2316 (2020).