## Abstract

A deep learning-based decoder of polar codes is investigated over the free space optical (FSO) turbulence channel for the first time. A feedforward neural network (NN) is adopted to build the decoder, and custom layers are designed to train the NN decoder over the turbulence channel. A tanh-based modified log-likelihood ratio (LLR) is proposed as the input of the NN decoder, which yields faster convergence and better bit error rate (BER) performance than the standard LLR input. The simulation results show that the BER performance of the NN decoder with the tanh-based modified LLR is close to that of the conventional successive cancellation list (SCL) decoder over the turbulence channel, which indicates that the NN decoder with the tanh-based modified LLR can learn both the encoding rule of polar codes and the characteristics of the turbulence channel. Furthermore, the turbulence-stability is investigated: an NN decoder trained under a fixed turbulence condition also performs stably under other turbulence conditions.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Recently, deep learning has drawn worldwide attention and achieved astounding results in many fields, such as computer vision and natural language processing [1–2]. In the field of communication, deep learning can be used for channel decoding [3–5]. Compared to the conventional iterative decoding scheme, the one-shot decoding of the neural network (NN) decoder can reduce the latency at the receiver.

As one of the novel soft detection forward error correction (FEC) techniques in the coding theory community, polar codes have been proven to achieve the symmetric capacity of an arbitrary binary-input discrete memoryless channel under a low-complexity successive cancellation decoding scheme [6]. Polar codes have also been selected as the error correcting code for the enhanced mobile broadband (eMBB) control channel in the 5th generation (5G) wireless communication system [7]. Although the conventional successive cancellation (SC) decoding method has low computational complexity, it entails high latency because of its serial processing mechanism. To solve this problem, deep learning-based decoding of polar codes has been investigated [8–10]. In [8], a feedforward neural network (NN) decoder is proposed for polar codes, which demonstrates near maximum a posteriori (MAP) bit error rate (BER) performance by learning the full codewords. Furthermore, in [9] the sub-blocks of the decoder are replaced by NN-based components, and the partitioned NN (PNN) decoder is proposed to overcome the exponentially growing training complexity for long codewords. The neural successive cancellation (NSC) decoder is also investigated for long codewords in [10]. However, these works consider only the additive white Gaussian noise (AWGN) channel with binary phase shift keying (BPSK) modulation.

In free space optical (FSO) communication systems, however, the fading induced by atmospheric turbulence is the dominant factor degrading the system performance [11]. Channel coding can be adopted to combat the turbulence-induced fading [12–15]. In our previous work, polar codes have been investigated over the FSO turbulence channel, where they achieve more than 20 dB of coding gain in a wide range of turbulence conditions [14]. Furthermore, polar codes combined with multiple input multiple output (MIMO) space time coding have also been designed, and the BER performance with spatially correlated fading has been investigated [15]. The conventional successive cancellation list (SCL) decoding method is adopted in our previous work [16].

In this paper, we investigate the deep learning-based decoding of polar codes over the FSO turbulence channel for the first time. A feedforward neural network decoder is adopted to replace the conventional SCL decoder. Custom layers are designed to train the NN decoder. A tanh-based modified LLR is proposed as the input of the NN decoder, which yields faster convergence and better BER performance than the standard LLR input. The NN decoder with the tanh-based modified LLR achieves BER performance similar to that of the SCL decoder in a wide range of turbulence conditions, which indicates that the NN decoder learns not only the encoding rule of polar codes but also the characteristics of the turbulence channel model. The turbulence-stability of the NN decoder is also investigated, which indicates that, once trained well under a fixed turbulence condition, the NN decoder with the tanh-based modified LLR can also deliver satisfactory performance under other turbulence conditions.

The rest of this paper is organized as follows. The system model of the NN decoder over the FSO turbulence channel is introduced in Section 2. The principle of the tanh-based modified LLR is given in Section 3. Section 4 demonstrates the performance of the NN decoder, including the training loss, BER performance and turbulence-stability. Section 5 concludes this paper.

## 2. System model

A typical optical wireless communication system based on intensity modulation with direct detection (IM/DD) is adopted in this paper, and the corresponding block diagram is shown in Fig. 1. At the transmitter, the user bits are first polar encoded. The codewords are then modulated onto the optical carrier and transmitted into the atmospheric turbulence channel. At the receiver, the optical signals are collected by a fiber collimator and converted into electrical signals by a photodetector. After that, the NN decoder is used in place of the conventional polar decoder to recover the user bits.

#### 2.1 Turbulence channel model

The fading induced by the atmospheric turbulence and the thermal noise of the receiver are assumed to be the two major factors that degrade system performance. Therefore, the atmospheric turbulence channel can be represented as a discrete time channel, which is given by

$$y(t) = \eta I(t)x(t) + n(t),$$

where *x(t)* is the transmitted signal and *y(t)* is the received signal. On-off keying (OOK) modulation is adopted in our system, i.e. $x(t )\in \{{0,1} \}$. The *n(t)* is zero-mean white Gaussian noise. The $\eta $ is the responsivity of the photodetector, which is set to *1* for simplicity. The *I(t)* is the intensity fading induced by atmospheric turbulence, which is a random variable.

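The Gamma-Gamma turbulence model is adopted in this paper to emulate a wide range of turbulence conditions [17]. According to this model, the fluctuation of the light intensity through the turbulence channel consists of small-scale and large-scale effects, each of which obeys the Gamma distribution. The PDF of the intensity fading *I(t)* is given by

$$f(I) = \frac{2{(\alpha \beta)}^{(\alpha + \beta)/2}}{\Gamma(\alpha)\Gamma(\beta)}{I^{(\alpha + \beta)/2 - 1}}{K_{\alpha - \beta}}\left(2\sqrt{\alpha \beta I}\right),\quad I > 0,$$

where $\Gamma({\cdot})$ is the Gamma function and ${K_{\alpha - \beta}}({\cdot})$ is the modified Bessel function of the second kind of order $\alpha - \beta$. The $\alpha$ and $\beta$ are the effective numbers of large-scale and small-scale eddies, which depend on the Rytov variance. For a plane wave, the Rytov variance is $\sigma_R^2 = 1.23C_n^2{k^{7/6}}{D^{11/6}}$, where $k = 2\pi/\lambda$ is the optical wave number, *D* refers to the propagation distance and $C_n^2$ represents the refractive index structure parameter.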
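The channel model above can be sketched numerically. The following is a minimal NumPy sketch, assuming the standard construction of a unit-mean Gamma-Gamma variate as the product of two independent unit-mean Gamma variates with shape parameters α and β. The function names, the noise level and the values `alpha = 4.0`, `beta = 2.0` are hypothetical and are not taken from Table 1.

```python
import numpy as np

def gamma_gamma_fading(alpha, beta, size, rng):
    """Draw unit-mean Gamma-Gamma intensity samples as I = X * Y."""
    # Large-scale factor: Gamma(shape=alpha, scale=1/alpha), so E[X] = 1.
    large = rng.gamma(shape=alpha, scale=1.0 / alpha, size=size)
    # Small-scale factor: Gamma(shape=beta, scale=1/beta), so E[Y] = 1.
    small = rng.gamma(shape=beta, scale=1.0 / beta, size=size)
    return large * small

def fso_channel(x, fading, noise_std, rng, eta=1.0):
    """Apply y = eta * I * x + n with zero-mean Gaussian noise n."""
    noise = rng.normal(0.0, noise_std, size=np.shape(x))
    return eta * fading * x + noise

rng = np.random.default_rng(0)
alpha, beta = 4.0, 2.0                      # hypothetical turbulence parameters
I = gamma_gamma_fading(alpha, beta, size=200_000, rng=rng)
x = rng.integers(0, 2, size=I.shape)        # OOK symbols in {0, 1}
y = fso_channel(x, I, noise_std=0.1, rng=rng)
print("sample mean of I:", I.mean())
```

The fading here is drawn independently for every symbol, which is a simplifying assumption of the sketch rather than a statement about the simulation setup of this paper.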

Three different turbulence conditions, namely weak, moderate and strong turbulence conditions, are evaluated in this paper, which are classified by Rytov variance $\sigma _R^2$ [18]. The parameters for these turbulence conditions are given in Table 1.

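For the Gamma-Gamma distribution of the intensity fading, the scintillation index (SI) is given by

$$SI = \frac{E[{{I^2}}]}{{(E[I])}^2} - 1 = \frac{1}{\alpha} + \frac{1}{\beta} + \frac{1}{\alpha \beta},$$

where $E[{\cdot}]$ denotes the expectation. The mean intensity $E[I]$ is normalized to *1* and $E[{{I^2}} ]$ is adopted to determine the received power. The signal-to-noise ratio (SNR) in this paper is defined as *Eb/N0*, where ${E_b} = E[{{{({I \cdot x} )}^2}} ]/R$ and ${N_0} = 2{\sigma ^2}$. Here *R* is the code rate and ${\sigma ^2}$ is the variance of the noise *n(t)*.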
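As a worked example of the SNR definition, the sketch below converts an Eb/N0 value in dB into the noise standard deviation σ. It assumes unit-mean fading (so that E[I²] = 1 + SI), equiprobable OOK symbols independent of the fading (so that E[(I·x)²] = E[I²]/2), and that R is the code rate. The function names and the numerical values are hypothetical.

```python
import numpy as np

def scintillation_index(alpha, beta):
    """SI of a unit-mean Gamma-Gamma variate: 1/a + 1/b + 1/(a*b)."""
    return 1.0 / alpha + 1.0 / beta + 1.0 / (alpha * beta)

def noise_std_from_snr(snr_db, alpha, beta, rate):
    """Noise standard deviation for a given Eb/N0 in dB."""
    second_moment = 1.0 + scintillation_index(alpha, beta)  # E[I^2] with E[I] = 1
    eb = 0.5 * second_moment / rate      # E[(I x)^2] / R for equiprobable OOK
    n0 = eb / 10.0 ** (snr_db / 10.0)    # N0 = Eb / (Eb/N0)
    return np.sqrt(n0 / 2.0)             # N0 = 2 * sigma^2

si = scintillation_index(4.0, 2.0)       # hypothetical alpha and beta
sigma = noise_std_from_snr(10.0, 4.0, 2.0, rate=0.5)
print("SI:", si, "sigma:", sigma)
```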

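It is assumed that the channel state information (CSI) is precisely known to the receiver, which is achievable for FSO systems since the turbulence conditions vary slowly compared to the data rate [19]. Under this assumption, the LLR of the transmitted signal is given by

$$LLR = \ln \frac{P({x = 0|y})}{P({x = 1|y})} = \frac{{({y - \eta I})}^2 - {y^2}}{2{\sigma ^2}}, \tag{7}$$

where ${\sigma ^2}$ is the variance of the noise *n(t)* and the two OOK symbols are assumed to be equiprobable.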
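A minimal sketch of this soft demodulation step follows, assuming equiprobable OOK symbols, Gaussian noise of variance σ² and perfectly known fading *I*. Under these assumptions the log ratio of the two Gaussian likelihoods reduces to ((y − ηI)² − y²)/(2σ²). The function name `ook_llr` is hypothetical.

```python
import numpy as np

def ook_llr(y, fading, noise_std, eta=1.0):
    """LLR = ln p(y | x=0) / p(y | x=1) for y = eta * I * x + n."""
    y = np.asarray(y, dtype=float)
    # The Gaussian normalisation constants cancel in the ratio.
    return ((y - eta * fading) ** 2 - y ** 2) / (2.0 * noise_std ** 2)

fading, sigma = 0.8, 0.3
llr_zero = ook_llr(0.0, fading, sigma)          # received exactly the '0' level
llr_one = ook_llr(fading, fading, sigma)        # received exactly the '1' level
llr_mid = ook_llr(fading / 2.0, fading, sigma)  # halfway between both levels
print(llr_zero, llr_one, llr_mid)
```

A positive value favours bit 0 and a negative value favours bit 1, which matches the sign convention used later for the tanh-based modified LLR.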

#### 2.2 Polar codes

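Polar codes rely on the channel polarization phenomenon. Through the polar transform, the subchannels tend toward either high or low reliability [6]. The information bits are assigned to the subchannels with high reliability and the frozen bits are assigned to the subchannels with low reliability. The generator matrix ${G_N}$ of polar codes of length *N* is represented as

$${G_N} = {B_N}{F^{ \otimes n}},\quad F = \begin{bmatrix} 1 & 0\\ 1 & 1 \end{bmatrix},$$

where $n = {\log _2}N$, ${F^{ \otimes n}}$ is the *n*-th Kronecker power of *F* and ${B_N}$ is the bit-reversal permutation matrix [6]. The codeword is obtained by multiplying the source block by ${G_N}$ over GF(2). The source block has *N* bits, including *K* information bits and *N-K* frozen bits. The frozen bits are set to zeros.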
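The encoding step can be illustrated with a short sketch. It builds the Kronecker power of the 2×2 kernel and omits the bit-reversal permutation used in [6], which only reorders the input indices. The information set `INFO_IDX` is a hypothetical choice for illustration; the set actually used in this paper is not stated in the text.

```python
import numpy as np

def polar_generator(code_length):
    """Return the n-th Kronecker power of F = [[1, 0], [1, 1]], n = log2(N)."""
    kernel = np.array([[1, 0], [1, 1]], dtype=np.int64)
    generator = np.array([[1]], dtype=np.int64)
    for _ in range(code_length.bit_length() - 1):
        generator = np.kron(generator, kernel)
    return generator

def polar_encode(message, info_idx, generator):
    """Place the message on the information positions and encode over GF(2)."""
    source = np.zeros(generator.shape[0], dtype=np.int64)  # frozen bits stay 0
    source[info_idx] = message
    return source @ generator % 2

N, K = 16, 8
G = polar_generator(N)
INFO_IDX = [7, 9, 10, 11, 12, 13, 14, 15]   # hypothetical information set
message = np.array([1, 0, 1, 1, 0, 0, 1, 0])
codeword = polar_encode(message, INFO_IDX, G)
print(codeword)
```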

#### 2.3 Neural Network decoder

The feedforward NN is adopted as the decoder of polar codes over the FSO turbulence channel in this paper, as shown in Fig. 2. A fully connected neural network is used. The NN decoder can essentially be considered as a mapping ${\mathbf f}:{{\mathbb R}^N} \to {{\mathbb R}^k}$ for the *(N, k)* polar codes. The whole network consists of an input layer, *Q* hidden layers and an output layer. The input layer has *N* inputs and the output layer has *k* outputs. Each hidden layer *i*, with ${n_i}$ inputs and ${m_i}$ outputs, performs the mapping ${{\mathbf f}^{(i )}}:{{\mathbb R}^{{n_i}}} \to {{\mathbb R}^{{m_i}}}$. For each neuron, the output *y* depends on its weighted inputs ${{\theta }_\textrm{i}}{\textrm{x}_\textrm{i}}$ and its activation function *g*, and can be represented as

$$y = g\left({\sum\limits_i {{\theta _i}{x_i}} + {\theta _0}} \right),$$

where ${\theta _0}$ is the bias. The NN decoder as a whole can then be written as

$${\mathbf f}({{\mathbf v};{\mathbf \theta}}) = {\mathbf {out}}\left({{{\mathbf f}^{(Q - 1)}}\left({{{\mathbf f}^{(Q - 2)}}\left({ \cdots {{\mathbf f}^{(0)}}({\mathbf v})} \right)} \right)} \right),$$

where **v** is the input of the NN decoder and ${{\mathbf f}^{(i )}}({\cdot} )$ is the *i ^{th}* hidden layer. The ${\mathbf {\theta}}$ denotes the trainable parameters of the NN decoder. The ${\mathbf {out}}({\cdot} )$ represents the output layer. For each hidden layer, the rectified linear unit (ReLU) function is adopted as the activation function. The sigmoid function is adopted as the activation function for the output layer.

In this way, the NN decoder can be established for decoding. The notation ${n^{(0)}} - {n^{(1)}} - {n^{(2)}}$ is used to describe an NN decoder with three hidden layers and ${n^{(i )}}$ nodes in the *i ^{th}* hidden layer. For example, the layout of the *128-64-32* NN decoder for *(16, 8)* polar codes is given in Table 2. The number of trainable parameters in each layer is also given.
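To make the layer structure concrete, the following NumPy sketch runs a forward pass through a *128-64-32* fully connected network with ReLU hidden layers and a sigmoid output layer for *(16, 8)* polar codes. The weights are random and untrained, the initialization scheme is an arbitrary choice, and every layer is assumed to carry a bias vector. The per-layer parameter counts printed here follow from that assumption and are not copied from Table 2.

```python
import numpy as np

def init_params(layer_sizes, rng):
    """Random weights and zero biases for a fully connected network."""
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)),
             np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(v, params):
    """ReLU on every hidden layer, sigmoid on the output layer."""
    hidden = v
    for weights, bias in params[:-1]:
        hidden = np.maximum(hidden @ weights + bias, 0.0)
    weights, bias = params[-1]
    return 1.0 / (1.0 + np.exp(-(hidden @ weights + bias)))

rng = np.random.default_rng(0)
layer_sizes = [16, 128, 64, 32, 8]        # N inputs, three hidden layers, k outputs
params = init_params(layer_sizes, rng)
v = rng.normal(size=(5, 16))              # five example input vectors
out = forward(v, params)
param_counts = [w.size + b.size for w, b in params]
print(param_counts, sum(param_counts))
```

Each output lies in (0, 1) and can be thresholded at 0.5 to obtain a hard decision on the corresponding information bit.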

Gradient descent optimization and the backpropagation algorithm are used to train the NN decoder. To make the NN decoder learn the characteristics of the turbulence channel, several custom layers without trainable parameters are designed, namely the modulation layer, the turbulence layer, the AWGN layer and the soft-demodulation layer. The modulation layer performs OOK modulation to generate the transmitted symbols: bit *0* of the codeword is mapped to symbol *0* and bit *1* of the codeword is mapped to symbol *1*. In the turbulence layer, random numbers following the Gamma-Gamma distribution are generated and multiplied with the transmitted symbols in each epoch of training. In the AWGN layer, random numbers following the Gaussian distribution are generated as the noise added to the signals. The LLR of each signal is then calculated in the soft-demodulation layer through Eq. (7). These layers are indispensable for training the NN decoder. After training with the help of these layers, the NN decoder unit can be used for decoding on its own. The layout of these layers for the *(16, 8)* polar codes is given in Table 3, together with the number of trainable parameters in each layer.
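The chain of parameter-free layers described above can be sketched as plain NumPy functions. This is an illustration of the data flow only, not the Keras layers used in this work. The function names, the per-symbol independent fading, and the values of `alpha`, `beta` and `noise_std` are assumptions made for the sketch.

```python
import numpy as np

def modulation_layer(codewords):
    """OOK: bit 0 maps to symbol 0, bit 1 maps to symbol 1."""
    return codewords.astype(float)

def turbulence_layer(symbols, alpha, beta, rng):
    """Multiply by unit-mean Gamma-Gamma fading (product of two Gamma variates)."""
    fading = (rng.gamma(alpha, 1.0 / alpha, size=symbols.shape)
              * rng.gamma(beta, 1.0 / beta, size=symbols.shape))
    return fading * symbols, fading

def awgn_layer(signal, noise_std, rng):
    """Add zero-mean Gaussian noise."""
    return signal + rng.normal(0.0, noise_std, size=signal.shape)

def soft_demod_layer(received, fading, noise_std):
    """LLR = ln p(y|x=0) / p(y|x=1) with known fading and eta = 1."""
    return ((received - fading) ** 2 - received ** 2) / (2.0 * noise_std ** 2)

def channel_front_end(codewords, alpha, beta, noise_std, rng):
    symbols = modulation_layer(codewords)
    faded, fading = turbulence_layer(symbols, alpha, beta, rng)
    received = awgn_layer(faded, noise_std, rng)
    return soft_demod_layer(received, fading, noise_std)

rng = np.random.default_rng(1)
codewords = rng.integers(0, 2, size=(4, 16))
llr_epoch_1 = channel_front_end(codewords, 4.0, 2.0, 0.3, rng)
llr_epoch_2 = channel_front_end(codewords, 4.0, 2.0, 0.3, rng)
```

Calling the front end twice on the same codewords yields different LLRs, which mirrors the fresh fading and noise realizations drawn in every training epoch.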

The training set consists of all ${2^k}$ possible codewords. For example, *(16, 8)* polar codes have *256* possible codewords, and the training set should consist of these codewords. An epoch means that the neural network is trained once on all data of the training set. In each epoch of training, the random numbers following the Gamma-Gamma distribution and the random numbers following the Gaussian distribution are generated anew by the turbulence layer and the AWGN layer respectively, i.e. these random numbers differ from epoch to epoch. The Adam optimizer is used to train the NN decoder. The mean squared error (MSE) is adopted as the loss function in training the network, which is given by

$${L_{MSE}} = \frac{1}{k}\sum\limits_{i = 1}^k {{{({{b_i} - {{\hat b}_i}})}^2}},$$

where ${b_i}$ is the *i ^{th}* transmitted information bit and ${\hat b_i}$ is the corresponding output of the NN decoder.
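A small sketch of the training set and the loss follows. It enumerates all 2^k information words for k = 8 and evaluates the mean squared error between target bits and network outputs. The repetition factor of four reflects the batch size of 1024 quoted in Section 3.1; the function names are hypothetical.

```python
import itertools
import numpy as np

def all_messages(k):
    """All 2^k binary information words, one per row."""
    return np.array(list(itertools.product([0, 1], repeat=k)), dtype=float)

def mse_loss(target_bits, predicted_bits):
    """Mean squared error averaged over bits and over the batch."""
    return float(np.mean((target_bits - predicted_bits) ** 2))

k = 8
messages = all_messages(k)                  # shape (256, 8)
training_set = np.tile(messages, (4, 1))    # every word appears four times
untrained_guess = np.full_like(training_set, 0.5)
print(training_set.shape, mse_loss(training_set, untrained_guess))
```

A decoder that always outputs 0.5 scores an MSE of 0.25, which is a convenient reference point when reading training loss curves.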

Keras with TensorFlow as its back end is adopted in this paper, which allows the neural network to be built and trained quickly.

## 3. Tanh-based modified LLR

#### 3.1 Limit of standard LLR input over turbulence channel

First, the BER performance of *(16, 8)* polar codes over the weak turbulence channel is shown in Fig. 3. The standard LLR, calculated through Eq. (7), is used as the input of the NN decoder. The testing size is $10^6$, i.e. $10^6$ codewords are used to evaluate the BER performance at each SNR point. The batch size is set to *1024*, which equals the size of the training set, so each codeword appears four times in the training set. The learning rate of the Adam optimizer is set to *0.001*. The training SNR is 10 dB for the weak turbulence channel. The conventional SCL decoding method with *L = 8* is adopted as the baseline. For the *128-64-32* NN decoder, the BER performance improves as the number of epochs increases. However, even after $2^{20}$ training epochs, the BER performance is still far from the baseline. When a hidden layer is added, the BER performance approaches the baseline only at low SNR, and more bit errors appear when the SNR is larger than 15 dB. The reason is that the LLR at high SNR is far from zero, which is unfavorable for the NN decoder. Therefore, some modifications should be made to the standard LLR.

#### 3.2 Principle of tanh-based modified LLR

The LLR is the logarithm of the ratio between the probabilities that the transmitted signal is 0 and 1, given the received signal. An LLR close to zero means that the uncertainty of the transmitted signal is large. Conversely, an LLR far from zero means that the uncertainty of the transmitted signal is small. The probability distribution of the LLR over the weak turbulence channel is shown in Fig. 4. From this figure we can see that the LLR is more likely to lie far from zero as the SNR increases, since the uncertainty of the transmitted signal is smaller at high SNR. However, the NN decoder is not good at processing inputs that are far from zero. Therefore, a method called tanh-based modified LLR is proposed to modify the input of the NN decoder. The principle of this method is introduced as follows.

Before the LLR is fed into the NN decoder, the tanh function is used to confine the LLR to a certain range, which is given by

$$v = s \cdot \tanh \left({\frac{LLR}{s}} \right),$$

where *s* is the scale parameter and *v* is the modified LLR. After this modification, the range of *v* is limited to $({ - s,s} )$.
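The modification can be sketched in a few lines, assuming the form v = s·tanh(LLR/s), which keeps small LLRs almost unchanged and confines large ones to the interval (−s, s). The function name is hypothetical, and the default s = 10 matches the value used in Section 4.

```python
import numpy as np

def tanh_modified_llr(llr, s=10.0):
    """Compress the LLR with a scaled tanh: v = s * tanh(llr / s)."""
    return s * np.tanh(np.asarray(llr, dtype=float) / s)

llr = np.array([-200.0, -5.0, -0.01, 0.0, 0.01, 5.0, 200.0])
v = tanh_modified_llr(llr, s=10.0)
print(v)
```

Near zero the output tracks the input closely, while inputs of large magnitude saturate at about ±s, so the NN decoder never sees values outside a fixed range regardless of the SNR.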

To illustrate further, an example of the tanh function with the scale parameter *s* is shown in Fig. 5. When the LLR is close to zero, the uncertainty of the corresponding transmitted signal is large. In this case, more of the soft information should be retained for the NN decoder. Therefore, the modified LLR *v* is close to the original LLR. Conversely, when the LLR is far from zero, the uncertainty of the corresponding transmitted signal is small. In this case, less soft information is enough to represent the estimate of the transmitted signal, i.e. a fixed positive number indicates that the transmitted signal is *0* and a fixed negative number indicates that the transmitted signal is *1*. At high SNR, the modified LLR *v* approaches *s* or *-s* when the transmitted signal is *0* or *1* respectively.

To train the NN decoder with the tanh-based modified LLR, a tanh-based modification layer is added after the soft-demodulation layer, as already shown in Fig. 2.

## 4. Results and discussion

In this section, the performance of the NN decoder with the tanh-based modified LLR over a wide range of turbulence conditions is investigated through Monte Carlo simulation. The *(16, 8)* polar codes are adopted in the simulation.

#### 4.1 The comparison of training loss

The training losses of the NN decoder with and without the tanh-based modified LLR are compared in Fig. 6. The *128-64-32* NN decoder is adopted. The training parameters are the same as in Section 3.1. The scale parameter *s* is set to *10*. From Fig. 6 we can see that the NN decoder with the tanh-based modified LLR converges faster in training than the decoder with the standard LLR input.

#### 4.2 BER performance of NN decoder

The BER performance of the NN decoder with the tanh-based modified LLR over different turbulence conditions is shown in Fig. 7. The training SNRs are set to 10 dB, 15 dB and 20 dB for the weak, moderate and strong turbulence conditions respectively. The scale parameter *s* is set to 10 in these cases. The results show that the BER performance improves as the number of epochs increases. In the weak turbulence condition, the BER performance is close to the baseline after $2^{14}$ training epochs. Compared with the NN decoder without the tanh-based modified LLR in Fig. 3, the NN decoder with the tanh-based modified LLR has a lower BER for the same number of epochs. Furthermore, the BER performance of the NN decoder with the tanh-based modified LLR trained at an SNR of 10 dB is also close to the baseline at high SNR. In the moderate and strong turbulence conditions, the BER performance is close to the baseline after $2^{20}$ training epochs in both cases.

#### 4.3 Turbulence-stability of NN decoder

In Section 4.2, the NN decoder is trained and evaluated under the same turbulence condition. Here, the BER performance of a trained NN decoder under the other turbulence conditions is investigated, which is called the turbulence-stability of the NN decoder. Three *128-64-32* NN decoders, trained over the weak, moderate and strong turbulence conditions respectively, are evaluated. Each of these three NN decoders is trained for $2^{20}$ epochs. In the weak turbulence condition of Fig. 8(a), the performance of the NN decoders trained over the moderate and strong turbulence conditions is close to the baseline in both cases. Similar behavior under the moderate and strong turbulence conditions is shown in Figs. 8(b) and 8(c) respectively. Therefore, we can conclude that if the NN decoder with the tanh-based modified LLR is trained under a fixed turbulence condition, its performance also approaches the baseline under the other turbulence conditions.

## 5. Conclusion

In this paper, the NN decoder with the tanh-based modified LLR is shown to have faster convergence and better BER performance than the NN decoder with the standard LLR over the FSO turbulence channel. The BER performance of the NN decoder with the tanh-based modified LLR is close to that of the conventional SCL decoder over the turbulence channel. The trained NN decoder also has stable performance in a wide range of turbulence conditions. The code length of the polar codes in this work is limited to 16, since a longer code length increases the complexity of the NN decoder. Our future work will investigate longer code lengths with low complexity over the FSO turbulence channel.

## Funding

National Nature Science Fund of China (61221001, 61271216, 61431009, 61775137); National “863” Hi-tech Project of China; Natural Science Foundation of Zhejiang Province Grants (LY20F050004).

## Disclosures

The authors declare no conflicts of interest.

## References

**1.** A. Voulodimos, N. Doulamis, A. Doulamis, and E. Protopapadakis, “Deep learning for computer vision: A brief review,” Computational Intelligence and Neuroscience (2018).

**2.** T. Young, D. Hazarika, S. Poria, and E. Cambria, “Recent trends in deep learning based natural language processing [review article],” IEEE Comput. Intell. Mag. **13**(3), 55–75 (2018).

**3.** T. Wang, C. K. Wen, H. Wang, F. Gao, T. Jiang, and S. Jin, “Deep learning for wireless physical layer: Opportunities and challenges,” China Commun. **14**(11), 92–111 (2017).

**4.** J. Bruck and M. Blaum, “Neural networks, error-correcting codes, and polynomials over the binary n-cube,” IEEE Trans. Inf. Theory **35**(5), 976–987 (1989).

**5.** E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and Y. Be’ery, “Deep learning methods for improved decoding of linear codes,” IEEE J. Sel. Top. Signal Process. **12**(1), 119–131 (2018).

**6.** E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Inf. Theory **55**(7), 3051–3073 (2009).

**7.** 3GPP TS 38.212 Technical Specification Group Radio Access Network, NR, Multiplexing and Channel Coding, 2017.

**8.** T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, “On deep learning-based channel decoding,” in Proc. 51st Annual Conference on Information Sciences and Systems (IEEE, 2017), pp. 1–6.

**9.** S. Cammerer, T. Gruber, J. Hoydis, and S. ten Brink, “Scaling deep learning-based decoding of polar codes via partitioning,” in Proc. IEEE Global Commun. Conf. (GLOBECOM) (2017), pp. 1–6.

**10.** N. Doan, S. A. Hashemi, and W. J. Gross, “Neural successive cancellation decoding of polar codes,” in Proc. IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC) (2018), pp. 1–5.

**11.** M. A. Khalighi and M. Uysal, “Survey on free space optical communication: A communication theory perspective,” IEEE Commun. Surv. Tutorials **16**(4), 2231–2258 (2014).

**12.** J. Li and M. Uysal, “Optical wireless communications: system model, capacity and coding,” in Proc. IEEE 58th Vehicular Technology Conference (VTC 2003-Fall) (2003), pp. 168–172.

**13.** I. B. Djordjevic, “LDPC-coded MIMO optical communication over the atmospheric turbulence channel using Q-ary pulse-position modulation,” Opt. Express **15**(16), 10026–10032 (2007).

**14.** J. Fang, M. Bi, S. Xiao, G. Yang, C. Li, Y. Zhang, T. Huang, and W. Hu, “Performance investigation of the polar coded FSO communication system over turbulence channel,” Appl. Opt. **57**(25), 7378–7384 (2018).

**15.** J. Fang, M. Bi, S. Xiao, G. Yang, L. Liu, Y. Zhang, and W. Hu, “Polar-coded MIMO FSO communication system over gamma-gamma turbulence channel with spatially correlated fading,” J. Opt. Commun. Netw. **10**(11), 915–923 (2018).

**16.** I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Trans. Inf. Theory **61**(5), 2213–2226 (2015).

**17.** L. C. Andrews and R. L. Phillips, *Laser beam propagation through random media* (SPIE Press, 1998).

**18.** Z. Ghassemlooy, W. Popoola, and S. Rajbhandari, *Optical Wireless Communications: System and Channel Modelling with MATLAB* (CRC Press, 2012).

**19.** X. Zhu and J. Kahn, “Free-space optical communication through atmospheric turbulence channels,” IEEE Trans. Commun. **50**(8), 1293–1300 (2002).