## Abstract

Free space optic (FSO) is a type of optical communication where the signal is transmitted in free space instead of fiber cables. Because of this, the signal is subject to different types of impairments that affect its quality. Predicting these impairments helps in automatic system diagnosis and in building adaptive optical networks. Using machine learning to predict signal impairments in optical networks has been extensively covered during the past few years. However, for FSO links, the work is still in its infancy. In this paper, we consider predicting three channel parameters in FSO links that are related to amplified spontaneous emission (ASE) noise, turbulence, and pointing errors. To the best of the authors' knowledge, this work is the first to consider predicting FSO channel parameters under the effect of more than one impairment. First, we report the performance of predicting the FSO parameters using asynchronous amplitude histogram (AAH) and asynchronous delay-tap sampling (ADTS) histogram features. The results show that ADTS histogram features provide better prediction accuracy. Second, we compare the performance of a support vector machine (SVM) regressor and a convolutional neural network (CNN) regressor using ADTS histogram features. The results show that the CNN regressor outperforms the SVM regressor in some cases, while in other cases they have similar performance. Finally, we investigate the capability of the CNN regressor for predicting the channel parameters at three different transmission speeds. The results show that the CNN regressor performs well in predicting the optical signal-to-noise ratio (OSNR) parameter regardless of the transmission speed. However, for the turbulence and pointing errors, the prediction under low-speed transmission is more accurate than under high-speed transmission.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Optical wireless communication (OWC) techniques have emerged as promising solutions that can provide alternatives to the RF spectrum scarcity challenge, especially for future wireless networks such as 5G and 6G. Free space optic (FSO) and visible light communication (VLC) are two main types of OWC for indoor and outdoor communication applications. Their available bandwidth makes them a good alternative to RF to cope with the newly emerging high bandwidth applications and technologies such as the Internet of things and smart cities [1,2].

In the backhaul portion of future communication networks, fiber is the preferred solution due to its high bandwidth. In some scenarios, however, fiber installation is impractical and/or expensive, and its wireless counterpart, i.e. FSO, is the most suitable solution. However, the FSO channel is subject to different impairments such as pointing errors and turbulence. Such impairments affect the quality of the received signal and limit the performance of the whole system. Monitoring the channel impairments and then optimizing the network performance is a network design and optimization goal. Such monitoring provides crucial information about the signal quality and helps in automatic system diagnosis [3].

Recently, machine learning (ML) has become a hot topic in the fiber-based optical communication field because it eliminates the need for developing complex analytical/numerical models for optical communication systems and networks [3,4]. ML covers a wide range of applications, such as classifying modulation formats [5] and monitoring the performance of optical fiber networks, for example the optical signal-to-noise ratio (OSNR) and the chromatic dispersion [6]. For FSO systems, some applications of ML have been reported, such as signal detection [7], decoding [8], demodulation [9], and end-to-end system modeling [10]. However, the use of ML for monitoring the signal impairments in FSO systems is still in its infancy.

Since FSO has its own impairments that differ from fiber impairments, it requires appropriate techniques or algorithms to monitor them. To the best of our knowledge, using ML to monitor an FSO channel impairment in the presence of other (more than one) impairments has not yet been reported in the literature. Most of the work reported in the literature is related to turbulence impairment compensation in orbital angular momentum (OAM) communication systems, where OAM is a technique for signal multiplexing. In [11] and [12], the authors used computer simulation to investigate monitoring the atmospheric turbulence impairment using a convolutional neural network (CNN) in order to compensate for it and improve the system performance. The turbulence effect was estimated by processing the modes’ intensity using the output images of a charge-coupled device (CCD) camera. Similarly, in [13–15], CNN-based methods are used for atmospheric turbulence compensation in OAM communication systems. These methods are able to predict the turbulent phase screens from the intensity distribution of the distorted vortex beams. The obtained information is then used to correct the phase distortion of the multiplexed vortex beams. In [16], the authors experimentally investigated the prediction of the visibility range as a parameter to determine the severity of a dust storm on the FSO system in a dusty channel environment using different algorithms, including support vector machine (SVM), CNN, and k-nearest neighbor algorithms.

In this work, we apply ML techniques to monitor three different impairments in FSO systems: amplified spontaneous emission (ASE) noise, turbulence, and pointing errors. A parameter defining each impairment is given, and ML is used to predict its value. We investigate predicting each parameter when the signal is affected by the three impairments collectively. The main contributions of this work are as follows:

- 1. Studying the prediction of three different parameters of FSO channels using ML;
- 2. Investigating the prediction accuracy using asynchronous amplitude histogram (AAH) and asynchronous delay-tap sampling (ADTS) features;
- 3. Comparing the performance of SVM and CNN regressors for predicting parameters of FSO channels;
- 4. Investigating the accuracy of predicting these parameters for different baud rates.

## 2. System and channel model

#### 2.1 FSO model

In this work, we consider an intensity modulation/direct detection (IM/DD) FSO communication system. Such a model supports the transmission of amplitude modulation formats that do not carry phase information, such as ON-OFF keying (OOK) and pulse position modulation (PPM) [17]. The optical beam at the transmitter side is modulated by the user data. Then, the generated optical signal is transmitted over a turbulent channel that causes irradiance fluctuations called scintillation. The output photocurrent of the receiver is related to the incident optical power by the responsivity parameter $\mathcal {R}$. Assuming clear weather with no atmospheric losses, the received signal is well modeled by [18]

$$y = \mathcal {R}\, h_l\, h_p\, x + n,$$

where $x$ is the transmitted signal intensity, and $n$ is an additive white Gaussian noise. The channel state $h_l$ arises due to atmospheric turbulence, and the channel state $h_p$ arises due to pointing errors and geometric path loss.

Atmospheric turbulence-induced scintillation is due to small temperature variations leading to fluctuations in the index of refraction. The scintillation index is proportional to the Rytov variance, which is commonly used as a measure of the turbulence strength and is defined by [2]

$$\sigma _1^2 = 1.23\, C_n^2\, k^{7/6} L^{11/6},$$

where $C_n^2$ is the index of refraction structure parameter varying from 10$^{-17}$ for weak turbulence to 10$^{-13}$ for strong turbulence [19], $k=2\pi /\lambda$ is the optical wave number, $L$ is the transmission link length, and $\lambda$ is the signal wavelength. Weak turbulence is associated with $\sigma _1^2<1$ and strong turbulence with $\sigma _1^2>1$. The irradiance distribution under weak to strong turbulence is well modeled by the Gamma-Gamma probability density function, which is given in [20]. It is worth mentioning that the amount of fading due to turbulence increases as the transmission link length increases, leading to lower system performance [21].
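As a rough illustration of the turbulence model above, the following sketch evaluates the Rytov variance with the commonly used plane-wave expression $\sigma_1^2 = 1.23\,C_n^2 k^{7/6} L^{11/6}$ and draws Gamma-Gamma fading samples as the product of two unit-mean gamma variates. The 1550 nm wavelength comes from the simulation setup; the 1 km link length, the plane-wave (zero inner scale) expressions for the shape parameters `alpha` and `beta`, and all function names are assumptions made for this example, not values or code taken from the paper.

```python
import math
import random


def rytov_variance(cn2, wavelength_m, link_length_m):
    """Plane-wave Rytov variance: 1.23 * Cn^2 * k^(7/6) * L^(11/6)."""
    k = 2.0 * math.pi / wavelength_m  # optical wave number
    return 1.23 * cn2 * k ** (7.0 / 6.0) * link_length_m ** (11.0 / 6.0)


def gamma_gamma_params(sigma2):
    """Shape parameters (alpha, beta); ASSUMES a plane wave with zero inner scale."""
    s = sigma2 ** (6.0 / 5.0)  # equals sigma_1^(12/5)
    alpha = 1.0 / math.expm1(0.49 * sigma2 / (1.0 + 1.11 * s) ** (7.0 / 6.0))
    beta = 1.0 / math.expm1(0.51 * sigma2 / (1.0 + 0.69 * s) ** (5.0 / 6.0))
    return alpha, beta


def sample_turbulence_gain(alpha, beta, n, rng):
    """Draw n unit-mean Gamma-Gamma samples as a product of two unit-mean gammas."""
    return [
        rng.gammavariate(alpha, 1.0 / alpha) * rng.gammavariate(beta, 1.0 / beta)
        for _ in range(n)
    ]


WAVELENGTH_M = 1550e-9   # wavelength used in the simulation setup
LINK_LENGTH_M = 1000.0   # ASSUMPTION: illustrative link length only

rng = random.Random(0)
for cn2 in (1e-17, 1e-15, 1e-13):
    sigma2 = rytov_variance(cn2, WAVELENGTH_M, LINK_LENGTH_M)
    alpha, beta = gamma_gamma_params(sigma2)
    gains = sample_turbulence_gain(alpha, beta, 20000, rng)
    regime = "weak" if sigma2 < 1.0 else "strong"
    print(f"Cn2={cn2:.0e} sigma1^2={sigma2:.3e} ({regime}) "
          f"alpha={alpha:.2f} beta={beta:.2f} mean gain={sum(gains) / len(gains):.3f}")
```

Under these assumptions, the smallest and largest $C_n^2$ values fall on opposite sides of the $\sigma_1^2 = 1$ boundary, which is consistent with the weak/strong classification used in the text.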

Pointing accuracy is an important factor in FSO links because of their line-of-sight requirement. Wind load, in addition to the sun heating of buildings, causes random building sway, which in turn causes random pointing errors. For a Gaussian beam, with receiver aperture radius $a$ and received beam waist $w_z$, the channel state due to pointing errors and geometric loss is given by [18]

$$h_p \approx A_0 \exp \left (-\frac {2r^2}{w^2_{z_{eq}}}\right ),$$

where $r$ is the radial displacement between the receiver aperture center and the received beam spot center, $A_0=[\textrm {erf}(\nu )]^2$ is the fraction of the collected power at $r=0$, $\nu =(\sqrt {\pi }a)/(\sqrt {2}w_z)$, and $w^2_{z_{eq}}=w_z^2\frac {\sqrt {\pi }\textrm {erf}(\nu )}{2\nu \exp {(-\nu ^2)}}$ is the equivalent beamwidth. The radial displacement $r$ at the receiver is modeled by a Rayleigh distribution [18] defined by

$$f_r(r) = \frac {r}{\sigma _s^2}\exp \left (-\frac {r^2}{2\sigma _s^2}\right ), \quad r>0,$$

where $\sigma _s^2$ is the jitter variance at the receiver. The pointing error severity is determined by the parameter $\zeta =w^2_{z_{eq}}/2\sigma _s$ with values between 0.5 and 5. For high jitter variance, the parameter $\zeta \to 0$. Note that the geometric path loss introduces signal power attenuation because of beam spreading. This power loss depends on system parameters such as the path length, beam divergence angle, and receiver radius, which are defined in Table 1. In Section 4, the prediction of the three parameters OSNR, $C_n^2$, and $\zeta$, which represent the severity of the ASE noise, turbulence, and pointing errors, respectively, will be investigated using ML algorithms.

#### 2.2 ML-based regressor

- 1. Support vector machine (SVM)
SVM is one of the most widely used ML algorithms. It can be used for both classification and regression problems. For a set of training points $\{(\textbf {x}_{1}, y_{1}), (\textbf {x}_{2}, y_{2}),\ldots , (\textbf {x}_{N}, y_{N})\}$, where $\textbf {x}_{n} \in \textbf {R}^{\textrm {D}}$ is a feature vector and $y_{n} \in \textbf {R}$ is a target output, the linear fitting function is given as [22]

$$f(\textbf {x}) = \langle \textbf {w},\textbf {x}\rangle + b,$$

where $\textbf {w} \in \textbf {R}^{\textrm {D}}$ and $b$ are the regression model weight and bias parameters, respectively, and $\langle \cdot ,\cdot \rangle$ denotes the dot product. The idea of SVM regression is to find the function that gives the best fit with at most $\epsilon$ deviation from the targets. Errors (i.e. noise in the training data) are tolerated through the slack variables $\xi _{n},\xi _{n}^*$, as shown in Fig. 1. To find the optimal values of $\textbf {w}$, the following optimization problem is considered [22,23]

$$\begin{aligned} &\textrm{Minimize } \frac{1}{2}\|\textbf{w}\|^{2}+\chi\sum_{n=1}^N(\xi_n+\xi_n^*), \\ &\textrm{Subject to } \begin{cases} y_n- \langle \textbf{w},\textbf{x}_n\rangle-b \leq\epsilon+\xi_n \\ \langle \textbf{w},\textbf{x}_n\rangle+b-y_n \leq\epsilon+\xi_n^* \\ \xi_n,\xi_n^*\geq0 \end{cases} \end{aligned}$$

where $\chi$ is a regularization parameter, always greater than 0. This optimization problem can be solved using the Lagrangian formulation [22], where the optimal $\textbf {w}$ can be obtained in terms of the training samples $\textbf {x}_n$ and the Lagrange multiplier pairs $\alpha _n, \alpha ^*_n$, as follows

$$\textbf {w} = \sum _{n=1}^N(\alpha _n-\alpha ^*_n)\,\textbf {x}_n.$$

Hence, the approximate fitting function is given by

$$f(\textbf {x}) = \sum _{n=1}^N(\alpha _n-\alpha ^*_n)\langle \textbf {x}_n,\textbf {x}\rangle + b.$$

The term $\langle \textbf {x}_n,\textbf {x}\rangle$ represents the linear kernel function.

To use SVM in FSO-based monitoring, the mathematical model is constructed from a training dataset that involves AAH and ADTS features. In this work, a linear kernel is used and the value of $\chi$ is set to 1, while the value of $\epsilon$ is computed as $\mathrm {IQR}(y_n)/13.49$, where $\mathrm {IQR}(y_n)$ is the interquartile range of the target vector $y_n$ [24,25]. The sequential minimal optimization (SMO) algorithm is utilized for solving the regression optimization problem.
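To make the SVM regression settings above more concrete, the sketch below computes $\epsilon = \mathrm{IQR}(y_n)/13.49$ and evaluates the regularized objective $\frac{1}{2}\|\textbf{w}\|^2 + \chi\sum_n(\xi_n + \xi_n^*)$ for a given linear model, using the fact that the optimal slack for each sample equals the amount by which its residual exceeds $\epsilon$. It does not implement the SMO solver. The toy one-dimensional data, the quartile convention used for the IQR, and all function names are assumptions made for illustration.

```python
import statistics


def epsilon_from_iqr(targets):
    """epsilon = IQR(y) / 13.49; the quartile convention here is an ASSUMPTION."""
    q1, _, q3 = statistics.quantiles(targets, n=4, method="inclusive")
    return (q3 - q1) / 13.49


def predict(w, b, x):
    """Linear fitting function f(x) = <w, x> + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def svr_primal_objective(w, b, xs, ys, epsilon, chi=1.0):
    """0.5 * ||w||^2 + chi * (total slack under the epsilon-insensitive loss)."""
    slack = sum(max(0.0, abs(y - predict(w, b, x)) - epsilon) for x, y in zip(xs, ys))
    return 0.5 * sum(wi * wi for wi in w) + chi * slack


# Toy data (hypothetical): the target is the OSNR grid, the single feature equals the target.
osnr_db = [float(v) for v in range(5, 24, 2)]   # 5, 7, ..., 23 dB
features = [[v] for v in osnr_db]

eps = epsilon_from_iqr(osnr_db)
perfect_fit = svr_primal_objective([1.0], 0.0, features, osnr_db, eps)
biased_fit = svr_primal_objective([0.9], 0.0, features, osnr_db, eps)
print(f"epsilon={eps:.4f} objective(perfect)={perfect_fit:.3f} objective(biased)={biased_fit:.3f}")
```

Residuals smaller than $\epsilon$ contribute nothing to the objective, so only the samples lying outside the $\epsilon$-tube influence the fit.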

- 2. Convolutional neural network (CNN)
CNN is a type of deep neural network that uses forward and backward propagation in order to achieve classification or regression. The forward propagation can be implemented using convolutional and pooling processes. The convolutional process performs dot product multiplication between the input image and an arbitrary number of filters that have learnable weights. The pooling process is used to extract distinct features of the input by down-sampling operators. In addition, an activation function (ReLU) is applied in order to introduce non-linearity into the network. These processes/functions are cascaded together to build the regressor. In our work, we have considered different CNN parameters/hyperparameters and studied their effects on the model performance. Then, we selected the best settings for these parameters/hyperparameters, which are the attributes of the CNN reported in Fig. 2.

Backpropagation is used to adjust the free parameters (i.e., the biases and weights) in order to attain the desired network output. Backpropagation calculates the gradient of the loss function in order to minimize errors that affect the performance. The mean square error (MSE) between the output $d_o$ and the target $d_t$ is used as the loss function, which is defined as

$$\mathrm {MSE} = \frac {1}{N}\sum _{i=1}^{N}\left (d_{o,i}-d_{t,i}\right )^2,$$

where $N$ is the size of the dataset and $i$ indexes its samples. The CNN is trained using the adaptive moment estimation (Adam) optimizer [26] with a learning rate of $3 \times 10^{-5}$. The maximum number of epochs is set to 150 and the batch size is 32.
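The training step described above can be sketched in a few lines: an MSE loss and a single-parameter Adam update with the stated learning rate of $3\times10^{-5}$. The one-weight linear model stands in for the CNN purely for illustration, and the Adam moment coefficients (`beta1`, `beta2`, `eps`) are the commonly used defaults, assumed here because the text does not state them.

```python
import math


def mse(outputs, targets):
    """Mean square error between model outputs d_o and targets d_t."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def adam_step(theta, grad, m, v, t, lr=3e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of a scalar parameter; returns (theta, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


# Toy one-weight model d_o = w * x (hypothetical stand-in for the CNN).
xs = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]

w, m, v = 0.0, 0.0, 0.0
initial_loss = mse([w * x for x in xs], targets)
for step in range(1, 151):  # 150 full-batch updates, chosen only to echo the epoch limit
    grad = sum(2.0 * (w * x - t) * x for x, t in zip(xs, targets)) / len(xs)
    w, m, v = adam_step(w, grad, m, v, step)
final_loss = mse([w * x for x in xs], targets)
print(f"loss before={initial_loss:.4f} after={final_loss:.4f} w={w:.6f}")
```

With such a small learning rate each update moves the weight by roughly $3\times10^{-5}$, which suggests why many epochs over the full dataset are needed before the loss settles.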

## 3. Simulation setup description

In this work, we use the *VPITransmissionMaker* 10.1 simulator for dataset generation according to the system described in Fig. 3 and the typical simulation parameters listed in Table 1 [18]. In this setup, a laser diode (LD) source that generates a Gaussian light beam at 1550 nm wavelength is applied to a Mach-Zehnder modulator (MZM) to convert the randomly generated binary OOK electrical signal (user data) into an optical signal. The signal is then passed through an FSO channel emulator, which includes the effect of turbulence, and of pointing errors and geometric loss, that are defined in (3) and (6), respectively. After that, ASE noise is added using an erbium-doped fiber amplifier (EDFA), where the noise power is controlled by a variable optical attenuator (VOA) to set different values of the OSNR parameter. The OSNR parameter is defined as the ratio of the signal power to the ASE noise power that is generated by the EDFA. Finally, the signal is filtered using an optical bandpass filter (BPF) and then detected at the receiver side using a PIN photodiode. For the purpose of building a dataset for ML algorithm training and testing, two types of features are exploited in this work: AAH and ADTS histograms. These features represent the monitored parameters well and require smaller training datasets than approaches that use the raw data directly. In the following, the generation of both features will be discussed.

#### 3.1 AAH features

In AAH features, the amplitude of the electrical signal at the output of the PIN is sampled asynchronously using one sampler with 500 Msample/sec speed, where 8192 electrical amplitude samples are collected to represent one realization. The sampled signal is then used for building 1D AAH features with 100 bins each. These features are used for training an SVM regressor to predict the channel parameters. Figure 4 shows some examples of AAH features under different impairments.
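A minimal sketch of the AAH feature construction described above is given below: 8192 amplitude samples are binned into a 100-bin 1D histogram. The toy OOK-plus-Gaussian-noise signal is a hypothetical stand-in for the photodiode output, and the equal-width binning over the observed range and the normalization to unit sum are assumptions, since the text does not specify them.

```python
import random


def amplitude_histogram(samples, bins=100):
    """Equal-width 1D amplitude histogram, normalized so the bins sum to 1."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1.0   # guard against a constant signal
    counts = [0] * bins
    for s in samples:
        idx = min(int((s - lo) / width), bins - 1)  # top edge goes into the last bin
        counts[idx] += 1
    return [c / len(samples) for c in counts]


# Toy realization (hypothetical): random OOK levels plus Gaussian noise.
rng = random.Random(0)
samples = [rng.choice((0.0, 1.0)) + rng.gauss(0.0, 0.1) for _ in range(8192)]

aah_feature = amplitude_histogram(samples, bins=100)
print(len(aah_feature), round(sum(aah_feature), 6))
```

Each realization then contributes one 100-element vector to the training or test set of the SVM regressor.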

#### 3.2 ADTS histogram features

In ADTS features, the output electrical signal of the PIN is split into two arms, as shown in Fig. 3, where a physical delay of 25 psec is introduced to one arm before the sampling. Then the signal and its delayed copy are sampled asynchronously at a low sampling rate of 500 Msample/sec, where 8192 electrical amplitude samples are collected to represent one realization. The output of the two samplers is a pair of signal amplitudes ($p_i$, $q_i$) with a time delay between them called the tap delay. These pairs are used to create ADTS features, which are 2D histograms used for training and testing a regression algorithm [27]. Because this sampling method uses the tap delay, it is called ADTS, and it has the advantages of reducing the sampling speed and eliminating the requirement for timing/clock information [28]. In this work, these ADTS features are used for training SVM and CNN regressors. For the SVM regressor, the pairs of signal amplitudes are used to build a 2D 80$\times$80 histogram, the entries of which are concatenated in the form of a vector at the input of the regressor. For the CNN regressor, the 2D histogram generated from the pairs of signal amplitudes is used as an input during training and testing sessions. CNN is a type of artificial neural network that uses the convolution operation to process inputs with high dimensionality such as images. Figure 5 shows some examples of ADTS features under different impairments.

The severity of the channel is varied according to the values reported in Table 1. The datasets include 10 OSNR parameter values ranging from 5 to 23 dB with a 2 dB step size, 10 $\zeta$ parameter values from 0.5 to 5 with a 0.5 step size, and a list of 10 values for the $C_n^2$ parameter: $10^{-17}$, $5\times 10^{-17}$, $10^{-16}$, $5\times 10^{-16}$, $10^{-15}$, $5\times 10^{-15}$, $10^{-14}$, $5\times 10^{-14}$, $10^{-13}$, and $5\times 10^{-13}$. Note that decreasing the step size will not noticeably change the prediction accuracy of the regressor. For each parameter value, 200 independent realizations are randomly generated, where each includes 8192 electrical amplitude samples. Hence, 2000 realizations (10 values $\times$ 200) are employed for monitoring each parameter. For ML training, 70% of the dataset is used, while the remaining 30% is used for testing. All parameter values are equally represented in terms of the number of realizations used for both training and testing. The prediction results are computed from 30 independent runs. To evaluate the prediction accuracy of a regressor for the channel parameters, the $R^2$ metric is considered [29]. It is the square of the correlation coefficient $R$. The $R^2$ metric is defined as [30]

$$R^2 = 1-\frac {\sum _{i=1}^n (y_i-f_i)^2}{\sum _{i=1}^n (y_i-\bar y)^2},$$

where $n$ is the number of values in the test sample, $y_i$ is the actual (target) value, $f_i$ is the corresponding predicted value, and $\bar y =\frac {1}{n}\sum _{i=1}^n y_i$ is the sample mean. This metric takes values between 0 and 1. If $R^2\to 1$, the model is accurately predicting the target value. However, when $R^2\to 0$, the prediction is poor.

## 4. Results and discussion
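Since every result below is reported as an $R^2$ value, the metric defined at the end of Section 3 can be sketched as follows, assuming the usual form $R^2 = 1 - \sum_i (y_i - f_i)^2 / \sum_i (y_i - \bar{y})^2$. The function name and the toy predictions are hypothetical and serve only to show the two limiting cases.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot for paired actual and predicted values."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Toy check on the OSNR grid (5 to 23 dB in 2 dB steps).
actual = [float(v) for v in range(5, 24, 2)]
perfect = list(actual)                               # exact predictions
mean_only = [sum(actual) / len(actual)] * len(actual)  # always predict the mean
slightly_off = [y + 0.5 for y in actual]             # constant 0.5 dB bias

print(r_squared(actual, perfect), r_squared(actual, mean_only),
      round(r_squared(actual, slightly_off), 4))
```

A regressor that always outputs the sample mean scores 0 and an exact one scores 1; with this form, predictions worse than the mean would give a negative value, so the 0 to 1 range quoted above presumes a regressor that is at least as good as that baseline.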

#### 4.1 Monitoring channel parameters using SVM regressor with AAH and ADTS features

This subsection studies the performance of the SVM regressor using AAH and ADTS histogram features. Since the SVM regressor accepts 1D signals, the 2D ADTS histogram features are first converted into a vector for the purpose of regressor training and testing. The goal is to understand which features are more powerful for monitoring the parameters of the FSO channel.

We investigate the prediction accuracy of each parameter when the other two impairments are present. The investigation is performed for three different values of each parameter: low (OSNR = 5 dB, $C_n^2=10^{-17}$ m$^{-2/3}$, and $\zeta$ = 0.5), medium (OSNR = 13 dB, $C_n^2=10^{-15}$ m$^{-2/3}$, and $\zeta$ = 2.5), and high (OSNR = 23 dB, $C_n^2=10^{-13}$ m$^{-2/3}$, and $\zeta$ = 5). The prediction accuracy results in Fig. 6(a)-(b) show that the performance of the SVM regressor for predicting the $C_n^2$ and $\zeta$ parameters using AAH and ADTS histogram features is comparable, i.e. SVM performance under both features is almost the same. However, the performance of the SVM regressor for predicting the OSNR parameter using ADTS histogram features is better than that using AAH features in most cases, as illustrated in Fig. 6(c). To have a deeper look at these results, we show in Fig. 7 the OSNR parameter prediction accuracy for three cases: $C_n^2=10^{-13}$ m$^{-2/3}$ and $\zeta$ = 0.5 (left column), $C_n^2=10^{-13}$ m$^{-2/3}$ and $\zeta$ = 2.5 (middle column), and $C_n^2=10^{-13}$ m$^{-2/3}$ and $\zeta$ = 5 (right column). The first row in Fig. 7 shows the prediction results graphically using the boxplot method, where a box with a smaller height (narrower spread) indicates higher prediction accuracy. The $x$-axis represents the true value of the OSNR parameter in decibels, while the $y$-axis represents the predicted value. We notice from the results in Fig. 7(a)-(c) that the SVM regressor using ADTS histogram features is able to predict the OSNR parameter with high accuracy, where $R^2\geq 0.98$. Note that the high prediction accuracy of the SVM regressor using ADTS histogram features is due to the separability of these features.

This separability is illustrated in Fig. 7(d)-(f) using the t-distributed stochastic neighbor embedding (t-SNE) algorithm. This algorithm is used to reduce the dimensionality of the input feature vectors from 6400 to 2, while preserving both the local and global structure of the data, which facilitates visual inspection. That is, the t-SNE algorithm visualizes the data in 2D space, and its results are used as an indicator of the separability of features. A detailed description of the t-SNE algorithm is reported in [31]. Note that the t-SNE separability depends on the type of the utilized features, such as AAH or ADTS, while the quality of prediction of each parameter depends on the utilized predictor. In Fig. 7(d)-(f) we see that all features pertaining to the 10 values of the OSNR parameter are totally separable with no overlap. Therefore, the SVM regressor using ADTS histogram features is able to predict the OSNR parameter with high accuracy.
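The conversion of the 2D ADTS histogram into the 6400-element vector fed to the SVM regressor (and to t-SNE) can be sketched as below. The toy waveform, the tap delay expressed as a number of samples, the shared amplitude range for both axes, and the function names are assumptions made for illustration; only the 80$\times$80 size and the 8192 pairs per realization come from the text.

```python
import random


def adts_histogram(pairs, bins=80):
    """2D histogram of delay-tap pairs (p_i, q_i) with a shared amplitude range."""
    values = [v for pair in pairs for v in pair]
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0   # guard against a constant signal
    hist = [[0] * bins for _ in range(bins)]
    for p, q in pairs:
        row = min(int((p - lo) / width), bins - 1)
        col = min(int((q - lo) / width), bins - 1)
        hist[row][col] += 1
    return hist


def flatten(hist):
    """Concatenate the rows of the 2D histogram into one feature vector."""
    return [count for row in hist for count in row]


# Toy waveform (hypothetical): random OOK bits held for 8 samples, plus noise.
rng = random.Random(0)
SAMPLES_PER_BIT = 8
TAP_DELAY_SAMPLES = 4  # ASSUMPTION: half a bit period, for illustration only
bits = [rng.randint(0, 1) for _ in range(4096)]
wave = [b + rng.gauss(0.0, 0.1) for b in bits for _ in range(SAMPLES_PER_BIT)]

# Asynchronous sampling: pick 8192 random instants and read both taps.
starts = [rng.randrange(len(wave) - TAP_DELAY_SAMPLES) for _ in range(8192)]
pairs = [(wave[i], wave[i + TAP_DELAY_SAMPLES]) for i in starts]

hist2d = adts_histogram(pairs, bins=80)
vector = flatten(hist2d)
print(len(hist2d), len(hist2d[0]), len(vector), sum(vector))
```

The CNN regressor would consume `hist2d` directly as an image-like input, whereas the SVM regressor would use the flattened `vector`.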

In Fig. 7(g)-(i) we show the boxplot for the OSNR parameter prediction using SVM regressor with AAH features. The performance is poor especially for the worst channel condition described by high pointing errors and turbulence in Fig. 7(g). In this figure, the prediction accuracy becomes worse for the high OSNR values because these values have AAH features that are highly correlated. This difficulty in predicting the OSNR parameter using AAH features is explained using the t-SNE algorithm’s results in Fig. 7(j). This figure clearly shows that the AAH features of high OSNR parameter values overlap. However, the difficulty of predicting the OSNR parameter reduces as the channel conditions improve, as shown in Fig. 7(h)-(i), and the corresponding separability results in Fig. 7(k)-(l).

#### 4.2 Monitoring channel parameters using 2D ADTS histogram features

The results in the previous subsection showed that the ADTS histogram features improve the performance of the SVM regressor compared with the AAH features. In this subsection, we consider ADTS histogram features as an input for training two types of regressors: SVM and CNN. The goal is to study the capability of these two regressors in predicting the FSO channel parameters. The prediction accuracy results for the three parameters are illustrated in Fig. 8. For the OSNR parameter, Fig. 8(a) shows that both regressors have comparable prediction accuracy ($R^2\geq 0.98$), with the CNN regressor being superior in some cases, where it achieves $\sim$1% improvement. For the $C_n^2$ parameter, the results in Fig. 8(b) indicate that the CNN regressor has slightly higher prediction accuracy, in the range of $\sim$1% to $\sim$4%. The minimum prediction accuracy achieved under the harsh channel condition defined by OSNR = 5 dB is $\sim$0.73 and $\sim$0.75 for the SVM and CNN regressors, respectively. The results of predicting the $\zeta$ parameter are reported in Fig. 8(c), where the performance of both regressors is comparable except for the harsh channel condition defined by OSNR = 5 dB and $C_n^2 = 10^{-13}$ to $10^{-15}$ m$^{-2/3}$, where the SVM regressor obtains $\sim$5% better accuracy than the CNN regressor. It is clear from the previous discussion that the SVM regressor performs well, even though the training feature is originally 2D, which is more suitable for a CNN regressor than for an SVM regressor. It is worth mentioning that the average time to predict the impairment value, using a machine equipped with an AMD 1950X processor and 32 GB of memory, is 29.8 $\mu$sec for the SVM and 270.8 $\mu$sec for the CNN. That is, the average time taken by the CNN is approximately 9 times that taken by the SVM, which is an advantage for the SVM regressor.

#### 4.3 Monitoring channel parameters with different baud rates and using ADTS histogram features

In this subsection, we investigate the performance of the CNN regressor for monitoring the three channel parameters at different transmission rates (1 Gbps, 10 Gbps, and 40 Gbps) using ADTS histogram features. As the transmission data rate increases, the signal becomes more affected by the channel impairments, which degrades the regressor’s prediction accuracy. Therefore, the 40 Gbps transmission signal is the most distorted, while the 1 Gbps transmission signal is the least distorted. However, the results in Fig. 9(a)-(c) show that the prediction accuracy of the OSNR parameter as a function of the $C_n^2$ and $\zeta$ parameters is comparable for all transmission speeds, with accuracy $\geq$ 0.98.

Figure 9(d)-(f) presents the prediction accuracy of the $\zeta$ parameter as a function of different values of the OSNR parameter and $C_n^2$ parameter. For a harsh channel that is subject to high ASE noise (OSNR = 5 dB) and turbulence ($C_n^2$ = 10$^{-13}$), the prediction accuracy drops to 0.92, 0.86, and 0.84 for 1 Gbps, 10 Gbps, and 40 Gbps transmission speeds, respectively. As the channel conditions improve, the prediction accuracy increases to > 0.99 for light channel conditions. Finally, in Fig. 9(g)-(i), we present the results of the $C_n^2$ parameter prediction accuracy as a function of different values of the OSNR parameter and $\zeta$ parameter. The results show that for the harsh channel conditions described by OSNR = 5 dB, the prediction accuracy of the $C_n^2$ parameter drops to 0.27 for the 40 Gbps transmission speed. For the 1 Gbps and 10 Gbps transmission speeds, the prediction accuracy under the same conditions is 0.94 and 0.76, respectively. These results reflect the effect of a harsh environment on high-speed transmission signals. In addition, we notice from Fig. 9(g)-(i) that for the 40 Gbps transmission speed, the prediction accuracy for the $C_n^2$ parameter with OSNR = 5 dB drops when the channel conditions improve in terms of the pointing errors, from $\zeta$ = 0.5 in Fig. 9(g) to $\zeta$ = 2.5 and 5 in Figs. 9(h) and (i). This is because under these channel conditions, the ADTS histogram features of the 10 values of $C_n^2$ are highly correlated and their t-SNE plots overlap, as shown in Fig. 10, which degrades the capability of the CNN regressor to predict the $C_n^2$ parameter with high accuracy.

#### 4.4 Comparing ML with non-ML methods

It is relevant here to compare the performance of ML monitoring methods, such as the SVM regressor, with that of non-ML monitoring approaches. For that purpose, we have selected a non-ML method that uses the signal’s Q-factor as a monitoring parameter. This method is one of the most commonly used non-ML methods in the literature. More details about this method can be found elsewhere [32,33]. Figure 11 shows the performance of the SVM regressor using ADTS features and of the non-ML method for predicting the OSNR parameter under the same impairments defined in Fig. 7. It is clear that the ML-based regressor has better performance in terms of the prediction accuracy metric ($R^2$) than the non-ML method, especially for severe channel conditions defined by $\zeta =0.5$ and $C_n^2$ = 10$^{-13}$.

## 5. Conclusion

In this work, we investigated the prediction of different types of FSO channel impairments using ML algorithms. The results showed the capability of ML in predicting three types of impairments under moderate to light channel conditions. Moreover, it has been found that ML regressors that use ADTS histogram features provide better prediction accuracy than those utilizing AAH features. The results also show that the prediction accuracy of the $C_n^2$ parameter drops to very low values under high transmission speeds, i.e. 40 Gbps, and harsh channel conditions defined by OSNR = 5 dB. However, for moderate channel conditions (i.e. OSNR = 13 dB), the accuracy is generally > 0.8, and it approaches 1 for light channel conditions (i.e. OSNR = 23 dB). Although this work considered three important impairments in FSO links, there are other important impairments whose prediction will help in designing adaptive optical communication systems. These include impairments caused, for example, by fog, dust, and rain. Moreover, signal format classification under FSO channel impairments is another area to be considered in future work.

## Funding

Deputyship for Research & Innovation, Ministry of Education – Saudi Arabia (DRI-KSU-1088).

## Disclosures

The authors declare that there are no conflicts of interest related to this article.

## References

**1. **M. Z. Chowdhury, M. T. Hossan, A. Islam, and Y. M. Jang, “A comparative survey of optical wireless technologies: Architectures and applications,” IEEE Access **6**, 9819–9840 (2018). [CrossRef]

**2. **M. A. Khalighi and M. Uysal, “Survey on free space optical communication: A communication theory perspective,” IEEE Commun. Surv. Tutorials **16**(4), 2231–2258 (2014). [CrossRef]

**3. **W. S. Saif, M. A. Esmail, A. M. Ragheb, T. A. Alshawi, and S. A. Alshebeili, “Machine learning techniques for optical performance monitoring and modulation format identification: A survey,” IEEE Commun. Surv. Tutorials **22**(4), 2839–2882 (2020). [CrossRef]

**4. **J. Thrane, J. Wass, M. Piels, J. C. M. Diniz, R. Jones, and D. Zibar, “Machine learning techniques for optical performance monitoring from directly detected PDM-QAM signals,” J. Lightwave Technol. **35**(4), 868–875 (2017). [CrossRef]

**5. **W. S. Saif, A. M. Ragheb, H. E. Seleem, T. A. Alshawi, and S. A. Alshebeili, “Modulation format identification in mode division multiplexed optical networks,” IEEE Access **7**, 156207–156216 (2019). [CrossRef]

**6. **F. Musumeci, C. Rottondi, A. Nag, I. Macaluso, D. Zibar, M. Ruffini, and M. Tornatore, “An overview on application of machine learning techniques in optical networks,” IEEE Commun. Surv. Tutorials **21**(2), 1383–1408 (2019). [CrossRef]

**7. **L. Darwesh and N. S. Kopeika, “Deep learning for improving performance of OOK modulation over FSO turbulent channels,” IEEE Access **8**, 155275–155284 (2020). [CrossRef]

**8. **S. R. Park, L. Cattell, J. M. Nichols, A. Watnik, T. Doster, and G. K. Rohde, “De-multiplexing vortex modes in optical communications using transport-based pattern recognition,” Opt. Express **26**(4), 4004–4022 (2018). [CrossRef]

**9. **Q. Tian, Z. Li, K. Hu, L. Zhu, X. Pan, Q. Zhang, Y. Wang, F. Tian, X. Yin, and X. Xin, “Turbo-coded 16-ary OAM shift keying FSO communication system combining the CNN-based adaptive demodulator,” Opt. Express **26**(21), 27849–27864 (2018). [CrossRef]

**10. **S. Lohani, N. J. Savino, and R. T. Glasser, “Free-space optical ON-OFF keying communications with deep learning,” in * Frontiers in Optics / Laser Science* (Optical Society of America, 2020), p. FTh5E.4.

**11. **J. Li, M. Zhang, D. Wang, S. Wu, and Y. Zhan, “Joint atmospheric turbulence detection and adaptive demodulation technique using the CNN for the OAM-FSO communication,” Opt. Express **26**(8), 10494–10508 (2018). [CrossRef]

**12. **S. Lohani and R. T. Glasser, “Turbulence correction with artificial neural networks,” Opt. Lett. **43**(11), 2611–2614 (2018).

**13. **Y. Zhai, S. Fu, J. Zhang, X. Liu, H. Zhou, and C. Gao, “Turbulence aberration correction for vector vortex beams using deep neural networks on experimental data,” Opt. Express **28**(5), 7515–7527 (2020).

**14. **J. Liu, P. Wang, X. Zhang, Y. He, X. Zhou, H. Ye, Y. Li, S. Xu, S. Chen, and D. Fan, “Deep learning based atmospheric turbulence compensation for orbital angular momentum beam distortion and communication,” Opt. Express **27**(12), 16671–16688 (2019).

**15. **W. Xiong, P. Wang, M. Cheng, J. Liu, Y. He, X. Zhou, J. Xiao, Y. Li, S. Chen, and D. Fan, “Convolutional neural network based atmospheric turbulence compensation for optical orbital angular momentum multiplexing,” J. Lightwave Technol. **38**(7), 1712–1721 (2020).

**16. **A. Ragheb, W. Saif, A. Trichili, I. Ashry, M. A. Esmail, M. Altamimi, A. Almaiman, E. Altubaishi, B. S. Ooi, M.-S. Alouini, and S. Alshebeili, “Identifying structured light modes in a desert environment using machine learning algorithms,” Opt. Express **28**(7), 9753–9763 (2020).

**17. **M. M. Ahmed, K. T. Ahmmed, A. Hossan, and M. R. Hossain, “Performance of free space optical communication systems over exponentiated Weibull atmospheric turbulence channel for PPM and its derivatives,” Optik **127**(20), 9647–9657 (2016).

**18. **A. A. Farid and S. Hranilovic, “Outage capacity optimization for free-space optical links with pointing errors,” J. Lightwave Technol. **25**(7), 1702–1710 (2007).

**19. **F. Benkhelifa, Z. Rezki, and M. Alouini, “Low SNR capacity of FSO links over gamma-gamma atmospheric turbulence channels,” IEEE Commun. Lett. **17**(6), 1264–1267 (2013).

**20. **A. Al-Habash, L. C. Andrews, and R. L. Phillips, “Mathematical model for the irradiance probability density function of a laser beam propagating through turbulent media,” Opt. Eng. **40**(8), 1554–1562 (2001).

**21. **P. K. Sharma, A. Bansal, P. Garg, T. A. Tsiftsis, and R. Barrios, “Performance of FSO links under exponentiated Weibull turbulence fading with misalignment errors,” in 2015 IEEE International Conference on Communications (ICC) (2015), pp. 5110–5114.

**22. **V. Vapnik, *The Nature of Statistical Learning Theory* (Springer Science & Business Media, 2013).

**23. **D. Basak, S. Pal, and D. C. Patranabis, “Support vector regression,” Neural Information Processing-Letters and Reviews **11**, 203–224 (2007).

**24. **R. K. Pandit and D. Infield, “Comparative assessments of binned and support vector regression-based blade pitch curve of a wind turbine for the purpose of condition monitoring,” Int. J. Energy Environ. Eng. **10**(2), 181–188 (2019).

**25. **V. Cherkassky and Y. Ma, “Practical selection of SVM parameters and noise estimation for SVM regression,” Neural Networks **17**(1), 113–126 (2004).

**26. **I. Goodfellow, Y. Bengio, and A. Courville, *Deep Learning* (The MIT Press, 2016). http://www.deeplearningbook.org.

**27. **X. Fan, Y. Xie, F. Ren, Y. Zhang, X. Huang, W. Chen, T. Zhangsun, and J. Wang, “Joint optical performance monitoring and modulation format/bit-rate identification by CNN-based multi-task learning,” IEEE Photonics J. **10**(5), 1–12 (2018).

**28. **F. N. Khan, Y. Zhou, Q. Sui, and A. P. T. Lau, “Non-data-aided joint bit-rate and modulation format identification for next-generation heterogeneous optical networks,” Opt. Fiber Technol. **20**(2), 68–74 (2014).

**29. **T. Mrozek, “Simultaneous monitoring of chromatic dispersion and optical signal to noise ratio in optical network using asynchronous delay tap sampling and convolutional neural network (deep learning),” in 2018 20th International Conference on Transparent Optical Networks (ICTON) (2018), pp. 1–4.

**30. **T. O. Kvalseth, “Note on the R² measure of goodness of fit for nonlinear models,” Bull. Psychon. Soc. **21**(1), 79–80 (1983).

**31. **W. S. Saif, T. Alshawi, M. A. Esmail, A. Ragheb, and S. Alshebeili, “Separability of histogram based features for optical performance monitoring: An investigation using t-SNE technique,” IEEE Photonics J. **11**(3), 1–12 (2019).

**32. **Y. Yu, B. Zhang, and C. Yu, “Optical signal to noise ratio monitoring using single channel sampling technique,” Opt. Express **22**(6), 6874–6880 (2014).

**33. **I. Shake, H. Takara, and S. Kawanishi, “Technology for flexibly monitoring optical signal quality in transparent optical communications [invited],” J. Opt. Netw. **6**(11), 1229–1235 (2007).