
Classifying beams carrying orbital angular momentum with machine learning: tutorial

Open Access

Abstract

This tutorial discusses optical communication systems that propagate light carrying orbital angular momentum through random media and use machine learning (aka artificial intelligence) to classify the distorted images of the received alphabet symbols. We assume the reader is familiar with either optics or machine learning but is likely not an expert in both. We review select works on machine learning applications in various optics areas with a focus on beams that carry orbital angular momentum. We then discuss optical experimental design, including generating Laguerre–Gaussian beams, creating and characterizing optical turbulence, and engineering considerations when capturing the images at the receiver. We then provide an accessible primer on convolutional neural networks, a machine learning technique that has proved effective at image classification. We conclude with a set of best practices for the field and provide an example code and a benchmark dataset for researchers looking to try out these techniques.

© 2022 Optica Publishing Group

1. INTRODUCTION

In recent years, several researchers have built reliable optical communication systems consisting of a laser source, modulated to produce structured light that encodes the symbols of an alphabet; propagated through underwater or atmospheric optical turbulence; recorded via a camera at the reception site; and decoded (see Figs. 1, 2, and 4). Designing such systems has been the focus of sustained research for decades, but more recently, the integration of machine learning and artificial intelligence (ML/AI) techniques at the receiver to improve symbol classification accuracy, especially when using large alphabets in turbulent or attenuating environments, has invigorated the field. See Section 2 for a detailed reference overview and Fig. 4 for an illustration of some of the challenges posed by turbulent environments, specifically the ability of ML/AI to decode a distorted image.

Fig. 1. Conceptual structured light communication system.

Fig. 2. Generic experimental design setup for an in-lab communication system study. A spatial light modulator (SLM) is used to generate structured light.

In this tutorial, we seek to illustrate the scientific, algorithmic, and practical engineering considerations in building systems that utilize beams carrying orbital angular momentum (OAM). We focus on light that carries OAM because it is possible to encode information by changing the rate at which photons spin along the propagation length of one wavelength. In particular, beams that carry OAM are of specific interest for communication systems due to their orthogonality properties, which allow beam designers to superimpose $N$ basis beams to create an alphabet of ${2^N}$ symbols, thereby increasing data rates $N$-fold (Fig. 4, left). Since each beam propagates independently, we can use the constructive and destructive interference intensity patterns to distinguish among the messages, even when several beams are transmitted simultaneously. It is important to underline that the construction of the beams depends only on the phase changes imparted to the transmitted beam, and at the receiver, we only need to record its intensity to successfully decode the message. ML/AI algorithms are effective tools for classifying the received distorted images (Fig. 4, right). Additionally, we explain how to generate Laguerre–Gaussian (LG) beams, the most commonly used structured light; discuss propagation effects in air and water subject to optical turbulence; and list a number of possible engineering considerations when capturing the images at the receiver.

The tutorial is organized as follows. Section 2 provides a non-exhaustive overview of closely related work, highlighting some competing approaches. Section 3 discusses the experimental design, beginning with generating structured light. Section 4 discusses convolutional neural networks (CNN), an ML/AI technique frequently used for symbol classification that has proved effective in many other image recognition applications. Section 5 recaps important points and describes some important open problems. Appendices A and B cover useful practical guidelines and best practices and provide a link to our shared real-world dataset and code that can be used to illustrate the techniques presented in this tutorial.

2. RELATED WORK ON ML/AI AND LASER LIGHT CARRYING OAM

AI/ML-based classification of beams carrying OAM is a relatively young field of research in optics, but it has already demonstrated a significant impact in a number of areas. The objective of this section is to provide a sampling of the broad spectrum of possibilities and applications where beams carrying OAM are used in conjunction with ML/AI algorithms; it is not meant as an exhaustive survey. To that end, Sections 2.A and 2.B highlight a number of new and fundamental research areas in the field, focusing only on the ML/AI-based classification of beams carrying OAM. By giving a few relevant details for each of the cited works, our intention is simply to convey the depth of their research. In that vein, we encourage the interested reader to use the given references as a starting point in their own exploration of this exciting new field of optics. Note that many authors do not report the detailed conditions under which their research has been conducted, making it difficult to compare the various works using common metrics, such as accuracy or classification speed. In Appendix B and Section 3, we provide some suggestions for standardizing shared datasets and reporting results.

As depicted in Figs. 1, 2, and 4, a key challenge for symbol classification algorithms is to overcome the distortion induced by optical turbulence in the random media through which the beam propagates. We explicitly call out three distinct approaches to generating such distortions: simulation, hybrid-experimental, and full-experimental. The simulation approach implements purely mathematical models of optical turbulence (e.g., Kolmogorov, Nikishov, or von Kármán spectra) and uses a numerical solver or a physics-based simulation package to create synthetic distortions of the ideal beam front images. In the simulation approach, no physical laser light is generated. In a hybrid-experimental approach, actual laser beams are generated, and real images are captured at the receiver. However, the beams are propagated through a series of phase screens, generated by a spatial light modulator or spiral phase plates, engineered to induce a desired level of synthetic optical turbulence. The plates or phase screens are randomly drawn from the same mathematical spectral models as in the simulation approach. Finally, the full-experimental approach involves propagating the beam through a complex environment that induces optical turbulence via some in natura mechanism. Here, possibilities include field tests in uncontrolled, realistic environments as well as laboratory experiments where optical turbulence is generated along the propagation path in a controlled fashion using heating elements. Each method has its strengths and drawbacks. For example, simulation and hybrid-experimental approaches provide full control and repeatability of the environmental parameters, while it can be challenging to accurately control and characterize the environmental parameters using the full-experimental approach. That said, the full-experimental approach obviously provides a more realistic and random environment for the beam propagation.

For ML/AI-based applications, we strongly advocate for the full-experimental approach. The ultimate motivation for applying ML/AI models to optical communication systems is to achieve good classification accuracy in real environments. It is imperative, therefore, that ML/AI systems be trained on data from real environments. Yet both the simulation and hybrid-experimental approaches rely on underlying mathematical models of optical turbulence which, like all models, contain embedded assumptions and only approximate the underlying physical phenomena. Additionally, when simulations are used to verify error rates at practically required levels, significant computational power and memory storage are necessary (see Appendix A). Worse yet, the numerical solver or, in the case of the hybrid-experimental approach, the construction of the SLM screens may impart approximations and artifacts. In addition, these models are only accurate in a certain range of operating conditions; therefore, by definition, synthetic datasets will never include images captured outside of those conditions. The power of deep ML/AI models lies in their expressiveness. However, there is a real danger that they can learn these induced inaccuracies, limitations, and artifacts, making them unlikely to generalize well to realistic environments. The use of realistic, diverse, and representative training datasets is a fundamental principle in machine learning.

Section 2.A discusses a sampling of related work using the simulation and hybrid-experimental methods, while Section 2.B focuses on work using the full-experimental method. Note that suggested search keywords are italicized and discussed further in Section 3.

A. ML Applications in Optics with Beams Carrying OAM: Research Related to Simulation and Hybrid-Experimental Approaches

This section of the tutorial highlights a number of works where the simulation and hybrid-experimental approaches are used to study a wide variety of topics in optics. First, we highlight research related primarily to the properties of light, and then we highlight a number of works related to free space optics communication systems.

One of the first papers to integrate machine learning and beams carrying OAM focused on the use of simulated topological charges (a property of light) of up to 100 and also added random noise for testing their classification algorithm [1].

The idea of performing mathematical operations using optics with classification enhancement using ML/AI is a new approach: in [2], a diffractive deep neural network is an integral part of an optical computer that performs basic operations at the speed of light, with hyperparameters trained using simulations of Hermite–Gaussian (HG) beams in weak atmospheric turbulence. Additionally, optical diffractive neural networks and single-OAM-mode Laguerre–Gaussian (LG) beams have been used to represent logical states (AND, OR, etc.) in a simulation of an optical computer [3].

Compensating for phase distortions on propagation is challenging due to the unpredictable impact of the environment. Here, we highlight several examples from holography and wavefront sensing that utilize ML/AI to improve performance. In [4], the authors jointly classified the OAM mode and compensated for wavefront distortion using a convolutional neural network with simulated propagation of LG beams in moderate atmospheric turbulence. A depth-controllable imaging technology in OAM deep multiplexing holography was introduced via a prototype of a five-layer optical diffractive neural network design [5]. In simulated weak atmospheric optical turbulence, both the OAM mode and the spatial depth of the incident light are simultaneously modulated via unitary transformations and linear modulations. Two CNNs were trained to reconstruct the encrypted hologram and interpret the topological information for the simulation of sixteen beams carrying OAM with topological charges ranging from 10 to 160 [6]. In addition, an attempt has been made to correct distortions through the relationship between the phase screens and intensity deviations. The turbulence aberration correction CNN (TACCNN) model learns the mapping between the intensity profiles of the distorted vector vortex beam (VVB) modes and the turbulence phase screens, and it has the advantage of reducing calibration time [7]. Polarization and OAM are coupled in the VVB modes using an SLM, and turbulence is also generated using an SLM. The results show that the corrected optical mode profiles at the receiver are nearly identical to the desired profiles, and the mode purity increases significantly. Also, the OAM spectrum of a beam propagating through weak turbulence can be used to determine the complex amplitude distributions; in [8], an intensity pattern and its corresponding OAM spectrum constitute a data pair, and EfficientNet was trained on close to 30,000 simulated pairs. The authors successfully recovered the spectrum under different levels of Gaussian noise and zoom.

Fiber networks are critical for our global communication systems, and here, we highlight a few advancements of OAM propagation along with the use of ML/AI. In [9], a machine-learning-assisted inverse design strategy for an ultra-wideband mode-selective coupler was developed by learning the complex mapping between the structural parameters of the bridge fiber and the effective refractive index. In this application, custom deep CNNs were used with simulations of linearly polarized light carrying OAM coupled into fiber. Additionally, a simple diffraction-based deep learning system was used to reconstruct a high-dimensional orbital angular momentum spectrum via a single-shot measurement [10].

Detecting polarization and modes of light carrying OAM has also been demonstrated using ML/AI in [11], where the authors used astigmatic transformation and machine-learning processing. Also, the construction of vortex beams using q-plates and the characterization of high-dimensional resources for quantum protocols, retrieving the position on the Poincaré sphere corresponding to the generated states and thereby classifying specific polarization patterns using linear support vector machines, is described in [12]. In [13], a pre-trained deep network, ResNet-18, was used in conjunction with interferometry to distinguish between different modes. An interferogram between the LG beam and the Gaussian beam is used to identify the OAM mode in [14], where a baseline CNN, CNN-GS (Gerchberg–Saxton algorithm), and VIR-GSF (vortex interferogram recognition with a Gaussian smoothing filter) are applied, with VIR-GSF having the best performance.

Next, we give a number of examples of ML/AI applications in the area of free-space communication simulations. The publications showcase the breadth and depth of research in the areas of radio frequency communications and adaptive optics, with an emphasis on simulating the impact of different CNN architectures, transmission distances, signal-to-noise ratios (SNRs), modulation schemes, beam misalignment, and turbulence strength in both the atmosphere and various ocean and seawater types. All of the findings suggest the benefits of using ML/AI in classification algorithms.

In [15], a deep CNN (ResNet) was used to detect the fractional and integer OAM modes of vortex beams under partial occlusion at radio frequencies. This simulation propagated the beams over a distance equal to ${100} \times$ their wavelength. In [16], the authors simulated integer and fractional topological charges (2 to 2.9) as well as fractional polynomial orders (0 to 1) of LG beams, which were then classified in moderate atmospheric turbulence using a deep CNN. Adaptive demodulating accuracy for five kinds of CNN architectures was demonstrated in [17]: a self-organizing network, pre-trained AlexNet, a one-convolution-layer shallow CNN, a two-hidden-layer shallow neural network, and a two-convolution-layer CNN; the simulation also included the propagation of LG beams with topological charges up to $\pm {16}$ over a 1 km link in moderate atmospheric turbulence. In [18], strong atmospheric turbulence in addition to beam misalignment with tilt angles from 5° to 30° was considered. In [19], moderate turbulence levels, transmission distances, mode spacings, and multiplexed modes were considered, using a custom-made CNN for classification. The impacts of turbulence strength, transmission distance, signal-to-noise ratio, and CNN hyperparameters were investigated using a 16-symbol alphabet based on coding and chaotic interleaving [20]. In [21], AlexNet was modified for mode classification by using a small portion of the speckle fields, eliminating the need for capturing the whole modal field; HG and LG beams were simulated propagating in moderate atmospheric turbulence over 100 m. In [22], the authors jointly increased the modulation cardinality and controlled the beam diameter in an LG-OAM free-space optical communication system and also used a CNN to demodulate the received messages. In [23], an OAM-recognition neural network was used to extract features of different fractional vortex beams and then acquire the decision boundary for discrimination (100 fractional topological charges between 1 and 2 were simulated using a spatial light modulator). The impacts of different modulation orders, oceanic turbulence strengths, seawater types, signal-to-noise ratios, transmission distances, and CNN architectures (LeNet-5, AlexNet, and GoogLeNet Inception v4) were investigated in [24]. A communication link introduced in [25] used spatial light modulators (SLMs) to generate LG beams and turbulence and also used conjugate mode sorter beams at the receiver. Deformations introduced by turbulence were linearly decoded in the radon cumulative distribution transform (R-CDT) space through the aid of a non-linear feature extractor (a 1-layer CNN). Then, using the same experimental dataset as in [25], the pre-trained CNN AlexNet outperformed the classical conjugate mode sorting technique [26], proving to be more robust to added sensor noise, unknown levels of turbulence, and training set size.

B. ML Applications in Optics with Beams Carrying OAM: Research Related to the Full-Experimental Approach

In this section, we highlight works that utilize the full-experimental approach to generate realistic optical turbulence prior to ML/AI-based classification. We feel this approach represents the gold standard for ML/AI work.

We highlight two research teams working in the underwater environment, demonstrating highly reliable classification in well-characterized media. In [27], the authors used two custom-made machine learning attenuation model neural networks, based on (1) automatic differentiation and (2) the R-CDT, which have strong fundamental ties to the underlying physics of the optimal transport of photons. The data was obtained experimentally, with 16 symbols transmitted under various levels of underwater attenuation, showing that classification in the R-CDT space is more accurate than working in the raw image space [28].

Another research team performing an underwater experiment [29] with a specially designed alphabet of 16 symbols propagating through the same levels of water turbidity showed that images collected under the most adverse conditions the authors observed provided good network training for successful classification of previously unseen images collected under more favorable conditions. Their custom-developed lightweight CNN used limited training data, and the turbidity levels were representative of littoral areas, which can have higher concentrations of attenuating particles. The authors considered attenuation spanning six orders of magnitude. In [30], natural convection was experimentally generated, and the recorded data was augmented by random reflections, rotations, scaling, translations, and shear. In [31], the authors fully characterized the underwater optical turbulence created by natural convection and demonstrated reliable classification of a 32-symbol alphabet under very strong turbulence.

Here, we highlight experiments that used significant path lengths under natural conditions. A 150 m free-space link was built using a flat, four-inch mirror at an altitude of 1750 m above sea level in Johannesburg, South Africa [32]. A digital micro-mirror device was used to generate the HG and LG spatial modes, and then a conditional generative adversarial network, a framework that learns to generate new data with the same statistics as their training sets, was applied.

The first study of propagating beams carrying OAM (LG and HG) in a desert storm with a visibility of up to 9 m was given in [33]. In this research, a lab-emulated desert storm was created in a ${90} \times {40} \times {40}\;{{\rm cm}^3}$ controlled-environment chamber, where dust particles collected during a real dust storm, with an average diameter of 17.3 µm, were homogeneously distributed using fans at the bottom. Support vector machine (SVM) and CNN classifiers outperformed the k-nearest neighbor algorithm.

From all of the work discussed in Section 2, it is evident that the ML/AI approach has made a significant impact on classification algorithms in different applications in optics and is a growing field of research.

3. DESIGNING EXPERIMENTS IN THE FULL-EXPERIMENTAL APPROACH TO STUDY ML/AI-AIDED LASER COMMUNICATION SYSTEMS

In-lab experimental designs for optical beam propagation through complex media are many and varied. This section provides a baseline tutorial for designing experiments that use the full-experimental approach to induce optical turbulence on laser beams carrying OAM, integrated with machine learning techniques. It is structured as follows: Section 3.A focuses on baseline techniques for generating beams carrying OAM, Section 3.B focuses on characterizing optical turbulence, Section 3.C focuses on experimental design options, and Appendix B focuses on additional considerations for beam detection specifically for machine learning applications.

Fig. 3. Amplitude and phase distributions, respectively, for Laguerre–Gaussian beam (a) and (b) ${U_{0,1}}$ and (c) and (d) ${U_{1,9}}$. Note that the intensity is normalized to 1, and the phase changes from 0 to ${2}\pi$, shaded from blue (zero value) to yellow (max value) [34].

A. Generating Laguerre–Gaussian Beams Carrying OAM

While there are many possible ways to generate beams carrying OAM [35] (spiral phase plates, deformable mirrors, q-plates, etc.), we focus on the spatial light modulator (SLM). SLMs are devices that allow spatial control of the phase across their area, enabling a straightforward mechanism for beam shaping [36]. We provide a baseline description of how to use an SLM to generate a Laguerre–Gaussian (LG) beam, which is a type of beam that carries OAM [34]. Since we use a phase-only SLM, we calculate the phase of an LG optical beam at the origin using the expression for the light field $U$, given in Eq. (1) [35], which is presented in cylindrical coordinates $({\rho ,\theta ,z})$ and has parametric dependence only on the radius of the beam at the source ${w_0}$ and the wavelength of light $\lambda$. Let us define the Rayleigh range as $R = \frac{{\pi w_0^2}}{\lambda}$ and the effective radius of the beam at the distance $z$ as $w(z) = {w_0}\sqrt {1 + {{({\frac{z}{R}})}^2}}$. Then, the field ${U_{n,m}}({\rho ,\theta ,z})$ is given by

$$\begin{split}{U_{n,m}}({\rho ,\theta ,z} ) &= \sqrt {{I_0}} \frac{{{w_0}}}{{w(z )}}{e^{- \frac{{{\rho ^2}}}{{w{{(z )}^2}}}}}{e^{- i\frac{{\frac{{2\pi}}{\lambda}{\rho ^2}z}}{{2({{z^2} + {R^2}} )}}}}{e^{- i\frac{{2\pi}}{\lambda}z}} \\ &\quad\times\sqrt {\frac{{2n!}}{{\pi w_0^2({n + | m |} )!}}} {\left({\frac{{\sqrt 2 \rho}}{{w(z )}}} \right)^{| m |}}\\ &\quad\times L_n^{| m |}\left({\frac{{2{\rho ^2}}}{{w{{(z )}^2}}}} \right){e^{- i({2n + | m | + 1} )\arctan \frac{z}{R} + im\theta }},\end{split}$$
where ${I_0}$ is the initial intensity (usually set to 1), $n$ is the order of the Laguerre polynomial, and $m$ is the topological charge of the beam carrying OAM. It is also possible to generate LG beams from a linear combination of Hermite–Gaussian beams (see [35], Chap. 2). Figure 3 illustrates the intensity and phase distributions for beams ${U_{0,1}}$ and ${U_{1,9}}$. It is important to recognize that since the intensity spatial distribution is readily captured by a camera, it is possible to use a variety of ML/AI algorithms for classification, and most of the available research is focused on beams that are identifiable by the light amplitude. It is very difficult to experimentally acquire the phase information due to the complexity and limited availability of the appropriate instrumentation.
Fig. 4. Example of 15-beam alphabet created by superimposing 4 Laguerre–Gaussian basis beams. (a) The theoretical intensities and (b) the beams after propagation through moderate-strength turbulence underwater. Note that beam 16 is not shown since it is trivial with zero intensity. The notation in this image is the binary code representing the superposition of the basis beams ${1000} \Rightarrow {{\rm U}_{0,1}}$, ${0100} \Rightarrow {{\rm U}_{1,4}}$, ${0010} \Rightarrow {{\rm U}_{0,-6}}$, and ${0001} \Rightarrow {{\rm U}_{1,8}}$. Reprinted from [37]; author’s approval obtained.

For a sample code that can be used to generate beams carrying OAM, see [34,38,39] and our sample code and data [40].
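As an illustration (and not a substitute for the referenced codes), the following minimal Python sketch evaluates Eq. (1) at the source plane ($z = 0$) on a square grid and extracts the phase pattern one would display on a phase-only SLM; it assumes NumPy and SciPy are available, and the grid size, beam radius, and mode indices are purely illustrative.

```python
# Minimal sketch (not the referenced codes): sample the LG field of Eq. (1) at the
# source plane (z = 0) and extract the phase pattern for a phase-only SLM.
# Grid size, beam radius, and charge values are illustrative.
import numpy as np
from math import factorial, pi
from scipy.special import eval_genlaguerre

def lg_field_z0(n, m, w0, grid_pts=512, extent=5e-3):
    """Complex LG field U_{n,m}(rho, theta, z=0) on a square grid of width `extent` [m]."""
    x = np.linspace(-extent / 2, extent / 2, grid_pts)
    X, Y = np.meshgrid(x, x)
    rho, theta = np.hypot(X, Y), np.arctan2(Y, X)
    norm = np.sqrt(2 * factorial(n) / (pi * w0**2 * factorial(n + abs(m))))
    radial = ((np.sqrt(2) * rho / w0) ** abs(m)
              * eval_genlaguerre(n, abs(m), 2 * rho**2 / w0**2)
              * np.exp(-rho**2 / w0**2))
    return norm * radial * np.exp(1j * m * theta)   # at z = 0 the z-dependent phase factors vanish

U = lg_field_z0(n=0, m=1, w0=1e-3)
phase_mask = np.mod(np.angle(U), 2 * pi)            # 0..2*pi phase, ready to map to SLM gray levels
intensity = np.abs(U) ** 2                          # what a camera would record (cf. Fig. 3)
```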

B. Optical Turbulence Characterization

Optical turbulence is the result of fluctuations in the index of refraction along the propagation path of an electromagnetic beam which, for optical frequencies, is primarily driven by random temperature fluctuations in the medium. Optical turbulence poses a number of challenges with regard to its deleterious effects on laser beam propagation, which can include increased beam spreading, beam wander, and fluctuations in intensity (also known as scintillation), all of which can lead to reduced joules on target or lost bits of digital information (i.e., an increased bit error rate, BER), effectively complicating symbol classification at the receiver.

The strength of optical turbulence can be described by the index of refraction structure constant $C_n^2$, and as such, it should be reported for all experimental data collection scenarios. For atmospheric turbulence, $C_n^2$ can be computed from the temperature structure function ${D_T}(r)$ as follows [41]:

$${D_T}(r) = \langle{({{T_1} - {T_2}})^2}\rangle = \begin{cases}{C_T^2\,l_0^{- 4/3}{r^2}}, & 0 \le r \ll {l_0}\\{C_T^2\,{r^{2/3}}}, & {l_0} \le r \ll {L_0},\end{cases}$$
where $C_T^2$ is the temperature structure constant and ${T_1}$ and ${T_2}$ are the temperatures at two points separated by a distance $r$. The brackets $\langle \rangle$ in Eq. (2) denote the ensemble average of the squared difference in temperatures ${T_1}$ and ${T_2}$, and ${L_0}$ and ${l_0}$ are the outer and inner scales of turbulence, respectively. Additionally, the index of refraction structure function within statistically homogeneous, isotropic turbulence is given by
$${D_n}(r) = \langle{({{n_1} - {n_2}})^2}\rangle = \begin{cases}{C_n^2\,l_0^{- 4/3}{r^2}}, & 0 \le r \ll {l_0}\\{C_n^2\,{r^{2/3}}}, & {l_0} \le r \ll {L_0},\end{cases}$$
where ${n_1}$ and ${n_2}$ are the indices of refraction of the medium at two points separated by a distance $r$, and $C_n^2$ is the index of refraction structure constant with units of $[{{\rm m}^{- 2/3}}]$. Using the relationship in Eq. (2), where thermocouples can be used to measure ${D_T}(r)$, and the assumptions outlined in the formulation of Eq. (3), the constant $C_n^2$ can be calculated utilizing the following formula [41]:
$$C_n^2 = {\left({79 \times {{10}^{- 6}}\frac{P}{{{T^2}}}} \right)^2}C_T^2,$$
where the temperature $T$ is measured in $[{\rm K}]$, the pressure $P$ is in mbar, and $C_T^2$ has units of $[{{\rm K}^2}\,{{\rm m}^{- 2/3}}]$. Note, the index of refraction structure constant $C_n^2$ is a measure of the strength of the refractive index fluctuations in the medium (the atmosphere in this case).
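To make the procedure concrete, the following sketch estimates $C_T^2$ from Eq. (2) in the inertial range (${l_0} \le r \ll {L_0}$) using paired thermocouple readings and converts it to $C_n^2$ via Eq. (4); the sensor values, probe separation, and ambient conditions shown are hypothetical placeholders.

```python
# Minimal sketch (hypothetical sensor data): estimate C_T^2 from Eq. (2) in the
# inertial range and convert it to C_n^2 via Eq. (4). Values below are placeholders.
import numpy as np

def cn2_from_thermocouples(T1_series, T2_series, r, P_mbar, T_mean_K):
    """T1_series, T2_series: simultaneous temperature samples [K] at two points
    separated by r [m]; P_mbar: pressure [mbar]; T_mean_K: mean temperature [K]."""
    T1 = np.asarray(T1_series, dtype=float)
    T2 = np.asarray(T2_series, dtype=float)
    D_T = np.mean((T1 - T2) ** 2)            # ensemble average <(T1 - T2)^2>, Eq. (2)
    C_T2 = D_T / r ** (2.0 / 3.0)            # inertial-range form D_T(r) = C_T^2 r^{2/3}
    C_n2 = (79e-6 * P_mbar / T_mean_K ** 2) ** 2 * C_T2   # Eq. (4), units m^{-2/3}
    return C_n2

# Example with made-up numbers: 1 mm probe separation, sea-level pressure, ~300 K.
print(cn2_from_thermocouples([300.10, 300.32, 300.21], [300.05, 300.20, 300.30],
                             r=1e-3, P_mbar=1013.0, T_mean_K=300.0))
```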

Underwater optical turbulence emulation [42] and generation can also be described by index of refraction fluctuations and temperature gradients. The following references give additional details on differences and similarities encountered with underwater versus atmospheric optical turbulence characterization: [43–45]. Of particular note, for underwater optical turbulence, the temperature gradient of the index of refraction of water can be two orders of magnitude higher than in air [46], which, for in-laboratory optical turbulence generation, allows for a much greater optical turbulence effect over a shorter path length. Figure 4 illustrates the impact of the underwater optical turbulence on a set of beams carrying OAM.

We emphasize the characterization of optical turbulence as a critical component in determining the effectiveness of communication systems when comparing message classification accuracy in the case when ML/AI is used. The characterization of the turbulence is independent of which structured light is transmitted, but if the reader is specifically interested in the impact of optical turbulence on light carrying OAM, the following references give an introduction [47,48].

C. General Experimental Design Guidelines

The test of all knowledge is experiment. Experiment is the sole judge of scientific ‘truth.’

Richard P. Feynman [49]

Figure 2 shows a generic baseline experimental design that is generally agnostic to the method of generating optical turbulence, which can be accomplished in a number of ways (see [46] for a review). Hot-air emulation focuses on generating temperature gradients across the optical beam path in air, typically using heaters (heat guns, hot plates, etc.) and fans along a propagation chamber outfitted with thermocouples for measuring $C_n^2$ [50–53]. Hot-air turbulence emulation and point-source heaters generate natural convection, thus impacting the variation of the refractive index along the propagation path. Those interested in implementing the hybrid-experimental approach should consult the following references for SLMs [54,55] or rotating phase screens [56].

Underwater propagation research has seen much recent activity owing to needs in underwater sensing and communication [57]. For in-laboratory use, underwater natural convection generates a much higher level of turbulence over short distances. Underwater optical turbulence emulators use a tank with optical glass on the boundaries. For example, in [31], researchers used a 3 m tank with controlled-temperature point heaters to generate underwater optical turbulence. They discuss the challenges of generating well-characterized experimental conditions and of increasing the propagation path by adding mirrors.

Rayleigh–Bénard natural convection in air and water is generally considered a well-characterized and consistent complex medium and presents one of the most repeatable environmental conditions for studying optical turbulence’s impact on the propagation of structured light [43,58].

Once a beam carrying OAM is generated and an optical turbulence mechanism is established, the detection of the LG beams becomes the next task. Appendix B gives some guidelines for best practices in collecting a valuable dataset from experimental trials with applications to machine learning.

Fig. 5. Representative CNN architecture.

In order to facilitate the comparison of research findings, we encourage authors of future publications to standardize reporting of the scenarios they studied. Most authors currently include information about the light wavelength and the type of light carrying OAM. However, the most critical information, the characterization of the complex environment through which the beams are propagating, is often omitted. In the case of simulations and hybrid experimentation, since well-established models are utilized, it is straightforward to provide the model used, the environment (air, water, etc.), the strength of the optical turbulence, the length of the propagation path, and the simulation/SLM spatial resolution and cycling rate appropriate for the conditions studied (see Appendix A). In the case of the full-experimental research most suitable for ML/AI methods, documentation of how the turbulence is generated, along with measurements supporting the characterization of its strength, is of utmost significance. In most of the published work, only qualitative explanations of the conditions are given. Data recording information should include the number of images collected, the rate of the recordings, the spatial resolution, the camera aperture settings, and a detailed explanation of how the experiment was conducted (for guidelines, see Appendix B). Notably, training data for a CNN should be representative of the full range of variation likely to be encountered in practice. We provide real-world examples that include our datasets and a simple machine learning code for the readers to familiarize themselves with best practices.
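As a hypothetical illustration of the reporting we advocate, the following record sketches the kind of metadata that could accompany a shared dataset; the field names and values are illustrative only and do not represent an established standard.

```python
# Hypothetical metadata record (field names and values are illustrative, not a standard)
# capturing the experimental conditions we recommend reporting with a dataset.
dataset_metadata = {
    "approach": "full-experimental",            # simulation / hybrid-experimental / full-experimental
    "environment": "water",                     # air, water, ...
    "turbulence_generation": "point heaters, Rayleigh-Benard convection",
    "Cn2_m^-2/3": 5e-11,                        # measured index of refraction structure constant
    "wavelength_nm": 633,
    "beam_type": "superpositions of LG basis beams",
    "propagation_path_m": 3.0,
    "alphabet_size": 16,
    "images_per_symbol": 2000,
    "frame_rate_hz": 500,
    "image_resolution_px": [256, 256],
    "camera": {"bit_depth": 8, "exposure_us": 100, "aperture_f": 2.8},
}
```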

4. CONVOLUTIONAL NEURAL NETWORKS

The CNNs described in this tutorial are a type of supervised learning framework for image classification tasks, which means that the network is trained using a set of images that have been correctly labeled as belonging to one of a finite number of predetermined categories or classes (for example, see Fig. 4). The network tries to iteratively learn an approximation to the unknown function that maps raw input images to class assignments from the labeled examples in the hopes that its accuracy will generalize well to a set of test images it has never seen before.

The CNN architecture is motivated by Hubel and Wiesel’s experiments in the late 1950s on the visual cortices of animals [59], which eventually won them the Nobel Prize in Physiology or Medicine, suggesting that individual neurons only respond to certain patterns (e.g., vertical lines) in a small subset of the field of view (termed the local receptive field) and that the fields of adjacent neurons are organized in an overlapping fashion. Inspired by this work, [60] proposed the neocognitron model, which is likely the first appearance of a CNN in the literature. In that work, multiple convolutional filters with local receptive fields were manually designed to respond to certain patterns (e.g., vertical lines or specific textures), and a neural network was used to learn the weighted combinations of the filter responses to correctly classify an object. In 1998, [61] described a CNN model for handwriting recognition, called LeNet-5, which forms the basis for modern CNN implementations, including our own work in this area [26]. Here, the network inputs are raw images, and it learns the filter coefficients, avoiding the need to manually design them as was done in previous approaches. Later, researchers demonstrated that modern graphics processing units (GPUs) could vastly accelerate the training process, permitting researchers to use more layers, neurons, and training data [62]. This enabled the development of the now-famous AlexNet [63], a very deep CNN with over 60 million learnable parameters (aka weights), which won the 2012 ImageNet Contest involving assigning over a million images to one of a thousand possible classes. Since that time, CNNs have established themselves as the dominant approach to image classification, and many improvements on AlexNet’s architecture have been proposed, which will be discussed below.

A. Basic CNN Architecture

Figure 5 illustrates a typical CNN architecture, consisting of an input layer followed by a repeating series of convolution-pooling-activation layers, a set of dense layers followed by a soft-max function, and an output layer representing the confidence that the input image belongs to each of the possible classes. Each is described in more detail below.

Input Layer: An image is a large array of nonnegative real-valued numbers, ${\boldsymbol I} \in {{\boldsymbol R}^{M \times N \times B}}$, with an arbitrary number of rows $M$, columns $N$, and bit planes (aka channels) $B$, where a given entry ${I_{\rm{rcb}}}$ represents the intensity of a pixel in a given row, column, and bit plane, often stored as a floating point number (between 0 and 1) or an 8-bit integer (0 to 255). Typically, a CNN’s input layer accepts either 2-D (grayscale) or 3-D (color) arrays (usually encoding red, green, and blue bit planes as RGB triples), maintaining the spatial proximity among pixels. No computation occurs in this layer, and there are no learned parameters.

Note that a given network only accepts images of a fixed size, which tend to be very small compared to the resolution of modern digital cameras. For example, AlexNet only works with ${227} \times {227} \times {3}$ RGB images. Larger images must first be downsampled before they can be classified. Due to the uncluttered nature of our beam images we have had great success with grayscale images as small as ${64} \times {64}$ pixels.

Convolution Layer: In image processing, a convolution is an operation where a filter, also termed a kernel or mask, is applied to an image to create a response matrix. The filter, ${\boldsymbol F} \in {{\boldsymbol R}^{n \times n \times B}}$, is typically a small square matrix, representing a receptive field, with an odd number of rows and columns $n$ so that there is a well-defined center entry. The entries ${F_{\textit{ijb}}}$ are called coefficients (see red matrix in Fig. 6). Unlike traditional image processing where coefficients are human-selected, each coefficient in a CNN is a learnable parameter, meaning that the numerical solver will automatically adjust their values during the learning process, as explained in Section 4.B.

Performing a convolution involves sweeping the filter’s center entry across the image in sliding window fashion, each time shifting it by a number of pixels referred to as its stride. Figures 6(a) and 6(b) show an example of a grayscale image convolved with a filter designed to detect horizontal lines, using a stride of one (by far the most common value). The overlapping pixel values and filter coefficients are multiplied and summed to create a corresponding entry in the response matrix ${R_{\textit{rc}}}$. More formally, this operation ${\boldsymbol R} = {\boldsymbol I}*{\boldsymbol F}$ can be defined as

$${R_{\textit{rc}}} = \sum\limits_{b = 1}^B \sum\limits_{i = 1}^n \sum\limits_{j = 1}^n {I_{r - \frac{{n + 1}}{2} + i,\,c - \frac{{n + 1}}{2} + j,\,b}} \cdot {F_{\textit{ijb}}}.$$

Most modern image processing libraries have routines to hardware accelerate this operation. Clearly, care must be taken at the edge of the image, giving rise to various padding options [64], where required pixel values beyond the edge of the image are imputed. Each entry in the response matrix measures how strongly the image matches the filter in an $n \times n$ neighborhood of pixel $r,c$. The learned filters can themselves be visualized as small images giving some insight into the image patterns the network finds useful.
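For illustration, the following unoptimized NumPy transcription of Eq. (5) computes a response matrix directly; production code would instead call a hardware-accelerated library routine, and the example filter shown is an assumed horizontal-line detector.

```python
# Direct (unoptimized) NumPy transcription of Eq. (5) for illustration only.
import numpy as np

def convolve_valid(I, F, stride=1):
    """I: image of shape (M, N, B); F: filter of shape (n, n, B), n odd.
    Returns the response matrix R, skipping edge pixels (no padding)."""
    M, N, B = I.shape
    n = F.shape[0]
    h = n // 2                                   # half-width of the receptive field
    rows = range(h, M - h, stride)
    cols = range(h, N - h, stride)
    R = np.zeros((len(rows), len(cols)))
    for ri, r in enumerate(rows):
        for ci, c in enumerate(cols):
            patch = I[r - h:r + h + 1, c - h:c + h + 1, :]
            R[ri, ci] = np.sum(patch * F)        # multiply-and-sum over the n x n x B window
    return R

# Example: a 3x3 horizontal-line filter applied to a random grayscale image (B = 1).
img = np.random.rand(64, 64, 1)
horiz = np.array([[-1, -1, -1], [2, 2, 2], [-1, -1, -1]], dtype=float)[:, :, None]
response = convolve_valid(img, horiz)
```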

Fig. 6. (a) Illustration of a grayscale convolution and (b) the resulting response (stride = 1, no padding). (c) The effect of ${2} \times {2}$ max pooling (stride = 2). (d) Activation via the ReLU function.

In a CNN, a given convolutional layer is comprised of a set of $P$ filters (aka filter bank), and the resulting response matrices are concatenated to form an $M \times N\; \times P$ output. For example, the first layer of AlexNet uses 96 filters, each $11 \times 11\; \times 3$, and therefore has 34,848 learnable parameters, and the output (with stride =1) is $227 \times 227\; \times 96$.

Max-Pooling Layer: Because there are often many filters in a convolutional layer’s bank, the output matrix tends to have many channels. Max-pooling attempts to offset this increase in depth by downsampling in the row and column dimensions, retaining the strongest response in each neighborhood. For example, in a given channel [see Figs. 6(b) and 6(c)], each ${2} \times \;{2}$ block of pixels is replaced by a single entry whose value is the maximum of the four original responses, so the resulting output array has half the number of rows and columns. Other schemes exist, such as average pooling or median pooling. Again, one may adjust the stride, i.e., how many pixels the ${2} \times {2}$ window slides by at each step. In addition to reducing computation time and memory requirements, this operation reduces the sensitivity to small image perturbations, helping to fight overfitting. This layer contains no learnable parameters.

For example, input images to AlexNet are ${227} \times {227}$. After multiple max-pooling layers, each reducing the image by half along each dimension, the response matrices are ultimately downsampled to ${6} \times {6}$. In our application, beginning with smaller ${64} \times {64}$ images, we limit the number of max pooling operations to 3.

Activation Layer: Each pooled output is then fed to an activation function, which must be nonlinear and monotonically increasing. Traditional choices include sigmoids or hyperbolic tangents. However, modern CNNs, including our own work, employ rectified linear units (ReLU), i.e., if $x \gt 0,\;f(x) = x$, otherwise $f(x) = 0$, because they are computationally inexpensive to evaluate and differentiate. This layer contains no learnable parameters.

Repetition: Typically, multiple sets of convolution/max-pooling/activation layers are combined in a series (Fig. 5), with later layers typically having “smaller” receptive fields but larger filter banks. One can imagine that this structure permits the network to use hierarchical descriptions of objects, e.g., the first layer detects a small number of primitives such as edges, corners, or holes, which the second layer is able to combine into more complex structures (vortex, ring, and petal), and the third composes those into beam shapes. It is important to reemphasize that these features are learned and do not always correspond to human intuition.

Dense Layers: Dense layers resemble a traditional neural network containing a large number of neurons organized into a handful of layers. Each output of the final convolution-maxpooling-activation layer is connected to each of the neurons in the first dense layer via synapses, each with an associated multiplicative weight. Then, at each neuron the weighted inputs are summed and fed to an activation function (e.g., ReLU) whose output is then connected to each neuron in the subsequent layer via another weighted synapse, etc. This all-pairs connectivity (aka fully connected) results in a large number of connections containing the vast majority of the learnable parameters. For example, AlexNet’s final convolution produces a ${6} \times {6} \times {256}$ response, followed by a 4096-neuron dense layer, resulting in approximately 38 million weights! With so many parameters, overfitting is quite possible, even on very large datasets, so a drop-out feature is often implemented to improve robustness, where the output of each neuron in the dense layer is ignored (set to zero) with probability $p$ during the training phase.

Softmax and Output Layer: Each of the $l = 1{\cdots}K$ neurons in the last dense layer corresponds to one of the $K$ object classes. The softmax function is often used to convert the $l$th neuron’s raw output to a value between 0 and 1, such that the sum of all $K$ outputs equals one,

$${\sigma _l} = \frac{{{e^{{z_l}}}}}{{\mathop \sum \nolimits_{k = 1}^K {e^{{z_k}}}}},$$
thereby enabling the outputs to be interpreted as probabilities or confidences that the input comes from a certain class. The class with the highest confidence score is then assigned to the input, according to $\arg\max_{l \in \{1, \ldots, K\}} {\sigma_l}$.
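The following PyTorch sketch assembles the architecture of Fig. 5 for ${64} \times {64}$ grayscale inputs and a hypothetical 16-symbol alphabet; the layer sizes and filter counts are illustrative choices, not a prescription.

```python
# Minimal PyTorch sketch of the Fig. 5 architecture for 64x64 grayscale inputs
# and a hypothetical 16-symbol alphabet; layer sizes are illustrative only.
import torch
import torch.nn as nn

class BeamCNN(nn.Module):
    def __init__(self, num_classes=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.MaxPool2d(2), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.MaxPool2d(2), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.MaxPool2d(2), nn.ReLU(),  # 16 -> 8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 256), nn.ReLU(), nn.Dropout(p=0.5),
            nn.Linear(256, num_classes),          # raw class scores; softmax applied below
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = BeamCNN()
logits = model(torch.rand(1, 1, 64, 64))          # one 64x64 grayscale image
confidences = torch.softmax(logits, dim=1)        # Eq. (6): class probabilities
predicted_class = confidences.argmax(dim=1)
```

Note that during training the softmax of Eq. (6) is typically folded into the cross entropy loss (Section 4.B); it is applied explicitly here only to expose the class confidences.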

B. Training

In essence, learning is numerically minimizing the classification error over the space of learnable parameters (i.e., synapse weights and filter coefficients). Typically, these parameters are initialized at random. Then the labeled training data (input images for which the correct output class is known) is presented to the network in batches. Sometimes, when the training dataset is insufficient in size or diversity, the images are augmented (subjected to random affine transformations). In OAM applications, we can only partially endorse this practice. For example, small image translations should be used to improve the network’s robustness in the face of misalignment. On the other hand, if, for example, two alphabet symbols are identical in shape but only differ in size (e.g., a small and a large ring representing two different topological charges), then inducing random scalings will likely confuse the classifier and degrade results. In general, any augmentation operations should be representative of the types of distortions likely to be seen in practice.
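As a concrete example of this conservative augmentation philosophy, the following torchvision sketch applies only small random translations (plus resizing to the network's input size); the 5% shift and ${64} \times {64}$ target size are assumptions, not recommendations from the literature.

```python
# Sketch of a conservative augmentation pipeline (torchvision): small random
# translations to mimic misalignment, but no random scaling or rotation that
# could confuse size- or orientation-coded symbols.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((64, 64)),
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),  # shift up to 5% of the image
    transforms.ToTensor(),
])
```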

After the training images are presented, the resulting predictions are compared to the correct output classes, and the error between the two is used by the solver to adjust, or learn, the network parameters using back propagation to compute the needed gradients. Because the classification error is in general binary (i.e., correct versus incorrect), the cross entropy loss is commonly used as a continuous, smooth proxy for the error function during optimization. In our experience, most common network architectures seem to perform well with a variety of popular solvers (e.g., Adam, stochastic gradient descent, etc.), often with the default solver parameter values (e.g., learning rate, initial step size, etc.). See [28] for an OAM-geared meta-analysis of solver options.

It is worth mentioning that a large number of CNN design choices (the so-called hyperparameters, such as filter sizes, number of neurons, or solver parameters) cannot be learned by the algorithm but may be tuned by the designer. Few hard-and-fast rules exist for their selection, but see [65] for some guidelines.

A critical issue is that of training termination criteria. Examples include exceeding a maximum number of iterations, failing to reduce the objective function by some minimum threshold in a given iteration, or the estimated gradient failing to exceed some minimum value, implying the objective function has reached a local extremum. Yet it is extremely difficult to set meaningful values of these parameters a priori. Conservative values will result in premature termination and underperformance on the classification task, while overly aggressive values result in long training times and overfitting, memorizing the idiosyncratic patterns of the training dataset that will not generalize to new data. The latter is a very real risk of using deep learning models.

A best practice is to use a more organic termination criterion that helps fight overfitting, known as early stopping. Here, one sets aside a small portion (e.g., 10–20%) of the training data, called a validation set. This data is not used by the minimization algorithm to directly adjust the weights. Instead, it can be thought of as a “practice test” given periodically to assess if the algorithm is continuing to improve its ability to classify new examples. Training is stopped once the network fails to improve its accuracy on the validation set a certain number of times (termed the patience).
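The sketch below outlines such an early-stopping training loop in PyTorch; it assumes the BeamCNN sketch given earlier and hypothetical DataLoader objects named train_loader and val_loader, and the patience and epoch limits are illustrative.

```python
# Sketch of a training loop with early stopping (assumes the BeamCNN sketch above
# and PyTorch DataLoaders named train_loader / val_loader exist).
import copy
import torch
import torch.nn as nn

def train_with_early_stopping(model, train_loader, val_loader, patience=5, max_epochs=200):
    loss_fn = nn.CrossEntropyLoss()                      # smooth proxy for classification error
    optimizer = torch.optim.Adam(model.parameters())     # default solver settings
    best_acc, best_state, bad_epochs = -1.0, None, 0
    for epoch in range(max_epochs):
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss_fn(model(images), labels).backward()    # back propagation of the gradients
            optimizer.step()
        # "Practice test" on the held-out validation set
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                correct += (model(images).argmax(dim=1) == labels).sum().item()
                total += labels.numel()
        val_acc = correct / total
        if val_acc > best_acc:
            best_acc, best_state, bad_epochs = val_acc, copy.deepcopy(model.state_dict()), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:                   # stop once validation accuracy stalls
                break
    model.load_state_dict(best_state)
    return model, best_acc
```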

C. Testing and Evaluation

Once trained, the network’s accuracy is then evaluated. Note that it is meaningless to report accuracy on the training dataset because the network may have just memorized its idiosyncrasies. Instead, we use test data, an independent set of labeled images it has never seen before. A confusion matrix provides a visualization of the types of errors that may occur. Figure 7 shows an example for a 4-beam classification task using a 400-image test set. The first row tells us that of the 100 images that truly belonged to the class “Beam 1,” 72 were correctly classified, while 6 were confused with “Beam 3,” and 22 were confused with “Beam 4.” Note that the matrix is not symmetric, since the classifier never mis-predicts true “Beam 3” images as belonging to class “Beam 1.” Sometimes it is instructive to manually inspect the misclassified images and compare them to the training examples (Fig. 8).
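A confusion matrix can be computed directly from the test-set labels and the network's predictions; the following sketch uses scikit-learn with toy arrays (y_true, y_pred) standing in for real results.

```python
# Sketch: computing a confusion matrix like Fig. 7 from test-set labels and
# predictions (hypothetical toy arrays y_true / y_pred), using scikit-learn.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 2, 3, 3])          # ground-truth beam classes (toy example)
y_pred = np.array([0, 3, 1, 2, 3, 2])          # classifier outputs
cm = confusion_matrix(y_true, y_pred)          # rows = true class, columns = predicted class
print(cm)                                      # diagonal = correct; off-diagonal = confusions
```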

Fig. 7. Confusion matrix shows the number of correctly classified examples on the diagonal as well as the various types of errors.

Fig. 8. (a) Test image of Beam 1011 (Fig. 4) that was incorrectly classified as Beam 0111. (b) A representative training image of Beam 1011. (c) A representative training image of Beam 0111.

There are several reasons the accuracy may be non-deterministic across trials. The learnable parameters are often initialized at random, many solvers use stochastic methods to estimate high-dimensional gradients, drop-out is probabilistic, and the train-test split is often chosen at random. In practice, the first three issues tend to result in fairly minor variations and can often be controlled by seeding the random number generator or by averaging across several trials. However, the issue of the train-test split can significantly bias results, especially when the dataset contains a small handful of challenging images. Best practice involves the use of $K$-fold cross-validation, where the dataset is split into $K$ partitions, and training and testing are repeated $K$ times (each using a different partition for testing and the remaining $K - {1}$ for training). The average accuracy over the $K$ trials should be reported.
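The following sketch shows the $K$-fold protocol with scikit-learn; a trivial classifier on random toy data stands in for the CNN purely to illustrate the bookkeeping, and in practice the training routine sketched in Section 4.B would be called inside the loop.

```python
# Sketch of K-fold cross-validation (K = 5); a trivial scikit-learn classifier on
# toy data stands in for the CNN, just to show the split/train/test protocol.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(200, 64 * 64)               # 200 flattened toy "images"
y = np.random.randint(0, 4, size=200)          # 4 toy classes

accuracies = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = KNeighborsClassifier().fit(X[train_idx], y[train_idx])
    accuracies.append(clf.score(X[test_idx], y[test_idx]))

print(f"mean accuracy over 5 folds: {np.mean(accuracies):.3f}")
```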

Fig. 9. Landscape of the deep CNN models (reprinted from Bianco et al. (2018); authors’ approval obtained [66]). The sizes of the dots are proportional to the model’s size. The upper left frontier represents the state of the art for the accuracy versus computation tradeoff.

D. Popular Architectures and Transfer Learning

Since AlexNet was introduced, many other superior architectures have emerged. A general research objective is to improve classification accuracy relative to model size (generally expressed in terms of the number of learnable parameters), which serves as a proxy for several properties of practical interest. Larger models have the potential to be very expressive, yet face several drawbacks. They take longer to train and generally require more training data to avoid overfitting. However, training is ideally a one-time endeavor, done off-line and often using powerful cloud computing. Of greater concern is when large networks are deployed for repeated classification tasks in real-time applications or using so-called “edge computing.” They require more memory to store the parameters, require more floating-point operations to evaluate, and are therefore slower and consume more energy. For example, AlexNet’s 64 million parameters occupy a significant chunk of the RAM on a typical embedded microprocessor. In contrast, ShuffleNet [67] achieves comparable accuracy using less than 1/13th of the floating-point operations (FLOPs) and only 375 thousand parameters, while our custom lightweight CNN [68] contains only 128,000 parameters (${500} \times$ smaller than AlexNet). Figure 9 depicts the relative accuracy and size of some well-known architectures.

Training large networks such as the ones in Fig. 9 requires great resources. Runtimes can extend over days, even on multi-core GPUs or other specialized hardware; the associated energy or carbon footprint can be enormous; and, perhaps most significantly, they require millions of hand-labeled training images. Transfer learning allows one to leverage the power of large pre-trained networks, designed for more general image recognition tasks, for more specialized image classification tasks, such as the ones considered in this tutorial, with only minimal additional problem-specific training. First, a large pre-trained network’s parameters are copied. Then the output layer is replaced with one containing the number of classes in the new specialized task (i.e., the OAM beam alphabet symbols). This new network is then retrained using a smaller dataset containing only application-specific images, but parameters in the early convolution layers are frozen, as they represent low-level primitives common to many image classification tasks, while learnable parameters in the later layers, representing more application-specific high-level features, are allowed to change. Because the number of parameters is effectively much smaller and the initial values are not simply random, the new network can be trained in much less time using an order of magnitude fewer application-specific training images. Indeed, this approach was used in the seminal paper [26] on classifying OAM beam images, and [69] performed a comparative study of transfer learning and concluded that, at the time of publication, DenseNet had superior performance if network size was not a consideration, while ShuffleNet was preferable when memory limitations are a concern. Note that AlexNet is no longer considered state of the art, and its use should be avoided in future publications.
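The transfer-learning recipe can be sketched in a few lines; the example below uses a pre-trained ShuffleNet from torchvision (attribute names and the weights argument depend on the installed torchvision version), and the 16-symbol alphabet size is illustrative.

```python
# Sketch of the transfer-learning recipe using a pre-trained ShuffleNet from
# torchvision (the 'weights' argument and layer names depend on the torchvision
# version installed; adjust as needed).
import torch.nn as nn
from torchvision import models

num_symbols = 16                                        # size of the OAM alphabet (illustrative)
model = models.shufflenet_v2_x1_0(weights="DEFAULT")    # copy pre-trained parameters

for param in model.parameters():                        # freeze everything first
    param.requires_grad = False

# Replace the output layer with one sized for the new alphabet; only this layer
# (and any later layers deliberately unfrozen) will be retrained.
model.fc = nn.Linear(model.fc.in_features, num_symbols)
```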

5. CONCLUSION

This tutorial was targeted at researchers interested in developing optical communication systems that propagate structured light through random media and use an AI/ML-based approach to classify the received alphabet symbols. We assumed the reader is familiar with either optics or AI/ML but is likely not an expert in both fields.

While the field is active and robust, the protocols for dataset generation and results reporting vary widely in the literature. Therefore, we conclude by summarizing some best practices mentioned throughout the tutorial, which we hope will become standard. First, we strongly emphasize that all data should be collected using the full-experimental approach to generating optical turbulence (i.e., using some in natura mechanism such as heaters) when training and benchmarking AI-based systems. Systems trained on synthetic data (i.e., via numerical methods or a series of phase plates) risk learning the limitations and artifacts of the underlying models. In general, images in the dataset must be accurately labeled (suggesting the use of an automatic triggering system), and all experimental conditions should be reported (e.g., $C_n^2$, wavelength, propagation distance, etc.). The dataset should also be sufficient in size to establish the desired accuracy (e.g., 99.99%), even after partitioning into training and test sets (${\gt}{20}{,}{000}$ images). Augmenting smaller datasets via synthetic image distortions is a common practice in AI. In our applications, small translations are certainly advisable to overcome alignment issues, but one ought to take care to ensure any other distortions are physically realistic. Lower resolution images (${\lt}{250} \times {250}\;{\rm pixels}$) are acceptable and frequently used in AI applications.

Custom CNN architectures can have the advantage of having a small computational footprint and data requirements, while transfer learning with larger pre-trained networks can be a useful starting point for new researchers or those looking to train on small datasets. In particular, ShuffleNet and DenseNet have demonstrated good performance, while AlexNet should be avoided, as it no longer represents the state of the art. The default solver settings are generally an excellent starting point. However, early-stopping termination criteria should always be employed, where a small fraction of the training data is used as a validation set, rather than using arbitrary termination criteria such as the maximum number of iterations.

Classification accuracy should always be established on a test dataset (not seen in training) and should be reported using $K$-fold validation, where the full dataset is split into $K$ partitions and training and testing are repeated $K$ times, each using a different partition for testing and the remaining $K - {1}$ for training. The average accuracy over $K$ test sets should be reported. Classification speed, or rate, should also be reported, along with training time and computational infrastructure.

There are numerous avenues of future work one can pursue in this fast-growing area of research. One can examine the impact of light wavelength, polarization, and/or level of coherence on the reliability of ML/AI. In particular, imparting spatial partial coherence on beams carrying OAM has been introduced [7072] as a tool to reduce the scintillation at the target. This method increases the complexity of the beam design but does not fundamentally change the ML/AI classification algorithm. Systematic evaluation of such beams under various environmental conditions would be a valuable experiment for learning their properties when applying ML/AI. Another area of research could be how to design an optimal alphabet in terms of resilience on propagation and ease of classification. The ML/AI algorithm does not recognize how the beam was generated. It only reacts to the intensity image at the receiver [73], suggesting there may be other approaches to expanding the diversity of the symbols in the alphabet and thus improving the reliability rate when ML/AI is implemented. When exploring this approach, one has to balance the complexity of the alignment equipment necessary to deliver focused images at the camera screen, both on the transmitter and receiver side, with the beam propagation properties.

One of the tasks with the potential to provide substantial practical impact is to optimize the size of the ML/AI training set based on turbulence strength, its turnover rate, beam generation rate, and image recording rate.

Beyond communication applications, ML/AI techniques could also be used to deepen the fundamental understanding of the impact of the propagation medium on structured light. Examples include learning to estimate optical turbulence levels from image sequences or gathering information on vortex preservation under adverse environmental effects [74].

APPENDIX A: COMPUTATIONAL RESOURCE REQUIREMENTS FOR CONSTRUCTING DATASETS

In the body of the tutorial, we discounted the simulated and hybrid-experimental approaches to generating optical turbulence on the grounds that they result in unrealistic image distortions (i.e., approximations, artifacts, or a limited range of applicability); there is a real risk that deep CNNs trained on such datasets will learn these limitations, performing very well on the test dataset yet failing under realistic conditions. In this appendix, we highlight another reason to avoid their use: computational cost.

If the ultimate goal of this line of research is to produce commercially viable communication systems, then available datasets should be capable of certifying accuracy on the order of 99.9999%. Since the datasets must be partitioned into training, validation, and test sets, well in excess of a million images are required. Even with lower resolution grayscale images, for example ${256} \times {256}$ pixels at 8-bit depth (roughly 65 kB each, uncompressed), one million images occupy on the order of 65 GB, and establishing such accuracy across a range of environmental conditions or with larger alphabets quickly pushes the requirement into the terabyte range.

While the full-experimental approach requires an initial investment to set up, capturing and transferring instantaneous snapshots can be done fairly quickly. For example, with a 500 Hz framerate camera, one million images can be captured in about 30 min (though slower capture rates may be warranted based on the turbulence turnover rate). Using a high-end solid-state drive, the upper limit of the data transfer rate imposed by the SATA-III standard can be approached (${\sim}{500}\;{\rm MB/s}$), requiring a little over an hour to transfer a 2 TB dataset to disk. Of course, standard rotating hard drives can be up to 5 times slower.
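The back-of-envelope arithmetic in the two paragraphs above can be packaged into a short script. The frame rate, drive throughput, image format, and dataset size below are example values taken from the discussion above or assumed for illustration, not hard requirements.

```python
# Rough resource estimate for an experimentally captured dataset.
n_images   = 1_000_000                  # target number of images
width, height, bit_depth = 256, 256, 8  # lower-resolution grayscale format
frame_rate_hz   = 500                   # camera frame rate
throughput_mbps = 500                   # sustained SSD write speed, MB/s (SATA-III limit)

bytes_per_image = width * height * bit_depth // 8
total_bytes     = n_images * bytes_per_image

print(f"per image : {bytes_per_image / 1e3:.0f} kB")
print(f"dataset   : {total_bytes / 1e9:.0f} GB")
print(f"capture   : {n_images / frame_rate_hz / 60:.0f} min at {frame_rate_hz} Hz")
print(f"transfer  : {total_bytes / (throughput_mbps * 1e6) / 60:.1f} min at {throughput_mbps} MB/s")
```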

However, in the case of the simulations, generating each of the one million or so images also involves the computationally expensive process of producing a series of phase screens to emulate turbulence. The number of phase screens in a simulation is related to the length of the propagation path and the strength of the optical turbulence, while the spatial resolution of a phase screen is dictated by the wavelength of light, the distance propagated, and the size of the beam at the receiver. For example, red light ($\lambda= 633\;{\rm nm}$) propagating 1/10 of a kilometer ($z = 100\;{\rm m} $) and captured on a ${15}\;{\rm cm} \times {15}\;{\rm cm}$ reception screen ($L = 15\;{\rm cm}$) requires a spatial resolution of $\Delta x = \frac{{\lambda z}}{L} = \frac{{633\;{\rm nm}\times 100 \; {\rm m}}}{{15\;{\rm cm}}} \approx 0.42\; {\rm mm}$ [42]. To achieve this resolution, we need roughly ${355} \times {355}$ pixels ($\frac{{15\;{\rm cm}}}{{0.42 \; {\rm mm}}} \approx 355$), but since the split-step propagation simulation utilizes Fourier analysis, a larger grid is required to avoid aliasing (e.g., ${512} \times {512}$ pixels). If we assume an intensity resolution of 256 levels (8 bits per pixel), then each simulated screen occupies ${512} \times {512} \times {8}\;{\rm bits} \approx 262\;{\rm kB}$. If we assume that 10 phase screens represent an acceptable approximation of the path of interest and that each image in the dataset should experience a unique environmental condition, then on the order of 10 million randomly drawn screens are required for each turbulence strength. Note that simulations over longer propagation paths and higher levels of optical turbulence can require even higher resolution screens. It is important to note that in the hybrid approach the screens are also generated and physically displayed on the SLM; the major difference is that in the full-simulation approach the images are propagated numerically at the high resolution given in this example, whereas in the hybrid approach a lower resolution camera can be used to capture data for ML/AI processing.
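The sampling requirements in the example above can be computed directly. The sketch below follows the $\Delta x = \lambda z / L$ rule quoted from [42] and rounds up to a power-of-two grid for FFT-based split-step propagation; the rounding rule and function name are our own illustrative choices.

```python
import math

def phase_screen_grid(wavelength, distance, screen_size):
    """Estimate sampling for a split-step simulation: grid spacing dx ~ lambda*z/L,
    rounded up to a power-of-two grid to avoid aliasing in the FFT propagation."""
    dx = wavelength * distance / screen_size      # required spatial resolution
    n_min = screen_size / dx                      # minimum samples per side
    n_fft = 2 ** math.ceil(math.log2(n_min))      # next power of 2 for the FFT grid
    return dx, n_min, n_fft

dx, n_min, n_fft = phase_screen_grid(wavelength=633e-9, distance=100.0, screen_size=0.15)
print(f"dx = {dx*1e3:.2f} mm, minimum grid = {n_min:.0f}, FFT grid = {n_fft}")
# dx = 0.42 mm, minimum grid = 355, FFT grid = 512
```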

APPENDIX B: DETECTION OF OAM BEAMS FOR ML APPLICATIONS—SOME CONSIDERATIONS

When the captured beam images are intended for use as a training dataset for an AI/ML algorithm, several experimental nuances must be addressed to ensure the resulting images are representative of their class.

1. Label Accuracy

Clearly, training any supervised AI/ML algorithm with mislabeled data will degrade performance. Large organic datasets such as MNIST or ImageNet use multiple levels of human adjudication to reduce label inaccuracies, yet they are still estimated to contain on the order of 0.1% mislabeled images. When structured light is generated by an SLM, mislabeling can be avoided by using automatic triggering and synchronization between the SLM and the camera capturing the images. Special care must be taken near transitions between two symbols, where a smearing effect can result in the captured image being a superposition of two different symbols. We have also found it beneficial to insert a “blank” phase screen at the start of each cycle as a marker for post hoc signal processing (see the sketch below). Alternatively, inserting a Gaussian beam at the transition can serve the same purpose while also providing a measure of scintillation and an estimate of turbulence strength.
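As an illustration of the post hoc processing such a marker enables, the sketch below flags frames that closely match a reference image of the marker using a simple normalized correlation. The threshold, array shapes, and the availability of a clean reference image are assumptions made for illustration.

```python
import numpy as np

def flag_marker_frames(frames, marker, threshold=0.9):
    """Return indices of frames that closely resemble the reference `marker`
    image (e.g., the blank-screen or Gaussian marker inserted at the start of
    each symbol cycle). frames: (N, H, W) array; marker: (H, W) array."""
    f = frames.reshape(len(frames), -1).astype(float)
    m = marker.ravel().astype(float)
    f -= f.mean(axis=1, keepdims=True)
    m -= m.mean()
    corr = f @ m / (np.linalg.norm(f, axis=1) * np.linalg.norm(m) + 1e-12)
    return np.flatnonzero(corr > threshold)

# Example with synthetic data: every 20th frame is a copy of the marker plus noise.
rng = np.random.default_rng(0)
marker = np.exp(-np.add.outer(np.arange(-32, 32)**2, np.arange(-32, 32)**2) / 200.0)
frames = rng.uniform(0, 1, size=(100, 64, 64))
frames[::20] = marker + 0.05 * rng.standard_normal((5, 64, 64))
print(flag_marker_frames(frames, marker))  # [ 0 20 40 60 80]
```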

2. Camera Settings

Controlling camera settings is crucial for recording a valid dataset, lest camera-induced artifacts contaminate the images. For example, it is advisable to turn off automatic adjustments such as white balance and exposure before data collection. Check the brightest beam symbol for saturation (the camera's histogram function can be useful here); otherwise, much of the variation in the image will be lost. Likewise, check the darkest symbol to ensure the SNR is acceptable (a simple check is sketched below). The use of lossy compression is not advisable.
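A simple sanity check along these lines is sketched below. The corner patch used to estimate background noise and the particular SNR definition are illustrative assumptions and should be adapted to the actual beam position and camera.

```python
import numpy as np

def exposure_report(image, bit_depth=8, dark_patch=(slice(0, 20), slice(0, 20))):
    """Report the fraction of saturated pixels and a crude peak SNR estimate for
    one captured frame. `dark_patch` indexes a region assumed to contain only
    background (no beam)."""
    max_val = 2 ** bit_depth - 1
    saturated_fraction = float(np.mean(image >= max_val))
    background = image[dark_patch].astype(float)
    approx_snr = float(image.max() / (background.std() + 1e-12))
    return {"saturated_fraction": saturated_fraction, "approx_snr": approx_snr}

# Example with a synthetic 8-bit frame
frame = np.random.randint(0, 200, size=(256, 256), dtype=np.uint8)
print(exposure_report(frame))
```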

3. Screen Positioning

The goal is to fill as much of the camera’s field of view with the beam as possible, since this preserves the most detail. However, overfilling the screen can result in occluded or cropped images, which present significant challenges for all image classification algorithms. Given that the alphabet likely contains symbols of various sizes and that beams naturally exhibit wander and distortion, it is advisable to pre-cycle all the beams under a variety of environmental conditions to verify good alignment and scale. Also, note that any consistent artifact across images of a particular beam class will quickly be learned by the ML/AI algorithm, leading to false accuracy on the test set. Examples include a particular beam symbol appearing consistently lower in the image, bright artifacts at the edge of the image, or faint ghost images from extra reflections when transmitting certain symbols (possibly from the surface of the tank).

4. Symbol Cycling to Eliminate Order Effects

In the design-of-experiments literature, the term order effects refers to situations where the order of the trials influences the outcome. For example, in human interface studies, subjects often perform better on a task as the experiment progresses due to additional practice, regardless of the interface being tested. Therefore, if the order in which subjects are exposed to the various interfaces is not varied, the first interface will always fare worse. In our experiments, this can occur because the time scale over which optical turbulence changes underwater can be an order of magnitude slower than the rate at which symbols can be cycled via the SLM. Consider an experiment where 16 symbols will be recorded over 8 min at a frame rate of 500 Hz. A poor experimental design would be to begin with Symbol 1 and transmit each symbol for 30 seconds (15,000 images) before transitioning to the next. The conditions in the tank are highly correlated over the 30 second capture window, resulting in less image diversity for a given symbol. On the other hand, the conditions in the tank when Symbol 16 is captured may have changed significantly in the seven and a half minutes since Symbol 1 was captured. This can create a situation where some symbols achieve high classification accuracy in testing (because the test images exhibit little variation within a class), yet performance generalizes poorly in practice (because the training images do not represent a variety of conditions). A superior experimental design is to cycle through the entire alphabet, say, 300 times, capturing each symbol for 0.1 second (50 images) per cycle, as sketched below. Such a strategy captures images of all symbols once every 1.6 seconds, helping to ensure that all symbols are recorded across a wide variety of conditions as the turbulence evolves.
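The interleaved design described above can be expressed as a simple capture schedule. The numbers below reproduce the example in the text (16 symbols, 300 cycles, 0.1 s dwell at 500 Hz); the function itself is only an illustrative sketch of how such a schedule might be generated.

```python
def cycling_schedule(n_symbols=16, n_cycles=300, dwell_s=0.1, frame_rate=500):
    """Build an interleaved capture schedule: cycle through the whole alphabet
    repeatedly, recording `dwell_s` seconds of each symbol per cycle, so that
    every symbol is sampled across the full range of turbulence conditions."""
    frames_per_dwell = int(dwell_s * frame_rate)
    schedule = []          # list of (symbol index, number of frames) entries
    for _ in range(n_cycles):
        for s in range(n_symbols):
            schedule.append((s, frames_per_dwell))
    total_s = n_symbols * n_cycles * dwell_s
    frames_per_symbol = n_cycles * frames_per_dwell
    return schedule, total_s, frames_per_symbol

schedule, total_s, per_symbol = cycling_schedule()
print(f"{len(schedule)} dwell periods, {total_s/60:.0f} min total, "
      f"{per_symbol} frames per symbol")
# 4800 dwell periods, 8 min total, 15000 frames per symbol
```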

Funding

U.S. Naval Research Laboratory; Office of Naval Research.

Acknowledgment

We thank William Jarrett for providing datasets used in the tutorial. S. Avramov-Zamurovic and J. M. Esposito acknowledge the ongoing support of the Naval Research Laboratory and the Office of Naval Research.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper and the ML/AI code for classifying the given beams are available at [40].

REFERENCES

1. E. M. Knutson, S. Lohani, O. Danaci, S. D. Huver, and R. T. Glasser, “Deep learning as a tool to distinguish between high orbital angular momentum optical modes,” Proc. SPIE 9970, 997013 (2016). [CrossRef]  

2. S. Watanabe, T. Shimobaba, T. Kakue, and T. Ito, “Hyperparameter tuning of optical neural network classifiers for high-order Gaussian beams,” Opt. Express 30, 11079–11089 (2022). [CrossRef]  

3. P. Wang, W. Xiong, Z. Huang, Y. He, Z. Xie, J. Liu, H. Ye, Y. Li, D. Fan, and S. Chen, “Orbital angular momentum mode logical operation using optical diffractive neural network,” Photon. Res. 9, 2116–2124 (2021). [CrossRef]  

4. C. Lu, Q. Tian, X. Xin, B. Liu, Q. Zhang, Y. Wang, F. Tian, L. Yang, and R. Gao, “Jointly recognizing OAM mode and compensating wavefront distortion using one convolutional neural network,” Opt. Express 28, 37936–37945 (2020). [CrossRef]  

5. Z. Huang, Y. He, P. Wang, W. Xiong, H. Wu, J. Liu, H. Ye, Y. Li, D. Fan, and S. Chen, “Orbital angular momentum deep multiplexing holography via an optical diffractive neural network,” Opt. Express 30, 5569–5584 (2022). [CrossRef]  

6. H. Zhou, Y. Wang, X. Li, Z. Xu, X. Li, and L. Huang, “A deep learning approach for trustworthy high-fidelity computational holographic orbital angular momentum communication,” Appl. Phys. Lett. 119, 044104 (2021). [CrossRef]  

7. Y. Zhai, S. Fu, J. Zhang, X. Liu, H. Zhou, and C. Gao, “Turbulence aberration correction for vector vortex beams using deep neural networks on experimental data,” Opt. Express 28, 7515–7527 (2020). [CrossRef]  

8. J. Wang, S. Fu, Z. Shang, L. Hai, and C. Gao, “Adjusted efficientNet for the diagnostic of orbital angular momentum spectrum,” Opt. Lett. 47, 1419–1422 (2022). [CrossRef]  

9. S. Zhang, C. Zhang, Y. Zeng, D. Liu, Y. Qin, Z. Zhang, and S. Fu, “Machine learning assisted ultra-wideband fiber-optics mode selective coupler design,” IEEE J. Sel. Top. Quantum Electron. 28, 4500110 (2022). [CrossRef]  

10. H. Guo, X. Qiu, and L. Chen, “Simple-diffraction-based deep learning to reconstruct a high-dimensional orbital-angular-momentum spectrum via single-shot measurement,” Phys. Rev. Appl. 17, 054019 (2022). [CrossRef]  

11. B. da Silva, B. Marques, R. Rodrigues, P. Ribeiro, and A. Khoury, “Machine-learning recognition of light orbital-angular-momentum superpositions,” Phys. Rev. A 103, 063704 (2021). [CrossRef]  

12. T. Giordani, A. Suprano, E. Polino, F. Acanfora, L. Innocenti, A. Ferraro, M. Paternostro, N. Spagnolo, and F. Sciarrino, “Machine learning-based classification of vector vortex beams,” Phys. Rev. Lett. 124, 160401 (2020). [CrossRef]  

13. M. A. Cox, T. Celik, Y. Genga, and A. Drozdov, “Interferometric orbital angular momentum mode detection in turbulence with deep learning,” Appl. Opt. 61, D1–D6 (2022). [CrossRef]  

14. L. Zhao, Y. Hao, L. Chen, W. Liu, M. Jin, Y. Wu, J. Tao, K. Jie, and H. Liu, “High-accuracy mode recognition method in orbital angular momentum optical communication system,” Chin. Opt. Lett. 20, 020601 (2022). [CrossRef]  

15. J.-J. Sun, S. Sun, and L.-J. Yang, “Machine learning-based fast integer and fractional vortex modes recognition of partially occluded vortex beams,” IEEE Trans. Antennas Propag. 70, 6775–6784 (2022). [CrossRef]  

16. M. Cao, Y. Yin, J. Zhou, J. Tang, L. Cao, Y. Xia, and J. Yin, “Machine learning based accurate recognition of fractional optical vortex modes in atmospheric environment,” Appl. Phys. Lett. 119, 141103 (2021). [CrossRef]  

17. J. Li, M. Zhang, D. Wang, S. Wu, and Y. Zhan, “Joint atmospheric turbulence detection and adaptive demodulation technique using the CNN for the OAM-FSO communication,” Opt. Express 26, 10494–10508 (2018). [CrossRef]  

18. Q. Zhao, S. Hao, L. Wang, X. Wan, and C. Xu, “Mode detection of misaligned orbital angular momentum beams based on convolutional neural network,” Appl. Opt. 57, 10152–10158 (2018). [CrossRef]  

19. Z. Wang, M. Dedo, K. Guo, K. Zhou, F. Shen, Y. Sun, S. Liu, and Z. Guo, “Efficient recognition of the propagated orbital angular momentum modes in turbulences with the convolutional neural network,” IEEE Photon. J. 11, 7903614 (2019). [CrossRef]  

20. S. El-Meadawy, H. M. H. Shalaby, N. Ismail, F. Abd El-Samie, and A. E. A. Farghal, “Free-space 16-ary orbital angular momentum coded optical communication system based on chaotic interleaving and convolutional neural networks,” Appl. Opt. 59, 6966–6976 (2020). [CrossRef]  

21. V. Raskatla, B. Singh, S. Patil, V. Kumar, and R. Singh, “Speckle-based deep learning approach for classification of orbital angular momentum modes,” J. Opt. Soc. Am. A 39, 759–765 (2022). [CrossRef]  

22. C. Runge, B. Freitas, and U. M. Dias, “Jointly increasing modulation cardinality and controlling beam diameter in LG-OAM free-space optical communication using convolutional neural network,” Opt. Eng. 61, 026104 (2022). [CrossRef]  

23. Z. Liu, S. Yan, H. Liu, and X. Chen, “Superhigh-resolution recognition of optical vortex modes assisted by a deep-learning method,” Phys. Rev. Lett. 123, 183902 (2019). [CrossRef]  

24. W. Wang, P. Wang, L. Guo, W. Pang, W. Chen, A. Li, and M. Han, “Performance investigation of OAMSK modulated wireless optical system over turbulent ocean using convolutional neural networks,” J. Lightwave Technol. 38, 1753–1765 (2020). [CrossRef]  

25. S. Park, L. Cattell, J. Nichols, A. Watnik, T. Doster, and G. Rohde, “De-multiplexing vortex modes in optical communications using transport-based pattern recognition,” Opt. Express 26, 4004–4022 (2018). [CrossRef]  

26. T. Doster and A. Watnik, “Machine learning approach to OAM beam demultiplexing via convolutional neural networks,” Appl. Opt. 56, 3386–3396 (2017). [CrossRef]  

27. P. Neary, A. Watnik, K. Judd, J. Lindle, and N. Flann, “Machine learning-based signal degradation models for attenuated underwater optical communication OAM beams,” Opt. Commun. 474, 126058 (2020). [CrossRef]  

28. P. Neary, J. Nichols, A. Watnik, K. Judd, G. Rohde, J. Lindle, and N. S. Flann, “Transport-based pattern recognition versus deep neural networks in underwater OAM communications,” J. Opt. Soc. Am. A 38, 954–962 (2021). [CrossRef]  

29. S. Avramov-Zamurovic, A. Watnik, J. Lindle, K. P. Judd, and J. Esposito, “Machine learning-aided classification of beams carrying orbital angular momentum propagated in highly turbid water,” J. Opt. Soc. Am. A 37, 1662–1672 (2020). [CrossRef]  

30. J. Delpiano, G. Funes, J. Cisternas, S. Galaz, and J. Anguita, “Deep learning for image-based classification of OAM modes in laser beams propagating through convective turbulence,” Proc. SPIE 11133, 36–42 (2019). [CrossRef]  

31. S. Avramov-Zamurovic, C. Nelson, and J. M. Esposito, “Effects of underwater optical turbulence on light carrying orbital angular momentum and its classification using machine learning,” J. Mod. Opt. 68, 1041–1053 (2021). [CrossRef]  

32. D. Briantcev, M. Cox, A. Trichili, A. Drozdov, B. S. Ooi, and M.-S. Alouini, “Efficient channel modeling of structured light in turbulence using generative adversarial networks,” Opt. Express 30, 7238–7252 (2022). [CrossRef]  

33. A. Ragheb, W. Saif, A. Trichili, I. Ashry, M. Esmail, M. Altamimi, A. Almaiman, E. Altubaishi, B. Ooi, M.-S. Alouini, and S. Alshebeili, “Identifying structured light modes in a desert environment using machine learning algorithms,” Opt. Express 28, 9753–9763 (2020). [CrossRef]  

34. S. Avramov-Zamurovic, A. Watnik, J. Lindle, and K. Judd, “Designing laser beams carrying OAM for a high-performance underwater communication system,” J. Opt. Soc. Am. A 37, 876–887 (2020). [CrossRef]  

35. G. Gbur, Singular Optics, 1st ed. (CRC Press, 2017).

36. T. Shirai, O. Korotkova, and E. Wolf, “A method of generating electromagnetic Gaussian Schell-model beams,” J. Opt. A 7, 232 (2005). [CrossRef]  

37. W. Jarrett, “Machine learning-based design of structured laser light for improved data transfer rate in underwater wireless communication,” Technical Report AD1171853 (Defense Technical Information Center, 2022), available at https://apps.dtic.mil/sti/citations/AD1171853.

38. M. A. Cox and A. V. Drozdov, “Converting a Texas instruments DLP4710 DLP evaluation module into a spatial light modulator,” Appl. Opt. 60, 465–469 (2021). [CrossRef]  

39. J. Pinnell, I. Nape, B. Sephton, M. Cox, V. Rodríguez-Fajardo, and A. Forbes, “Modal analysis of structured light with spatial light modulators: a practical tutorial,” J. Opt. Soc. Am. A 37, C146–C160 (2020). [CrossRef]  

40. S. Avramov-Zamurovic, J. M. Esposito, and C. Nelson, “Machine learning and OAM communication system tutorial,” https://sites.google.com/usna.edu/usnalasercommunicationproject/menu/tutorial (2022).

41. L. Andrews and R. Phillips, Laser Beam Propagation through Random Media, 2nd ed. (SPIE, 2005).

42. J. D. Schmidt, Numerical Simulation of Optical Wave Propagation with Examples in MATLAB (SPIE, 2010).

43. G. Nootz, S. Matt, A. Kanaev, K. Judd, and W. Hou, “Experimental and numerical study of underwater beam propagation in a Rayleigh–Bénard turbulence tank,” Appl. Opt. 56, 6065–6072 (2017). [CrossRef]  

44. G. Nootz, E. Jarosz, F. R. Dalgleish, and W. Hou, “Quantification of optical turbulence in the ocean and its effects on beam propagation,” Appl. Opt. 55, 8813–8820 (2016). [CrossRef]  

45. R. Hill, “Optical propagation in turbulent water,” J. Opt. Soc. Am. 68, 1067–1072 (1978). [CrossRef]  

46. L. Jolissaint, “Optical turbulence generators for testing astronomical adaptive optics systems: a review and designer guide,” Publ. Astron. Soc. Pacific 118, 1205 (2006). [CrossRef]  

47. Y. Ren, H. Huang, G. Xie, N. Ahmed, Y. Yan, B. I. Erkmen, N. Chandrasekaran, M. P. J. Lavery, N. K. Steinhoff, M. Tur, S. Dolinar, M. Neifeld, M. J. Padgett, R. W. Boyd, J. H. Shapiro, and A. E. Willner, “Atmospheric turbulence effects on the performance of a free space optical link employing orbital angular momentum multiplexing,” Opt. Lett. 38, 4062–4065 (2013). [CrossRef]  

48. S. Fu and C. Gao, “Influences of atmospheric turbulence effects on the orbital angular momentum spectra of vortex beams,” Photon. Res. 4, B1–B4 (2016). [CrossRef]  

49. R. P. Feynman, R. B. Leighton, and M. Sands, The Feynman Lectures on Physics: The New Millennium Edition (Basic Books, 2011), Vol. 2.

50. C. Nelson, S. Avramov-Zamurovic, R. Malek-Madani, O. Korotkova, R. Sova, and F. Davidson, “Measurements and comparison of the probability density and covariance functions of laser beam intensity fluctuations in a hot-air turbulence emulator with the maritime atmospheric environment,” Proc. SPIE 8517, 53–64 (2012). [CrossRef]  

51. I. Toselli, F. Wang, and O. Korotkova, “Controlled simulation of optical turbulence in a temperature gradient air chamber,” Proc. SPIE 9833, 87–92 (2016). [CrossRef]  

52. H. Gamo and A. Majumdar, “Atmospheric turbulence chamber for optical transmission experiment: characterization by thermal method,” Appl. Opt. 17, 3755–3762 (1978). [CrossRef]  

53. O. Keskin, L. Jolissaint, and C. Bradley, “Hot-air optical turbulence generator for the testing of adaptive optics systems: principles and characterization,” Appl. Opt. 45, 4888–4897 (2006). [CrossRef]  

54. L. Burger, A. Litvin, and A. Forbes, “Simulating atmospheric turbulence using a phase-only spatial light modulator: research article,” South African J. Sci. 104, 129–134 (2008). [CrossRef]  

55. I. Toselli, O. Korotkova, X. Xiao, and D. Voelz, “SLM-based laboratory simulations of Kolmogorov and non-Kolmogorov anisotropic turbulence,” Appl. Opt. 54, 4740–4744 (2015). [CrossRef]  

56. P. Polynkin, A. Peleg, L. Klein, T. Rhoadarmer, and J. Moloney, “Optimized multiemitter beams for free-space optical communications through turbulent atmosphere,” Opt. Lett. 32, 885–887 (2007). [CrossRef]  

57. H. Kaushal and G. Kaddoum, “Underwater optical wireless communication,” IEEE Access 4, 1518–1547 (2016). [CrossRef]  

58. K. Judd, S. Avramov-Zamurovic, R. A. Handler, A. T. Watnik, J. R. Lindle, J. Esposito, and W. A. Jarrett, “Propagation of laser beams carrying orbital angular momentum through simulated optical turbulence in Rayleigh-Bénard convection,” Proc. SPIE 11860, 1186009 (2021). [CrossRef]  

59. D. H. Hubel and T. N. Wiesel, “Receptive fields of cells in striate cortex of very young, visually inexperienced kittens,” J. Neurophysiol. 26, 994–1002 (1963). [CrossRef]  

60. K. Fukushima, “Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biol. Cybern. 36, 193–202 (1980). [CrossRef]  

61. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE 86, 2278–2324 (1998). [CrossRef]  

62. D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, “Deep, big, simple neural nets for handwritten digit recognition,” Neural Comput. 22, 3207–3220 (2010). [CrossRef]  

63. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Commun. ACM 60, 84–90 (2017). [CrossRef]  

64. R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed. (Prentice-Hall, 2006).

65. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT, 2016).

66. S. Bianco, R. Cadene, L. Celona, and P. Napoletano, “Benchmark analysis of representative deep neural network architectures,” IEEE Access 6, 64270–64277 (2018). [CrossRef]  

67. X. Zhang, X. Zhou, M. Lin, and J. Sun, “ShuffleNet: an extremely efficient convolutional neural network for mobile devices,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2018), pp. 6848–6856.

68. J. M. Esposito, S. Avramov-Zamurovic, and C. Nelson, “Benchmarking an ultra-lightweight deep learning architecture for laser-based underwater communication,” in Frontiers in Optics and Laser Science, Optica Technical Digest (Optica, 2021), paper FTu6C.3.

69. P. Neary, A. T. Watnik, P. Judd, J. R. Lindle, and N. Flann, “CNN classification architecture study for turbulent free-space and attenuated underwater optical OAM communications,” Appl. Sci. 10, 8782 (2020). [CrossRef]  

70. S. Ponomarenko, “A class of partially coherent beams carrying optical vortices,” J. Opt. Soc. Am. A 18, 150–156 (2001). [CrossRef]  

71. X. Chen, J. Li, S. Rafsanjani, and O. Korotkova, “Synthesis of Im-Bessel correlated beams via coherent modes,” Opt. Lett. 43, 3590–3593 (2018). [CrossRef]  

72. J. Li, X. Chen, S. McDuffie, M. A. M. Najjar, S. M. H. Rafsanjani, and O. Korotkova, “Mitigation of atmospheric turbulence with random light carrying OAM,” Opt. Commun. 446, 178–185 (2019). [CrossRef]  

73. W. A. Jarrett, S. Avramov-Zamurovic, C. Nelson, J. Esposito, and M. W. Hyde, “Neural network classification of structured light in optical turbulence,” Proc. SPIE 11860, 118600C (2021). [CrossRef]  

74. S. Avramov-Zamurovic, C. Nelson, and J. M. Esposito, “Experimentally evaluating beam scintillation and vortex structure as a function of topological charge in underwater optical turbulence,” Opt. Commun. 513, 128079 (2022). [CrossRef]  


Figures (9)

Fig. 1. Conceptual structured light communication system.

Fig. 2. Generic experimental design setup for an in-lab communication system study. A spatial light modulator (SLM) is used to generate structured light.

Fig. 3. Amplitude and phase distributions, respectively, for the Laguerre–Gaussian beams (a), (b) ${U_{0,1}}$ and (c), (d) ${U_{1,9}}$. Note that the intensity is normalized to 1, and the phase changes from 0 to ${2}\pi$, shaded from blue (zero value) to yellow (max value) [34].

Fig. 4. Example of a 15-beam alphabet created by superimposing 4 Laguerre–Gaussian basis beams. (a) The theoretical intensities and (b) the beams after propagation through moderate strength turbulence underwater. Note that beam 16 is not shown since it is trivial with zero intensity. The notation in this image is the binary code representing the superposition of the basis beams ${1000} \Rightarrow {U_{0,1}}$, ${0100} \Rightarrow {U_{1,4}}$, ${0010} \Rightarrow {U_{0,-6}}$, and ${0001} \Rightarrow {U_{1,8}}$. Reprinted from [37]; author’s approval obtained.

Fig. 5. Representative CNN architecture.

Fig. 6. (a) Illustration of a grayscale convolution and (b) the resulting response (stride = 1, no padding). (c) The effect of ${2} \times {2}$ max pooling (stride = 2). (d) Activation via the ReLU function.

Fig. 7. Confusion matrix showing the number of correctly classified examples on the diagonal as well as the various types of errors.

Fig. 8. (a) Test image of Beam 1011 (Fig. 4) that was incorrectly classified as Beam 0111. (b) A representative training image of Beam 1011. (c) A representative training image of Beam 1011.

Fig. 9. Landscape of the deep CNN models (reprinted from Bianco et al. (2018); authors’ approval obtained [66]). The sizes of the dots are proportional to the models’ sizes. The upper left frontier represents the state of the art for the accuracy versus computation tradeoff.

Equations (6)

$U_{n,m}(\rho,\theta,z) = \sqrt{I_0}\,\frac{w_0}{w(z)}\, e^{-\rho^2/w(z)^2}\, e^{i\frac{2\pi}{\lambda}\frac{\rho^2 z}{2(z^2+R^2)}}\, e^{i\frac{2\pi}{\lambda}z} \times \sqrt{\frac{2\,n!}{\pi w_0^2 (n+|m|)!}}\left(\frac{\sqrt{2}\,\rho}{w(z)}\right)^{|m|} \times L_n^{|m|}\!\left(\frac{2\rho^2}{w(z)^2}\right) e^{i(2n+|m|+1)\arctan\frac{z}{R}+im\theta},$

$D_T(r) = \langle (T_1 - T_2)^2\rangle = \begin{cases} C_T^2\, l_0^{-4/3}\, r^2, & 0 \le r \ll l_0 \\ C_T^2\, r^{2/3}, & l_0 \ll r \ll L_0 \end{cases},$

$D_n(r) = \langle (n_1 - n_2)^2\rangle = \begin{cases} C_n^2\, l_0^{-4/3}\, r^2, & 0 \le r \ll l_0 \\ C_n^2\, r^{2/3}, & l_0 \ll r \ll L_0 \end{cases},$

$C_n^2 = \left(79 \times 10^{-6}\,\frac{P}{T^2}\right)^2 C_T^2,$

$R_{rc} = \sum_{b=1}^{B}\sum_{i=1}^{n}\sum_{j=1}^{n} I_{\,r-\frac{n+1}{2}+i,\; c-\frac{n+1}{2}+j,\; b}\, F_{ijb},$

$\sigma_l = \frac{e^{z_l}}{\sum_{k=1}^{K} e^{z_k}}.$