## Abstract

Machine learning and optimal transport-based approaches to image classification are compared for underwater orbital angular momentum (OAM) communications. A model is derived that justifies the use of optimal transport in attenuated water environments. OAM pattern demultiplexing is performed with both optimal transport and deep neural networks, and the two approaches are compared. Additionally, some of the complications introduced by signal attenuation are highlighted. The Radon cumulative distribution transform (R-CDT) is applied to OAM patterns to transform them to a linear subspace. The original OAM images and the R-CDT transformed patterns are used in several classification algorithms, and the results are compared. The selected classification algorithms are the nearest subspace algorithm, a shallow convolutional neural network (CNN), and a deep neural network. It is shown that the R-CDT transformed patterns are classified more accurately than the original OAM images. Also, the nearest subspace algorithm outperforms the selected CNNs in OAM pattern classification in underwater environments.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. INTRODUCTION

Digital communication demands are continually increasing. Currently, it is estimated that only 58% of the world’s population is connected to the Internet [1]. As access continues to be extended to remote areas and usage continues to proliferate, demand for bandwidth will only increase. Providing access to remote areas is difficult: laying cables is costly, and the terrain (in water or on land) presents significant obstacles. Other communication concerns are security and eavesdropping on signals [2].

One approach to addressing these issues is to use free-space optical (FSO) communications. FSO provides the benefit of bypassing costly cable installations through geographically difficult regions. Another benefit of FSO is the fact that it is difficult to intercept the communication stream without degrading the signal and alerting the system to tampering attempts. However, security is an ongoing area of interest and study with FSO communications [2].

Orbital angular momentum (OAM) is a property of electromagnetic (EM) waves. In 1992, Ref. [3] showed that, in addition to having forward momentum, Laguerre–Gaussian (LG) beams also carry orbital angular momentum. Allen *et al.* showed that LG beams travel in a helical pattern about the axis defining the direction of propagation. The LG equation contains an azimuthal dependency expressed as $\exp(- i\ell \phi)$, where $\ell$ is called the topological charge. When $\ell = 0$, the beam propagates with a standard planar wavefront. When $|\ell | \gt 0$, the EM wave carries angular momentum and travels with a helically shaped wavefront, and the radius of the EM wave increases with $|\ell |$. The sign of $\ell$ determines whether the EM helix propagates in a left-handed or right-handed direction.

One of the compelling properties of OAM is that topologically charged modes are orthogonal to each other. Consequently, OAM modes are good candidates for representing independent patterns that can be easily multiplexed/demultiplexed in a communications link. Methods for demultiplexing OAM patterns include adaptive optics [4], conjugate mode sorting [5,6], spiral fringe counting [7], optical transformers [8], Doppler effect measurements [9], dove prism interferometers [10], digital signal processing (DSP)-based multiple-in multiple-out (MIMO) channel equalizers [11], and machine learning (ML) approaches [12,13].

In ideal transmission environments, the beams can be demultiplexed at the receiver without loss or cross talk of any signals. Reference [14] showed that by multiplexing four OAM modes together, over 100 terabit/s data rates could be achieved. This was accomplished over short distances in an ideal environment.

Communications in real-world environments are complicated by varying degrees of attenuation and/or changes in the refractive index of the medium. Attenuation is caused by particles that either absorb or deflect the signal. Turbulence is a frequent cause of variations in the index of refraction and can result in a loss of orthogonality among the modes and create cross talk between channels.

Several approaches have been explored to address these issues. One approach [4] uses a separate non-OAM beam to determine the amount of phase distortion present to pre-compensate transmission of the OAM signals. Other approaches use classification algorithms to recognize the modes [15–18]. Reference [19] used a novel approach applying a Radon cumulative distribution transform (R-CDT) of the received image prior to classification. They found that detection algorithms trained on patterns in the R-CDT space did as well as or better than those based purely on the original OAM pattern images.

An exploration of applying various state-of-the-art CNNs to OAM pattern classification was conducted in Ref. [20]. The cited work compared how different CNNs performed in OAM beam classification in both free-space and underwater domains. This work focuses on ML and optimal transport-based demultiplexing approaches that are grounded in the problem physics. These approaches help correctly classify OAM modes in the presence of both attenuation and variations in the medium’s optical properties.

The main contribution of this work is establishing R-CDT detection in OAM communications in the underwater domain. Additionally, a model justifying the application of optimal transport to attenuated environments is derived. Another contribution of this work is comparing classification on standard images to classification on R-CDT transformed patterns. Specifically, we show how the R-CDT performs in this regime in comparison to two CNNs that were used in a work focused on turbulent free-space communications [19]. The remainder of the paper proceeds as follows: background on OAM and optimal transport (Section 2), experiment setup (Section 3), results (Section 4), and conclusions (Section 5).

## 2. BACKGROUND AND PRIOR ART

#### A. Signal Propagation and Orbital Angular Momentum

The electric field associated with a linearly polarized, monochromatic beam propagating in the $z$-direction is typically modeled as $E(\vec x,z,t) = U(\vec x,z){e^{i(\omega t - {k_0}z)}}\hat {\vec x}$. Here $\omega = 2\pi c/\lambda = {k_0}c$ is the temporal frequency of oscillation, $c$ is the speed of light, $\lambda$ is the wavelength, and ${k_0}$ is the wavenumber; the vector $\hat {\vec x}$ encodes the direction of polarization in the plane transverse to the direction of propagation, i.e., $\vec x \equiv \{x,y\}$ in Cartesian coordinates. Rather than writing $\{x,y\}$ to represent the transverse plane in the subsequent derivation, $\vec x$ is used in its place to declutter the equations. In an isotropic medium characterized by refractive index $n(\vec x,z)$, and assuming the rate of transverse variation is slow relative to the speed of propagation, the complex amplitude of this electric field can be shown to obey

There are many solutions to Eq. (1), particularly in the so-called “free-space” situation where $n(\vec x,z) = 1$ and the last term in Eq. (1) vanishes [21]. For example, a variety of solutions have been found that exhibit the OAM properties including Bessel [22], Bessel–Gauss [23], Laguerre–Gauss [3], Hermite–Gauss [24], Ince–Gauss [25], and Mathieu–Gauss [26]. The complex amplitude of LG beams, for example, can be represented in cylindrical coordinates as

Different OAM modes are defined by the topological charge or mode number, $\ell$. In practice, the mode number can be controlled through vortex phase plates or spatial light modulators.

#### B. Lagrangian Model and Problem Statement

In a prior work [28], Nichols *et al.* developed image models consistent with the physics expressed in Eq. (1) by first describing the field as the phasor $U({\vec x_0},z) \equiv {\rho ^{1/2}}({\vec x_0},z)\exp [i\psi ({\vec x_0},z)]$. Specifically, they showed that, substituting this description of the electric field into Eq. (1) and neglecting diffraction, the magnitude of the complex electric field $\rho \equiv {U^*}U$ is governed by a transport equation, which can be written (see Appendix A for the model development) as

The classification problem is then as follows: given a received image ${\rho ^{(\ell)}}({\vec x_0},Z)$ that has traveled a distance $Z$ and been corrupted by the medium according to Eq. (3), identify the correct mode number $\ell \in [1,L]$ of the “clean” pattern ${\rho ^{(\ell)}}({\vec x_0},0)$ that was sent. This is precisely the problem considered in Ref. [30]. In the cited work, a new classifier was developed and shown to be effective in this setting, outperforming a number of standard machine learning and neural network algorithms. We will describe this approach in what follows and then apply the classifier to the underwater communications applications alluded to earlier.

#### C. Classification in the R-CDT Domain

Before describing the classifier, the mathematics underlying the one-dimensional (1D) cumulative distribution transform (CDT) [31], two-dimensional (2D) CDT [32], R-CDT [32], and 2D R-CDT [30] are reviewed.

In one dimension, an alternative formulation of Eq. (3) is

which again states that the total intensity present at $z = 0$ remains unchanged under the coordinate transformation induced by the physics of the problem. The expression in Eq. (5) is closely related to the CDT introduced in Ref. [31]. For strictly positive signals, the CDT is defined in terms of a reference signal ${s_0}(y), y \in {\Omega _s}$, as

Choosing the reference domain ${\Omega _s} = [0, 1]$ and ${s_0}(y) = 1/|{\Omega _s}| = 1$, it can be shown that ${\hat \rho _Z}(y)$ becomes the inverse of the cumulative distribution of the recorded signal $\rho ({x_0},Z)$. The CDT ${\hat \rho _Z}(y)$ is an invertible transform defined on the reference signal domain, and the pairing of $\rho ({x_0},Z)$ with ${\hat \rho _Z}(y)$ is, thus, analogous to the more familiar Fourier transform pair. We can use the properties of the CDT to develop a measure of similarity between signals for classification purposes. We write
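To make this concrete, the 1D CDT under the uniform reference can be sketched numerically as the inverse empirical CDF of the normalized signal. This is a sketch only; the function name and grid sizes are ours, not the authors' package:

```python
import numpy as np

def cdt_1d(signal, x, n_ref=256):
    """1D CDT sketch for a strictly positive signal, using the uniform
    reference s0(y) = 1 on [0, 1]: sample the inverse CDF of the
    normalized signal on the reference grid."""
    s = np.asarray(signal, dtype=float)
    # cumulative mass via the trapezoid rule, normalized to a CDF
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (s[1:] + s[:-1]) * np.diff(x))))
    cdf /= cdf[-1]
    y = np.linspace(0.0, 1.0, n_ref)   # reference domain [0, 1]
    return np.interp(y, cdf, x)        # inverse CDF evaluated at y

# A pure translation of a signal appears as a constant offset in CDT space:
x = np.linspace(-10.0, 10.0, 2001)
delta = cdt_1d(np.exp(-0.5 * (x - 2.0)**2), x) - cdt_1d(np.exp(-0.5 * x**2), x)
```

For two shifted copies of the same pattern, `delta` equals the shift throughout the interior of the reference domain, which is what makes Euclidean distances in CDT space behave like transport distances.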

The CDT and the similarity metric in Eq. (7) were extended to two dimensions in Ref. [32] with the help of the Radon transform. The Radon transform of an image $\rho ({\vec x_0},z) \in {\Omega _s} \subset {{\mathbb R}^2}$ is denoted $\tilde {{\rho _0}} = {\cal R}(\rho)$ and is defined by

The Radon transform has a similar influence on the model presented in Eq. (3). Applying the Radon transform, one has that for every $\theta$,

Given these properties, we can now formulate our strategy for solving the classification problem. As in the 1D case presented in Eq. (7), we suggest the metric

Based on the metric in Eq. (11), a classifier can be constructed from the understanding that a generative model for the R-CDT of a known pattern for the $k$th class, denoted $\tilde \varphi _\theta ^{(k)}(t,z)$, is given by some unknown displacement (in the Radon domain) applied to it via

where ${f_\theta}(t,Z)$ is a model approximation for the unknown transport-based distortion imparted on the wave as it travels through a non-uniform medium. Using the composition property stated in Ref. [32], we have that the deformed pattern in R-CDT space can be expressed as $f_\theta ^{- 1}[{\tilde \varphi _\theta ^{(k)}(t,Z),Z}]$. Furthermore, if we assume that $f_\theta ^{- 1}(\cdot , \cdot)$ forms a convex subspace within the set of all possible 1D diffeomorphisms, then a simple solution to the classification problem can be proposed. In Ref. [30], the authors propose a classification method that utilizes the convex subspace assumption above to perform image classification. The principle is to estimate the subspace that models all possible variations of the observed deformed pattern $f_\theta ^{- 1}[{\tilde \varphi _\theta ^{(k)}(t,Z),Z}]$ (in Radon space) by simply estimating a basis for each subspace formed by the training data in that class. The approach taken here is the same as the one proposed in Ref. [30], where the principal component analysis (PCA) technique is used to obtain a basis for the training set of each class. The testing procedure then simply consists of transforming the input test data, computing the least squares distance to each of the trained subspaces, and assigning to the test sample the class of the closest R-CDT subspace.
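The training and testing procedure just described can be sketched as follows. This is a simplified illustration of the nearest subspace idea of Ref. [30]; the variable names and per-class subspace dimension are our choices:

```python
import numpy as np

def fit_subspaces(train_by_class, n_basis=8):
    """Estimate a per-class PCA basis from (R-CDT) feature vectors.
    `train_by_class` maps label -> (n_samples, n_features) array."""
    bases = {}
    for label, X in train_by_class.items():
        mean = X.mean(axis=0)
        # right singular vectors of the centered data span the subspace
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        bases[label] = (mean, Vt[:n_basis])
    return bases

def predict(bases, x):
    """Assign x to the class whose subspace leaves the smallest residual."""
    def residual(mean, B):
        r = x - mean
        return np.linalg.norm(r - B.T @ (B @ r))
    return min(bases, key=lambda label: residual(*bases[label]))
```

Classification then reduces to one projection per class, which is part of why this approach trains and tests quickly relative to CNNs.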

In summary, for data modeled by Eq. (3), minimizing the Euclidean norm in the R-CDT domain will give the pattern that required the “least action” to transform into the received image. Thus, one might expect good classification performance in situations such as optical communications where we have just shown the observed data indeed obey such a model.
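Putting the pieces together, the R-CDT of an image is the 1D CDT applied to each angular projection of its Radon transform. A minimal sketch follows, using a toy rotate-and-sum Radon transform; real code would use, e.g., `skimage.transform.radon` or the authors' released package, and all names here are illustrative:

```python
import numpy as np
from scipy.ndimage import rotate

def radon_sketch(image, angles):
    """Toy Radon transform: rotate the image, then integrate along one axis."""
    return np.stack([rotate(image, a, reshape=False, order=1).sum(axis=0)
                     for a in angles], axis=1)

def rcdt_sketch(image, angles, n_ref=128):
    """R-CDT sketch: 1D CDT (uniform reference on [0, 1]) of each projection."""
    sino = radon_sketch(image, angles)
    t = np.arange(sino.shape[0], dtype=float)
    y = np.linspace(0.0, 1.0, n_ref)
    out = np.empty((n_ref, len(angles)))
    for j in range(sino.shape[1]):
        p = np.clip(sino[:, j], 1e-12, None)   # keep projections strictly positive
        cdf = np.cumsum(p) / np.sum(p)
        out[:, j] = np.interp(y, cdf, t)       # inverse CDF per angle
    return out
```

As in the 1D case, a translation of the image shows up as a constant offset in the corresponding projection's CDT.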

## 3. EXPERIMENT SETUP

#### A. Hardware

The laser source used in these measurements was a diode-pumped solid state laser (Bright Solutions ONDA 532) that operates at 532 nm and produces 5 ns pulses with 300 µJ/pulse. The laser is externally triggered at 1 kHz using a waveform generator (Agilent 33600A), and the output beam was expanded to fill the aperture of the static vortex phase plates. The output beam was then split into four equal intensity beams, and each beam was normally incident on one of the four phase plates of different charge $[1, 4, -6, -8]$, imprinting a different OAM phase on each beam. The phase plates were fabricated at Clemson University. The beams were then coherently recombined using beam splitters and reflected through a water tank 1.2 m in length. The transmitted intensity patterns were captured by a high performance, fast-frame-rate camera (Photron FASTCAM SA-Z) that was externally triggered at the rate of 1000 frames per second, synchronized with the laser pulses.

To create unique patterns, different combinations of the four beams were coherently combined by blocking one or more beams; binary 1 signifies transmitted, binary 0 signifies blocked. This on/off capability yields binary combinations of each OAM mode. Thus, four phase plates result in ${2^4}$, or 16 unique patterns created from the various combinations of the modes. The binary headings to each image show which OAM modes are enabled and which are off.
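Assuming the heading bits are ordered to match the plate charges $[1, 4, -6, -8]$ (an assumption on our part), the 16 patterns can be enumerated directly:

```python
from itertools import product

# The four plates' topological charges, assumed to match the heading bit order.
CHARGES = (1, 4, -6, -8)

def pattern_labels():
    """All 2^4 = 16 on/off combinations of the four OAM beams."""
    labels = []
    for bits in product((1, 0), repeat=4):
        heading = ''.join(str(b) for b in bits)        # e.g. '1010'
        on = [c for b, c in zip(bits, CHARGES) if b]   # charges transmitted
        labels.append((heading, on))
    return labels
```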

The 16 OAM mode patterns, shown in Fig. 1, were manually created by blocking one or more of the beams. The camera spatial resolution is ${{1024}} \times {{1024}}$ pixels with a 20 µm pitch and a bit depth of 12 bits. Five-micron polyamid seeding particles (Dantec Dynamics) are added to the water tank, thereby introducing scattering of the beam, resulting in beam attenuation in the forward direction. Beam attenuation is continuously monitored by measuring the transmission of a 15 mW, 532 nm CW laser beam propagating through the tank parallel to the OAM beam. Small pumps agitate the water to ensure that the particles remain in suspension and homogeneously distributed.

For the underwater configuration, the laser beam is split into four different beams using cube beam splitters. The beams are then passed through the phase plates and recombined using another set of diffraction gratings. When the light source fixture and camera are still, the combined OAM pattern remains stationary. However, when vibrations occur, the pattern rotates about the $z$-axis.

In previous work [17], ferroelectric spatial light modulators (SLMs) were used for generating different phase patterns. Since SLMs are relatively slow, specially fabricated phase plates were created to improve throughput rates, to simplify configuration, and to prepare for eventual use in real environments.

#### B. Data Set

The influence of an attenuating medium on the propagation physics is discussed in Appendix A. In summary, attenuation by scattering or absorption is modeled here by the complex refractive index $n({\textbf{x}}) - i\kappa$ where $\kappa$ is the well-known “extinction coefficient.” This results in the removal of signal intensity as the beam moves in the direction of propagation. In this work, it is assumed that this effect is uniform in the transverse dimension (i.e., $\kappa$ is not a function of ${\vec x_z}$). The resulting signal model is, therefore,

As expected, as the beam propagates, it loses photons at an exponential rate governed by the attenuation length (AL) ${(2\kappa k)^{- 1}}$. The end result of these losses is a lower signal-to-noise ratio (SNR) for the received signal $\rho ({\vec x_z})$. Prior to computing the R-CDT, all images are first normalized to the same intensity to enforce the continuity constraint. Those images collected at longer ALs will, therefore, have their SNR lowered by the exponential factor in Eq. (13).
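For a rough sense of scale, and assuming the quoted AL values count the number of attenuation lengths traveled, the surviving fraction of launched intensity is $e^{-z/\mathrm{AL}}$:

```python
import math

# Fraction of launched intensity surviving after traveling n attenuation
# lengths: rho(z) / rho(0) = exp(-z / AL), evaluated at z = n * AL.
surviving = {n: math.exp(-n) for n in (0, 4, 8, 12)}
for n, frac in surviving.items():
    print(f"AL{n:<2d}: {frac:.2e} of launched intensity")
```

At 12 attenuation lengths, fewer than one photon in $10^5$ survives, which is why the camera gain must be raised and the SNR drops accordingly.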

One thousand images from each of the 16 permutations of the OAM beam set were captured for the analysis, resulting in 16,000 images per data set. The baseline attenuation length of 0 is obtained in clear water. The subsequent attenuation lengths are 4, 8, and 12. Polyamid beads are added to the water to achieve each non-zero attenuation length. The data sets collected at these attenuation lengths are referred to as AL0, AL4, AL8, and AL12.

Water quality rarely stays consistent in natural environments. We aim to test the robustness of the algorithms by combining ALs for training. We then test how well the algorithms perform when presented with images that are more heavily attenuated. The combined sets are referred to as AL0-4, AL0-4-8, and AL0-4-8-12. In creating combined data sets, the total number of images per set is maintained, i.e., 1000 images for each of the 16 OAM beams. In AL0-4, for example, AL0 and AL4 beam patterns are randomly sampled where 50% come from AL0 and 50% come from AL4. This allows computational complexity between training sets to be kept equal.
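A sketch of the combining procedure, with illustrative names and data layout; the paper specifies only the equal random shares and the fixed per-class count:

```python
import random

def combine_sets(sets_by_al, per_class=1000, seed=0):
    """Build a combined set (e.g. AL0-4) that keeps the per-class image
    count fixed by sampling equal shares from each AL.
    `sets_by_al` maps AL -> {class_label: [image ids]}."""
    rng = random.Random(seed)
    share, extra = divmod(per_class, len(sets_by_al))
    combined = {}
    for i, (al, classes) in enumerate(sorted(sets_by_al.items())):
        take = share + (1 if i < extra else 0)   # equal shares per AL
        for label, images in classes.items():
            combined.setdefault(label, []).extend(rng.sample(images, take))
    return combined
```

Keeping the per-class total at 1000 means every training set has the same size, so differences in training cost reflect the algorithms, not the data volume.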

After combined data sets are created, each data set is randomly sampled and split into 70%/15%/15% for training, validation, and testing. Figure 2 shows six OAM modes with images taken from each AL.

A close inspection of Fig. 2 reveals two characteristics worth discussing. First, a grating pattern is apparent in images with longer ALs. This is a result of the gain being turned up on the camera used for this experiment to account for the signal loss described by Eq. (13). Additionally, the patterns appear to be grainy. This is an artifact of the camera when set to short exposure periods. One of our main goals in the analysis will be to show the influence of this degradation on image classification performance.

The images are cropped and downsampled to capture the patterns of interest and exclude borders that do not contain relevant information, thereby reducing computation time. The cropped images are ${{512}} \times {{512}}$ in size. The images are further downsampled to ${{128}} \times {{128}}$ and are used as the basis for the experiments performed in this paper.
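This preprocessing can be sketched as follows; the center-crop location and the block-averaging downsampling filter are our assumptions, as the paper does not specify either:

```python
import numpy as np

def crop_and_downsample(img, crop=512, out=128):
    """Center-crop to `crop` x `crop`, then average-pool down to `out` x `out`."""
    h, w = img.shape
    r0, c0 = (h - crop) // 2, (w - crop) // 2
    patch = img[r0:r0 + crop, c0:c0 + crop]
    f = crop // out                          # pooling factor (4 here)
    return patch.reshape(out, f, out, f).mean(axis=(1, 3))
```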

In what follows, we compare classification performance using (1) the raw imagery as described above and (2) imagery that has been first transformed via the R-CDT. Because the R-CDT provides a direct estimate of the physics governing image formation (see Section 2), we expect this approach to accurately model the associated variability for each OAM mode. Different OAM modes are therefore predicted to be more easily identified in the R-CDT space. For details of the R-CDT computations, see [19].

#### C. Experiment Descriptions

An objective of this work is to identify how attenuation affects the performance of OAM beam classification and which, if any, of the algorithms provides the most robust performance. An additional goal is to see if OAM pattern classification in an underwater environment, subjected to attenuation, is similar to the free-space results of Ref. [19] under turbulence.

Reference [19] established that linear discriminant analysis (LDA) with R-CDT provides superior computational training speed and accuracy compared to several deep learning architectures. Recently Ref. [30] provided an updated approach to computing the R-CDT and provided a Python package to implement it easily. The software is available on GitHub [35]. The nearest subspace (NS) algorithm estimates the linear subspace containing the R-CDT patterns and then performs classification based on those subspace estimates using Eq. (11) [30].

Accuracy, as used in this section, is the percentage of test images for which the correct pattern is identified. While accuracy defined this way is not always appropriate (for unbalanced data sets, for example), it is sufficient here because all patterns have an equal number of images.

Following the initial approach of Ref. [19], LDA classification was performed on 4000 principal components computed from features of the images. Accuracy and computational speed were then compared to the classifier recently provided by Ref. [30], and improvements in both were observed. Consequently, the NS classification algorithm is used in this work in place of LDA.

Similar to Ref. [19], this work contrasts the NS algorithm (in place of LDA) and two convolutional neural networks (CNNs). For these experiments, images and their corresponding R-CDTs are compared against each other to identify which provides better discrimination.

The shallow CNN, shown in Fig. 3, consists of a CNN layer with sixteen ${{11}} \times {{11}}$ filters configured with a stride of ${{3}} \times {{3}}$. Next is a CNN layer with thirty-two ${{3}} \times {{3}}$ filters with a stride of ${{3}} \times {{3}}$. These layers are followed by a max pool layer, ReLU, and a fully connected layer. Weights in this shallow CNN are initialized according to Ref. [36]. Adam is the selected optimizer [37].
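A sketch of this architecture in PyTorch; the $2 \times 2$ pooling window and the absence of padding are our assumptions, as the paper specifies only the filter counts, sizes, and strides:

```python
import torch
import torch.nn as nn

class ShallowCNN(nn.Module):
    """Sketch of the shallow CNN of Fig. 3 for 1-channel 128 x 128 inputs."""
    def __init__(self, n_classes=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=11, stride=3),   # -> 16 @ 40 x 40
            nn.Conv2d(16, 32, kernel_size=3, stride=3),   # -> 32 @ 13 x 13
            nn.MaxPool2d(2),                              # -> 32 @ 6 x 6
            nn.ReLU(),
        )
        self.classifier = nn.Linear(32 * 6 * 6, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```

With a ${{128}} \times {{128}}$ input, the feature maps are $16 \times 40 \times 40$, $32 \times 13 \times 13$, and $32 \times 6 \times 6$, giving 1152 features into the fully connected layer.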

The deep neural network, shown in Fig. 4, is AlexNet [38] and is selected for comparison with Ref. [19]. Both architectures and their corresponding hyperparameters were kept consistent with those established in Ref. [19]. When AlexNet and the shallow CNN are used with ${{128}} \times {{128}}$ images, the input convolutional kernels are ${{11}} \times {{11}}$. When the R-CDT sets are used, the convolutional kernel dimensions are configured as ${{11}} \times {{5}}$.

## 4. RESULTS

This section presents a comparison of R-CDT and image space classification accuracies. The effects of attenuation on classification performance are also presented.

Table 1 shows the side-by-side classification results of R-CDT and OAM image pattern prediction accuracy for three different classification algorithms. The first column shows the training sets that are formed from various combinations of AL0-AL12. The second column shows the NS-based R-CDT results. Columns three and four show results for the shallow CNN, and columns five and six show results for the deep CNN. The shallow and deep CNNs were trained on both the R-CDT image sets as well as the OAM images.

As displayed in column 2, the NS-based R-CDT algorithm outperforms the CNN methods on all combinations of the data sets. The shallow and deep CNN columns show that accuracies using the R-CDT patterns perform as well as or better than the image-based classifier the majority of the time.

It is interesting to note that, overall, the shallow CNN performs slightly better than the deep CNN. Additionally, in the deep CNN the R-CDT patterns are classified less accurately than the original images. We speculate the following reasons for these observations. It has been shown [39] that the accuracy of deep networks tends to degrade after a certain depth is reached. Both observations may be caused by the size of the images in combination with the deep network crossing the threshold where performance begins to degrade. More research would be necessary to explore this point further.

The following test explores and contrasts how well the algorithms perform with limited numbers of training samples. These tests compare accuracies of the NS algorithm trained with R-CDT patterns, a shallow CNN trained with images, and a deep CNN named VGG-11 [40] trained with images. The number of training samples per class is drawn from the set {1, 2, 4, 8, 16, 32, 64, 128, 256, 512}. The previous tests used approximately 700 images per class for training. Each algorithm is trained for three epochs.

Figures 5 and 6 provide curves plotting classification accuracy of each algorithm given a fixed number of samples per class. Both figures show that the R-CDT combined with the NS algorithm is able to quickly learn distinguishing features with a limited number of training examples per class. Results for the other data sets follow the trends presented for AL0 and AL4 as shown in Figs. 5 and 6.

Table 1 shows that the NS algorithm coupled with the R-CDT provides the best results. Each CNN architecture is trained and evaluated with R-CDT patterns. The CNN is then trained and evaluated with images. It is found that classification on the R-CDT patterns, in general, performs as well as or better than image-based classification.

Looking closely at Table 1, the R-CDT-based test accuracies are consistently better than OAM image-based test accuracies for both the NS and CNN classification algorithms. Intuitively, it makes sense for the R-CDT space to provide better results than the original OAM image space. By performing the R-CDT, the images are moved into a new space where there is more separation between the classes, as shown in Ref. [19]. Because there is greater separation between the classes in the R-CDT space, the decision boundaries that the classification algorithms learn are easier to differentiate, and classification results are improved over what is achieved in the image space.

Figures 5 and 6 display similar characteristics to those presented in Ref. [30]. The NS algorithm, combined with the R-CDT, obtains better results than CNNs given the same number of training samples per class.

## 5. SUMMARY AND CONCLUSION

OAM beam patterns were collected at four different ALs: AL0, AL4, AL8, and AL12. From these data sets, three new composite sets were created by combining different ALs. These composite data sets are AL0-4, AL0-4-8, and AL0-4-8-12.

These data sets were used to evaluate how well the training in the R-CDT space compared to training in image space. These sets were also used to explore how well different classification algorithms perform relative to each other. Concretely, the algorithms selected were the NS R-CDT classifier, a shallow CNN, and a deep CNN.

This work first set out to establish the use of the R-CDT-based classifier in attenuated underwater environments. Specifically, we set out to see if the classification gains of the R-CDT over image patterns observed in free space extend to attenuated environments in water. It was found that, in the underwater environment, classification in the R-CDT space is more accurate than classification in the raw image space. This outcome is perhaps expected, given that the R-CDT directly captures the physics of the medium-induced morphing of the images. It is also consistent with what was observed in free space, as reported in Refs. [19] and [30].

Several areas exist for further exploration. In this work, a negligible amount of turbulence was present in the underwater environment, so a direct study of turbulence in water versus free space (as presented in Ref. [19]) was not possible. Future work can perform a direct comparison of turbulence in water to free-space turbulence. Additionally, in future work it will be important to allow both turbulence and attenuation to vary simultaneously, to gain a full picture of the best algorithms for classification. Another interesting direction of research is to explore other transforms, to see what additional gains can be made in classification accuracy in the presence of signal degrading conditions.

## APPENDIX A: IMAGE MODEL IN LAGRANGIAN COORDINATES FOR ATTENUATED BEAMS

Representing the complex amplitude as $U({\vec x_0},z) = {\rho ^{1/2}}({\vec x_0},z){e^{i[\psi ({{\vec x}_0},z)]}}$ and substituting into Eq. (1) yields a complex equation in $\rho , \psi$. The imaginary and real portions are, respectively,

Both expressions can be simplified to some extent with a change to Lagrangian coordinates. Let ${\vec x_z} \equiv \vec f({\vec x_0},z)$ be a vector function of the coordinates ${\vec x_0},z$. In this coordinate system, material derivatives, e.g., ${\partial _z}\rho ({\vec x_0},z) + {\nabla _X}\rho ({\vec x_0},z)\vec u({\vec x_0},z)$, become ordinary derivatives $d\rho ({\vec x_z})/dz$. The velocity of the field then becomes simply $\vec u({\vec x_0},z) = d{\vec x_z}/dz$. In this formulation, the expressions in Eq. (A1) become

The above analysis presumes zero losses during signal propagation. However, assuming a non-zero extinction coefficient, the refractive index can be written $n(\vec x,z) - i\kappa$ with extinction coefficient $\kappa$. The governing equations, in Lagrangian coordinates, then become

## APPENDIX B: RELATING ${f_\theta}(t)$ TO $\vec f({\vec x_0},z)$

In this section, we relate the problem physics, as encoded in the Lagrangian mapping $\vec f({\vec x_0},z)$, to the corresponding function(s) in the Radon transform domain. Taking the Radon transform of both sides of Eq. (A1a) yields [44]

## Funding

U.S. Naval Research Laboratory (6.1, 6.2); National Institutes of Health (GM130825).

## Acknowledgment

We would like to acknowledge support for this project from the U.S. Naval Research Laboratory through a 6.1 base program for orbital angular momentum and a 6.2 base program for underwater communications. G. K. R. was supported in part by NIH award GM130825. Thank you to the research group of Eric Johnson at Clemson University for fabrication of the vortex phase plates.

## Disclosures

The authors declare no conflicts of interest.

## Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request. Source code for implementing the image transforms and classifiers used in this paper is available at [45].

## REFERENCES

**1. **Statista, “Global digital population as of January 2021,” 2021 (see https://www.statista.com/statistics/617136/digital-population-worldwide/).

**2. **Y. Zhao, J. Li, X. Zhong, and H. Shi, “Physical-layer security in fractional orbital angular momentum multiplexing under atmospheric turbulence channel,” Opt. Express **27**, 23751–23762 (2019).

**3. **L. Allen, M. W. Beijersbergen, R. J. C. Spreeuw, and J. P. Woerdman, “Orbital angular momentum of light and the transformation of Laguerre-Gaussian laser modes,” Phys. Rev. A **45**, 8185–8189 (1992).

**4. **Y. Ren, G. Xie, H. Huang, C. Bao, Y. Yan, N. Ahmed, M. P. J. Lavery, B. I. Erkmen, S. Dolinar, M. Tur, M. A. Neifeld, M. J. Padgett, R. W. Boyd, J. H. Shapiro, and A. E. Willner, “Adaptive optics compensation of multiple orbital angular momentum beams propagating through emulated atmospheric turbulence,” Opt. Lett. **39**, 2845–2848 (2014).

**5. **G. Gibson, J. Courtial, M. J. Padgett, M. Vasnetsov, V. Pas’ko, S. M. Barnett, and S. Franke-Arnold, “Free-space information transfer using light beams carrying orbital angular momentum,” Opt. Express **12**, 5448–5456 (2004).

**6. **A. Mair, A. Vaziri, G. Weihs, and A. Zeilinger, “Entanglement of the orbital angular momentum states of photons,” Nature **412**, 313–316 (2001).

**7. **M. S. Soskin, V. N. Gorshkov, M. V. Vasnetsov, J. T. Malos, and N. R. Heckenberg, “Topological charge and angular momentum of light beams carrying optical vortices,” Phys. Rev. A **56**, 4064–4075 (1997).

**8. **M. P. J. Lavery, G. C. G. Berkhout, J. Courtial, and M. J. Padgett, “Measurement of the light orbital angular momentum spectrum using an optical geometric transformation,” J. Opt. **13**, 064006 (2011).

**9. **M. P. J. Lavery, F. C. Speirits, S. M. Barnett, and M. J. Padgett, “Detection of a spinning object using light’s orbital angular momentum,” Science **341**, 537–540 (2013).

**10. **J. Leach, M. J. Padgett, S. M. Barnett, S. Franke-Arnold, and J. Courtial, “Measuring the orbital angular momentum of a single photon,” Phys. Rev. Lett. **88**, 257901 (2002).

**11. **Y. Ren, Z. Wang, G. Xie, L. Li, A. J. Willner, Y. Cao, Z. Zhao, Y. Yan, N. Ahmed, N. Ashrafi, S. Ashrafi, R. Bock, M. Tur, and A. E. Willner, “Atmospheric turbulence mitigation in an OAM-based MIMO free-space optical link using spatial diversity combined with MIMO equalization,” Opt. Lett. **41**, 2406–2409 (2016).

**12. **X. Cui, X. Yin, H. Chang, H. Liao, X. Chen, X. Xin, and Y. Wang, “Experimental study of machine-learning-based orbital angular momentum shift keying decoders in optical underwater channels,” Opt. Commun. **452**, 116–123 (2019).

**13. **S. Lohani, E. M. Knutson, M. O’Donnell, S. D. Huver, and R. T. Glasser, “On the use of deep neural networks in optical communications,” Appl. Opt. **57**, 4180–4190 (2018).

**14. **H. Huang, G. Xie, Y. Yan, N. Ahmed, Y. Ren, Y. Yue, D. Rogawski, M. J. Willner, B. I. Erkmen, K. M. Birnbaum, S. J. Dolinar, M. P. J. Lavery, M. J. Padgett, M. Tur, and A. E. Willner, “100 Tbit/s free-space data link enabled by three-dimensional multiplexing of orbital angular momentum, polarization, and wavelength,” Opt. Lett. **39**, 197–200 (2014).

**15. **M. Krenn, R. Fickler, M. Fink, J. Handsteiner, M. Malik, T. Scheidl, R. Ursin, and A. Zeilinger, “Communication with spatially modulated light through turbulent air across Vienna,” New J. Phys. **16**, 113028 (2014).

**16. **E. M. Knutson, S. Lohani, O. Danaci, S. D. Huver, and R. T. Glasser, “Deep learning as a tool to distinguish between high orbital angular momentum optical modes,” Proc. SPIE **9970**, 236–242 (2016).

**17. **T. Doster and A. T. Watnik, “Machine learning approach to OAM beam demultiplexing via convolutional neural networks,” Appl. Opt. **56**, 3386–3396 (2017).

**18. **S. Li and J. Wang, “Adaptive free-space optical communications through turbulence using self-healing Bessel beams,” Sci. Rep. **7**, 43233 (2017). [CrossRef]

**19. **S. R. Park, L. Cattell, J. M. Nichols, A. Watnik, T. Doster, and G. K. Rohde, “De-multiplexing vortex modes in optical communications using transport-based pattern recognition,” Opt. Express **26**, 4004–4022 (2018). [CrossRef]

**20. **P. L. Neary, A. T. Watnik, K. P. Judd, J. R. Lindle, and N. S. Flann, “CNN classification architecture study for turbulent free-space and attenuated underwater optical OAM communications,” Appl. Sci. **10**, 8782 (2020). [CrossRef]

**21. **U. Levy, S. A. Derevyanko, and Y. R. Silberberg, “Light modes of free space,” Prog. Opt. **61**, 237–281 (2016). [CrossRef]

**22. **J. Durnin, J. J. Miceli, and J. H. Eberly, “Diffraction-free beams,” Phys. Rev. Lett. **58**, 1499–1501 (1987). [CrossRef]

**23. **F. Gori, G. Guattari, and C. Padovani, “Bessel-Gauss beams,” Opt. Commun. **64**, 491–495 (1987). [CrossRef]

**24. **A. E. Siegman, “Hermite–Gaussian functions of complex argument as optical-beam eigenfunctions,” J. Opt. Soc. Am. **63**, 1093–1094 (1973). [CrossRef]

**25. **M. A. Bandres and J. C. Gutiérrez-Vega, “Ince–Gaussian beams,” Opt. Lett. **29**, 144–146 (2004). [CrossRef]

**26. **J. C. Gutiérrez-Vega, M. D. Iturbe-Castillo, and S. Chávez-Cerda, “Alternative formulation for invariant optical fields: Mathieu beams,” Opt. Lett. **25**, 1493–1495 (2000). [CrossRef]

**27. **W. Paufler, B. Böning, and S. Fritzsche, “High harmonic generation with Laguerre–Gaussian beams,” J. Opt. **21**, 094001 (2019). [CrossRef]

**28. **J. M. Nichols, T. H. Emerson, L. Cattell, S. Park, A. Kanaev, F. Bucholtz, A. Watnik, T. Doster, and G. K. Rohde, “Transport-based model for turbulence-corrupted imagery,” Appl. Opt. **57**, 4524–4536 (2018). [CrossRef]

**29. **J. M. Nichols, A. T. Watnik, T. Doster, S. Park, A. Kanaev, L. Cattell, and G. K. Rohde, “An optimal transport model for imaging in atmospheric turbulence,” arXiv:1705.01050 (2017).

**30. **M. S.-E. Rabbi, X. Yin, A. H. M. Rubaiyat, S. K. S. Li, A. Aldroubi, J. M. Nichols, and G. K. Rohde, “Radon cumulative distribution transform subspace modeling for image classification,” arXiv:2004.03669 (2020).

**31. **S. R. Park, S. Kolouri, S. Kundu, and G. K. Rohde, “The cumulative distribution transform and linear pattern classification,” Appl. Comput. Harmon. Anal. **45**, 616–641 (2018). [CrossRef]

**32. **S. Kolouri, S. Park, and G. Rohde, “The Radon cumulative distribution transform and its application to image classification,” IEEE Trans. Image Process. **25**, 920–934 (2016). [CrossRef]

**33. **J. M. Nichols, M. N. Hutchinson, N. Menkart, G. A. Cranch, and G. K. Rohde, “Time delay estimation via Wasserstein distance minimization,” IEEE Signal Process. Lett. **26**, 908–912 (2019). [CrossRef]

**34. **S. Kolouri, Y. Zou, and G. K. Rohde, “Sliced Wasserstein Kernels for probability distributions,” in *IEEE Conference on Computer Vision and Pattern Recognition (CVPR)* (2016), pp. 5258–5267.

**35. **M. S.-E. Rabbi, X. Yin, A. H. M. Rubaiyat, S. K. S. Li, A. Aldroubi, J. M. Nichols, and G. K. Rohde, “Radon cumulative distribution transform subspace models for image classification,” 2020 (see https://github.com/rohdelab/rcdt_ns_classifier/).

**36. **X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in *Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics*, Y. W. Teh and M. Titterington, eds., Vol. 9 of Proceedings of Machine Learning Research (PMLR, 2010), pp. 249–256.

**37. **D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv:1412.6980 (2014).

**38. **A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in *Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS)* (Curran Associates, 2012), Vol. 1, pp. 1097–1105.

**39. **K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv:1512.03385 (2015).

**40. **K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556 (2014).

**41. **J. M. Nichols, T. H. Emerson, and G. K. Rohde, “A transport model for broadening of a linearly polarized, coherent beam due to inhomogeneities in a turbulent atmosphere,” J. Mod. Opt. **66**, 835–849 (2019). [CrossRef]

**42. **V. I. Klyatskin and D. Gurarie, “Coherent phenomena in stochastic dynamical systems,” Phys. Usp. **42**, 165–198 (1999). [CrossRef]

**43. **F. Flandoli, M. Gubinelli, and E. Priola, “Well-posedness of the transport equation by stochastic perturbation,” Invent. Math. **180**, 1–53 (2010). [CrossRef]

**44. **P. Milanfar, “A model of the effect of image motion in the Radon transform domain,” IEEE Trans. Image Process. **8**, 1276–1281 (1999). [CrossRef]

**45. **Rohde Lab, “Python Transport Based Signal Processing Toolkit,” GitHub, 2021, https://github.com/rohdelab/PyTransKit.