
Single-shot wavefront sensing with deep neural networks for free-space optical communications

Open Access

Abstract

Applying deep neural networks in image-based wavefront sensing allows for the non-iterative regression of the aberrated phase in real time. In view of the nonlinear mapping from phase to intensity, it is common to utilize two focal plane images in the manner of phase diversity, while algorithms based on only one focal plane image generally yield less accurate estimations. In this paper, we demonstrate that by exploiting a single image of the pupil plane intensity pattern, it is possible to retrieve the wavefront with high accuracy. In the context of free-space optical communications (FSOC), a compact dataset, in which considerable low-order aberrations exist, is generated to train the EfficientNet, which learns to regress the Zernike polynomial coefficients from the intensity frame. The performance of ResNet-50 and Inception-V3 is also tested on the same task; both are outperformed by EfficientNet by a large margin. To validate the proposed method, the models are fine-tuned and tested with experimental data collected on an adaptive optics platform.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

The link performance of free-space optical communication (FSOC) systems is limited by the phase distortions accumulated on the wavefront of the laser beam as it propagates through the turbulent atmosphere [1]. Since it is not possible to directly manipulate the atmospheric path, current efforts to improve the quality of the physical channel mainly focus on optimizing the design of the optical transceiver. For instance, partially coherent beams and MIMO optics can improve the transmission quality of the signal beam [2–4], while adaptive optics (AO) attempts to correct the phase aberrations before focusing the beam onto the photodetector [5,6]. Wavefront sensing is the enabling technique for AO, since determining the wavefront is a prerequisite for compensation [7,8].

A common approach to detect the phase of the incident optical wave is to use the Shack-Hartmann wavefront sensor [9]. It measures the deviation of the individual centroids imaged by a micro-lens array, from which the local phase slopes are deduced to reconstruct the incoming wavefront. There are also other types of wavefront sensing hardware, but most of them are expensive and difficult to deploy [7].

To reduce the cost and bulk of the AO module, image-based methods have been developed to retrieve the phase information from intensity measurements [8,10]. The well-known Gerchberg-Saxton (G-S) algorithm repeatedly transforms the optical field between the object domain and the Fourier domain while replacing the amplitude in the Fourier domain with the measured amplitude, until their difference falls below a defined threshold [11]. Parametric algorithms such as stochastic parallel gradient descent (SPGD) optimize a series of parameters that represent the aberrated wavefront so as to minimize a certain cost function [12]. Apart from requiring a close-enough starting guess, these conventional algorithms typically involve multiple iterations to create one output and are thus too slow for on-line tasks [13,14].

Neural networks are a class of layered nonlinear models that can be regarded as powerful function fitting tools [15,16]. They can be trained to learn the underlying complex mapping between the input data and the desired output, for which millions of trainable parameters (also known as “weights”) are updated to optimize a defined target (loss minimization or reward maximization), usually in a gradient descent manner via backpropagation. Whereas training a deep neural network requires a large amount of data and time, inference is usually very fast, which is favorable for real-time wavefront sensing. The application of neural networks in image-based wavefront sensing traces back to work with space telescopes in the early 1990s [17–19]. The models in use then were three-layer perceptrons, an early form of neural network by modern standards, which took flattened pixel values of the intensity images as input and regressed the corresponding Zernike coefficients. A more recent application of the perceptron extracts the Chebyshev moments of the point spread function (PSF) as a 1-D vector input [20].

Although in principle fully connected networks with only one hidden layer can be as capable as any other architecture regarding feature extraction, it is usually more efficient to construct a deeper network (i.e., composed of more layers) with far fewer neurons in each intermediate layer. To this end, convolutional neural networks (CNNs) were proposed to reduce the number of parameters and to further exploit the spatial correlation among adjacent pixels, achieving state-of-the-art performance in image-related tasks such as classification and object detection [21]. In light of this, the majority of recent attempts to solve the phase retrieval problem have taken advantage of various CNN architectures including LeNet [22], AlexNet [23,24], ResNet [25,26], Inception [14,27] and Xception [28]. In [14] the wavefront predicted by the CNN is used to generate a better starting guess for nonlinear optimization algorithms, greatly reducing the residual wavefront error under large aberrations. End-to-end frameworks that directly regress the wavefront from intensity measurements are not only more desirable but also feasible; they fall into two categories: those based on phase diversity [18–20,23] and those using only a single frame [14,24,28,29].

Phase diversity, making use of both an in-focus image and an out-of-focus image, is introduced because mathematically, at least two intensity images are needed to avoid the non-uniqueness in phase retrieval [30,31]. A viable implementation of phase diversity is shown in the dashed box in Fig. 1. In [22] and [27], networks trained in the phase diversity manner have reported higher accuracy than those trained with a single frame. However, a second intensity frame is not indispensable in a deep learning approach, because the training process forces the output of the neural network to approximate the ground truth wavefront, which is usually a smooth phase map. This implicit constraint of smoothness may help eliminate the ambiguity in single-frame phase reconstruction. Therefore, some neural network models trained with single-frame PSF data can work, although their estimations are not very accurate [24,29]. In FSOC, the phase ambiguity is further reduced compared to the imaging problem, because the source beam is commonly Gaussian and does not vary with time, removing the difficulty of deciding whether a given intensity pattern results from phase distortions or from changes in the source. Therefore, it is possible to precisely recover the wavefront from a single shot of the intensity image in FSOC.


Fig. 1. Schematic of the adaptive optics setup based on single-shot wavefront sensing at the FSOC receiver. L1, L2 and Lf: lenses. DM: deformable mirror. BS: beam splitter. PD: photodetector. The portion of light used for wavefront sensing is colored in blue. An alternative phase diversity arm is also depicted in the dashed box.


In this study, we investigate a single-frame wavefront sensing solution for FSOC that exploits a single image of the pupil plane intensity pattern, which can be captured by the camera in the optical setup shown in Fig. 1. As for the reconstruction algorithm, a popular deep neural network architecture named EfficientNet was chosen and trained in a supervised manner to regress the Zernike coefficients of the wavefront of the incident beam.

The EfficientNet architecture is depicted in Fig. 2. The baseline network of the EfficientNet family, denoted EfficientNet-B0, is obtained by neural architecture search (NAS) among connections of inverted bottleneck residual blocks with squeeze-and-excitation (SE) layers [32]. The inverted bottleneck residual building block features a shortcut residual connection. It expands the input channels rather than suppressing them before the depthwise separable convolution, hence the name inverted bottleneck [33]. Besides, the inserted squeeze-and-excitation layer actively learns an optimized weighting among channels so that the extracted features can be better combined [34]. These techniques lead to a highly efficient network. With 78% fewer parameters, EfficientNet-B0 achieves higher accuracy than ResNet-50 on ImageNet. The same will prove to be true for the task of wavefront sensing in our case. Larger versions of EfficientNet, i.e., B1∼B7, are scaled up from EfficientNet-B0 by expanding the depth (number of layers in each of the 7 network stages), width (feature channels in each layer) and/or input resolution of the network.
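
For concreteness, the following is a minimal Keras sketch (not the authors' implementation) of one inverted-bottleneck residual block with a squeeze-and-excitation layer; the expansion ratio, kernel size, SE reduction ratio and the swish activation are illustrative assumptions.

```python
# Illustrative Keras sketch (not the authors' code) of one inverted-bottleneck
# residual block with squeeze-and-excitation; hyperparameters are assumptions.
from tensorflow.keras import layers

def mbconv_se_block(x, out_channels, expand_ratio=6, se_ratio=0.25, stride=1):
    in_channels = x.shape[-1]
    expanded = in_channels * expand_ratio

    # 1x1 expansion: channels are expanded rather than reduced ("inverted" bottleneck)
    h = layers.Conv2D(expanded, 1, padding="same", use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)

    # depthwise convolution on the expanded representation
    h = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)

    # squeeze-and-excitation: per-channel weights learned from global context
    se = layers.GlobalAveragePooling2D()(h)
    se = layers.Dense(max(1, int(in_channels * se_ratio)), activation="swish")(se)
    se = layers.Dense(expanded, activation="sigmoid")(se)
    h = layers.Multiply()([h, layers.Reshape((1, 1, expanded))(se)])

    # 1x1 linear projection back to out_channels
    h = layers.Conv2D(out_channels, 1, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)

    # shortcut residual connection when the shapes allow it
    if stride == 1 and in_channels == out_channels:
        h = layers.Add()([x, h])
    return h
```

Stacking such blocks according to the stage configuration found by NAS, and applying the compound scaling factors, yields the B0∼B7 family.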


Fig. 2. Schematic diagram of the EfficientNet structure. There are 7 stages, each of which contains multiple building blocks, i.e., the inverted bottleneck residual block with squeeze-and-excitation. The connections among building blocks within each stage are optimized via NAS to get the EfficientNet-B0 baseline. B1∼B7 are then acquired by compound scaling upon B0.


Unlike previous studies, the pupil plane data in this paper were not generated on the basis of Noll’s work [35]. Instead, we used wave optics propagation to obtain the optical field at the pupil plane, simulating the Fresnel diffraction of an extended Gaussian beam aberrated by a 2-D Kolmogorov turbulence phase screen [36]. To cover a wide range of practical FSOC link parameters, the strength of the turbulence, the propagation distance, and the size of the Gaussian beam source were all allowed to vary when generating the datasets. As a result of the high degree of intrinsic diversity introduced in this way, a dataset containing only 20,000 samples was sufficient to train the network.

This paper proceeds in Section 2 to explain the scheme of single-shot wavefront sensing and how the pupil plane phase distortion is related to the FSOC link parameters. In Section 3 the preparation of the datasets is detailed together with some training tricks that we accumulated. Also, the choice of neural network models and the appropriate size for the dataset are evaluated. The trained models are tested as the turbulence strength varies, in the presence of different levels of noise. Besides, the performance of an EfficientNet model trained with phase diversity data is analyzed for comparison. Furthermore, an experimental demonstration is set up to validate the proposed method.

2. Method

The essence of applying adaptive optics in the FSOC receiver is to detect the wavefront distortion and to compensate it by adding a conjugate phase before focusing the beam onto the photodetector. For the image-based solution, a camera is used to capture the intensity pattern I(r) of the incident optical field U(r) with some aberrated phase φ(r):

$$I({\textbf r}) = {|{U({\textbf r})} |^2} = {|{A({\textbf r})\,\textrm{exp} (i\varphi ({\textbf r}))} |^2} = {|{A({\textbf r})} |^2},$$
where r = (ρ, θ) is the position vector in the lateral polar coordinates at the pupil plane and A(r) is the amplitude of the field. Generally, the wavefront aberration within a circular aperture can be decomposed into the sum of a series of Zernike polynomials:
$$\varphi ({\textbf r}) = \sum\limits_{j = 1}^N {{c_j}{Z_j}(\rho ,\theta )} = {{\textbf C}_Z}\cdot {\textbf Z},$$
where Z = [Z1, Z2, …, ZN] contains the first N Zernike modes under consideration, and CZ = [c1, c2, …, cN] holds the corresponding coefficients. Now the task of wavefront sensing is to retrieve φ(r) from I(r); in other words, to compute
$$\mathbf{C}_{Z}^{\prime}=f\{I(\mathbf{r})\}=\left.\underset{\mathbf{C}_{Z}^{\prime}}{\arg \min }\left[\mathbf{C}_{Z}^{\prime} \cdot \mathbf{Z}-\varphi(\mathbf{r})\right]\right|_{I(\mathbf{r})},$$
where $\textbf{C}_Z^{\prime}$ is the vector of estimated Zernike coefficients of the incident wavefront, f is the phase retrieval algorithm, and arg min denotes the argument of the minimum, i.e., the process of optimizing $\textbf{C}_Z^{\prime}$ to minimize the gap between the estimated wavefront and the true wavefront. In the deep learning approach, the optimization of $\textbf{C}_Z^{\prime}$ is carried out implicitly through the optimization of the network parameters.
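
As a minimal illustration of Eq. (2), the NumPy sketch below composes a wavefront from a few low-order Zernike modes in Noll ordering (tip, tilt, defocus); the paper uses the first 66 modes, and the grid size here is an assumption.

```python
# Minimal NumPy sketch of Eq. (2): a wavefront built from Noll modes 2-4
# (tip, tilt, defocus). The paper uses the first 66 modes; grid size assumed.
import numpy as np

def wavefront_from_zernike(coeffs, n_pix=256):
    """coeffs: dict {Noll index: coefficient in rad} for the subset written below."""
    y, x = np.mgrid[-1:1:1j * n_pix, -1:1:1j * n_pix]
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    modes = {
        2: 2.0 * rho * np.cos(theta),              # tip
        3: 2.0 * rho * np.sin(theta),              # tilt
        4: np.sqrt(3.0) * (2.0 * rho ** 2 - 1.0),  # defocus
    }
    phi = sum(coeffs.get(j, 0.0) * z for j, z in modes.items())
    return np.where(rho <= 1.0, phi, 0.0)          # restrict to the circular aperture

phi = wavefront_from_zernike({2: 1.5, 3: -0.8, 4: 0.3})
```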

Figure 1 is the schematic of a specific form of single-shot AO setup at the FSOC receiving end. The incident beam is scaled down by the telescope comprised of L1 and L2. A small fraction of the light power is split off by the beam splitter (BS) toward the CCD, whose output is used to deduce the wavefront distortion. Once the wavefront is recovered, compensation may be performed by the deformable mirror (DM) or a spatial light modulator, and the reflected beam carrying some residual error is passed on and focused onto the photodetector (PD). Note that in this setup, there is only one CCD camera, which images the pupil plane intensity, while a phase diversity setup can be realized by replacing the lower arm with a focusing arm as depicted in the dashed box.

The wavefront distortion is assumed to be caused by Kolmogorov turbulence along the propagation path. In order to relate it to the link parameters of FSOC, we generated the aberrated optical field using Fresnel diffraction from the source plane to the pupil plane rather than following Noll’s residual variance to create phase aberrations directly in the pupil aperture [35]. Specifically, the turbulence induced phase distortion is modeled with a 2-D phase screen at the source plane, which is generated by multiplying each component of the power spectral density (PSD) of the refractive-index fluctuations by a random normal number and then transforming the result to the spatial domain [36]:

$$\psi(\mathbf{r})=\mathcal{F}^{-1}\left\{R(\mathbf{f}) \Phi_{\psi}^{1 / 2}(\mathbf{f})\right\},$$
where f is the spatial-frequency vector, $\mathcal{F}^{-1}$ denotes the inverse Fourier transform, R(f) is a zero-mean, unit-variance Gaussian random matrix, and Φψ(f) is the phase PSD, which takes the form
$${\Phi _\psi }({\textbf f}) \approx 0.0097{k^2}C_n^2{D_Z}{|{\textbf f} |^{ - 11/3}},$$
in which f = |f|, k = 2π/λ is the optical wave number, λ is the wavelength, DZ is the propagation distance, and $C_n^2$ is the refractive-index structure constant (in units of m−2/3), which reflects the strength of the turbulence. For a plane wave, the link parameters are related to the Fried parameter by [1]:
$${r_0} = {(0.423{k^2}C_n^2{D_Z})^{ - 3/5}}.$$

For the Gaussian beam wave, the expression for r0 is adjusted by a propagation-related term, which can be found in [1] (Chapter 6).
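
The sketch below illustrates Eqs. (4)–(6) in NumPy: one Kolmogorov phase screen is drawn by shaping Gaussian noise with the square root of the phase PSD and inverse Fourier transforming. The authors' data generation was implemented in MATLAB following [36]; the FFT normalization and the use of k² in Eq. (5) are assumptions of this sketch.

```python
# NumPy sketch of Eqs. (4)-(6); normalization details follow common FFT
# phase-screen recipes and may differ from the authors' MATLAB code [36].
import numpy as np

def kolmogorov_phase_screen(n=512, dx=1e-3, cn2=1e-15, dz=800.0,
                            wavelength=632e-9, rng=None):
    """Return an n x n turbulence phase screen in radians."""
    rng = np.random.default_rng() if rng is None else rng
    k = 2.0 * np.pi / wavelength
    df = 1.0 / (n * dx)                                # spatial-frequency spacing [1/m]
    fx = np.fft.fftfreq(n, d=dx)
    fxx, fyy = np.meshgrid(fx, fx)
    f = np.hypot(fxx, fyy)
    f[0, 0] = np.inf                                   # suppress the singular DC term

    psd_phi = 0.0097 * k ** 2 * cn2 * dz * f ** (-11.0 / 3.0)        # Eq. (5)
    spectrum = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    spectrum = spectrum * np.sqrt(psd_phi) * df                       # Eq. (4), discretized
    return np.real(np.fft.ifft2(spectrum)) * n ** 2

# Fried parameter (Eq. (6)) and D/r0 for one draw of the link parameters
k = 2.0 * np.pi / 632e-9
r0 = (0.423 * k ** 2 * 1e-15 * 800.0) ** (-3.0 / 5.0)
print(f"r0 = {r0:.3f} m, D/r0 = {0.256 / r0:.1f}")
```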

The final step in obtaining the optical field at the receiver pupil plane is to calculate the Fresnel diffraction of the coherent source field U0(r′), multiplied by the turbulence phase screen ψ(r′), over a propagation distance DZ, which can be written as

$$U({\textbf r}) = \frac{{\textrm{exp} (ik{D_Z})}}{{i\lambda {D_Z}}}\int {{U_0}({\textbf r}^{\prime})} \textrm{exp} [{i\psi ({\textbf r}^{\prime})} ]\textrm{exp} \left( {\frac{{ik}}{{2{D_Z}}}{{|{{\textbf r} - {\textbf r}^{\prime}} |}^2}} \right)\textrm{d}{\textbf r}^{\prime},$$
where r′ is the position vector at the source plane, and U0(r′) = exp(−|r′|²/$\omega_0^2$) is the collimated Gaussian source field with ω0 the beam waist. In practice, the fast Fourier transform (FFT) can be used to calculate the integral in Eq. (7):
$$U(\mathbf{r})=\mathcal{F}^{-1}\left\{\mathcal{F}\left\{U_{0}(\mathbf{r})\,\textrm{exp} [{i\psi(\mathbf{r})}]\right\} \textrm{exp} \left(-i \pi \lambda D_{Z} {{|\mathbf{f}|}^{2}}\right)\right\},$$
where the constant phase and scaling term related to DZ have been dropped. With the procedures detailed above, the intensity pattern I(r) and the corresponding wavefront φ(r) are obtained, which act as the input and the ground truth for the phase retrieval algorithm f. An example of I(r) and φ(r) is shown in Figs. 3(a) and 3(b), respectively.
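
A NumPy sketch of the transfer-function form of Eqs. (7)–(8) is given below; grid and beam parameters are illustrative, and the phase screen is taken from the previous sketch.

```python
# NumPy sketch of the transfer-function implementation of Eqs. (7)-(8);
# parameters are illustrative and psi comes from kolmogorov_phase_screen above.
import numpy as np

def fresnel_propagate(u0, dx=1e-3, dz=800.0, wavelength=632e-9):
    """Propagate the field u0 over dz using the Fresnel transfer function."""
    n = u0.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    fxx, fyy = np.meshgrid(fx, fx)
    h = np.exp(-1j * np.pi * wavelength * dz * (fxx ** 2 + fyy ** 2))
    return np.fft.ifft2(np.fft.fft2(u0) * h)

n, dx, w0 = 512, 1e-3, 0.15
y, x = (np.mgrid[0:n, 0:n] - n / 2) * dx
psi = kolmogorov_phase_screen(n, dx)                        # from the previous sketch
u0 = np.exp(-(x ** 2 + y ** 2) / w0 ** 2) * np.exp(1j * psi)
u_pupil = fresnel_propagate(u0, dx)
u_pupil = u_pupil[n // 4: 3 * n // 4, n // 4: 3 * n // 4]   # keep the central 256 x 256
intensity = np.abs(u_pupil) ** 2                            # network input I(r)
phase = np.angle(u_pupil)                                   # wrapped phase; unwrap and
                                                            # decompose to obtain CZ
```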


Fig. 3. An example of the simulated pupil plane data. (a) Intensity pattern. (b) Wavefront phase. (c) The first 36 Zernike coefficients corresponding to the phase in (b), and the estimation by the proposed method. (d) The residual phase errors corresponding to (b) after compensation with the estimated Zernike modes given in (c).


In this paper, EfficientNet is trained to fit and generalize the complex mapping between I(r) and φ(r) (in the form of CZ). Its fully connected output layer is composed of 66 neurons representing the first 66 Zernike coefficients. Although only one intensity frame is used, we have preserved all three input channels so that the weights pretrained on ImageNet can be loaded into the model. Specifically, the data is fed to the network via the first input channel, while the other two input channels are padded with zeros. It will be demonstrated later that fine-tuning upon the pretrained weights is more beneficial than training from random initializations.
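
A possible Keras realization of this model is sketched below, using the EfficientNetB0 backbone available in tf.keras.applications with a 66-neuron linear output layer; which EfficientNet implementation the authors loaded under Keras 2.2.4 is not stated, so this backbone choice is an assumption.

```python
# Sketch of the regression model: EfficientNet-B0 backbone with ImageNet
# weights and a 66-output linear head; the intensity goes into channel 0.
import numpy as np
import tensorflow as tf

def build_wavefront_model(n_zernike=66, input_size=256):
    backbone = tf.keras.applications.EfficientNetB0(
        include_top=False,
        weights="imagenet",                       # fine-tune from ImageNet weights
        input_shape=(input_size, input_size, 3),
        pooling="avg",
    )
    coeffs = tf.keras.layers.Dense(n_zernike)(backbone.output)   # Zernike coefficients
    return tf.keras.Model(backbone.input, coeffs)

def to_three_channels(intensity):
    """Place the single intensity frame in channel 0; pad channels 1 and 2 with zeros."""
    x = np.zeros(intensity.shape + (3,), dtype=np.float32)
    x[..., 0] = intensity
    return x

model = build_wavefront_model()
```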

3. Results and discussion

3.1. Datasets

In this section, we will describe how the datasets are created, including the training set, the validation (val) set and the test set. As stated in Section 2, the optical field at the receiving pupil, from which a sample of [intensity frame I(r), Zernike coefficient CZ] can be computed, is the result of free-space Fresnel diffraction of the source beam field U0(r) imposed with a 2-D phase screen ψ(r). This process is illustrated in Fig. 4(a), and a random realization of the turbulence phase screen is shown in Fig. 4(b).


Fig. 4. Generation of the datasets. (a) Fresnel diffraction from the source to the receiver pupil. (b) A realization of the 2-D turbulence phase screen applied to the Gaussian source. (c) Standard deviation of the Zernike coefficients in the datasets.


The wavelength λ in the simulation was chosen to be 632 nm, while the propagation distance DZ, the turbulence strength $C_n^2$ and the beam size ω0 were all allowed to vary. In this study, DZ was randomly taken from 600 m to 1 km, $C_n^2$∈[1×10−16 m−2/3, 1×10−14 m−2/3], and ω0 ∈[0.1 m, 0.2 m]. The diameter of the receiver pupil D is assumed to be 0.256 m, which, under the link parameters specified above, is always smaller than the lateral extent of the incident beam. The resultant ratio D/r0 ranges from 0.5 to 10, indicating weak to strong aberrations. In effect, the wavefront error induced at the pupil plane can be as large as 6π rad. The standard deviation (Std) of the Zernike coefficients in the datasets is provided in Fig. 4(c), where it is obvious that serious tip and tilt (modes #2 and #3) are introduced, posing a tough challenge for the retrieval algorithm.

Data generation was implemented in MATLAB. The sampling interval for the wave optics simulation is 1 mm and the grid size is 512×512, but after propagation only the central 256×256 pixels are cut out for use. The phase obtained by the propagation simulation is first decomposed into Zernike polynomials, and modes of index higher than 66 are excluded. In other words, the wavefront reconstructed from the first 66 Zernike modes is used as the ground truth. For easier learning, the absolute phase at the center of the pupil is biased to zero, which acts as a spatial anchor so that the neural network always knows where to start the estimation.
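
The ground-truth preparation can be sketched as a least-squares projection of the propagated phase onto the Zernike basis inside the aperture, followed by removal of the value at the pupil center; the `basis` array is a placeholder for precomputed Zernike modes (e.g., extending the Eq. (2) sketch to 66 modes), and the authors' MATLAB implementation may differ in detail.

```python
# Sketch of ground-truth preparation: fit the first 66 Zernike modes by least
# squares within the aperture, rebuild the wavefront, and zero the pupil centre.
import numpy as np

def fit_zernike(phi, aperture, basis):
    """phi: (H, W) phase map; aperture: boolean mask; basis: (66, H, W) modes."""
    a = basis[:, aperture].T                     # (n_pixels, n_modes) design matrix
    c, *_ = np.linalg.lstsq(a, phi[aperture], rcond=None)
    return c

def ground_truth(phi, aperture, basis):
    c = fit_zernike(phi, aperture, basis)        # keep only the first 66 modes
    phi66 = np.tensordot(c, basis, axes=1)       # wavefront rebuilt from those modes
    centre = tuple(s // 2 for s in phi66.shape)
    # biasing the centre to zero amounts to adjusting the piston term
    return phi66 - phi66[centre], c
```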

For the training set, 20,000 samples were generated, and the validation set contains 2,000 such samples. To test the trained neural network models, a general-purpose test set is created the same way as the training set and the validation set. Moreover, to further examine the influence of the turbulence strength, 10 separate test sets with varied ${C_{n}^{2}}$ but constant DZ (800 m) and ω0 (0.15 m) were also prepared, each containing 1,000 samples.

3.2. Training the model

The EfficientNets are powerful feature extractors and they are quite easy to train as compared to earlier counterparts. In this section, we will show that the simplest version of the family, EfficientNet-B0, is already capable of achieving decent wavefront estimation accuracy.

During training, mean square error (MSE) was chosen as the loss function, which can be written as

$$Loss\textrm{ = }\frac{1}{N}\sum\limits_{j = 1}^N {{{({{c_j} - c_j^{\prime}} )}^2}} ,$$
where cj and $c_j^{\prime}$ are the real and estimated coefficients of the jth Zernike polynomial, respectively, and N = 66 is the number of Zernike coefficients to estimate. For validation and testing, we chose the pixelwise mean absolute error (MAE), in units of radians, as the metric function, to reflect the wavefront deviation intuitively. Thus, the wavefront is first reconstructed from the estimated Zernike coefficients, and the MAE is computed as
$$Metric\textrm{ = }\frac{1}{M}\sum\limits_{i = 1}^M {|{{x_i} - x_i^{\prime}} |} ,$$
where xi and $x_i^{\prime}$ are the ground truth and the reconstructed value of the ith pixel on the wavefront pattern, M is the number of pixels within the aperture area.
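
A direct NumPy form of this metric, restricted to the pixels inside the aperture, could read as follows (array names are placeholders):

```python
# Pixelwise MAE over the aperture, in radians (Eq. (10)).
import numpy as np

def wavefront_mae(phi_true, phi_est, aperture):
    """phi_*: (H, W) phase maps in rad; aperture: boolean mask of valid pixels."""
    return np.mean(np.abs(phi_true - phi_est)[aperture])
```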

As is known, the choice of optimizer has a great impact on the final convergence. After some trial and error, we arrived at a two-phase training strategy. In Phase 1, the model was trained for 100 epochs with the SGD optimizer at a learning rate of 0.001. In Phase 2, the model was recompiled with the Adam optimizer and trained for another 140 epochs. The learning rate was 0.001 for the first 100 epochs of Phase 2, then decreased to 0.0001 for the last 40 epochs. The learning rates were determined with the help of the ReduceLROnPlateau callback provided in Keras. The batch size was set to 16, and 50 dB additive Gaussian noise (relative to the average power) was randomly applied to half of the frames in each batch. The training was implemented in Keras 2.2.4 with the TensorFlow backend on a TITAN RTX GPU with 24 GB of VRAM.
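
The two-phase schedule can be expressed in Keras roughly as follows; `x_train`, `y_train` and the validation arrays are placeholders for the intensity frames and coefficient labels, the noise augmentation is omitted, and the fixed learning-rate schedule simply reproduces the values arrived at with ReduceLROnPlateau.

```python
# Keras sketch of the two-phase training schedule; reuses `model` from the
# earlier sketch. Data arrays are placeholders, noise augmentation omitted.
import tensorflow as tf

# Phase 1: 100 epochs with SGD at a learning rate of 0.001
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3), loss="mse")
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=100, batch_size=16)

# Phase 2: recompile with Adam; 1e-3 for 100 epochs, then 1e-4 for the last 40
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch, lr: 1e-3 if epoch < 100 else 1e-4)
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=140, batch_size=16, callbacks=[lr_schedule])
```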

Table 1 provides a comparison of the training results of EfficientNet-B0, ResNet-50 and Inception-V3. EfficientNet-B0 fits the training data very well and achieves the lowest wavefront estimation error on the test set, because its optimized architecture makes it more efficient at feature extraction. Also note that EfficientNet-B0 contains far fewer parameters and thereby consumes less VRAM than its counterparts.

Fine-tuning a pretrained model with domain-specific data, also known as transfer learning, is a common approach in vision-related deep learning, since it saves the labor of warming up a huge network from scratch. Although the task of wavefront sensing seems to have little in common with detecting or classifying everyday objects, we have found that loading EfficientNet-B0 with weights pretrained on ImageNet significantly benefits the estimation accuracy. As shown in Table 2, fine-tuning reduces the test MAE by 60% compared with training from randomly initialized weights. The reason for this improvement may be that pretraining on ImageNet endows the model with a better ability to generalize, serving as a reasonably good initialization even for tasks with rather distinct features, like the wavefront sensing of concern here.


Table 1. Model features and residual wavefront reconstruction error of the three neural networks


Table 2. Performance improvement from training upon weights pretrained on ImageNet

Next, the number of samples needed for training was examined. Three training sets of sizes 10k, 20k and 50k, generated in the same manner, were used to train the EfficientNet, and the results are compared in Fig. 5. The model fitted the 10k training set well, giving the lowest MAE during training. However, to produce more accurate predictions on the validation set and the test set, a larger training set containing 20k samples is necessary. For a yet larger 50k training set, the performance barely improved.


Fig. 5. Residual wavefront MAE of EfficientNet-B0 trained on datasets of different sizes.


A comparison among EfficientNet-B0, ResNet-50 and Inception-V3 is presented in Fig. 6, which shows the test MAE as the number of training samples varies. Notice that ResNet-50 and Inception-V3 also benefit from pretraining on the ImageNet dataset, resulting in lower errors than those shown in Table 1. ResNet-50 and Inception-V3 tend to converge only on a larger training set, e.g., 50k∼100k samples. Also, their converged MAE is greater than that of EfficientNet-B0, implying that they are less efficient at extracting useful features. It should be mentioned that the performance of ResNet and Inception may improve if better hyperparameters and training tricks can be devised. By comparison, the EfficientNet model requires less tweaking effort to work.


Fig. 6. Effects of training set size on the three network models. In order to achieve a fair comparison, the same training strategy is used for all the models, as described earlier in this section. MAE is obtained by prediction on the same test set.


Given the above observations, we settled on fine-tuning an EfficientNet-B0 loaded with ImageNet pretrained weights on a training set containing 20,000 samples. With this model, the MAE of wavefront reconstruction can be as low as 0.114 rad, corresponding to a phase error of less than 0.02λ. In Fig. 3(c), we demonstrate that the model gives accurate predictions of the Zernike coefficients of the wavefront, including the lower-order aberrations. Note that 66 Zernike modes are predicted by the models, but only the first 36 are shown in Fig. 3(c) for concision. The residual error of the recovered wavefront corresponding to Fig. 3(b) is shown in Fig. 3(d).

3.3. Influence of turbulence strength

During the generation of the datasets, the values of the propagation distance DZ, the beam size ω0 and the turbulence strength ${C_{n}^{2}}$ were all varied. However, the Fried parameter r0 of a Gaussian beam is not a monotonic function of either DZ or ω0. Without loss of generality, we therefore only examine the influence of ${C_{n}^{2}}$, for which the MAE curve is monotonic and easier to interpret.

In Fig. 7 the prediction MAE as a function of ${C_{n}^{2}}$ is plotted. First, an EfficientNet-B0 was trained with 50 dB Gaussian noise and then tested under different noise levels; the result is shown in Fig. 7(a). The network works well when only 50 dB noise is present. Even under severe turbulence, when ${C_{n}^{2}}$ = 1×10−14 m−2/3 (r0 = 26 mm), the wavefront estimation error is under 0.05λ. However, as the noise level increases, the accuracy degrades considerably, which is understandable because the network never saw samples with such strong noise during training and hence had no chance to learn to handle it. We therefore tried training the network with 30 dB noise, and the result is given in Fig. 7(b). This time the performance under 30 dB noise improved substantially, but the MAE for stronger turbulence under lower noise levels increased markedly, indicating a lack of capacity to recover severe distortions in the presence of noise. We attribute this to the limited capacity of EfficientNet-B0, so we turned to the larger EfficientNet-B3, which is obtained by scaling up the baseline B0 network with 8x more computational resources, leading to 1.72x depth, 1.33x width and 1.52x resolution compared to the baseline [32]. The result is shown in Fig. 7(c): in the presence of 30 dB noise, EfficientNet-B3 achieves slightly better results than EfficientNet-B0 in Fig. 7(b). Meanwhile, its accuracy under lower noise levels is comparable to that of EfficientNet-B0 trained with 50 dB noise as in Fig. 7(a), which means that EfficientNet-B3 indeed managed to discriminate noise from useful information.


Fig. 7. Wavefront estimation MAE on the test data as a function of the refractive-index structure constant. (a) EfficientNet-B0 trained with 50 dB noise. (b) EfficientNet-B0 trained with 30 dB noise. (c) EfficientNet-B3 trained with 30 dB Gaussian noise.


3.4. Training with phase diversity

Phase diversity requires both an in-focus intensity image and a defocused one as input. In this section, we briefly compare the performance of EfficientNet-B0 trained with phase diversity frames as input. The dataset is generated in the same way as in our single-shot pupil plane approach, except that the optical field at the pupil plane is additionally focused by a lens to form the in-focus intensity pattern. A constant 1.0 is added to the 5th Zernike mode at the pupil to account for the defocus. The results are shown in Table 3. Compared with the model trained with single-shot pupil plane images (cf. Table 2), phase diversity yields rather poor predictions. On the same test set as used in Section 3.2, the MAE is 0.1140 vs. 0.4829 rad. On a less challenging test set in which phase aberrations are no greater than 2π, the MAE is 0.0638 vs. 0.1650 rad. This performance gap is probably due to the fact that the training strategy and the loss function were optimized for the pupil plane data rather than for the focal plane data used here. Another probable reason is the large peak-to-average ratio of the focal plane data, meaning that a few pixels in the central area contain a large part of the overall power, which would require careful preprocessing (e.g., tone mapping to suppress the dynamic range of the PSF image) for the network to learn efficiently.
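
For reference, the sketch below shows one way the in-focus and diversity frames can be formed from the simulated pupil field with an FFT lens model; identifying the paper's "5th Zernike mode" with the defocus shape used here is an assumption, and `u_pupil` is taken from the propagation sketch in Section 2.

```python
# NumPy sketch of forming phase-diversity inputs from the simulated pupil field.
# The defocus-mode identification and the normalization are assumptions.
import numpy as np

def focal_intensity(u_pupil, aperture, extra_phase=0.0):
    field = u_pupil * aperture * np.exp(1j * extra_phase)
    psf = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    return psf / psf.max()                         # normalized focal-plane frame

n = u_pupil.shape[0]
y, x = (np.mgrid[0:n, 0:n] - n / 2) / (n / 2)
rho = np.hypot(x, y)
aperture = (rho <= 1.0).astype(float)
defocus = np.sqrt(3.0) * (2.0 * rho ** 2 - 1.0)    # defocus mode over the pupil

in_focus = focal_intensity(u_pupil, aperture)
diversity = focal_intensity(u_pupil, aperture, 1.0 * defocus)   # +1.0 of defocus
```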


Table 3. Residual MAE of EfficientNet wavefront sensing in the phase diversity manner

For completeness, we also trained an EfficientNet-B0 with single-frame in-focus data; its test MAE is 0.926 rad, which, consistent with similar results reported in the literature [29], is far from accurate.

3.5. Experimental validation

An experimental platform has been set up to verify the effectiveness of the proposed method; a top view of the platform is shown in Fig. 8. The output of a 1550 nm fiber-coupled laser (Src, Thorlabs KLS1550) is collimated and emitted into free space. L1 and L2 are lenses that comprise a 4f system which adjusts the size of the beam (2 mm in diameter) to match the active area of the spatial light modulator (SLM, Holoeye PLUTO-2-TELCO-013). L3 and L4, L5 and L6, and L7 and L8 are also 4f systems that serve similar purposes. D is an iris diaphragm used for spatial filtering, and P is a polarizer which ensures pure phase modulation by the SLM.


Fig. 8. Experimental setup of the wavefront sensing model validation platform. Src: fiber coupled laser source @ 1550 nm with output collimator. SLM: spatial light modulator. DM: deformable mirror with 97 actuators. WFS: wavefront sensor with 31×32 effective output pixels. Cam: infrared camera. L1∼L9: lenses with focal length of 50 mm, 100 mm, 60 mm, 60 mm, 150 mm, 45 mm, 250 mm, 7.5 mm and 500 mm, respectively. M1∼M3: reflection mirrors. BS1 and BS2: beam splitters. D: iris diaphragm. SLM imposes turbulence wavefront. WFS measures pupil plane phase and intensity simultaneously. DM corrects the phase aberration predicted by the neural networks, and Cam measures the PSF.


As in the generation of the simulated data, random atmospheric phase screens are generated and loaded onto the SLM, which accordingly modulates the wavefront phase of the incident beam. In order to emulate long-distance beam propagation on the laboratory platform, Fresnel scaling is performed, for which a constant quadratic phase is applied to the SLM and lens L9 is used to cancel out the residual quadratic phase. In this way, the 0.42 m distance between the SLM and L9 can emulate an 800 m free-space path. (More details on Fresnel scaling can be found in Section 2.3 of Ref. [37].) The deformable mirror (DM, ALPAO DM97-15) and the wavefront sensor (WFS, Imagine Optic HASO4 NIR) are located at conjugate planes. It is worth noting that the WFS can capture the pupil plane phase and intensity simultaneously, which allows for convenient training data collection. However, the effective resolution of the WFS, limited by the number of micro lenses, is only 31×32, as shown in Figs. 9(a) and 9(b). Thus, bilinear interpolation was used to upsample the WFS data to 256×256 so that it matches the input size of the neural network models.
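
This upsampling step can be written, for example, with scipy.ndimage.zoom, where order=1 gives bilinear interpolation; the function and array names below are placeholders for the measured data.

```python
# Bilinear upsampling of a 31 x 32 WFS frame to the 256 x 256 network input.
import numpy as np
from scipy.ndimage import zoom

def upsample_wfs(frame, target=(256, 256)):
    factors = (target[0] / frame.shape[0], target[1] / frame.shape[1])
    return zoom(frame, factors, order=1)       # order=1 -> bilinear interpolation

hi_res = upsample_wfs(np.random.rand(31, 32))  # placeholder for a WFS intensity frame
```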


Fig. 9. An example of the experimental data. (a) Normalized pupil plane intensity pattern captured by the WFS. (b) Pupil plane wavefront captured by the WFS. (c) PSF captured by the infrared camera. Note that (a), (b) and (c) were captured at the same moment, displaying the state of the aberrated beam. (d) First 36 Zernike coefficients of the aberrated wavefront and the reconstructed phase. (e) Residual phase errors after reconstruction. (f) PSF after wavefront correction by the DM.


In the experiment, the intensity pattern captured by the WFS (see Fig. 9(a)) is upsampled and fed to the CNN models that had been trained on the simulated data. The first 36 elements of the models’ output were used to control the DM and reconstruct the wavefront, because the WFS only handles 36 Zernike polynomials, and the MAE of the residual phase after compensation was computed. At first, the CNN models trained with the simulated data did not work well on the experimental platform. This is not surprising, because the laboratory environment differs from the simulation, and the wavelength changed from 632 nm to 1550 nm. In view of this, we collected 3,600 data samples with the WFS, 3,000 of which were used to fine-tune the models (Adam optimizer, learning rate 0.001, 30 epochs), and the remaining 800 were used for validation and testing.

An example of the estimation results of the fine-tuned EfficientNet-B0 is shown in Figs. 9(d)–9(f). It can be seen in Fig. 9(d) that the estimated Zernike modes match the true measurement quite well for the lower orders, while the errors increase for mode orders greater than 10. The estimated Zernike coefficients were then used to control the DM, which corrects the wavefront at the pupil plane; the PSF improved from Fig. 9(c) to Fig. 9(f), where some higher-order residual distortions can still be recognized.

We also compared the performance of EfficientNet-B0, ResNet-50 and Inception-V3 after fine-tuning on the experimental data. On the test set, the average wavefront error is about 3 rad, and the results are presented in Table 4. The test results show some degradation compared to the simulation. Nonetheless, the EfficientNet model maintains its advantage over the other two networks, and its test MAE of the residual phase is 0.3466 rad, which is about 0.05λ.


Table 4. Residual MAE of the models fine-tuned on experimental data

4. Conclusion

In this study, the feasibility of wavefront sensing from a single frame of the pupil plane intensity image is demonstrated. In the specific scenario of FSOC, a neural network named EfficientNet-B0 is trained to learn the implicit constraint of smoothness of the wavefront in addition to the spatial features in the pupil plane intensity, thus eliminating the ambiguity in single-shot phase determination. FSOC-oriented datasets were generated for training, validating, and testing the neural network models. It is shown that 20,000 samples are sufficient to train the EfficientNet, which outperforms earlier CNN models by a large margin in reconstruction accuracy. In particular, the performance of the trained networks is analyzed as the turbulence strength varies under different noise levels. The MAE of EfficientNet-B0 is no greater than 0.05λ even in strong turbulence at a noise level of 50 dB (less than 0.02λ on average); if stronger noise is present in the input, the larger EfficientNet-B3 should be preferred. Moreover, a comparison was made between models trained with phase diversity data and with single-shot pupil plane images, respectively, and the results reveal an advantage of using pupil plane data to train the neural network model. Last but not least, a validation experiment was carried out in an adaptive optics setup. Thanks to the wavefront sensor, which can capture pupil plane intensity and phase simultaneously, experimentally collected data were used to fine-tune the models trained with simulated data, and the test results verified the effectiveness of the proposed method. The method and results presented in this paper should be of interest to researchers and engineers in the field of free-space optics.

Funding

Shandong Provincial Science and Technology Support Program of Youth Innovation Team in Colleges (2019KJN041, 2020KJN005); National Natural Science Foundation of China (61572296, 61876100, 62072286).

Disclosures

The authors declare no conflicts of interest related to this article.

References

1. L. C. Andrews and R. L. Phillips, Laser beam propagation through random media, 2nd ed. (SPIE, 2005).

2. G. Gbur, “Partially coherent beam propagation in atmospheric turbulence,” J. Opt. Soc. Am. A 31(9), 2038–2045 (2014). [CrossRef]  

3. M. Niu, J. Cheng, and J. F. Holzman, “MIMO architecture for coherent optical wireless communication: System design and performance,” J. Opt. Commun. Netw. 5(5), 411–420 (2013). [CrossRef]  

4. S. Lin, C. Wang, X. Zhu, R. Lin, F. Wang, G. Gbur, Y. Cai, and J. Yu, “Propagation of radially polarized Hermite non-uniformly correlated beams in a turbulent atmosphere,” Opt. Express 28(19), 27238–27249 (2020). [CrossRef]  

5. R. K. Tyson, Principles of adaptive optics, 3rd ed. (CRC, 2015).

6. W. Liu, K. Yao, D. Huang, X. Lin, L. Wang, and Y. Lv, “Performance evaluation of coherent free space optical communications with a double-stage fast-steering-mirror adaptive optics system depending on the Greenwood frequency,” Opt. Express 24(12), 13288–13302 (2016). [CrossRef]  

7. H. Campbell and A. Greenaway, “Wavefront sensing: from historical roots to the state-of-the-art,” EAS Publ. Ser. 22, 165–185 (2006). [CrossRef]  

8. R. A. Gonsalves and R. Chidlaw, “Wavefront sensing by phase retrieval,” in Applications of Digital Image Processing III, (SPIE, 1979), 32–39.

9. B. C. Platt and R. Shack, “History and principles of Shack-Hartmann wavefront sensing,” J. Refract Surg. 17(5), S573–S577 (2001). [CrossRef]  

10. M. J. Booth, “Wavefront sensorless adaptive optics for large aberrations,” Opt. Lett. 32(1), 5–7 (2007). [CrossRef]  

11. R. W. Gerchberg and W. O. Saxton, “A practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik 35, 237–250 (1972).

12. M. A. Vorontsov, G. W. Carhart, and J. C. Ricklin, “Adaptive phase-distortion correction based on parallel gradient-descent optimization,” Opt. Lett. 22(12), 907–909 (1997). [CrossRef]  

13. Y. Zhang, H. Xie, and Q. Dai, “Robust sensorless wavefront sensing via neural network in a single-shot,” in SPIE BiOS (SPIE, 2020), Vol. 11248.

14. S. W. Paine and J. R. Fienup, “Machine learning for improved image-based wavefront sensing,” Opt. Lett. 43(6), 1235–1238 (2018). [CrossRef]  

15. A. K. Jain, M. Jianchang, and K. M. Mohiuddin, “Artificial neural networks: a tutorial,” Computer 29(3), 31–44 (1996). [CrossRef]  

16. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

17. J. R. P. Angel, P. Wizinowich, M. Lloyd-Hart, and D. Sandler, “Adaptive optics for array telescopes using neural-network techniques,” Nature 348(6298), 221–224 (1990). [CrossRef]  

18. M. Lloyd-Hart, P. Wizinowich, B. McLeod, D. Wittman, D. Colucci, R. Dekany, D. McCarthy, J. Angel, and D. Sandler, “First results of an on-line adaptive optics system with atmospheric wavefront sensing by an artificial neural network,” Astrophys. J. 390, L41–L44 (1992). [CrossRef]  

19. T. K. Barrett and D. G. Sandler, “Artificial neural network for the determination of Hubble Space Telescope aberration from stellar images,” Appl. Opt. 32(10), 1720–1727 (1993). [CrossRef]  

20. G. Ju, X. Qi, H. Ma, and C. Yan, “Feature-based phase retrieval wavefront sensing approach using machine learning,” Opt. Express 26(24), 31767–31783 (2018). [CrossRef]  

21. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems (NIPS, 2012), 1097–1105.

22. Y. Wu, Y. Guo, H. Bao, and C. Rao, “Sub-millisecond phase retrieval for phase-diversity wavefront sensor,” Sensors 20(17), 4877 (2020). [CrossRef]  

23. H. Ma, H. Liu, Y. Qiao, X. Li, and W. Zhang, “Numerical study of adaptive optics compensation based on convolutional neural networks,” Opt. Commun. 433, 283–289 (2019). [CrossRef]  

24. Y. Jin, Y. Zhang, L. Hu, H. Huang, Q. Xu, X. Zhu, L. Huang, Y. Zheng, H.-L. Shen, and W. Gong, “Machine learning guided rapid focusing with sensor-less aberration corrections,” Opt. Express 26(23), 30162–30171 (2018). [CrossRef]  

25. G. Allan, I. Kang, E. S. Douglas, G. Barbastathis, and K. Cahoy, “Deep residual learning for low-order wavefront sensing in high-contrast imaging systems,” Opt. Express 28(18), 26267–26283 (2020). [CrossRef]  

26. A. Sinha, J. Lee, S. Li, and G. Barbastathis, “Lensless computational imaging through deep learning,” Optica 4(9), 1117–1125 (2017). [CrossRef]  

27. T. Andersen, M. Owner-Petersen, and A. Enmark, “Neural networks for image-based wavefront sensing for astronomy,” Opt. Lett. 44(18), 4618–4621 (2019). [CrossRef]  

28. Y. Nishizaki, M. Valdivia, R. Horisaki, K. Kitaguchi, M. Saito, J. Tanida, and E. Vera, “Deep learning wavefront sensing,” Opt. Express 27(1), 240–251 (2019). [CrossRef]  

29. Q. Tian, C. Lu, B. Liu, L. Zhu, X. Pan, Q. Zhang, L. Yang, F. Tian, and X. Xin, “DNN-based aberration correction in a wavefront sensorless adaptive optics system,” Opt. Express 27(8), 10765–10776 (2019). [CrossRef]  

30. R. Gonsalves, “Phase retrieval and diversity in adaptive optics,” Opt. Eng. 21(5), 215829 (1982). [CrossRef]  

31. L. M. Mugnier, A. Blanc, and J. Idier, “Phase diversity: a technique for wave-front sensing and for diffraction-limited imaging,” in Advances in Imaging and Electron Physics, P. Hawkes, ed. (Elsevier, 2006), pp. 1–76.

32. M. Tan and Q. V. Le, “Efficientnet: Rethinking model scaling for convolutional neural networks,” arXiv preprint arXiv:1905.11946 (2019).

33. M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2018), 4510–4520.

34. J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2018), 7132–7141.

35. R. J. Noll, “Zernike polynomials and atmospheric turbulence,” J. Opt. Soc. Am. 66(3), 207–211 (1976). [CrossRef]  

36. D. G. Voelz, Computational Fourier optics: a MATLAB tutorial (SPIE, 2011).

37. B. Rodenburg, M. Mirhosseini, M. Malik, O. S. Magaña-Loaiza, M. Yanakas, L. Maher, N. K. Steinhoff, G. A. Tyler, and R. W. Boyd, “Simulating thick atmospheric turbulence in the lab with application to orbital angular momentum communication,” New J. Phys. 16(3), 033020 (2014). [CrossRef]  
