## Abstract

We present a new class of wavefront sensors by extending their design space based on machine learning. This approach simplifies both the optical hardware and image processing in wavefront sensing. We experimentally demonstrated a variety of image-based wavefront sensing architectures that can directly estimate Zernike coefficients of aberrated wavefronts from a single intensity image by using a convolutional neural network. We also demonstrated that the proposed deep learning wavefront sensor can be trained to estimate wavefront aberrations stimulated by a point source and even extended sources.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Wavefront sensing is a technique for measuring the wavefront aberrations of an optical field, and it has been widely used for adaptive optics in biomedical imaging and astronomical imaging [1–3]. Several types of wavefront sensors (WFSs) have been proposed, such as the Shack-Hartmann WFS, the plenoptic WFS, the pyramid WFS, the curvature WFS, and the image-based WFS [2,4–8]. However, these WFSs suffer, to varying degrees, from costly optical elements, stringent alignment requirements, limited dynamic range and linearity, the need for multiple measurements, and often computationally intensive reconstruction of the wavefront or the Zernike modes [9]. Approaches to wavefront sensing that make use of random modulations have also been reported, although they need a careful calibration process, assume wavefront aberrations dominated by low-frequency components, or require multiple iterations for phase estimation [10–13].

Machine learning, including deep learning, has become a hot topic in the fields of optics and photonics [14,15]. Imaging and focusing through scattering media, phase retrieval, and computer-generated holography based on machine learning have been reported [16–20]. Furthermore, a few wavefront estimation methods based on machine learning have also been proposed [21–23]. In [21], the wavefront is estimated from phase diversity measurements with a general regression neural network, although this method requires multiple image sensors. In [22], Zernike coefficients are directly estimated from the Shack-Hartmann WFS measurements using a multilayer neural network trained with back-propagation, alleviating the computational requirements of the wavefront reconstruction of the Shack-Hartmann WFS. Recently, as described in [23], a convolutional neural network (CNN) has been trained to generate a first guess for estimating Zernike coefficients from point spread function (PSF) measurements, and iterative processing is executed to refine the coefficients.

So far, in order to accommodate the increasing demands for prompt estimation of dynamic wavefronts in adaptive optics, traditional WFSs have basically been designed with simple and analytically tractable inverse optical models, e.g., the lenslet array in the Shack-Hartmann/plenoptic WFS, the prism in the pyramid WFS, and the imaging optics in the image-based WFS. While this allows recovery of the phase profile from calculations based on intensity measurements, it also restricts the capabilities of wavefront sensing systems.

In this paper, we present the deep learning WFS (DLWFS): a generalized machine-learning-based wavefront sensing framework. Here we show that machine learning can bypass the analytical modeling and extend the design space of wavefront sensors to alleviate their current limitations. We present three wavefront sensing schemes, using a state-of-the-art CNN, whereby Zernike coefficients are noniteratively estimated from measurement of an overexposed, defocused, or scattered image. We also extend the framework and explore its effectiveness for handling extended sources.

## 2. Method

A schematic diagram of the DLWFS is shown in Fig. 1. Here, a point source or extended source object *f* is captured as a single intensity image *g* after passing through an aberrating medium, such as turbulence, which we assume to have a space-invariant response over the sensor’s field of view and which is expressed as the PSF *h*. This measurement process is written as [24]

$$ g = h \otimes f, \tag{1} $$

$$ h = \left| \mathcal{F}\left[ \exp(jw) \right] \right|^2, \tag{2} $$

where $\otimes$ denotes convolution, $j$ is the imaginary unit, $\mathcal{F}$ is the Fourier transform, and *w* is the aberrated wavefront. The wavefront *w* is described with a Zernike series $z_i$ with coefficients $a_i$ [25]:

$$ w(r, \theta) = \sum_i a_i z_i(r, \theta), \tag{3} $$

where *r* is the radial distance and *θ* is the azimuthal angle in polar coordinates on the pupil plane.
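The relation between the Zernike coefficients, the wavefront, and the PSF can be illustrated numerically. The following is a minimal sketch and not the authors' code: the mode list `MODES`, the unnormalized polynomial convention, the padding factor `pad`, the circular unit pupil, and the treatment of the coefficients as radians are all assumptions made for illustration.

```python
import numpy as np
from math import factorial


def zernike(n, m, r, theta):
    """Unnormalized Zernike polynomial Z_n^m, set to zero outside the unit disk."""
    m_abs = abs(m)
    radial = np.zeros_like(r)
    for k in range((n - m_abs) // 2 + 1):
        c = ((-1) ** k * factorial(n - k)
             / (factorial(k)
                * factorial((n + m_abs) // 2 - k)
                * factorial((n - m_abs) // 2 - k)))
        radial += c * r ** (n - 2 * k)
    angular = np.cos(m_abs * theta) if m >= 0 else np.sin(m_abs * theta)
    return radial * angular * (r <= 1.0)


def wavefront_and_psf(coeffs, modes, n_pix=256, pad=4.0):
    """Return (w, h): the wavefront on the pupil grid and a unit-sum intensity PSF."""
    x = np.linspace(-pad, pad, n_pix)           # pupil radius is 1 in these units
    X, Y = np.meshgrid(x, x)
    r, theta = np.hypot(X, Y), np.arctan2(Y, X)
    pupil = (r <= 1.0).astype(float)
    # Weighted sum of Zernike modes (the series expansion of the wavefront).
    w = sum(a * zernike(n, m, r, theta) for a, (n, m) in zip(coeffs, modes))
    # Squared modulus of the Fourier transform of the pupil function.
    field = pupil * np.exp(1j * w)
    h = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    return w, h / h.sum()


# Hypothetical ordering of the first six modes as (n, m) pairs:
# piston, two tilts, defocus, two astigmatisms.
MODES = [(0, 0), (1, 1), (1, -1), (2, 0), (2, -2), (2, 2)]
_, h_flat = wavefront_and_psf([0, 0, 0, 0, 0, 0], MODES)
_, h_aberrated = wavefront_and_psf([0, 0, 0, 0.5, 0.3, -0.2], MODES)
```

In this sketch the aberrated PSF has a lower peak than the unaberrated one, while both carry the same total energy.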

Although our goal is to estimate the Zernike coefficients $a_i$ in Eq. (3) from the captured image *g* in Eq. (1), the limited dynamic range of image sensors is a general issue not only in wavefront sensing but also in computational imaging [26]. This issue is particularly pronounced when only residual aberrations are present and most of the PSF energy is concentrated in a few central pixels. To tackle this issue, we introduce an *optical preconditioner* $\mathcal{P}$ for the incoming aberrations in the optical setup of the proposed DLWFS, as shown in Fig. 1. In this case, Eq. (1) is rewritten as

$$ g_p = \mathcal{P}[h] \otimes f, \tag{4} $$

where $g_p$ is a new captured intensity image with the preconditioner. The energy of this captured image $g_p$ is spread out more widely over the detector array in comparison with the original PSF *h*. The net effect of the preconditioner is to improve the performance of the WFS by increasing the number of informative pixels in the captured image. In this paper, we experimentally demonstrate three preconditioners: overexposure, defocus, and scatter.
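The idea of "increasing the number of informative pixels" can be made concrete with a small simulation. The sketch below is illustrative only and rests on several assumptions: an 8-bit sensor model (`record`), a hypothetical weak astigmatism-like residual aberration, an exposure gain of 50, and a quadratic pupil phase standing in for the defocus preconditioner. The scatter preconditioner is not modeled here.

```python
import numpy as np


def psf(phase, pupil):
    """Intensity PSF, scaled so that the unaberrated peak equals 1."""
    field = pupil * np.exp(1j * phase)
    h = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    return h / pupil.sum() ** 2


def record(image, exposure=1.0, bits=8):
    """Scale by the exposure, clip at the saturation level, and quantize."""
    levels = 2 ** bits - 1
    return np.round(np.clip(image * exposure, 0.0, 1.0) * levels)


def informative_pixels(counts):
    """Number of pixels that register at least one digital count."""
    return int(np.count_nonzero(counts >= 1))


n = 256
x = np.linspace(-4, 4, n)                        # pupil radius is 1 in these units
X, Y = np.meshgrid(x, x)
r2 = X ** 2 + Y ** 2
pupil = (r2 <= 1.0).astype(float)
residual = 0.3 * (X ** 2 - Y ** 2) * pupil       # hypothetical weak aberration

in_focus = record(psf(residual, pupil))
overexposed = record(psf(residual, pupil), exposure=50.0)       # saturates the core
defocused = record(psf(residual + 12.0 * r2 * pupil, pupil))    # added quadratic phase

for name, img in [("in-focus", in_focus),
                  ("overexposure", overexposed),
                  ("defocus", defocused)]:
    print(name, informative_pixels(img))
```

Under these assumptions, both preconditioned images register signal on more pixels than the in-focus image, at the cost of saturation (overexposure) or a lower peak signal (defocus).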

The estimation of the Zernike coefficients $a_i$ in Eq. (3) from the captured image $g_p$ in Eq. (4) is nonlinear, especially when the overexposure preconditioner is used. This kind of inverse problem has been difficult to solve with linear approaches, but recent research suggests that machine-learning-based approaches may be a feasible solution [16,21–23]. Therefore, we show not only that machine learning enables image-based wavefront sensing but also that it allows us to extend the design space of WFSs, as demonstrated by the introduction of the preconditioners in this paper.

We use a state-of-the-art CNN called *Xception* to directly estimate the Zernike coefficients from a single intensity measurement through the preconditioner without any iterative process [27]. Xception is a deep neural network that offers a remarkable balance between a low number of optimized parameters and a high generalization capability compared with other recently proposed networks. The Xception network has been modified for regression to adapt it for the wavefront sensing problem by adding a dropout layer and a fully connected layer, as shown in Fig. 1.
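A regression head of this kind might be assembled in Keras roughly as follows. This is a hedged sketch rather than the authors' implementation: the function name `build_dlwfs_regressor`, the single-channel 256 × 256 input, the global average pooling before the dropout layer, the linear output activation, and the output width of 32 are assumptions; the default dropout rate of 0.2 is the value reported in Section 3.

```python
def build_dlwfs_regressor(n_coeffs=32, input_shape=(256, 256, 1), dropout_rate=0.2):
    """Xception backbone followed by dropout and a fully connected regression layer."""
    # Imported lazily so that this module can be loaded without TensorFlow installed.
    import tensorflow as tf

    # Randomly initialized backbone without the ImageNet classification head.
    backbone = tf.keras.applications.Xception(
        include_top=False,
        weights=None,
        input_shape=input_shape,
        pooling="avg",          # assumption: global average pooling of the features
    )
    x = tf.keras.layers.Dropout(dropout_rate)(backbone.output)
    # One linear output per Zernike coefficient (regression, not classification).
    outputs = tf.keras.layers.Dense(n_coeffs, activation="linear")(x)
    return tf.keras.Model(inputs=backbone.input, outputs=outputs)
```

Calling `build_dlwfs_regressor()` returns an uncompiled model; the loss function is not stated in the text, so a mean squared error loss would be one plausible choice when compiling it.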

## 3. Experimental demonstration

In typical wavefront sensing, a point source is often used, and it is placed far enough away that the optical aberrations (from the optical system or turbulent media) are stimulated by a flat wavefront. This is common in astronomical adaptive optics, where the point source is actually a natural guide star. However, advanced adaptive optics techniques for the next generation of extremely large telescopes will require the use of artificial laser guide stars, which are not seen as point sources but rather as extended elongated objects in the sky. Current WFSs are not designed to handle objects other than point sources [28]. We implemented an optical setup to demonstrate our proposed DLWFS approach with both a point source and extended sources.

First, wavefront sensing in an in-focus condition without any preconditioner was performed as a baseline. Then, our extended DLWFS method with three preconditioners, namely overexposure, defocus, and scatter, was demonstrated. We generated arbitrary wavefront profiles (emulating turbulence, for example) with a transmissive spatial light modulator (SLM). Phase maps displayed on the SLM were generated on the basis of Eq. (3) using the first 32 Zernike coefficients. We set the first coefficient (piston) to zero, and the remaining 31 Zernike coefficients were randomly generated within the range of [−0.5, 0.5] for computing each phase map. We generated a training dataset consisting of a number of pairs of Zernike coefficients and corresponding phase maps. A test dataset consisting of an additional 1,000 pairs was also generated. An example pair of the Zernike coefficients from the test dataset and the corresponding phase map are shown in Figs. 2(a) and 2(b), respectively.
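The coefficient sampling described above can be sketched as follows. The function name and the use of a uniform distribution are assumptions, since the text only states that the coefficients were randomly generated within [−0.5, 0.5].

```python
import numpy as np


def sample_zernike_coefficients(n_samples, n_modes=32, amplitude=0.5, seed=None):
    """Draw coefficient vectors with zero piston and the other modes in [-0.5, 0.5)."""
    rng = np.random.default_rng(seed)
    # Assumption: uniform sampling; the distribution is not specified in the text.
    coeffs = rng.uniform(-amplitude, amplitude, size=(n_samples, n_modes))
    coeffs[:, 0] = 0.0          # the first coefficient (piston) is fixed to zero
    return coeffs


test_coeffs = sample_zernike_coefficients(1000, seed=0)
```

Each row of `test_coeffs` would then be converted into one phase map for the SLM.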

A learning algorithm called *Adam* was used for optimizing the network with an initial learning rate of 1 × 10^{−4}, a batch size of 10, and 40 epochs [29]. The dropout ratio of the dropout layer connected to the Xception block, as shown in Fig. 1, was 0.2. Once we trained the Xception network, the Zernike coefficients were estimated from a single intensity image produced by the phase map displayed on the SLM, as shown in Fig. 1. The codes were implemented in MATLAB and Keras and were executed on a computer with an Intel Xeon 6134 CPU running at 3.2 GHz, with 192 GB of RAM, and an NVIDIA Tesla V100 GPU with 16 GB of VRAM.

### 3.1. Point source

First, we performed wavefront sensing by assuming a distant point source that generates a flat wavefront as illumination. In this case, the object *f* in Eq. (1) is a delta function, whereas the captured image *g* corresponds to the aberrated PSF *h*. Light from an incoherent white light emitting diode (LED: Kaito2610, Kaito Denshi) was passed through a pinhole (diameter: 10 μm), as shown in Fig. 1, and a collimating lens (focal length: 100 mm) to illuminate an SLM (LC 2002, Holoeye, pixel pitch: 32 μm, pixel count: 800 × 600) operated in phase mode. Light passing through the SLM was captured by a monochrome CCD image sensor (PL-B953U, PixeLink, pixel pitch: 4.65 μm, pixel count: 1024 × 768) with an imaging lens (focal length: 80 mm). For each case, we captured the training dataset of 100,000 images from different pairs of Zernike coefficients and corresponding phase maps. A separate test dataset of 1,000 images was also captured. The pixel count of the captured intensity images was 256 × 256.

#### 3.1.1. In-focus

An in-focus DLWFS without any preconditioner was demonstrated as a baseline. The distance between the imaging lens and the image sensor was 8 cm, which corresponded to the focal length of the imaging lens. The captured image obtained with the Zernike coefficients in Fig. 2(a) is shown in Fig. 3(a). Note that we added small amounts of aberrations, so the captured images were barely different from diffraction-limited PSFs. The coefficients estimated from the captured image are shown in Fig. 3(b). The root mean squared error (RMSE) between the original and estimated Zernike coefficients for the test dataset was 0.142 ± 0.032.
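The error figure quoted here can be computed along the following lines. This sketch assumes that the RMSE is taken over the coefficients of each test wavefront and that the "±" value is the standard deviation over the test set; the text does not spell out either convention.

```python
import numpy as np


def rmse_per_sample(true_coeffs, est_coeffs):
    """RMSE over the Zernike coefficients, one value per test wavefront."""
    err = np.asarray(est_coeffs, dtype=float) - np.asarray(true_coeffs, dtype=float)
    return np.sqrt(np.mean(err ** 2, axis=1))


def summarize(rmse):
    """Mean and standard deviation of the per-sample RMSE over the test set."""
    return float(np.mean(rmse)), float(np.std(rmse))


# Toy example with two test wavefronts and four coefficients each.
true = np.zeros((2, 4))
estimated = np.array([[1.0, 1.0, 1.0, 1.0],
                      [0.0, 0.0, 0.0, 2.0]])
mean_rmse, std_rmse = summarize(rmse_per_sample(true, estimated))
```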

#### 3.1.2. Overexposure

Next, the overexposure preconditioner was used in the in-focus setup. The PSF images on the WFS were captured with a longer exposure time than in the baseline experiment. Since the introduced aberrations lead to only small deviations from the diffraction-limited PSF, many pixels in the captured image were saturated, as shown in Fig. 4(a), where the Zernike coefficients in Fig. 2(a) were used. This overexposure enhanced the fringes caused by the aberrations in areas of the detector that would otherwise be hidden under the noise level in a normally exposed in-focus setup. The coefficients estimated from the captured image are shown in Fig. 4(b). The RMSE of the estimated Zernike coefficients for the test dataset was 0.036 ± 0.013. Therefore, this overexposure preconditioner improved the estimation accuracy.

#### 3.1.3. Defocus

Next, the defocus preconditioner was used. Compared with the in-focus setup, defocus spread the optical intensity pattern over a larger area of the image sensor. Thus, the preconditioner compressed the dynamic range of the signal on the WFS and made use of more of the available pixels, for instance, for estimating residual aberrations. The distance between the imaging lens and the image sensor was 8.3 cm. The captured image with the Zernike coefficients in Fig. 2(a) is shown in Fig. 5(a). The coefficients estimated from the captured image are shown in Fig. 5(b). The RMSE of the estimated Zernike coefficients for the test dataset was 0.040 ± 0.016. Therefore, the defocus preconditioner considerably improved the estimation accuracy of the Zernike coefficients compared with the conventional in-focus setup.

#### 3.1.4. Scatter

For the preconditioner in this case, a scattering plate was inserted between the imaging lens and the image sensor. This preconditioner added more aberrations to the wavefront to be measured and made the optical patterns on the WFS detector more spread-out, similarly to the defocus case. Maintaining the distances of the in-focus setup, the distance between the imaging lens and the scattering plate was 4 cm, and the distance between the scattering plate and the image sensor was 4 cm. The captured image with the Zernike coefficients in Fig. 2(a) is shown in Fig. 6(a). The coefficients estimated from the captured image are shown in Fig. 6(b). The RMSE of the estimated Zernike coefficients for the test dataset was 0.057 ± 0.018. The scattering preconditioner also improved the estimation accuracy compared with the in-focus PSF estimations.

### 3.2. Extended sources

We also implemented the DLWFS with extended sources as the object. The effect at the image sensor plane is that the objects *f* are convolved with the aberrated PSF *h*, as shown in Eq. (1). Although this wavefront sensing case is more general, it is also more difficult than the point source case because the DLWFS measurements are blurred by the objects. Therefore, we increased the training dataset size to 1,000,000.
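The difference between the two cases can be sketched with an FFT-based convolution. This is an illustrative example under assumptions: circular (periodic) boundary conditions, a hypothetical astigmatism-like aberration, and a uniform 28 × 28 block standing in for a handwritten-digit object.

```python
import numpy as np


def convolve_fft(obj, psf):
    """Circular convolution of an object with a PSF whose origin is the array center."""
    otf = np.fft.fft2(np.fft.ifftshift(psf))     # move the PSF origin to index (0, 0)
    return np.real(np.fft.ifft2(np.fft.fft2(obj) * otf))


n = 256
x = np.linspace(-4, 4, n)
X, Y = np.meshgrid(x, x)
pupil = (X ** 2 + Y ** 2 <= 1.0).astype(float)
phase = 0.5 * (X ** 2 - Y ** 2) * pupil          # hypothetical aberration
psf = np.abs(np.fft.fftshift(np.fft.fft2(pupil * np.exp(1j * phase)))) ** 2
psf /= psf.sum()

point = np.zeros((n, n))
point[n // 2, n // 2] = 1.0                      # delta-function object
extended = np.zeros((n, n))
extended[114:142, 114:142] = 1.0                 # 28 x 28 stand-in for a digit

g_point = convolve_fft(point, psf)               # reproduces the PSF itself
g_extended = convolve_fft(extended, psf)         # PSF structure smeared by the object
```

For the point source the captured image equals the PSF, whereas for the extended object the fine PSF structure is averaged over the footprint of the object.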

We generated arbitrary objects and wavefront profiles with two SLMs (LC 2012, Holoeye, pixel pitch: 36 μm, pixel count: 1024 × 768). For producing extended sources, the first SLM was used in amplitude mode by inserting the SLM between two perpendicularly aligned polarizers, and it was illuminated with an incoherent LED, as also shown in Fig. 1. The second SLM was used in the phase mode, as in the point source case, emulating the wavefront aberrations that would come from turbulence. The distance between the first SLM and the imaging lens (focal length: 85 mm) was 79 cm, and the second SLM was located in front of the imaging lens. Light passing through the two SLMs was captured by the image sensor via the imaging lens.

As extended sources, we used randomly selected handwritten numbers taken from the EMNIST database [30]. The pixel count of the images in the database was 28 × 28, and they were displayed on the central area of the SLM. For each case, the numbers of object images for the training and test datasets were 1,000,000 and 1,000, respectively. These two datasets were non-overlapping. Examples of the object images from the database are shown in Fig. 7. The pixel count of the captured intensity images was set to 256 × 256, as in the case of the point source.

#### 3.2.1. In-focus

In the case of the in-focus DLWFS without any preconditioner, the distance between the imaging lens and the image sensor was 9.5 cm, which corresponded to the imaging distance. The captured image with the Zernike coefficients in Fig. 2(a) and the top-left image in Fig. 7 is shown in Fig. 8(a). As shown in Fig. 8(b), the estimation of the Zernike coefficients from the captured image failed. The RMSE of the estimated Zernike coefficients for the test dataset was 0.288 ± 0.024.
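As a consistency check, the stated imaging distance agrees with the thin-lens equation. This assumes an ideal thin lens of focal length $f_L$ = 85 mm and takes the 79 cm separation between the first SLM and the imaging lens as the object distance $s_o$:

$$ \frac{1}{s_i} = \frac{1}{f_L} - \frac{1}{s_o} = \frac{1}{85\ \text{mm}} - \frac{1}{790\ \text{mm}} \approx 0.0105\ \text{mm}^{-1} \quad\Rightarrow\quad s_i \approx 95.2\ \text{mm} \approx 9.5\ \text{cm}. $$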

#### 3.2.2. Overexposure

The overexposure preconditioner for the DLWFS was demonstrated in the in-focus setup. The captured image, in which saturation appears, is shown in Fig. 9(a), where the Zernike coefficients in Fig. 2(a) and the top-left image in Fig. 7 were used. The coefficients estimated from the captured image are shown in Fig. 9(b). The RMSE of the estimated Zernike coefficients for the test dataset was 0.214 ± 0.051. Therefore, using the overexposure preconditioner also improved the estimation accuracy in the case with extended sources.

#### 3.2.3. Defocus

The distance between the imaging lens and the image sensor was set to 9.75 cm to implement the defocus preconditioner for the DLWFS. The captured image with the Zernike coefficients in Fig. 2(a) and the top-left image in Fig. 7 is shown in Fig. 10(a). The coefficients estimated from the captured image are shown in Fig. 10(b). The RMSE of the estimated Zernike coefficients for the test dataset was 0.099 ± 0.064. The estimation accuracy with the defocus preconditioner was improved compared with the conventional in-focus setup.

#### 3.2.4. Scatter

In the setup for the scattering preconditioner for the DLWFS, a scattering plate was inserted at a distance of 7.5 cm behind the imaging lens of the in-focus setup. The captured image with the Zernike coefficients in Fig. 2(a) and the top-left image in Fig. 7 is shown in Fig. 11(a). The coefficients estimated from the captured image are shown in Fig. 11(b). The RMSE of the estimated Zernike coefficients for the test dataset was 0.195 ± 0.064. Thus, the scatter DLWFS showed better estimation accuracy than the in-focus setup.

### 3.3. Discussion

The previous experiments showed that all of the presented preconditioners improved, to varying degrees, the accuracy of the estimated coefficients of the DLWFS, as summarized in Table 1. The accuracy of the respective wavefronts estimated by the DLWFS, calculated with Eq. (3) using the estimated coefficients, is also shown in Table 2. Overall, wavefront sensing using the DLWFS with a point source showed better performance than that obtained for extended sources because the extended sources acted as low-pass filters of the PSF. Averaged over the three preconditioners, the RMSEs of the DLWFS were 3.2 times and 1.7 times lower than those of the conventional in-focus DLWFS without any preconditioner in the cases of the point source and the extended sources, respectively. Compared with the other preconditioners, the defocus DLWFS provided a consistent improvement of the estimation accuracy over the in-focus baseline setup in both the point source and extended source cases.
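The two improvement factors can be reproduced from the coefficient RMSEs quoted in Sections 3.1 and 3.2. The short calculation below uses only those mean values; the dictionary layout and function name are illustrative.

```python
from statistics import mean

# Mean coefficient RMSEs reported in Sections 3.1 (point source) and 3.2 (extended sources).
rmse = {
    "point": {"in_focus": 0.142, "overexposure": 0.036, "defocus": 0.040, "scatter": 0.057},
    "extended": {"in_focus": 0.288, "overexposure": 0.214, "defocus": 0.099, "scatter": 0.195},
}


def improvement_factor(values):
    """In-focus RMSE divided by the average RMSE of the three preconditioners."""
    preconditioned = mean(v for k, v in values.items() if k != "in_focus")
    return values["in_focus"] / preconditioned


factors = {source: round(improvement_factor(v), 1) for source, v in rmse.items()}
print(factors)  # {'point': 3.2, 'extended': 1.7}
```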

A plot depicting the influence of the number of training samples on the estimation accuracy obtained with extended sources and the defocus preconditioner is shown in Fig. 12. This result clearly shows that a larger training set improves the estimation accuracy, if needed. Although more training data requires a longer computational time, this additional computation is only required for training, whereas the estimation time per image remains the same. In the experiments, the estimation time was 9.2 milliseconds for a single image with the computer described above. This could be shortened considerably by optimizing the network design and using specialized computing resources.

Figure 12 also shows the effect on the estimation error of testing with wavefronts that contain unknown disturbances. These results were obtained by training and testing only for the first 15 and 21 Zernike coefficients, using the images acquired with 32 coefficients. Although observing unknown coefficients has a mild impact on the estimation accuracy, the DLWFS can alleviate this effect by increasing the number of training pairs. It is also known that the magnitudes of higher-order Zernike coefficients in natural turbulence are small compared with those of lower orders [31], which further reduces the impact of coefficients beyond the trained ones.

## 4. Conclusion

By exploiting machine learning, we presented DLWFS, which is a generalized WFS framework that can expand the design space for wavefront sensing. A state-of-the-art CNN is able to directly estimate Zernike coefficients of an incoming wavefront from a single intensity image measured through one of a series of illustrative preconditioners. In this paper, we experimentally demonstrated the DLWFS with three preconditioners: overexposure, defocus, and scatter, for a point source and extended sources. The results showed that all of them can vastly improve the estimation accuracy obtained when performing in-focus image-based estimation. The applicability of the DLWFS to practical situations, e.g. cases with a large number of Zernike coefficients, a low luminous flux, and an extended field of view, should be investigated. It is also important to compare the DLWFS with conventional WFSs to validate its usefulness as a practical replacement.

The concept of the generalized preconditioner allows the design of innovative WFSs. In particular, the choice and optimization of the preconditioner for the DLWFS, which can be any optical transformation, is an open research question. Even other optical elements that are already used in traditional WFSs could be potentially used, such as a lenslet array or a pyramid. Nonetheless, the proposed DLWFS scheme has the advantage that it can be trained as mounted, without requiring further alignment or precision optics. Therefore, our proposed framework simplifies and rationalizes WFSs, which can now find applications in a vastly expanded range of fields, especially for adaptive optics in biomedicine, astronomy, and optical communications.

## Funding

Japan Society for the Promotion of Science (JP17H02799, JP17K00233); Precursory Research for Embryonic Science and Technology (JPMJPR17PB); Fondo Nacional de Ciencia y Tecnologia (1181943); Redes Etapa Inicial (REDI170539).

## References

**1.** N. Ji, “Adaptive optical fluorescence microscopy,” Nat. Methods **14**, 374–380 (2017).

**2.** B. C. Platt and R. Shack, “History and principles of Shack-Hartmann wavefront sensing,” J. Refract. Surg. **17**, 573–577 (1995).

**3.** P. Schipani, L. Noethe, D. Magrin, K. Kuijken, C. Arcidiacono, J. Argomedo, M. Capaccioli, M. Dall’Ora, S. D’Orsi, J. Farinato, D. Fierro, R. Holzlöhner, L. Marty, C. Molfese, F. Perrotta, R. Ragazzoni, S. Savarese, A. Rakich, and G. Umbriaco, “Active optics system of the VLT Survey Telescope,” Appl. Opt. **55**, 1573–1583 (2016).

**4.** M. Shaw, K. O’Holleran, and C. Paterson, “Investigation of the confocal wavefront sensor and its application to biological microscopy,” Opt. Express **21**, 19353–19362 (2013).

**5.** C. Wu, J. Ko, and C. C. Davis, “Determining the phase and amplitude distortion of a wavefront using a plenoptic sensor,” J. Opt. Soc. Am. A **32**, 964–978 (2015).

**6.** R. Ragazzoni, “Pupil plane wavefront sensing with an oscillating prism,” J. Mod. Opt. **43**, 289–293 (1996).

**7.** F. Roddier, “Curvature sensing and compensation: a new concept in adaptive optics,” Appl. Opt. **27**, 1223–1225 (1988).

**8.** R. A. Gonsalves, “Phase retrieval and diversity in adaptive optics,” Opt. Eng. **21**, 829–832 (1982).

**9.** H. Campbell and A. Greenaway, “Wavefront sensing: From historical roots to the state-of-the-art,” EAS Publ. Ser. **22**, 165–185 (2006).

**10.** R. Horisaki, Y. Ogura, M. Aino, and J. Tanida, “Single-shot phase imaging with a coded aperture,” Opt. Lett. **39**, 6466–6469 (2014).

**11.** R. Horisaki, R. Egami, and J. Tanida, “Single-shot phase imaging with randomized light (SPIRaL),” Opt. Express **24**, 3765–3773 (2016).

**12.** C. Wang, X. Dun, Q. Fu, and W. Heidrich, “Ultra-high resolution coded wavefront sensor,” Opt. Express **25**, 13736–13746 (2017).

**13.** P. Berto, H. Rigneault, and M. Guillon, “Wavefront sensing with a thin diffuser,” Opt. Lett. **42**, 5117–5120 (2017).

**14.** C. M. Bishop, *Pattern Recognition and Machine Learning* (Springer-Verlag New York, Inc., NJ, USA, 2006).

**15.** Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature **521**, 436–444 (2015).

**16.** T. Ando, R. Horisaki, and J. Tanida, “Speckle-learning-based object recognition through scattering media,” Opt. Express **23**, 33902–33910 (2015).

**17.** R. Horisaki, R. Takagi, and J. Tanida, “Learning-based imaging through scattering media,” Opt. Express **24**, 13738–13743 (2016).

**18.** R. Horisaki, R. Takagi, and J. Tanida, “Learning-based focusing through scattering media,” Appl. Opt. **56**, 4358–4362 (2017).

**19.** A. Sinha, J. Lee, S. Li, and G. Barbastathis, “Lensless computational imaging through deep learning,” Optica **4**, 1117–1125 (2017).

**20.** R. Horisaki, R. Takagi, and J. Tanida, “Deep-learning-generated holography,” Appl. Opt. **57**, 3859–3863 (2018).

**21.** R. L. Kendrick, D. S. Acton, and A. L. Duncan, “Phase-diversity wave-front sensor for imaging systems,” Appl. Opt. **33**, 6533–6546 (1994).

**22.** H. Guo, N. Korablinova, Q. Ren, and J. Bille, “Wavefront reconstruction with artificial neural networks,” Opt. Express **14**, 6456–6462 (2006).

**23.** S. W. Paine and J. R. Fienup, “Machine learning for improved image-based wavefront sensing,” Opt. Lett. **43**, 1235–1238 (2018).

**24.** J. W. Goodman, *Introduction to Fourier Optics* (McGraw-Hill, 1996).

**25.** J. D. Schmidt, *Numerical Simulation of Optical Wave Propagation with Examples in MATLAB* (SPIE, WA, USA, 2010).

**26.** A. Stern, Y. Zeltzer, and Y. Rivenson, “Quantization error and dynamic range considerations for compressive imaging systems design,” J. Opt. Soc. Am. A **30**, 1069–1077 (2013).

**27.** F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in *2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)*, (IEEE, 2017), pp. 1800–1807.

**28.** L. Gilles and B. Ellerbroek, “Shack-Hartmann wavefront sensing with elongated sodium laser beacons: centroiding versus matched filtering,” Appl. Opt. **45**, 6568–6576 (2006).

**29.** D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” Int. Conf. on Learn. Represent. (ICLR) (2015).

**30.** G. Cohen, S. Afshar, J. Tapson, and A. van Schaik, “EMNIST: an extension of MNIST to handwritten letters,” arXiv preprint arXiv:1702.05373 (2017).

**31.** R. J. Noll, “Zernike polynomials and atmospheric turbulence,” J. Opt. Soc. Am. **66**, 207–211 (1976).