## Abstract

Adaptive optics relies on the fast and accurate determination of aberrations but is often hindered by wavefront sensor limitations or lengthy optimization algorithms. Deep learning by artificial neural networks has recently been shown to provide determination of aberration coefficients from various microscope metrics. Here we numerically investigate the direct determination of aberration functions in the pupil plane of a high numerical aperture microscope using an artificial neural network. We show that an aberration function can be determined from fluorescent guide stars and used to improve the Strehl ratio without the need for reconstruction from Zernike polynomial coefficients.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Adaptive optics is becoming increasingly important in high numerical aperture (NA) microscopy [1] and laser nanofabrication [2]. In microscopy, adaptive optics can correct for aberrations that are imparted by spatial and temporal variations in the optical properties of the specimen [3]. In laser nanofabrication, adaptive optics can combat aberrations arising from variations in material geometry [4] and refractive index [5,6]. The effect of these aberrations is dire: in microscopy, a loss of resolution, brightness, and contrast [1,3]; in nanofabrication, degraded feature size and resolution and incorrect positioning [5,6].

The key to the implementation of adaptive optics is the fast and accurate determination of the aberration function of the optical system. Traditionally, direct sensing schemes involving wavefront sensors such as the Shack-Hartmann [7,8] have provided direct measurement of aberration functions, but such sensors can be limited by noise or spatial dependence. In nonlinear and fluorescent microscopes, localized emission can act as a guide-star for spatially localized direct measurement [9,10] but the requirement for a wavefront sensor remains. Indirect optimization methods such as modal wavefront sensing [11–13] or pupil segmentation [14,15] provide a means of wavefront sensorless measurement, but require a time-consuming iterative solution due to the highly nonlinear relationship between aberrations and performance metrics.

Recently, research has turned to the use of deep learning [16] by artificial neural networks (ANNs) to solve photonic problems [17]. ANNs mimic the processing behavior of biological neural networks and have the capability to learn complex relationships without being programmed with specific physical rules [16]. In adaptive optics, deep learning was initially applied to astronomical telescopes [18–20] and has recently been revisited with the advent of modern architectures [21,22]. In microscopy, ANNs are beginning to find use in both indirect [23–25] and direct aberration sensing methods [26–28]. In each of these cases, the ANNs are mapping image characteristics or point spread functions (PSFs) to Zernike coefficients in a modal based approach that requires reconstruction of the aberration function. Little research has been dedicated to the direct determination of aberration functions using ANNs, yet it has been shown to outperform modal based ANN approaches in some systems [29].

In this work we numerically investigate the direct determination of aberration functions in the pupil plane of a high NA microscope by a simple perceptron-based ANN. We show that an appropriately trained ANN can accept pairs of CCD camera PSF images from a nonlinear or fluorescent guide-star and directly return an aberration function in radians suitable for use in adaptive optics. We model the high NA focusing conditions as used in microscopy and laser nanofabrication and demonstrate a significant reduction in aberration magnitude and an improvement in Strehl ratio is possible when the ANN predicted aberration function is used for aberration compensation. Finally, we analyze the performance of the ANN with respect to aberration magnitude, polynomial, and noise. The use of perceptron-based ANNs for direct aberration function determination in microscopy may enable faster convergence of indirect optimization methods, may be used as a direct sensing method itself, or may enable all-optical aberration retrieval through the emerging use of perceptron-based diffractive ANNs [30,31].

## 2. Network architecture for a direct aberration function determination

The ANN considered here is modelled on a multi-layer perceptron network previously investigated for the determination of Zernike coefficients [19]. However, in place of computation of Zernike coefficients and reconstruction of an aberration function, in this work the aberration function of an objective lens is determined directly. The advantage of direct determination is superior performance [29] and the potential for the network to be implemented opto-electronically via perceptron-based diffractive ANNs [30,31]. Figures 1(a) and 1(b) illustrate the concept of ANN direct aberration function determination. The input to the ANN comprises two axially offset CCD camera images $I_1$ and $I_2$ taken close to the focal plane of a tube lens, whilst the output of the ANN is the aberration function $\psi$ in radians of the objective lens. The use of two PSF images enables axially symmetric aberrations such as defocus to be determined [19].
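The role of the second, axially offset image can be illustrated with a minimal sketch. It assumes a scalar Fourier-optics model (a simplification, not the vectorial Debye model used in this work), treats the axial camera offsets as a known defocus of $\pm d$ added to the pupil phase, and uses illustrative values for the grid size, pupil radius, and magnitudes; the helper names `defocus_phase`, `psf`, and `image_pair` are hypothetical. For an even aberration such as defocus, flipping the sign leaves a single in-focus PSF unchanged, whereas the ordered pair of offset images differs.

```python
import numpy as np

N = 64   # grid size in pixels (illustrative)
R = 16   # pupil radius in pixels (illustrative)

# Centred integer coordinates; the pupil sits well inside the grid.
y, x = np.mgrid[0:N, 0:N] - N // 2
rho = np.hypot(x, y) / R
pupil = (rho <= 1.0).astype(float)


def defocus_phase(rms_rad):
    """Zernike defocus (OSA/ANSI Z_4) with approximately the given RMS in radians."""
    return rms_rad * np.sqrt(3.0) * (2.0 * rho**2 - 1.0) * pupil


def psf(phase):
    """Scalar PSF: squared modulus of the Fourier transform of the pupil function."""
    field = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil * np.exp(1j * phase))))
    intensity = np.abs(field) ** 2
    return intensity / intensity.sum()


def image_pair(a, d):
    """Two images with known diversity defocus +d and -d (assumed camera model)."""
    return psf(defocus_phase(a + d)), psf(defocus_phase(a - d))


a = 1.0  # aberration: defocus in radians RMS (illustrative)
d = 1.0  # diversity defocus in radians RMS (illustrative)

# A single in-focus image cannot tell +a from -a ...
single_ambiguous = np.allclose(psf(defocus_phase(a)), psf(defocus_phase(-a)))

# ... but the first image of the offset pair differs between +a and -a.
pair_distinct = not np.allclose(image_pair(a, d)[0], image_pair(-a, d)[0])
```

In this model the sign flip swaps the two offset images, so the ordering of the pair carries the sign information that a single image lacks.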

The determination of an aberration function proceeds as follows. The two PSF images are fed into the input layer $I$, which is connected to a hidden layer $H$ by the weight matrix $W$. The input to the hidden layer is passed through a nonlinear activation performed via the Sigmoid function. The output layer $\psi$ accepts weighted input from the hidden layer output via the weight matrix $V$ and returns the aberration function. A complete mathematical description of the ANN architecture, shown in Fig. 1(c), is given in Appendix A. For this work, which comprised two 64x64 pixel PSF images and a single 64x64 pixel aberration function, a single hidden layer of 128 neurons was found to be sufficient to map PSF images to aberration functions.
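The forward propagation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this work: it assumes a column-vector convention and no bias terms (none are mentioned in the text), and the weights and images are random placeholders. The names `sigmoid` and `forward` are hypothetical.

```python
import numpy as np

N_PIX = 64                  # image and output width in pixels (from the text)
N_IN = 2 * N_PIX * N_PIX    # two flattened PSF images: 8192 inputs
N_HID = 128                 # hidden neurons (from the text)
N_OUT = N_PIX * N_PIX       # one aberration function: 4096 outputs


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(I1, I2, W, V):
    """Map two PSF images to a 64x64 aberration function in radians."""
    I = np.concatenate([I1.ravel(), I2.ravel()])  # flattened, appended input layer
    H_i = W @ I                                   # input to the hidden layer
    H_o = sigmoid(H_i)                            # hidden-layer output
    psi = V @ H_o                                 # output layer
    return psi.reshape(N_PIX, N_PIX)


rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(N_HID, N_IN))    # placeholder weights
V = rng.normal(scale=0.01, size=(N_OUT, N_HID))   # placeholder weights
I1 = rng.random((N_PIX, N_PIX))                   # placeholder PSF images
I2 = rng.random((N_PIX, N_PIX))

psi = forward(I1, I2, W, V)
n_weights = W.size + V.size
```

With these layer sizes the two weight matrices hold 1,572,864 values in total (1,048,576 in $W$ and 524,288 in $V$).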

## 3. Training method

Training occurs by stochastic fixed step size gradient descent based on the error function

where $\psi ^\prime$ is the true aberration function used to generate the PSF images and $\psi$ is the network predicted aberration function. The difference $\psi ^\prime -\psi$ is the residual aberration function that would remain in an optical system after correction by adaptive optics. Starting weights are randomized with magnitudes scaled to the average values of the PSF images and of the true aberration functions in the training data set. The weight matrices are iteratively adjusted as detailed in Appendix A and with empirically optimized learning rates of $\alpha _W$ = 0.027 and $\alpha _V$ = 0.005. A complete flow diagram of forward- and back-propagation, data generation, and testing can be seen in Fig. 2.
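A single training update of this kind can be sketched as follows. The sketch assumes a half sum-of-squares error, $E = \frac{1}{2}\sum (\psi ^\prime -\psi )^2$, which is a common choice but an assumption here, together with one update per training sample and no bias terms. The learning rates are those quoted above; the layer sizes and data are toy placeholders chosen only so the example runs quickly, and `sgd_step` is a hypothetical name.

```python
import numpy as np

ALPHA_W = 0.027  # learning rate for W (from the text)
ALPHA_V = 0.005  # learning rate for V (from the text)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sgd_step(I, psi_true, W, V):
    """One fixed-step stochastic gradient descent update on a single sample.

    Assumes E = 0.5 * sum((psi_true - psi)**2). Returns the updated weight
    matrices and the error evaluated before the update.
    """
    H_o = sigmoid(W @ I)
    psi = V @ H_o
    residual = psi_true - psi                       # residual aberration function
    error = 0.5 * np.sum(residual**2)
    # Both gradients are evaluated with the weights from before the update.
    delta_h = (V.T @ residual) * H_o * (1.0 - H_o)  # element-wise (Hadamard) products
    V_new = V + ALPHA_V * np.outer(residual, H_o)
    W_new = W + ALPHA_W * np.outer(delta_h, I)
    return W_new, V_new, error


rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 4, 6                    # toy sizes, not those of this work
W = rng.normal(scale=0.1, size=(n_hid, n_in))
V = rng.normal(scale=0.1, size=(n_out, n_hid))
I = rng.random(n_in)                            # stand-in for flattened PSF images
psi_true = rng.uniform(-1.0, 1.0, n_out)        # stand-in aberration function

errors = []
for _ in range(200):
    W, V, err = sgd_step(I, psi_true, W, V)
    errors.append(err)
```

Repeating the step on the same toy sample drives the error down, which is a quick sanity check on the sign of the gradients.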

## 4. Training data modeled on a high numerical aperture microscope

The optical system chosen to test the ANN is a high NA nonlinear microscope commonly used in microscopy and laser nanofabrication. The fluorescence and plasma emission from such systems have previously been used as guide-stars in direct [9,10] and indirect [13] aberration measurement. Only the imaging side of the system must be considered as the guide-star PSF provides a complete picture of the optical performance of the system, provided that the excitation and collection beam paths within the specimen or material overlap.

The microscope comprised an infinity corrected objective lens with a NA of 1.4, a magnification of 100x, a tube length of 180 mm, and an immersion oil of refractive index 1.5. The objective focal plane was imaged onto two simulated CCD cameras by a 200 mm focal length tube lens and a 50:50 beamsplitter such that the total magnification was 111x. The two cameras were axially offset from the geometric focus by a distance of $\pm$3.7 mm. The dimensions and resolution of the tube lens PSF images were based on a typical CCD camera with a pixel pitch of approximately 2.5 µm. For the microscope detailed above, a complete image of the PSF requires a lateral dimension of approximately 160 µm, consuming 64x64 pixels of the CCD sensor. The emission wavelength of the guide-star was 550 nm, a common emission wavelength of many fluorophores.
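The quoted magnification and image size follow from simple arithmetic, sketched below. The sketch assumes that the nominal 100x magnification is defined for the 180 mm design tube length, so that the objective focal length is the ratio of the two; the variable names are illustrative.

```python
objective_magnification = 100.0      # nominal, at the design tube length
design_tube_length_mm = 180.0
tube_lens_focal_length_mm = 200.0
pixel_pitch_um = 2.5
psf_field_um = 160.0

# Objective focal length implied by the nominal magnification (assumption above).
objective_focal_length_mm = design_tube_length_mm / objective_magnification

# Total magnification with the 200 mm tube lens.
total_magnification = tube_lens_focal_length_mm / objective_focal_length_mm

# Number of camera pixels spanned by the PSF image along one side.
pixels_across = psf_field_um / pixel_pitch_um
```

This gives an objective focal length of 1.8 mm, a total magnification of approximately 111x, and 64 pixels per side.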

The training, validation, and test data sets consisted of known aberration functions and their corresponding PSF images. Random aberration functions were generated using Zernike polynomials [32] but at no point are the polynomials or their coefficients used in either forward- or back-propagation. Zernike polynomials from $Z_1$ to $Z_{14}$ (OSA/ANSI indices) were utilized as they covered the strongest aberrations typically seen in microscopy whilst being accurately represented on the 64x64 output grid. Each generated aberration function was used to model the PSF of both the objective lens for testing purposes, and the tube lens for forward propagation through the ANN. The PSFs were generated by a fast Fourier transform (FFT) implementation [33] of vectorial Debye theory [34], which provides an excellent approximation to the solution of the Debye integrals but with significantly reduced computational time. Details of the PSF image generation can be found in Appendix B.
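One way to generate such random aberration functions is sketched below. The details are assumptions made for illustration: coefficients are drawn from a normal distribution, piston is removed over the pupil, and the result is rescaled to a signed RMS magnitude drawn uniformly between $\pm \pi /3$. The function names and the pixel-centred grid are hypothetical choices, and the sign convention for negative azimuthal orders follows the common OSA/ANSI definition.

```python
import numpy as np
from math import factorial


def osa_to_nm(j):
    """Convert an OSA/ANSI single index j to radial order n and azimuthal order m."""
    n = 0
    while (n + 1) * (n + 2) // 2 <= j:
        n += 1
    return n, 2 * j - n * (n + 2)


def zernike(j, rho, theta):
    """Normalized Zernike polynomial Z_j evaluated on polar coordinates."""
    n, m = osa_to_nm(j)
    m_abs = abs(m)
    radial = np.zeros_like(rho)
    for k in range((n - m_abs) // 2 + 1):
        coeff = ((-1) ** k * factorial(n - k)
                 / (factorial(k)
                    * factorial((n + m_abs) // 2 - k)
                    * factorial((n - m_abs) // 2 - k)))
        radial += coeff * rho ** (n - 2 * k)
    if m == 0:
        return np.sqrt(n + 1.0) * radial
    angular = np.cos(m * theta) if m > 0 else np.sin(m_abs * theta)
    return np.sqrt(2.0 * (n + 1.0)) * radial * angular


def random_aberration(rng, n_pix=64, j_min=1, j_max=14, max_rms=np.pi / 3):
    """Random combination of Z_1..Z_14 scaled to a random signed RMS magnitude."""
    c = (np.arange(n_pix) - (n_pix - 1) / 2.0) / (n_pix / 2.0)
    x, y = np.meshgrid(c, c)
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    mask = rho <= 1.0
    coeffs = rng.normal(size=j_max - j_min + 1)
    phase = sum(a * zernike(j, rho, theta)
                for a, j in zip(coeffs, range(j_min, j_max + 1)))
    phase = phase - phase[mask].mean()        # remove piston over the pupil
    target = rng.uniform(-max_rms, max_rms)   # signed RMS magnitude in radians
    phase *= target / np.sqrt(np.mean(phase[mask] ** 2))
    return np.where(mask, phase, 0.0), mask, target


rng = np.random.default_rng(0)
psi_true, mask, target = random_aberration(rng)
rms = np.sqrt(np.mean(psi_true[mask] ** 2))
```

Only the sampled phase map `psi_true` would be passed on to PSF generation and training; the coefficients are discarded, in keeping with the coefficient-free approach described above.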

## 5. Network validation and testing

#### 5.1 Learning validation

Validation that the ANN can map PSFs to aberration functions can be seen in Fig. 3, which shows the performance of an ANN when trained on a training data set comprising PSF images generated from *single* Zernike polynomial aberrations, but tested using a test data set comprising PSF images generated from Zernike polynomial *combinations*. The training data comprised 10,000 randomly selected polynomials between $Z_1$ and $Z_{14}$ that were each assigned a random root-mean-square (RMS) magnitude between $\pm \pi /3$. A set of additional single polynomial aberrations was created as a validation data set. Network convergence during the first 25 of 100 epochs can be seen in Fig. 3(a). Histograms of the residual RMS aberration function magnitude generated by the validation data set can be seen in Fig. 3(b) after various lengths of learning. A significant reduction in magnitude occurs in as few as 5 epochs and a final relative error of 0.02 % is achieved after the full 100 epochs.

Testing of the ANN comprised the forward-propagation of 10,000 aberration functions from a test data set generated by the summation of random combinations of Zernike polynomials and coefficients. Each of these unique aberration functions was assigned a random RMS magnitude between $\pm \pi /3$. Figure 3(c) shows a comparison between the true and residual phase magnitudes of the test data set, whilst Fig. 3(d) shows the corresponding Strehl ratio of the objective lens PSFs. The network is clearly able to learn from single-polynomial aberrated PSFs and predict aberration functions in much more complicated PSFs affected by multiple-polynomial aberrations.

#### 5.2 Prediction capability

The full capability of the network when trained on a full set of randomized aberration functions can be seen in Fig. 4. In this case the ANN was trained, validated and tested on different sets of randomized aberration functions comprised of *combinations* of Zernike polynomials. The training was again performed for 100 epochs and converged to a relative error of 1.4 %. The reduced convergence and learning rate of the stochastic gradient descent algorithm when faced with a completely randomized data set can be seen in Figs. 4(a) and 4(b). However, once converged the ANN shows superior prediction of aberration functions.

Histograms of the true and residual aberration function magnitudes from the test data set can be seen in Fig. 4(c) and reveal a 6-fold improvement in the average magnitude from $0.2\pm 0.1~\pi$ to $0.03\pm 0.02~\pi$. The test set in this case comprised a further 10,000 random aberration functions generated from Zernike polynomial *combinations*. The Strehl ratio, shown in Fig. 4(d), shows similar improvement. The average Strehl ratio improves from $0.8\pm 0.2$ to $0.99\pm 0.02$ with 97 % of corrected PSFs improving to a Strehl ratio of at least 0.95. A video of the ANN aberration function prediction in operation is available in Visualization 1. The video illustrates the generation of the test data, the prediction of the aberration function, and the testing of RMS aberration function magnitude and Strehl ratio.
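As a rough cross-check of these figures, the extended Maréchal approximation $S \approx \exp (-\sigma ^2)$ relates the Strehl ratio $S$ to the RMS aberration $\sigma$ in radians. This is a standard small-aberration estimate and not the method used here, where the Strehl ratio is evaluated from the modelled objective lens PSFs; the function name is hypothetical.

```python
import math


def marechal_strehl(rms_rad):
    """Extended Marechal estimate of the Strehl ratio for an RMS phase error in radians."""
    return math.exp(-rms_rad ** 2)


# Evaluated at the mean residual and mean true magnitudes quoted in the text.
strehl_at_mean_residual = marechal_strehl(0.03 * math.pi)
strehl_at_mean_true = marechal_strehl(0.2 * math.pi)
```

The estimate at the mean residual magnitude is approximately 0.99, in line with the corrected average reported above. The estimate at the mean true magnitude, approximately 0.67, need not match the reported uncorrected average, since the mean of a nonlinear function over a distribution of magnitudes differs from the function evaluated at the mean.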

#### 5.3 Effect of the aberration magnitude

The effect of the true aberration function magnitude of the test data set on the prediction capability of the ANN can be seen in Fig. 5. At magnitudes approaching zero, the ratio of true to residual aberration magnitude $\psi ^\prime /\left (\psi ^\prime -\psi \right )$ diminishes and begins to drop below 1. This indicates that the ANN predicted aberration function when used in adaptive optics can result in an increase in the aberration magnitude. The reason for this behavior is simple: ambiguity in the training process. As the magnitude of any true aberration function approaches zero, the corresponding PSFs all approach an identical diffraction limited form. Accordingly, the network is increasingly hindered in discerning PSF intensity distributions. Figure 5 also shows that optimal improvement up to a factor of 18 occurs at the center of the training data magnitude range where the continuity of the data set is greatest. The spread of improvement at a given magnitude can be attributed to the breadth of different PSF shapes that different aberration functions of the same magnitude can produce. Some functions provide more drastic changes to the PSF that are more easily learnt.
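The ratio of true to residual RMS magnitude can be computed as in the following sketch, which uses synthetic phase maps in place of network output. It assumes that the RMS is taken over the pupil with piston removed; the helper names and the test patterns are hypothetical.

```python
import numpy as np


def pupil_rms(phase, mask):
    """RMS of a phase map over the pupil, with piston (the mean) removed."""
    values = phase[mask]
    return np.sqrt(np.mean((values - values.mean()) ** 2))


def improvement_ratio(psi_true, psi_pred, mask):
    """True RMS magnitude divided by residual RMS magnitude."""
    return pupil_rms(psi_true, mask) / pupil_rms(psi_true - psi_pred, mask)


n = 64
c = (np.arange(n) - (n - 1) / 2.0) / (n / 2.0)
x, y = np.meshgrid(c, c)
mask = np.hypot(x, y) <= 1.0
shape = x**2 - y**2   # astigmatism-like test pattern

# A prediction that recovers 90 % of a moderate aberration: ratio of about 10.
ratio_mid = improvement_ratio(0.5 * shape, 0.45 * shape, mask)

# A very weak aberration with a fixed-size prediction error: ratio below 1,
# meaning the correction would make the wavefront worse.
ratio_small = improvement_ratio(0.001 * shape, 0.01 * x * y, mask)
```

The second case mirrors the behavior near zero magnitude: a prediction error of fixed size dominates a vanishing true aberration, so the ratio falls below 1.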

#### 5.4 Effect of the aberration polynomial

The performance of the ANN in predicting well known aberration functions can be seen in Fig. 6. In this simulation, the performance of the ANN was evaluated individually using a test data set comprising aberration functions generated by each of the 14 Zernike polynomials used in generating the training data set. The magnitude of each individual polynomial was varied between $\pm \pi /3$ radians such that the entire training range was probed. To remove the influence of overfitting, the average behavior of 100 individually trained networks was calculated.

Figures 6(a) and 6(b) show the typical performance of the ANN when predicting astigmatism ($Z_5$) and quadrafoil ($Z_{14}$), which correspond to the best and worst predictions, respectively. The thin grey curves represent the individual performance of five unique ANNs, whilst the thick blue curve represents the average performance of all 100 ANNs. The variation in performance between ANNs is symptomatic of overfitting [26], where a selection of aberration functions in the test data set that are close to the magnitudes and polynomials in a training data set are predicted better than others. Figure 6(c) shows a statistical breakdown of polynomial performance. Astigmatism ($Z_3$ and $Z_5$) is predicted markedly better than most aberrations, followed by spherical aberration ($Z_{12}$) and second order astigmatism ($Z_{11}$ and $Z_{13}$). The PSFs of these aberrations share a common feature: they exhibit strong axial asymmetry such that the two axially offset CCD cameras provide distinct input to the ANN. Conversely, aberrations that are symmetric along the optical axis such as tilt ($Z_{1}$ and $Z_{2}$) and coma ($Z_{7}$ and $Z_{8}$) appear more difficult to predict.

#### 5.5 Effect of noise

The performance of the ANN when subject to noise can be seen in Fig. 7. The test data set PSF images previously generated from random aberration functions were impaired by increasing levels of uniform noise and forward propagated through an ANN trained on a noiseless data set. The noise was white noise with a peak to peak magnitude of up to 10 % of the diffraction limited PSF peak intensity. Figures 7(a) and 7(b) show the effect of increasing noise on the residual aberration function magnitude and corrected Strehl ratio, respectively. When noise increases to 5 % of the maximum PSF intensity, the mean residual magnitude increases from $0.03\pm 0.02~\pi$ to $0.20\pm 0.05~\pi$, whilst the corrected Strehl ratio decreases from $0.99\pm 0.02$ to $0.76\pm 0.09$.
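Noise of this kind can be added to a PSF image as in the sketch below. It assumes that the noise is zero-mean and uniformly distributed and that the noisy image is not clipped at zero, neither of which is specified above; the Gaussian spot is only a stand-in for a PSF image, and `add_uniform_noise` is a hypothetical helper.

```python
import numpy as np


def add_uniform_noise(image, level, peak_ref, rng):
    """Add zero-mean uniform white noise with peak-to-peak size level * peak_ref."""
    noise = (rng.random(image.shape) - 0.5) * level * peak_ref
    return image + noise


rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64] - 32
clean = np.exp(-(xx**2 + yy**2) / (2.0 * 4.0**2))  # stand-in PSF image, peak of 1

# 5 % peak-to-peak noise, referenced to the peak of the unaberrated image.
noisy = add_uniform_noise(clean, level=0.05, peak_ref=clean.max(), rng=rng)
max_deviation = np.max(np.abs(noisy - clean))
```

Passing the diffraction limited peak as `peak_ref` keeps the noise level fixed across the data set, regardless of how far an individual aberrated PSF peak has dropped.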

The reduction in performance with increasing noise is significant but can be restored by using noisy training data. Figures 7(c) and 7(d) show the residual aberration function magnitude and Strehl ratio respectively for an ANN trained on a training data set comprising PSF images affected by white noise with a peak to peak magnitude of 5 %. The test data set remained unchanged. At a test noise level of 5 %, the residual magnitude returns to a mean of $0.04\pm 0.03~\pi$, whilst the corrected Strehl ratio returns to an average of $0.98\pm 0.03$. A typical tube lens PSF image pair compromised by 5 % noise can be seen in Fig. 7(e). When this PSF pair is forward propagated through the network trained on noiseless data, the predicted aberration function is poor as seen in Fig. 7(f). The improved aberration function prediction for the same PSF data propagated through a network trained on 5 % noise can be seen in Fig. 7(g).

## 6. Discussion

The learning capability of ANNs enables them to map complex nonlinear relationships and return a prediction in a single computationally efficient forward propagation. The ANN investigated in this work demonstrates that direct determination of an aberration function is possible when appropriately trained on PSF images of nonlinear or fluorescent guide stars, which can provide advantages over traditional direct measurement sensors. The use of standard CCD cameras as the sensing element eliminates the need for specialist wavefront sensors, and the aberration function can also be returned in a resolution determined by computational rather than physical limitations (e.g., micro-lens diameter). The ANN also benefits in signal to noise ratio when compared to image-transfer based sensors that analyze a comparatively unfocused beam. In terms of computational expense, the network is quite efficient. Generation of the 10,000 training and test data sets took 4 minutes using Matlab on a typical desktop workstation (HP Z640 with Intel Xeon E5-2620 processor), training for a full 100 epochs took 100 minutes, and a single forward propagation took 0.3 ms.

However, ANN based direct aberration function determination must be used appropriately. The tendency of PSFs to approach a uniform diffraction limited shape means that the prediction ability of the ANN is hindered at low aberration function magnitudes, a characteristic which may be mitigated by the use of additional PSF images [28]. The ANN must also be trained on the specific range of magnitudes that it is expected to encounter in the optical system, and on the noise anticipated to be present in the PSF images. In the cases where aberration function prediction may be imperfect, the prediction may well be useful as an improved starting point for traditional indirect optimization algorithms. Implementation of the ANN with a particular microscope should also be accompanied by re-training based on the particular optical setup of that microscope. Where a microscope allows access to an image plane at a camera port, implementation of a dual CCD camera system can be achieved by the use of image relay optics. Where the image path cannot be accessed or modified, training on single PSF images using the default CCD camera is certainly possible, although such ANNs cannot learn the sign of axially symmetric aberrations such as defocus [19].

It should also be noted that the perceptron network itself needs to be considered carefully due to its tendency to overfit [35]. Whilst more complex networks such as convolutional neural networks can overcome these limitations, perceptron based ANNs can be implemented optically via diffractive ANNs and are worthy of investigation. In such a case, the effect of overfitting can be mitigated by dropout techniques [35].
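For reference, dropout applied to the hidden-layer output can be sketched as below. The sketch uses the common "inverted" variant, in which the surviving activations are rescaled during training so that no change is needed at test time. Dropout was not part of the network investigated in this work, and the drop probability and function name are illustrative assumptions.

```python
import numpy as np


def dropout(h, drop_prob, rng, training=True):
    """Inverted dropout: zero a random subset of activations and rescale the rest."""
    if not training or drop_prob == 0.0:
        return h
    keep = rng.random(h.shape) >= drop_prob   # True where a neuron is kept
    return h * keep / (1.0 - drop_prob)


rng = np.random.default_rng(0)
h = np.ones(128)                              # stand-in hidden-layer output

h_train = dropout(h, 0.5, rng)                # random neurons silenced in training
h_test = dropout(h, 0.5, rng, training=False) # unchanged at test time
```

During training each hidden neuron is silenced at random, which discourages the network from relying on any single neuron to reproduce individual training examples.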

## 7. Conclusion

High NA adaptive optical systems demand rapid determination of aberration functions. In this work we investigated an ANN for the direct determination of aberration functions from PSF images in microscopy. In a single forward propagation of the ANN, a predicted two-dimensional aberration function in radians is returned without the need for reconstruction from Zernike polynomials. We numerically modeled the ANN's capability to predict aberration functions in fluorescent/nonlinear guide-star applications and demonstrated that a strong reduction in aberration magnitude is possible after 100 epochs of training. The reduced magnitude was found to significantly improve the Strehl ratio to an average of 0.99. We analyzed the ANN's performance when exposed to various aberration function magnitudes, aberration polynomials, and noise levels, revealing its robustness. Such an ANN may provide an improved starting position for indirect modal and segmentation schemes, may be used as a direct measurement scheme on its own, or may form the basis of new adaptive optic systems determining aberration in the optical domain [30,31].

## Appendix A: Neural network architecture

#### A.1 Forward propagation

The input layer $I$ accepts flattened and appended PSF images. Each neuron in the hidden layer $H$ accepts weighted input from each of the neurons in the input layer by the weight matrix $W$ such that the input to the hidden layer $H_i$ can be expressed by the matrix multiplication

$$H_i = W I.$$

The input to the hidden layer is then passed through a nonlinear activation, which is performed via the Sigmoid function such that the output of the hidden layer $H_o$ is given by

$$H_o = \frac{1}{1 + e^{-H_i}}.$$

Finally, the output layer $\psi$ accepts weighted input from the hidden layer via the weight matrix $V$, such that the output aberration function is given by

$$\psi = V H_o.$$

#### A.2 Back propagation

Backpropagation is performed by stochastic fixed step size gradient descent based on the error function in Eq. (1). After each iteration, the weight matrices are adjusted by the relations

where $n$ is the training iteration, $\alpha _W$ and $\alpha _V$ are learning rates, $\circ$ signifies a Hadamard product, and $^T$ is the matrix transpose operator.

## Appendix B: Optical training data

PSF images were simulated by a fast Fourier transform implementation [33] of vectorial Debye theory [34]. Each PSF image comprised three electric field components

## Disclosures

The authors declare no conflicts of interest.

## References

**1.** M. J. Booth, “Adaptive optical microscopy: the ongoing quest for a perfect image,” Light: Sci. Appl. **3**(4), e165 (2014).

**2.** P. S. Salter and M. J. Booth, “Adaptive optics in laser processing,” Light: Sci. Appl. **8**(1), 110 (2019).

**3.** J. M. Girkin, S. Poland, and A. J. Wright, “Adaptive optics for deeper imaging of biological samples,” Curr. Opin. Biotechnol. **20**(1), 106–110 (2009).

**4.** P. S. Salter and M. J. Booth, “Focussing over the edge: adaptive subsurface laser fabrication up to the sample face,” Opt. Express **20**(18), 19978–19989 (2012).

**5.** B. P. Cumming, M. D. Turner, G. E. Schröder-Turk, S. Debbarma, B. Luther-Davies, and M. Gu, “Adaptive optics enhanced direct laser writing of high refractive index gyroid photonic crystals in chalcogenide glass,” Opt. Express **22**(1), 689–698 (2014).

**6.** R. D. Simmonds, P. S. Salter, A. Jesacher, and M. J. Booth, “Three dimensional laser microfabrication in diamond using a dual adaptive optics system,” Opt. Express **19**(24), 24122–24128 (2011).

**7.** J.-W. Cha, J. Ballesta, and P. T. C. So, “Shack-Hartmann wavefront-sensor-based adaptive optics system for multiphoton microscopy,” J. Biomed. Opt. **15**(4), 046022 (2010).

**8.** C. Bourgenot, C. D. Saunter, G. D. Love, and J. M. Girkin, “Comparison of closed loop and sensorless adaptive optics in widefield optical microscopy,” J. Eur. Opt. Soc. - Rapid publications **8**, 13027 (2013).

**9.** R. Aviles-Espinosa, J. Andilla, R. Porcar-Guezenec, O. E. Olarte, M. Nieto, X. Levecq, D. Artigas, and P. Loza-Alvarez, “Measurement and correction of in vivo sample aberrations employing a nonlinear guide-star in two-photon excited fluorescence microscopy,” Biomed. Opt. Express **2**(11), 3135–3149 (2011).

**10.** X. Tao, B. Fernandez, O. Azucena, M. Fu, D. Garcia, Y. Zuo, D. C. Chen, and J. Kubby, “Adaptive optics confocal microscopy using direct wavefront sensing,” Opt. Lett. **36**(7), 1062–1064 (2011).

**11.** N. Olivier, D. Débarre, and E. Beaurepaire, “Dynamic aberration correction for multiharmonic microscopy,” Opt. Lett. **34**(20), 3145–3147 (2009).

**12.** D. Débarre, M. J. Booth, and T. Wilson, “Image based adaptive optics through optimisation of low spatial frequencies,” Opt. Express **15**(13), 8176–8190 (2007).

**13.** A. Jesacher, G. D. Marshall, T. Wilson, and M. J. Booth, “Adaptive optics for direct laser writing with plasma emission aberration sensing,” Opt. Express **18**(2), 656–661 (2010).

**14.** D. E. Milkie, E. Betzig, and N. Ji, “Pupil-segmentation-based adaptive optical microscopy with full-pupil illumination,” Opt. Lett. **36**(21), 4206–4208 (2011).

**15.** N. Ji, D. E. Milkie, and E. Betzig, “Adaptive optics via pupil segmentation for high-resolution imaging in biological tissues,” Nat. Methods **7**(2), 141–147 (2010).

**16.** Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature **521**(7553), 436–444 (2015).

**17.** Q. Zhang, H. Yu, M. Barbiero, B. Wang, and M. Gu, “Artificial neural networks enabled by nanophotonics,” Light: Sci. Appl. **8**(1), 42 (2019).

**18.** D. G. Sandler, T. K. Barrett, D. A. Palmer, R. Q. Fugate, and W. J. Wild, “Use of a neural network to control an adaptive optics system for an astronomical telescope,” Nature **351**(6324), 300–302 (1991).

**19.** T. K. Barrett and D. G. Sandler, “Artificial neural network for the determination of Hubble Space Telescope aberration from stellar images,” Appl. Opt. **32**(10), 1720–1727 (1993).

**20.** J. R. P. Angel, P. Wizinowich, M. Lloyd-Hart, and D. Sandler, “Adaptive optics for array telescopes using neural-network techniques,” Nature **348**(6298), 221–224 (1990).

**21.** T. Andersen, M. Owner-Petersen, and A. Enmark, “Neural networks for image-based wavefront sensing for astronomy,” Opt. Lett. **44**(18), 4618–4621 (2019).

**22.** C. González-Gutiérrez, J. D. Santos, M. Martínez-Zarzuela, A. G. Basden, J. Osborn, F. J. Díaz-Pernas, and F. J. De Cos Juez, “Comparative study of neural network frameworks for the next generation of adaptive optics systems,” Sensors **17**(6), 1263 (2017).

**23.** E. Nehme, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Deep-STORM: super-resolution single-molecule microscopy by deep learning,” Optica **5**(4), 458–464 (2018).

**24.** W. Ouyang, A. Aristov, M. Lelek, X. Hao, and C. Zimmer, “Deep learning massively accelerates super-resolution localization microscopy,” Nat. Biotechnol. **36**(5), 460–468 (2018).

**25.** Q. Xin, G. Ju, C. Zhang, and S. Xu, “Object-independent image-based wavefront sensing approach using phase diversity images and deep learning,” Opt. Express **27**(18), 26102–26119 (2019).

**26.** A. P. Dzyuba, “Optical phase retrieval with the image of intensity in the focal plane based on the convolutional neural networks,” J. Phys.: Conf. Ser. **1368**(2), 022055 (2019).

**27.** H. Ma, H. Liu, Y. Qiao, X. Li, and W. Zhang, “Numerical study of adaptive optics compensation based on convolutional neural networks,” Opt. Commun. **433**, 283–289 (2019).

**28.** L. Möckl, P. N. Petrov, and W. E. Moerner, “Accurate phase retrieval of complex 3D point spread functions with deep residual neural networks,” Appl. Phys. Lett. **115**(25), 251106 (2019).

**29.** P.-O. Vanberg, G. Orban de Xivry, O. Absil, and G. Louppe, “Machine learning for image-based wavefront sensing,” in 33rd Conference on Neural Information Processing Systems (NeurIPS), (Neural Information Processing Systems Foundation, 2019), Machine Learning and the Physical Sciences Workshop, p. 107.

**30.** E. Goi and M. Gu, “Laser printing of a nano-imager to perform full optical machine learning,” in 2019 Conference on Lasers and Electro-Optics Europe and European Quantum Electronics Conference, (Optical Society of America, 2019), OSA Technical Digest, p. jsi_p_3.

**31.** X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science **361**(6406), 1004–1008 (2018).

**32.** M. J. Booth and T. Wilson, “Refractive-index-mismatch induced aberrations in single-photon and two-photon microscopy and the use of aberration correction,” J. Biomed. Opt. **6**(3), 266–272 (2001).

**33.** M. Leutenegger, R. Rao, R. A. Leitgeb, and T. Lasser, “Fast focus field calculations,” Opt. Express **14**(23), 11277–11291 (2006).

**34.** M. Gu, T. Asakura, K. Brenner, T. Hansch, F. Krausz, H. Weber, and W. Rhodes, *Advanced Optical Imaging Theory* (Springer, 2000).

**35.** N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, *Dropout: a simple way to prevent neural networks from overfitting*, vol. 15 (JMLR.org, 2014).