## Abstract

It is well known that in-line digital holography (DH) makes use of the full pixel count of the camera in forming the holographic image. However, it usually requires phase-shifting or phase-retrieval techniques to remove the zero-order and twin-image terms, resulting in a so-called two-step reconstruction process, i.e., phase recovery and focusing. Here, we propose a one-step, end-to-end, learning-based method for in-line holographic reconstruction, namely the eHoloNet, which can reconstruct the object wavefront directly from a single-shot in-line digital hologram. In addition, the proposed learning-based DH technique is robust to changes in the optical path difference between the reference beam and the object beam, and does not require the reference beam to be a plane or spherical wave.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Digital holography (DH) has been established as an important scientific tool for many applications such as imaging, inspection, and metrology [1]. Typically, DH employs one of two major configurations: in-line or off-axis. The off-axis configuration allows wavefront reconstruction from a single-shot hologram, but it cannot make use of the full space-bandwidth product (SBP) that a camera can offer to reconstruct the object wavefront [2]. The in-line configuration, on the other hand, can make use of the full pixel count, but it usually suffers from the twin-image problem. Many methods have been proposed to address this issue. The most conventional one is to record multiple holograms with different phase retardations in the reference beam [3–5], or in different planes [6–8]. The logical development along this line is to reduce the number of exposures so as to increase the efficiency. For instance, the phase-shifting methods have been implemented in a parallel manner [9]. An alternative approach is to use an iterative phase retrieval technique [10–12] to reconstruct the whole wavefront from a single-shot in-line hologram [13]. However, iterative algorithms usually suffer from algorithmic issues, such as sensitivity to the initial guess of the solution and slow convergence [14]. Filter theory [15] has also been used for this task, but this method relies on the assumption that the object is real and significantly weaker than the reference in intensity.

Here we propose an end-to-end learning approach for the reconstruction of in-line digital holograms. We note that learning-based methods have been widely used to solve problems in diverse fields, such as speech recognition, visual object recognition, and object detection, among many others [16]. Learning methods have also been used to solve inverse problems in optical imaging. For example, Horisaki *et al* [17] have proposed to use support vector regression (SVR) to recover an image through a scattering layer. Lyu *et al* have developed a deep-neural-network (DNN) architecture for imaging through optically thick scattering media [18] and for a significant reduction of the sampling rate in computational ghost imaging [19]. Sinha *et al* [20] have proposed a convolutional neural network (CNN) based technique for phase imaging. Yuan and Pu have implemented lensless compressive imaging in a parallel manner [21]. Jiang *et al* have developed a CNN model for image reconstruction in Fourier ptychography [22]. In the field of digital holography in particular, Rivenson *et al* [23] have demonstrated a holographic image reconstruction method using a CNN. Pitkäaho *et al* [24] and Ren *et al* [25] have proposed to use a CNN for automatic calculation of the focusing distance in holographic reconstruction. Wu *et al* have taken a further step to extend the depth of field in holographic imaging [26]. Nguyen *et al* [27] have proposed to compensate the phase aberration in digital holographic microscopy. However, in all these studies [23–27], the holographic reconstruction is still calculated as a two-step process, that is, object wavefront recovery and, usually, reverse diffraction. In the first step, the elimination of the zero-order and twin-image terms is crucial, while in the second step, the search for the exact focusing distance is important [29]. For example, in Refs. [23, 26, 27], the wavefronts of all the diffraction orders are first recovered from the hologram, and then a CNN architecture is used either to eliminate the unwanted but overlapping terms or to compensate the aberration. In Refs. [24, 25], on the other hand, the learning method is employed to search for the focus distance after the object wavefront has been recovered using conventional methods. In contrast, the eHoloNet we propose in this manuscript is a one-step method. It directly learns the correspondence between the object and the associated hologram, so that it can reconstruct the object wavefront directly from a single-shot in-line hologram, reversing the diffraction and eliminating the unwanted terms in a single step. Meanwhile, note that holography usually requires a highly stable acquisition environment, precise adjustment, and accurate parameter measurement. We find, however, that the eHoloNet method is robust against changes of the optical path difference between different holograms, as it can learn all these features from the interference patterns, and that it imposes no requirement on the wavefront of the reference beam.

The manuscript is organized as follows. The experimental setup and the eHoloNet architecture we developed are presented in Section 2, and the results and analysis are presented in Section 3. Part of this work has been presented at the 2018 Optical Society of America Topical Meeting on Digital Holography and 3D Imaging [28].

## 2. Methods

#### 2.1. Experimental setup

The schematic of the experimental setup is shown in Fig. 1; it is in principle a Michelson interferometer. Polarized light emitted from a He-Ne laser source (Thorlabs, HRS015) with the wavelength *λ* = 633 nm was first coupled into a polarization-maintaining fiber with a core diameter of 10 *µ*m (not shown in Fig. 1). The fiber acted as a low-pass filter so that the light emitted from its end can be regarded as coming from a point source. After being collimated by the lens L (*f* = 250 mm), the light was transmitted through a polarizer P and was then split into two arms by a beam splitter (BS). In the transmissive arm, a spatial light modulator (SLM; Holoeye LETO) was placed about 20 cm away from the BS. This model can introduce 8-bit pixel-wise phase-only modulation to the incident beam if the beam is horizontally polarized. Objects were displayed sequentially on the SLM during the experiment. The reference beam from the BS was reflected by a mirror placed about 16 cm away from the BS and served as a constant reference. The object beam and the reference beam were recombined by the same BS and formed interference fringes on the camera plane, which was about 23 cm away from the BS. In our experiment, we carefully aligned the angle between the object beam and the reference beam so that they formed an in-line configuration. An sCMOS camera (PCO Edge 4.2) was used to capture the in-line holograms without any imaging optics.

Suppose that the object displayed on the SLM is written as *O*_{0}(*x*_{0}, *y*_{0}), where (*x*_{0}, *y*_{0}) are the coordinates in the object plane, and that *R*(*x*, *y*) = *C* is the reference beam, with *C* a complex constant and (*x*, *y*) the coordinates in the hologram plane. The captured in-line hologram *I*(*x*, *y*) can then be written as

$$I(x, y) = \left|C + \mathcal{P}_z\{O_0\}\right|^2 = |C|^2 + \left|\mathcal{P}_z\{O_0\}\right|^2 + C^{*}\,\mathcal{P}_z\{O_0\} + C\,\mathcal{P}_z^{*}\{O_0\}, \qquad (1)$$

where $\mathcal{P}_z\{O_0\}$ denotes the diffraction propagation of *O*_{0}(*x*_{0}, *y*_{0}) over distance *z* to the (*x*, *y*) plane and the symbol * denotes the complex (phase) conjugate.

Conventionally, the reconstruction of *O*_{0}(*x*_{0}, *y*_{0}) from the hologram *I*(*x*, *y*) is achieved by numerically calculating the reverse diffraction, i.e., the inverse of $\mathcal{P}$, so that the third term in Eq. (1) yields the object wavefront. However, such a direct calculation usually suffers from strong disturbance contributed by the other three terms.
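The four-term structure of Eq. (1) can be checked numerically. The following is a minimal sketch, not the acquisition or reconstruction code used in this work: it assumes an angular-spectrum propagator as a stand-in for $\mathcal{P}_z$, and the pixel pitch, distance, grid size, and the helper name `angular_spectrum` are hypothetical choices made only for illustration.

```python
import numpy as np

def angular_spectrum(field, wavelength, dx, z):
    """Propagate a complex field over distance z (angular spectrum method)."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    fxx, fyy = np.meshgrid(fx, fx, indexing="ij")
    arg = 1.0 / wavelength**2 - fxx**2 - fyy**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.exp(1j * kz * z) * (arg > 0)   # discard evanescent waves
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

wavelength = 633e-9   # He-Ne wavelength quoted in Section 2.1
dx = 6.4e-6           # assumed pixel pitch (hypothetical)
z = 0.2               # assumed propagation distance in metres (hypothetical)
n = 128               # small grid so the example runs quickly

rng = np.random.default_rng(0)
phase = np.zeros((n, n))
phase[48:80, 48:80] = np.pi * rng.random((32, 32))
obj = np.exp(1j * phase)   # phase-only object O_0
C = 1.0 + 0.0j             # constant reference R = C

O = angular_spectrum(obj, wavelength, dx, z)   # P_z{O_0}
hologram = np.abs(C + O) ** 2                  # left-hand side of Eq. (1)

# Right-hand side of Eq. (1): zero-order terms, object term, twin-image term.
terms = (abs(C) ** 2, np.abs(O) ** 2, np.conj(C) * O, C * np.conj(O))
four_term_sum = sum(terms)
```

Only the third entry of `terms` carries the object wavefront; applying the reverse propagator to `hologram` also propagates the other three entries, which is the origin of the disturbance described above.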

#### 2.2. Learning-based in-line holographic reconstruction

### 2.2.1. The end-to-end learning approach

Alternatively, the method we propose here employs a CNN to reconstruct the object wavefront from its in-line hologram. First, we can rewrite Eq. (1) as

$$I(x, y) = \mathcal{F}\{O(x_0, y_0)\}, \qquad (2)$$

where ℱ{●} is the forward-process operator that transforms the object *O*(*x*_{0}, *y*_{0}) in the object plane to *I*(*x*, *y*) in the image plane. It has the form described by Eq. (1). The reconstruction process can then be expressed as

$$O(x_0, y_0) = \mathcal{R}\{I(x, y)\}, \qquad (3)$$

where ℛ{●} represents the mapping of the hologram *I* to the object wavefront *O*. It is not equivalent to the inverse operator of ℱ. Note that ℛ{●} in this work is different from the one-step processes such as scattering [17], diffraction [20] and interference [23]; it in fact represents a combination of two processes. The first one is a diffraction propagation and the second is the removal of the unwanted terms.

The end-to-end learning approach to solve this problem is to use a set of data consisting of ground-truth objects *O*_{n} and their corresponding holograms *I*_{n}, where *n* = 1, …, *N*, to learn the parametric inverse mapping operator ℛ_{learn}, which represents the reconstruction algorithm [30,31]. The objective function in this case can be defined as

$$\mathcal{R}_{\text{learn}} = \underset{\mathcal{R}_\theta,\ \theta \in \Theta}{\arg\min} \sum_{n=1}^{N} f\left(O_n, \mathcal{R}_\theta\{I_n\}\right) + \Psi(\theta), \qquad (4)$$

where *f*{●} is a *loss function* to measure the error between *O*_{n} and ℛ_{θ}{*I*_{n}}, and Ψ(*θ*) is a regularizer on the parameters with the aim of avoiding overfitting [31]. The set Θ has two types of parameters. The first type includes the parameters that specify the structure of the network, such as the type of the network, the number of layers and of neurons in each layer, and the size of the convolutional kernels. Parameters of this type need to be determined before training. The other type includes the internal weights of the different convolutional kernels. They are adjusted automatically during the training of the neural network [32]. Once the inverse mapping has been learned, the neural network can be used to retrieve the object from the in-line hologram directly, with the removal of all the unwanted zero-order and twin-image terms.
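To make the structure of the objective in Eq. (4) concrete, the sketch below evaluates a data term plus a regularizer for a toy parametric mapping. It is an illustration under stated assumptions only: the squared-error form of *f*, the L2 form of Ψ, the `weight_decay` value, and the two-parameter `toy_reconstruct` mapping are all hypothetical stand-ins, not the network described in this paper.

```python
import numpy as np

def objective(theta, holograms, objects, reconstruct, weight_decay=1e-4):
    """Data term sum_n f(O_n, R_theta{I_n}) plus a regularizer Psi(theta)."""
    data_term = sum(np.mean((reconstruct(theta, I) - O) ** 2)
                    for I, O in zip(holograms, objects))
    regularizer = weight_decay * np.sum(theta ** 2)   # assumed L2 form of Psi
    return data_term + regularizer

def toy_reconstruct(theta, hologram):
    """Toy stand-in for R_theta: one scale and one offset."""
    return theta[0] * hologram + theta[1]

rng = np.random.default_rng(1)
holograms = [rng.random((8, 8)) for _ in range(4)]
objects = [2.0 * I + 0.5 for I in holograms]   # ground truth of the toy mapping

loss_good = objective(np.array([2.0, 0.5]), holograms, objects, toy_reconstruct)
loss_bad = objective(np.array([1.0, 0.0]), holograms, objects, toy_reconstruct)
```

With the parameters that generated the toy data, only the regularizer contributes to `loss_good`, whereas `loss_bad` is dominated by the data term; training amounts to searching Θ for the parameters that minimize this quantity.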

### 2.2.2. The eHoloNet structure

When establishing and adjusting the structure of the neural network, we were attempting to establish the underlying structure of ℛ_{learn}{●}. We used the most extensively studied type of neural network, the convolutional neural network, to build our network structure [33]. Partially inspired by Refs. [23, 34], we propose the eHoloNet structure, which is plotted in Fig. 2(a). The eHoloNet is based on the U-net structure [34]: it learns, at different scales, the features that map the holograms end-to-end to the corresponding images in the training set, and then uses these features to reconstruct images outside the training set.

As shown in Fig. 2(b), there are mainly three types of functional blocks in the eHoloNet: the convolutional block, the residual block and the upsampling block.

The convolutional block consists of a convolutional layer and an activation function. The value of the (*x*, *y*)-th pixel in the *j*-th feature map in the *i*-th convolutional layer (*i* ⩾ 2) is given by [23]

$$v_{i,j}^{x,y} = \sum_{r=1}^{R} \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} w_{i,j,r}^{p,q}\, v_{i-1,r}^{x+p,\,y+q} + b_{i,j}, \qquad (5)$$

where *b*_{i,j} is a common bias term for the *j*-th feature map, *r* indicates the *r*-th feature map of the (*i* − 1)-th layer, and *R* is the total number of feature maps in the (*i* − 1)-th convolutional layer. When *i* = 1, *j* = 1 and ${v}_{1,1}^{x,y}$ represents the intensity value of the input image at the pixel (*x*, *y*). ${w}_{i,j,r}^{p,q}$ is the weight of the convolutional kernel at the position (*p*, *q*), and *P* × *Q* represents the size of the convolutional kernels in pixel counts, which is 5 × 5 in the first 3 convolutional blocks and 3 × 3 in all the other convolutional layers. The stride of the convolutional kernel is (1, 1) along the two transverse spatial dimensions. This means that the convolutional kernel shifts by 1 pixel along the *x* and *y* directions each time and is multiplied with the image. The convolutional layer is followed by the activation function. For convenience, the rectified linear unit (ReLU) [16] is used throughout our neural network, i.e., ReLU(*x*) = max(0, *x*).
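The convolutional block of Eq. (5) can be written out directly. The sketch below is a plain NumPy illustration rather than the TensorFlow layer used in this work; the function name `conv_block`, the array layout, and the use of zero ("same") padding to keep the image size unchanged are assumptions.

```python
import numpy as np

def conv_block(v_prev, w, b):
    """Convolutional layer followed by ReLU, following Eq. (5).

    v_prev: (R, X, Y) feature maps of layer i-1
    w:      (J, R, P, Q) kernels; b: (J,) biases
    Returns (J, X, Y) feature maps of layer i (zero padding assumed).
    """
    J, R, P, Q = w.shape
    _, X, Y = v_prev.shape
    padded = np.pad(v_prev, ((0, 0), (P // 2, P // 2), (Q // 2, Q // 2)))
    out = np.zeros((J, X, Y))
    for j in range(J):
        for p in range(P):
            for q in range(Q):
                # Sum over r of w[j, r, p, q] * v_prev[r, x + p, y + q].
                window = padded[:, p:p + X, q:q + Y]
                out[j] += np.tensordot(w[j, :, p, q], window, axes=1)
        out[j] += b[j]                 # common bias of the j-th feature map
    return np.maximum(out, 0.0)        # ReLU(x) = max(0, x)

# Example: 3 input feature maps, 2 output feature maps, 5 x 5 kernels.
features = conv_block(np.ones((3, 6, 6)), np.ones((2, 3, 5, 5)), np.zeros(2))
```

The explicit loops mirror the triple sum of Eq. (5) and are written for readability; a deep-learning framework performs the same operation far more efficiently.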

The residual block in the eHoloNet includes 2 convolutional layers and 2 activation functions. Here the image processed by the two convolutional layers and the two activation functions is added to the image entering the block, and the sum forms the output image passed to the next block. In this way, the block creates a shortcut between the input and the output so as to ease the optimization of the neural network and gain accuracy from considerably increased depth [35].
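The shortcut connection described above can be sketched in a few lines. This is a schematic illustration only: `conv1` and `conv2` are placeholders for arbitrary convolutional layers, and the scalar "layers" used in the example are hypothetical stand-ins chosen so that the result is easy to verify by hand.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, conv1, conv2):
    """Two conv + ReLU stages whose result is added to the block input."""
    y = relu(conv1(x))
    y = relu(conv2(y))
    return x + y           # shortcut between input and output

def half(t):
    return 0.5 * t         # stand-in layer that rescales the feature map

def zero(t):
    return 0.0 * t         # stand-in layer with all weights equal to zero

x = np.array([[1.0, -2.0], [3.0, 4.0]])
out_identity = residual_block(x, zero, zero)
out_scaled = residual_block(x, half, half)
```

When the two layers contribute nothing, the block reduces to the identity, which is why stacking such blocks does not make a deeper network harder to train than a shallower one.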

The upsampling block, which is used to enlarge the image size and decode the convolutional process, includes a transposed convolutional layer and an activation function. We used a transposed convolutional layer here because it can be seen as the decoding layer of a convolutional layer, or as the projection of the feature maps to a higher-dimensional space [36]. The transposed convolutional operator is similar to the convolutional operator, except that it first enlarges the image coming from the previous block by a factor equal to the stride value, by inserting zero-valued pixels between neighboring image pixels, and then performs a convolutional operation on the upsampled image [33]. Specifically, in the eHoloNet architecture, the stride of the transposed convolutional layer is 2 × 2, which means that it adds one zero pixel between two neighboring image pixels, doubling the image size. The kernel size in the transposed convolutional layers is 5 × 5 for the last 3 blocks and 3 × 3 in the other blocks. All the kernel values are initialized with random numbers from a truncated Gaussian distribution, and the biases are initialized as constants.
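The zero-insertion view of the transposed convolution can be illustrated as follows. This single-channel NumPy sketch is an assumption-laden simplification: the helper names are hypothetical, zero padding is used to keep the upsampled size, and the exact pixel alignment of a framework implementation may differ.

```python
import numpy as np

def zero_insert(image, stride=2):
    """Insert (stride - 1) zero pixels between neighbouring pixels."""
    h, w = image.shape
    up = np.zeros((h * stride, w * stride), dtype=image.dtype)
    up[::stride, ::stride] = image
    return up

def transposed_conv2d(image, kernel, stride=2):
    """Zero insertion followed by a zero-padded convolution (one channel)."""
    up = zero_insert(image, stride)
    P, Q = kernel.shape
    padded = np.pad(up, ((P // 2, P // 2), (Q // 2, Q // 2)))
    out = np.zeros(up.shape, dtype=float)
    for p in range(P):
        for q in range(Q):
            out += kernel[p, q] * padded[p:p + up.shape[0], q:q + up.shape[1]]
    return out

image = np.arange(9.0).reshape(3, 3)
upsampled = transposed_conv2d(image, np.ones((3, 3)))   # 3 x 3 -> 6 x 6
```

With a stride of 2 the output has twice the side length of the input, and the learned kernel determines how the inserted zeros are filled in from the neighboring pixels.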

The hologram image *I*_{n} with the size of 768 × 768 pixels was used as the input to the neural network. The image was first sent to a convolutional block and then processed by a pooling layer. The pooling layer downsampled the feature maps of the incoming hologram and passed them to the next convolutional block. This helps to significantly reduce the spatial dimension of the representation and the number of internal weights needed to reconstruct the holographic image. In the eHoloNet we used a max-pooling layer, which takes the maximum pixel value within a small region of 2 × 2 pixels, so as to reduce the image size by a factor of 2. Then a dropout operation was used to avoid overfitting [37]. The keep probability of dropout was set to 0.9. The combination of a convolutional block, a max-pooling layer and the dropout operation was repeated 3 times, reducing the image size from 768 × 768 to 96 × 96 pixels. Then the network was divided into 4 independent paths, each of which had a max-pooling layer [indicated by the downward arrows in Fig. 2(a)] to downsample the incoming image from the previous layer by a factor of ×1, ×2, ×4, and ×8, respectively, creating 4 independent data flow paths. The image size in each path then became 96 × 96, 48 × 48, 24 × 24, and 12 × 12 pixels, respectively. For each path, the output of the downsampling layer was then sent to a set of 4 identical residual blocks, which were followed by a set of upsampling blocks to resize the image to 96 × 96 pixels. Note that, since each upsampling layer only enlarges the image size by a factor of 2, the number of upsampling layers required differs from path to path. The feature maps of all 4 paths were then concatenated into one image, which was processed by 2 subsequent convolutional blocks. Finally, 3 upsampling blocks were used to restore the image size to 768 × 768 pixels.
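The image sizes quoted above follow from simple bookkeeping, which the short sketch below reproduces. The function name `eholonet_shapes` and its arguments are hypothetical; the sketch tracks only side lengths and says nothing about the number of feature maps.

```python
def eholonet_shapes(input_size=768, encoder_stages=3, path_factors=(1, 2, 4, 8)):
    """Track the image side length through the stages described in the text."""
    size = input_size
    for _ in range(encoder_stages):
        size //= 2                     # each 2 x 2 max-pooling halves the side
    trunk = size
    paths = []
    for factor in path_factors:
        path_size = trunk // factor    # extra downsampling in this path
        n_up, s = 0, path_size
        while s < trunk:               # each upsampling block doubles the side
            s *= 2
            n_up += 1
        paths.append((path_size, n_up))
    restored = trunk * 2 ** encoder_stages
    return trunk, paths, restored

trunk, paths, restored = eholonet_shapes()
# trunk -> 96, paths -> [(96, 0), (48, 1), (24, 2), (12, 3)], restored -> 768
```

Each tuple in `paths` gives the side length inside a path and the number of ×2 upsampling blocks needed to return to 96 × 96 pixels before concatenation.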

### 2.2.3. Network training

Now we discuss the way to train the network. We define the loss function as the mean square error (MSE):

$$\mathrm{MSE} = \frac{1}{W H N_1} \sum_{n=1}^{N_1} \sum_{u=1}^{W} \sum_{v=1}^{H} \left[\tilde{O}_n(u, v) - O_n(u, v)\right]^2, \qquad (6)$$

where *W* and *H* are the width and height of the reconstructed image, *N*_{1} = 3 is the batch size, *Õ*_{n}(*u*, *v*) is the image reconstructed from the *n*^{th} hologram *I*_{n}, and *O*_{n} is the corresponding ground-truth object image. Following the stochastic gradient descent (SGD) method [38], we randomly selected 3 images from the training dataset and calculated their MSE values with respect to the corresponding reconstructed images according to Eq. (6). We used the back-propagation algorithm [32] to propagate the error back into the network, and Adaptive Moment Estimation (Adam) [39] to optimize the weights. The number of training steps was 300,000 for both datasets, corresponding to about 90 epochs for the MNIST dataset and about 71 epochs for the resolution-target dataset. The learning rate was 10^{−4}.
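The batch loss of Eq. (6) and the step-to-epoch conversion can be sketched as follows. This is an illustrative calculation, not the training code: the random arrays are placeholders, and dividing by the full dataset sizes given in Section 3.1 (10000 and 12651 images) is an assumption that happens to reproduce the quoted epoch counts.

```python
import numpy as np

def batch_mse(reconstructed, ground_truth):
    """MSE over a batch of shape (N1, W, H), following Eq. (6)."""
    n1, w, h = ground_truth.shape
    return np.sum((reconstructed - ground_truth) ** 2) / (w * h * n1)

rng = np.random.default_rng(0)
truth = rng.random((3, 16, 16))     # batch of N1 = 3 placeholder images
recon = truth + 0.1                 # uniform error of 0.1 in every pixel
loss = batch_mse(recon, truth)      # approximately 0.1 ** 2

steps, batch_size = 300_000, 3
epochs_mnist = steps * batch_size / 10_000   # assumed divisor: full MNIST set
epochs_usaf = steps * batch_size / 12_651    # assumed divisor: full chart set
```

Under this assumption the two ratios come out at 90 and roughly 71, in line with the figures stated above.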

## 3. Results and discussion

#### 3.1. Data preparation

We employed the MNIST dataset [40] and a digital image of the USAF 1951 resolution chart as the training and test data in our proof-of-concept study. The handcrafted MNIST dataset we used contains 10000 images, which were obtained by resizing the original 28 × 28 images to 768 × 768. The resolution chart dataset contains 12,651 images, which were obtained by first cutting 575 different parts out of the digital USAF image and then rotating, resizing, and flipping them. In Fig. 3, we show some examples of the images projected on the SLM. All these images were displayed sequentially on the phase-only SLM, and the corresponding holograms were acquired by the sCMOS camera (2048 × 2048 pixels). We then cropped the central 768 × 768 pixels of the captured holograms to form the holographic images input to the neural network. Each dataset was divided into two disjoint subsets: the training set and the test set. For MNIST, we used 9000 images together with the corresponding holograms to train the eHoloNet. The other 1000 images were used to test the performance of the trained network. For the resolution chart dataset, the training set contained 11623 images together with the corresponding holograms, whereas the test set contained the other 1028 randomly selected images. It took about 16 hours to capture all these 22,651 holograms.
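The cropping and splitting steps described above can be sketched as below. The helper names, the random seed, and the use of a random permutation for the split are assumptions for illustration; only the frame size, crop size, and set sizes are taken from the text.

```python
import numpy as np

def center_crop(frame, size=768):
    """Cut the central size x size region out of a camera frame."""
    h, w = frame.shape
    top, left = (h - size) // 2, (w - size) // 2
    return frame[top:top + size, left:left + size]

def split_indices(n_total, n_test, seed=0):
    """Randomly split sample indices into disjoint training and test sets."""
    order = np.random.default_rng(seed).permutation(n_total)
    return order[n_test:], order[:n_test]    # (train, test)

frame = np.zeros((2048, 2048), dtype=np.uint16)   # placeholder sCMOS frame
crop = center_crop(frame)                         # 768 x 768 network input
train_idx, test_idx = split_indices(12_651, 1_028)
```

For the resolution chart dataset this yields 11623 training indices and 1028 test indices with no overlap, matching the split sizes reported above.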

#### 3.2. Experimental results

We implemented the eHoloNet using TensorFlow on a GPU workstation with a Quadro P6000 graphics card (NVIDIA). It took about 10 hours to optimize the network model ℛ_{learn} described by Eq. (4) for each of the datasets. Once the network model had been trained, we used it to reconstruct the test holograms. The results are plotted in Fig. 4. Figures 4(b) and 4(e) show the images reconstructed from the holograms [Fig. 4(a) and 4(d)] of the objects in the test dataset [Fig. 4(c) and 4(f)]. From the results shown in Fig. 4, one can see that the images are successfully reconstructed from the single-shot in-line holograms, with the elimination of all the unwanted zero-order and twin-image terms.

#### 3.3. Discussion

### 3.3.1. Accuracy

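To evaluate the accuracy of the eHoloNet architecture, we used two different quantitative metrics to assess our reconstructed images, namely, the root of mean square error (RMSE)

$$\mathrm{RMSE} = \sqrt{\frac{1}{W H} \sum_{u=1}^{W} \sum_{v=1}^{H} \left[\tilde{O}(u, v) - O(u, v)\right]^2}, \qquad (7)$$

and the structural similarity index (SSIM) [41]

$$\mathrm{SSIM} = \frac{\left(2 \mu_{O} \mu_{\tilde{O}} + c_1\right)\left(2 \sigma_{\tilde{O}O} + c_2\right)}{\left(\mu_{O}^2 + \mu_{\tilde{O}}^2 + c_1\right)\left(\sigma_{O}^2 + \sigma_{\tilde{O}}^2 + c_2\right)}, \qquad (8)$$

where *µ*_{f}, *f* ∈ {*O*, *Õ*}, is the mean of the image *f*, ${\sigma}_{f}^{2}$ is its variance, *σ*_{ÕO} is the covariance of *Õ* and *O*, and *c*_{1} and *c*_{2} are regularization parameters.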

The RMSE is an average measure of the difference between two variables, and its best possible score is 0, whereas the SSIM is used for measuring the structural similarity between two images and the best value is 1. We calculated the RMSE and the SSIM averaging over all the test images as the metric values. For the MNIST data in our experiments, RMSE = 21, and SSIM = 0.92. As for the USAF chart, RMSE = 6, and SSIM = 0.99. The average deviation of the reconstructed phase with respect to the ground truth one is about 0.16 *π* for the MNIST data and 0.05 *π* for the USAF chart.
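Both metrics can be computed in a few lines. The sketch below evaluates the SSIM globally over the whole image, as the definitions above suggest; the 8-bit dynamic range and the constants `k1 = 0.01` and `k2 = 0.03` are the defaults commonly taken from Ref. [41] and are assumptions here, since the values used in this work are not stated.

```python
import numpy as np

def rmse(recon, truth):
    """Root of the mean square error between two images."""
    return np.sqrt(np.mean((recon - truth) ** 2))

def global_ssim(recon, truth, data_range=255.0, k1=0.01, k2=0.03):
    """SSIM from global image statistics (one window covering the image)."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2   # assumed values
    mu_r, mu_t = recon.mean(), truth.mean()
    var_r, var_t = recon.var(), truth.var()
    cov = np.mean((recon - mu_r) * (truth - mu_t))
    return ((2 * mu_r * mu_t + c1) * (2 * cov + c2)
            / ((mu_r ** 2 + mu_t ** 2 + c1) * (var_r + var_t + c2)))

rng = np.random.default_rng(0)
truth = rng.integers(0, 256, size=(32, 32)).astype(float)
shifted = truth + 3.0                      # reconstruction with a constant offset

perfect_scores = (rmse(truth, truth), global_ssim(truth, truth))
offset_scores = (rmse(shifted, truth), global_ssim(shifted, truth))
```

A perfect reconstruction gives an RMSE of 0 and an SSIM of 1, while even a constant offset raises the RMSE and pulls the SSIM below 1.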

### 3.3.2. Robustness

In conventional reconstruction methods, we usually need to precisely control some parameters such as the optical path difference between the object beam and the reference beam, and the wavefront shape of the reference beam. For instance, phase-shift holography usually needs to precisely change the phase retardation of the reference beam and has high requirement for the reference beam as a plane wave. In our case, however, it took us about 16 hours to capture all holograms; the optical system inevitably changed during the course of hologram acquisition. As a consequence, the holograms we used to train the neural network were actually captured under different conditions. But the eHoloNet is able to learn all these features from the training set, as seen in Fig. 5.

Figure 5(a) shows examples of holograms we randomly selected from the training set, presenting the shift of the fringe patterns owing to the change of phase difference between the object and the reference beams. However, since the neural network learns all these features, it can be used to reconstruct the object wavefront [Fig. 5(c)] from the holograms [Fig. 5(b)] of the test images recorded with different phase retardation in the reference beam. One can see that all the zero-order and twin image terms have been successfully removed comparing the reconstructed images with the original ones shown in Fig. 5(d).

To test the maximum optical path difference the eHoloNet can tolerate, we used a piezo phase shifter (S303.CD, Physik Instrumente) to introduce a phase retardation of *π* in the reference beam, and captured the holograms of the images in the test set. Then we used the eHoloNet to reconstruct these holograms. The result is plotted in Fig. 6. Figure 6(c) and Fig. 6(e) suggest that one can successfully reconstruct the image from the single-shot in-line hologram even when the phase difference has been changed by up to *π* with respect to the ideal configuration used for the acquisition of the holograms in the training set. Note that the maximum change of the effective phase difference between the object beam and the reference beam is *π*. This result suggests that the eHoloNet learns the features that a change of phase retardation induces in the interference pattern.
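The effect of a phase retardation in the reference beam on the interference pattern can be illustrated numerically. This is a toy model, not the experimental data: the random phase field `O` is a hypothetical stand-in for the object wave at the camera, and the reference is assumed to be a constant of unit amplitude.

```python
import numpy as np

rng = np.random.default_rng(0)
O = np.exp(1j * np.pi * rng.random((64, 64)))   # stand-in object wave at camera
C = 1.0                                          # assumed reference amplitude

def hologram(O, C, retardation):
    """In-line hologram with an extra phase retardation in the reference."""
    return np.abs(C * np.exp(1j * retardation) + O) ** 2

I_0 = hologram(O, C, 0.0)
I_pi = hologram(O, C, np.pi)
I_2pi = hologram(O, C, 2 * np.pi)

background = abs(C) ** 2 + np.abs(O) ** 2        # zero-order terms
interference_0 = I_0 - background
interference_pi = I_pi - background              # fringes with inverted contrast
```

A retardation of *π* inverts the sign of the interference terms, whereas a retardation of 2*π* reproduces the original hologram, which is why *π* is the largest effective change of the phase difference.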

Now let us examine how the shape of the reference beam affects the reconstruction. In particular, we compare it with the phase shifting algorithm. It is well known that phase-shifting holography usually requires the reference beam to be a plane wave [1]. Otherwise, the phase residue will yield error in the evaluation of the object wavefront, as shown in Fig. 6(f) and Fig. 6(g). However, the experimental results shown in Fig. 6(c) and Fig. 6(e) suggest that the eHoloNet does not have such a strict requirement as the network can learn the effect of such phase residues as well.

## 4. Conclusion

In conclusion, we have proposed the eHoloNet, a learning-based, one-step, end-to-end approach for in-line holographic reconstruction. By using the eHoloNet, one can directly reconstruct the object wavefront from a single-shot in-line hologram, with the removal of the unwanted zero-order and twin-image terms. The reconstruction is a one-step process, i.e., an end-to-end mapping of the hologram directly to the object wavefront, without the need for numerical back-propagation (reverse diffraction) of the optical field. The eHoloNet can learn the features that the hologram possesses due to changes of the phase retardation between the two arms. It is thus highly robust against deviations of the phase difference between the two arms in the test holograms with respect to the training holograms.

## Funding

Key Research Program of Frontier Sciences, Chinese Academy of Sciences (QYZDB-SSW-JSC002); the National Natural Science Foundation of China (61705241, 61327902); the Natural Science Foundation of Shanghai (No. 17ZR1433800).

## References and links

**1.** U. Schnars and W. P. O. Jüptner, “Digital recording and numerical reconstruction of holograms,” Meas. Sci. Technol. **13**, R85–R101 (2002).

**2.** L. Xu, X. Peng, Z. Guo, J. Miao, and A. Asundi, “Imaging analysis of digital holography,” Opt. Express **13**, 2444–2452 (2005).

**3.** I. Yamaguchi and T. Zhang, “Phase-shifting digital holography,” Opt. Lett. **22**, 1268–1270 (1997).

**4.** X. F. Meng, L. Z. Cai, X. F. Xu, X. L. Yang, X. X. Shen, G. Y. Dong, and Y. R. Wang, “Two-step phase-shifting interferometry and its application in image encryption,” Opt. Lett. **31**, 1414–1416 (2006).

**5.** J.-P. Liu and T.-C. Poon, “Two-step-only quadrature phase-shifting digital holography,” Opt. Lett. **34**, 250–252 (2009).

**6.** Y. Zhang and X. Zhang, “Reconstruction of a complex object from two in-line holograms,” Opt. Express **11**, 572–578 (2003).

**7.** Y. Zhang, G. Pedrini, W. Osten, and H. J. Tiziani, “Reconstruction of in-line digital holograms from two intensity measurements,” Opt. Lett. **29**, 1787–1789 (2004).

**8.** G. Situ, J. P. Ryle, U. Gopinathan, and J. T. Sheridan, “Generalized in-line digital holographic technique based on intensity measurements at two different planes,” Appl. Opt. **47**, 711–717 (2008).

**9.** Y. Awatsuji, T. Tahara, A. Kaneko, T. Koyama, K. Nishio, S. Ura, T. Kubota, and O. Matoba, “Parallel two-step phase-shifting digital holography,” Appl. Opt. **47**, D183–D189 (2008).

**10.** J. J. Barton, “Removing multiple scattering and twin images from holographic images,” Phys. Rev. Lett. **67**, 3106–3109 (1991).

**11.** G. Liu and P. D. Scott, “Phase retrieval and twin-image elimination for in-line Fresnel holograms,” J. Opt. Soc. Am. A **4**, 159–165 (1987).

**12.** T. Latychevskaia and H.-W. Fink, “Solution to the twin image problem in holography,” Phys. Rev. Lett. **98**, 233901 (2007).

**13.** T. Latychevskaia and H.-W. Fink, “Simultaneous reconstruction of phase and amplitude contrast from a single holographic record,” Opt. Express **17**, 10697–10705 (2009).

**14.** L. Rong, Y. Li, S. Liu, W. Xiao, F. Pan, and D. Wang, “Iterative solution to twin image problem in in-line digital holography,” Opt. Lasers Eng. **51**, 553–559 (2013).

**15.** L. Onural and P. D. Scott, “Digital decoding of in-line holograms,” Opt. Eng. **26**, 1124–1132 (1987).

**16.** Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature **521**, 436–444 (2015).

**17.** R. Horisaki, R. Takagi, and J. Tanida, “Learning-based imaging through scattering media,” Opt. Express **24**, 13738–13743 (2016).

**18.** M. Lyu, H. Wang, G. Li, and G. Situ, “Exploit imaging through opaque wall via deep learning,” arXiv preprint arXiv:1708.07881 (2017).

**19.** M. Lyu, W. Wang, H. Wang, H. Wang, G. Li, N. Chen, and G. Situ, “Deep-learning-based ghost imaging,” Sci. Rep. **7**, 17865 (2017).

**20.** A. Sinha, J. Lee, S. Li, and G. Barbastathis, “Lensless computational imaging through deep learning,” Optica **4**, 1117–1125 (2017).

**21.** X. Yuan and Y. Pu, “Parallel lensless compressive imaging via deep convolutional neural networks,” Opt. Express **26**, 1962–1977 (2018).

**22.** S. Jiang, K. Guo, J. Liao, and G. Zheng, “Solving Fourier ptychographic imaging problems via neural network modeling and TensorFlow,” Biomed. Opt. Express **9**, 3306–3319 (2018).

**23.** Y. Rivenson, Y. Zhang, H. Günaydin, D. Teng, and A. Ozcan, “Phase recovery and holographic image reconstruction using deep learning in neural networks,” Light Sci. Appl. **7**, 17141 (2018).

**24.** T. Pitkäaho, A. Manninen, and T. J. Naughton, “Performance of autofocus capability of deep convolutional neural networks in digital holographic microscopy,” in *Digital Holography and Three-Dimensional Imaging*, OSA Technical Digest (online) (Optical Society of America, 2017), paper W2A.5.

**25.** Z. Ren, Z. Xu, and E. Y. Lam, “Learning-based nonparametric autofocusing for digital holography,” Optica **5**, 337–344 (2018).

**26.** Y. Wu, Y. Rivenson, Y. Zhang, Z. Wei, H. Günaydin, X. Lin, and A. Ozcan, “Extended depth-of-field in holographic imaging using deep-learning-based autofocusing and phase recovery,” Optica **5**, 704–710 (2018).

**27.** T. Nguyen, V. Bui, V. Lam, C. B. Raub, L.-C. Chang, and G. Nehmetallah, “Automatic phase aberration compensation for digital holographic microscopy based on deep learning background detection,” Opt. Express **25**, 15043–15057 (2017).

**28.** M. Lyu, C. Yuan, D. Li, and G. Situ, “Fast autofocusing in digital holography using the magnitude differential,” Appl. Opt. **56**, F152–F157 (2017).

**29.** H. Wang, M. Lyu, N. Chen, and G. Situ, “In-line hologram reconstruction with deep learning,” in *Image and Applied Optics*, OSA Technical Digest (Optical Society of America, 2018), paper DW2F.2.

**30.** K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process. **26**, 4509–4522 (2017).

**31.** M. T. McCann, K. H. Jin, and M. Unser, “Convolutional neural networks for inverse problems in imaging: A review,” IEEE Signal Process. Mag. **34**, 85–95 (2017).

**32.** D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature **323**, 533–536 (1986).

**33.** J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, and T. Chen, “Recent advances in convolutional neural networks,” Pattern Recognit. **77**, 354–377 (2018).

**34.** O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” arXiv preprint arXiv:1505.04597 (2015).

**35.** K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv preprint arXiv:1512.03385 (2015).

**36.** V. Dumoulin and F. Visin, “A guide to convolution arithmetic for deep learning,” arXiv preprint arXiv:1603.07285 (2016).

**37.** N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res. **15**, 1929–1958 (2014).

**38.** T. S. Ferguson, “An inconsistent maximum likelihood estimate,” J. Am. Stat. Assoc. **77**, 831–834 (1982).

**39.** D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).

**40.** Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE **86**, 2278–2324 (1998).

**41.** Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process. **13**, 600–612 (2004).