Fast phase retrieval in off-axis digital holographic microscopy through deep learning

Open Access

Abstract

Traditional digital holographic imaging algorithms need multiple iterations to obtain a focused reconstructed image, which is time-consuming. For phase retrieval, there is also the problem of phase compensation in addition to the focusing task. Here, a new method is proposed for fast digital focusing, in which a U-type convolutional neural network (U-net) is used to recover the original phase of microscopic samples. Generated data sets are used to simulate different degrees of defocus, and we verify that the U-net can restore the original phase to a great extent while realizing phase compensation at the same time. We apply this method to the construction of a real-time off-axis digital holographic microscope and obtain a substantial improvement in imaging speed.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Digital Holographic Microscopy (DHM) is a non-contact, highly sensitive and high-resolution coherent imaging technique, which is widely used in many fields such as cell imaging [1–5], microfluidics [6] and metrology [7,8]. A DHM often uses a microscope objective to record a digital hologram and uses propagation techniques to recover the raw image information [9]. An indispensable step in the traditional reconstruction algorithm is to gradually adjust the separated objective image from the defocused state to the focused state. In recent years, many methods have been proposed for the problem of digital holographic autofocus. These methods are usually based on the gradient operator [10] or image entropy [11] from image-processing theory, or on the Tamura coefficient [12]. Other focusing methods, such as [13], use dual holograms obtained through a specially designed reverse-symmetric optical path to obtain the focused objective image. The core idea of such methods can be expressed as follows: 1) Find a discriminant indicator that represents the sharpness of an objective image O(x,y;z), mapping O(x,y;z) to a value g(z). 2) Traverse the possible region (dz1, dz2, dz3, …) around the focal point, generate a series of objective images (I1, I2, I3, …), and calculate the corresponding sharpness indices (g1, g2, g3, …). 3) Return the image I that maximizes g. This idea can be expressed by the operator F in Eq. (1):

$$F(dz,O):\quad C(x,y;z+dz) = \arg\max_{dz}\, G\big(f(dz, O(x,y;z))\big). \tag{1}$$

However, there exist many problems with these iteration-based approaches. The biggest drawback is the time consumption. First, let us break down the time cost of such schemes: (a) the time cost T1 of using propagation techniques to generate the objective image O(x,y;z+dz) after diffraction over a certain distance; (b) the time cost T2 of calculating the image sharpness index g; (c) the time cost T3 of searching for the global maximum of g. Assuming that the number of iterations is N, the time cost Tf of a single focusing process equals N(T1+T2+T3). In (a), the commonly used propagation techniques include a) the Fresnel transform, b) convolution, c) the angular spectrum method and d) the point spread function (PSF). Their specific calculation formulas are given in Eqs. (2)–(5):

Fresnel transform (dz, O):

$$C(x,y;z+dz) = \frac{e^{jkdz}}{j\lambda dz}\, e^{\frac{jk(x^2+y^2)}{2dz}} \times \mathrm{fft}\!\left\{ O(x,y;z)\, e^{\frac{jk(x^2+y^2)}{2dz}} \right\}. \tag{2}$$

Convolution (dz, O):

$$C(x,y;z+dz) = \mathrm{fft}^{-1}\!\left\{ \mathrm{fft}\{O(x,y;z)\} \times \mathrm{fft}\!\left\{ \frac{e^{jkdz}}{j\lambda dz}\, e^{\frac{jk(x^2+y^2)}{2dz}} \right\} \right\}. \tag{3}$$

Angular spectrum (dz, O):

$$C(x,y;z+dz) = \mathrm{fft}^{-1}\!\left\{ \mathrm{fft}\{O(x,y;z)\}\, e^{jkdz\sqrt{1-(\lambda f_x)^2-(\lambda f_y)^2}} \right\}. \tag{4}$$

PSF (dz, O):

$$C(x,y;z+dz) = O(x,y;z) \otimes \frac{dz}{i\lambda}\, \frac{e^{ik\sqrt{dz^2+x^2+y^2}}}{dz^2+x^2+y^2}. \tag{5}$$

We take the first solution, which has the lowest time cost, as our computing standard, since it needs only one FFT per distance (the FFT of O(x,y;z) is not counted, for it does not change with dz). In (b), G does not show a single-peak characteristic in many cases, and considering the effect of noise, the whole algorithm may fall into a local optimum [14]. More elaborate definitions of G also introduce additional computational cost. In (c), the hill-climbing algorithm can be used to accelerate the global search [15], but in practice an exhaustive global search is more commonly used. For these reasons, the real-time problem of traditional algorithms is difficult to solve. Recently, deep learning has been used to predict the distance from focus and has achieved good results [16]. In this paper, we explore an end-to-end approach that recovers the raw objective image directly from the defocused one within a given diffraction range.
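To make the cost N(T1 + T2 + T3) concrete, the sketch below implements the search of Eq. (1) with angular-spectrum propagation (Eq. (4)) and the Tamura coefficient [12] as the sharpness index G. The function names and sampling parameters are ours, not the authors' implementation; this is a minimal illustration, not the exact scheme benchmarked later.

```python
import numpy as np

def angular_spectrum(field, dz, wavelength, dx):
    """Propagate a complex field over dz with the angular-spectrum method (Eq. (4))."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=dx)
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    k = 2 * np.pi / wavelength
    root = np.sqrt(np.maximum(1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2, 0.0))
    return np.fft.ifft2(np.fft.fft2(field) * np.exp(1j * k * dz * root))

def tamura(amplitude):
    """Tamura coefficient sqrt(std/mean), used here as the sharpness index g."""
    m = amplitude.mean()
    return np.sqrt(amplitude.std() / m) if m > 0 else 0.0

def iterative_autofocus(field, wavelength, dx, dz_min, dz_max, n_steps=400):
    """Traditional search: one propagation (T1) and one metric evaluation (T2) per candidate dz."""
    best_dz, best_g = dz_min, -np.inf
    for dz in np.linspace(dz_min, dz_max, n_steps):
        g = tamura(np.abs(angular_spectrum(field, dz, wavelength, dx)))
        if g > best_g:
            best_g, best_dz = g, dz
    return best_dz, angular_spectrum(field, best_dz, wavelength, dx)
```

Each candidate distance costs one FFT-based propagation plus one metric evaluation, which is exactly the per-iteration cost analyzed above; the U-net proposed in this paper replaces the whole loop with a single forward pass.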

Since AlexNet won the 2012 ImageNet image classification competition [17], deep convolutional neural networks (CNNs) have become the core algorithmic model in image classification. The core idea of a CNN is to use convolution operations to extract image features automatically, which then serve as the input of a classifier such as a support vector machine (SVM) for target detection and classification tasks. Jonathan Long proposed the fully convolutional network (FCN) architecture for image semantic segmentation in 2015 [18], and a large number of similar architectures followed, such as DeconvNet [19] or U-net [20] for medical image segmentation. These architectures can also be used for regression problems, for example the generative adversarial network (GAN) [21] or SRCNN [22] for super-resolution reconstruction. Such end-to-end methods are also widely used in various tasks of optical detection, imaging and disease diagnosis. Deep learning improves optical microscopy by enhancing its spatial resolution over a large field of view and depth of field [23]. Venhuizen et al. [24] use U-net to automatically segment intraretinal cystoid fluid in SD-OCT. Nguyen et al. [25] use U-net to realize automatic phase aberration compensation. In lensless computational imaging, a CNN has been used to learn the law of light-field propagation, and it was shown that the original phase field can be restored from an intensity field that has propagated a certain distance under certain conditions [26]. In other works [27–30], researchers have also made progress in learning from diffraction patterns using deep learning.

Here, a modified U-net is proposed with which the objective image of phase-type samples in DHM can be quickly focused. First, the U-net is tested on data sets without an additional background phase. The main purpose of this experiment is to test whether this end-to-end architecture can recover phase-type objective images submerged in the interference fringes. Second, additional phases with different distributions are added to our data set to test whether the network can perceive and segment the region of interest in such an environment. Finally, the trained U-net is used to restore the phase image of a real sample obtained by our off-axis DHM, and the imaging speed of our DHM reaches 9~10 fps.

2. Principles and methods

The overall idea of our experiment is as follows: (1) Use an off-axis holographic path to obtain the hologram of the sample; separating the objective image from the hologram is faster than in the in-line (coaxial) configuration. (2) In the digital focusing part, use the U-net instead of the iterative focusing process. The input of the U-net is the intensity and phase of the defocused objective image, and the output of the network is the focused phase image. We generate the corresponding defocused images from a series of high-definition focused images and use them as training sets, expecting the network to quickly recover the defocused phase within a certain diffraction range. (3) Considering the phase wrapping caused by the additional background phase, random background interference is added to the training set to test whether our network can recover the real object phase.

2.1 Construction of off-axis DHM

Our experimental platform is set up as shown in Fig. 1. The hologram of the object is obtained by off-axis DHM, and the phase images are recovered from it through a filtering process and the deep neural network. The working wavelength of λ = 830.0 nm is attained by filtering a superluminescent diode (SLD, Inphenix, center wavelength 844 nm, 3 dB bandwidth 50.8 nm, 8 mW) with a 3 nm laser line filter (Semrock LLF, LL01-830). Two optical delay lines (ODLs) adjust the optical path length difference between the reference and sample arms. The back focal plane of the objective lens (OL, Olympus, MPlan N 5×) is arranged to coincide with the front focal plane of the tube lens (TL, f = 125 mm), which cancels the spherical phase curvature. The combination of lens 1 and lens 2 adjusts the beam width of the reference arm. Finally, the hologram of the sample is captured by a monochrome CCD (charge-coupled device, ZWO ASI174MM).

Fig. 1 Off-axis DHM system. SLD, superluminescent diode; SMF, single-mode fiber; CL, collimator; LLF, laser line filter; BS, beam splitter; ODL, optical delay line; OL, objective lens; TL, tube lens.

2.2 Construction of deep convolutional neural network

Our deep convolutional neural network uses a U-net architectural design, which is widely used in many fields of medical image segmentation [31–33]. In general, the symmetric encoder-decoder structure is mostly used in classification tasks, but it also performs well in many regression tasks, such as image super-resolution [22] and image generation [34]. The architecture of our U-net is shown in Fig. 2(a). The input layer X is the filtered defocused objective image obtained from the hologram. Because the objective image is a complex amplitude field, its intensity |O(x,y;z)| and phase arg(O(x,y;z)) fully express it. In the first half of the network architecture, the pattern of convolution + downsampling is repeated, and the second half pairs deconvolution + upsampling. The neural network uses this scheme to decompose and reconstruct the structure of the image: the required features are retained, while structures unrelated to the target task are discarded. Batch normalization [35] is used to accelerate the convergence of the entire network. The U-net uses skip connections to record and pass part of the spatial information, especially the high-frequency information learned in the encoder layers, to the corresponding decoder layer with the same image size [20]. Several fully convolutional layers are placed in the last part of the network to let it learn more details, so that the final output reconstruction can be fine-tuned; the number of fine-tuning layers can be set to 2 to 7, depending on the diffraction distance. The value of the last layer estimates the object phase Y. In the second row of Fig. 2(b), the phase map is mapped by sines and cosines, which eliminates the misjudgment of the neural network caused by the pixel-value discontinuities introduced by phase wrapping. We use this strategy for the training set that includes additional phases.
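For concreteness, a minimal tf.keras sketch of such an encoder-decoder is given below. The layer counts, filter numbers and kernel sizes are illustrative rather than the exact values of Fig. 2(a); the input has two channels (intensity and phase), or three when the sin/cos mapping of Fig. 2(b) is used.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    """Two 3x3 convolutions, each followed by batch normalization and ReLU."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x

def build_unet(input_channels=2, base_filters=32, n_finetune_layers=3):
    """Encoder-decoder with skip connections and a few trailing fine-tuning convolutions."""
    inputs = layers.Input(shape=(512, 512, input_channels))
    skips, x = [], inputs
    # Encoder: repeated convolution + downsampling; skip tensors are kept for the decoder.
    for depth in range(4):
        x = conv_block(x, base_filters * 2 ** depth)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, base_filters * 16)
    # Decoder: upsampling + convolution, concatenating the skip tensor of the same size.
    for depth in reversed(range(4)):
        x = layers.Conv2DTranspose(base_filters * 2 ** depth, 2, strides=2, padding="same")(x)
        x = layers.concatenate([x, skips[depth]])
        x = conv_block(x, base_filters * 2 ** depth)
    # Trailing fully convolutional layers used for fine tuning (2 to 7 in the text above).
    for _ in range(n_finetune_layers):
        x = layers.Conv2D(base_filters, 3, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(1, 1, padding="same")(x)   # estimated focused phase Y
    return Model(inputs, outputs)
```

Calling build_unet(input_channels=3) gives the variant that accepts the intensity, sin(phase) and cos(phase) channels of Fig. 2(b).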

Fig. 2 (a) The architecture of U-net, indicating the number of layers, nodes in each layer, etc. The input is the intensity and phase information of the image, the size of which is 512×512. The output is the reconstructed phase of the objective image. (b) Two scenarios for network input. The first row represents the input of intensity and phase diagrams; the second row represents the input of intensity, sin(phase) and cos(phase) diagrams.

2.3 Generation of the training set

A generated data set is used to simulate the real images collected by the CCD, for the following reasons: (1) To collect enough real data, different samples would have to be repeatedly focused, which is not conducive to the practical development of the algorithm. (2) By using propagation techniques, samples can be generated at any diffraction distance with a single corresponding true phase; a sufficiently large number of samples can be produced, and different additional phases can be added as needed. In these experiments, at least 10000 pairs of generated images (phase and intensity) were used for different distances as training sets. According to the specific application, the propagation techniques mentioned above can be applied in a coarse-focus experiment to determine the one most suitable for the system. In this series of experiments, Eqs. (2) and (5) are used.

Figure 3 shows the concrete process of generating training sets. As shown in Fig. 3(a), a random two-value (binary) pattern without texture is selected as the object contour, and a texture pattern is selected at random as the phase distribution. In Fig. 3(b), by mixing in random Gaussian noise and randomly adjusting the gray value of the object contour, the intensity map of the focused state is obtained. In Fig. 3(c), by superimposing the two patterns in Fig. 3(a), we obtain the simulated focused phase distribution. Figure 3(d) represents a random additional background phase, which simulates the phase difference caused by an optical lens. We superimpose Fig. 3(c) with Fig. 3(d), and together with Fig. 3(b) they form a complex amplitude field. Using the optical propagation techniques, as shown in Fig. 3(e), we can obtain the diffracted complex amplitude field at any distance. Figure 3(f) is the result of mapping the phase of Fig. 3(e) with trigonometric functions. Our two training sets are generated as follows: (1) without considering the effect of Fig. 3(d), we take Fig. 3(e) and Fig. 3(c) as the input and the corresponding ground truth of the neural network; (2) considering the influence of Fig. 3(d), we take Fig. 3(f) and Fig. 3(c) as the input and the corresponding ground truth.

Fig. 3 Process of generating training sets. (a) Object contour and texture pattern; (b) the generated focused intensity of the objective image; (c) the generated focused phase of the objective image; (d) additional background phase; (e) the intensity and phase of the defocused objective image; (f) the intensity, cos(phase) and sin(phase) of the defocused objective image.

In the first set of experiments, the data sets are generated at diffraction distances of 25±10 mm, 95±10 mm and 155±10 mm respectively, using Eq. (3); the main purpose is to investigate whether the deep neural network can recover the phase in different defocused states. In the second set of experiments, we add the additional background and set the defocusing range to ±10 μm specifically for our off-axis DHM (compared with general digital holography, the defocus control in microscopic imaging is more precise), use Eq. (5) to generate the training set near focus, and test whether the deep neural network can recover the phase of a real sample.
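A compact NumPy sketch of this generation pipeline is given below, using the single-FFT Fresnel transform of Eq. (2). The wavelength, pixel pitch, noise level and the low-order polynomial standing in for the background phase of Fig. 3(d) are illustrative assumptions rather than the exact settings used by the authors.

```python
import numpy as np

def fresnel_propagate(field, dz, wavelength, dx):
    """Single-FFT Fresnel transform of Eq. (2): one FFT per diffraction distance."""
    ny, nx = field.shape
    k = 2 * np.pi / wavelength
    y, x = np.mgrid[-ny // 2:ny // 2, -nx // 2:nx // 2] * dx
    chirp = np.exp(1j * k * (x ** 2 + y ** 2) / (2 * dz))
    pref = np.exp(1j * k * dz) / (1j * wavelength * dz)
    return pref * chirp * np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(field * chirp)))

def make_sample(contour, texture, dz, wavelength=830e-9, dx=5e-6, with_background=True):
    """Build one (network input, ground-truth phase) pair following Fig. 3(a)-(f)."""
    ny, nx = contour.shape
    intensity = contour * np.random.uniform(0.5, 1.0) + 0.02 * np.random.randn(ny, nx)  # Fig. 3(b)
    phase_true = contour * texture                                   # Fig. 3(c): label
    background = 0.0
    if with_background:                                              # Fig. 3(d): smooth lens-like phase
        yy, xx = np.mgrid[-1:1:1j * ny, -1:1:1j * nx]
        c = np.random.uniform(-3, 3, size=5)
        background = c[0]*xx + c[1]*yy + c[2]*xx*yy + c[3]*xx**2 + c[4]*yy**2
    field = np.sqrt(np.clip(intensity, 0, None)) * np.exp(1j * (phase_true + background))
    defocused = fresnel_propagate(field, dz, wavelength, dx)         # Fig. 3(e)
    amp, ph = np.abs(defocused), np.angle(defocused)
    x_in = np.stack([amp, np.sin(ph), np.cos(ph)], axis=-1)          # Fig. 3(f): network input
    return x_in, phase_true
```

For training set (1), with_background=False and the input channels are simply the amplitude and wrapped phase rather than the sin/cos mapping.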

2.4 Training methods for neural networks

The connection weights in our U-net are trained using backpropagation on our loss function, which is defined in Eq. (6):

$$L = \frac{1}{wh}\,(a_1 w_1 + a_2 w_2 + a_3)\,\|Y - G\|^2. \tag{6}$$

Here G represents the corresponding ground truth, and a1, a2 and a3 are three weights. In order to learn the reconstruction of the high-frequency information of the phase image, two weight matrices, w1 and w2, are introduced. Their definitions are given in Eqs. (7) and (8):

$$w_1 = \mathrm{mag}(X) = \left[\left(\frac{\partial X}{\partial x}\right)^2 + \left(\frac{\partial X}{\partial y}\right)^2\right]^{\frac{1}{2}} \otimes G_\sigma(x,y). \tag{7}$$
$$w_2 = \mathrm{mag}(G) = \left[\left(\frac{\partial G}{\partial x}\right)^2 + \left(\frac{\partial G}{\partial y}\right)^2\right]^{\frac{1}{2}} \otimes G_\sigma(x,y). \tag{8}$$

The gradient operator is used to extract the edges of the image, and a Gaussian convolution kernel is used to make the weight distribution smoother. The w1 matrix records the diffraction fringes after Fresnel-Fraunhofer diffraction, while the w2 matrix helps preserve the high-spatial-frequency information of the raw image, such as edges. These weights, distributed in the spatial domain, help the neural network capture the evolution of the image over a range of distances, which is conducive to fast convergence of the training. In the first part of the experiment, as the diffraction distance increases, the original phase distribution is covered by diffraction fringes; it is therefore necessary to introduce w1 and w2, which help recover a phase distribution without an obvious diffraction pattern while keeping the object structure intact. We set a1 to 0.3, a2 to 0.3 and a3 to 1. For near-field diffraction, the phase information is usually preserved relatively well, so w1 and w2 can be ignored and only the L2 distance between the network output and the ground truth is considered.
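The loss of Eqs. (6)–(8) could be written in TensorFlow roughly as follows. The Gaussian kernel size and σ are not specified in the paper, so the values below are placeholders, and the gradient magnitude is computed with simple finite differences.

```python
import numpy as np
import tensorflow as tf

def gaussian_kernel(size=5, sigma=1.5):
    """Gaussian kernel G_sigma reshaped for depthwise 2-D convolution."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kern = np.outer(g, g)
    kern /= kern.sum()
    return tf.constant(kern[:, :, None, None], dtype=tf.float32)

def grad_mag_smoothed(img, kernel):
    """Eqs. (7)/(8): finite-difference gradient magnitude blurred by G_sigma. img: [b, h, w, 1]."""
    dy, dx = tf.image.image_gradients(img)
    mag = tf.sqrt(dx ** 2 + dy ** 2 + 1e-8)
    return tf.nn.depthwise_conv2d(mag, kernel, strides=[1, 1, 1, 1], padding="SAME")

def weighted_l2_loss(y_pred, y_true, x_phase, a1=0.3, a2=0.3, a3=1.0):
    """Eq. (6): spatially weighted L2 distance between the output Y and the ground truth G."""
    kernel = gaussian_kernel()
    w1 = grad_mag_smoothed(x_phase, kernel)   # fringes of the defocused input X
    w2 = grad_mag_smoothed(y_true, kernel)    # edges of the ground-truth phase G
    weights = a1 * w1 + a2 * w2 + a3
    return tf.reduce_mean(weights * tf.square(y_pred - y_true))
```

Setting a1 = a2 = 0 recovers the plain L2 distance used for the near-focus data set.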

Gradient descent is used to optimize the network parameters. The learning rate follows an exponential-decay schedule; in this way, the network avoids the cases where the parameters cannot converge quickly, or fail to learn fine detail, because the initial learning rate is too large or too small. Training each network took 17 h using TensorFlow on our GPU after ~10 epochs (iterations of the backpropagation algorithm over all examples). The analysis of the trained U-net is provided in Section 3.
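A possible training-loop fragment, assuming the build_unet and weighted_l2_loss sketches above, is shown below; the decay constants of the learning-rate schedule are illustrative, and plain stochastic gradient descent is used as stated in the text.

```python
import tensorflow as tf

# Exponentially decaying learning rate; the initial value and decay constants are illustrative.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=2000, decay_rate=0.9)
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)
model = build_unet(input_channels=3)

@tf.function
def train_step(x, y_true, x_phase):
    """One backpropagation step on the weighted L2 loss of Eq. (6)."""
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)
        loss = weighted_l2_loss(y_pred, y_true, x_phase)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```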

3. Results and network analysis

3.1 No additional phase experiment

The first experiment (no additional background phase) shows that the U-net architecture does a good job of learning the law of light-field propagation. Here, we demonstrate the phase reconstruction results at different diffraction distances in Fig. 4. Column 4(a) shows the defocused phase and 4(b) the corresponding intensity image; 4(c) shows the focused phase map recovered by the trained network, and 4(d) the corresponding ground truth.

Fig. 4 Reconstruction results of trained U-net. The baseline distance on which the network was trained is 15 mm to 35 mm, and the stride is 0.05 mm.

The reconstruction results at different diffraction distances are compared in Fig. 5. Figures 5(a), 5(b) and 5(c) show reconstruction results for the three diffraction distances dz = 25±10 mm, 95±10 mm and 155±10 mm. The results show that it becomes more difficult to recover the original phase as the diffraction distance increases. Especially in Fig. 5(c), although the overall contour can be distinguished in the reconstructed image and most of the diffraction fringes are eliminated, the details are seriously damaged.

Fig. 5 Reconstruction results of our trained U-net under three different generated data sets. The baseline distance on which the network was trained is (a) z1 = 25 ± 10 mm, (b) z2 = 95 ± 10 mm, and (c) z3 = 155 ± 10 mm, respectively.

Three mean absolute error (MAE) curves are plotted in Fig. 6 to show how the reconstruction error changes with distance. In Fig. 6(a), the MAE is approximately 0.036π, while these values reach 0.105π and 0.095π in Figs. 6(b) and 6(c). The result also indicates that when the diffraction pattern completely submerges the original image, the reconstruction quality of the neural network deteriorates. In general, the appropriate strategy is to adjust the optical path and estimate the approximate diffraction distance dz: propagation techniques are used to realize coarse focusing, and the neural network is used for rapid fine focusing.

Fig. 6 Quantitative analysis of the sensitivity of the trained U-net to three different generated data sets. The baseline distance on which the network was trained is (a) z1 = 25 ± 10 mm, (b) z2 = 95 ± 10 mm, and (c) z3 = 155 ± 10 mm, respectively.

Our MAE curves also contain an interesting pattern: the whole curve oscillates, and the oscillation frequency gradually decreases. In fact, this phenomenon is associated with the samples used to train the model. All of our samples are generated by Eq. (2); therefore, the variation of the MAE is related to the propagation law. Since the actual meaning of the internal parameters of the neural network is not known, only a qualitative analysis of this phenomenon is given here. Let the original complex amplitude field be A0(x,y,z) in Eq. (9), and let B0(x,y,z+dz) in Eq. (10) be the light field after propagation over a distance dz. In this case, the mapping from A0 to B0 is a convolution with a kernel h(x,y,dz) associated with the diffraction distance dz, defined in Eq. (11). Our MAE can then be represented by Eq. (12):

$$A_0(x,y,z) = I(x,y,z)\, e^{i\psi(x,y,z)}. \tag{9}$$
$$B_0(x,y,z+dz) = A_0(x,y,z) \otimes h(x,y,dz) = \frac{e^{ikdz}}{i\lambda dz}\, e^{\frac{ik(x^2+y^2)}{2dz}}\, \mathrm{fft}\!\left\{ A_0(x_0,y_0,z)\, e^{\frac{ik(x_0^2+y_0^2)}{2dz}} \right\}. \tag{10}$$
$$h(x,y,dz) = \frac{e^{ikdz}}{i\lambda dz}\, e^{\frac{ik(x^2+y^2)}{2dz}}. \tag{11}$$
$$\mathrm{MAE} = \frac{1}{wh}\sum_{x}\sum_{y}\Big|\arg\!\big(B_0 \otimes h(x,y,-dz)\big) - H_{\mathrm{U\text{-}net}}\big(\arg(B_0),\,\mathrm{abs}(B_0)\big)(x,y)\Big| = \frac{1}{wh}\sum_{x}\sum_{y}\Big|\psi(x,y,z) - H_{\mathrm{U\text{-}net}}\big(\arg(A_0 \otimes h(x,y,dz)),\,\mathrm{abs}(A_0 \otimes h(x,y,dz))\big)(x,y)\Big|. \tag{12}$$

H_U-net represents the overall mapping performed by the trained U-net, and the pair (arg(·), abs(·)) represents the input layer: the phase matrix and the intensity matrix of the defocused field. Notice that H_U-net has no parameter dz, so the cyclical changes of the MAE are constrained by h(x,y,dz) in the process of light-field propagation. Since the network performs a many-to-one mapping over a certain reconstruction range, the intensity dominates the final reconstruction (the phase changes are more cyclical than the intensity, whereas the intensity changes are more stable). When dz is large enough, the approximation in Eq. (13) holds:

$$\mathrm{abs}\big(A_0(x,y,z)\otimes h(x,y,dz)\big) \approx \mathrm{abs}\big(A_0(x,y,z)\otimes h(x,y,dz+dt)\big). \tag{13}$$
and, considering the 2π periodicity of the phase, this can be simplified as in Eq. (14):

$$\mathrm{abs}\!\left(\frac{e^{ikdz}}{i\lambda dz}\, e^{\frac{ik(x^2+y^2)}{2dz}}\,\mathrm{fft}\!\left\{A_0(x_0,y_0,z)\, e^{\frac{ik(x_0^2+y_0^2)}{2dz}}\right\}\right) \approx \mathrm{abs}\!\left(\frac{e^{ik(dz+dt)}}{i\lambda(dz+dt)}\, e^{\frac{ik(x^2+y^2)}{2(dz+dt)}}\,\mathrm{fft}\!\left\{A_0(x_0,y_0,z)\, e^{\frac{ik(x_0^2+y_0^2)}{2(dz+dt)}}\right\}\right) \;\Rightarrow\; e^{\frac{ik(x_0^2+y_0^2)}{2dz}} \approx e^{\frac{ik(x_0^2+y_0^2)}{2(dz+dt)}} \;\Rightarrow\; \frac{k(x_0^2+y_0^2)}{2dz} - 2\pi \approx \frac{k(x_0^2+y_0^2)}{2(dz+dt)}. \tag{14}$$

Based on the current distance, the relationship between the diffraction distance and the interval to the next cycle is given in Eq. (15), and the corresponding derivative relations are given in Eqs. (16)–(18):

$$g(dz,dt) = \frac{k(x_0^2+y_0^2)}{2dz} - 2\pi - \frac{k(x_0^2+y_0^2)}{2(dz+dt)} = 0. \tag{15}$$
$$\frac{\partial g}{\partial (dz)} = -\frac{k(x_0^2+y_0^2)}{2dz^2} + \frac{k(x_0^2+y_0^2)}{2(dz+dt)^2}. \tag{16}$$
$$\frac{\partial g}{\partial (dt)} = \frac{k(x_0^2+y_0^2)}{2(dz+dt)^2}. \tag{17}$$
$$\frac{\partial (dz)}{\partial (dt)} > 0. \tag{18}$$

Here, dt represents the interval to the next periodic change. According to ∂(dz)/∂(dt) > 0 in Eq. (18), dt is positively related to dz, which means that the frequency of this periodic oscillation decays with distance.
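As a quick numerical check of Eqs. (15)–(18), solving g(dz, dt) = 0 for dt gives dt = 2πdz² / (kR²/2 − 2πdz) with R² = x0² + y0²; the snippet below evaluates this for the system wavelength and an assumed object radius (the value of R is illustrative).

```python
import numpy as np

wavelength = 830e-9                    # working wavelength of the system (Section 2.1)
k = 2 * np.pi / wavelength
R2 = (1.5e-3) ** 2                     # assumed x0**2 + y0**2 for a point near the field edge
a = k * R2 / 2

# Interval dt to the next 2*pi cycle of the Fresnel kernel phase, from g(dz, dt) = 0 (Eq. 15).
for dz in (0.02, 0.05, 0.10, 0.15):    # metres
    dt = 2 * np.pi * dz ** 2 / (a - 2 * np.pi * dz)
    print(f"dz = {dz * 1e3:5.0f} mm  ->  dt = {dt * 1e3:6.2f} mm")
```

dt grows roughly quadratically with dz, so the oscillations become slower as the diffraction distance increases, in agreement with the derivative argument of Eq. (18) and the behaviour seen in Fig. 6.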

3.2 Additional phase experiment

Usually, in the absence of perfectly precise alignment, it is difficult to implement a DHM system without an additional phase difference. Residual aberration compensation can be realized by principal component analysis (PCA) [36,37] or Zernike polynomial fitting (ZPF) [38–40], and [25] gives an automatic phase aberration compensation method for DHM based on U-net. Generally, however, the phase compensation step comes after the focusing process, which is an unavoidable iterative process. Therefore, to achieve rapid phase retrieval, the situation with an additional background phase must be considered.

The additional background phase makes the image gray scale discontinuous, which greatly disturbs the training of the neural network. Here, trigonometric functions are used to map the phase, so that abrupt pixel gray-value changes over a large background become smooth and continuous. Figure 7(a) shows a phase distribution of micro-quartz pieces (MQPs), and Figs. 7(b) and 7(c) are the results mapped by sine and cosine. The combination of the three images is used as the input to the neural network, as shown in Fig. 2.
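In code this mapping is a one-line operation; the sketch below (array names are ours) stacks it with the intensity to form the three-channel input of Fig. 2(b).

```python
import numpy as np

def trig_encode(intensity, phase):
    """Map a wrapped phase image to sin/cos channels so that 2*pi jumps become continuous,
    and stack them with the intensity as the three-channel network input (Fig. 2(b))."""
    return np.stack([intensity, np.sin(phase), np.cos(phase)], axis=-1)

# Usage: field is a filtered complex objective image; [None] adds a batch axis.
# x_in = trig_encode(np.abs(field), np.angle(field))[None]
```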

Fig. 7 The phase trig-function mapping of micro-quartz pieces.

The phase reconstruction on the training set with additional phase difference (dz = −10 μm to 10 μm, stride = 0.025 μm) is shown in Fig. 8. The first three columns are the input of the network, the fourth column is the recovered phase, and the last one is the corresponding ground truth. The overall reconstruction quality on the validation set is satisfactory. The real background phase has complex frequency-domain components, in which the low-frequency components occupy most of the energy distribution; using this method, automatic aberration compensation can be realized in most cases. At the same time, we examine the situation where the background and the target are not distinguished, shown in Fig. 9. When the phase difference between the background and the objects is close to 2kπ, their gray values are consistent, leading to poor reconstruction results. One possible explanation is that their structural details are mistaken for background and swept away in the forward pass of the neural network through the ReLU activation function.

Fig. 8 Reconstruction results of our trained U-net under three different generated data sets. The baseline distance on which the network was trained is −0.01 mm to 0.01 mm.

Fig. 9 Poor reconstruction of the neural network (without distinguishing between background and target).

The training of our neural network can be assessed by plotting the progression of the training and test errors, shown in Fig. 10. Most of the error comes from the high-frequency components of the phase image. In practical applications, an error of 0.1π satisfies most imaging requirements.

Fig. 10 The training and testing error curves for the network trained on the training set with additional background phase at distance dz = ±0.01 mm over 10 epochs.

3.3 Fast phase imaging of micro-quartz pieces

A complete schematic of the rapid phase-retrieval pipeline of the off-axis DHM is given in Fig. 11. The red line represents the flow from the hologram captured by the CCD to the input of the U-net; this part involves separating the objective-image spectrum from the hologram and coarse focusing through the Fresnel transform formula, and is only shown briefly in Fig. 11. The green line represents the training flow of the U-net, which has been discussed in detail above. The blue line indicates the process of rapid reconstruction through the U-net.
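The red-line and blue-line paths of Fig. 11 could look roughly as follows in NumPy, reusing the fresnel_propagate, trig_encode and model sketches from Sections 2.2–2.4. The sideband crop size, the coarse distance and the pixel pitch are assumptions, and the snippet presumes the +1 order sits well inside the spectrum and that the frame is already cropped to the 512×512 network size.

```python
import numpy as np

def extract_object_field(hologram, crop=128):
    """Isolate the +1-order sideband of an off-axis hologram and shift it to baseband."""
    H = np.fft.fftshift(np.fft.fft2(hologram))
    ny, nx = H.shape
    mag = np.abs(H).copy()
    mag[ny//2 - crop:ny//2 + crop, nx//2 - crop:nx//2 + crop] = 0    # suppress the zero order
    cy, cx = np.unravel_index(np.argmax(mag), mag.shape)             # locate the +1 order
    sideband = np.zeros_like(H)
    sideband[cy - crop:cy + crop, cx - crop:cx + crop] = H[cy - crop:cy + crop, cx - crop:cx + crop]
    sideband = np.roll(sideband, (ny//2 - cy, nx//2 - cx), axis=(0, 1))  # remove the carrier
    return np.fft.ifft2(np.fft.ifftshift(sideband))

# Red line: hologram -> filtered object field -> coarse Fresnel focus (assumed dz and pixel pitch).
field = extract_object_field(hologram)                  # hologram: 2-D CCD frame, 512x512 here
coarse = fresnel_propagate(field, dz=5e-3, wavelength=830e-9, dx=5.86e-6)
# Blue line: trigonometric mapping, then a single forward pass of the trained U-net.
phase_pred = model.predict(trig_encode(np.abs(coarse), np.angle(coarse))[None], verbose=0)[0, ..., 0]
```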

Fig. 11 The overall flow chart of the fast-focus algorithm. The green line represents the network training section. The red line represents the process from hologram to coarse focus. The blue line represents the trigonometric mapping process and the final result. (1) Hologram received from the CCD; (2) the spectrum of the hologram; (3) the spectrum of the separated light field; (4) coarse-focused light field through the Fresnel transform formula; (5) label of training-set generation; (6) trigonometric mapping of the training set; (7) trigonometric mapping of practical samples; (8) U-net's phase reconstruction of practical samples.

In our previous study [41], an optical thickness coding technique was applied to quantitative phase and fluorescence microscopy (QPFM), where micro-quartz pieces (MQPs) were used as the encoding vectors. Due to the difference in physical thickness (3070 μm), the MQPs, with a width of 100 μm, show different gray levels in the phase image. Here, they are used to test our algorithm, and the results are shown in Fig. 12. Figure 12(a) is the hologram of the MQPs captured by the CCD. Through filtering and coarse focusing, the phase and intensity images are obtained, as shown in Figs. 12(b) and 12(c); 12(d) and 12(e) are the results of the trigonometric mapping, and 12(f) is the result of rapid phase recovery by the U-net. Because the thickness variation of the object corresponds to more than 2π of phase, the reconstructed phase may contain considerable phase aberration, which is not associated with the U-net training. On the whole, the diffraction artifacts in the phase image restored by the U-net are obviously reduced. For real samples, it is difficult to isolate the target directly from the complex background. This requires prior knowledge: if the additional phase distribution of our optical system were known, larger and more realistic background-phase data sets could be built directly (this is very time-consuming, since a large amount of real background phase would have to be collected separately). Although our U-net does not perform very well in this respect, the overall background phase is visibly weakened in 12(f).

Fig. 12 Rapid phase focusing of micro-quartz pieces. (a) is the hologram captured by CCD; (b) and (c) are the phase and intensity images of the MQPs; (d) and (e) are the phase distributions after trigonometric mapping; (f) is the reconstructed phase.

3.4 Time cost analysis

A noticeable acceleration can be achieved by using the GPU. The CPU and GPU configuration is an Intel(R) Core(TM) i5-6500 @ 3.20 GHz and an NVIDIA GeForce GTX 1060 6GB. For a single complete focusing, the traditional iterative calculation (N = 400 iterations) takes more than 11.714 s on the Matlab 2017b platform using the CPU only. Using the cuFFT library provided with NVIDIA's CUDA parallel computing architecture, our test result in a C++ environment is 360 ms, a speed-up of 31.541 times. In contrast, using our U-net, the final test result is 108 ms, which is faster than the traditional methods. It is worth mentioning that there is still some redundancy in our network architecture; some network pruning, such as removing part of the layers or reducing the number of convolution kernels, could further speed up the whole process.
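Inference timings of this kind can be reproduced with a simple loop such as the one below, reusing the sketches above; the absolute numbers depend entirely on the hardware, and the 108 ms figure quoted here is the authors' GTX 1060 result, not something this snippet guarantees.

```python
import time
import numpy as np

x = trig_encode(np.abs(coarse), np.angle(coarse))[None].astype("float32")
model.predict(x, verbose=0)                   # warm-up pass so graph building is not timed

n = 50
t0 = time.perf_counter()
for _ in range(n):
    model.predict(x, verbose=0)
print(f"mean inference time: {(time.perf_counter() - t0) / n * 1e3:.1f} ms per frame")
```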

4. Discussion

Here we further discuss the capability of the trained U-net under various situations, to explain the details of our experimental design and the generality of the method.

4.1 Different loss function

In this experiment, we mainly explore phase recovery using a deep neural network and take the MAE as the evaluation index of the focusing effect. Within this framework, two types of loss functions, based on the L1 and L2 distances, are compared. Figure 13 shows the test error during training for the two loss functions. Both models converge to an error range of 0.04π when the background phase is not considered. However, in terms of the actual imaging effect, the model using the L2 loss performs better than the L1 loss. Figure 14 shows their reconstruction results, in which 14(b) corresponds to L1 and 14(c) to L2. 14(b) performs worse than 14(c) in eliminating diffraction stripes, which may be because L2 enforces smoother curve fitting than L1 and suppresses regions with large local differences (high-frequency details such as diffraction fringes and object edges). Therefore, the L2 loss function, with its moderate effect, was selected in this experiment.

Fig. 13 The testing error curves for the network trained on the training set without additional background phase, for the L1 and L2 losses.

Fig. 14 Comparison of reconstruction results of two loss functions. (a) represents ground truth; (b) represents the reconstruction result using the L1 loss function; (c) represents the reconstruction result using the L2 loss function.

4.2 Different parameter settings

A good phase-imaging result should ensure a clear outline of the object and the elimination of most diffraction stripes with a small overall error. In the actual fast reconstruction by the U-net, it is often difficult to reach the imaging accuracy of the traditional iterative algorithm, in which case the loss function should be modified to balance the measurement precision of the object image with the visual effect. Figure 15 compares reconstructions with and without the additional loss on the high-frequency parts (object edges and diffraction stripes) introduced through the w1 and w2 matrices. In this case, a1 and a2 are both set to 0.3. 15(a) is the ground truth, and 15(b) and 15(c) represent the reconstruction results with and without the additional loss, respectively. The MAE of both models reaches 0.04π, while 15(b) shows a clearer edge profile than 15(c), which indicates that introducing the additional loss improves the reconstruction.

Fig. 15 Comparison of reconstruction results using the additional loss. (a) represents ground truth; (b) represents the reconstruction result with the additional loss; (c) represents the reconstruction result without the additional loss.

4.3 General off-axis digital holographic system (non-coarse focusing)

In a general digital holographic system, the captured hologram is usually far from focus, in which case coarse focusing needs to be completed by estimating the approximate distance from focus along with some filtering steps. Here, we perform a comparative experiment, using the U-net and traditional methods to recover the phase image of a resolution plate far from the focal point (dz = 19.5 mm), to explore the focusing ability on real objects (non-microscopic samples) in general situations. Figure 16 shows the reconstruction results: 16(a) and 16(b) are the defocused intensity and phase images, respectively; 16(c) and 16(d) are the trigonometric mappings of 16(b); 16(e) is the focusing result of the U-net; 16(f) and 16(h) are the focused intensity and phase obtained by the traditional method. The result in 16(e) generally reflects the phase information of the resolution plate, and it is more intuitive than 16(h) because the additional background phase is taken into account, which shows that the method has a certain generality. However, due to the loss of input information (far from focus) and the large depth span of the object, it is difficult to eliminate all diffraction fringes, and many details are hard to recover fully. To achieve high-precision reconstruction, it is necessary to carry out coarse focusing through system adjustment and algorithm design as far as possible, so as to provide high-quality images at the network input.

Fig. 16 Phase focusing experiment on a resolution plate. (a) defocused intensity image; (b) defocused phase image; (c) and (d) represent the phase mapped by trigonometric functions; (e) reconstructed result of the trained U-net; (f) focused intensity image obtained through traditional methods; (h) focused phase image obtained through traditional methods.

In addition, the U-net used in this experiment is relatively simple. The main purpose of using this network is to achieve high-speed phase recovery, which does not mean that this architecture is the most suitable network for phase recovery. In general, fully convolutional networks using skip connections may all be capable of this task. If the time cost is not considered, a more complex backbone network, such as ResNet-101 [42], DenseNet [43], PSPNet [44] or DeepLab v3+ [45], can extract higher-level information about the object, while structures such as RefineNet [46] can achieve more efficient decoding. In follow-up work, phase recovery for biological samples with complex information will be conducted, and the specific performance of these networks will be verified experimentally.

5. Conclusion

Phase recovery is different from deblurring, denoising and focusing, and is more difficult to realize because of the additional background phase. This paper verifies the feasibility of neural networks for this task, which provides a new way to realize accurate and fast phase imaging in DHM. In subsequent experiments, we will focus on how to remove complex additional background phases in order to improve the imaging quality. At the same time, the neural network structure needs to be further adjusted and optimized to speed up imaging.

Funding

National Natural Science Foundation of China (NSFC) (61675113, 61527808, 81571837); Science and Technology Research Program of Shenzhen City (JCYJ20160428182247170, JCYJ20170412170255060, JCYJ20160324163759208, JCYJ20170412171856582).

References and links

1. N. Pavillon and P. Marquet, “Cell volume regulation monitored with combined epifluorescence and digital holographic microscopy,” Methods Mol. Biol. 1254(1254), 21–32 (2015). [CrossRef]   [PubMed]  

2. N. Pavillon, J. Kühn, C. Moratal, P. Jourdain, C. Depeursinge, P. J. Magistretti, and P. Marquet, “Early Cell Death Detection with Digital Holographic Microscopy,” PLoS One 7(1), e30912 (2012). [CrossRef]   [PubMed]  

3. J. Kühn, F. Montfort, T. Colomb, B. Rappaz, C. Moratal, N. Pavillon, P. Marquet, and C. Depeursinge, “Submicrometer tomography of cells by multiple-wavelength digital holographic microscopy in reflection,” Opt. Lett. 34(5), 653–655 (2009). [CrossRef]   [PubMed]  

4. J. Kühn, E. Shaffer, J. Mena, B. Breton, J. Parent, B. Rappaz, M. Chambon, Y. Emery, P. Magistretti, C. Depeursinge, P. Marquet, and G. Turcatti, “Label-Free Cytotoxicity Screening Assay by Digital Holographic Microscopy,” Assay Drug Dev. Technol. 11(2), 101–107 (2013). [CrossRef]   [PubMed]  

5. B. Rappaz, E. Cano, T. Colomb, J. Kühn, C. Depeursinge, V. Simanis, P. J. Magistretti, and P. Marquet, “Noninvasive characterization of the fission yeast cell cycle by monitoring dry mass with digital holographic microscopy,” J. Biomed. Opt. 14(3), 034049 (2009). [CrossRef]   [PubMed]  

6. V. P. Pandiyan and R. John, “Optofluidic bioimaging platform for quantitative phase imaging of lab on a chip devices using digital holographic microscopy,” Appl. Opt. 55(3), A54–A59 (2016). [CrossRef]   [PubMed]  

7. L. Williams, P. P. Banerjee, G. Nehmetallah, and S. Praharaj, “Holographic volume displacement calculations via multiwavelength digital holography,” Appl. Opt. 53(8), 1597–1603 (2014). [CrossRef]   [PubMed]  

8. G. Nehmetallah and P. P. Banerjee, “Applications of digital and analog holography in three-dimensional imaging,” Adv. Opt. Photonics 4(4), 472 (2012). [CrossRef]  

9. E. Cuche, P. Marquet, and C. Depeursinge, “Simultaneous amplitude-contrast and quantitative phase-contrast microscopy by numerical reconstruction of Fresnel off-axis holograms,” Appl. Opt. 38(34), 6994–7001 (1999). [CrossRef]   [PubMed]  

10. S. Yazdanfar, K. B. Kenny, K. Tasimi, A. D. Corwin, E. L. Dixon, and R. J. Filkins, “Simple and robust image-based autofocusing for digital microscopy,” Opt. Express 16(12), 8670–8677 (2008). [CrossRef]   [PubMed]  

11. C. E. A. Shannon, “A mathematical theory of communication,” Mob. Comput. Commun. Rev. 5(1), 3–55 (2001). [CrossRef]  

12. P. Memmolo, C. Distante, M. Paturzo, A. Finizio, P. Ferraro, and B. Javidi, “Automatic focusing in digital holography and its application to stretched holograms,” Opt. Lett. 36(10), 1945–1947 (2011). [CrossRef]   [PubMed]  

13. J. Zheng, P. Gao, and X. Shao, “Opposite-view digital holographic microscopy with autofocusing capability,” Sci. Rep. 7(1), 4255 (2017). [CrossRef]   [PubMed]  

14. Y. Yao, B. Abidi, N. Doggaz, and M. Abidi, “Evaluation of sharpness measures and search algorithms for the auto focusing of high-magnification images,” Proc. SPIE 6246(6424), 62460G (2006). [CrossRef]  

15. P. B. Gibbons and R. Mathon, “The use of hill‐climbing to construct orthogonal steiner triple systems,” J. Comb. Des. 1(1), 27–50 (1993). [CrossRef]  

16. E. Y. Lam, Z. Ren, and Z. Xu, “Learning-based nonparametric autofocusing for digital holography,” Optica 5(4), 337 (2018). [CrossRef]  

17. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in International Conference on Neural Information Processing Systems (2012), pp. 1097–1105.

18. E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 640–651 (2017). [CrossRef]   [PubMed]  

19. H. Noh, S. Hong, and B. Han, “Learning Deconvolution Network for Semantic Segmentation,” in 2015 IEEE International Conference on Computer Vision (2015), pp. 1520–1528. [CrossRef]  

20. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (2015), pp. 234–241. [CrossRef]  

21. I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative Adversarial Networks,” Adv. Neural Inf. Process. Syst. 3, 2672–2680 (2014).

22. Y. Yoon, H. G. Jeon, D. Yoo, J. Y. Lee, and I. S. Kweon, “Light-Field Image Super-Resolution Using Convolutional Neural Network,” IEEE Signal Process. Lett. 24(6), 848–852 (2017). [CrossRef]  

23. A. Ozcan, H. Günaydin, H. Wang, Y. Rivenson, Y. Zhang, and Z. Göröcs, “Deep learning microscopy,” Optica 4(11), 1437–1443 (2017). [CrossRef]  

24. F. G. Venhuizen, B. van Ginneken, B. Liefers, F. van Asten, V. Schreur, S. Fauser, C. Hoyng, T. Theelen, and C. I. Sánchez, “Deep learning approach for the detection and quantification of intraretinal cystoid fluid in multivendor optical coherence tomography,” Biomed. Opt. Express 9(4), 1545–1569 (2018). [CrossRef]   [PubMed]  

25. T. Nguyen, V. Bui, V. Lam, C. B. Raub, L. C. Chang, and G. Nehmetallah, “Automatic phase aberration compensation for digital holographic microscopy based on deep learning background detection,” Opt. Express 25(13), 15043–15057 (2017). [CrossRef]   [PubMed]  

26. A. Sinha, G. Barbastathis, J. Lee, and S. Li, “Lensless computational imaging through deep learning,” Optica 4(9), 1117–1125 (2017). [CrossRef]  

27. T. Nguyen, Y. Xue, Y. Li, L. Tian, and G. Nehmetallah, “Convolutional neural network for Fourier ptychography video reconstruction: learning temporal dynamics from spatial ensembles,” arXiv:1805.00334 (2018).

28. Y. Rivenson, Y. Zhang, H. Günaydın, D. Teng, and A. Ozcan, “Phase recovery and holographic image reconstruction using deep learning in neural networks,” Light Sci. Appl. 7(2), 17141 (2017). [CrossRef]  

29. T. Shimobaba, T. Kakue, and T. Ito, “Convolutional neural network-based regression for depth prediction in digital holography,” arXiv:1802.00664 (2018).

30. T. Nguyen, V. Bui, and G. Nehmetallah, “Computational Optical Tomography Using 3D Deep Convolutional Neural Networks (3D-DCNNs),” Opt. Eng. 57(4), 043111 (2018).

31. H. Dong, G. Yang, F. Liu, Y. Mo, and Y. Guo, “Automatic Brain Tumor Detection and Segmentation Using U-Net Based Fully Convolutional Networks,” in Conference on Medical Image Understanding and Analysis (2017), pp. 506–517. [CrossRef]  

32. A. Sevastopolsky, “Optic disc and cup segmentation methods for glaucoma detection with modification of U-Net convolutional neural network,” Pattern Recognit. Image Anal. 27(3), 618–624 (2017). [CrossRef]  

33. Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (2016), pp. 424–432.

34. P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-Image Translation with Conditional Adversarial Networks,” in IEEE conference on Computer Vision and Pattern Recognition(2017), pp. 5967–5976.

35. S. Ioffe, and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” arXiv:1502.03167 (2015).

36. C. Zuo, Q. Chen, W. Qu, and A. Asundi, “Phase aberration compensation in digital holographic microscopy based on principal component analysis,” Opt. Lett. 38(10), 1724–1726 (2013). [CrossRef]   [PubMed]  

37. H. N. D. Le, M. S. Kim, and D. H. Kim, “Comparison of Singular Value Decomposition and Principal Component Analysis applied to Hyperspectral Imaging of biofilm,” in Photonics Conference (2012), pp. 6–7. [CrossRef]  

38. T. Nguyen, G. Nehmetallah, C. Raub, S. Mathews, and R. Aylo, “Accurate quantitative phase digital holographic microscopy with single- and multiple-wavelength telecentric and nontelecentric configurations,” Appl. Opt. 55(21), 5666–5683 (2016). [CrossRef]   [PubMed]  

39. T. Colomb, F. Montfort, J. Kühn, N. Aspert, E. Cuche, A. Marian, F. Charrière, S. Bourquin, P. Marquet, and C. Depeursinge, “Numerical parametric lens for shifting, magnification, and complete aberration compensation in digital holographic microscopy,” J. Opt. Soc. Am. A 23(12), 3177–3190 (2006). [CrossRef]  

40. T. Colomb, E. Cuche, F. Charrière, J. Kühn, N. Aspert, F. Montfort, P. Marquet, and C. Depeursinge, “Automatic procedure for aberration compensation in digital holographic microscopy and applications to specimen shape compensation,” Appl. Opt. 45(5), 851–863 (2006). [CrossRef]   [PubMed]  

41. Z. Shen, Y. He, G. Zhang, Q. He, D. Li, and Y. Ji, “Dual-wavelength digital holographic phase and fluorescence microscopy for an optical thickness encoded suspension array,” Opt. Lett. 43(4), 739–742 (2018). [CrossRef]   [PubMed]  

42. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770–778.

43. G. Huang, Z. Liu, L. d. Maaten, and K. Q. Weinberger, “Densely Connected Convolutional Networks,” in IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 2261–2269.

44. H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid Scene Parsing Network,” in IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 6230–6239.

45. L. C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation,” arXiv:1802.02611 (2018).

46. G. Lin, A. Milan, C. Shen, and I. Reid, “RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation,” arXiv:1611.06612 (2016).
