## Abstract

A two-frame phase-shifting interferometric wavefront reconstruction method based on deep learning is proposed. By learning from a large amount of simulated data generated from a physical model, the network calculates the wrapped phase accurately from two interferograms with an unknown phase step. The phase step can be any value other than an integer multiple of π, and the size of the interferograms is flexible. The method needs no pre-filtering to subtract the direct-current term, only a simple normalization. Compared with other two-frame methods in both simulations and experiments, the proposed method achieves better performance.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

As one of the most popular techniques in optical metrology [1], traditional phase-shifting interferometry (PSI) requires three or more interferograms with fixed, known phase steps [2–4]. However, acquiring these interferograms is time-consuming and sensitive to mechanical vibrations, ambient air turbulence and temperature changes, so the number of interferograms should be reduced to minimize the recording time. Reconstructing the phase from only one interferogram is difficult, however. For example, a single interferogram cannot distinguish the sign of defocus, so it is difficult to separate convex surfaces from concave surfaces. Without additional prior information, at least two interferograms are needed to resolve the sign ambiguity. Takeda *et al*. developed a Fourier-transform method to extract the phase from a single interferogram [5]. By introducing a large spatial-carrier frequency, the phase information can be separated from unwanted irradiance variations in the Fourier domain. Its limitation is that it cannot be applied to an interferogram with closed fringes. Thus, a large tilt is needed to generate the required spatial carrier, which makes the fringes denser. If the fringes are too dense, the camera cannot resolve them and considerable retrace error is introduced [6,7].

To simplify the measurement process and reduce the instrument cost, phase reconstruction from two interferograms has been investigated extensively in the past decade [8–11]. In [8], a demodulation method based on the Fourier transform of two interferograms, known as the Kreis method, was proposed. It demodulates the phase from two interferograms without sign ambiguity and is one of the most classical algorithms in this field, but it is very sensitive to noise. In [9], a two-step interferometric method based on a regularized optical flow (OF) algorithm was proposed. This method does not need to normalize the fringe pattern, but it does need to subtract the direct-current (DC) term. In [10], a phase reconstruction method based on Gram-Schmidt (GS) orthogonalization, with the two fringe patterns treated as independent vectors, was proposed. Because of its high accuracy and low computational cost, this method has been very successful and is widely used in two-frame interferometry. It also requires the DC term to be removed. In [11], a fast and accurate (FA) two-frame PSI wavefront reconstruction method was proposed. Before phase reconstruction, a high-pass Gaussian filter is again needed to remove the DC term; the cosine of the unknown phase step between the two interferograms is then estimated by solving a quartic polynomial equation, and finally the wrapped phase is calculated.

In recent years, deep learning (DL) technology based on artificial neural networks (ANNs) has developed rapidly, especially in the field of computer vision. In [12], a convolutional neural network named U-Net was proposed for biomedical image segmentation. In [13], the U-Net was modified to serve as the generator of a generative adversarial network (GAN) applied to image-to-image translation. These developments in computer vision have inspired work in optical metrology, such as phase unwrapping and denoising [14–16]. In [17], Kando *et al*. proposed a deep learning-based method to extract phases from single-shot interferograms, even when the interferogram includes closed ring-shaped fringes. However, this method is not suitable for the precise measurement of freeform surfaces, because it is not applicable to interferograms containing more than one closed-fringe pattern. To overcome the shortcomings of the above methods, we modified the generator in [13] and propose a new two-frame phase-shifting method based on deep learning. We call the proposed network the Phase U-Net (PUN). The PUN needs neither a spatial carrier nor a filter for subtracting the DC term, only a simple normalization. It can accurately recover the wrapped phase from two interferograms with an unknown phase step, excluding the singular case in which the step is an integer multiple of π.

The paper is organized as follows. In Section 2 we analyze the advantages of two-frame methods over one-frame methods. In Section 3 we describe how the datasets for training the network were generated and explain the architecture of the PUN, including details of the training process. In Section 4 we perform simulations and experiments to compare the algorithms and present a detailed error analysis, followed by the conclusion.

## 2. Advantages of two-frame methods over one-frame methods

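In traditional four-step PSI [18], the four phase-shifted interferograms can be described as:

$${I_1}(x,y) = a(x,y) + b(x,y)\cos [W(x,y)], \tag{1}$$

$${I_2}(x,y) = a(x,y) + b(x,y)\cos [W(x,y) + \pi /2], \tag{2}$$

$${I_3}(x,y) = a(x,y) + b(x,y)\cos [W(x,y) + \pi ], \tag{3}$$

$${I_4}(x,y) = a(x,y) + b(x,y)\cos [W(x,y) + 3\pi /2], \tag{4}$$

where ${I_\textrm{1}}$, ${I_\textrm{2}}$, ${I_\textrm{3}}$ and ${I_\textrm{4}}$ are the intensities of the four interferograms, *a* is the DC term, *b* is the modulation term, $W$ is the original phase, and $({x,y} )$ is the coordinate.

The unwanted background intensity *a* and modulation *b* cancel in the equation below, which yields the phase *W*:

$$W(x,y) = \arctan \left[ {\frac{{{I_4} - {I_2}}}{{{I_1} - {I_3}}}} \right]. \tag{5}$$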
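The four-step calculation can be checked numerically. The following is a minimal sketch, assuming the standard model $I_n = a + b\cos [W + (n - 1)\pi /2]$; the function name `four_step_phase` and the synthetic test surface are illustrative choices, not part of the original work.

```python
import numpy as np

def four_step_phase(i1, i2, i3, i4):
    """Wrapped phase from four frames with pi/2 steps.

    Assumed model: I_n = a + b*cos(W + (n-1)*pi/2), so that
    I4 - I2 = 2b*sin(W) and I1 - I3 = 2b*cos(W); a and b cancel.
    """
    return np.arctan2(i4 - i2, i1 - i3)

# Synthetic check with a non-uniform background and modulation.
y, x = np.mgrid[-1:1:64j, -1:1:64j]
W = 2.0 * (x**2 + y**2) - 1.0      # test phase, stays inside (-pi, pi)
a = 0.5 + 0.1 * x                  # DC term
b = 0.4 + 0.1 * y                  # modulation term (always positive)
frames = [a + b * np.cos(W + n * np.pi / 2) for n in range(4)]
W_rec = four_step_phase(*frames)
max_err = float(np.max(np.abs(W_rec - W)))
```

Using `arctan2` instead of a plain arctangent of the ratio keeps the correct quadrant, so the result covers the full $( - \pi ,\pi ]$ range.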

The traditional four-step PSI requires four interferograms with accurate phase steps of π/2. This method is sensitive to mechanical vibrations, ambient air turbulence and temperature changes. Thus, many advanced algorithms have been proposed to reduce the number of interferograms.

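The Fourier-transform method was first proposed by Takeda *et al*. [5]. If the tilt is not set to zero, the fringe pattern can be expressed as:

$$I(x,y) = a(x,y) + b(x,y)\cos [2\pi {f_0}x + W(x,y)], \tag{6}$$

where ${f_0}$ is the spatial-carrier frequency, and *a*, *b* and $W$ vary slowly compared with ${f_0}$.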

An image-sensing device (such as a CCD or CMOS sensor) with enough resolution to satisfy the sampling theorem is used to capture the fringes. For convenience of explanation, the fringe pattern is rewritten in the following form:

$$I(x,y) = a(x,y) + c(x,y)\exp (2\pi i{f_0}x) + {c^\ast }(x,y)\exp ( - 2\pi i{f_0}x), \tag{7}$$

with

$$c(x,y) = \frac{1}{2}b(x,y)\exp [iW(x,y)], \tag{8}$$

where the superscript $\ast$ denotes the complex conjugate. Then the fast-Fourier-transform (FFT) algorithm is used to transform Eq. (7) into the Fourier domain:

$$FI(fx,fy) = A(fx,fy) + C(fx - {f_0},fy) + {C^\ast }(fx + {f_0},fy), \tag{9}$$

where $FI$ is the Fourier spectrum of the intensity *I*, and the other capital letters denote the Fourier spectra of the corresponding terms in Eq. (7). $({fx,fy} )$ are the spatial frequencies in the x and y directions, respectively. Since the spatial variations of *a*, *b* and *W* are slow compared with the spatial frequency ${f_0}$, the Fourier spectra in Eq. (9) are separated by the carrier frequency ${f_0}$. Thus, $C({fx - {f_0},fy} )$ can be extracted and shifted to the origin to obtain $C({fx,fy} )$. Note that the unwanted background variation *a* has been filtered out at this stage.

Again using the FFT algorithm, we compute the inverse Fourier transform of $C({fx,fy} )$ with respect to $({fx,fy} )$ and obtain $c({x,y} )$, defined by Eq. (8). Then we calculate the complex logarithm of Eq. (8):

$$\log [c(x,y)] = \log \left[ {\frac{1}{2}b(x,y)} \right] + i\phi (x,y). \tag{10}$$

Now the wrapped phase $\phi ({x,y} )$ in the imaginary part is completely separated from the unwanted amplitude variation $b({x,y} )$ in the real part. However, if the spatial-carrier frequency is not large enough, the spectra overlap and the method becomes invalid. This cross-talk in the Fourier domain corresponds to closed fringes in the spatial domain. In other words, the Fourier-transform method has the limitation that it cannot be applied to an interferogram that includes closed fringes.
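The Fourier-transform procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, assuming a carrier along x with an integer number of periods across the image and a simple rectangular band-pass window; the test phase, background and window width are hypothetical choices, and the carrier is removed by multiplying with $\exp ( - 2\pi i{f_0}x)$ in the spatial domain, which is equivalent to shifting the spectrum to the origin.

```python
import numpy as np

N = 256
f0 = 32 / N                                   # carrier in cycles/pixel (assumed)
yy, xx = np.mgrid[0:N, 0:N]
r2 = ((xx - N / 2) ** 2 + (yy - N / 2) ** 2) / (N / 2) ** 2

W = 1.5 * np.exp(-r2 / 0.1)                   # smooth test phase, |W| < pi
a = 1.0 + 0.2 * np.exp(-r2 / 0.1)             # slowly varying background
b = 0.8 + 0.1 * np.exp(-r2 / 0.1)             # slowly varying modulation
I = a + b * np.cos(2 * np.pi * f0 * xx + W)   # fringe pattern with carrier

# Transform to the Fourier domain and keep only the lobe around +f0.
FI = np.fft.fft2(I)
fx = np.fft.fftfreq(N)[np.newaxis, :]         # spatial frequency along x
mask = np.abs(fx - f0) < f0 / 2               # rectangular band-pass window
c_carrier = np.fft.ifft2(FI * mask)

# Remove the carrier, then read the phase from the imaginary part
# of the complex logarithm.
c = c_carrier * np.exp(-2j * np.pi * f0 * xx)
phi = np.log(c).imag
max_err = float(np.max(np.abs(phi - W)))
```

If `f0` is reduced until the three spectral lobes overlap, the window can no longer isolate a single lobe, which is the cross-talk condition associated with closed fringes.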

Each closed-fringe pattern in an interferogram corresponds to two possible situations: a concave surface or a convex surface. This is a typical one-to-many mapping. A one-to-one mapping can be obtained by assuming that all closed-fringe patterns are concave surfaces (or all convex surfaces) [19]. Based on this assumption, a deep learning-based method or an improved Fourier-transform method can extract phases from single-shot interferograms with closed-fringe patterns [17,19]. However, a freeform surface can be considered to be composed of many convex and concave surfaces, and from a single interferogram we can never really know which is which [19]. For example, Fig. 1(a) shows a sphere with two small defects in the center: one can be seen as a small convex surface and the other as a small concave surface. A tilt has been added to the sphere, but it is not large enough to turn the closed fringes in the interferogram into open fringes. Figure 1(b) is the interferogram of that sphere. Two points P1 and P2 lie on adjacent fringes, as shown in Fig. 1(b). The phase difference between P1 and P2 is known to be 2π, but we cannot know which point is higher, and the same holds for the other defect. Thus, four combinations of the two small defects give the same interferogram in Fig. 1(b); the other three are shown in Fig. 2.

Two-frame methods do not have the sign ambiguity problem. Figure 3(a) shows the second interferograms for the four surface shapes in Fig. 1(a) and Fig. 2. These interferograms differ, so the four situations above can be distinguished when each is combined with the first interferogram in Fig. 1(b). Figure 3(b) shows the wrapped phases calculated by the PUN, and Fig. 3(c) shows the phases obtained by unwrapping Fig. 3(b). As can be seen, the PUN reconstructs all four surfaces correctly.
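The sign ambiguity and its resolution by a second frame follow directly from the cosine being an even function. The short sketch below illustrates this with a synthetic surface and its sign-flipped twin; the constant background, the modulation and the 1 rad phase step are assumed values chosen only for the demonstration.

```python
import numpy as np

y, x = np.mgrid[-1:1:128j, -1:1:128j]
W_convex = 6.0 * (x**2 + y**2)     # one candidate surface
W_concave = -W_convex              # its sign-flipped twin
delta = 1.0                        # phase step in rad (assumed)

def frame(W, step=0.0, a=0.5, b=0.5):
    """Interferogram a + b*cos(W + step) with constant a and b."""
    return a + b * np.cos(W + step)

# First frames are identical because cos(W) == cos(-W).
same_first = np.allclose(frame(W_convex), frame(W_concave))
# Second frames differ because cos(W + delta) != cos(-W + delta)
# unless delta is an integer multiple of pi.
same_second = np.allclose(frame(W_convex, delta), frame(W_concave, delta))
```

Setting `delta` to π makes the second frames coincide as well, which is the singular case excluded by the method.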

In conclusion, to fully resolve the sign ambiguity, every one-frame interferometry method, whether the Fourier-transform method or a deep learning-based one, needs a large tilt, which may cause overly dense fringe patterns and retrace error. Two-frame methods need no such tilt and are more suitable for high-precision optical metrology. Therefore, in Section 4 we compare the PUN with other two-frame methods rather than with one-frame methods.

## 3. Data generation and the training process

#### 3.1 Data generation

All deep learning-based methods start with data. In this paper, the training data are generated by simulation. To enhance the generalization ability of the network and make it perform well in all possible situations, we generate the original phase in two different ways. The first way uses the Zernike polynomials. Since any continuous wavefront shape may be represented by a linear combination of the Zernike polynomials [18], the original phase $W$ can be expressed as $W(\rho ,\theta ) = \sum\nolimits_r {{Z_r}{U_r}(\rho ,\theta )}$, where ${Z_r}$ is the Zernike coefficient, ${U_r}$ is the Zernike polynomial with single index *r*, the terms run up to the maximum power *L*, and $\rho$ and $\theta$ are the radius and angle, respectively.
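As an illustration of the Zernike-based generation, the sketch below builds a phase map as a weighted sum of five low-order terms. The choice and ordering of the terms, the unnormalized polynomial forms, and the function name `zernike_phase` are assumptions made for this example and do not necessarily match the single-index convention used in the paper; the coefficient range of −15 to 15 follows the Group 1 description.

```python
import numpy as np

def zernike_phase(coeffs, size=64):
    """Weighted sum of five low-order Zernike-type terms on a square grid.

    The term list and ordering are illustrative assumptions.
    """
    y, x = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    terms = [
        rho * np.cos(theta),           # tilt about one axis
        rho * np.sin(theta),           # tilt about the other axis
        2 * rho**2 - 1,                # defocus
        rho**2 * np.cos(2 * theta),    # astigmatism
        rho**2 * np.sin(2 * theta),    # oblique astigmatism
    ]
    return sum(z * u for z, u in zip(coeffs, terms))

rng = np.random.default_rng(0)
W = zernike_phase(rng.uniform(-15, 15, size=5))

# Pure defocus with unit coefficient equals -1 at the pupil center.
W0 = zernike_phase([0, 0, 1, 0, 0], size=65)
```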

The second way uses the actuator influence function of a deformable mirror [20]. In this function, *k*, *d* and *g* are coefficients that simulate different types of deformable mirrors; they have little impact on the final results in this paper, because the interferograms remain of the same type when the coefficients change. *k* is the coupling coefficient and is set to 0.15, *d* is the actuator spacing and is set to 8, and *g* is the Gaussian index and is set to 2.5. The original phase $W$ is then obtained as the sum of the influence functions of all *N* actuators, each weighted by its influence amplitude coefficient ${H_i}$.
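A minimal sketch of this second generation method follows, assuming the commonly used Gaussian form of the influence function, $\exp [\ln (k){(r/d)^g}]$, where *r* is the distance from the actuator center. The exact expression, the 5×5 actuator layout and the coordinate units are assumptions for illustration; only the values of *k*, *d*, *g* and the amplitude range of −25 to 25 are taken from the text.

```python
import numpy as np

def influence(x, y, xc, yc, k=0.15, d=8.0, g=2.5):
    """Assumed Gaussian influence function of one actuator at (xc, yc).

    It equals 1 at the actuator and k at a distance of one spacing d.
    """
    r = np.hypot(x - xc, y - yc)
    return np.exp(np.log(k) * (r / d) ** g)

rng = np.random.default_rng(0)
d = 8.0
# Hypothetical 5x5 actuator layout with spacing d.
centers = [(i * d, j * d) for i in range(5) for j in range(5)]
H = rng.uniform(-25, 25, size=len(centers))    # influence amplitudes

y, x = np.mgrid[0:32:65j, 0:32:65j]
W = sum(h * influence(x, y, xc, yc) for h, (xc, yc) in zip(H, centers))
```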

We used several groups of Zernike coefficients (up to 11 terms, including the tip and tilt) to generate original phases. The terms and amplitudes of the Zernike coefficients in each group were carefully selected to avoid generating overly dense fringes. For example, Group 1 contained 5 Zernike coefficients (1st to 5th order) with amplitudes randomly chosen from −15 to 15. Group 2 contained 3 Zernike coefficients (8th, 9th and 10th order) with amplitudes varying between −10 and 10. In this way, we could generate many different types of aberrations without overly dense fringes in the interferograms. As a supplement to the Zernike polynomials, we also generated original phases by using the actuator influence function. The influence amplitude coefficients were randomly chosen from −25 to 25. After the original phases were generated, we computed the wrapped phases to serve as the outputs in the training set. The wrapped phase $\phi$ is the phase angle of the original phase *W*, and can be calculated by using the MATLAB function ‘angle’:

$$\phi = \textrm{angle}[\exp (iW)].$$
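The same wrapping operation can be written in NumPy, where `np.angle` plays the role of the MATLAB function. This is a small sketch; the helper name `wrap_phase` is illustrative.

```python
import numpy as np

def wrap_phase(W):
    """Wrap an arbitrary phase into (-pi, pi] via the complex exponential."""
    return np.angle(np.exp(1j * W))

W = np.linspace(-20.0, 20.0, 1001)   # original phase spanning several cycles
phi = wrap_phase(W)
```

The wrapped and original phases differ only by integer multiples of 2π, so `np.exp(1j * phi)` and `np.exp(1j * W)` agree everywhere.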

Figure 4 shows an example of two original phases generated by the Zernike polynomials method and the actuator influence function method respectively. Their corresponding wrapped phases and interferograms are also shown.

According to [18], we can model the interferograms in two-frame phase-shifting interferometry as:

$${I_1}(x,y) = a(x,y) + b(x,y)\cos [W(x,y)],$$

$${I_2}(x,y) = a(x,y) + b(x,y)\cos [W(x,y) + \delta ],$$

where $\delta$ is the unknown phase-shifting step. To simulate the non-uniform background intensity and modulation amplitude, *a* and *b* were not constants when generating the training set. Also, to better simulate real conditions, additive white Gaussian noise was added to these interferograms, with the signal-to-noise ratio (SNR) varying from 20 to 100 dB. Figure 5 shows some simulated interferograms with different background intensities and noise levels. These interferograms are normalized between 0 and 1 to serve as the inputs in the training set, while the wrapped phases $\phi$ are the outputs.
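The generation of one training input pair can be sketched as follows. The particular shapes of *a* and *b*, the SNR definition (mean signal power over noise power) and the per-frame min-max normalization are assumptions made for this example; the text specifies only that *a* and *b* are non-uniform, that white Gaussian noise is added, and that the inputs are normalized between 0 and 1.

```python
import numpy as np

def simulate_pair(W, delta, snr_db, rng):
    """Two noisy, normalized interferograms for phase W and step delta."""
    rows, cols = W.shape
    y, x = np.mgrid[-1:1:rows * 1j, -1:1:cols * 1j]
    a = 0.5 + 0.1 * x                         # hypothetical background
    b = 0.4 * np.exp(-(x**2 + y**2) / 4)      # hypothetical modulation
    frames = []
    for step in (0.0, delta):
        clean = a + b * np.cos(W + step)
        # Noise power from the assumed SNR definition.
        noise_power = np.mean(clean**2) / 10 ** (snr_db / 10)
        noisy = clean + rng.normal(0.0, np.sqrt(noise_power), clean.shape)
        # Min-max normalization to the range [0, 1].
        frames.append((noisy - noisy.min()) / (noisy.max() - noisy.min()))
    return frames

rng = np.random.default_rng(0)
y, x = np.mgrid[-1:1:128j, -1:1:128j]
W = 6.0 * (x**2 + y**2)
I1, I2 = simulate_pair(W, delta=1.0, snr_db=20, rng=rng)
```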

To read the data more efficiently and accelerate the training process, we used the TFRecord format to store the training set. TFRecord is a simple format for storing a sequence of binary records. It is helpful to serialize the data and store it in a set of files (100–200 MB each) that can each be read linearly. Thus, we stored 64 pairs of inputs and outputs in each TFRecord file (192 MB). Since the PUN is a deep network, we wanted more than 50,000 pairs of data to train it. Therefore, we generated 782 TFRecord files, containing 50,048 pairs of data in total. We also generated a test dataset, different from the training set, to test the network and to evaluate whether it was overfitting. Note that the image size of both the interferograms and the wrapped phases was 512×512. The phase step $\delta$ was kept between 0 and π when generating the training data.
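The dataset figures quoted above can be reproduced with a short calculation. The storage layout is an assumption: the 192 MB file size is consistent with each pair holding two input interferograms and one output phase map stored as 32-bit floats, but the text does not state the data type.

```python
import math

pairs_per_file = 64
target_pairs = 50_000

# Smallest number of files that holds at least the target number of pairs.
n_files = math.ceil(target_pairs / pairs_per_file)
total_pairs = n_files * pairs_per_file

# Assumed layout: 2 inputs + 1 output, each 512x512, 4 bytes per pixel.
bytes_per_pair = 3 * 512 * 512 * 4
file_mib = pairs_per_file * bytes_per_pair / 2**20
```

Under these assumptions the calculation gives 782 files, 50,048 pairs and 192 MiB per file, matching the numbers in the text.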

#### 3.2 Architecture of PUN and the training process

The architecture of the PUN is illustrated in Fig. 6. Unlike the original U-Net in [12], the proposed network has no pooling layers, because fine details are important and pooling layers may lose them.

In the PUN shown in Fig. 6, we used LeakyReLU [21] activation functions in the downsampling steps and ReLU [22] in the upsampling steps. Batch Normalization [23] was used in some layers to accelerate training, and the Dropout technique [24] was used in some upsampling steps to avoid over-fitting. Whereas the original U-Net uses softmax in the output layer to treat image segmentation as a classification problem, we used the ELU activation function [25] in the output layer to predict pixel values. That the outputs of the ELU activation function do not range from 0 to 1 has little impact, because in post-processing all negative outputs can simply be replaced by zeros and all values larger than one by ones. We tried different loss functions and found that the mean absolute error (MAE) loss function performed best:

$$\textrm{MAE} = \frac{1}{{MN}}\sum\limits_{i = 1}^M {\sum\limits_{j = 1}^N {|{{{\hat y}_{ij}} - {y_{ij}}} |} },$$

where ${y_{ij}}$ and ${\hat y_{ij}}$ are the ground-truth and predicted pixel values, and *M* and *N* are the image dimensions. We applied the Adam solver [26] to train the PUN for 120 epochs on a High Performance Computing cluster using one Intel Xeon E5-2695 V3 CPU, one Nvidia P100 GPU (with 16 GB VRAM) and 224 GB RAM. The TensorFlow version was 2.0.0.
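The loss function and the post-processing step can be illustrated with plain NumPy. This is a sketch of the arithmetic only, not of the training code; the function names `mae_loss` and `clip_output` and the small example arrays are hypothetical.

```python
import numpy as np

def mae_loss(pred, target):
    """Mean absolute error averaged over all M x N pixels."""
    return float(np.mean(np.abs(pred - target)))

def clip_output(pred):
    """Post-processing: negatives become 0, values above 1 become 1."""
    return np.clip(pred, 0.0, 1.0)

# Example network output that leaves the [0, 1] range in two pixels.
pred = np.array([[-0.2, 0.5], [1.3, 1.0]])
target = np.array([[0.0, 0.5], [1.0, 0.5]])

loss_raw = mae_loss(pred, target)
loss_clipped = mae_loss(clip_output(pred), target)
```

In this example clipping removes the error in the two out-of-range pixels, so `loss_clipped` is smaller than `loss_raw`.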

It is important to note that the network architecture contains no fully-connected layer, so it can predict wrapped phases not only from 512×512 inputs but also from larger images such as 768×768, 1024×1024 and 1280×1280. This does not mean that the input size is unlimited. The input size must be an integer multiple of 256×256; a size of 32×32, for example, is not acceptable. The reason is that the PUN consists of a downsampling part and an upsampling part, and in the downsampling part the image size is halved from each layer to the next. With a 32×32 input, the data cannot even reach the bottom layer of the downsampling part.

Most ANNs do not allow users to change the input size, for two reasons. The first is that most ANNs have fully-connected layers. Changing the input size would change the number of parameters in those layers, but the number of parameters of a given trained network is fixed, so the input size of an ANN with fully-connected layers cannot be changed. The second is that the test set (that is, the real problem) should be independent and identically distributed (i.i.d.) with the training set. Although the PUN has no fully-connected layer and therefore allows the input size to change, its receptive field is limited by the convolution layers. For the same convolution kernel and the same original phase, the fringe pattern of a large interferogram is sparser than that of a small one. The input size therefore cannot be enlarged without limit, because doing so may break the i.i.d. assumption. In short, this architecture lets users choose the input size freely within a certain range. For example, the trained PUN in this paper prefers input sizes between 512×512 and 1280×1280, and the tests in Section 4 show that its performance fluctuates little in that range. If the CCD size changes to 2048×2048, however, it is better to re-train the network on a new dataset; the re-trained network may then perform better with input sizes between 2048×2048 and 2816×2816.
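The input-size rule can be handled by zero padding an image to the nearest valid size, as is done for the experimental data in Section 4.2. The sketch below pads to the smallest square whose side is a multiple of 256; placing the image in the top-left corner and the helper name `pad_to_valid_size` are assumptions, since the text does not say where the zeros are added.

```python
import numpy as np

def pad_to_valid_size(img, base=256):
    """Zero-pad a 2D image to the smallest square with side = n * base."""
    side = base * int(np.ceil(max(img.shape) / base))
    out = np.zeros((side, side), dtype=img.dtype)
    out[:img.shape[0], :img.shape[1]] = img    # assumed top-left placement
    return out

# A 604x674 interferogram is padded to 768x768.
padded = pad_to_valid_size(np.ones((604, 674)))
```

An image that already satisfies the rule, such as 512×512, is returned at its original size.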

## 4. Error analysis and experiments

#### 4.1 Error analysis by simulation

We first examined the robustness of the proposed method to changes in the phase step. We calculated the RMS errors of the wrapped phases computed by all five two-frame methods at different phase steps in the range from 0 rad to π rad (excluding 0 and π). No noise was added to the interferograms. The results are plotted in Fig. 7, showing that the trained PUN is consistently better than the other methods. To test the robustness under noisy conditions, we simulated interferograms with a phase step of 1 rad, added white Gaussian noise with SNR ranging from 20 dB to 100 dB, and plotted the RMS errors in Fig. 8. This simulation confirmed that the PUN had the best performance over the whole range of tested SNR.
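An RMS error between an estimated and a reference wrapped phase can be computed as sketched below. The text does not state whether the difference is itself wrapped before averaging, so the `wrap_aware` option is an assumption added for illustration: without it, pixels near the ±π boundary can contribute errors close to 2π even when the two phases nearly agree.

```python
import numpy as np

def rms_error(phi_est, phi_ref, wrap_aware=False):
    """RMS of the phase difference, optionally wrapped into (-pi, pi]."""
    diff = phi_est - phi_ref
    if wrap_aware:
        diff = np.angle(np.exp(1j * diff))
    return float(np.sqrt(np.mean(diff**2)))

# Two pixels sit on opposite sides of the +/-pi boundary.
phi_ref = np.array([3.1, -3.1, 0.0])
phi_est = np.array([-3.1, 3.1, 0.1])

rms_plain = rms_error(phi_est, phi_ref)
rms_wrapped = rms_error(phi_est, phi_ref, wrap_aware=True)
```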

We also tested the PUN with different input sizes to make sure that it performs well at common image sizes. We generated two further sets of interferograms with sizes of 512×512, 768×768, 1024×1024 and 1280×1280. The first set had a phase step of 1 rad and Gaussian noise with 100 dB SNR. The RMS errors of all five methods are listed in Table 1, together with the corresponding processing times. We then repeated the test on the second set, in which only the SNR was changed to 20 dB, as shown in Table 2. It should be noted that the PUN was written in Python and run in the Spyder IDE, while the other algorithms were implemented in MATLAB. All methods (including the PUN) were run on a laptop with one Intel Core i5-8250U processor and 8 GB RAM when testing the performance. As can be seen in Table 1 and Table 2, the PUN is the most accurate of all the methods, while the FA algorithm remains the fastest, especially at large image sizes. For a more intuitive understanding, the PUN results in Table 1 are shown in Fig. 9.

#### 4.2 Experimental validation

To further verify the effectiveness of the proposed method, we collected two sets of interferograms with a π/2 phase step in experiments, as shown in Fig. 10(a) and Fig. 12(a), and calculated the wrapped phases with the standard four-step algorithm as the ground truth, as shown in Fig. 10(b) and Fig. 12(b). The interferograms were 604×674 pixels (we used a PolarCam™ polarization camera from 4D Technology, Inc., with 1208×1348 pixels and a pixel size of 7.4 µm, to take the four pictures), so we extended them to 768×768 by zero padding. The remaining images in Fig. 10(b) and Fig. 12(b) are the wrapped phases calculated from the first two interferograms using Kreis, OF, GS, FA and PUN. Compared with the four-step algorithm, the RMS errors in Fig. 10(b) were 0.6485 rad, 1.2680 rad, 0.6590 rad, 0.7231 rad and 0.4007 rad, respectively, and those in Fig. 12(b) were 0.7080 rad, 1.0370 rad, 0.5610 rad, 0.5832 rad and 0.5019 rad, respectively. We then unwrapped the wrapped phases using the ‘unwrap’ function in MATLAB. As shown in Fig. 11, the RMS errors of Kreis, OF, GS, FA and PUN after unwrapping, again compared with four-step interferometry, were 0.7172 rad, 11.3122 rad, 0.2917 rad, 0.5061 rad and 0.1681 rad, respectively. In Fig. 13, the RMS errors after unwrapping were 1.9243 rad, 6.5721 rad, 0.1975 rad, 0.8105 rad and 0.1688 rad, respectively. Although the standard four-step result is not a true ground truth, it can be regarded as a reliable reference despite the noise in the interferograms. Therefore, the PUN results can be regarded as more accurate than those of the other two-frame algorithms.

## 5. Conclusion

In conclusion, we have proposed a new deep learning-based method, called PUN, to estimate the wrapped phase from only two interferograms. Its advantages over one-frame methods and other two-frame methods have been explained in this paper. Admittedly, the proposed method is not the fastest of the two-frame methods compared; however, it is the most accurate. Moreover, whereas deep learning-based methods normally need a fixed input size, our network architecture accepts different input sizes such as 512×512, 768×768, 1024×1024 and 1280×1280, or even larger ones after re-training with new datasets. Both simulations and experiments have been carried out to verify the performance.

## Funding

National Institutes of Health (S10OD018061); National Science Foundation (1918260); China Scholarship Council (201904910369).

## Disclosures

The authors declare no conflicts of interest.

## References

**1.** D. Malacara, *Optical Shop Testing* (Wiley, 2007).

**2.** G. Lai and T. Yatagai, “Generalized phase-shifting interferometry,” J. Opt. Soc. Am. A **8**(5), 822–827 (1991).

**3.** Z. Wang and B. Han, “Advanced iterative algorithm for phase extraction of randomly phase-shifted interferograms,” Opt. Lett. **29**(14), 1671–1673 (2004).

**4.** J. Deng, K. Wang, D. Wu, X. Lv, C. Li, J. Hao, J. Qin, and W. Chen, “Advanced principal component analysis method for phase reconstruction,” Opt. Express **23**(9), 12222–12231 (2015).

**5.** M. Takeda, H. Ina, and S. Kobayashi, “Fourier-transform method of fringe-pattern analysis for computer-based topography and interferometry,” J. Opt. Soc. Am. **72**(1), 156–160 (1982).

**6.** C. J. Evans and J. B. Bryan, “Compensation for Errors Introduced by Nonzero Fringe Densities in Phase-Measuring Interferometers,” CIRP Ann. **42**(1), 577–580 (1993).

**7.** H. Yiwei, X. Hou, Q. Haiyang, and W. Song, “Retrace error reconstruction based on point characteristic function,” Opt. Express **23**(22), 28216–28223 (2015).

**8.** T. M. Kreis and W. P. O. Juptner, “Fourier transform evaluation of interference patterns: demodulation and sign ambiguity,” Proc. SPIE **1553**, 263–273 (1992).

**9.** J. Vargas, J. A. Quiroga, C. O. S. Sorzano, J. C. Estrada, and J. M. Carazo, “Two-step interferometry by a regularized optical flow algorithm,” Opt. Lett. **36**(17), 3485–3487 (2011).

**10.** J. Vargas, J. A. Quiroga, C. O. S. Sorzano, J. C. Estrada, and J. M. Carazo, “Two-step demodulation based on the Gram-Schmidt orthonormalization method,” Opt. Lett. **37**(3), 443–445 (2012).

**11.** Z. Cheng and D. Liu, “Fast and accurate wavefront reconstruction in two-frame phase-shifting interferometry with unknown phase step,” Opt. Lett. **43**(13), 3033–3036 (2018).

**12.** O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, (Springer, 2015), pp. 234–241.

**13.** P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition (IEEE, 2017), pp. 1125–1134.

**14.** K. Wang, Y. Li, Q. Kemao, J. Di, and J. Zhao, “One-step robust deep learning phase unwrapping,” Opt. Express **27**(10), 15100–15115 (2019).

**15.** J. C. Zhang, X. B. Tian, J. B. Shao, H. B. Luo, and R. G. Liang, “Phase unwrapping in optical metrology via denoised and convolutional segmentation networks,” Opt. Express **27**(10), 14903–14912 (2019).

**16.** K. Yan, Y. Yu, T. Sun, A. Asundi, and Q. Kemao, “Wrapped phase denoising using convolutional neural networks,” Opt. Laser Eng. **128**, 105999 (2020).

**17.** D. Kando, S. Tomioka, N. Miyamoto, and R. Ueda, “Phase Extraction from Single Interferogram Including Closed-Fringe Using Deep Learning,” Appl. Sci. **9**(17), 3529 (2019).

**18.** D. Malacara, M. Servin, and Z. Malacara, *Interferogram analysis for optical testing* (CRC press, 2005).

**19.** Z. Ge, F. Kobayashi, S. Matsuda, and M. Takeda, “Coordinate-transform technique for closed-fringe analysis by the Fourier-transform method,” Appl. Opt. **40**(10), 1649–1657 (2001).

**20.** L. Huang, C. Rao, and W. Jiang, “Modified Gaussian influence function of deformable mirror actuators,” Opt. Express **16**(1), 108–114 (2008).

**21.** A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in Proceedings of the International Machine Learning Society, 30 (2013).

**22.** X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proceedings of the fourteenth international conference on artificial intelligence and statistics (2011), pp. 315–323.

**23.** S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv:1502.03167 [cs] (2015).

**24.** N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” J. Mach. Learn. Res. **15**, 1929–1958 (2014).

**25.** D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and accurate deep network learning by exponential linear units (elus),” arXiv:1511.07289 [cs.LG] (2015).

**26.** D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations (ICLR), (San Diego, 2015).