## Abstract

Interferogram demodulation is a fundamental problem in optical interferometry. It is still challenging to obtain high-accuracy phases from a single-frame interferogram that contains closed fringes. In this paper, we propose a neural network architecture for single-frame interferogram demodulation. Furthermore, instead of using real experimental data, an interferogram generation model is constructed to generate the dataset for the network's training. A four-stage training strategy adopting appropriate optimizers and loss functions is developed to guarantee the high-accuracy training of the network. The experimental results indicate that the proposed method can achieve a phase demodulation accuracy of 0.01 λ (root mean square error) for actual interferograms containing closed fringes.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Optical interferometry is an essential type of non-contact surface profilometry. Coherent light waves produce an interferogram with bright and dark fringes through interference. The pattern of these fringes reflects the surface profile information of the measured object. By interferogram demodulation, the phase distribution, and hence the surface profile of the measured object, can be obtained. Interferogram demodulation is a pivotal issue in optical interferometry because its accuracy directly determines the performance of the final measurement. Dynamic surface metrology [1] is becoming increasingly urgent and important with the development of advanced optical systems, such as the new generation of astronomical telescopes. For modern aspheric surfaces with various shapes and high accuracy, in-process metrology is significant for guiding the fabrication procedure, so an instantaneous solution of the surface figure error from a single interferogram is critical.

An interferogram is a kind of two-dimensional (2D) sinusoidal fringe pattern. After decades of research, there are two significant categories of classic demodulation methods for 2D sinusoidal fringes: the phase-shifting (PS) methods [2–5] and the spatial phase-demodulation (SPD) methods [6–8], such as the Fourier transform method. The PS methods extract phases by acquiring multiple fringe patterns. They have good robustness against image noise, high accuracy, which is usually better than 0.005 λ (root mean square (RMS) error), and insensitivity to background illumination. Phase shifting is generally achieved through mechanical moving parts, such as piezoelectric ceramics, so several seconds are required to capture enough fringe patterns and the measurement is sensitive to environmental vibration. Consequently, PS methods are difficult to adopt in dynamic or *in-situ* measurement in optical manufacturing. Compared with the PS methods, the SPD methods only need a single-frame fringe pattern to extract the phase, enabling dynamic measurement. The phase information of a single-frame fringe pattern is spatially distributed and mainly represented by the gradient relationship of the pixel values between adjacent pixels of the fringe pattern. Consequently, the SPD methods are sensitive to image noise and uneven background illumination, and their accuracy (about 0.05 λ RMS) is usually lower than that of the PS methods. Moreover, the SPD methods cannot handle fringe patterns with closed fringes due to their internal principle. Interferograms with closed fringes and uneven background illumination are inevitable, and thus the application of the SPD methods is limited.

To effectuate high-accuracy phase demodulation of single-frame interferograms with closed fringes, several methods have been proposed. The most straightforward way is converting the closed fringes into open ones by adding a suitable carrier wave, e.g., by largely tilting the reference plane, and then utilizing the conventional SPD strategy to achieve phase demodulation [9]. Nevertheless, carrier waves are not practical for all closed fringes, because they bring in extra retrace error [9]. Moreover, a supplementary carrier wave may produce fine, low-contrast fringes whose spatial frequency exceeds the resolution of the detector [10] and limits the dynamic range of the interferometer measurement. Polar coordinate transformation [10], regularized frequency stabilization [11], phase unwrapping [12], phase stitching [13], the generalized regularized phase tracker [14], optimization [15], and symmetric phase processing [16] can also achieve demodulation of a single-frame fringe pattern with closed fringes. These methods have their limitations: the methods in Refs. [10,12,13,15,16] can only process interferograms with some special modes; the methods in Refs. [11,13,14] require manual settings of the parameters, and the demodulation accuracy depends on the correct selection of those parameters. The accuracies of the methods in Refs. [11–16] were distributed between about 0.02 λ and 0.2 λ. It is still challenging to solve the phase of closed fringes by a conventional method with high adaptability to various pattern modes and with higher accuracy.

Besides the conventional methods, deep learning has also been showing attractive potential in phase demodulation. In the past decade, deep learning technology has made significant progress and is used in various fields, including the calibration of surface misalignment in general interferometers [17] and phase unwrapping [18–22]. Deep learning technology is essentially an optimization method especially suitable for non-linear problems or problems that cannot be described with analytical models. The phase demodulation problem is a typical non-linear inverse problem. Deep learning technology can find the internal connection between the input and the output of the problem and infer the best solution from groups of inputs and outputs, i.e., the training set. On this basis, Kando [23] proposed a single-frame interferogram demodulation method compatible with closed fringes based on deep learning. The relative accuracy of the method was 2.7%, which was evaluated by the relative root mean square error. Feng [24] and Qiao [25] proposed single-frame sinusoidal fringe demodulation methods based on deep learning and applied them to fringe projection profilometry and phase measuring deflectometry. These methods enabled the demodulation of dense or straight open projection sinusoidal fringes, and the accuracies were about 0.016 λ RMS and 0.0065 λ (mean absolute error), respectively.

In summary, ultra-high-accuracy optical manufacturing has put forward a demand for measurement accuracy higher than 0.01 λ, and the high-accuracy demodulation of a single-frame interferogram with closed fringes is still a challenging problem.

We propose a high-accuracy deep-learning-based single-frame interferogram demodulation method with good adaptability. The method uses a neural network designed specifically to handle interferograms with closed fringes while remaining compatible with open fringes. Instead of training the network with real experimental data from various situations, which are difficult to obtain, an interferogram generation model is constructed to generate a simulated dataset for the network's training. A set of specific loss functions is also proposed to achieve high-accuracy training of the network. The remainder of this paper is organized as follows. Section 2 introduces the deep learning framework for interferogram demodulation, including the dataset, network structure, and training process. Section 3 includes experimental results and detailed analysis. Section 4 discusses the similarities and differences between our proposed method and previous fringe demodulation methods based on deep learning. Section 5 summarizes the work of this article and proposes directions for future research.

## 2. Method

The schematic flow chart of the proposed method is shown in Fig. 1. For an interferogram to be processed, we first normalize the pixel values of the interferogram to obtain the normalized fringe pattern. Then, we obtain the wrapped phase from the normalized fringe pattern through a neural network. The dataset for the training is obtained by an interferogram generation model. Finally, we unwrap the wrapped phase to obtain the absolute phase of the interferogram.

It is worth emphasizing that the neural network aims to obtain the wrapped phase, so the network's output is limited to a fixed range [−π, π], rather than the uncertain range of the final absolute phase. Limiting the neural network output to a fixed range can improve the network's numerical stability, thereby improving the accuracy of the network.

Next, the normalization method, interferogram generation model, dataset, neural network structure, neural network training, and unwrapping method are introduced in detail.

#### 2.1 Normalization

Actual interferograms usually have different brightness and contrast. The normalization operation scales the grayscale range of interferograms to [0, 1], so the network does not need to adapt to a dynamic input range. From the dataset perspective, normalization extracts the critical information about the spatial variation of the phase in the interferogram, avoids meaningless diversity in the gray range of the dataset, and increases the information density of the dataset. Normalization is also applied in the training of the subsequent neural networks. During the normalization, a 5×5 median filter is applied to suppress local noise so that reliable maximum and minimum values can be obtained.
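A minimal sketch of this normalization step (the function name and the use of SciPy's median filter are our choices; the paper does not give code):

```python
import numpy as np
from scipy.ndimage import median_filter

def normalize_interferogram(img: np.ndarray) -> np.ndarray:
    """Scale the grayscale range of an interferogram to [0, 1].

    A 5x5 median filter suppresses local noise before the reliable
    maximum and minimum are taken; the raw image is then rescaled
    and clipped to [0, 1]."""
    img = img.astype(np.float64)
    filtered = median_filter(img, size=5)
    lo, hi = filtered.min(), filtered.max()
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)
```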

#### 2.2 Interferogram generation model

The dataset is the foundation of the neural network, and it plays a critical role in the final performance of the neural network. We use simulation to generate the dataset and improve the proposed method's applicability and expansion potential. The interferometer generally uses a laser light source, so the background light intensity of actual interferograms generally conforms to a Gaussian distribution. The laser light source also introduces complex noise into the interferogram, such as speckle noise. We analyze actual interferograms captured by a ZYGO interferometer, then model the light intensity unevenness and noise of the interferogram to propose an interferogram generation model.

The interferogram is generated by the interference between the two beams of the reference arm and the measuring arm inside the interferometer, as shown in Eq. (1):

$$I(x,y) = {I_1}(x,y) + {I_2}(x,y) + 2\sqrt{{I_1}(x,y){I_2}(x,y)}\cos[\Delta\phi(x,y)],\tag{1}$$

where $I(x,y)$ is the light intensity of the interferogram, ${I_1}(x,y)$ is the light intensity of the reference arm, ${I_2}(x,y)$ is the light intensity of the measuring arm, and Δ*ϕ* is the absolute phase. We assume that the background light intensity provided by the interferometer light source is ${I_{\textrm{bg}}}(x,y)$, the modulation of the light intensity of the reference arm optical path is $a(x,y)$, the modulation of the light intensity of the measurement arm optical path is $b(x,y)$, and the modulation of the measured mirror’s reflection is $c(x,y)$. Then,

$${I_1}(x,y) = a(x,y)\,{I_{\textrm{bg}}}(x,y),\qquad {I_2}(x,y) = b(x,y)\,c(x,y)\,{I_{\textrm{bg}}}(x,y).$$

We ignore the modulation of the light intensity reflected by the mirror to simplify the model, so $c(x,y)$ is a constant equal to 1. Moreover, we consider the modulation of the light intensity of the optical path of both arms to be uniform, so $a(x,y)$ can be denoted as *a*, and $b(x,y)$ can be denoted as *b*. We introduce a parameter *k* = *b*/*a* to express the ratio of the light intensity of the measuring arm to that of the reference arm. Thus, we have

$$I(x,y) = a\,{I_{\textrm{bg}}}(x,y)\left[1 + k + 2\sqrt{k}\cos\Delta\phi(x,y)\right].$$

The dataset needs the normalized interferogram, which is denoted as ${I_\textrm{f}}(x,y)$, so the normalization operation is performed on $I(x,y)$; the constant factor *a* in $I(x,y)$ can then be ignored. Thus, we have

$${I_\textrm{f}}(x,y) \propto {I_{\textrm{bg}}}(x,y)\left[1 + k + 2\sqrt{k}\cos\Delta\phi(x,y)\right].$$

In this model, the absolute phase Δ*ϕ*, the background light intensity ${I_{\textrm{bg}}}(x,y)$, and the light intensity proportionality factor *k* are randomly generated. The entire generation process of the dataset is shown in Fig. 2.
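Under the two-beam model described above, where the normalized fringe pattern is proportional to ${I_{\textrm{bg}}}(1 + k + 2\sqrt{k}\cos\Delta\phi)$, a simulated fringe pattern can be synthesized as follows (an illustrative sketch: the function name and the final min-max rescaling to [0, 1] are our assumptions):

```python
import numpy as np

def fringe_intensity(i_bg: np.ndarray, delta_phi: np.ndarray,
                     k: float = 0.2) -> np.ndarray:
    """Two-beam interference intensity I_bg*(1 + k + 2*sqrt(k)*cos(dphi)),
    with the uniform modulation factor a dropped (it cancels under
    normalization), then min-max scaled to [0, 1]."""
    raw = i_bg * (1.0 + k + 2.0 * np.sqrt(k) * np.cos(delta_phi))
    return (raw - raw.min()) / (raw.max() - raw.min())
```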

Next, we address how to generate the background light intensity ${I_{\textrm{bg}}}(x,y)$. We consider that the laser light source of the interferometer introduces a Poisson noise, so the background light intensity ${I_{\textrm{bg}}}(x,y)$ is a Gaussian distribution ${I_\textrm{d}}(x,y)$ added to the Poisson noise $P(x,y)$. That is,

$${I_{\textrm{bg}}}(x,y) = {I_\textrm{d}}(x,y) + P(x,y).$$

The maximum value of the Gaussian distribution is 1, the minimum value is the parameter *p*_{u}, and the distance of the maximum pixel from the center of the image is the parameter *p*_{b}. We use Gaussian noise to simulate the Poisson noise and introduce a parameter *p*_{n} to reflect the noise intensity. That is, we replace each pixel of the distribution ${I_\textrm{d}}(x,y)$ with a random value generated by a Gaussian distribution with mean ${I_\textrm{d}}(x,y)$ and standard deviation ${p_\textrm{n}}{I_\textrm{d}}(x,y)$. In practice, a Gaussian filter with a standard deviation of 1 is applied to the simulated background light intensity to obtain the final background light intensity and make it closer to the actual background light intensity captured by the ZYGO interferometer.
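The background-generation steps above can be sketched as follows (the exact mapping from *p*_{u} and *p*_{b} to the Gaussian profile's width and offset is our interpretation, not the paper's formula):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def background_intensity(size=256, p_u=0.6, p_b=0.0, p_n=0.05, rng=None):
    """Gaussian illumination profile (maximum 1, minimum ~p_u, peak offset
    by p_b pixels from center) plus per-pixel Gaussian noise of rate p_n,
    smoothed by a Gaussian filter with standard deviation 1."""
    rng = np.random.default_rng() if rng is None else rng
    yy, xx = np.mgrid[0:size, 0:size]
    r2 = (xx - size / 2 - p_b) ** 2 + (yy - size / 2) ** 2
    sigma2 = -r2.max() / (2.0 * np.log(p_u))  # so the profile minimum is p_u
    i_d = np.exp(-r2 / (2.0 * sigma2))
    noisy = rng.normal(loc=i_d, scale=p_n * i_d)  # noise std scales with intensity
    return gaussian_filter(noisy, sigma=1.0)
```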

The absolute phase Δ*ϕ* is a random defocus and tilt aberration, and the corresponding interferogram can contain open or closed fringes. We use the second, third, and fourth terms of the Zernike polynomial to generate a random phase distribution, and we denote the corresponding Zernike coefficients as *z*_{2}, *z*_{3}, and *z*_{4}. The defocus-and-tilt phase distribution *φ* can be obtained by taking random values for *z*_{2}, *z*_{3}, and *z*_{4}. Then, we introduce the parameter phase range *p*_{r} and the parameter initial piston phase *ϕ*_{0}. Finally, we scale *φ* to [0, *p*_{r}] and add the initial phase *ϕ*_{0} to obtain the absolute phase Δ*ϕ*.
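A hedged sketch of this phase generation, under the common fringe-indexing convention where the second to fourth Zernike terms are x-tilt, y-tilt, and defocus (the function name and coefficient sampling range are our assumptions):

```python
import numpy as np

def random_tilt_defocus_phase(size=256, p_r=30.0, phi_0=0.0, rng=None):
    """Random absolute phase built from the 2nd-4th Zernike terms
    (x-tilt, y-tilt, defocus), scaled to [0, p_r] and offset by the
    initial piston phase phi_0."""
    rng = np.random.default_rng() if rng is None else rng
    z2, z3, z4 = rng.uniform(-1.0, 1.0, size=3)
    x = np.linspace(-1.0, 1.0, size)
    xx, yy = np.meshgrid(x, x)
    phi = z2 * xx + z3 * yy + z4 * (2.0 * (xx ** 2 + yy ** 2) - 1.0)
    phi = (phi - phi.min()) / (phi.max() - phi.min()) * p_r  # scale to [0, p_r]
    return phi + phi_0
```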

The above parameters comprise the proposed interferogram generation model.

#### 2.3 Dataset

We need to determine each parameter in the interferogram generation model to generate a suitable dataset. Thus, we analyze the actual interferograms captured by the ZYGO interferometer and estimate the various parameters in the model accordingly. First, we remove any measured mirror and capture a background intensity image reflected by the reference mirror of the interferometer. The unevenness *p*_{u} and the bias *p*_{b} can be estimated by fitting the background intensity image to a Gaussian distribution. Then we subtract the background image from the fitted Gaussian distribution to obtain the background noise. A histogram can be obtained by counting the background noise's pixel values. We fit the histogram to a Gaussian distribution, and the noise rate *p*_{n} can be estimated from the standard deviation of that Gaussian distribution. For the ratio *k*, we used an empirical value of 0.2. In practice, we adjust these estimated parameters manually to make the simulated interferograms closer to the actual interferograms.

Each generated interferogram in the dataset is 256×256 pixels, and the values of the generation model parameters are shown in Table 1. We also use random values for some parameters to make the dataset more diverse.

#### 2.4 Neural network architecture

The neural network in the proposed method is a convolutional neural network, whose input and output are the normalized interferogram and the normalized wrapped phase, respectively. The input and the output have the same size of 256×256. The structure of the network is inspired by U-Net [26] and DenseNet [27], as shown in Fig. 3.

The network’s main body is a four-layer U-Net structure, and the processing of each layer is realized through DenseBlock. The U-Net structure scales the feature map multiple times to help the network extract features of different scales. This multi-scale capability is beneficial for the network to deal with fringes of different widths. DenseBlock expands the network’s depth and convolution times, enabling the network to extract higher-level, abstract features. U-Net and DenseBlock make extensive use of the concatenation operation, which ensures the transfer of gradients, improves the performance of the network, and shortens the time required for training. We modify ConvBlock in the original implementation of DenseBlock: in our network, the bias of the two Conv2D layers of ConvBlock is enabled. We believe that bias is necessary for regression problems, and practice has proven that the network obtains better results when the bias is enabled. A Clamp layer is placed at the end of the network to limit the output of the network to [0, 1] and ensure that the network produces outputs with expected values. We set the output range of the network to [0, 1] instead of [0, 2π] or [−π, π] to prevent the network from fitting an extra irrational number π, which improves the accuracy of the network. Finally, we implemented the network using PyTorch.
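The modified ConvBlock and the final Clamp layer might look as follows in PyTorch (a sketch based on the standard DenseNet ConvBlock; the channel counts, layer ordering, and bottleneck factor are our assumptions, not the authors' exact code):

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """DenseNet-style conv block, modified as described in the text:
    both Conv2d layers have bias enabled."""
    def __init__(self, in_ch: int, growth: int, bottleneck: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, bottleneck * growth, kernel_size=1, bias=True),
            nn.BatchNorm2d(bottleneck * growth), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck * growth, growth, kernel_size=3,
                      padding=1, bias=True),
        )

    def forward(self, x):
        # DenseBlock concatenates each block's output with its input
        return torch.cat([x, self.net(x)], dim=1)

class Clamp(nn.Module):
    """Final layer: limit the network output to [0, 1]."""
    def forward(self, x):
        return torch.clamp(x, 0.0, 1.0)
```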

#### 2.5 Training

We propose a four-stage strategy to train the proposed neural network, with different optimization methods, loss functions, and learning rates at different stages. As shown in Table 2, each stage includes 500 training steps.

The loss functions are defined as follows:

In terms of optimization methods, we first use adaptive moment estimation (Adam) [28] and then switch to stochastic gradient descent (SGD). We also reduce the learning rate in stages. We can achieve better results by combining Adam and SGD than by using them individually [29]. During the training process, a brand-new dataset sample is generated at each training step to avoid overfitting. Each training step contains 20 samples. We use an NVIDIA Quadro M6000 graphics card for the training.
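The staged switch from Adam to SGD could be wired up as follows (an illustrative sketch: the per-stage learning rates below are placeholders, not the values from the paper's Table 2):

```python
import torch
import torch.nn as nn

def make_stage_optimizer(model: nn.Module, stage: int):
    """Return the optimizer for one of the four training stages:
    Adam for the early stages, then SGD with reduced learning rates.
    The learning-rate values here are assumptions for illustration."""
    schedule = [
        (torch.optim.Adam, 1e-3),
        (torch.optim.Adam, 1e-4),
        (torch.optim.SGD, 1e-4),
        (torch.optim.SGD, 1e-5),
    ]
    opt_cls, lr = schedule[stage]
    return opt_cls(model.parameters(), lr=lr)
```

Combining an adaptive optimizer for fast initial convergence with SGD for final refinement matches the cited motivation [29].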

#### 2.6 Unwrapping

The normalized wrapped phase output by the neural network is in the range [0, 1], and the desired absolute phase can be obtained after unwrapping. Current phase unwrapping algorithms generally expect phase data with 2π jumps, so we first multiply the wrapped phase by 2π and then use Miguel Arevallilo Herraez's fast 2D phase unwrapping method [30] to obtain the final absolute phase.
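A minimal sketch of this step, assuming scikit-image's `unwrap_phase` (which implements the 2D method of Herraez et al. [30]) as the unwrapper; the function name and the [−π, π) shift are our choices:

```python
import numpy as np
from skimage.restoration import unwrap_phase

def unwrap_network_output(normalized_wrapped: np.ndarray) -> np.ndarray:
    """Convert the network's [0, 1] wrapped-phase output to the absolute
    phase: scale to [-pi, pi) so the 2*pi jumps are explicit, then apply
    a 2D phase unwrapper."""
    wrapped = normalized_wrapped * 2.0 * np.pi - np.pi
    return unwrap_phase(wrapped) + np.pi  # undo the -pi shift
```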

## 3. Results

After the network training, we used the simulated and actual interferograms to test the proposed demodulation method’s accuracy. We also analyzed the possible sources of error.

The loss function curve during the network training process is shown in Fig. 4. At the end of the loss function curve, the test curve and the training curve maintained the same downward trend. Moreover, no apparent difference was observed between the test loss value and the training loss value, indicating that the network did not overfit during training.

We used the simulated and actual interferograms to test the trained network and quantitatively evaluate the network's accuracy. However, the neural network's output in the proposed method was the wrapped phase of the interferogram. Thus, we unwrapped this phase to obtain the corresponding absolute phase and compared it with the ideal absolute phase. The unwrapping method used in our proposed method is introduced in Section 2.6.

#### 3.1 Simulated interferogram test

We used the simulated interferograms to test the network. We used the interferogram generation method mentioned in Section 2.2 to generate 1000 sets of the simulated interferograms and the corresponding ideal absolute phase. We introduced the 1000 interferograms as input to the trained neural network and unwrapped the wrapped phases of the network output to obtain the predicted absolute phases. Then, we calculated the point-to-point error between the predicted absolute phase and the ideal absolute phase. The distribution of the peak-to-valley (PV) and RMS error for 1000 groups is shown in Fig. 5. The average PV value of the error was 0.6074 rad, and the average RMS value was 0.0289 rad. These values indicated that the proposed method’s RMS accuracy was better than 0.0050 λ under the simulated interferograms.
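The PV and RMS statistics used in this test can be computed as follows (a sketch; removing the mean (piston) offset before computing the error is our assumption, since piston is not measurable from a single interferogram):

```python
import numpy as np

def phase_errors(predicted: np.ndarray, ideal: np.ndarray):
    """Point-to-point error statistics: peak-to-valley (PV) and
    root-mean-square (RMS) of the deviation, after removing the
    mean (piston) offset."""
    err = predicted - ideal
    err = err - err.mean()  # piston offset is not measurable
    pv = float(err.max() - err.min())
    rms = float(np.sqrt(np.mean(err ** 2)))
    return pv, rms
```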

To show the results intuitively, we selected four groups from the 1000 sets of simulated interferograms: two groups of good results (Fig. 6(a1)-(d1) and Fig. 6(a2)-(d2)) and two groups of poor results (Fig. 6(a3)-(d3) and Fig. 6(a4)-(d4)). The quantitative demodulation results of these interferograms and the phase point-to-point deviations are shown in Fig. 6.

#### 3.2 Actual interferogram test

In the actual interferogram test, we used the ZYGO DynaFiz interferometer to obtain several actual interferograms of the same tested lens (Thorlabs concave mirror, interferometer standard spherical lens F#/3.3) under different defocus and tilt conditions. Then, we used the phase-shifting function of the ZYGO interferometer to obtain the corresponding absolute phase. We also used the absolute phase measured by the phase-shifting method as the ideal real value of the interferogram phase. We normalized these actual interferograms in accordance with the method mentioned in Section 2.1. Subsequently, we introduced the normalized interferograms as the input to the trained neural network and unwrapped the wrapped phase of the network output to obtain the predicted absolute phase. Finally, we calculated the point-to-point deviation between the predicted absolute phase and the ideal absolute phase.

The absolute phase measured by the ZYGO interferometer initially has 1200×1200 pixels, and the available data area is about 900×900 pixels during actual measurement. The proposed method's input and output both have 256×256 pixels. We scaled the ZYGO interferogram to 256×256 pixels by bilinear interpolation to serve as the proposed method's input. The absolute phase measured by ZYGO was also scaled to 256×256 pixels by bilinear interpolation. Then we can compare the results of the proposed method and ZYGO.
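One way to perform this bilinear rescaling (a sketch using PyTorch's `interpolate`; the function name is ours, and other bilinear implementations would work equally well):

```python
import torch
import torch.nn.functional as F

def resize_bilinear(arr: torch.Tensor, size=(256, 256)) -> torch.Tensor:
    """Scale a 2D map (interferogram or phase) to the network's 256x256
    size with bilinear interpolation."""
    x = arr[None, None].float()  # add batch and channel dimensions
    y = F.interpolate(x, size=size, mode="bilinear", align_corners=False)
    return y[0, 0]
```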

The demodulation results and phase point-to-point deviations of three typical interferograms are shown in Fig. 7. The average PV value of the point-to-point deviation was 1.1597 rad, and the average RMS value was 0.0641 rad. These values indicated that the proposed method's RMS accuracy reached 0.0102 λ on the actual interferograms. In addition, the average relative RMS error of the measured absolute phase was 1%.

For comparison, the interferograms were also demodulated with the classical SPD method [6]. The demodulation results and phase point-to-point deviations of the three typical interferograms are shown in Fig. 8. The demodulation result is clearly incorrect when the fringes are closed, as shown in Fig. 8(a1)-(d1). Large errors appear at the edge of the interferogram when the added carrier is not large enough, as shown in Fig. 8(a2)-(d2). And even for the good open-fringe interferogram, a demodulation error with a PV value greater than 0.27 λ occurs, especially near the edge, because the Fourier transform is sensitive to edges due to the truncation effect, as shown in Fig. 8(a3)-(d3).

#### 3.3 Error analysis

The phase information of a single interferogram is mainly represented by the gradient relationship of the pixel values between adjacent pixels of the interferogram. Consequently, the accuracy of single-frame interferogram demodulation methods is easily affected by uneven background illumination and by light-source noise. The loss of accuracy is closely related to the density of the interference fringes. For dense fringes, local noise can easily destroy the gradient relationship of the pixels because of the limited number of pixels available per fringe, distorting the phase information. Sparse fringes, though abundant in pixels, are sensitive to uneven illumination; the widening of the fringes also decreases the pixel gradients, making them more susceptible to noise and thus decreasing the demodulation accuracy.

The relationship between the phase range of the interferogram and the RMS value of the point-to-point deviation of the absolute phase in the simulated interferogram test is shown in Fig. 9. The proposed method had the best accuracy for the interferograms with moderately dense fringes. The accuracy of demodulation would reduce when the fringes become sparse or dense. However, the decreased accuracy caused by sparse fringes was more evident than that by dense fringes. In general, the accuracy of dense fringes was still better than 0.0080 λ, and the accuracy of sparse stripes was still better than 0.0160 λ.

We also analyzed the error of the proposed method from the perspective of Zernike fitting residuals. We calculated the 37-item Zernike polynomial fitting residuals of the two sets of samples in the actual interferogram test. The results are shown in Fig. 10.

The residual results showed that the proposed method exhibited high-frequency errors related to the distribution of the interference fringes. The error was more evident at the brightest and darkest fringes. The error in the sparse region was more significant than that in the dense region, and the error in the bright fringe region was more significant than that in the dark fringe region. These phenomena were in line with our expectations. The gradient of the pixel values at the bright and dark fringes was close to zero, which made them more susceptible to noise, and sparse fringes further reduced the gradient. In our proposed interferogram generation model, the laser light source introduced a Poisson noise, so the pixels with larger grayscale values in the interferogram had a higher noise level. This could be the reason why the error at the bright fringes was more significant than that at the darker fringes.

#### 3.4 Generalization analysis

Deep learning approaches often have generalization problems. A neural network has good accuracy only when the input is sufficiently similar to the training set; if the input is too far from the training set, the accuracy drops rapidly. A neural network with good generalization ability can maintain its output accuracy on diverse inputs. This section analyzes the proposed method's generalization ability from two aspects: different fringe patterns and variable noise levels.

We regenerated some diverse patterns of simulated interferograms. These interferograms contain not only defocus and tilt aberrations but also higher-order terms of the Zernike polynomials, such as astigmatism, coma, and spherical aberration, so the fringe patterns in these interferograms are not ideally circular. We verified the proposed method on these diverse simulated interferograms. The demodulation results and phase point-to-point deviations of three typical diverse simulated interferograms are shown in Fig. 11. The average PV value of the point-to-point deviation was 0.5272 rad, and the average RMS value was 0.0279 rad. These values indicated that the proposed method's RMS accuracy reached 0.0044 λ on the typical diverse simulated interferograms, which demonstrates the generalization ability of the proposed neural network over different ring-like patterns. However, if the interferogram displays characteristics corresponding to pure coma or astigmatism aberrations, the network may fail in the phase demodulation.

We also consider the influence of the noise level of the interferogram on the proposed method. When generating the dataset for training, the noise rate *p*_{n} is set to 0.05, which matches the noise level of the actual interferometer used in the experiment. We regenerate a series of simulated interferograms with different noise rates (0.05 to 0.4). Figure 12(a)-(e) shows two typical interferogram patterns with different noise rates.

Moreover, the relationship between the noise rate and the demodulation accuracy is shown in Fig. 12(f). As the noise rate increases, the demodulation accuracy of the proposed method decreases continuously. When the noise rate is less than 0.3, the proposed method can achieve a demodulation accuracy better than 0.01 λ, which indicates that the proposed method can perform high-accuracy demodulation of simulated interferograms whose noise level is higher than that of the dataset, within a specific range.

The above experiments show that the proposed method has a certain generalization ability: it can adapt to some interferograms containing higher-order terms of the Zernike polynomials, and to interferograms with a moderately higher noise level.

## 4. Discussion

#### 4.1 When hardware is involved in deep learning

There are two typical technical routes for hardware-related neural network problems, and the difference between them is mainly reflected in the dataset. The first route is the hardware-oriented route, wherein the dataset consists entirely of real data from the same hardware. The neural network is then a dedicated system for that hardware and can only work with it. The advantage of this scheme is that the accuracy of the network is high. Given that the dataset comes entirely from the same hardware, it contains the characteristics of the hardware itself, such as the aberration of the optical system, mechanical vibration, and digital camera noise. Fitting these hardware characteristics entirely and accurately with manual modeling is often complicated, but the neural network can find and adapt to them. However, the shortcomings of this solution are also apparent: the hardware system must not change once the network has completed training; otherwise, the network may fail. The second route involves the use of a simulated dataset, where the simulation does not consider the inherent characteristics of specific hardware. A network trained in this way has better compatibility with different hardware, but its performance on specific hardware is not as good as that of the hardware-oriented route.

The proposed method can be regarded as a compromise between the two existing routes. The network used in this study is hardware-oriented: the proposed method uses simulation to generate datasets for specific hardware, but the simulation also considers the specific hardware's inherent characteristics. In this way, the advantages of the two routes are combined, maintaining the high performance of the hardware-oriented route while accepting changes to the hardware system within a specific range. Because the dataset in this article is based on an interferogram generation model, this model considers the diversity of the hardware itself to a certain extent. For example, the background light intensity parameters in our dataset are obtained by estimating the actual background light intensity of the interferometer hardware. Because those parameters are estimated and inaccurate, we use random numbers to make them float within a specific range when generating the dataset. The network's accuracy can be maintained as long as a parameter's accurate value falls within the floating range. Furthermore, the floating parameters make the network learn the variant hardware features. Consequently, the proposed method can accept a specific degree of hardware change without retraining the network. For example, the same type of light source in the interferometer can be replaced, or multiple interferometers of the same type can use the same network in mass production. This hardware adaptability is highly desirable in a production environment.

#### 4.2 Neural network architecture design for regression problems

Deep learning approaches are mainly classified into two significant categories: classification problems and regression problems. The most fundamental difference between the two categories is that regression problems need a quantitative result, whereas classification problems do not. How to improve a neural network's quantitative accuracy is an interesting question, and interferogram demodulation is a kind of regression problem. We found some practical ways to improve a neural network's quantitative accuracy during the research for this paper. These ways may be useful in other deep-learning-based regression problems.

The first point is to make the problem as simple as possible. It is always easier for a neural network to achieve good accuracy on a simple problem than on a complicated one. For example, we design the neural network to output the wrapped phase rather than the absolute phase. If the network directly output the absolute phase, it would need to solve two problems simultaneously: phase retrieval and phase unwrapping. Because phase unwrapping has mature solutions, we do not need the neural network to handle it. The proposed network can thus focus on the phase retrieval problem, which leads to better accuracy.
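
As a minimal illustration of the distinction (not the paper's code), the wrapped target can be produced from any absolute phase with a one-line complex-exponential trick; recovering the discarded 2π offsets is the separate, mature unwrapping step the network is spared from:

```python
import numpy as np

def wrap(phase):
    """Map an absolute phase onto (-pi, pi], the principal range a
    wrapped-phase network would output."""
    return np.angle(np.exp(1j * phase))

# An absolute phase outside the principal range ...
absolute = 3.5 * np.pi
# ... wraps to an equivalent value inside (-pi, pi]; here 3.5*pi -> -0.5*pi.
wrapped = wrap(absolute)
```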

The second point is to use normalized input and output. Normalization has several benefits for accuracy. After normalization, all inputs share the same grayscale range, which simplifies the problem because the network does not need to adapt to a varying input range. Besides, limiting the network output to a fixed range improves its numerical stability and thus its accuracy. Because the proposed network outputs the wrapped phase, [−π, π] or [0, 2π] would be a straightforward output range. However, [0, 1] is a better choice: the normalized range [0, 1] spares the network from fitting the extra irrational constant π and improves its accuracy.
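
The mapping between the wrapped-phase range and the normalized target range can be sketched as follows (a hypothetical helper pair, not the paper's exact code):

```python
import numpy as np

def normalize_phase(wrapped):
    """Map a wrapped phase in [-pi, pi] to the network target range [0, 1],
    so the network never has to fit the constant pi itself."""
    return (wrapped + np.pi) / (2.0 * np.pi)

def denormalize_phase(y):
    """Invert the mapping after inference to recover the wrapped phase."""
    return y * 2.0 * np.pi - np.pi
```

The π constant appears only in this fixed pre/post-processing pair, outside the learned mapping.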

The last point is to ensure that the neural network has a sufficient receptive field. Because sparse fringes span more pixels than dense ones, the network may not handle sparse fringes correctly if its receptive field is too small. The U-Net and DenseNet structures in the proposed network enlarge the receptive field and thereby improve accuracy.
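
How quickly the receptive field grows under stacked convolution and down-sampling layers can be checked with a small sketch (the layer list below is hypothetical, not the paper's exact architecture):

```python
def receptive_field(layers):
    """Compute the receptive field of a stack of conv/pool layers.
    `layers` is a list of (kernel_size, stride) pairs applied in order."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the field by (k-1) input-space steps
        jump *= s             # striding/pooling multiplies the step size
    return rf

# Two 3x3 convs see a 5x5 input window; inserting a 2x2 down-sampling
# between conv pairs doubles the jump, so deeper stages widen the field
# rapidly -- the reason U-Net-style encoders can cover sparse fringes.
shallow = receptive_field([(3, 1), (3, 1)])                          # 5
deeper = receptive_field([(3, 1), (3, 1), (2, 2), (3, 1), (3, 1)])   # 14
```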

#### 4.3 Interferogram and projection sinusoidal fringe pattern

The interferogram and the projected sinusoidal fringe pattern have many similarities. Both are 2D sinusoidal fringe patterns, and some methods, such as the phase-shifting method, apply to both scenarios. However, the two differ in important ways, so some specific issues still need to be considered when designing neural networks. First, their generation mechanisms are fundamentally different. Interference is a natural physical phenomenon, whereas projected sinusoidal fringe patterns are generated artificially. The direction and the width of the stripes can be fixed artificially when projecting sinusoidal stripes, and projecting straight stripes fundamentally avoids the appearance of closed fringes, which significantly reduces the complexity of the pattern. In an interferogram, the fringes' direction and width vary, and the appearance of closed fringes cannot be controlled artificially. Consequently, the characteristics of the interferogram are more complicated than those of projected sinusoidal fringe patterns. More feature maps, deeper networks, and sufficient multi-scale capability are often required when dealing with interferograms, which leads to a more complex network structure.

#### 4.4 Phase sign uncertainty in interferogram demodulation

Single-frame interferogram demodulation methods inherently suffer from an uncertain phase sign. The demodulation aims to recover $\Delta \phi $ from $\cos (\Delta \phi )$, but the cosine function, being even, discards the sign of $\Delta \phi $. It is impossible to determine from a single-frame interferogram whether $\cos (\Delta \phi )$ corresponds to $\Delta \phi $ or $- \Delta \phi $. Our proposed method inherits this limitation. During demodulation, we consider $\Delta \phi $ and $- \Delta \phi $ simultaneously and introduce correlation coefficients to deal with the uncertain phase sign. In practical interferometric measurement, a small phase shift or a priori knowledge can determine the phase sign.
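
The ambiguity and a correlation-based sign selection can be illustrated with a small sketch (the `pick_sign` helper is a hypothetical illustration of the idea; the paper's correlation procedure may differ):

```python
import numpy as np

phi = np.linspace(-np.pi, np.pi, 256)

# Cosine is even, so a single interferogram cannot distinguish phi from -phi.
assert np.allclose(np.cos(phi), np.cos(-phi))

def pick_sign(candidate, reference):
    """Return +candidate or -candidate, whichever correlates better with a
    reference phase (e.g. from a small extra phase shift or a priori knowledge)."""
    c_pos = np.corrcoef(candidate, reference)[0, 1]
    c_neg = np.corrcoef(-candidate, reference)[0, 1]
    return candidate if c_pos >= c_neg else -candidate
```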

#### 4.5 Resolution of the proposed method compared with the PS method

The spatial resolution of any interferogram demodulation method is fundamentally limited by the number of pixels in the interferogram. For interference systems with the same optical aperture, the spatial resolution is generally higher if the detector has more pixels. The proposed method is based on deep learning, which involves heavy computation, so a graphics card with a graphics processing unit (GPU) is usually needed. The graphics card memory limits the size of the interferogram that can be processed, so we can only handle 256×256 interferograms at present. Under current hardware technology, graphics card memory is still an expensive resource, which may practically limit the interferogram size the proposed method can handle; a powerful graphics card is much more expensive than a medium-performance central processing unit (CPU). In contrast, the computational complexity and hardware requirements of the PS method are much lower. The PS method can be easily implemented on most medium-performance CPUs and can therefore handle large interferograms more easily. From this perspective, in practice, the PS method achieves higher spatial resolution more easily under limited computational hardware.

Moreover, the proposed method uses only a single-frame interferogram, which makes it more sensitive to image noise and uneven background illumination than the PS method. In principle, a single-frame interferogram is more likely to lose local or high-frequency phase information to the noise. This property reduces both the spatial resolution and the vertical resolution of the proposed method.

In summary, we think the theoretical accuracy of the proposed method will not exceed that of the PS method. The proposed method is more suitable for occasions that require both high accuracy and dynamic measurement.

#### 4.6 Limitations of the proposed method

The network can only deal with interferograms, with closed or non-closed fringes, produced by defocus and tilt aberrations, because our current dataset contains only these aberrations. Increasing the diversity of the phase distributions in the dataset would make the proposed method applicable to more interferograms with different phase distributions, which may be a future research direction. The network can only handle 256×256 interferograms because of graphics card memory limitations. However, the proposed method does not limit the interferogram size in principle; it can handle larger interferograms as long as a sufficiently powerful graphics card is used.

## 5. Conclusions

We propose a high-accuracy single-frame interferogram demodulation method based on deep learning. The proposed method is compatible with closed fringes of varying density. We also propose a generation model for simulated interferograms to produce the datasets for training the neural network, and we discuss the network structure and its corresponding training strategy. The test results show that the proposed method's RMS accuracy is better than 0.005 λ for simulated interferograms and 0.0102 λ for actual interferograms. We also find that the proposed method has errors related to the distribution of the interference fringes, which are difficult to avoid in single-frame interferogram demodulation. In the future, we aim to increase the dataset's diversity to make the proposed method more adaptable to more complex interferograms.

## Funding

National Natural Science Foundation of China (61705008, 51735002).

## Acknowledgments

The authors appreciate the valuable suggestions from Dr. Shaopu Wang.

## Disclosures

The authors declare no conflicts of interest.

## References

**1. **I. Trumper, B. T. Jannuzi, and D. W. Kim, “Emerging technology for astronomical optics metrology,” Optics and Lasers in Engineering **104**, 22–31 (2018). [CrossRef]

**2. **P. Hariharan, B. F. Oreb, and T. Eiju, “Digital phase-shifting interferometry: a simple error-compensating phase calculation algorithm,” Appl. Opt. **26**(13), 2504–2506 (1987). [CrossRef]

**3. **G. Lai and T. Yatagai, “Generalized phase-shifting interferometry,” J. Opt. Soc. Am. A **8**(5), 822–827 (1991). [CrossRef]

**4. **C. Zuo, S. Feng, L. Huang, T. Tao, W. Yin, and Q. Chen, “Phase shifting algorithms for fringe projection profilometry: A review,” Optics and Lasers in Engineering **109**, 23–59 (2018). [CrossRef]

**5. **Q. Liu, L. Li, H. Zhang, W. Huang, and X. Yue, “Simultaneous dual-wavelength phase-shifting interferometry for surface topography measurement,” Optics and Lasers in Engineering **124**, 105813 (2020). [CrossRef]

**6. **X. Su and Q. Zhang, “Dynamic 3-D shape measurement method: A review,” Optics and Lasers in Engineering **48**(2), 191–204 (2010). [CrossRef]

**7. **K. Qian, “Two-dimensional windowed Fourier transform for fringe pattern analysis: Principles, applications and implementations,” Optics and Lasers in Engineering **45**(2), 304–317 (2007). [CrossRef]

**8. **J. Zhong and J. Weng, “Spatial carrier-fringe pattern analysis by means of wavelet transform: wavelet transform profilometry,” Appl. Opt. **43**(26), 4993–4998 (2004). [CrossRef]

**9. **D. Malacara, *Optical Shop Testing*, 3rd ed. (John Wiley & Sons, Inc., 2007).

**10. **Z. Ge, F. Kobayashi, S. Matsuda, and M. Takeda, “Coordinate-transform technique for closed-fringe analysis by the Fourier-transform method,” Appl. Opt. **40**(10), 1649–1657 (2001). [CrossRef]

**11. **C. Tian, Y. Yang, S. Zhang, D. Liu, Y. Luo, and Y. Zhuo, “Regularized frequency-stabilizing method for single closed-fringe interferogram demodulation,” Opt. Lett. **35**(11), 1837–1839 (2010). [CrossRef]

**12. **J. Muñoz-Maciel, F. J. Casillas-Rodríguez, M. Mora-González, F. G. Peña-Lecona, V. M. Duran-Ramírez, and G. Gómez-Rosas, “Phase recovery from a single interferogram with closed fringes by phase unwrapping,” Appl. Opt. **50**(1), 22–27 (2011). [CrossRef]

**13. **B. Li, L. Chen, J. Bian, and J. Wang, “Processing technology for closed fringe interferogram using phase stitching,” Infrared Laser Eng. **40**, 674 (2011). (in Chinese).

**14. **K. Li and K. Qian, “Improved generalized regularized phase tracker for demodulation of a single fringe pattern,” Opt. Express **21**(20), 24385–24397 (2013). [CrossRef]

**15. **U. H. Rodriguez-Marmolejo, M. Mora-Gonzalez, J. Muñoz-Maciel, and T. A. Ramirez-delreal, “FSD-HSO Optimization Algorithm for Closed Fringes Interferogram Demodulation,” Math. Probl. Eng. **2016**, 1–11 (2016). [CrossRef]

**16. **J. Muñoz-Maciel, V. M. Duran-Ramírez, M. Mora-Gonzalez, F. J. Casillas-Rodriguez, and F. G. Peña-Lecona, “Demodulation of a single closed-fringe interferogram with symmetric wavefront and tilt,” Opt. Commun. **436**, 168–173 (2019). [CrossRef]

**17. **L. Zhang, S. Zhou, J. Li, and B. Yu, “Deep neural network based calibration for freeform surface misalignments in general interferometer,” Opt. Express **27**(23), 33709–33729 (2019). [CrossRef]

**18. **G. E. Spoorthi, S. Gorthi, and R. K. S. S. Gorthi, “PhaseNet: A Deep Convolutional Neural Network for Two-Dimensional Phase Unwrapping,” IEEE Signal Process. Lett. **26**(1), 54–58 (2019). [CrossRef]

**19. **K. Wang, Y. Li, K. Qian, J. Di, and J. Zhao, “One-step robust deep learning phase unwrapping,” Opt. Express **27**(10), 15100–15115 (2019). [CrossRef]

**20. **J. Zhang, X. Tian, J. Shao, H. Luo, and R. Liang, “Phase unwrapping in optical metrology via denoised and convolutional segmentation networks,” Opt. Express **27**(10), 14903–14912 (2019). [CrossRef]

**21. **T. Zhang, S. Jiang, Z. Zhao, K. Dixit, X. Zhou, J. Hou, Y. Zhang, and C. Yan, “Rapid and robust two-dimensional phase unwrapping via deep learning,” Opt. Express **27**(16), 23173–23185 (2019). [CrossRef]

**22. **C. Wu, Z. Qiao, N. Zhang, X. Li, J. Fan, H. Song, D. Ai, J. Yang, and Y. Huang, “Phase unwrapping based on a residual en-decoder network for phase images in Fourier domain Doppler optical coherence tomography,” Biomed. Opt. Express **11**(4), 1760–1771 (2020). [CrossRef]

**23. **D. Kando, S. Tomioka, N. Miyamoto, and R. Ueda, “Phase extraction from single interferogram including closed-fringe using deep learning,” Appl. Sci. **9**(17), 3529 (2019). [CrossRef]

**24. **S. Feng, Q. Chen, G. Gu, T. Tao, L. Zhang, Y. Hu, W. Yin, and C. Zuo, “Fringe pattern analysis using deep learning,” Adv. Photonics **1**(02), 1–7 (2019). [CrossRef]

**25. **G. Qiao, Y. Huang, Y. Song, H. Yue, and Y. Liu, “A single-shot phase retrieval method for phase measuring deflectometry based on deep learning,” Opt. Commun. **476**, 126303 (2020). [CrossRef]

**26. **O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” (Springer International Publishing, 2015), pp. 234–241.

**27. **G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely Connected Convolutional Networks,” in *Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)* (2017), pp. 4700–4708.

**28. **D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980 (2014).

**29. **N. Shirish Keskar and R. Socher, “Improving Generalization Performance by Switching from Adam to SGD,” arXiv:1712.07628 (2017).

**30. **M. A. Herráez, D. R. Burton, M. J. Lalor, and M. A. Gdeisat, “Fast two-dimensional phase-unwrapping algorithm based on sorting by reliability following a noncontinuous path,” Appl. Opt. **41**(35), 7437–7444 (2002). [CrossRef]