Two-dimensional phase unwrapping based on U<sup>2</sup>-Net in complex noise environment

Jie Chen; Yong Kong; Dawei Zhang; Yinghua Fu; Songlin Zhuang

doi:10.1364/OE.500139

1. Introduction

Numerous imaging or measurement techniques, including optical interferometry (OI) [1], magnetic resonance imaging (MRI) [2], fringe projection paper measurements (FPP) [3], and interferometric synthetic aperture radar (InSAR) [4,5], among others, depending on the computation of absolute phase. Typically, the phase of the wavefront, which is computed directly using the inverse tangent function, is wrapped in the range of $[-\pi,\pi ]$ [6]. Therefore, phase unwrapping (PU) must be carried out to acquire the absolute phase. According to Eq. (1), the relationship between the wrapped phase $\varphi (x,y)$ and the absolute phase $\Psi (x,y)$ is satisfied as follows:

(1)$$\Psi(x,y) = \varphi(x,y) + 2\pi k(x,y)$$

where $(x,y)$ is the coordinate of the pixel, and $k$ is the wrap count. In the last decade or so, many PU methods have been proposed, which are divided into the following three main categories: path-tracing algorithms [7–9], path-independent algorithms with minimum-parametric solutions [10,11], and deep learning-based methods [12–35], etc.

The fundamental idea behind the path-tracing method is to do a line integration of the wrapped phase difference along an appropriate and trustworthy integration path that has been chosen with the aid of the Itoh condition. Thus, the goodness of the unwrapped phase of the path-tracing method depends on the optimal path choice. It is difficult to determine a suitable integration path when there is high-level noise or a complex structure in the wrapped phase. Based on global optimization, the minimum-norm method aims to reduce the difference between the wrapped phase gradient produced by the Itoh condition and the genuine phase gradient. Although the minimum-norm method is noise-resistant, the computationally demanding parameters make them unsuitable for real-time measurements. In short, the core of the traditional spatial PU methods, such as the path-tracing method and the minimum parametric method, is to avoid the negative impact of invalid points in the wrapped phase map as much as possible, so they are only suitable for normal cases where the noise is not too severe, or the phases are continuous and not mixed.

In recent years, the rapid development of deep learning has made great progress in many fields. Compared to traditional methods, deep learning-based approaches have demonstrated exceptional problem-solving abilities. In pixel-level computer vision tasks, convolutional neural network (CNN) based deep learning methods tend to outperform other methods. PU is also considered an image-to-image mapping problem, so CNNs can also be used for PU. Jin [12] was the first to propose using deep networks to learn the input-to-output mapping connection from paired datasets. It makes it possible to perform PU in extreme circumstances, such as when the Itoch criterion is not met [36].

Dardikman et al. [13] demonstrated that CNNs can solve the problem of steep phase gradients by unwrapped phase images with steep spatial gradients with a CNN based on residual blocks. Wang et al. [14] proposed a residual UNet network called DLPU-Net by visualizing and analyzing the generalization ability of the network through the intermediate layer for the first time and verified the superiority of this network for PU in extreme cases such as high-level noise and phase aliasing by comparing the results with traditional algorithms. Qin et al. [15] first proposed VUR-Net by alternating two different residual blocks in each layer of UNet, significantly improving PU performance. Spoorthi et al. [17] proposed a PU network based on PhaseNet [16], which requires no post-processing and is highly robust to noise. Zhang and Li [22] proposed the edge-enhanced self-focus network EESANet for PU, which maintains the long-range dependence in PU by spatial pyramid pooling and position self-attentiveness mechanism, and combines edge-enhanced blocks to enhance the effective edge features. Based on the structure of VUR-Net, Zhao et al. [32] proposed a new VDE-Net by fusing dilated convolution and a weighted jump-edge attention mechanism to improve the noise immunity and under-sampling resistance of the network model. Zhou et al. [33] developed a 2-D PU method called PU-GAN that uses generative adversarial networks to process InSAR images in a single step.

Most of the methods mentioned above have achieved excellent results in PU with noise, but they are still limited to the single noisy environment of Gaussian white noise. In the real PU of digital holography, there exists more non-statistical speckle noise, and these depth networks designed for Gaussian white noise are not necessarily effective in removing the speckle noise. Therefore, to accurately fit the real phase data with noise, this paper utilized the methodology outlined in Ref. [37] by constructing a 4-f optical double diffraction system and performing a numerical simulation of the speckle phase data. Drawing on the source code of Ref. [38], 12,000 real phase images were generated, and the corresponding unwrapped images were obtained using Eq. (2), of which 10,000, 1,000, and 1,000 were used for training, validation, and testing, respectively. In addition, randomly distributed Gaussian white noise (standard deviation of the noise is 0.025-0.20) and speckle noise (2 pixels per speckle grain) were added to the absolute phase to form our training set, respectively.

When performing PU under various ambient noises, it is found that U$^2$-Net with a nested U-Net structure is robust to noise. This paper first proposes applying U$^2$-Net to 2-D PU. The overall structure of U$^2$-Net is similar to that of U-Net, except that a new residual U-type block (RSU) is designed to replace each convolutional layer in the network, and this new convolutional block can extract intra-level multi-scale features without reducing the resolution of the feature map. Meanwhile, borrowing the idea of deep supervision from HED [39] networks, U$^2$-Net fuses the outputs of each upsampling layer to generate the final unwrapped image. Since most of the feature extraction processes in U$^2$-Net are focused on downsampling, they do not incur a large computational cost but have a significant memory occupation rate. Therefore, to further reduce the memory consumption and computational cost, this work fixes the number of intermediate channels in most of U$^2$-Net and proposes a lightweight U$^2$-Netp with only 1.13MB memory consumption, which still achieves the high performance of PU accuracy.

In this experiment, our research team compared our proposed network to existing models such as U-Net [40], DLPU-Net [14], VUR-Net [15], and PU-GAN [33]. We evaluated the PU results quantitatively based on criteria such as mean square error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM) in the presence of various types and levels of noise interference. A training test was conducted on the wrapped phase, with varying heights, while subjected to high noise interference levels. This exercise aimed to eliminate any possible impact the phase height could have on the model’s unwrapping performance. Finally, the trained model was qualitatively assessed by conducting tests on 421 wrapped phase images of dynamic candle flames. The evaluation process involved analyzing the number of parameters, floating point operations, and PU speed [14]. The experimental findings reveal the significant advantages of our proposed U$^2$-Net in terms of noise immunity and structural stability, which fully demonstrate the outstanding performance of the structure in terms of model size, model generalization, and model robustness.

The rest of this paper is organized as follows. Section 2 describes the data generation method, the three evaluation metrics, the network structure of U$^2$-Net, the loss function, and the application details. Section 3 presents the experiments and results. Section 4 concludes the paper.

2. Proposed method

2.1 Data generation

Deep learning-based techniques often need a sea of data to train on, but gathering plenty of wrapped phases and matching data for unwrapped phases in actual measurements is challenging. Therefore, this research needs to construct the dataset by numerical simulation techniques. Three data generation methods are described in Ref. [38]: random matrix amplification (RME), Gaussian function superposition (GFS), and Zernike polynomial superposition (ZPS). They mainly consist of two consecutive processes. A random phase is generated, which is then linearly scaled to the range. In this paper, the range of values for h is fixed to be between 10 and 40 pixels. However, the range can be adjusted in practical applications to suit the requirements. The three methods vary only in the first step of generating the random phase, while the linear scaling in the second step remains unchanged. This paper utilizes RME to create the absolute phase, as illustrated in Fig. 1. RME can consider various types of data distributions (uniform or Gaussian) and matrix sizes (ranging from 2x2 to 8x8) and can also scale matrices using diverse interpolation techniques (such as bilinear or bicubic interpolation) [14]. Ref. [37] outlines simulating spectral patterns and generating speckles using a 4-f optical double diffraction system. This method inserts a specific radius diaphragm into the optical path to regulate the size of speckle particles in the simulated phase data. To achieve this, Ru is adjusted, where $Ru = 1/Nspx$ (Ns represents the pixel of each speckle grain, and px is the pixel pitch in the image plane).

Fig. 1. Four random square matrices and the corresponding generated real phases. (a) 2 $\times$ 2 matrix to real phase;(b) 3 $\times$ 3 matrix to real phase;(c) 5 $\times$ 5 matrix to real phase;(d) 10 $\times$ 10 matrix to real phase.

Download Full Size | PDF

Finally, we generated 12,000 absolute phase images using RME with the highest generalization performance and obtained the corresponding wrapped phase images by Eq. (1). The noteworthy point is that to eliminate the effect of different phase heights on the model, we generated the dataset so that 50% of the data is within 2/3 of h, 20% of the data is between 2/3 and 5/6 of h, and 30% of the data is between 5/6 and 1 of h. The range of h is set to [10, 40]. In addition, to verify that the framework in this paper still has high unwrapping performance at high noise levels, we also generate a separate set of high-level noise groups with a noise level of 0.5, which is the highest level in the anti-noise performance test [14].

2.2 Evaluation criteria

We use the three most popular evaluation metrics for each data set: mean square error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity to assess the unwrapping performance of noisy data with various signal-to-noise ratios (SSIM). MSE measures how much the phase values of the output image deviate from those in the labeled image; the lower the deal, the higher the accuracy of this network’s PU. A measure of image quality based on error sensitivity, PSNR is the ratio of peak signal energy to average noise energy and MSE. The higher the PSNR, the higher the quality of the output image. However, PSNR does not consider the visual features of human eyes, which may not be consistent with subjective human experience. Therefore, SSIM is designed to quantify image similarity and structural stability from three aspects: image brightness, contrast, and structure. In a practical setting, sliding windows can chunk the images. The mean value is then utilized as the structural similarity measure for the two images after Gaussian weighting is applied to determine each window’s mean, variance, and covariance. Here is how the three metrics are computed:

(2)$$MSE = \frac{1}{N} \sum_{i = 1}^n(y_i-y'_i)^2$$

(3)$$PSNR = 10\log_{10} \frac{MAX^2}{MSE}=20\log_{10} \frac{MAX}{MSE}$$

(4)$$SSIM(\psi,\psi_u) = \frac{(2\mu_x\mu_y+c_1)(\sigma_{xy}+c_2)}{(\mu_x^2\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)}$$

where $y_i$ represents the phase value of ground truth (GT) and $y'_i$ represents the phase value of the unfolding result. MAX usually represents the gray level of the image, which is taken as 255 here. $\psi _u$ represents the estimate of the unwrapped phase $\psi$. $\mu _x$ and $\mu _y$ represent the mean values of GT and network unfolding result, respectively. similarly, $\sigma _x^2$ and $\sigma _y^2$ represent the variance of x and y, respectively, $\sigma _x$ and $\sigma _x$ represent the covariance of x and y, respectively. $c_1$ and $c_2$ are the constants used to maintain stability, and the expressions are as follows:

(5) $$c_1 = (k_1L)^2, c_2 = (k_2L)^2$$

where L represents the dynamic range of the phase value, $k_1$ is generally taken as 0.01, and $k_2$ 0.03.

The definition of SSIM shows that while SSIM can represent the overall degree of similarity between images, it lacks a measure of the degree of difference between corresponding pixel values. It should be noted that the pixel values of the unwrapped image are critical for later processing and analysis. To evaluate the unwrapped results, this research employs the binary error map (BEM) [15], used in biomedical imaging to quantify biological cells’ thickness and phase values. The BEM can be mathematically expressed as follows:

(6)$$\begin{cases} 1, |\psi_u(x,y)-\psi(x,y)|\le[\psi(x,y)-min(\psi(x,y))]\times5{\%} \\ 0, otherwise \\ \end{cases}$$

where the operator min($\cdot$) represents taking the minimum. With the aid of BEM, it is easy to obtain the proportion among the total pixels that the correctly unwrapped pixels (CUPs) account for, and we refer to it as the accuracy of unwrapping (AU) [15]. Assuming the real phase, the wrapped phase, and the BEM have an identical size of M $\times$ N, the AU can be formularized as:

(7)$$AU = \frac{\sum_{x=1}^{X}{\sum_{y=1}^{Y} {BEM(x,y)}}}{X\times Y} \times 100{\%}$$

where X and Y represent the numbers of rows and columns of the unwrapped phase image, respectively.

In the following discussion, these metrics are combined to provide a comprehensive estimate of the output results of PU for different methods to facilitate a different perspective on the superior performance of our approach.

2.3 Structure of U$^2$-Net

2.3.1 Residual U-block

In the complex noise environment of a two-dimensional PU, Gaussian white noise exists with Gaussian statistics and scattered noise with non-Gaussian statistics. The depth network obtained by training the phase data wrapped by additive Gaussian noise is not necessarily adapted to the phase wrapped by multiplicative speckle noise. Therefore, exploring the PU networks in complex noise environments is necessary. Among the spatial PU methods based on deep learning in recent years, plain convolutional blocks, ResNet blocks or DenseNet blocks, etc., are primarily used in CNNs. Plain convolutional blocks extract features by downsampling; ResNet blocks effectively avoid the problem of gradient explosion or gradient disappearance by short jump connections; DenseNet blocks achieve multiple reuse of feature layers by dense jump connections. In computer vision, the Inception block proposed in 2019 [41] extracts more local features by expanding the perceptual field using dilated convolution, but dilated convolution requires too much computation and storage space. PoolNet [42] proposed in the same year, uses a pyramidal pooling module (PPM) with parallel configuration to enlarge the receptive field of the whole network. Still, the feature maps obtained by directly fusing upsampling and concatenation operations may have the problem of degrading high-resolution features. Through many comparison experiments, we found that the codec structured U-Net has powerful feature extraction ability, so can we design a residual block shaped like U-Net to extract features?

We propose a new residual RSU block to capture intra-level multi-scale features. The structure of the residual RSU block is shown in Fig. 2, which consists of three main parts: the input convolutional layer, a symmetric U-Net-like codec structure with height L, and the residual connection.

Fig. 2. The architecture of the residual RSU block

Download Full Size | PDF

The input convolution layer converts the input features x into the intermediate mapping F1(x); the codec structure takes F1(x) as input and learns to extract and encode multi-scale contextual information to obtain U(F1(x)). Therefore, the larger L is, the deeper the residual RSU block is, the larger the receptive field is, and the RSU block can extract richer local and global features. Drawing on the residual connection of ResNet [43]: x+ F2(F1(x)), the RSU block connects local features and multi-scale features by summation to form a new residual connection: F1(x) + U(F1(x)) [44]. The difference between the RSU block and the standard residual block lies in the fact that RSU replaces the plain convolution with a U-Net-like structure and the original features with weighted multi-scale features. Another point worth noting is that the computational cost of the U-shaped structure is small since most of the operations are focused on downsampling the feature map.

2.3.2 Structure of U$^2$-Net

U$^2$-Net is a two-stage nested U-shaped structure, as shown in Fig. 3. U$^2$-Net comprises 11 residual RSU blocks, including a six-stage encoder, a five-stage decoder, and a depth-supervised feature map fusion module, forming a large U-shaped nested structure. This U-shaped nested structure can make the extraction of intra-stage multi-scale features and aggregation of inter-stage multi-scale features more effective. Meanwhile, depending on the height (L), U$^2$-Net uses five different sizes of RSU blocks, as shown in Fig. 3. It is worth noting that the left four RSU-7 to RSU-4 only change the value of L, and the coding and decoding process remains the same. The RSU-4F is a new residual RSU block designed separately, which can effectively avoid the problem of excessive downsampling leading to a partial loss of contextual information. The "F" stands for dilated convolution instead of pooling and upsampling operations in ordinary RSU blocks.

Fig. 3. Illustration of the U$^2$-Net architecture

Download Full Size | PDF

Fig. 4. Comparison of the unwrapped results of an example of the model application. (a) A wrapped phase image from the test set; (b) GT; (c1)U-Net output; (c2)DLPU-Net output; (c3)VUR-Net output; (c4)PU-GAN output; (c5) U$^2$-Net output; (c6) U$^2$-Netp output; (d1) BEM between (c1) and (b); (d2) BEM between (c2) and (b); (d3) BEM between (c3) and (b); (d4) BEM between (c3) and (b); (d5) BEM between (c5) and (b); (d6) BEM between (c6) and (b); (e1) U-Net loss plot; (e2) DLPU-Net loss plot; (e3) VUR-Net loss plot; (e4) PU-GAN loss plot; (e5) U$^2$-Net loss plot; (e6) U$^2$-Netp loss plot.

Download Full Size | PDF

In the six-stage encoder of U$^2$-Net, Encoder1-Encoder4 use the residual blocks RSU-7, RSU-6, RSU-5, and RSU-4, respectively. The input feature maps in Encoder5 and Encoder6 have lower resolution and are encoded using RSU-4F. Similarly, in the five-stage decoder, Decoder1-Decoder4 have the same structure as the residual RSU blocks of the first four encoders. Decoder5 uses the same structure as the residual block RSU-4F in Encoder5 for decoding. According to Fig. 3, each decoder takes the concatenation of the upsampled feature maps from its previous stage and those from its symmetric encoder stage as the input. It is worth noting that our proposed U$^2$-Net and U$^2$-Netp are identical, except that U$^2$-Netp solidifies the output channels of the network’s layers to 64, the intermediate channels to 16, and the input channels to 64 except for the first layer. Due to the constant number of channels, the computation of U$^2$-Netp is substantially lower, with minimal memory use, fulfilling the lightweight goal. In the feature map fusion module, using the deep supervision concept from HED [39], U$^2$-Net first generates six unwrapped pure phase images (Sup1-Sup6) using a 3–3 convolution and sigmoid function at Encoder6 and Decoder1-Decoder5 levels, and then progressively upsamples to the size of the input wrapped phase maps and pieces them together as Sup0 using a cascade operation. The final unwrapped image ($S_{final}$) is generated by 1$\times$1 convolution and sigmoid function.

2.3.3 Loss function

Since an intensely supervised network structure is used, the training loss is expressed as the composite loss function L:

(8)$$L = \sum_{m = 1}^M w_{side}^{(m)}l_{side}^{(m)} + w_{final}l_{final}$$

where $l_{side}^{(m)}$ (M = 6, as Sup1, Sup2,…, Sup6 in Fig. 3) is the loss of the side output unwrapped phase diagram $S_{side}^{(m)}$, while $l_{final}$ is the loss of the final fused output unwrapped phase diagram $S_{final}$. $w_{side}^{(m)}$ and $w_{final}$ are the weights of each loss term. For each term $l$, this research uses MAE loss to calculate as follows:

(9)$$l ={-}\sum_{r,c}^{(H,W)} |P_{G(r,c)}-P_{S(r,c)}|$$

where $(r,c)$ is the pixel coordinates and $(H,W)$ is the image height and width. $P_{G(r,c)}$ and $P_{S(r,c)}$ denote the pixel values of GT and the predicted unwrapped phase maps, respectively. The training process minimizes the overall loss of $L$. During testing, the fused output is chosen as the final unwrapped image. Moreover, U-Net, DLPU-Net, and VUR-Net used for comparison all use MAE loss, and PU-GAN uses its unique L1 loss and Adversarial loss.

2.4 Application details

During training, all wrapped images and corresponding labels were adjusted to 256 $\times$ 256. Since U$^2$-Net was trained from scratch, all convolutional layers were initialized by Xavier [45], loss weights $w_{side}^{(m)}$ and $w_{final}$ were both set to 1, and the optimizer used Adam [46], which used default values for all its parameters (initial learning rate lr=1e-3, betas=(0.9, 0.999), weight decay=0). For consistency, all comparison networks’ learning rates and weights are the same as U$^2$-Net. The networks proposed in this paper were all applied to a Python (Python 3.7) environment with the PyTorch framework, and all training and testing were implemented on a single CPU and NVIDIA GeForce GTX 2080Ti GPU.

3. Experimental results and discussion

This section performs qualitative and quantitative tests on simulated data samples under different noise environments. First, feasibility and accuracy tests are performed on noise-free data samples. Then, under the complex noise environments of additive Gaussian white noise (noise standard deviation from 0.025 to 0.2) and multiplicative speckle noise (2 pixels per speckle grain), the U$^2$-Net and U$^2$-Netp proposed in this paper are compared with the existing U-Net [40], DLPU-Net [14], VUR-Net [15], and PU-GAN [33]. To eliminate the effect of phase height on PU performance under high-level noise interference, this research was trained and tested individually on simulated samples with a noise standard deviation of 0.5 and compared the number of parameters, floating point operations, and PU speed of different network models. Finally, the wrapped phase images of 421 dynamic candle flames consisting of different arrangements of pits, different shapes of grooves, and different shapes of tables supplied by Ref. [14] were tested. All test results are evaluated overall using MSE, PSNR, and SSIM metrics. The experimental results amply show the advantages of the network proposed in this paper regarding model anti-noise performance, model generalization, and model robustness.

3.1 Feasibility and accuracy test

To demonstrate the feasibility and accuracy of the proposed network, training tests were conducted on noise-free samples. Figure 4 compares the phase unwrapped results of the above four network models and the two network models proposed in this paper. Figure 4(a) shows the wrapped phase. Figure 4(b) shows the GT. Figure 4(c) shows the unwrapped results of U-Net, DLPU-Net, VUR-Net, PU-GAN, U$^2$-Net, and U$^2$-Netp, respectively, and Fig. 4(d) shows the BEMs of the six networks. The output results from Figs. 4(c1)–4(c6) show that the deep networks trained with a large amount of data all have high unwrapping performance, confirming the feasibility of converting PU to a semantic segmentation task. As shown in Fig. 4(d), there are still specific errors in the unwrapped results of the first four networks; the first three networks with U-Net as the backbone can roughly distinguish the boundaries of phase differences but not accurately. The unwrapped results of PU-GAN based on generative networks are closer to GT but cannot avoid the random artifacts produced by the generators. In contrast, the unwrapped results of U$^2$-Net and U$^2$-Netp proposed in this paper are almost the same as GT, and the BEMs in Fig. 4(d5) and Fig. 4(d6) are also negligible. Figure 4(e) shows the loss plots of the training and validation processes for each of the six networks. In this case, the blue line segment represents the training loss, and the orange represents the validation loss. The test losses of the U$^2$-Net and U$^2$-Netp proposed in this paper almost overlap with the training losses. Even if there are fluctuations, they are within a small error. This phenomenon is also reflected in the evaluation metrics of Table 1.

Table 1. Comparison of M_MSE, M_PSNR, and M_SSIM over the test set unwrapped by the U-Net, DLPU-Net, VUR-Net, PU-GAN, U$^2$-Net, and U$^2$-Netp.

View Table | View all tables in this article

Table 1 gives the mean values of MSE (M_MSE), PSNR (M_PSNR), and SSIM (M_SSIM) of the six network models for 1000 test samples. As shown in Table 1, the original U-Net without the residual block is much lower than the other networks in terms of performance in all three metrics. As the depth of the network deepens, the plain convolutional blocks in U-Net may have the problem of gradient disappearance or gradient explosion, making the network unable to converge. The M_MSE error of DLPU-Net is higher, and M_PSNR is lower, but M_SSIM is higher than PU-GAN and close to VUR-Net, which indicates that the PU accuracy of DLPU-Net using the ordinary residual structure is average, but does not unduly destroy the integrity of the image structure. In comparison, PU-GAN has better unwrapping performance but may lose some structural information since the generator test pictures are unstable, resulting in lower M_SSIM values. The VUR-Net, which combines two different residual blocks and uses U-Net as its backbone, effectively absorbs the benefits of VGG, U-Net, and Res-Net and achieves relatively high accuracy in unwrapping performance and structural stability. Inspired by VUR-Net, this paper proposes U$^2$-Net composed of U-like RSU residual blocks, which has much higher PU performance and structural stability than the above four network models, especially M_SSIM, which is approximately equal to 1 for U$^2$-Net and U$^2$-Netp after four decimal places, proving that the proposed network has excellent structural stability.

3.2 Anti-noise performance test

Noise has a significant impact on PU performance. Therefore, this study proposes to test the anti-noise performance of deep networks in a complicated noisy environment. First, we simulate the additive Gaussian white noise, which is the most popular in the anti-noise performance test, and the multiplicative speckle noise, which is the most prevalent in the actual phase. Then the two noise environments compare the PU results of different depth networks.

3.2.1 Anti-Gaussian white noise test

The phase images unwrapped by U-Net, DLPU-Net, VUR-Net, PU-GAN, U$^2$-Net, and U$^2$-Netp under additive Gaussian white noise interference are presented in Fig. 5. Figure 5(a) shows the wrapped phase image with Gaussian noise, with the data directly below the image showing the Gaussian noise level, which increases linearly from 0.025 to 0.2. Figure 5(b) shows the corresponding GT. Figures 5(c)–5(h) show the unwrapped results using U-Net, DLPU-Net, VUR-Net, PU-GAN, U$^2$-Net, and U$^2$-Netp. The corresponding SSIM values are shown below. As shown in Figs. 5(c)–5(h), the SSIM values of each network do not decrease monotonically with increasing noise levels as the Gaussian noise grows linearly but fluctuate within a range of values, which reflects the excellent anti-Gaussian white noise performance of the deep network model. Figures 5(i)–5(n) show the BEMs of six networks, and the corresponding AUs are shown below. In Figs. 5(c)–5(h), the BEMs do not increase linearly with the noise but remain fluctuating, and this result is unpredictable. Combined with the fluctuating phenomenon of SSIM values in Figs. 5(c)–5(h), it is speculated that there may be two reasons for this: One is that the anti-noise performance test conducted at this stage contains only Gaussian white noise, which does not completely replace the complex noise environment in the real PU, and the imperfect noise dataset may also cause the deep model not to learn all types of noise features. The second is that the data-driven deep network model has a black box characteristic, in which there are some illogical feature mapping relationships in the deep network model. However, the training of large amounts of data and iterations of the network structure can infinitely converge to such mapping relationships. Therefore, the fluctuation phenomenon may be related to the incomplete training of the model. Throughout the six network unwrapped results of and the corresponding BEM, the U$^2$-Net and U$^2$-Netp proposed in this work show better performance than other models under different levels of noise interference.

Fig. 5. Anti-Gaussian white noise test. (a) Network input (noisy wrapped phase); (b) GT; (c) unwrapped phase using U-Net; (d) unwrapped phase using DLPU-Net; (e) unwrapped phase using VUR-Net ; (f) unwrapped phase using PU-GAN; (g) unwrapped phase using U$^2$-Net; (h) unwrapped phase using U$^2$-Netp; (i) BEM between (c) and (b); (j) BEM between (d) and (b); (k) BEM between (e) and (b); (l) BEM between (f) and (b); (m) BEM between (g) and (b); (n) BEM between (h) and (b).

Download Full Size | PDF

Table 2 gives the M_MSE, M_PSNR, and M_SSIM of the PU results of the six networks on wrapped phase images with different Gaussian noise levels. We bolded the Top5 data in the three evaluation metrics. The table shows that U$^2$-Net and U$^2$-Netp perform better than the other models in terms of unwrapping accuracy, robustness against noise, and image structure stability. To present the experimental results more intuitively, we visualize the data in Table 2 as Fig. 6. From Fig. 6(a), we can see that the M_MSE of our proposed U$^2$-Net and U$^2$-Netp converges to the 0 axis, which is significantly lower than the other four networks. The difference in MSE is the most significant among the three evaluation metrics since the MSE calculation is based on pixel statistics and does not perform log function scaling similar to that in PSNR. In Fig. 6(b), the M_PSNR of U$^2$-Net and U$^2$-Netp has been stable between 60 and 70 (minimum value of 61.3391 and maximum value of 70.0816), which is much higher than the M_PSNR values of other networks. Compared with the results for the wrapped phase image with single noise in Fig. 5, the M_SSIM values of the PU results for all models in Fig. 6(c) are generally consistent. The M_SSIM values of individual models such as U-Net decrease significantly (minimum value is 0.9148 obtained and maximum value is 0.9610). The possible reason is that the plain convolutional blocks in training images with gradually increasing Gaussian noise cannot extract effective feature information directly from the noise-wrapped phases. Compared to the four models in which the M_SSIM value fluctuates, the two models proposed in this paper exhibit excellent structural stability, with M_SSIM values always converging to 1.

Fig. 6. Comparison of M_MSE, M_PSNR, and M_SSIM on Gaussian noise.

Download Full Size | PDF

Table 2. Comparison of M_MSE, M_PSNR, and M_SSIM for 1000 phase images wrapped with different levels of Gaussian noise under six models.

View Table | View all tables in this article

Noise	PU-GAN			U$^2$-Net			U$^2$-Netp
ratio	M_MSE	M_PSNR	M_SSIM	M_MSE	M_PSNR	M_SSIM	M_MSE	M_PSNR	M_SSIM
0.0250	3.9499	42.9384	0.9549	0.0130	67.6875	0.9998	0.0468	66.1069	0.9999
0.0500	2.7592	44.6987	0.9635	0.0166	69.8169	1.0000	0.0295	66.3512	0.9999
0.0750	3.2000	43.8430	0.9607	0.0131	67.6868	0.9998	0.0282	67.0780	0.9999
0.1000	2.9270	44.3523	0.9590	0.0305	69.5827	1.0000	0.0291	65.8617	0.9999
0.1250	2.8926	44.3112	0.9611	0.0641	63.2153	0.9997	0.0368	70.0816	0.9999
0.1500	2.2786	45.5378	0.9677	0.0160	69.1541	1.0000	0.0265	69.2116	1.0000
0.1750	3.8657	43.1470	0.9494	0.0661	62.9181	0.9999	0.0428	63.6302	0.9998
0.2000	2.9038	44.3677	0.9644	0.0593	61.3391	0.9998	0.0415	63.5990	0.9998

3.2.2 Anti-speckle noise test

The phase images unwrapped by U-Net, DLPU-Net, VUR-Net, PU-GAN, U$^2$-Net, and U$^2$-Netp under multiplicative speckle noise interference are presented in Fig. 7. Figure 7(a) shows the phase-wrapping diagram including speckle noise. Figure 7(b) shows the corresponding GT. Figures 7(c)–7(h) show the unwrapped results using U-Net, DLPU-Net, VUR-Net, PU-GAN, U$^2$-Net, and U$^2$-Netp, and their corresponding SSIM values are shown below. Affected by speckle noise, the unwrapped phases of U-Net and DLPU-Net have substantial phase errors at the edge of the wrapping, demonstrating that a single U-Net is insufficient to fit the statistical rule of speckle noise. The VUR-Net with varied residual structures exhibits lower PU errors and more structural stability. It is important to acknowledge that the three networks exhibit greater PU errors in images with lower phase heights while demonstrating lesser PU errors in images with higher phase heights. This substantiates the relationship between PU errors and phase heights in complex noise environments, established through specific experiments in the subsequent section. The PU results of PU-GAN with speckle noise interference are comparable to the Gaussian noise interference results above but with reduced PU errors. However, the image structure is unstable, and the SSIM value is low. In contrast, the U$^2$-Net and U$^2$-Netp proposed in this research have lower phase errors and greater SSIMs.

Fig. 7. Anti-speckle noise test. (a) Network input (noisy wrapped phase); (b) GT; (c) unwrapped phase using U-Net; (d) unwrapped phase using DLPU-Net ; (e) unwrapped phase using VUR-Net ; (f) unwrapped phase using PU-GAN; (g) unwrapped phase using U$^2$-Net; (h) unwrapped phase using U$^2$-Netp.

Download Full Size | PDF

Table 3 displays the M_MSE, M_PSNR, and M_SSIM of the six networks’ unwrapped outcomes on the phase-wrapping diagram with speckle noise. U-Net and DLPU-Net have larger phase errors, lower PSNR, and similar SSIM, as demonstrated in Table 3. The three metrics of VUR-Net have improved compared to the prior two networks, and the SSIM has even achieved 0.99. When combined with the data in Table 2, it can be concluded that the VUR-Net with residual structure has higher SSIM values regardless of the existence of Gaussian noise or speckle noise in the phase image, confirming the powerful feature extraction ability of the residual block. The phase error of PU-GAN and the proposed network in this paper are smaller, but as previously stated, PU-GAN still has the lowest SSIM and PSNR values. In summary, the method presented in this research is more accurate and superior to previous algorithms.

Table 3. Comparison of M_MSE, M_PSNR, and M_SSIM for 1000 phase images wrapped with speckle noise under six models.

View Table | View all tables in this article

3.3 High-level noise comparison experiment and network model comparison

In the previous section, it has been demonstrated that the phase height influences the performance of PU with noise during the noise immunity performance test in the complicated noise environment with additive Gaussian white noise and multiplicative speckle noise. As a result, this study generates new wrapped phase data by injecting high-level Gaussian noise with a standard deviation of 0.5 in two sets of absolute phase heights of 25 pixels and 35 pixels and then trains and tests six network models on this dataset. The test results of PU are shown in Fig. 8. The test results are shown in Fig. 8. Figures 8(a1)–8(a2) show the phase images of heights 25 pixels and 35 pixels wrapped by high-level noise. Figures 8(b1)–8(b2) show the corresponding GT. Figures 8(c1)–8(h1) show the unwrapped results of the six models for the phase image of height 35 pixels, and correspondingly, Figs. 10(c2)-10(h2) show the unwrapped results of the six models for the phase image of height 25 pixels, with the corresponding SSIM values below the image. The vertical comparison shows that for the phase image of height 35 pixels, the SSIM values of U$^2$-Net and U$^2$-Netp proposed in this work are the highest, followed by VUR-Net, under the interference of high-level noise, and the SSIM values of the other three models are relatively close. For a phase image with a height of 25 pixels, the SSIM values of U$^2$-Net and U$^2$-Netp are still the highest, followed by PU-GAN, VUR-Net is closer to DLPU-Net, and U-Net has the lowest value. In a horizontal view, different phase heights under high-level noise interference do not affect the unwrapping performance of the two models proposed in this work. In contrast, the SSIM values of all other models show a decline. Compared to phase height 35 pixels, the SSIM values of PU-GAN are higher at phase 25 pixels, and the SSIM values of U-Net, DLPU-Net, and VUR-Net are lower at phase 25 pixels. Figures 8(i1)–8(n1) show the BEMs of the above model at a height of 35 pixels, and similarly, Figs. 8(i2)–8(n2) show the BEMs of the above model at a height of 25 pixels, with the AUs shown below the images. Overall, all the above models have higher errors in PU at a height of 25 pixels. When the number of wrapped stripes is small, and the phase height is low, the depth models learn all the large feature areas that are not fine enough under the interference of high-level noise, and thus many mis-segmentations occur. However, except in Fig. 8(l2), where the BEMs of PU-GAN have the smallest errors, the U$^2$-Net and U$^2$-Netp unwrapping errors are the smallest at all other times, maintaining the best performance overall against high-level noise interference. In addition, we present the M_MSE, M_PSNR, and M_SSIM values of the unwrapped results of the above models for the test set of high-level noise in Table 4.

Fig. 8. Comparison of PU results of the involved models for heights of 25 and 35 under high-level noise. (a1) A wrapped phase image(h = 35); (a2) A wrapped phase image(h = 25); (b1) GT(h = 35); (b2) GT(h = 25); (c1)U-Net output(h = 35); (c2)U-Net output(h = 25); (d1)DLPU-Net output(h = 35); (d2)DLPU-Net output(h = 25); (e1)VUR-Net output(h = 35); (e2)VUR-Net output(h = 25); (f1)PU-GAN output(h = 35); (f2)PU-GAN output(h = 25); (g1)U$^2$-Net output(h = 35); (g2)U$^2$-Net output(h = 25); (h1)U$^2$-Netp output(h = 35); (h2)U$^2$-Netp output(h = 25); (i1) BEM between (c1) and (b1); (i2) BEM between (c2) and (b2); (j1) BEM between (d1) and (b1); (j2) BEM between (d2) and (b2); (k1) BEM between (e1) and (b1); (k2) BEM between (e2) and (b2); (l1) BEM between (f1) and (b1); (l2) BEM between (f2) and (b2); (m1) BEM between (g1) and (b1); (m2) BEM between (g2) and (b2); (n1) BEM between (h1) and (b1); (n2) BEM between (h2) and (b2).

Download Full Size | PDF

Table 4. Comparison of M_MSE, M_PSNR, and M_SSIM over the test set unwrapped by the U-Net, DLPU-Net, VUR-Net, PU-GAN, U$^2$-Net, and U$^2$-Netp under high noise

View Table | View all tables in this article

Compared with the three metrics in Table 1, Table 2 and Table 3, Table 4 shows a relatively significant increase in M_MSE for U-Net and DLPU-Net, but little change in M_MSE values for the other four models. The M_PSNR of U-Net, DLPU-Net, and PU-GAN showed a large decrease, the M_PSNR of VUR-Net showed a significant increase, and the M_PSNR of the model proposed in this work has little change in M_PSNR. The M_SSIM values of U-Net, DLPU-Net, and PU-GAN all show different degrees of decline, the M_SSIM value of VUR-Net shows an increase, and the M_SSIM values of U$^2$-Net and U$^2$-Netp always remain closest to 1. Overall, the M_SSIM values of U$^2$-Net and U$^2$-Netp are least affected by the phase height under high-level noise interference. In addition, the PU parameters and times of the different models were statistically analyzed to evaluate the performance of the models further, as shown in Table 4.

Table 5. Comparison of the models in terms of computational space and time.

View Table | View all tables in this article

Table 5 shows the number of parameters, floating-point operations, and the time of PU for the six tested models. PU-GAN has the most significant number of parameters, and U$^2$-Net proposed in this work has the second-largest number of parameters. U$^2$-Net containing RSU residual blocks uses many stacked residual blocks, which can occupy a large amount of memory. The original U-Net contained many complex and redundant structures, resulting in excessive parameters. By incorporating partial residuals, the DLPU-Net effectively enhances the architecture of the U-Net while using fewer parameters. The most surprising is the lightweight model U$^2$-Netp proposed in this work, which occupies only a 1.13MB number of parameters to achieve a PU performance close to U$^2$-Net. Since the training optimization of adversarial networks may use fewer multiplications and additions, the minor floating-point operation is PU-GAN, followed by U$^2$-Netp. Except for U-Net, the floating-point functions of the other three models are close to each other. The PU time of all models is similar. U-Net is the simplest model, so the unwrapping time is the shortest, followed by PU-GAN and DLPU-Net, and U$^2$-Net and U$^2$-Netp proposed in this work have a large number of downsampling operations, but the PU time does not lag too much behind other models. The unwrapping time of U$^2$-Netp is even faster than that of VUR-Net with the introduction of residuals. The unwrapping time of the above models can be satisfied by combining them with the real-time requirements in practical applications.

Combined with the data in Table 1–Table 5, although PU-GAN has the least floating-point operations, the number of parameters is huge, and the anti-noise performance is average. Meanwhile, the image structure stability is poor due to the destruction of the feature structure of the phase during the model training. DLPU-Net has the fastest PU speed, but the number of model parameters is large, and the anti-noise performance is poor. Especially under high-level noise interference, the PU performance degrades more. U-Net has less number of model parameters and PU time. Still, the PU performance is the worst among all models, indirectly proving the importance of residual structure in PU. VUR-Net has the best PU performance and more stable model parameters except for the two models proposed in this work. Still, its unwrapping performance can be further improved [32]. Introducing the SE module or CBAM module can improve the unwrapping implementation of VUR-Net and increase the model’s computation. The weighted jump-edge attention mechanism has higher unwrapping performance and lower computational effort. However, there are some instabilities in the preprocessing operation of traditional thresholding algorithms that may affect the performance of model unwrapping. Therefore, by comparing various experimental data, the U$^2$-Net and U$^2$-Netp proposed in this work show optimal performance without preprocessing and attention modules. In particular, the lightweight U$^2$-Netp maintains an extremely high PU performance while occupying only a minimal memory space, providing a potential possibility to meet the real-time criteria in practical applications.

3.4 Testing on real samples

To test the generalization capability of the proposed network, the model trained from the simulated image set was used to test 421 wrapped phase images, including dynamic candle flames, different arrangements of pits, different shapes of grooves, and different shapes of tables provided in Ref. [38] and compare their phase unwrapped results, as shown in Fig. 9. Figure 9(a) shows the eight wrapped phase images extracted at frames 7, 22, 41, 114, 171, 265, 308 and 328. Figure 9(b) shows the corresponding GT. Figures 9(c)–9(h) show the unwrapped results using U-Net, DLPU-Net, VUR-Net, PU-GAN, U$^2$-Net, and U$^2$-Netp, respectively, and their corresponding SSIM values are shown below. Figure 9 indicates that the results of all model outputs do not differ much from GT, indirectly reflecting the excellent performance of deep learning-based PU in practical applications. Figures 9(i)–9(n) show the error map of the unwrapped results of the six networks with GT. From the variation of SSIM values, U-Net, DLPU-Net, and VUR-Net have higher SSIM values in the last five wiggle flame maps, while the values in the first three maps have their advantages and disadvantages. In the second low-height phase image, the U-Net algorithm demonstrates a higher SSIM value. It is plausible that the underlying reason for this phenomenon is the absence of residual structure in the U-Net architecture. Such a structure facilitates the model’s ability to learn and effectively utilize images with denser features. In the first chunked image, the unwrapped results of U-Net are poor, and the unwrapped results of DLPU-Net and VUR-Net with residuals are better. In contrast, DLPU-Net using a single residual is more suitable for phase images with clear chunking, and VUR-Net is suitable for phase images with less distinct features, such as the third one. The SSIM values of the PU-GAN unwrapped phase based on the generative model in Fig. 9(e) fluctuate widely, and the SSIM values in the third and fifth flame phase images of the same type have an enormous difference, which may be related to the adversarial training mode of the generative model. The images predicted by the generative model have a randomness affecting the segmentation results. The accuracy of the SSIM values of the two models proposed in this work is lower than the performance on simulation data but still better than the four models mentioned above. The first image in Fig. 9(h) has the lowest SSIM value of 0.9880. To reduce the number of parameters in the model, U$^2$-Netp keeps the number of channels in all the feature maps at 64, which results in some of the structural information being lost. It can also be seen from the error map in Figs. 9(i)–9(n) that U$^2$-Net and U$^2$-Netp have unwrapping errors only in the first and sixth images. The model has the strongest generalization and the most stable performance of PU compared with the results of PU of other models. Although the error maps of VUR-Net in Fig. 9(k) unwrapping in each flame phase image is less, it is still not as stable as the model in this work in terms of the overall PU performance, which is also verified in the data of Table 6. Table 6 shows the six models’ M_MSE, M_PSNR, and M_SSIM of the 421 real-phase unwrapped results.

Fig. 9. Generalization capability test. (a) Network input (wrapped phase of dynamic candle flame); (b) GT; (c) unwrapped phase using U-Net; (d) unwrapped phase using DLPU-Net; (e) unwrapped phase using VUR-Net ; (f) unwrapped phase using PU-GAN; (g) unwrapped phase using U$^2$-Net; (h) unwrapped phase using U$^2$-Netp; (i) error map between (c) and (b); (j) error map between (d) and (b); (k) error map between (e) and (b); (l) error map between (f) and (b); (m) error map between (g) and (b); (n) error map between (h) and (b).

Download Full Size | PDF

Table 6. Comparison of M_MSE, M_PSNR, and M_SSIM over the 421 real phase unwrapped results by the U-Net, DLPU-Net, VUR-Net, PU-GAN, U$^2$-Net and U$^2$-Netp.

View Table | View all tables in this article

4. Conclusion

This paper proposes to use U$^2$-Net with a nested U-shaped structure to solve the PU problem. The U$^2$-Net, consisting of 11 U-like RSU residual blocks, can go deeper in the model’s training without too much increasing the number of parameters. Meanwhile, the lightweight U$^2$-Netp requires only 1.13MB of memory to achieve an unwrapping performance close to U$^2$-Net. The stability, accuracy, and generalization of the network proposed in this paper in complex noise environments are verified by anti-Gaussian white noise and anti-speckle noise tests. In addition, combining the unwrapped results of real samples and model parameters of different deep networks further shows the excellent unwrapping performance and generalization ability of U$^2$Net. Therefore, the proposed in this work framework demonstrates considerable potential and a wide applicability with respect to PU.

Funding

National Natural Science Foundation of China (62275160).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. L. Aiello, D. Riccio, P. Ferraro, S. Grilli, L. Sansone, G. Coppola, S. De Nicola, and A. Finizio, “Green’s formulation for robust phase unwrapping in digital holography,” Opt. Lasers Eng. 45(6), 750–755 (2007). [CrossRef]

2. M. Jenkinson, “Fast, automated, N-dimensional phase-unwrapping algorithm,” Magn. Reson. Med. 49(1), 193–197 (2003). [CrossRef]

3. C. Zuo, J. Qian, S. Feng, W. Yin, Y. Li, P. Fan, J. Han, K. Qian, and Q. Chen, “Deep learning in optical metrology: a review,” Light: Sci. Appl. 11(1), 39 (2022). [CrossRef]

4. Y. Lan, H. Yu, Z. Yuan, and M. Xing, “Comparative Study of DEM Reconstruction Accuracy Between Single- and Multibaseline InSAR Phase Unwrapping,” IEEE Trans. Geosci. Remote Sensing 60, 1–11 (2022). [CrossRef]

5. H. Yu, Y. Lan, Z. Yuan, J. Xu, and H. Lee, “Phase Unwrapping in InSAR: A Review,” IEEE Geosci. Remote Sens. Mag. 7(1), 40–58 (2019). [CrossRef]

6. D. J. Bone, “Fourier fringe analysis: the two-dimensional phase unwrapping problem,” Appl. Opt. 30(25), 3627–3632 (1991). [CrossRef]

7. X. Su and W. Chen, “Reliability-guided phase unwrapping algorithm: A review,” Opt. Lasers Eng. 42(3), 245–261 (2004). [CrossRef]

8. M. Arevalillo-Herráez, F. R. Villatoro, and M. A. Gdeisat, “A Robust and Simple Measure for Quality-Guided 2D Phase Unwrapping Algorithms,” IEEE Trans. Image Process. 25(6), 2601–2609 (2016). [CrossRef]

9. H. P. Zhong, J. S. Tang, Z. Tian, and H. R. Wu, “Hierarchical quality-guided phase unwrapping algorithm,” Appl. Opt. 58(19), 5273–5280 (2019). [CrossRef]

10. J. Strand, T. Taxt, and A. K. Jain, “Two-dimensional phase unwrapping using a block least-squares method,” IEEE Trans. Image Process. 8(3), 375–386 (1999). [CrossRef]

11. J. M. Bioucas-Dias and G. Valadao, “Phase unwrapping via graph cuts,” IEEE Trans. Image Process. 16(3), 698–709 (2007). [CrossRef]

12. K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep Convolutional Neural Network for Inverse Problems in Imaging,” IEEE Trans. Image Process. 26(9), 4509–4522 (2017). [CrossRef]

13. G. Dardikman, N. A. Turko, and N. T. Shaked, “Deep learning approaches for unwrapping phase images with steep spatial gradients: A simulation,” in IEEE International Conference on the Science of Electrical Engineering in Israel (ICSEE) (2018), pp. 1–4.

14. K. Wang, Y. Li, Q. Kemao, J. Di, and J. Zhao, “One-step robust deep learning phase unwrapping,” Opt. Express 27(10), 15100–15115 (2019). [CrossRef]

15. Y. Qin, S. Wan, Y. Wan, J. Weng, W. Liu, and Q. Gong, “Direct and accurate phase unwrapping with deep neural network,” Appl. Opt. 59(24), 7258–7267 (2020). [CrossRef]

16. G. E. Spoorthi, S. Gorthi, and R. K. S. S. Gorthi, “PhaseNet: A Deep Convolutional Neural Network for Two-Dimensional Phase Unwrapping,” IEEE Signal Process. Lett. 26(1), 54–58 (2019). [CrossRef]

17. G. E. Spoorthi, R. K. Sai Subrahmanyam Gorthi, and S. Gorthi, “PhaseNet 2.0: Phase Unwrapping of Noisy Data Based on Deep Learning Approach,” IEEE Trans. Image Process. 29, 4862–4872 (2020). [CrossRef]

18. G. Dardikman-Yoffe, D. Roitshtain, S. K. Mirsky, N. A. Turko, M. Habaza, and N. T. Shaked, “Phun-Net: Ready-to-use neural network for unwrapping quantitative phase images of biological cells,” Biomed. Opt. Express 11(2), 1107–1121 (2020). [CrossRef]

19. M. Xu, C. Tang, Y. Shen, N. Hong, and Z. Lei, “PU-M-Net for phase unwrapping with speckle reduction and structure protection in ESPI,” Opt. Lasers Eng. 151, 106824 (2022). [CrossRef]

20. J. Zhang, X. Tian, J. Shao, H. Luo, and R. Liang, “Phase unwrapping in optical metrology via denoised and convolutional segmentation networks,” Opt. Express 27(10), 14903 (2019). [CrossRef]

21. K. Yan, Y. Yu, T. Sun, A. Asundi, and Q. Kemao, “Wrapped phase denoising using convolutional neural networks,” Opt. Lasers Eng. 128, 105999 (2020). [CrossRef]

22. J. Zhang and Q. Li, “EESANet: edge-enhanced self-attention network for two-dimensional phase unwrapping,” Opt. Express 30(7), 10470 (2022). [CrossRef]

23. M. V. Perera and A. De Silva, “A joint convolutional and spatial quad-directional LSTM network for phase unwrapping,” in IEEE International Conference on Acoustics, Speech, and Signal Processing (2020).

24. S. Park, Y. Kim, and I. Moon, “Automated phase unwrapping in digital holography with deep learning,” Biomed. Opt. Express 12(11), 7064–7081 (2021). [CrossRef]

25. G. Dardikman and N. T. Shaked, “Phase unwrapping using residual neural networks,” in OSA-The Optical Society Orlando, FL, United States (2018).

26. K. S. Vengala, N. Paluru, and G. R. Subrahmanyam, “3D deformation measurement in digital holographic interferometry using a multitask deep learning architecture,” J. Opt. Soc. Am. A 39(1), 167–176 (2022). [CrossRef]

27. K. Sumanth, V. Ravi, and R. K. Gorthi, “A Multi-Task Learning for 2D Phase Unwrapping in Fringe Projection,” IEEE Signal Process. Lett. 29, 797–801 (2022). [CrossRef]

28. C. Wu, Z. Qiao, N. Zhang, X. Li, J. Fan, H. Song, D. Ai, J. Yang, and Y. Huang, “Phase unwrapping based on a residual en-decoder network for phase images in Fourier domain Doppler optical coherence tomography,” Biomed. Opt. Express 11(4), 1760 (2020). [CrossRef]

29. S. Zhu, Z. Zang, X. Wang, Y. Wang, X. Wang, and D. Liu, “Phase unwrapping in ICF target interferometric measurement via deep learning,” Appl. Opt. 60(1), 10 (2021). [CrossRef]

30. Z. Zhao, B. Li, X. Kang, J. Lu, and T. Liu, “Phase unwrapping method for point diffraction interferometer based on residual auto encoder neural network,” Opt. Lasers Eng. 138, 106405 (2021). [CrossRef]

31. T. Zhang, S. Jiang, Z. Zhao, K. Dixit, X. Zhou, J. Hou, Y. Zhang, and C. Yan, “Rapid and robust two-dimensional phase unwrapping via deep learning,” Opt. Express 27(16), 23173 (2019). [CrossRef]

32. J. Zhao, L. Liu, T. Wang, X. Wang, X. Du, R. Hao, J. Liu, Y. Liu, and J. Zhang, “VDE-Net: a two-stage deep learning method for phase unwrapping,” Opt. Express 30(22), 39794 (2022). [CrossRef]

33. L. Zhou, H. Yu, V. Pascazio, and M. Xing, “PU-GAN: A One-Step 2-D InSAR Phase Unwrapping Based on Conditional Generative Adversarial Network,” IEEE Trans. Geosci. Remote Sensing 60, 1–10 (2022). [CrossRef]

34. J. Liu, Q. Wu, X. Sui, Q. Chen, G. Gu, L. Wang, and S. Li, “Research progress in optical neural networks: theory, applications and developments,” PhotoniX 2(1), 5 (2021). [CrossRef]

35. Y. Shu, J. Sun, J. Lyu, Y. Fan, N. Zhou, R. Ye, G. Zheng, Q. Chen, and C. Zuo, “Adaptive optical quantitative phase imaging based on annular illumination Fourier ptychographic microscopy,” PhotoniX 3(1), 1 (2022). [CrossRef]

36. K. Itoh, “Analysis of the phase unwrapping algorithm,” Appl. Opt. 21(14), 2470 (1982). [CrossRef]

37. S. Montresor and P. Picart, “Quantitative appraisal for noise reduction in digital holographic phase imaging,” Opt. Express 24(13), 14322 (2016). [CrossRef]

38. K. Wang, Q. Kemao, D. Jianglei, and J. Zhao, “Deep learning spatial phase unwrapping: a comparative review,” Adv. Photonics Nexus 1(1), 14001 (2022). [CrossRef]

39. X. Saining and T. Zhuowen, “Holistically-Nested Edge Detection,” arXiv, arXiv:1504.06375 (2015). [CrossRef]

40. H. Zunair and A. Ben Hamza, “Sharp U-Net: Depthwise convolutional network for biomedical image segmentation,” Comput. Biol. Med. 136, 104699 (2021). [CrossRef]

41. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in IEEE Computer Society, Boston, MA, United States (2015), pp. 1–9.

42. L. Jiang-Jiang, Q. Hou, C. Ming-Ming, J. Feng, and J. Jiang, “A Simple Pooling-Based Design for Real-Time Salient Object Detection,” arXiv, arXiv:1904.095692019). [CrossRef]

43. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 770–778.

44. X. Qin, Z. Zhang, C. Huang, M. Dehghan, O. R. Zaiane, and M. Jagersand, “U²-Net: Going deeper with nested U-structure for salient object detection,” Pattern Recognit. 106, 107404 (2020). [CrossRef]

45. X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” (Microtome Publishing, Sardinia, Italy, 2010), pp. 249–256.

46. D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” (International Conference on Learning Representations, ICLR, San Diego, CA, United States, 2015).

Method	M_MSE	M_PSNR	M_SSIM
U-Net	16.8465	39.8731	0.9163
DLPU-Net	4.5489	52.7795	0.9892
VUR-Net	1.9723	62.0038	0.9972
PU-GAN	2.6773	43.9569	0.9714
U $^{2}$ -Net	0.0202	69.5247	1.0000
U $^{2}$ -Netp	0.0305	64.3850	0.9999

Noise	U-Net			DLPU-Net			VUR-Net
ratio	M_MSE	M_PSNR	M_SSIM	M_MSE	M_PSNR	M_SSIM	M_MSE	M_PSNR	M_SSIM
0.0250	10.1573	43.9267	0.9590	5.6073	51.0615	0.9774	2.0622	61.8379	0.9967
0.0500	9.7869	43.3172	0.9610	5.0713	52.9307	0.9853	1.2870	59.0166	0.9916
0.0750	9.7545	43.7334	0.9576	5.0601	52.9501	0.9854	1.3903	55.1591	0.9948
0.1000	14.7531	40.0675	0.9234	6.0235	49.4465	0.9761	0.9491	62.9472	0.9984
0.1250	12.1843	43.2618	0.9384	4.4238	50.0120	0.9872	1.8101	61.2463	0.9971
0.1500	17.4551	40.6564	0.9148	4.7883	52.7648	0.9895	1.0983	59.0210	0.9970
0.1750	13.1073	42.8180	0.9362	4.9725	51.2336	0.9835	1.3905	61.5540	0.9978
0.2000	17.3563	40.0274	0.9350	4.5844	51.6590	0.9812	2.1262	57.7142	0.9908

Method	M_MSE	M_PSNR	M_SSIM
U-Net	9.5971	42.0805	0.9784
DLPU-Net	7.9036	42.5845	0.9825
VUR-Net	3.4407	50.3690	0.9903
PU-GAN	0.0433	40.8835	0.9713
U $^{2}$ -Net	0.0349	62.3376	0.9956
U $^{2}$ -Netp	0.0589	61.8596	0.9971

Method	M_MSE	M_PSNR	M_SSIM
U-Net	12.7213	39.9381	0.9406
DLPU-Net	4.3745	48.6839	0.9822
VUR-Net	1.8459	59.5747	0.9968
PU-GAN	2.7491	43.8460	0.9715
U $^{2}$ -Net	0.0388	66.3059	0.9999
U $^{2}$ -Netp	0.0390	67.9907	0.9999

Method	Params(million)	FLOPs(G)	Running Time(ms)
U-Net	31.39	55.75	32.35
DLPU-Net	9.16	34.57	36.90
VUR-Net	26.12	35.26	40.43
PU-GAN	54.42	2.03	36.35
U $^{2}$ -Net	44.01	37.59	43.63
U $^{2}$ -Netp	1.13	12.69	38.36

Two-dimensional phase unwrapping based on U²-Net in complex noise environment

Abstract

1. Introduction

2. Proposed method

2.1 Data generation

2.2 Evaluation criteria

2.3 Structure of U$^2$-Net

2.3.1 Residual U-block

2.3.2 Structure of U$^2$-Net

2.3.3 Loss function

2.4 Application details

3. Experimental results and discussion

3.1 Feasibility and accuracy test

3.2 Anti-noise performance test

3.2.1 Anti-Gaussian white noise test

3.2.2 Anti-speckle noise test

3.3 High-level noise comparison experiment and network model comparison

3.4 Testing on real samples

4. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (9)

Tables (6)

Equations (9)

Optics Express

Method	M_MSE	M_PSNR	M_SSIM
U-Net	16.1221	40.6593	0.9236
DLPU-Net	4.3016	50.2582	0.9845
VUR-Net	2.4937	58.4415	0.9947
PU-GAN	2.1278	50.5380	0.9728
U $^{2}$ -Net	0.3554	63.6289	0.9989
U $^{2}$ -Netp	0.5394	61.9855	0.9982