
Efficient and robust phase unwrapping method based on SFNet

Open Access

Abstract

Phase unwrapping is a crucial step in obtaining the final physical information in the field of optical metrology. Although deep learning-based spatial phase unwrapping methods handle phase discontinuity and noise well, most suffer from complex models and unsatisfactory performance, partly because their training datasets contain only simple noise types and their interpretability is limited. This paper proposes a highly efficient and robust spatial phase unwrapping method based on an improved SegFormer network, SFNet. The SFNet structure uses a hierarchical encoder without positional encoding and a decoder based on a lightweight fully connected multilayer perceptron. The proposed method utilizes the self-attention mechanism of the Transformer to better capture the global relationships of phase changes and reduce errors in the phase unwrapping process. It also has a lower parameter count, which speeds up phase unwrapping. The network is trained on a simulated dataset containing various types of noise and phase discontinuity. This paper compares the proposed method with several state-of-the-art deep learning-based and traditional methods in terms of important evaluation indices, such as RMSE and PFS, highlighting its structural stability, robustness to noise, and generalization.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Physical information in optical imaging and metrology systems cannot be directly acquired and needs to be calculated through phase information. For example, the three-dimensional information of an object surface in fringe projection profilometry (FPP) [1], the strain information in digital holographic interferometry (DHI) [2], the surface height or deformation of terrain in interferometric synthetic aperture radar (InSAR) [3,4], and the magnetization distribution or blood flow velocity in magnetic resonance imaging (MRI) [5,6] all require phase information extraction. However, obtaining the phase involves the calculation of the arctangent function, so the initial phase is wrapped into the interval (-π, π] [7–9], with 2π discontinuous phase jumps. It needs to be unwrapped to obtain the physical information of the measured object; this process is called phase unwrapping. Phase unwrapping methods can be primarily classified into temporal phase unwrapping (TPU) methods and spatial phase unwrapping (SPU) methods.

TPU methods unwrap the phase by utilizing two or more phase maps acquired at different time instances with different frequencies. Numerous methods have been proposed for TPU [10–13]. In TPU, the phase of each point is only related to the corresponding point at the same location on phase maps of different frequencies and is independent of its neighboring points. The TPU methods utilize additional encoded patterns or information from phase maps of different frequencies to determine the fringe order. Because the temporal wrapped phase is unwrapped along the time sequence, errors do not propagate in space. The common methods mainly include the multi-frequency (hierarchical) method [14,15], the multi-wavelength (heterodyne) method [16,17], and the number-theoretical method [18,19]. With the development of deep learning, many methods utilizing neural networks for TPU have emerged [13,20]. The TPU methods are widely used in fringe projection profilometry (FPP) [20,21] and can effectively address the issue of phase surface discontinuity. However, because a large number of fringe patterns is required, the measurement speed is slow.

Traditional SPU methods mainly include path-following methods and minimum-norm methods. Unlike TPU methods, SPU methods are mainly applied to address phase wrapping in a single two-dimensional wrapped phase image. The path-following methods primarily rely on setting an integration path to recover the continuous phase, including the branch-cut (BC) method [22–24], the quality-guided (QG) method [25,26], the region-growing method [27], and the minimum discontinuity method [28]. The BC method unwraps the phase by setting branch-cut lines, which has low computational complexity but is sensitive to noise. The QG method utilizes a quality map to weight the phase unwrapping of different pixels to improve the accuracy of the unwrapping. Finding a suitable integration path for phase unwrapping becomes challenging when the wrapped phase contains severe noise or phase discontinuity. The minimum-norm methods convert the phase unwrapping problem into a global optimization problem, aiming to minimize the difference between the wrapped and true phase gradients. The least-squares (LS) method [29,30] is a typical representative. Xia et al. [31] proposed a robust method based on least-squares, iteration, and calibration of the 1st order spatial phase derivative (CPULSI) to suppress the influence of noise. However, minimum-norm methods are prone to amplifying local noise and generating excessively smooth phase surfaces. In addition, Costantini [32] equated the phase unwrapping problem to a minimum-cost flow problem on a network. Martinez-Carranza et al. [33] proposed a phase unwrapping method based on the transport of intensity equation (TIE) for noise suppression.

With the advancement of deep learning, methods for SPU based on convolutional neural networks (CNN) have been proposed. One approach is to treat phase unwrapping as a pixel-level semantic segmentation problem, where the wrapped phase φ(x, y) is used to estimate the wrap counts k(x, y), and then the true phase ψ(x, y) is computed by Eq. (1).

$$\psi ({x,y} )= \varphi ({x,y} )+ 2\pi \cdot k({x,y} )$$

Zhang et al. [34] proposed a phase unwrapping method based on CNN [35] to enhance robustness against noise. Zhang et al. [36] directly input the wrapped phase into the DeepLabV3+ network [37] to estimate the wrap counts. Subsequently, Li et al. [38] utilized DeepLabV3+ as the backbone and proposed a method based on spatial and channel attention networks. Spoorthi et al. [35,39] initially proposed the use of PhaseNet for phase unwrapping and further introduced PhaseNet 2.0 [40] to enhance the anti-noise ability. Huang et al. [41] utilized the HRNet network [42] to enhance the resolution of the predicted results, but the computational cost is very high. Another CNN-based approach is to treat phase unwrapping as a regression problem, where a deep neural network directly learns the mapping between the wrapped phase and the true phase. Wang et al. [43] embedded ResNet [44] into the UNet [45] network to directly estimate the true phase from the wrapped phase. Since then, many methods based on UNet have been proposed [46–48]. These methods do not strictly adhere to the numerical relationship described in Eq. (1), introducing errors for each pixel. Xu et al. [49] proposed an “M-shaped” MNet network model to enhance the utilization of initial features from the wrapped phase. He et al. [50] designed an UN-PUNet phase unwrapping network to mitigate the effects of noise and uneven grayscale. Although CNN-based phase unwrapping methods have achieved impressive results, their robustness to data noise and deformations is generally limited due to their local connectivity and weight-sharing convolutional structure.

In recent years, with its unique self-attention mechanism, the Transformer has been widely applied in various deep learning research areas such as object detection [51,52], image classification [53,54], semantic segmentation [55,56], and more. Transformer-based SPU methods have gradually been proposed and have achieved promising results. Kuang et al. [57] used the Swin-ResUnet network based on the Swin-Transformer [58] for segmenting the wrapped phase, achieving higher accuracy. Zhao et al. [59] also proposed a robust phase unwrapping method based on the Swin-Transformer. Zhu et al. [60] designed a hybrid CNN-Transformer (Hformer) model to enhance global dependency for fringe order prediction.

The aforementioned Transformer-based models require positional encoding and can only output feature maps at a fixed resolution. These models have larger parameter counts, lower computational efficiency, and limited flexibility. In contrast, SegFormer [61] is a simple, efficient, and powerful semantic segmentation framework. Unlike other models, SegFormer includes a hierarchical Transformer encoder that does not require positional encoding, allowing it to output multi-scale features. Additionally, SegFormer utilizes a simple MLP decoder to aggregate information from different layers, combining local and global attention to produce powerful representations. Inspired by SegFormer, we propose a novel SPU method based on SFNet. The proposed method is compared with three other deep learning-based SPU methods based on HRNet, DeepLabV3+, and UNet, as well as five traditional methods: the BC, QG, LS, CPULSI, and TIE methods. The comparison is conducted on simulated datasets that include noise and phase discontinuity. Our main contributions can be summarized as follows:

  • 1) We first propose a novel Transformer-based SPU method, treating phase unwrapping as a semantic segmentation problem to recover the true phase from a single wrapped phase map.
  • 2) In the SFNet network architecture, Transformer-based encoders with different levels are employed to avoid the need for position encoding. Simultaneously, a lightweight MLP decoder processes feature maps from different layers through linear layers, upsamples them to 1/4 resolution, and performs fusion. This significantly reduces the computational burden and parameters of the decoder, enabling efficient model execution.
  • 3) In the case of noise, a simulated dataset randomly including different levels of Gaussian noise, speckle noise, and salt & pepper noise is generated to resemble real-world environments closely. It greatly improves the generalization and noise resistance capabilities of the model.
  • 4) We conduct an analysis and comparison of our SFNet-based method with three other deep learning-based methods and five classical methods using generated synthetic datasets under four different cases. The results highlight that our SFNet-based method outperforms others in terms of efficiency, accuracy, robustness, and generalization capability.

Here is the remaining content arrangement for this paper: Section 2 elaborates on the principle of the phase unwrapping method based on the SFNet network and the methodology for generating simulated datasets. In Section 3, a qualitative and quantitative comparative analysis is conducted between the SFNet-based method, three other deep learning-based methods, and five traditional methods. In Section 4, a comprehensive summary of the entire paper is provided.

2. Proposed method

This paper considers phase unwrapping as a pixel-level semantic segmentation problem. As shown in Fig. 1, the wrapped phase φ(x, y) is input into the self-built SFNet network for semantic segmentation, which outputs the wrap counts k(x, y). Afterward, the unwrapped phase ψ′(x, y) is obtained from Eq. (1). Then, the unwrapped phase is further refined to reduce misclassified pixels, obtaining a high-precision distribution of the true phase.
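As a minimal sketch of this step (assuming the predicted wrap counts are already available as a NumPy array; the function name is ours, not the authors'), the pre-refinement unwrapped phase follows directly from Eq. (1):

```python
import numpy as np

def unwrap_from_counts(wrapped_phase: np.ndarray, wrap_counts: np.ndarray) -> np.ndarray:
    """Eq. (1): psi'(x, y) = phi(x, y) + 2*pi*k(x, y)."""
    return wrapped_phase + 2.0 * np.pi * wrap_counts

# Example with random placeholders for the wrapped phase and predicted counts.
phi = np.random.uniform(-np.pi, np.pi, size=(256, 256))
k = np.random.randint(0, 23, size=(256, 256))
psi_prime = unwrap_from_counts(phi, k)  # refined afterwards (Section 2.2)
```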

Fig. 1. The phase unwrapping method based on SFNet.

2.1 SFNet architecture

The SFNet is built based on the improved SegFormer network, and the overall structure is shown in Fig. 2. The SFNet primarily consists of two modules: a hierarchical Transformer encoder without positional encoding and a lightweight fully connected multilayer perceptron (MLP) decoder named All-MLP. The model architecture follows a typical encoder-decoder structure. Given a wrapped phase image φ(x, y) with a size of H × W × 1, it is first divided into fixed-size patches. These patches are then fed into a hierarchical Transformer encoder to obtain multi-level feature maps. These feature maps are used as input to the All-MLP to predict segmentation masks. The size of the feature maps generated by the decoder is $\frac{H}{4} \times \frac{W}{4} \times {N_{cls}}$. Finally, the Upsample and Softmax operations are applied to obtain a wrap counts image k(x, y) with a size of H × W × 1.

Fig. 2. An illustration of the proposed SFNet network. (a) Architecture of the SFNet. It contains a hierarchical encoder without positional encoding and a lightweight fully connected All-MLP decoder. (b) Architecture of each SFNet block. It consists of overlap patch merging, efficient self-attention, and mix-FFN modules. (c) Architecture of the Mix-FFN. ‘FFN’ represents the feed-forward network. (d) Architecture of the MLP layer. It is used to aggregate information from the encoder.

2.1.1 Encoder

The encoder of the SFNet consists of four SFNet Blocks with different hierarchical structures, as shown in Fig. 2(a). Each block includes three components: overlap patch merging, efficient self-attention, and Mix-FFN, as shown in Fig. 2(b).

Overlap patch merging is an image encoding step that divides the input into overlapping patches and merges them into feature maps, preserving local continuity between neighboring patches. First, the input wrapped phase image φ(x, y) is divided into multiple square patches with a fixed size of Pi, with a certain overlap region Li between adjacent patches. Subsequently, convolutional layers with a stride smaller than the kernel size are used to perform feature extraction and merging for each image patch. This paper employs an efficient self-attention mechanism, called Efficient Self-Attn, to reduce computational complexity. Efficient Self-Attn enables the SFNet model to weight and aggregate information from different positions in the input sequence, achieving global information interaction and integration. In other Transformer-based models, the multi-head self-attention mechanism uses Q, K, and V matrices of the same dimensions N × C, where N = H × W is the sequence length, and the computational complexity of Eq. (2) is O(${N^2}$). Introducing a reduction ratio R reduces the dimensions of K and V to $\frac{N}{R} \times C$, thereby reducing the computational complexity to $O(\frac{{{N^2}}}{R})$. The reduction ratio and the number of attention heads in each module are denoted as Ri and ni, respectively.

$$\textrm{Attention}({Q,K,V} )= \textrm{Softmax}\left( {\frac{{Q{K^T}}}{{\sqrt {{d_{head}}} }}} \right)V$$
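A minimal PyTorch sketch of this sequence-reduction attention is given below; we assume the reduction of K and V from length N to N/R is realized with a strided one-dimensional convolution over the token sequence, and all module names are ours rather than the authors':

```python
import torch
import torch.nn as nn

class EfficientSelfAttn(nn.Module):
    """Sketch of efficient self-attention: K and V are shortened from length N
    to N/R before standard multi-head attention, as described around Eq. (2)."""
    def __init__(self, dim: int, num_heads: int, reduction_ratio: int):
        super().__init__()
        self.reduce = (nn.Conv1d(dim, dim, kernel_size=reduction_ratio,
                                 stride=reduction_ratio)
                       if reduction_ratio > 1 else nn.Identity())
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C), with N = H * W tokens from overlap patch merging.
        kv = self.reduce(x.transpose(1, 2)).transpose(1, 2)  # (B, N/R, C)
        kv = self.norm(kv)
        out, _ = self.attn(query=x, key=kv, value=kv)        # cost ~ O(N^2 / R)
        return out

# Example: stage-1 tokens of a 256 x 256 input at 1/4 resolution (N = 64 * 64).
tokens = torch.randn(1, 64 * 64, 32)
attn = EfficientSelfAttn(dim=32, num_heads=1, reduction_ratio=8)
print(attn(tokens).shape)  # torch.Size([1, 4096, 32])
```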

Mix-FFN introduces a modified fully connected feed-forward network (FFN) structure to provide a more flexible way of extracting features. Mix-FFN allows different non-linear transformations to be applied at different positions, increasing the expressive power of the model. Figure 2(c) illustrates the specific calculation process of Mix-FFN. The expansion ratio of the MLP is denoted as ri. In this paper, we combine the Efficient Self-Attn and Mix-FFN modules to construct the SFNet encoder block. An overlap patch merging module is combined with Ni repetitions (depth) of the SFNet encoder block to form an SFNet block, and four SFNet blocks with different depths together form the encoder of the SFNet. Hierarchical feature maps Fi are obtained by passing the wrapped phase image with a size of H × W × 1 through the four SFNet blocks. The size of the feature maps is $\frac{H}{{{2^{i + 1}}}} \times \frac{W}{{{2^{i + 1}}}} \times {C_i}$, where $i \in \{{1,2,3,4} \}$. The specific parameter configuration of the SFNet structure can be found in Table 1. We modify the number of channel dimensions in the hierarchical feature maps obtained from the multilayer encoder and adjust the depth of the four SFNet encoder blocks. These changes make the network model better suited to SPU and improve the unwrapping accuracy.
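The following PyTorch sketch shows one plausible Mix-FFN layout following the SegFormer design (a linear expansion, a 3 × 3 depth-wise convolution that injects positional information, GELU, and a projection back to the input width); the residual connection and layer normalization that wrap the block in the encoder are omitted, and the exact layout may differ from the authors' code:

```python
import torch
import torch.nn as nn

class MixFFN(nn.Module):
    """Sketch of Mix-FFN (Fig. 2(c)) with expansion ratio r_i."""
    def __init__(self, dim: int, expansion_ratio: int):
        super().__init__()
        hidden = dim * expansion_ratio
        self.fc1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor, H: int, W: int) -> torch.Tensor:
        # x: (B, N, C) tokens with N = H * W.
        B, N, C = x.shape
        h = self.fc1(x)                                  # (B, N, hidden)
        h = h.transpose(1, 2).reshape(B, -1, H, W)       # back to a 2-D map
        h = self.dwconv(h).flatten(2).transpose(1, 2)    # (B, N, hidden)
        return self.fc2(self.act(h))

tokens = torch.randn(1, 64 * 64, 32)
ffn = MixFFN(dim=32, expansion_ratio=4)
print(ffn(tokens, H=64, W=64).shape)  # torch.Size([1, 4096, 32])
```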

Table 1. The SFNet structure parameters

2.1.2 Decoder

As stated in Eq. (3) and shown in Fig. 2(d), the All-MLP decoder in SFNet proceeds as follows. First, the hierarchical feature maps Fi obtained from the multilayer encoder are passed through an MLP layer to unify the channel dimension: a linear layer transforms each channel dimension Ci into C, producing feature maps Fi′ with a size of $\frac{H}{{{2^{i + 1}}}} \times \frac{W}{{{2^{i + 1}}}} \times C$. Next, the feature maps Fi′ are up-sampled to 1/4 of the original resolution, i.e., $\frac{H}{4} \times \frac{W}{4} \times C$. Then, a concatenation operation is performed along the channel dimension, producing a feature map F′′ with a channel size of 4C. Afterward, an MLP layer is used to fuse the concatenated feature maps, producing a feature map F′′′ with a channel size of C. Finally, another MLP layer takes the fused features F′′′ as input and predicts the segmentation mask M with a size of $\frac{H}{4} \times \frac{W}{4} \times {N_{cls}}$, where Ncls is the number of classes. The segmentation mask M is up-sampled and processed through a Softmax function to obtain the wrap counts image k(x, y) with a size of H × W × 1.

$$\begin{aligned} F_i^{\prime} &= \textrm{Linear}({C_i},C)({F_i})\\ F_i^{\prime\prime} &= \textrm{Upsample}\left( {\frac{H}{4} \times \frac{W}{4}} \right)(F_i^{\prime})\\ F^{\prime\prime} &= \textrm{Concat}(F_i^{\prime\prime})\\ F^{\prime\prime\prime} &= \textrm{Linear}({4C,C})(F^{\prime\prime})\\ M &= \textrm{Linear}({C,{N_{cls}}})(F^{\prime\prime\prime})\end{aligned}$$
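A compact PyTorch sketch of this decoding path is given below; the channel widths are illustrative placeholders rather than the values listed in Table 1, and the class and parameter names are ours:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AllMLPDecoder(nn.Module):
    """Sketch of the All-MLP decoder in Eq. (3)."""
    def __init__(self, in_channels=(32, 64, 160, 256), embed_dim=256, num_classes=23):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(c, embed_dim) for c in in_channels])
        self.fuse = nn.Linear(4 * embed_dim, embed_dim)
        self.classify = nn.Linear(embed_dim, num_classes)

    def forward(self, features):
        # features: four maps F_i, each (B, C_i, H/2^{i+1}, W/2^{i+1}).
        target = features[0].shape[2:]                     # H/4 x W/4
        unified = []
        for f, proj in zip(features, self.proj):
            B, C, H, W = f.shape
            t = proj(f.flatten(2).transpose(1, 2))         # Linear(C_i, C)
            t = t.transpose(1, 2).reshape(B, -1, H, W)
            unified.append(F.interpolate(t, size=target, mode='bilinear',
                                         align_corners=False))
        x = torch.cat(unified, dim=1)                      # (B, 4C, H/4, W/4)
        x = self.fuse(x.flatten(2).transpose(1, 2))        # Linear(4C, C)
        mask = self.classify(x)                            # (B, N, N_cls)
        B, N, _ = mask.shape
        return mask.transpose(1, 2).reshape(B, -1, *target)  # (B, N_cls, H/4, W/4)

# Usage: hierarchical features from a 256 x 256 input at strides 4, 8, 16, 32.
feats = [torch.randn(1, c, 256 // s, 256 // s)
         for c, s in zip((32, 64, 160, 256), (4, 8, 16, 32))]
print(AllMLPDecoder()(feats).shape)  # torch.Size([1, 23, 64, 64])
```

The mask is then up-sampled to H × W and passed through Softmax to obtain k(x, y), as described above.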

2.2 Refinement step

The issue of incorrectly segmenting pixels at the boundaries between adjacent classes is a challenge that almost all semantic segmentation methods face. The unwrapped phase obtained through the SFNet already has a high resolution, eliminating the occurrence of concentrated pixel misclassification in large areas. However, there may inevitably be some misclassification of pixels at the edges, as seen in the local magnified portion of the unwrapped phase in Fig. 1. Refining the unwrapped phase through appropriate post-processing techniques can significantly improve the accuracy and precision of the final phase.

$$\psi ({x,y} )= \textrm{Refine}({\psi^{\prime}({x,y} )} )$$

In this paper, a local Laplace filter is used to filter out error pixels in the unwrapped phase. For each error pixel e0(x, y), the Laplace operator is computed by convolving a kernel of size 3 × 3 around it, as shown in Fig. 3.

$$\Delta = \sum\limits_{i,j} {[{{e_0}({x + i,y + j} )- {e_0}({x,y} )} ]}$$
where $i,j \in \{{ - 1,0,1} \}$ and i, j are not both equal to zero. After filtering, the value of the error pixel is updated as follows:
$${e_0}({x,y} )= {e_0}({x,y} )+ 2\pi \times \textrm{Round}\left( {\frac{\Delta }{{8 \times 2\pi }}} \right)$$
where Round() means the value is rounded to the nearest integer.
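A possible NumPy/SciPy realization of this refinement is sketched below. For simplicity we apply the filter to every pixel rather than only to detected error pixels (an assumption on our part); for correctly unwrapped pixels the rounded term in Eq. (6) is zero, so they are left unchanged:

```python
import numpy as np
from scipy.ndimage import convolve

def refine_unwrapped_phase(psi: np.ndarray) -> np.ndarray:
    """Sketch of the refinement step in Eqs. (5)-(6): a 3x3 Laplace response
    shifts misclassified pixels by the nearest multiple of 2*pi."""
    # Eq. (5): sum of the 8 neighbours minus 8 times the centre pixel.
    kernel = np.array([[1, 1, 1],
                       [1, -8, 1],
                       [1, 1, 1]], dtype=float)
    delta = convolve(psi, kernel, mode='nearest')
    # Eq. (6): correct each pixel by a rounded multiple of 2*pi.
    return psi + 2.0 * np.pi * np.round(delta / (8.0 * 2.0 * np.pi))
```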

Fig. 3. The process of refinement.

2.3. Dataset generation

Acquiring a real and effective dataset is crucial for successfully training the network. However, obtaining a sufficient dataset of the true phase and corresponding wrap counts is challenging due to various factors, such as noise interference in the real world. Using simulated phase data to train the network is an effective alternative. Herein, we adopt the random matrix enlargement (RME) method [43,62] to generate simulated phase data with good generalization. As shown in Fig. 4, the process of the RME method is as follows:

  • i. Generate an initial square matrix with a random size, where the data follows a Gaussian distribution.
  • ii. Enlarge the initial matrix to the desired size H × W using the nearest neighbor interpolation method, ensuring the continuity of the phase surface.
  • iii. Linearly map the enlarged matrix to a true phase ψ(x, y) with a larger phase height h. The phase variation range is specified as [0, h].

Fig. 4. The RME method for generating phase data.

After obtaining the true phase, we can calculate the corresponding wrapped phase by Eq. (7):

$$\varphi ({x,y} )= \textrm{angle}({\textrm{exp} ({1\textrm{i} \times \psi ({x,y} )} )} )$$
where 1i is the imaginary unit, and angle() wraps the true phase within the range of (-π, π]. Once the true and wrapped phases are obtained, the wrap counts k(x, y) is calculated according to Eq. (8), with the total number of classes denoted as Ncls.
$$k({x,y} )= \textrm{Round}\left( {\frac{{\psi ({x,y} )- \varphi ({x,y} )}}{{2\pi }}} \right)$$
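The following NumPy sketch generates one training pair by following the three RME steps and Eqs. (7)–(8); the random-number handling and the indexing used for the nearest-neighbour enlargement are our assumptions:

```python
import numpy as np

def generate_sample(H=256, W=256, rng=None):
    """Sketch of one (wrapped phase, wrap counts) training pair via RME."""
    rng = rng or np.random.default_rng()
    # i.   Initial square matrix with random size (2x2 ... 10x10), Gaussian entries.
    n = int(rng.integers(2, 11))
    seed = rng.standard_normal((n, n))
    # ii.  Enlarge to H x W by nearest-neighbour interpolation (step ii of the text).
    rows = np.minimum((np.arange(H) * n) // H, n - 1)
    cols = np.minimum((np.arange(W) * n) // W, n - 1)
    surface = seed[np.ix_(rows, cols)]
    # iii. Linearly map to [0, h] with a random phase height h in [20, 130] rad.
    h = rng.uniform(20.0, 130.0)
    psi = (surface - surface.min()) / (surface.max() - surface.min()) * h
    # Wrapped phase, Eq. (7), and wrap counts, Eq. (8).
    phi = np.angle(np.exp(1j * psi))
    k = np.round((psi - phi) / (2.0 * np.pi)).astype(int)
    return phi, k, psi
```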

In this paper, the main focus is on generating the following four types of datasets to analyze and test the performance of the SFNet:

  • i. The ideal case: We first generate the true phase without noise or discontinuity to analyze the performance in the ideal case. The initial square matrix size is varied within the range of 2 × 2 to 10 × 10. The phase height h is set to vary within the range of [20, 130] radians. The total number of classes for the corresponding wrap counts k is Ncls = 23 (an integer between 0 and 22).
  • ii. The discontinuous case: A randomly placed rectangular region is introduced in the ideal case of the true phase to analyze the performance in the case of phase discontinuity. This rectangular region has a random size ranging from 40 × 40 to 100 × 100 and a phase height of π or 2π.
  • iii. The noisy case: Different types and levels of noise, including Gaussian noise with a standard deviation of [0, 0.5], speckle noise with a standard deviation of [0, 0.5], and salt & pepper noise with a density of [0, 0.5], are randomly added to the wrapped phase in the ideal case to test the noise robustness (a sketch of this noise injection is given after this list). It is ensured that the signal-to-noise ratio (SNR) of the wrapped phase remains SNR ≥ -3 dB.
  • iv. The mixed case: We consider all the above cases to analyze the performance in the mixed case. The wrapped phase φ will serve as the input to the network, while the wrap counts k will be used as the ground truth.
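The noise injection for the noisy case can be sketched as follows; the uniform sampling of each noise level and the use of ±π as the salt & pepper extremes are our assumptions:

```python
import numpy as np

def add_random_noise(phi: np.ndarray, rng=None) -> np.ndarray:
    """Sketch of the random noise added to the wrapped phase (noisy case)."""
    rng = rng or np.random.default_rng()
    noisy = phi.copy()
    # Additive Gaussian noise, standard deviation drawn from [0, 0.5].
    noisy = noisy + rng.normal(0.0, rng.uniform(0.0, 0.5), size=phi.shape)
    # Multiplicative speckle noise, standard deviation drawn from [0, 0.5].
    noisy = noisy * (1.0 + rng.normal(0.0, rng.uniform(0.0, 0.5), size=phi.shape))
    # Salt & pepper noise with density drawn from [0, 0.5]; corrupted pixels are
    # set to the extremes of the wrapped-phase range (our assumption).
    mask = rng.random(phi.shape) < rng.uniform(0.0, 0.5)
    noisy[mask] = rng.choice([-np.pi, np.pi], size=int(mask.sum()))
    return noisy
```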

2.4 Implementation details

The cross-entropy loss function is used to calculate the difference between the predicted results and the true labels. By minimizing the cross-entropy loss, the segmentation accuracy of the model can be improved. Equation (9) gives the expression of the cross-entropy loss function. For each pixel in the image, the cross-entropy loss is computed by comparing the predicted result p(x, y) of the network with the ground truth g(x, y). The losses of all pixels in the image are then averaged to obtain the final loss value LCE. The purpose of network training is to optimize each parameter in the network structure and continuously reduce the loss value LCE toward its minimum. In the SFNet model, a lower loss value LCE, closer to 0, indicates better performance, although overfitting must also be prevented.

$${L_{CE}} ={-} \frac{1}{{H \times W}}\sum\limits_{x = 1}^H {\sum\limits_{y = 1}^W {g({x,y} )\log ({p({x,y} )} )} }$$
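In practice, Eq. (9) corresponds to the standard pixel-wise cross-entropy; a minimal PyTorch sketch with shapes chosen to match the 256 × 256 inputs and Ncls = 23 classes used in this paper:

```python
import torch
import torch.nn as nn

# Pixel-wise cross-entropy corresponding to Eq. (9); nn.CrossEntropyLoss applies
# log-softmax to the raw class scores and averages over all H x W pixels.
criterion = nn.CrossEntropyLoss()

logits = torch.randn(4, 23, 256, 256)          # (B, N_cls, H, W) raw class scores
target = torch.randint(0, 23, (4, 256, 256))   # (B, H, W) ground-truth wrap counts
loss = criterion(logits, target)
print(float(loss))
```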

A total of 38,000 pairs of wrapped phase and corresponding wrap counts with a size of 256 × 256 are generated for training the network in each case. In addition, 2,000 pairs of wrapped phase and wrap counts with the same size are generated for testing. 5% of the training dataset is set aside for validation. Due to the large scale of the training dataset, using 5% of it (i.e., 1,900 pairs) is sufficient to evaluate the network. Furthermore, we increased the proportion of the validation set to 20% and trained the four deep learning-based methods in the noisy and mixed cases. After testing on the same test dataset, we obtained patterns consistent with the experimental results presented in this paper. Therefore, the proportion of the validation dataset has minimal impact on the experimental results of this study. The AdamW optimizer with a momentum of 0.9 and a weight decay coefficient of 0.01 is chosen to optimize the parameters. The learning rate adjustment strategy employed in this paper is the cosine annealing decay method. Throughout the entire training process, the learning rate starts at 5e-6 and rapidly increases to 5e-5. After the warm-up stage, the learning rate gradually decreases to 3e-7. The SFNet is implemented using the PyTorch 1.13 framework based on Python 3.8, and the batch size is set to 64. We train the SFNet network for 550 epochs, which takes approximately 12 hours. The training is conducted on a workstation equipped with two Intel Xeon Gold 6326 CPUs (2.90 GHz), four NVIDIA GeForce RTX 4090 GPUs, and 256 GB of RAM.
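A sketch of this training configuration is given below; only the optimizer type, the learning-rate endpoints, the weight decay, and the epoch count come from the text, while the warm-up length and the use of a LambdaLR schedule are our assumptions:

```python
import math
import torch

model = torch.nn.Linear(8, 8)                      # placeholder for the SFNet model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5,
                              betas=(0.9, 0.999), weight_decay=0.01)

warmup_epochs, total_epochs = 10, 550
lr_start, lr_peak, lr_end = 5e-6, 5e-5, 3e-7

def lr_lambda(epoch):
    """Multiplier on the base lr (5e-5): linear warm-up, then cosine decay."""
    if epoch < warmup_epochs:
        return (lr_start + (lr_peak - lr_start) * epoch / warmup_epochs) / lr_peak
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    cos = 0.5 * (1.0 + math.cos(math.pi * t))
    return (lr_end + (lr_peak - lr_end) * cos) / lr_peak

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```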

3. Results

To accurately evaluate the phase unwrapping results of the four deep learning-based methods and five traditional methods, the following evaluation indices are employed (RMSE and PFS are illustrated in the code sketch after this list):

  • i. RMSE and RMSEsd: the root mean squared error (RMSE) is used to measure the error between the unwrapped phase and the true phase, and the root mean squared error standard deviation (RMSEsd) is used to assess the stability and variability of RMSE.
  • ii. PFS: the proportion of failed samples (PFS) refers to the ratio of the number of failed samples to the total number of samples in the test dataset. A failed sample is one for which there is at least one pixel with an absolute error larger than π.
  • iii. PSNR: the peak signal-to-noise ratio (PSNR) represents the ratio between the peak signal power in the unwrapped phase and the average noise power.
  • iv. SSIM: the structural similarity index (SSIM) measures the similarity between the unwrapped phase and the true phase in terms of luminance, contrast, and structure.
  • v. CC: the correlation coefficient (CC) reflects the strength and direction of the linear relationship between the unwrapped phase and the true phase.
  • vi. AU: the accuracy of unwrapping (AU) represents the proportion of correctly unwrapped pixels out of the total number of pixels.
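A NumPy sketch of the first two indices, RMSE and PFS, is given below (the function names are ours):

```python
import numpy as np

def rmse(psi_pred: np.ndarray, psi_true: np.ndarray) -> float:
    """Root mean squared error between the unwrapped and true phase (index i)."""
    return float(np.sqrt(np.mean((psi_pred - psi_true) ** 2)))

def is_failed(psi_pred: np.ndarray, psi_true: np.ndarray) -> bool:
    """A sample fails if any pixel has an absolute error larger than pi (index ii)."""
    return bool(np.any(np.abs(psi_pred - psi_true) > np.pi))

def pfs(pred_list, true_list) -> float:
    """Proportion of failed samples over a test set."""
    fails = sum(is_failed(p, t) for p, t in zip(pred_list, true_list))
    return fails / len(true_list)
```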

We utilize the aforementioned indices in the evaluation to comprehensively assess nine different phase unwrapping methods under each case. We further validate these methods using real-world datasets to highlight the efficiency, accuracy, and robustness of the SFNet-based method proposed in this paper. The compared deep learning networks, including HRNet, DeepLabV3+, and UNet, are trained using the same dataset. The evaluation results for each case are tested on the same test dataset consisting of 2000 pairs. The calculation of each evaluation metric is based on the average value.

3.1 In the ideal case

After training, the phase unwrapping results of the four deep learning-based methods and five traditional methods in the ideal case are shown in Table 2 and Fig. 5. The values in Table 2 represent the average evaluation results of the test samples. All methods except UNet can accurately unwrap the wrapped phase in the ideal case. Traditional phase unwrapping methods may even achieve more precise results than deep learning-based methods because they directly solve the ideal wrapped phase using explicit mathematical models and algorithms. The PFS values of almost zero for the five traditional methods indicate that they obtain almost completely correct results in the ideal case. From the results in the last three columns of Table 2 and the error distribution in Fig. 5, it can be seen that the structure and shape of the unwrapped phase are well preserved.

Fig. 5. Results in the ideal case.

Table 2. Evaluation results for each phase unwrapping method in the ideal case

Among the four deep learning-based methods, the three semantic segmentation networks achieve better results in SSIM, CC, and AU, indicating that this type of method has higher unwrapping accuracy in the ideal case and a stronger ability to preserve the phase structure and shape. From the error distribution in Fig. 5, it can be seen that when the phase surface changes greatly, some misclassified pixels with an absolute error of 2π appear in the results of the semantic segmentation-based methods. Among them, the error distribution of SFNet has the fewest error points, which are relatively scattered, followed by HRNet. In contrast, some small concentrated error regions appear in the error distribution of DeepLabV3+, and the UNet-based method performs even worse. For each method, the error gradually increases with increasing phase height.

Overall, the phase unwrapping method based on SFNet performs best. Although its RMSE value is not much different from that of HRNet, its error distribution is more uniform. Among the samples tested with SFNet, 10% of the results failed, whereas HRNet failed on 28.5%. In the ideal case, deep learning-based methods cannot achieve better results than traditional methods because the physical model and rules of the phase unwrapping problem cannot be completely learned from the limited diversity of the training dataset. That is also why the PFS values of deep learning-based methods are higher than those of traditional methods. The phase unwrapping results of DeepLabV3+ are worse than those of HRNet. Regarding the UNet-based method, at least half of the test samples failed to be successfully unwrapped (PFS is 54.6%), and it also performs the worst on the other evaluation indices.

Figure 6 shows the changes in RMSE and PFS values before and after the refinement operation, verifying the necessity and importance of the refinement operation in semantic segmentation network-based methods. The unwrapping results of the three methods after refinement are clearly better than before. This is because misclassification causes some pixels at the edges to have an absolute error greater than or equal to 2π, so these samples are identified as failed. The refinement operation accurately handles misclassified pixels, making the phase unwrapping results more accurate and significantly reducing errors.

Fig. 6. The values of RMSE and PFS before and after the refinement operation.

3.2 In the discontinuous case

The evaluation results in the discontinuous case are shown in Table 3 and Fig. 7. It is evident that the deep learning-based methods show a stronger phase unwrapping ability when the wrapped phase has discontinuity. From the SSIM, CC, and AU evaluation results and the error distribution in Fig. 7, it can be seen that the shape and structure of the unwrapped phase obtained by traditional methods are severely damaged. The phase unwrapping accuracy is poor, and the PFS values are close to 100%, indicating that almost all test samples fail to be unwrapped. Deep learning-based methods achieve relatively correct results, maintaining a stronger robustness.

Fig. 7. Results in the discontinuous case.

Table 3. Evaluation results for each phase unwrapping method in the discontinuous case

The method based on the SFNet network achieves the best values for almost all indices, and its PFS is nearly 20% lower than that of HRNet. From the SSIM, CC, and AU results, it can be seen that the semantic segmentation network-based methods protect the stability of the unwrapped phase structure very well. Due to the discontinuity, the chance of misclassifying pixels at the edges of the rectangular region in the wrapped phase increases, so the PFS values of the three semantic segmentation-based methods are higher than that of UNet. Except for PFS, however, the UNet-based method has the worst results on all other indices. This can also be verified from the error distribution in Fig. 7. The misclassified pixels of the semantic segmentation network-based methods are mostly concentrated at the edges of discontinuous regions, while pixels in other locations are hardly affected. Regarding the UNet-based method, the results are affected by the discontinuity and have large errors in areas where the phase surface is more complex.

The error distributions in Fig. 7 show that the LS method errors propagate due to phase discontinuity, resulting in an overly smooth phase surface and a large deviation from the true phase. Neither CPULSI nor TIE methods can accurately identify discontinuous positions, and there are obvious rectangular areas in the error distribution. The BC method propagates the phase errors generated after encountering a discontinuous area, resulting in a wire-drawing phenomenon. The QG method achieves relatively good results among the five methods, but its ability to identify discontinuous regions also decreases when the phase surface is complex.

3.3 In the noisy case

We randomly add Gaussian noise, speckle noise, and salt & pepper noise of different levels to the wrapped phase to better simulate real-world noise. The results in the noisy case are shown in Table 4 and Fig. 8. The deep learning-based methods are significantly better than the traditional methods when the noise is heavy. The error distribution in Fig. 8 shows that the traditional methods almost fail because severe noise makes the phase information unclear. The indices in Table 4 indicate that the traditional methods have poor robustness and perform poorly in both structure and accuracy.

Fig. 8. Results in the noisy case.

Table 4. Evaluation results for each phase unwrapping method in the noisy case

Deep learning-based methods not only have strong anti-noise performance but also protect the structure and shape from being damaged by noise. Among the four deep learning-based methods, SFNet achieves the best results. The unwrapped phase obtained by SFNet has the smallest RMSE value, and its distribution is uniform. In addition, the SFNet-based method achieves the most successful samples (PFS is 23%), a success rate about 17% higher than that of the HRNet-based method. The DeepLabV3+ and UNet-based methods have PFS values of about 60%. The PSNR values show that the noise robustness of the SFNet-based method is the strongest and that of UNet is the weakest. Combining the SSIM, CC, and AU results in Table 4 with the error distribution in Fig. 8, the structure and accuracy of the unwrapped phase obtained by SFNet are the best, followed by HRNet, DeepLabV3+, and UNet.

When the noise level is low, the TIE method achieves relatively good results among the five traditional methods. The CPULSI and LS methods produce a smoothing effect due to the noise. For the BC method, noise causes phase errors to propagate along the integration path. The QG method is superior to the BC method.

As shown in Fig. 9, the performance at different noise levels is further analyzed by varying the SNR of the wrapped phase from -3 dB to 16 dB. Based on our observations, we draw the following conclusions:

  • i. The RMSE values of all methods decrease with the increase of SNR, and the method based on SFNet continues to achieve the lowest value. The RMSE of both SFNet and HRNet decay to a stable value first, but the fluctuation of RMSE for HRNet is larger, which verifies the reliability of the RMSEsd results in Table 4.
  • ii. When the SNR of the wrapped phase is >12 dB, all methods achieve the most satisfactory results, and the RMSE values are almost close to 0. However, the final RMSE value based on UNet is significantly larger than others.
  • iii. When the SNR of the wrapped phase is <4 dB, the RMSE of the UNet-based method, the two least-squares methods, and the two path-following methods begins to increase significantly. The RMSE value of UNet remains lower than those of these four traditional methods. The largest increase in RMSE occurs for the two path-following methods, showing the worst noise robustness, followed by the least-squares methods.
  • iv. When the SNR of the wrapped phase is <0 dB, the RMSE values of three semantic segmentation network-based methods and the TIE method begin to show significant differences. As shown in Fig. 9, the method based on SFNet still achieves the smallest value when the SNR is -3 dB, while the RMSE values of other methods are significantly larger than 3.

Fig. 9. The RMSE results in different noise levels.

3.4 In the mixed case

The results in the mixed case are shown in Table 5 and Fig. 10. They show that deep learning-based methods are significantly better than traditional methods when there is noise and discontinuity in the wrapped phase. Due to the impact of noise and phase discontinuity, the phase structure of the unwrapped phase obtained by traditional methods is severely damaged, and the stability and robustness are poor. Compared with the test performance in other cases, the results obtained by deep learning-based methods have declined to some extent but still maintain a certain structural stability and strong noise robustness.

Fig. 10. Results in the mixed case.

Table 5. Evaluation results for each phase unwrapping method in the mixed case

The SFNet achieves the best results on almost all evaluation indices among the four deep learning-based methods. The PFS is 58.95%, nearly 12% lower than that of HRNet, and the PSNR is at least 8% higher than that of HRNet. The phase unwrapping results obtained by DeepLabV3+ are even worse than those of HRNet. The method based on UNet achieves the worst results; its poor noise resistance and low unwrapping accuracy can be inferred from the PSNR and AU values. Regarding the traditional methods, conclusions similar to those in the discontinuous case can be drawn when the noise level is low.

We calculate the computation time for a single wrapped phase for the four deep learning-based methods on a single GPU (NVIDIA GeForce RTX 4090) and for the five traditional methods on a CPU (Intel Core i5-12400) to test the operating efficiency of all methods. The last column of Tables 2–5 shows that the SFNet-based method is at least 2.5 times faster than HRNet, and the time difference is tiny compared with the other two deep learning-based methods. In addition, we record the model parameters of the four deep learning-based methods in Table 6. From Table 6, it can be seen that the SFNet network has the smallest number of parameters, because the SFNet structure uses a hierarchical encoder without positional encoding and a decoder based on a lightweight fully connected multilayer perceptron, which significantly reduces the number of parameters that need to be learned. Due to its parallel processing approach, the HRNet network retains high-resolution and low-resolution features simultaneously in each stage and processes them through multiple branches; as a result, HRNet has a large number of parameters. In the DeepLabV3+ network, the Atrous Spatial Pyramid Pooling (ASPP) module captures multi-scale information by using parallel convolutions with different dilation rates, which increases the complexity and number of parameters. The UNet network effectively reduces the number of model parameters by introducing ResNet, which has shared parameters. Therefore, it is safe to say that the SFNet network has a simple structure and a higher operating efficiency.

Table 6. Size of the deep learning-based model parameters

3.5 Test on real datasets

Each of the cases described earlier is analyzed and tested on a simulated dataset. The evaluation results indicate that the phase unwrapping method based on SFNet achieves optimal performance. As shown in Fig. 11, several real samples, including different arrangements of pits and different shapes of grooves, are tested to evaluate the effectiveness and generalization of the proposed method. From the results, it can be observed that the SFNet-based method also achieves the best performance on unfamiliar real datasets. The RMSE for each test sample is the lowest, and the PSNR is the highest, ensuring excellent preservation of the surface structure of the unwrapped phase. The HRNet-based method achieves slightly worse results than SFNet, and its number of error points is significantly higher. The DeepLabV3+ and UNet networks produce poor results, especially in cases where the phase height is large and the phase surface is complex. Among the five traditional methods, the TIE method achieves favorable results due to its robust noise resistance. The other traditional methods, especially the minimum-norm and the path-following methods, are more susceptible to noise interference.

Fig. 11. Test results on real datasets.

We have performed 3D reconstruction on a facial contour image, and the results are shown in Fig. 12. From the results, we can see that the phase unwrapping method based on SFNet has the best 3D reconstruction result. It can effectively recover the three-dimensional surface information of the object, especially in regions with complex phase surface variations such as the nose and mouth. The HRNet method is ranked second, while the other two methods yield poorer results.

Fig. 12. The 3D surface reconstruction results.

We also apply the proposed phase unwrapping method to single-shot 3D imaging. A randomly selected test image is taken from the wrapped phase images obtained by the CNN1 network in Ref. [20] and cropped to the size required by our network model. We first input the wrapped phase obtained from the single-shot 3D imaging process into the transport of intensity equation to obtain the unwrapped phase, which is then used as the ground truth. Next, the wrapped phase is fed into four different deep learning networks: SFNet, HRNet, DeepLabV3+, and UNet. The test results are shown in Fig. 13. From Fig. 13, it can be observed that all four deep learning-based methods produce errors where the phase height takes large values. However, the proposed method based on SFNet achieves the lowest error, while the other three methods generate larger areas of erroneous pixels, which disrupt the phase profile. This demonstrates that our proposed method can be successfully used for single-shot 3D imaging.

Fig. 13. Test results in single-shot 3D imaging.

4. Discussion and conclusion

The proposed method based on SFNet is built upon a semantic segmentation model, and the ambiguous classification of pixels at class boundaries is a common challenge faced by almost all semantic segmentation methods. The first step towards addressing this issue is to improve the network architecture by introducing spatial attention mechanisms or context-aware modules to better capture the contextual relationships of objects, thereby obtaining more accurate semantic segmentation results. Additionally, post-processing of the unwrapped results is a highly effective approach to filter out misclassified pixels. The proposed method proves particularly beneficial when dealing with phases of small height and relatively simple phase surfaces, as it significantly reduces misclassification errors. In cases where the phase surface exhibits discontinuity or complex variations, it is also advisable to consider data augmentation techniques during training to enhance robustness against deformations.

This paper proposes a novel and lightweight phase unwrapping method based on the SFNet network. The SFNet consists of a position-encoding-free hierarchical Transformer encoder and a lightweight All-MLP decoder. A refinement method based on a local Laplace filter is employed to further improve the accuracy and precision of the final phase. The overall structure is straightforward and highly efficient. This paper trains and tests the SFNet and several other state-of-the-art networks on a dataset with phase discontinuity and noise interference constructed by the RME method. The SFNet-based phase unwrapping method is thoroughly benchmarked with deep learning-based and classical methods. The results indicate that the proposed method exhibits greater structural stability and noise robustness, outperforming its counterparts. It exhibits strong generalization ability on previously unknown real datasets and proves itself as a potential candidate for phase unwrapping applications that require excellent accuracy, efficiency, and robustness.

Funding

Natural Science Basic Research Program of Shaanxi Province (2020JQ-199); National Postdoctoral Program for Innovative Talents (BX20200279); Equipment Development Department Rapid Support Project (80917020109-1); Natural Science Foundation of Ningbo Municipality (202003N4062); Natural Science Foundation of Zhejiang Province (LY23F040002); National Natural Science Foundation of China (62004166).

Disclosures

All of the authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

1. X. He and Q. Kemao, “A comparative study on temporal phase unwrapping methods in high-speed fringe projection profilometry,” Opt. Lasers Eng. 142, 106613 (2021). [CrossRef]  

2. R. G. Waghmare, P. R. Sukumar, G. S. Subrahmanyam, et al., “Particle-filter-based phase estimation in digital holographic interferometry,” J. Opt. Soc. Am. A 33(3), 326–332 (2016). [CrossRef]  

3. H. Yu, Y. Lan, Z. Yuan, et al., “Phase unwrapping in InSAR: A review,” IEEE Geosci. Remote Sens. Mag. 7(1), 40–58 (2019). [CrossRef]  

4. W. Xu and I. Cumming, “A region-growing algorithm for InSAR phase unwrapping,” IEEE Trans. Geosci. Remote Sens. 37(1), 124–134 (1999). [CrossRef]  

5. Z.-P. Liang, “A model-based method for phase unwrapping,” IEEE Trans. Med. Imaging 15(6), 893–897 (1996). [CrossRef]  

6. M. Hedley and D. Rosenfeld, “A new two-dimensional phase unwrapping algorithm for MRI images,” Magn. Reson. Med. 24(1), 177–181 (1992). [CrossRef]  

7. K. Itoh, “Analysis of the phase unwrapping algorithm,” Appl. Opt. 21(14), 2470 (1982). [CrossRef]  

8. J. Tribolet, “A new phase unwrapping algorithm,” IEEE Trans. Acoust., Speech, Signal Process. 25(2), 170–177 (1977). [CrossRef]  

9. Q. Xiao, S. Wu, Y. Wang, et al., “Error analysis and realization of a phase-modulated diffraction grating used as a displacement sensor,” Opt. Express 31(5), 7907–7921 (2023). [CrossRef]  

10. J. M. Huntley and H. Saldner, “Temporal phase-unwrapping algorithm for automated interferogram analysis,” Appl. Opt. 32(17), 3047–3052 (1993). [CrossRef]  

11. H. O. Saldner and J. M. Huntley, “Temporal phase unwrapping: application to surface profiling of discontinuous objects,” Appl. Opt. 36(13), 2770–2775 (1997). [CrossRef]  

12. H. An, Y. Cao, H. Wu, et al., “Spatial-temporal phase unwrapping algorithm for fringe projection profilometry,” Opt. Express 29(13), 20657–20672 (2021). [CrossRef]  

13. W. Yin, Q. Chen, S. Feng, et al., “Temporal phase unwrapping using deep learning,” Sci. Rep. 9(1), 20175 (2019). [CrossRef]  

14. X. Peng, Z. Yang, and H. Niu, “Multi-resolution reconstruction of 3-D image with modified temporal unwrapping algorithm,” Opt. Commun. 224(1-3), 35–44 (2003). [CrossRef]  

15. H. Zhao, W. Chen, and Y. Tan, “Phase-unwrapping algorithm for the measurement of three-dimensional object shapes,” Appl. Opt. 33(20), 4497–4500 (1994). [CrossRef]  

16. J. Wyant, “Testing aspherics using two-wavelength holography,” Appl. Opt. 10(9), 2113–2118 (1971). [CrossRef]  

17. C. Polhemus, “Two-wavelength interferometry,” Appl. Opt. 12(9), 2071–2074 (1973). [CrossRef]  

18. J. Zhong and M. Wang, “Phase unwrapping by lookup table method: application to phase map with singular points,” Opt. Eng. 38(12), 2075–2080 (1999). [CrossRef]  

19. Q. Lu, Q. Xiao, C. Liu, et al., “Inverse design and realization of an optical cavity-based displacement transducer with arbitrary responses,” Opto-Electron. Adv. 6(3), 220018 (2023). [CrossRef]  

20. J. Qian, S. Feng, T. Tao, et al., “Deep-learning-enabled geometric constraints and phase unwrapping for single-shot absolute 3D shape measurement,” APL Photonics 5(4), 046105 (2020). [CrossRef]  

21. H. An, Y. Cao, H. Li, et al., “Temporal phase unwrapping based on unequal phase-shifting code,” IEEE Trans. on Image Process. 32, 1432–1441 (2023). [CrossRef]  

22. J. De Souza, M. Oliveira, and P. Dos Santos, “Branch-cut algorithm for optical phase unwrapping,” Opt. Lett. 40(15), 3456–3459 (2015). [CrossRef]  

23. B. Gutmann and H. Weber, “Phase unwrapping with the branch-cut method: role of phase-field direction,” Appl. Opt. 39(26), 4802–4816 (2000). [CrossRef]  

24. D. Zheng and F. Da, “A novel algorithm for branch cut phase unwrapping,” Opt. Lasers Eng. 49(5), 609–617 (2011). [CrossRef]  

25. T. J. Flynn, “Consistent 2-D phase unwrapping guided by a quality map,” in IGARSS'96. 1996 International Geoscience and Remote Sensing Symposium(IEEE, 1996), pp. 2057–2059.

26. M. Zhao, L. Huang, Q. Zhang, et al., “Quality-guided phase unwrapping technique: comparison of quality maps and guiding strategies,” Appl. Opt. 50(33), 6214–6224 (2011). [CrossRef]  

27. M. A. Herraez, D. R. Burton, M. J. Lalor, et al., “Robust, simple, and fast algorithm for phase unwrapping,” Appl. Opt. 35(29), 5847–5852 (1996). [CrossRef]  

28. T. J. Flynn, “Two-dimensional phase unwrapping with minimum weighted discontinuity,” J. Opt. Soc. Am. A 14(10), 2692–2701 (1997). [CrossRef]  

29. Y. Guo, X. Chen, and T. Zhang, “Robust phase unwrapping algorithm based on least squares,” Opt. Lasers Eng. 63, 25–29 (2014). [CrossRef]  

30. M. D. Pritt and J. S. Shipman, “Least-squares two-dimensional phase unwrapping using FFT's,” IEEE Trans. Geosci. Remote Sens. 32(3), 706–708 (1994). [CrossRef]  

31. H. Xia, S. Montresor, R. Guo, et al., “Phase calibration unwrapping algorithm for phase data corrupted by strong decorrelation speckle noise,” Opt. Express 24(25), 28713–28730 (2016). [CrossRef]  

32. M. Costantini, “A novel phase unwrapping method based on network programming,” IEEE Trans. Geosci. Remote Sens. 36(3), 813–821 (1998). [CrossRef]  

33. J. Martinez-Carranza, K. Falaggis, and T. Kozacki, “Fast and accurate phase-unwrapping algorithm based on the transport of intensity equation,” Appl. Opt. 56(25), 7079–7088 (2017). [CrossRef]  

34. J. Zhang, X. Tian, J. Shao, et al., “Phase unwrapping in optical metrology via denoised and convolutional segmentation networks,” Opt. Express 27(10), 14903–14912 (2019). [CrossRef]  

35. V. Badrinarayanan, A. Kendall, and R. Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017). [CrossRef]  

36. T. Zhang, S. Jiang, Z. Zhao, et al., “Rapid and robust two-dimensional phase unwrapping via deep learning,” Opt. Express 27(16), 23173–23185 (2019). [CrossRef]  

37. L.-C. Chen, Y. Zhu, G. Papandreou, et al., “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proceedings of the European conference on computer vision (ECCV, 2018), pp. 801–818.

38. D. Li and X. Xie, “Deep learning-based Phase Unwrapping Method,” IEEE Access 11, 85836–85851 (2023). [CrossRef]  

39. G. Spoorthi, S. Gorthi, and R. K. S. S. Gorthi, “PhaseNet: A deep convolutional neural network for two-dimensional phase unwrapping,” IEEE Signal Process Lett. 26(1), 54–58 (2019). [CrossRef]  

40. G. Spoorthi, R. K. S. S. Gorthi, and S. Gorthi, “PhaseNet 2.0: Phase unwrapping of noisy data based on deep learning approach,” IEEE Trans. Image Process. 29, 4862–4872 (2020). [CrossRef]  

41. W. Huang, X. Mei, Y. Wang, et al., “Two-dimensional phase unwrapping by a high-resolution deep learning network,” Measurement 200, 111566 (2022). [CrossRef]  

42. J. Wang, K. Sun, T. Cheng, et al., “Deep high-resolution representation learning for visual recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 43(10), 3349–3364 (2021). [CrossRef]  

43. K. Wang, Y. Li, Q. Kemao, et al., “One-step robust deep learning phase unwrapping,” Opt. Express 27(10), 15100–15115 (2019). [CrossRef]  

44. K. He, X. Zhang, S. Ren, et al., “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition(IEEE, 2016), pp. 770–778.

45. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2015), pp. 234–241.

46. X. Xie, X. Tian, Z. Shou, et al., “Deep learning phase-unwrapping method based on adaptive noise evaluation,” Appl. Opt. 61(23), 6861–6870 (2022). [CrossRef]  

47. J. Chen, Y. Kong, D. Zhang, et al., “Two-dimensional phase unwrapping based on U2-Net in complex noise environment,” Opt. Express 31(18), 29792–29812 (2023). [CrossRef]  

48. Y. Qin, S. Wan, Y. Wan, et al., “Direct and accurate phase unwrapping with deep neural network,” Appl. Opt. 59(24), 7258–7267 (2020). [CrossRef]  

49. M. Xu, C. Tang, Y. Shen, et al., “PU-M-Net for phase unwrapping with speckle reduction and structure protection in ESPI,” Opt. Lasers Eng. 151, 106824 (2022). [CrossRef]  

50. H. He, C. Tang, L. Zhang, et al., “UN-PUNet for phase unwrapping from a single uneven and noisy ESPI phase pattern,” JOSA A 40(10), 1969–1978 (2023). [CrossRef]  

51. G. Han, J. Ma, S. Huang, et al., “Few-shot object detection with fully cross-transformer,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition(IEEE, 2022), pp. 5321–5330.

52. L. He and S. Todorovic, “DESTR: Object detection with split transformer,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition(IEEE, 2022), pp. 9377–9386.

53. S. K. Roy, A. Deria, D. Hong, et al., “Multimodal fusion transformer for remote sensing image classification,” IEEE Trans. Geosci. Remote Sens. 61, 1–20 (2023). [CrossRef]  

54. X. Wang, S. Yang, J. Zhang, et al., “Transformer-based unsupervised contrastive learning for histopathological image classification,” Med. Image Anal. 81, 102559 (2022). [CrossRef]  

55. H. Thisanke, C. Deshan, K. Chamith, et al., “Semantic segmentation using Vision Transformers: A survey,” Eng. Appl. Artif. Intell. 126, 106669 (2023). [CrossRef]  

56. W. Zhang, Z. Huang, G. Luo, et al., “TopFormer: Token pyramid transformer for mobile semantic segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition(IEEE, 2022), pp. 12083–12093.

57. Y. Kuang, F. Liu, Y. Liu, et al., “Correction of spurious phase sign in single closed-fringe demodulation using transformer based Swin-ResUnet,” Opt. Laser Technol. 168, 109952 (2024). [CrossRef]  

58. Z. Liu, Y. Lin, Y. Cao, et al., “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF international conference on computer vision(IEEE, 2021), pp. 10012–10022.

59. Z. Zhao, M. Zhou, Y. Du, et al., “Robust phase unwrapping algorithm based on Zernike polynomial fitting and Swin-Transformer network,” Meas. Sci. Technol. 33(5), 055002 (2022). [CrossRef]  

60. X. Zhu, Z. Han, M. Yuan, et al., “Hformer: Hybrid convolutional neural network transformer network for fringe order prediction in phase unwrapping of fringe projection,” Opt. Eng. 61(09), 093107 (2022). [CrossRef]  

61. E. Xie, W. Wang, Z. Yu, et al., “SegFormer: Simple and efficient design for semantic segmentation with transformers,” Advances in Neural Information Processing Systems 34, 12077–12090 (2021).

62. M. Gontarz, V. Dutta, M. Kujawińska, et al., “Phase unwrapping using deep learning in holographic tomography,” Opt. Express 31(12), 18964–18992 (2023). [CrossRef]  
