VDE-Net: a two-stage deep learning method for phase unwrapping

Jiaxi Zhao; Lin Liu; Tianhe Wang; Xiangzhou Wang; Xiaohui Du; Ruqian Hao; Juanxiu Liu; Yong Liu; Jing Zhang

doi:10.1364/OE.469312

1. Introduction

Phase calculation plays an important role in many optical measurement techniques such as profile measurement [1], biomedical imaging [2], and synthetic aperture radar interferometry [3]. Generally, the phase is obtained by the arctangent function, which wraps the continuous phase in the (-π, π] interval [4]. Therefore, to acquire a continuous phase, numerous phase unwrapping methods have been proposed. These phase-unwrapping methods can be roughly classified into two categories: path-following methods [5] and minimum norm methods [6]. However, the accuracy of the unwrapped result using path-following methods is typically determined by the selected integration path, and the minimum norm methods are limited by unreliable data points in the wrapped phase, thus resulting in an inability to prevent the propagation of phase errors [7,8].

Significant deep learning achievements were recently made in the fields of image classification [9], segmentation [10], and restoration [11]. In addition, researchers attempted to recover the continuous phase from the wrapped phase using deep learning. Given that the difference between the wrapped phase and the real phase is only the wrap-count (integer jump of 2π) times of 2π, the real phase can be calculated by solving the value of wrap-count and adding its product with 2π to the wrapped phase at each pixel. Therefore, in [12–15], the problem of phase unwrapping was considered as a semantic segmentation problem in deep learning, the wrapped phase was considered as the input, and the wrap-count as a semantic label. Spoorthi et.al. [12] designed a fully convolutional neural network with an encoder–decoder structure (i.e., PhaseNet). The results revealed that PhaseNet is more efficient in terms of the computational time and robustness to noise when compared with conventional phase unwrapping methods. However, the wrap-count obtained by PhaseNet may be misclassified in the abruptly varying phase regions or disconnected regions of the output. Zhang et.al. [13] proposed a deep convolutional neural network, which is based on DeepLabV3+ [10], to obtain the wrap-count, and it significantly outperformed the conventional phase unwrapping algorithms in both simulation and real data tests. Nevertheless, similar to PhaseNet, this network accidentally segments a small number of pixels in the phase boundaries. Wu et al. [14] developed a residual en-decoder network (REDN) method to solve the phase unwrapping problem in Fourier domain Doppler optical coherence tomography (DOCT), which is a significantly deep network. Zeyada et.al. [15] used a deep recurrent residual U-Net to address the phase unwrapping problem in interferometric synthetic aperture radar, and the network exhibited a superior performance with fewer setting parameters. It should be noted that except for REDN, the abovementioned neural networks cannot directly obtain accurate segmentation results without post-processing operations for the network outputs. In addition, an incorrect classification leads to a minimum error of 2π. Hence, in [7,16], attempts were made to directly establish the relationship between the wrapped phase and the real phase with deep learning, which did not require post-processing operations and reduced the error. Wang et.al. [7] presented a deep convolutional neural network (DLPU-Net), which consists of a U-Net [17] and residual network [18], to directly restore the real phase from the wrapped phase. In addition, numerous tests were performed (including feasibility and accuracy tests, anti-noise, anti-aliasing, and generalization capability tests) to demonstrate the excellent unwrapping performance of the network. Qin et.al. [16] addressed the phase unwrapping problem with a new convolutional neural network (VUR-Net), which is a combination of a VGG [19], U-net, and residual network. Moreover, they developed two novel evaluation indices for phase-unwrapping. The experimental results revealed that VUR-Net is superior to DLPU-Net [7] and PhUn-Net [20] with respect to accuracy and robustness. However, they only randomly selected 100 images from the test set (3750 in total) instead of the entire test set when performing the comparison, thus introducing significant randomness into the results.

Deep learning relies on the rich representational ability of convolutional neural networks, i.e., with an increase in the representational ability of the model, the effect is more significant [21–23]. To date, apart from the depth, width, and cardinality of the network, the performance of deep convolutional neural networks is considered as closely related to the attention of the model, as verified in previous research [24–28]. Attention can inform the model as to where to direct focus, thereby improving the expressiveness of the regions of interest. Therefore, the representation ability and performance of the model can be improved using an attention mechanism to direct focus toward critical features and suppress non-critical features. With reference to the available literature, an attention mechanism has not been proposed for the phase unwrapping problem.

This paper proposes a novel deep convolutional neural network based on a weighted jump-edge attention mechanism, with Ref. to [29,30]. The proposed network is built on the basis of the original VUR-Net [16]. To reduce the influence of information loss due to pooling of high-level semantic information, we used dilated convolution [31] in the final convolutional layer of the downsampling convolution to replace the traditional convolution, which can ensure that the spatial resolution of the response does not change during the expansion of the receptive fields [32]. Moreover, dilated convolution maintains the number of trainable parameters of the convolution kernels [33]. Finally, by adding a weighted jump-edge attention mechanism, the network directed more attention toward the jump regions of the wrapped phase images, which improved the effect of phase unwrapping. Therefore, this network is referred to as VDE-Net (where V represents the basic VUR-Net, D represents the dilated convolution, and E represents the weighted jump edge). It should be noted that the attention map was defined before the training of the VDE-Net, thus rendering it unnecessary for the VDE-Net to learn where to direct focus while computing the features of wrapped phase images. In addition, the computational cost of the network was reduced. We compared the results of VUR-Net with a squeeze-and-excite (SE)-block [25] and a convolutional block attention module (CBAM) [21], which are two commonly used attention modules, and the comparison results indicated that the proposed method is superior to common attention mechanisms or alternative networks in terms of the accuracy of phase recovery, robustness in the presence of noise, and undersampling resistance. Furthermore, a wrapped phase image of a living red blood cell (RBC) (not included in the dataset) was accurately recovered by the proposed network, thus indicating the high generalization capacity of the method.

The remainder of this paper is organized as follows. Section II describes the proposed method. Section III presents the experimental results and discussion. Finally, Section IV presents the conclusions of this study.

2. Proposed method

This paper proposes a deep convolutional neural network (DCNN) combined with a weighted jump-edge attention mechanism for phase unwrapping, with the workflow shown in Fig. 1. It should be noted that VDE-Net consists of two stages: calculating the weighted jump edge maps of wrapped phase images and phase unwrapping. Generally, the jump-edge regions in the first stage find and highlight that the network in the second stage needs to pay attention to for better phase unwrapping.

Fig. 1. Schematic representation of the training workflow of VDE-Net.

Download Full Size | PDF

2.1 Calculation of the weighted jump edge

Mathematically, the relationship between real phase and wrapped phase can be expressed as follows:

(1)$$\phi (x,y) = \arctan \frac{{{\mathop{\rm Im}\nolimits} (\exp [j\varphi (x,y)])}}{{\textrm{Re} (\exp [j\varphi (x,y)])}}$$

where $\varphi (x,y)$ represents the real phase, and $\phi (x,y)$ represents the wrapped phase. Moreover, ${\mathop{\rm Im}\nolimits} ({\cdot} )$ and $textrm{Re} ({\cdot} )$ denote the imaginary and real parts of a complex number, respectively, and j is an imaginary unit. Due to the arctangent operation, $\phi (x,y)$ is wrapped in the interval (− π, π], and the process of recovering $\varphi (x,y)$ from $\phi (x,y)$ is referred to as phase unwrapping. Several examples of wrapped phase images are shown in Fig. 2(a), which were computed using Eq. (1) with real phase images (see the Data preparation section in the text for the data generation method).

Fig. 2. Visualization results of the VDE-Net. (a) Wrapped phase images from the test set; (b) the weighted jump edge maps of wrapped phase images; (c) the products of the weighted jump edge maps and the original wrapped phase images; (d) ground truth (GT); (e) unwrapped results of the proposed network; and (f) binary error maps between (d) and (e).

Download Full Size | PDF

As can be seen from Fig. 2(a), the real phase image is divided into multiple continuous distribution areas with values ranging from -π to π after the arctangent operation. Moreover, there are multiple obvious jump edges at the border of these regions, and these jump edges can be regarded as important signs of the wrapped phase image. In particular, the phase value of a nearby interval is wrapped if there is a jump edge. Therefore, more attention should be directed toward these phase jump edges when performing phase unwrapping. It should be noted that the DCNN combined with the attention mechanism can learn which feature or pre-defined areas to focus on, suppress useless information, and facilitate the flow of information in the network [21,29]. However, the existing methods of phase unwrapping using deep learning are based on the features extracted from the entire wrapped phase image; therefore, all regions in the wrapped phase image receive the same degree of attention. Consequently, we introduced a weighted jump-edge attention mechanism to concentrate on the jump edges of the wrapped phase image. The calculation process of the weighted jump edge map is expressed by the following formula:

(2)$${I_{we}} = {f_e} \times \phi {(x,y)_{edge}} + {f_b} \times \{ - (\phi {(x,y)_{edge}}) + 1\}$$

(3)$$\phi {(x,y)_{edge}} = {P_{br}}\{ \phi {(x,y)_{re}},\phi {(x,y)_{ce}}\}$$

(4)$$\phi {(x,y)_{re}} = {P_{bin}}\{ {P_{abs}}({P_{rs}}(\phi (x,y)))\}$$

(5)$$\phi {(x,y)_{ce}} = {P_{bin}}\{ {P_{abs}}({P_{cs}}(\phi (x,y)))\}$$

where ${I_{we}}$ represents the weighted jump edge map, ${f_e}$ represents the weighted factor, and ${f_b}$ represents background factor. It should be noted that ${f_e}$ was set as 1.5, and ${f_b}$ as 0.5 by comparative experiments. Moreover, $\phi {(x,y)_{edge}}$ denotes the final jump edge map of the wrapped phase image; ${P_{br}}\{{\cdot} \}$ denotes bitwise OR operation; $\phi {(x,y)_{re}}$ and $\phi {(x,y)_{ce}}$ denote the jump edge maps after the subtraction of adjacent rows and adjacent columns, respectively; ${P_{bin}}({\cdot} )$ represents the image binarization operation (the threshold in this paper is π); ${P_{abs}}({\cdot} )$ represents the absolute value operation; and ${P_{rs}}({\cdot} )$ and ${P_{cs}}({\cdot} )$ represent the subtraction operation of adjacent rows and adjacent columns of the wrapped image, respectively. Figure 2(b) presents the weighted jump edge maps of the wrapped phase images expressed by Eqs. (2) –(5), Fig. 2(c) presents the products of the weighted jump edge maps and the original wrapped phase images.

2.2 Architecture of the proposed VDE-Net

The architecture of VDE-Net is based on VUR-Net [16], which combines the advantages of VGG [19], residual learning [18], and U-Net [17], as shown in Fig. 3. In VGG, a stack of small convolution kernels (generally 3 × 3 convolution kernels) may have the same receptive fields as a large convolution kernel (two 3 × 3 convolution kernels equivalent to a 5 × 5 convolution kernel or three 3 × 3 convolution kernels equivalents to a 7 × 7 convolution kernel), while significantly reducing the number of convolution kernel parameters. The emergence of residual learning prevents the degradation problem caused by the increase in the depth of the network. In addition, U-net combines low-level detail information with high-level semantic information, which improves the final result [34]. The upsampling layers in its extensive path can ensure that the size of the output is consistent with the input. However, it should be noted that even by introducing upconvolution or upsampling, the information loss caused by the pooling layer can only be compensated to an extent [35]. Therefore, we used the dilated convolution [31] illustrated in Fig. 4 as a bridge between the contracting path on the left and the expanding path on the right, such that the multiscale contextual information could be aggregated without losing resolution. In order to clarify the positional relationship between the convolution blocks in the network, we annotated all the convolution blocks in the network, as shown in Fig. 3(a). Due to the max pooling 2 × 2 operation, the x-y-size of the feature map will be reduced to half of the original size, so the x-y-size of the Conv_5 feature map after the 5th residual convolution block becomes 16 × 16. Considering the x-y-size of the feature map after the Conv_5, the dilation rate of the dilated convolution was set as 2 in this paper.

Fig. 3. (a) Structure of the VDE-Net architecture. Each box denotes the corresponding operation. The x-y-size and the number of channels are denoted on top or bottom of the box. The arrows and symbol denote the different operations. Conv_n means the nth block in units of blocks; (b) detailed diagrams of Residual Convolution Block1 in (a); (c) detailed diagrams of Residual Convolution Block2 in (a); (d) detailed diagrams of Residual Convolution Block3 in (a). In (b), (c) and (d), H_in× W_in represents the x-y-size of the input, H_out× W_out represents the x-y-size of the output, C_in represents the number of channels of the input, and C_out represents the number of channels of the output

Download Full Size | PDF

Fig. 4. A two-dimensional dilated convolution with a kernel size of 3 × 3 and different dilation rates r = 1, 2, 3. It should be noted that when the dilation rate is 1, dilated convolutions are the same as standard convolutions. Dilated convolutions enlarge the receptive field while maintaining the spatial resolution.

Download Full Size | PDF

The input to the network was a 256 × 256 wrapped phase image and its product with the weighted jump edge map, and the output was the corresponding unwrapped result of the same size. It should be noted that the product based on the weighted jump edge map was not directly inputted into the network. In particular, it was concatenated with the final convolutional feature layer as a spatial attention map, thus allowing for the DCNN to direct more attention toward the jump edges. In addition, the weighted jump edge map is directly derived from the wrapped phase image without additional training or the formation of additional parameters, thus reducing the computational costs. Moreover, even if the type of wrapped phase image changes, it allows for the network to focus on the critical information in the wrapped phase image due to its close connection with the wrapped phase image.

As we can see from Fig. 3(a), there were 11 blocks with three different colors. Each color was paired with a residual convolution block whose details were shown in Fig. 3(b)–3(d). The bule block mainly consisted of two 3 × 3 convolution kernels and the cyan block was mainly composed of three 3 × 3 convolution kernels. Unlike the previous two convolution blocks, the yellow block consisted of three dilated convolutions, which could expand the receptive field without increasing the parameters, and the channels of extracted features ranged from 64–512, which ensured that the network demonstrated a high capability for feature extraction and learning. In addition, the adoption of batch normalization can effectively solve the problem of vanishing or exploding gradients [36].

2.3 Network implementation details

The proposed VDE-Net was implemented using PyTorch version 1.10.1, based on Python 3.9.7. We performed network training and testing on a personal computer (PC) with an AMD central processing unit (CPU) Ryzen 9 5950X (3.40 GHz) and 64 GB of random access memory (RAM) using an NVIDIA GeForce RTX 3090 graphics processing unit (GPU). We set the mean absolute error (MAE) between the network output and real phase as the loss function, and an Adam optimizer with a batch size of 8 was adopted as the optimizer. The learning rate was set to 0.0002 and the total epoch number was 200.

2.4 Data preparation

The phase images used in this study were generated according to the method described in [16]. The first step was to randomly generate an initial square matrix with dimensions between 2 × 2 and 25 × 25, and the values were randomly distributed between [0, 100] with a Gaussian or normal distribution. The second step was to enlarge the size of the initial square matrix to a target size of 256 × 256 by means of interpolation (randomly selecting nearest neighbor interpolation, bilinear interpolation, or bicubic interpolation). The final step was to subtract the minimum value in the phase image, to ensure that all the values were non-negative. Given that the above phase images were all randomly generated, the diversity of the training data and the generalization ability of the network model could be ensured, such that overfitting of the model could be prevented. We generated 37,500 real phase images and obtained the corresponding wrapped phase images using Eq. (1), among which 60%, 20%, and 20% were used as the training, validation, and test set images, respectively. In addition, to improve the invariance and robustness of the network, we randomly flipped the images horizontally and vertically in the dataset. Thereafter, noise with a random type (one of Gaussian, salt & pepper or multiplicative noises) and a random level (standard deviations of Gaussian and multiplicative noises from 0.01–0.20, or density of salt & pepper noise from 0.01–0.20) was added to all of the training data in order to enhance the noise resistance of the network.

3. Experimental results and discussion

To accurately evaluate the performance of the network in all experiments, the structural similarity (SSIM) [37]. The SSIM is defined by the following equation:

(6)$$SSIM(\varphi ,{\varphi _\textrm{u}}) = \frac{{(2{\mu _\varphi }{\mu _{{\varphi _\textrm{u}}}} + {C_1})(2{\sigma _\varphi }{\sigma _{{\varphi _\textrm{u}}}} + {C_2})}}{{({\mu _\varphi }^2 + {\mu _{{\varphi _\textrm{u}}}}^2 + {C_1})({\sigma _\varphi }^2 + {\sigma _{{\varphi _\textrm{u}}}}^2 + {C_2})}}$$

where the ${\varphi _u}$ represents the output of network, ${\mu _i}$ and ${\sigma _i}$ refer to the mean and standard deviation of i ($i = \varphi ,{\varphi _u}$), ${C_1}$ and ${C_2}$ are constants introduced to avoid instability. From the definition of the SSIM, we can see that the although SSIM can represent the degree of the overall similarity between images, it lacks a measure of the degree of difference between the corresponding pixel values. It should be noted that the pixel values of the unwrapped image are very important for subsequent processing and analysis. For example, the calculated thickness of biological cells is directly related to the phase value in biomedical imaging. Thus, the binary error map (BEM) [16] is also employed to evaluate the unwrapped results. The BEM can be mathematically expressed as follows:

(7)$$\{{_{0\textrm{, }otherwise}^{1\textrm{, }|{{\varphi_u}(x,y) - \varphi (x,y)} |\textrm{ } \le \textrm{ }[{\varphi (x,y) - \min (\varphi (x,y))} ]\textrm{ } \times \textrm{ }5\%}} $$

where operator min (·) represents the minimum. As indicated in Eq. (7), The BEM can intuitively obtain the recovery status of the pixel values of the unwrapped phase image by setting the correctly unwrapped pixels (CUPs), whose absolute of the phase error with GT is no more than 5% of its relative phase height, to one while the others are zero. To obtain the proportion among the total pixels that the CUPs account for, the accuracy of unwrapping (AU) [16] is also employed to evaluate the unwrapped results. The AU is defined by the following equation:

(8)$$AU = \frac{{\sum\limits_{x = 1}^X {\sum\limits_{y = 1}^Y {BEM(x,y)} } }}{{X \times Y}} \times 100\%$$

where X and Y represent the numbers of rows and columns of the unwrapped phase image, respectively.

In the following discussion, the abovementioned indices mentioned above are used to provide a comprehensive estimation of the unwrapped phase images obtained using different methods.

As shown in Fig. 2, Fig. 2(d) presents the real phases of Fig. 2(a), which are referred to as the ground truth (GT) in the deep learning field, and Fig. 2(e) presents the outputs of VDE-Net after feeding Fig. 2(a) into the network, which was almost identical to the real phase images. The SSIMs of the outputs shown in Fig. 2(e) were 0.9998, 0.9998, 0.9998 and 0.9996, respectively, which were significantly high and close to 1. Figure 2(f) presents the corresponding BEMs between Fig. 2(d) and 2(e), and their AUs were 98.25%, 98.48%, 99.62%, and 99.86%, respectively. The black pixels indicated by the red arrows in Fig. 2(f) represent the error zones, which occupied only a small part of the entire image. Therefore, the proposed method can achieve accurate phase unwrapping based on the results shown in Fig. 2.

The phase-unwrapping results of VDE-Net presented in Fig. 2 do not illustrate its superiority due to the small number of non-complex inputs. Hence, we conducted feasibility and accuracy tests, anti-noise performance tests, anti-undersampling performance tests, and generalization capability tests to further demonstrate the advantages of the proposed network.

3.1 Feasibility and accuracy test

To demonstrate the feasibility and accuracy of the proposed network, we compared the unwrapped result of a wrapped phase image randomly selected from the test set with three other networks, namely, VUR-Net, VUR-Net (SE) and VUR-Net (CBAM); and the final two networks were new networks formed by separately integrating SE and CBAM modules into the VUR-Net. Figure 5 presents a comparison between the unwrapped results obtained using the abovementioned four networks. Figure 5(a) presents the network input, Fig. 5(b) presents the weighted jump edge map of the wrapped phase image, Fig. 5(c) presents the product of the weighted jump edge map and the original wrapped phase image, and Fig. 5(d) presents the GT. Figures 5(e)–5(h) present the outputs of VUR-Net, VUR-Net (SE), VUR-Net (CBAM), and VDE-Net, respectively. As can be seen from Figs. 5(d)–5(h), the outputs of these four networks were almost identical to the GT. In particular, the SSIMs of these outputs were 0.9836, 0.9889, 0.9991, and 0.9998, respectively, which were significantly high. However, as can be seen from the BEMs illustrated in Figs. 5(i)–(l), there were significant differences between the outputs. Compared with the SSIMs, the AUs (62.57%, 36.39%, 97.57% and 99.78% for them respectively) of the outputs may be significantly smaller, e.g., VUR-Net (SSIM 0.9836 with AU 62.57%) and VUR-Net (SE) (SSIM 0.9889 with AU 36.39%). Moreover, the AU differences between different networks was relatively significant, e.g., 36.39% for VUR-Net (SE) and 99.78% for VDE-Net. To comprehensively evaluate the effectiveness of the proposed network, the phase height was plotted across the horizontal directional lines (red lines) and the vertical directional lines (blue lines), as illustrated in Figs. 5(m)–(p). The dashed lines indicate the GT and the solid lines indicate the outputs of the networks involved. As can be seen from Figs. 5(o) and 5(p), there were almost only solid lines and no dashed lines, which indicates that the solid and dashed lines were almost coincident. In contrast, the solid line and dashed line were both observed, and there was a clear interval, as shown in Fig. 5(m) and Fig. 5(n). Compared with the SSIM, the AU more accurately described the phase recovery effect of a network. Nevertheless, the performance of VDE-Net with the highest SSIM and AU was significantly high.

Fig. 5. Comparison of the unwrapped results of an example of the model application. (a) A wrapped phase image from the test set; (b) the weighted jump edge map of the wrapped phase image; (c) the product of the weighted jump edge map and the original wrapped phase image; (d) GT; (e) VUR-Net output; (f) VUR-Net (SE) output; (g) VUR-Net (CBAM) output; (h) VDE-Net output; (i) BEM between (d) and (e); (j) BEM between (d) and (f); (k) BEM between (d) and (g); (l) BEM between (d) and (h); (m)-(p) comparison of the phase heights across the designated lines in (d) and (e), (d) and (f), (d) and (g), and (d) and (h). In (m)-(p), the dashed and solid lines represent the GT and the network output, respectively.

Download Full Size | PDF

Apart from the above comparisons of the unwrapped results of a wrapped phase image using the mentioned networks, Table 1 presents the mean of the SSIMs (M_SSIMs), the mean of the AUs (M_AUs), the variance of the AUs (V_AUs) of the test set for all tested models in three training trials and the average of the test results of three training trials. In particular, with a decrease in the size of the V_AUs, the model stability in the presence of phase unwrapping. As can be seen from Table 1, no models differed significantly with respect to M_SSIMs or M_ AUs over the test set. However, it should be noted that in the three training trials, the M_AUs of the proposed method were the highest, and they were only slightly lower than those of VUR-Net (CBAM) on M_SSIMs. Besides, the V_AUs (0.031) of the proposed method was the lowest and kept unchanged, which indicated that our proposed method is very stable.

Table 1. Comparison of M_SSIMs, M_AUs and V_ AUs over the test set unwrapped by the VUR-Net, VUR-Net (SE), VUR-Net (CBAM) and VDE-Net.^a

View Table | View all tables in this article

3.2 Anti-noise performance test

Noise is a critical factor that degrades the phase-unwrapping accuracy of the wrapped phase image and is unavoidable under practical conditions. Therefore, it is necessary to verify the anti-noise performance of VDE-Net. A wrapped phase image was randomly selected from the test set and linearly normalized to [0,1], and the standard deviations of Gaussian noise were added from 0.030 to 0.240. Subsequently, it was linearly re-normalized to the same range as the input. The results of the noisy wrapped phase images unwrapped by VUR-Net, VUR-Net (SE), VUR-Net (CBAM), and VDE-Net are presented in Fig. 6. Figure 6(a) presents the noisy wrapped phase images, and the level of the noise increased from 0.030 to 0.240 (standard deviations of Gaussian). The numbers below Fig. 6(a) indicate the noise levels, and Fig. 6(b) presents the GT. Figures 6(c)–6(f) show the unwrapped results using VUR-Net, VUR-Net (SE), VUR-Net (CBAM), and VDE-Net with the corresponding SSIMs beneath them. As shown in Figs. 6(c)–6(f), although the noise level increased, the SSIMs of the four networks remained high, as verified by the M_SSIMs in Table 2, and there was no significant difference between them. In contrast, the AUs changed significantly when observing the BEMs illustrated in Figs. 6(g)–6(i), with the AUs below them corresponding to Figs. 6(c)–6(e), respectively. For example, the AU ranged from 97.00–90.00% in Figs. 6(g). Moreover, AU did not decrease monotonically with an increase in noise levels, and the changes were unpredictable. There were two possible causes for this phenomenon; namely, the dataset did not contain all types of noise or the number of wrapped phase images added with several types of noise was significantly less than the others, thus resulting in the trained model not fully learning the features of all types of noise. Consequently, the model demonstrated a low performance when unwrapping the wrapped phase images. Additionally, the models built with different modules provided advantages in the presence of different types of noise during the training process. Hence, these models exhibited different performances in the presence of the same noise. However, VDE-Net demonstrated a high performance in the presence of different levels of noise. Moreover, there were no significant differences in the anti-noise performance, as verified by the data shown in Table 2. Table 2 presents the M_AUs and the V_AUs of the phase unwrapping results of the four networks over the wrapped phase images with different levels of noise. As can be seen from the table, VDE-Net demonstrated a superior performance to the other models in terms of both accuracy and robustness against noise. To determine the cause of the high robustness of the VDE-Net against noise, we obtained weighted jump edge maps of the wrapped phase images with different levels of Gaussian noise, as shown in Fig. 6(k). As can be seen from the figure, with the addition of different degrees of noise, most of the weighted jump edge maps were not influenced by the noise, and were almost unchanged. This is because, in the process of calculating the jumping edge map, only the jumping edge was considered, while other areas (including noise) were suppressed. This allowed for the VDE-Net based on the weighted jump-edge attention mechanism to exhibit robustness in the presence of noise. In addition, we conducted comparison experiments of the above models on the anti-noise of salt-and-pepper noise and multiplicative noise, and the results were similar to those of Gaussian noise. For conciseness, the results are not individually presented in this paper. In order to make our conclusion more convincing, 100 wrapped phase images were randomly sampled from the test set and added with different levels of noise respectively. The phase unwrapping results of the above models were shown in Table 3. Similar to the result of the single noisy wrapped phase image, the M_SSIMs (the average, 0.9892, 0.9924, 0.9924 and 0.9924 for them respectively) of the phase unwrapping results of all models were still very high and the M_AUs had an obvious change. When the noise ratio was less than 0.15, the VUR-Net (CBAM) performed best among all models in terms of the M_AUs, but the difference, whose biggest value was 0.0198 when the noise ratio was 0.03, was small between the VUR-Net (CBAM) and the VDE-Net. As the noise ratio increased, the best model became the VDE-Net and the M_AUs of the results of the VDE-Net was always greater or equal to 83.56% while the other three models performed poorly with the M_AUs falling to 62.24%. The average of M_AUs (87.87%), the V_AUs and its average (0.049) in the Table 3 indicated that the VDE-Net could still have a good performance even in the face of noisy wrapped phase images with different levels.

Fig. 6. Anti-noise performance test. (a) Network input (noisy wrapped phase); (b) GT; (c) unwrapped phase using VUR-Net; (d) unwrapped phase using VUR-Net (SE); (e) unwrapped phase using VUR-Net (CBAM); (f) unwrapped phase using VDE-Net; (g) BEMs between (c) and (b); (h) BEMs between (d) and (b); (i) BEMs between (e) and (b); (j) BEMs between (f) and (b); and (k) the weighted jump edge maps of (a).

Download Full Size | PDF

Table 2. Comparison of M_SSIMs, M_AUs, and V_ AUs over the outputs of the noisy wrapped phase images with different levels unwrapped by the involved models.^a

View Table | View all tables in this article

Table 3. Comparison of M_SSIMs, M_AUs and V_ AUs over the outputs of 100 wrapped phase images with different levels noise unwrapped by the involved models.^a

View Table | View all tables in this article

3. 3 Anti-undersampling performance test

Apart from noise, the phase distribution is generally recovered ineffectively due to undersampling [38], which occurs when the difference between adjacent pixels is greater than π [7]. To verify the performance of the proposed network on the undersampling resistance, a group of wrapped phase images was randomly selected from the test set, with an undersampling ratio (UR, the ratio of the undersampled pixels to the total pixels) [16] that ranged from 10.89–50.08%, as shown in Fig. 7(a). Figure 7(b) presents the GT of Fig. 7(a), and the phase heights ranged from 98.18–112.2. To intuitively present the undersampling distributions of these real phase images, the binary maps of the undersampled points of the real phase images were computed and illustrated in Fig. 7(c). Figures 7(d)–7(g) present the unwrapped results using VUR-Net, VUR-Net (SE), VUR-Net (CBAM), and VDE-Net; in addition to the corresponding SSIMs below. As can be seen from the figures, the SSIMs of all the outputs of the four networks were high, and the M_SSIMs of the unwrapping results of the wrapped phase with different URs were not lower than 0.9985, as shown in Table 4. Although all networks demonstrated similar levels of performance in terms of the SSIM, the performances differed in terms of the BEM, as shown in Figs. 7(h)–7(k). For example, the AU of the outputs unwrapped by VUR-Net (CBAM) was a maximum of 97.14%, and the AU of the outputs unwrapped by VUR-Net was only 84.97% when the UR was 30.79%. Moreover, the AU of the outputs unwrapped by VDE-Net was 94.27%, whereas the AU of the outputs unwrapped by VUR-Net (SE) was only 85.78% when the UR was 50.08%. The phase-unwrapping results for the two undersampling cases cannot fully verify the undersampling resistance of the network. Thus, the M_SSIMs, M_AUs, and V_AUs of the unwrapping results of the undersampled wrapped phase images with different URs were calculated and presented in Table 4. As shown in Table 4, VDE-Net demonstrated an optimal performance with respect to the M_SSIMs, M_AUs, and V_AUs against undersampling. The AU changed significantly (as shown in Fig. 7(h), the AU changed from 99.82% to 84.97%) with an increase in the undersampling rate, and was similar to that in the noise immunity test shown in Fig. 6. Therefore, the cause of this phenomenon is the same as that for noise immunity, as discussed in section 3.2. However, given that the parameters were strictly ensured as consistent during the training and testing, with the exception of the network architectures, this phenomenon did not influence the determination of the performances of the above-mentioned networks. Like the experiments in section 3.2, 800 undersampled wrapped phase images (100 images for each different undersampling ratios) were also randomly sampled from the test set, and then were sent into the above models for phase unwrapping. The data in the Table 5 presented that the M_SSIMs (the average, 0.9979, 0.9982, 0.9988 and 0.9987 for them respectively) of the results of all models were very close to 1 whatever undersampling ratio was. Besides, we can easily find that the VUR-Net (CBAM) and the VDE-Net were the best performing models among the all models with the highest test accuracy in the M_SSIMs, the M_AUs or the V_AUs. However, it should be noted that the average of the M_SSIMs of the results of the VUR-Net (CBAM) was only 0.0001 larger while the average of the M_AUs was 0.0071 smaller compared with the VDE-Net. What’s more, the average of the V_AUs of the results of the VDE-Net was as low as 0.001 which was the smallest value among all averages and the V_AUs of the results of the VDE-Net under different undersampling ratios were the minimum values. Therefore, there is no doubt that our proposed method still performed best when unwrapping the undersampled wrapped phase images.

Fig. 7. Anti-undersampling performance test. (a) Network input (undersampled wrapped phase); (b) GT; (c) undersampling binary maps of GT; (d) unwrapped phase using VUR-Net; (e) unwrapped phase using VUR-Net (SE); (f) unwrapped phase using VUR-Net (CBAM); (g) unwrapped phase using VDE-Net; (h) BEMs between (d) and (b); (i) BEMs between (e) and (b); and (j) BEMs between (f) and (b); and (k) BEMs between (g) and (b).

Download Full Size | PDF

Table 4. Comparison of M_SSIMs, M_AUs, and V_ AUs over the outputs of the undersampled wrapped phase images unwrapped by the involved models.^a

View Table | View all tables in this article

Table 5. Comparison of M_SSIMs, M_AUs and V_ AUs over the outputs of 800 undersampled wrapped phase images with different levels unwrapped by the involved models.^a

View Table | View all tables in this article

3. 4 generalization capability test

To examine the generalization capability of the VDE-Net, we used the involved networks to retrieve a wrapped phase image of a living RBC, which was not included in the training dataset, and compared the unwrapped results, as illustrated in Fig. 8. Similar to Fig. 5, Fig. 8(a) presents the wrapped phase image of a living RBC obtained by white-light diffraction phase microscopy (wDPM) [39] and the fast Fourier transform method (FFT) [40], Fig. 8(b) presents the weighted jump edge map of the wrapped phase image, Fig. 8(c) presents the product of the weighted jump edge map and the original wrapped phase image, and Fig. 8(d) presents the GT obtained using the minimum norm methods. Figures 8(e)–8(h) present the outputs of VUR-Net, VUR-Net (SE), VUR-Net (CBAM), and VDE-Net, respectively. As can be seen from the figures, except for the output of the VDE-Net, there were significant differences between the other outputs and GT, and the SSIMs were 0.0883, 0.5357, 0.4652, and 0.9756, respectively. In addition, the AU of VDE-Net was 72.01%, which was significantly higher than that of the other networks (0% for VUR-Net, 0.05% for VUR-Net (SE), and 0.6% for VUR-Net (CBAM)), and their BEMs were almost entirely black, as shown in Figs. 8(i)–8(l). Only several white dots were observed, as indicated by the red arrows in Fig. 8(j) and Fig. 8(k). Figures 8(m)–8(p) present the phase height comparison of the unwrapped results across the horizontal directional lines (red lines) and vertical direction lines (blue lines). The dashed lines indicate the GT, and the solid lines indicate the outputs of the networks involved. The curves extracted from Fig. 8(e)–8(g) were completely disjointed from the curves extracted from the GT, as shown in Figs. 8(m)–8(o), although the profiles of the curves in Figs. 8(f)–8(g) were similar to those of the GT. In contrast, there was only a significantly small degree of deviation between the curves in Fig. 8(h) and those of the GT. In addition, the three-dimensional (3D) maps of Figs. 8(e)–8(h) are presented in Fig. 9 for a more intuitive comparison of the output results. Based on the comparisons above, the superior generalization capacity of the proposed method was fully demonstrated with respect to the abovementioned networks. With the exception of the inadequate training with many wrapped phase maps of living RBCs, the low generalization capacity of VUR-Net may be partly due to its inability to select key information from multiple segments of information, given the lack of an attention mechanism. The SE and CBAM require a moderate amount of new data to adaptively learn where to direct attention, although the VUR-Net (SE) and VUR-Net (CBAM) each contain an attention mechanism that is effective when unwrapping unseen new data. Different from SE or CBAM, the weighted jump edge map proposed in this study was defined prior to the training of the network, which was calculated based on the characteristics of the wrapped phase image. Therefore, VDE-Net can demonstrate a high performance when unwrapping an unseen wrapped phase image, even without new training. To further verify the generalization ability of our proposed method, we re-acquired 50 wrapped phase images of living RBCs and performed phase unwrapping with the above models. The results were shown in Table 6. In order to compare the generalization ability of the above models more intuitively, we randomly selected 4 images from these 50 wrapped phase images and plotted the corresponding results in Fig. 10. As shown in Table 6, the M_SSIMs of the outputs of our proposed method was still as high as 0.9649 while the others (0.2129, 0.7874, 0.6760 for them respectively) were significantly lower. In addition, the M_AUs of the outputs of the proposed method was much higher than the M_AUs of the outputs of the rest models although it became 68.26%. Not only that, but the V_ AUs of the outputs of our proposed method was still the smallest. As presented in Fig. 10, Fig. 10(a) shows the 4 wrapped phase images of living RBCs, Fig. 10(b) is the products of the weighted jump edge maps and the original wrapped phase images, Fig. 10(c) is the GT, Figs. 10(d)–10(g) represent the unwrapped results using the VUR-Net, VUR-Net (SE), VUR-Net (CBAM) and VDE-Net, with the corresponding SSIMs beneath them. As shown in Figs. 10(d)–10(g), the outputs of the VDE-Net were significantly more similar to GT than the outputs of the other networks and the SSIMs of the outputs of the VDE-Net were all greater than 0.90. Figures 10(h)–10(k) are the BEMs with the corresponding AUs below them. There were still very few white spots when observing Fig. 10(h)–10(j) while we could easily find a lot of white spots in Fig. 10(k), which corresponds to the higher AUs (67.28%, 62.77%, 79.73% and 71.02%) below these figures. Therefore, Table 6 and Fig. 10, especially Fig. 10(k), could well prove that our proposed method has strong generalization ability.

Fig. 8. Comparison of the unwrapped results of a living RBC using the involved models. (a) Network input (wrapped phase image of a living RBC); (b) the weighted jump edge map of wrapped phase image; (c) the product of the weighted jump edge map and the original wrapped phase image; (d) GT; (e) VUR-Net output; (f) VUR-Net (SE) output; (g) VUR-Net (CBAM) output; (h) VDE-Net output; (i) BEMs between (d) and (e); (j) BEMs between (d) and (f); (k) BEMs between (d) and (g); and (l) BEMs between (d) and (h). (m)-(p) The comparisons of the phase heights across the designated lines in (d) and (e), (d) and (f), (d) and (g), (d) and (h) respectively. In (m)-(p), the dashed and solid lines represent the GT and network output, respectively.

Download Full Size | PDF

Fig. 9. The 3D maps of the unwrapped results of the living RBC using the involved models. (a) The 3D map of the GT; (b) the 3D map of the VUR-Net output; (c) the 3D map of the VUR-Net (SE) output; (d) the 3D map of the VUR-Net (CBAM) output; and (e) the 3D map of the VDE-Net output.

Download Full Size | PDF

Fig. 10. Generalization capability test. (a) Network input (wrapped phase images of 4 living RBCs); (b) the products of the weighted jump edge maps and the original wrapped phase images; (c) GT; (d) unwrapped phases using VUR-Net; (e) unwrapped phases using VUR-Net (SE); (f) unwrapped phases using VUR-Net (CBAM); (g) unwrapped phases using VDE-Net; (h) BEMs between (d) and (c); (i) BEMs between (e) and (c); (j) BEMs between (f) and (c); (k) BEMs between (g) and (c).

Download Full Size | PDF

Table 6. Comparison of M_SSIMs, M_AUs and V_ AUs over the outputs of 50 wrapped phase images of the living RBCs unwrapped by the involved models.^a

View Table | View all tables in this article

Moreover, the trainable parameters and time required for phase unwrapping of the networks were determined to further evaluate the performance of the models, as shown in Table 7. The selected test models demonstrated optimal performances in three training trials (the largest sum of M_SSIMs and M_AUs). As can be seen from Table 7, although VUR-Net exhibited the least number of parameters and computation time, its performance on the test set was the lowest among the networks. In addition, VUR-Net (SE) increased the number of trainable parameters and computation time due to the addition of the SE module, which was designed to improve the representational power of a network by enabling it to perform dynamic channel-wise feature recalibration [25]. The CBAM module in VUR-Net (CBAM) includes the channel attention module, in addition the spatial attention module, which can enhance the feature representation ability of the network. However, the cost is significant. Compared with SE or CBAM, the proposed weighted jump-edge attention mechanism was not dependent on the intermediate features of the network to train, as it was defined before training. Moreover, the calculation of the weighted jump edge map and its integration into the network was simple and rapid. Therefore, the computational space required by the proposed network was similar to that of the VUR-Net, as shown in Table 7, and the unwrapping time of a single 256 × 256 wrapped phase image was 8.02 ms (including the mean time required to obtain a weighted jump edge map over the test set, i.e., 1.02 ms). This can satisfy the real-time requirements of practical applications. Finally, combining Table 1 and Table 7, although the M_SSIMs of VUR-Net (CBAM) over the test set were slightly higher (only 0.0001) than those of VDE-Net, the M_AUs, computational spaces, and time of VDE-Net were superior. Therefore, the proposed method based on the weighted jump-edge attention mechanism demonstrated the optimal performance, combined with the various experimental data.

Table 7. Comparison of the models in terms of computational space and time.

View Table | View all tables in this article

Furthermore, to clarify the relationship between the weighted factor ${f_e}$ and network performance, we compared the performance of VDE-Net with different weighted factors over the test set, which was the average of three training trials, as shown in Table 8. The weighted factor ${f_e}$ was set from 1–2.5 with an interval of 0.5, which indicates that ${f_e}$ was larger than the background factor ${f_b}$ by a factor of 2, 3, 4, and 5. As can be seen from Table 8, the network demonstrated the optimal performance when ${f_e}$ was 1.5, which was the value of the weighted factor of jump edge in this study. In addition, the performance of the network did not increase with an increase in ${f_e}$. This is because phase unwrapping is essentially a phase recovery process, which focuses on the jump edge and considers the background region. Therefore, if the value of the background region is excessively small when compared with the jump edge, which indicates that the ${f_e}$ is excessively large, the network may ignore the part of the background region with a small value, thus distorting the phase recovery results.

Table 8. Comparison of M_SSIMs and M_AUs over test set for different weighted factors ${f_e}$.^a

View Table | View all tables in this article

4. Conclusion

In summary, this paper proposes a novel DCNN with a weighted jump-edge attention mechanism, namely, VDE-Net, to solve the phase unwrapping problem. Various experiments were conducted to demonstrate the effectiveness and robustness of VDE-Net. In addition, the generalization capacity of VDE-Net was verified by unwrapping the wrapped phase image of a living RBC, and was significantly higher than those of the three compared networks. Furthermore, VDE-Net achieved a significant improvement in the network performance with only a slight increase in parameters. Therefore, the proposed framework demonstrates considerable potential and a wide applicability with respect to phase unwrapping, which can facilitate further research.

Funding

University of Electronic Science and Technology of China (ZYGX2021YGCX020); National Natural Science Foundation of China (61405028).

Disclosures

The authors declare no conflicts of interest.

Data availability

No data were generated or analyzed in the presented research.

References

1. C. Edwards, A. Arbabi, G. Popescu, L. L. J. L. S. Goddard, and Applications, “Optically monitoring and controlling nanoscale topography during semiconductor etching,” Light: Sci. Appl. 1(9), e30 (2012). [CrossRef]

2. T. Cacace, V. Bianco, P. J. O. Ferraro, and L. I. Engineering, “Quantitative phase imaging trends in biomedical applications,” ACS Nano 135, 106188 (2020). [CrossRef]

3. L. Zhou, H. Yu, Y. J. I. T. O. G. Lan, and R. Sensing, “Deep convolutional neural network-based robust phase gradient estimation for two-dimensional phase unwrapping using SAR interferograms,” IEEE Trans. Geosci. Remote Sensing 58(7), 4653–4665 (2020). [CrossRef]

4. G. Fornaro, G. Franceschetti, R. Lanari, and E. J. J. A. Sansosti, “Robust phase-unwrapping techniques: a comparison,” Journal of the Optical Society of America A 13(12), 2355–2366 (1996). [CrossRef]

5. R. M. Goldstein, H. A. Zebker, and C. L. J. R. S. Werner, “Satellite radar interferometry: Two-dimensional phase unwrapping,” Radio Sci. 23(4), 713–720 (1988). [CrossRef]

6. D. C. Ghiglia and L. A. J. J. A. Romero, “Minimum Lp-norm two-dimensional phase unwrapping,” J. Phys.: Conf. Ser. 13(10), 1999–2013 (1996). [CrossRef]

7. K. Wang, Y. Li, Q. Kemao, J. Di, and J. J. O. E. Zhao, “One-step robust deep learning phase unwrapping,” Opt. Express 27(10), 15100–15115 (2019). [CrossRef]

8. D. C. Ghiglia and M. D. J. A. W. I. P. Pritt, “Two-dimensional phase unwrapping: theory, algorithms, and software,” (1998).

9. D. Zoran, M. Chrzanowski, P.-S. Huang, S. Gowal, A. Mott, and P. Kohli, “Towards robust image classification using sequential attention models,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, (2020), 9483–9492.

10. L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proceedings of the European conference on computer vision (ECCV), (2018), 801–818.

11. S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, M.-H. Yang, and L. Shao, “Multi-stage progressive image restoration,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, (2021), 14821–14831.

12. G. Spoorthi, S. Gorthi, and R. K. S. S. J. I. S. P. L. Gorthi, “PhaseNet: A deep convolutional neural network for two-dimensional phase unwrapping,” IEEE Signal Process. Lett. 26(1), 54–58 (2019). [CrossRef]

13. T. Zhang, S. Jiang, Z. Zhao, K. Dixit, X. Zhou, J. Hou, Y. Zhang, and C. J. O. E. Yan, “Rapid and robust two-dimensional phase unwrapping via deep learning,” Opt. Express 27(16), 23173–23185 (2019). [CrossRef]

14. C. Wu, Z. Qiao, N. Zhang, X. Li, J. Fan, H. Song, D. Ai, J. Yang, and Y. J. B. O. E. Huang, “Phase unwrapping based on a residual en-decoder network for phase images in Fourier domain Doppler optical coherence tomography,” Biomed. Opt. Express 11(4), 1760–1771 (2020). [CrossRef]

15. H. H. Zeyada, M. S. Mostafa, M. M. Ezz, A. H. Nasr, H. M. J. T. E. J. O. R. S. Harb, and S. Science, “Resolving phase unwrapping in interferometric synthetic aperture radar using deep recurrent residual U-Net,” Egyptian Journal of Remote Sensing and Space Science 25(1), 1–10 (2022). [CrossRef]

16. Y. Qin, S. Wan, Y. Wan, J. Weng, W. Liu, and Q. J. A. O. Gong, “Direct and accurate phase unwrapping with deep neural network,” Appl. Opt. 59(24), 7258–7267 (2020). [CrossRef]

17. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), 234–241.

18. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2016), 770–778.

19. K. Simonyan and A. J. A. P. A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” (2014).

20. G. Dardikman-Yoffe, D. Roitshtain, S. K. Mirsky, N. A. Turko, M. Habaza, and N. T. J. B. O. E. Shaked, “PhUn-Net: ready-to-use neural network for unwrapping quantitative phase images of biological cells,” Biomed. Opt. Express 11(2), 1107–1121 (2020). [CrossRef]

21. S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “Cbam: Convolutional block attention module,” in Proceedings of the European conference on computer vision (ECCV), (2018), 3–19.

22. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE conference on computer vision and pattern recognition, (Ieee, 2009), 248–255.

23. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in European conference on computer vision, (Springer, 2014), 740–755.

24. V. Mnih, N. Heess, and A. J. A. I. N. I. P. S. Graves, “Recurrent models of visual attention,” 27(2014).

25. J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2018), 7132–7141.

26. F. Wang, M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, X. Wang, and X. Tang, “Residual attention network for image classification,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2017), 3156–3164.

27. D. Bahdanau, K. Cho, and Y. J. A. P. A. Bengio, “Neural machine translation by jointly learning to align and translate,” (2014).

28. Y. Mei, Y. Fan, Y. Zhang, J. Yu, Y. Zhou, D. Liu, Y. Fu, T. S. Huang, and H. J. A. P. A. Shi, “Pyramid attention networks for image restoration,” (2020).

29. Y. Xu, H.-K. Lam, and G. J. N. Jia, “MANet: A two-stage deep learning method for classification of COVID-19 from Chest X-ray images,” Neurocomputing 443, 96–105 (2021). [CrossRef]

30. X. Xiao, S. Lian, Z. Luo, and S. Li, “Weighted res-unet for high-quality retina vessel segmentation,” in 2018 9th international conference on information technology in medicine and education (ITME), (IEEE, 2018), 327–331.

31. F. Yu and V. J. A. P. A. Koltun, “Multi-scale context aggregation by dilated convolutions,” (2015).

32. Z. Wang, S. J. D. M. Ji, and K. Discovery, “Smoothed dilated convolutions for improved dense prediction,” 35(4), 1470–1496 (2021).

33. T. Ziegler, M. Fritsche, L. Kuhn, and K. J. A. P. A. Donhauser, “Efficient smoothing of dilated convolutions for image segmentation,” (2019).

34. Z. Zhang, Q. Liu, Y. J. I. G. Wang, and R. S. Letters, “Road extraction by deep residual u-net,” IEEE Geosci. Remote Sensing Lett. 15(5), 749–753 (2018). [CrossRef]

35. F. Yu, V. Koltun, and T. Funkhouser, “Dilated residual networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2017), 472–480.

36. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International conference on machine learning, (PMLR, 2015), 448–456.

37. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. J. I. T. O. I. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]

38. P. Ferraro, C. Del Core, L. Miccio, S. Grilli, S. De Nicola, A. Finizio, and G. J. O. L. Coppola, “Phase map retrieval in digital holography: avoiding the undersampling effect by a lateral shear approach,” Opt. Lett. 32(15), 2233–2235 (2007). [CrossRef]

39. B. Bhaduri, H. Pham, M. Mir, and G. J. O. L. Popescu, “Diffraction phase microscopy with white light,” Opt. Lett. 37(6), 1094–1096 (2012). [CrossRef]

40. M. Takeda, H. Ina, and S. J. J. Kobayashi, “Fourier-transform method of fringe-pattern analysis for computer-based topography and interferometry,” J. Opt. Soc. Am. 72(1), 156–160 (1982). [CrossRef]

Method	Trial 1	Trial 2	Trial 3	Average
Method	M_SSIMs M_AUs V_AUs	M_SSIMs M_AUs V_ AUs	M_SSIMs M_AUs V_ AUs	M_SSIMs M_AUs V_AUs
VUR-Net	0.9964 90.85% 0.035	0.9962 90.52% 0.036	0.9964 91.22% 0.035	0.9963 90.86% 0.035
VUR-Net (SE)	0.9974 91.49% 0.034	0.9974 91.87% 0.033	0.9975 90.07% 0.036	0.9974 91.14% 0.034
VUR-Net (CBAM)	0.9975 90.35% 0.036	0.9975 92.30% 0.032	0.9976 92.50% 0.031	0.9975 91.71% 0.033
VDE-Net	0.9974 92.97% 0.031	0.9976 92.46% 0.031	0.9974 92.78% 0.031	0.9974 92.73% 0.031

Method	M_SSIMs	M_AUs	V_ AUs
VUR-Net	0.9969	94.10%	0.044%
VUR-Net (SE)	0.9974	94.80%	0.030%
VUR-Net (CBAM)	0.9970	95.16%	0.029%
VDE-Net	0.9984	96.81%	0.009%

Noise ratio	VUR-Net	VUR-Net (SE)	VUR-Net (CBAM)	VDE-Net
Noise ratio	M_SSIMs M_AUs V_ AUs	M_SSIMs M_AUs V_ AUs	M_SSIMs M_AUs V_ AUs	M_SSIMs M_AUs V_AUs
0.03	0.9895 86.40% 0.051	0.9928 87.54% 0.050	0.9921 88.81% 0.056	0.9931 86.83% 0.048
0.06	0.9897 87.41% 0.049	0.9907 87.97% 0.050	0.9928 89.00% 0.056	0.9925 87.31% 0.047
0.09	0.9896 87.73% 0.052	0.9928 87.91% 0.032	0.9915 89.69% 0.057	0.9926 88.46% 0.047
0.12	0.9897 87.43% 0.052	0.9927 88.25% 0.053	0.9927 89.68% 0.055	0.9920 89.56% 0.046
0.15	0.9861 86.97% 0.050	0.9927 87.94% 0.050	0.9926 88.43% 0.057	0.9934 90.03% 0.046
0.18	0.9896 83.42% 0.052	0.9926 85.68% 0.052	0.9924 84.36% 0.057	0.9921 89.34% 0.048
0.21	0.9896 74.98% 0.064	0.9924 80.87% 0.063	0.9923 75.49% 0.072	0.9919 87.85% 0.050
0.24	0.9896 62.24% 0.096	0.9921 73.68% 0.080	0.9926 63.77% 0.104	0.9919 83.56% 0.060
Average	0.9892 82.07% 0.058	0.9924 84.98% 0.054	0.9924 83.65% 0.064	0.9924 87.87% 0.049

Undersampling ratio	VUR-Net	VUR-Net (SE)	VUR-Net (CBAM)	VDE-Net
Undersampling ratio	M_SSIMs M_AUs V_ AUs	M_SSIMs M_AUs V_ AUs	M_SSIMs M_AUs V_ AUs	M_SSIMs M_AUs V_AUs
10.89%	0.9977 96.16% 0.007	0.9984 96.00% 0.008	0.9989 97.59% 0.010	0.9988 98.78% 0.032%
16.28%	0.9983 96.07% 0.015	0.9985 95.34% 0.018	0.9990 97.07% 0.017	0.9988 97.93% 0.005
21.01%	0.9981 95.54% 0.010	0.9985 96.09% 0.006	0.9988 97.77% 0.004	0.9988 98.54% 0.002
25.94%	0.9985 96.78% 0.007	0.9987 97.29% 0.004	0.9993 98.96% 0.001	0.9990 99.10% 0.003%
30.79%	0.9979 95.07% 0.009	0.9982 95.82% 0.003	0.9988 97.22% 0.007	0.9985 98.41% 0.011%
35.02%	0.9980 95.91% 0.004	0.9981 96.03% 0.002	0.9990 98.53% 0.041%	0.9987 98.42% 0.024%
42.34%	0.9979 95.32% 0.005	0.9980 95.12% 0.003	0.9989 97.83% 0.002	0.9986 97.47% 0.002
50.08%	0.9970 90.70% 0.018	0.9975 93.04% 0.006	0.9980 93.66% 0.013	0.9981 95.67% 0.002
Average	0.9979 95.19% 0.009	0.9982 95.59% 0.006	0.9988 97.33% 0.007	0.9987 98.04% 0.001

Method	Params (million)	Running time (ms)
VUR-Net	51.25	6.90
VUR-Net (SE)	51.45	7.90
VUR-Net (CBAM)	51.44	10.2
VDE-Net	51.26	8.02

VDE-Net: a two-stage deep learning method for phase unwrapping

Abstract

1. Introduction

2. Proposed method

2.1 Calculation of the weighted jump edge

2.2 Architecture of the proposed VDE-Net

2.3 Network implementation details

2.4 Data preparation

3. Experimental results and discussion

3.1 Feasibility and accuracy test

3.2 Anti-noise performance test

3. 3 Anti-undersampling performance test

3. 4 generalization capability test

4. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (10)

Tables (8)

Equations (8)

Optics Express

Method	M_SSIMs	M_AUs	V_ AUs
VUR-Net	0.9987	96.04%	0.298%
VUR-Net (SE)	0.9986	96.10%	0.338%
VUR-Net (CBAM)	0.9985	97.70%	0.047%
VDE-Net	0.9987	98.06%	0.033%

Method	M_SSIMs	M_AUs	V_ AUs
VUR-Net	0.2129	3.931%	0.029
VUR-Net (SE)	0.7874	13.01%	0.065
VUR-Net (CBAM)	0.6760	12.67%	0.038
VDE-Net	0.9649	68.26%	0.013

weighted factors	M_SSIMs	M_AUs
$f_{e}$ = 1.0	0.9968	90.74%
$f_{e}$ = 1.5	0.9976	92.50%
$f_{e}$ = 2.0	0.9969	91.41%
$f_{e}$ = 2.5	0.9971	92.01%