Optica Publishing Group

Learning-based denoising for polarimetric images

Open Access

Abstract

Based on measuring polarimetric parameters that carry specific physical information, polarimetric imaging has been widely applied to various fields. In practice, however, noise during image acquisition can lead to noisy polarimetric images. In this paper, we propose, for the first time to our knowledge, a learning-based method for polarimetric image denoising. This method is based on the residual dense network and can significantly suppress the noise in polarimetric images. The experimental results show that the proposed method clearly suppresses noise and outperforms other existing methods. In particular, for images of the degree of polarization and the angle of polarization, which are quite sensitive to noise, the proposed learning-based method can faithfully reconstruct details submerged in strong noise.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Polarimetric imaging obtains polarization information, which reflects physical properties related to object shape, surface roughness, and so on [1,2]. Therefore, it has been widely used in various applications including image recovery [3], object detection [4], biological imaging [5], and material classification [6,7]. In particular, these applications often rely on imaging of polarimetric parameters. However, some essential polarimetric parameters, such as the degree of polarization (DoP) and the angle of polarization (AoP), are derived from the measured intensities through nonlinear operators, which can amplify the noise of the intensity measurements. Consequently, these polarimetric parameters are quite sensitive to noise [8–10]. This means the polarization information is easily submerged in noise, which degrades the performance of polarimetric imaging. Therefore, an effective denoising method is essential for polarimetric imaging.

Various methods have been proposed and developed for image denoising, such as principal component analysis (PCA), K-times singular value decomposition (K-SVD), and block-matching 3-D filtering (BM3D) [11–13]. These methods have been used in polarimetric imaging systems. However, they have two main drawbacks. Firstly, these methods are designed for noisy images with additive white Gaussian noise (AWGN), while the properties of noise in practice are more complex and depend on many factors [14,15]; therefore, these methods cannot handle practical applications well. Secondly, most of these methods depend on prior knowledge and require manual adjustment of structural parameters, so they are not universally suitable for all materials and conditions [14–16].

Deep learning is a powerful technique based on neural networks whose multiple layers learn increasingly rich representations. It performs particularly well in various fields, including image denoising [16–18]. These applications have verified that learning-based methods have distinct advantages in extracting image structures and features [16]. Deep learning is thus more suitable for image recovery in complex, strong-noise environments than other methods [19]. Therefore, applying this technique to remove noise and extract polarimetric features from polarimetric images is promising, especially for those polarimetric parameters that are more sensitive to noise. However, to our knowledge, there is no learning-based denoising method for polarimetric images.

In this paper, we consider the Stokes imager as an example to demonstrate that employing deep neural networks can significantly suppress the noise in polarimetric images and enhance image quality. In particular, we built a dataset containing a large number of polarimetric image pairs captured in a real environment with different gain values and exposure times of the camera. The proposed network, named polarimetric denoising residual dense network (PDRDN), is then trained on this dataset. The most prominent advantage of the PDRDN method is that it learns rich hierarchical representations from residual dense blocks, in which all the features produced by preceding layers are fused by local feature fusion and concatenated by global feature fusion. Thus, the proposed method can recover polarization information submerged in a strongly noisy background [19,20].

The paper is organized as follows. We introduce the model of polarimetric imaging in Section 2 and the network structure in Section 3. The experimental results based on the division of focal plane (DoFP) imager and the evaluations of the network structure are presented in Section 4. We finally draw conclusions and discuss perspectives of this work in Section 5.

2. Polarimetric imaging system based on Stokes measurement

Consider the polarimetric imaging system based on Stokes measurement in the presence of environmental noise; the noisy image ${{\textbf I}^n}({x,y} )$ is given by [21]:

$${{\textbf I}^n}({x,y} )= W{\textbf S}({x,y} )+ {\textbf N}({x,y} )$$
where ${\textbf S}({x,y} )= {[{{s_0}({x,y} ),{s_1}({x,y} ),{s_2}({x,y} ),{s_3}({x,y} )} ]^T}$ denotes the Stokes vector map of the scene, and $({x,y} )$ is the pixel location. $W$ denotes the measurement matrix, which depends on the analysis eigenstates of the polarization state analyzer in the polarimetric imaging system. ${\textbf N}({x,y} )$ refers to the noise component of the observation. In practice, ${\textbf I}({x,y} )= W{\textbf S}({x,y} )$ corresponds to the noise-free image (ground truth) of the scene. Since the measurement matrix is known for a given polarimetric imaging system, we can obtain an estimator of the Stokes map by inverting the captured noisy image:
$$\hat{{\textbf S}}({x,y} )= {W^ + }{{\textbf I}^n}({x,y} )= {W^ + }[{{\textbf I}({x,y} )+ {\textbf N}({x,y} )} ]$$
where “+” denotes the matrix pseudo-inverse, and ${W^ + } = {({{W^T}W} )^{-1}}{W^T}$ [21]. We assume that the variance of the noise for the jth intensity measurement is $\sigma _j^2$. According to Eq. (2), we can calculate the estimation variance of each Stokes parameter as (the detailed derivation is given in the Appendix):
$$\textrm{Var}{[{\hat{{\textbf S}}({x,y} )} ]_i} = \sum\limits_j {{{({W_{ij}^ + } )}^2}\sigma _j^2}$$
These variances indeed depend on the measurement matrix. Besides, it can be seen that the influence of the noise is passed from the intensities to the Stokes parameters through a linear operator. Subsequently, we can obtain the images of other polarimetric parameters from the inverted Stokes maps, such as the DoP and the AoP [1]:
$$\textrm{DoP} = \frac{\sqrt{s_1^2({x,y}) + s_2^2({x,y}) + s_3^2({x,y})}}{s_0({x,y})} \quad \textrm{and} \quad \textrm{AoP} = \frac{1}{2}\tan^{-1}\left[\frac{s_2({x,y})}{s_1({x,y})}\right]$$
where ${s_i}({x,y} )$ denotes the map of the (i+1)th Stokes parameter. DoP and AoP are nonlinear combinations of the Stokes parameters. Therefore, they are theoretically more sensitive to noise, because the nonlinear operators may amplify its influence [8,9,21,22].
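As a concrete illustration of the inversion in Eq. (2) and the polarimetric parameters in Eq. (4), the following numpy sketch restricts to the linear Stokes parameters ($s_0, s_1, s_2$), so the DoP reduces to the degree of linear polarization. The analyzer angles (0°, 45°, 90°, 135°) and the use of `arctan2` for quadrant handling are our assumptions, not taken from the paper:

```python
import numpy as np

# Measurement matrix for ideal linear analyzers at 0/45/90/135 degrees;
# each row is 0.5 * [1, cos(2*theta), sin(2*theta)] acting on (s0, s1, s2).
thetas = np.deg2rad([0, 45, 90, 135])
W = 0.5 * np.stack([np.ones(4), np.cos(2 * thetas), np.sin(2 * thetas)], axis=1)
W_pinv = np.linalg.pinv(W)  # W^+ = (W^T W)^(-1) W^T, as in Eq. (2)

def stokes_and_pol_params(intensities):
    """Invert four intensity maps (4, H, W) to linear Stokes maps,
    then compute DoLP and AoP as in Eq. (4)."""
    s = np.tensordot(W_pinv, intensities, axes=1)  # shape (3, H, W)
    s0, s1, s2 = s
    dolp = np.sqrt(s1**2 + s2**2) / s0
    aop = 0.5 * np.arctan2(s2, s1)  # arctan2 resolves the quadrant
    return s, dolp, aop
```

For a scene with $s_0 = 1$, $s_1 = 0.5$, $s_2 = 0$, the forward model `np.tensordot(W, S, axes=1)` followed by this inversion recovers DoLP = 0.5 and AoP = 0.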

The denoising convolutional neural network (DCNN) [19,20,23] has been successfully applied to denoising traditional intensity images by training on the residual map instead of the clean image. The goal of the proposed PDRDN method is to learn the difference between noisy observations and latent clean images, and thus to determine from the inputs the residual mapping $\Delta {\textbf I}({x,y} )$ corresponding to the noise term ${\textbf N}({x,y} )$. The denoised result can be deduced by [20]:

$${\textbf I}({x,y} )= {{\textbf I}^n}({x,y} )- \Delta {\textbf I}({x,y} )$$
Subsequently, we can get the “clean Stokes maps” by inverting the denoised intensity images:
$${\textbf S}({x,y} )= {W^ + }{\textbf I}({x,y} )= {W^ + }[{{{\textbf I}^n}({x,y} )- \Delta {\textbf I}({x,y} )} ]$$
and then the clean images of polarimetric parameters can be obtained, such as the images of DoP and AoP.

3. Network structure

In order to remove the noise in polarimetric images, we propose the PDRDN, shown in Fig. 1(a). The PDRDN, which consists of four main components: shallow feature extraction (SFE), residual dense blocks (RDBs), dense feature fusion (DFF), and global residual learning (GRL), learns the residual mapping from the input images. In particular, the RDBs make full use of hierarchical features from all the convolutional layers [19,20]. The proposed network combines densely connected layers with local feature fusion (LFF) by using local residual learning within the RDBs [shown in Fig. 1(b)]. The output of the (d−1)th RDB is directly connected to each layer in the dth RDB and contributes to the input of the (d+1)th RDB. The LFF further enhances the flow of information and gradients; it also improves the representational ability of the network and thus leads to better performance [20]. The DFF includes a concatenation layer followed by convolutional layers of sizes 1×1 and 3×3. In practice, we used a skip connection between the first convolutional layer of the SFE and the DFF to realize the GRL.
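A minimal PyTorch sketch of the RDB structure described above (the six convolutional layers and growth rate of 32 follow Section 4.2; the input channel width, ReLU activations, and all names are our assumptions, not the authors' implementation):

```python
import torch
import torch.nn as nn

class RDB(nn.Module):
    """Residual dense block sketch: densely connected conv layers,
    local feature fusion (1x1 conv), and local residual learning."""
    def __init__(self, in_ch=64, growth=32, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, 3, padding=1), nn.ReLU(inplace=True)))
            ch += growth  # dense connection: each layer sees all earlier features
        self.lff = nn.Conv2d(ch, in_ch, 1)  # local feature fusion

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        # local residual learning: fused features added back to the block input
        return x + self.lff(torch.cat(feats, dim=1))
```

Because the fused output is added back to the block input, stacking RDBs keeps the channel count constant, so the (d−1)th block's output can feed every layer of the dth block.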


Fig. 1. The architectures of (a) Polarization denoising Residual Dense Network (PDRDN), (b) Residual Dense Block (RDB).


In the present network, we define the loss function to learn the trainable parameters as the mean square error given by [16,24]:

$$l(\Theta) = \frac{1}{N}\sum_{i = 1}^{N} \left\Vert \Delta{\textbf I}_i({x,y;\Theta}) - [{\textbf I}_i^n({x,y}) - W{\textbf S}_i({x,y})] \right\Vert_F^2$$
where N denotes the number of input training images, and $\Delta {{\textbf I}_i}({x,y;\Theta } )$ refers to the residual mapping learned from the trainable parameters $\Theta $ of the proposed network. ${{\textbf S}_i}({x,y} )$ denotes the Stokes maps corresponding to the ith input image, and ${||\cdot ||_F}$ denotes the Frobenius matrix norm.
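The loss of Eq. (7) can be written in a few lines of numpy; the array layout (four intensity channels, three linear Stokes channels) and all names here are illustrative assumptions:

```python
import numpy as np

def residual_loss(delta_I, I_noisy, S, W):
    """Mean squared-Frobenius-norm loss of Eq. (7): the learned residual
    map should match the true noise I^n - W S.
    delta_I, I_noisy: (N, 4, H, W); S: (N, 3, H, W); W: (4, 3)."""
    clean = np.einsum('kj,njhw->nkhw', W, S)         # W S: noise-free intensities
    diff = delta_I - (I_noisy - clean)               # learned vs. true residual
    return np.mean(np.sum(diff**2, axis=(1, 2, 3)))  # (1/N) sum ||.||_F^2
```

When the predicted residual equals the true noise the loss is exactly zero, which is the fixed point the training drives toward.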

4. Experimental results and discussion

4.1. Preparation of training dataset

Training the network and evaluating the effectiveness of our method require a large dataset. In order to build this dataset conveniently, we employed a commercial linear division-of-focal-plane (DoFP) polarization camera. The linear DoFP camera can measure the linear Stokes vector with only one image acquisition because of its integrated micro-polarizer array [11–13]. In this case, the polarimetric parameter DoP in Eq. (4) should be replaced by the degree of linear polarization (DoLP) given by:

$$\textrm{DoLP} = \frac{{\sqrt {s_1^2({x,y} )+ s_2^2({x,y} )} }}{{{s_0}({x,y} )}}$$
Of course, all the steps of the presented method can be extended to polarimetric systems based on division of time, division of aperture, and division of amplitude [4], as well as to full Stokes imagers composed of polarizers and retarders [25,26].

Each image pair in the dataset includes the ground truth and the corresponding noisy image. In practice, we captured these image pairs with different gain values and appropriately adjusted exposure times of the camera [14]. In particular, the images acquired at high gain and short exposure time, with a high level of noise and low signal-to-noise ratio (SNR), serve as noisy inputs. Conversely, the images acquired at low gain and long exposure time, with a low level of noise and high SNR, serve as ground truths. Furthermore, in order to minimize the random noise and generate a substantially noise-free image, we sampled the same scene 50 times and averaged the results as the ground truth.

It needs to be clarified that acquiring each ground truth takes a long time, so it is difficult to build a large-scale dataset. This differs from other methods, which generate noisy images by numerically adding Gaussian noise to clean images. In this work, we built a dataset that includes 150 groups of image pairs with a spatial resolution of 2448×2048. In practice, under unpolarized illumination, all these images were taken by a commercial DoFP imager (LUCID, PHX055S-PC), and they cover different objects of different materials such as metal, plastic, wood, fabric, etc. 120 groups of these image pairs served as the training set, and the remaining 30 groups were divided into validation and test sets. Besides, in order to ensure the universality of this work, the training, validation, and test sets all contain various types of materials. In fact, 150 groups of images are not enough for learning. We expanded the dataset by sliding a 64×64-pixel window with a step of 32 pixels and flipping each crop horizontally or vertically. We finally obtained a training dataset of about 137 thousand patches, which is sufficient for learning.
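The sliding-window augmentation described above can be sketched as follows; the window size and stride follow the text, while treating the two flips as extra samples alongside the original crop is our reading of the procedure:

```python
import numpy as np

def extract_patches(img, patch=64, stride=32):
    """Slide a patch x patch window with the given stride and augment
    each crop with horizontal and vertical flips."""
    H, W = img.shape[:2]
    patches = []
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            p = img[y:y + patch, x:x + patch]
            patches.extend([p, p[:, ::-1], p[::-1, :]])  # original + two flips
    return patches
```

A 128×128 image yields 3×3 window positions and thus 27 augmented patches; applied to 120 full-resolution 2448×2048 image pairs, this kind of expansion plausibly reaches the ~137 thousand patches reported.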

Figure 2 shows a sample image pair of the metal coin. Figures 2(a) and 2(b) show the noisy image and its ground truth, and Fig. 2(c) shows enlarged views of the marked regions (A-1 and A-2). Compared with the ground truth, there are many apparent “pocks” in the noisy image, which are caused by the “real noise” of the environment. The noisy images and ground truths corresponding to DoLP and AoP are also presented in Fig. 2(d). The details in both the noisy DoLP and AoP images are difficult to distinguish because they are submerged in the noisy background. This is consistent with the theoretical analysis of Eq. (4): the DoLP and AoP are more sensitive to noise. Therefore, a solid denoising network must be effective for DoLP and AoP images.


Fig. 2. Sample image pair of the dataset: (a) noisy image, (b) ground truth, (c) enlarged views of the regions in (a) and (b), (d) the corresponding noisy images and ground truths of DoLP and AoP.


4.2. Evaluation of the network structure

In practice, we split each noisy image captured by the DoFP camera into four sub-images corresponding to the four polarizer orientations (${0^ \circ },{45^ \circ },{90^ \circ },{135^ \circ }$). These four polarization images were packaged as a 3D input to retain the polarization correlations. This input was first cropped into small patches and then sent to the network. All the weights of the RDBs were initialized by the MSRA method [27], while the other layers were randomly initialized with a standard deviation of 0.01. We used the Adam optimizer [28] with a mini-batch size of 32 to update the network parameters. The learning rate was initially set to $10^{-4}$ and exponentially decayed with a rate of 0.9. Finally, we trained our model for 60 epochs on an Nvidia RTX 2080Ti GPU.
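Splitting the DoFP mosaic into the four orientation sub-images amounts to strided slicing over 2×2 super-pixels. The specific layout used below (0°/45° on the top row, 135°/90° on the bottom) is an assumption; the actual arrangement should be taken from the sensor's data sheet:

```python
import numpy as np

def split_dofp(mosaic):
    """Split a DoFP mosaic into four half-resolution sub-images,
    one per micro-polarizer orientation (layout assumed, see lead-in)."""
    return {
        0:   mosaic[0::2, 0::2],
        45:  mosaic[0::2, 1::2],
        135: mosaic[1::2, 0::2],
        90:  mosaic[1::2, 1::2],
    }
```

Each sub-image has half the resolution of the mosaic in both dimensions; stacking the four of them along a channel axis gives the 3D input described above.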

The performance with different numbers of RDB blocks and the effects of DFF and GRL were investigated to validate the proposed network structure. A well-known objective metric commonly used in image restoration, the peak signal-to-noise ratio (PSNR), is calculated by [18]:

$$\textrm{PSNR}({{I_{\textrm{gt}}}({x,y} ),{I_{\textrm{dn}}}({x,y} )} )= 10{\log _{10}}\left( {\frac{{{{255}^2}}}{{\textrm{MSE}({{I_{\textrm{gt}}}({x,y} ),{I_{\textrm{dn}}}({x,y} )} )}}} \right)$$
where ${I_{\textrm{gt}}}({x,y} )$ and ${I_{\textrm{dn}}}({x,y} )$ represent the ground truth and the denoised result, respectively. The mean square error (MSE) between ${I_{\textrm{gt}}}({x,y} )$ and ${I_{\textrm{dn}}}({x,y} )$ is given by [24]:
$$\textrm{MSE}({{I_{gt}}({x,y} ),{I_{dn}}({x,y} )} )= \frac{1}{{m \times n}}\sum\limits_{x = 1}^m {\sum\limits_{y = 1}^n {{{[{{I_{gt}}(x,y) - {I_{dn}}(x,y)} ]}^2}} }$$
where $m \times n$ is the size of the image.
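Equations (9) and (10) combine into a short numpy function; the 255 peak assumes 8-bit images, as in Eq. (9):

```python
import numpy as np

def psnr(gt, dn, peak=255.0):
    """PSNR of Eqs. (9)-(10): 10*log10(peak^2 / MSE) for 8-bit images."""
    mse = np.mean((gt.astype(np.float64) - dn.astype(np.float64)) ** 2)
    return 10 * np.log10(peak**2 / mse)
```

Note that PSNR diverges for identical images (MSE = 0), so implementations typically guard that case or report it separately.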

We performed several experiments with different numbers ($d \in \{ 4,6,8,10\} $) of RDB blocks and compared their performances. In each network, each RDB has 6 convolutional layers, and each layer has 32 filters. The number of filters for SFE and DFF was set to 64. The average PSNR values of intensity, DoLP, and AoP calculated on the validation set are shown in Fig. 3(a). The network with more RDB blocks performs better, mainly because a larger d makes the network deeper. However, the improvement is limited when the number of RDB blocks exceeds 8. Moreover, the convergence curve is more stable with more RDB blocks. After a trade-off between performance and training speed, the number of RDB blocks was set to 10 in the following experiments.


Fig. 3. The average values of PSNR for intensity, DoLP, and AoP on the validation set (a) with different numbers of RDB blocks, (b) for Net-1, Net-2, Net-3, and PDRDN.


The baseline of the proposed network consists only of SFE, RDBs, and a subsequent $3 \times 3$ convolutional layer. In order to verify the effectiveness of the PDRDN network, we carried out ablation experiments on different combinations of DFF and GRL. In practice, we trained four networks: the baseline (Net-1), the baseline with DFF (Net-2), the baseline with GRL (Net-3), and the baseline with both DFF and GRL (PDRDN). The average PSNRs for intensity, DoLP, and AoP on the validation set are shown in Table 1. According to Table 1, the baseline (Net-1) scores worst (PSNR = 27.58 dB), while adding DFF (Net-2) or GRL (Net-3) to the baseline improves the performance. Indeed, the PDRDN, with both DFF and GRL, achieves the highest PSNR.


Table 1. The average values of PSNR on validation set for Net-1, Net-2, Net-3, and PDRDN.

The convergence processes of these four networks on the validation set are shown in Fig. 3(b). It is clear that the proposed PDRDN performs best, which is consistent with the analysis of Table 1. Besides, the convergence curve of the PDRDN is more stable than those of the other networks, which means that simultaneously adding DFF and GRL both improves the performance and stabilizes the training process.

4.3. Results and discussion

In order to analyze the effectiveness of the proposed learning-based denoising method, we compare it with three conventional methods, the PCA-based method [11], the K-SVD-based method [12], and the BM3D method [13], in terms of both visual effects and objective quantified metrics.

4.3.1. Comparison results from visual effect

Firstly, let us focus on the scene of the metal coin in Fig. 2. Figure 4 presents the denoising results for the intensity, DoLP, and AoP images obtained by the PCA, K-SVD, BM3D, and PDRDN methods, respectively. In this work, the algorithm parameters for the other methods were chosen as follows: PCA (the size of the variable block is 6×6, the size of the training block is 34×34), K-SVD (the block dimension is 8, the number of atoms in the dictionary is 256), and BM3D (the sliding step to process every next reference block is 3, the size of the reference block is 4×4). The first column of Fig. 4 corresponds to the noisy images and the last column to the ground truths. Compared with the intensity image in the first row, the DoLP and AoP images in the second and third rows are quite sensitive to the noise. In particular, the AoP information is severely degraded and submerged in the strong noisy background, and its details can hardly be distinguished.


Fig. 4. The denoised images for intensity, DoLP, and AoP by different methods.


Overall, the PDRDN method has the best performance for intensity, DoLP, and AoP images. With the PDRDN method, all the details (including some weak information, such as the leaf vein) are well reconstructed and the noise is significantly removed, especially in the AoP image. Besides, it is worth noting that the images reconstructed by PDRDN appear visually superior even to the ground truth; the main reason is that the learning-based method is data-driven, and its result mainly depends on the scale of the training data and the network architecture. The results above also verify that this method has distinct advantages in environments with strong noise.

In order to give a clearer presentation of the performance of the different methods, Fig. 5 shows enlarged views of regions A and B marked in Fig. 4. Let us first focus on the results for the intensity images in the first column of Fig. 5. It can be seen from regions A and B that image details (such as the word “GUO” in region A and the leaf in region B) are severely lost by the PCA method, while the K-SVD method performs somewhat better. Besides, the denoised result of the BM3D method appears visually similar to that of our PDRDN method. However, in terms of the quantified index of Eq. (9), its PSNR value (28.64 dB) is slightly lower than ours (30.06 dB).


Fig. 5. The enlarged views of regions A and B in Fig. 4.


Considering the reconstructions of the DoLP and AoP images, the PCA method can seemingly reconstruct some details. However, compared with the ground truth, some details (such as the outline of the coin and the vein of the leaf) are noticeably distorted. The K-SVD method brings only a little improvement in visual effect and cannot provide more details. As observed from the third row in Fig. 5, the profile of the object is well recovered, but some details are very weak; for example, the word “GUO” in region A and the leaf vein in region B cannot be distinguished in the results denoised by the K-SVD method. In addition, although the BM3D method improves the visual quality and recovers more details than the K-SVD method, there is still a lot of residual noise (for example, in the upper left quadrant of region A), and the information is somewhat distorted. Overall, all three conventional methods fail or are limited when used to denoise DoLP and AoP images, because DoLP and AoP are more sensitive to noise and most of the useful details are submerged in the strong noisy background. On the contrary, the proposed PDRDN method shows a powerful denoising ability, and the reconstructed image details are close to the ground truth, especially for the AoP and DoLP images.

In order to verify the universality of the proposed method for different materials, we carried out several groups of experiments on other commonly used materials, including plastic, wood, and fabric. The denoised intensity, DoLP, and AoP images reconstructed by PDRDN are shown in Fig. 6. It should be noted that the first three objects shown in Fig. 6 are not contained in the training and validation sets, although the corresponding materials are present in the training set. Figure 6 shows that the proposed method suppresses the noise for these materials and is highly competitive in reconstructing the polarization information, which also verifies its robustness. Since AoP images are much more sensitive to noise, most of the details (such as the numbers on the plastic ruler, the words on the wood, and the outline of the fabric rabbit) are submerged in the strong noisy background. Remarkably, after applying the proposed learning-based method, the scenes can be clearly recognized in the denoised images. Moreover, we also present the ground truths of intensity, DoLP, and AoP in Fig. 6 for comparison. The information in the denoised results is highly similar to that in the ground truths, which means that the proposed method can accurately rebuild the polarization information of the scene.


Fig. 6. Denoised results for the other four materials: wood (first row), fabric (second row), plastic (third row) and rubber (last row).


Besides, we also show the denoised result for another object in the last row of Fig. 6. This object is made of rubber, a material that does not appear in the training set. The denoised results verify that the proposed method also performs well in this case. Therefore, the proposed method seems valid even for materials that do not appear in the training set. Of course, a series of experiments on other materials (such as stonework, fiber products, etc.) could be performed to further verify this point, which is an interesting perspective.

In addition to verifying the effectiveness of our method for different materials, it is important to examine how the proposed network performs at different noise levels. We checked this by considering the two examples in Fig. 7, whose PSNRs are 19.36 dB (bridge) and 30.04 dB (cup), respectively, i.e., two different noise levels. In the denoised images in Fig. 7, the noise is significantly removed and the details can be clearly recognized, which shows that the proposed method is effective at different noise levels.


Fig. 7. Denoised results for the other two examples with different levels of noise.


Besides, the images in our dataset were captured in a real environment and have an uneven noise distribution across different regions of the scene. For example, two regions in the DoLP image of the bridge have different noise levels (PSNR values of 7.05 dB and 9.38 dB, respectively). The comparison results are shown in Fig. 8. Our method is effective for both regions simultaneously. All the results above verify the effectiveness of the present method at different noise levels.


Fig. 8. Denoised results for two regions (in the DoLP image of bridge) with different levels of noise.


Of course, building a dataset that contains various images with quantified noise levels and investigating the corresponding performance is a necessary next step. It is also worth noting that the original scale of the dataset is not large enough. In addition to expanding this scale, we can also use cross-validation to further increase the precision of the model and avoid the random perturbation of different training images [29].

4.3.2. Comparison results from quantified metrics

We also compared the proposed method with the methods above in terms of objective quantified metrics on the test set for different materials. Table 2 shows the average PSNR values of the intensity, DoLP, and AoP images, respectively. PDRDN earns the highest score among all methods. In particular, compared with the corresponding noisy images, the average PSNR is improved by 160% for the DoLP results and by 93% for the AoP results.


Table 2. Average PSNRs of denoised results for intensity, DoLP and AoP on test set.

The structural similarity index measure (SSIM) is another common quantified metric to characterize the similarity between the denoised image and the ground truth. It is given by [30]:

$$\textrm{SSIM}({I_{gt}},{I_{dn}}) = \frac{{({2{\mu_{{I_{gt}}}}{\mu_{{I_{dn}}}} + {C_1}} )({2{\sigma_{{I_{gt}}{I_{dn}}}} + {C_2}} )}}{{({\mu_{{I_{gt}}}^2 + \mu_{{I_{dn}}}^2 + {C_1}} )({\sigma_{{I_{gt}}}^2 + \sigma_{{I_{dn}}}^2 + {C_2}} )}}$$
where $\mu $ refers to the mean intensity value of the image, ${\sigma _{{I_{gt}}{I_{dn}}}}$ is the covariance of images ${I_{gt}}$ and ${I_{dn}}$, and ${\sigma ^2}$ denotes the variance of the image. C1 and C2 are two constants introduced to avoid division by zero.
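A single-window (global) form of Eq. (12) can be sketched as follows. Practical implementations instead average a local-window SSIM map, and the constants $C_1 = (0.01 \cdot \text{peak})^2$, $C_2 = (0.03 \cdot \text{peak})^2$ are a common convention that we assume here rather than take from the paper:

```python
import numpy as np

def ssim_global(gt, dn, peak=255.0):
    """Global SSIM of Eq. (12), computed over the whole image as one window."""
    C1, C2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # assumed constants
    mu_x, mu_y = gt.mean(), dn.mean()
    var_x, var_y = gt.var(), dn.var()
    cov = ((gt - mu_x) * (dn - mu_y)).mean()
    return ((2 * mu_x * mu_y + C1) * (2 * cov + C2)) / \
           ((mu_x**2 + mu_y**2 + C1) * (var_x + var_y + C2))
```

Identical images score exactly 1; any structural dissimilarity lowers the score.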

Table 3 shows the average SSIMs on the test set for intensity, DoLP, and AoP images with the different methods. The proposed PDRDN method achieves higher SSIM values than the other methods. In particular, compared with the noisy images, the average SSIMs of the DoLP and AoP images are enhanced by about nine and three times, respectively. In addition, after training, the proposed learning-based method only requires 1.6 s on average (in Python) on the test set to obtain a denoised result, which is faster than the other methods (about 138 s for PCA, 64 s for K-SVD, and 48 s for BM3D, in MATLAB). All the quantified results above indicate that the proposed method is highly effective for denoising DoLP and AoP images.


Table 3. Average SSIMs of denoised results for intensity, DoLP and AoP on test set.

5. Conclusion

In this paper, we have built a dataset for denoising Stokes polarimetric images, in which the noisy images and the ground truths are acquired by adjusting the gain and exposure time of a linear DoFP polarization camera. Based on this dataset, we have proposed a denoising method (PDRDN) based on the residual dense network for polarimetric images. The experimental results show that the PDRDN method performs better than other methods. In particular, the DoLP and AoP images can be well reconstructed, although they are quite sensitive to noise. Through several groups of experiments with different materials and noise levels, we have verified the universality and effectiveness of the denoising performance for intensity, DoLP, and AoP images in terms of both visual effect and objective quality metrics.

The proposed learning-based method is not confined to the particular case of Stokes imagers, and it can be applied in principle to any polarimetric imager, such as Mueller imagers [31] and orthogonal states contrast (OSC) polarimetric imagers [32]. Besides, the physical meanings are not apparent in traditional learning-based methods. Therefore, another interesting perspective of this work is embedding the physical model and prior knowledge of polarization optics and noise into the networks to guide and constrain the learning process. In this way, the learning-based methods may contain more physical meanings and lead to better performance.

Appendix: Explanation for Eq. (3)

According to Eq. (2), we can get the estimation variance of this estimator as:

$$\textrm{Var}[{\hat{{\textbf S}}({x,y} )} ]= \textrm{Var}[{{W^ + }{\textbf I}({x,y} )+ {W^ + }{\textbf N}({x,y} )} ]$$
Since ${\textbf I}({x,y} )$ denotes the true value of the intensity, the covariance matrix of ${\textbf I}({x,y} )$, denoted $\Gamma ({{\textbf I}({x,y} )} )$, is a zero matrix. Therefore, one has:
$$\begin{aligned} \textrm{Var}[{\hat{{\textbf S}}({x,y})}] &= \textrm{Var}[{{W^ + }{\textbf N}({x,y})}]\\ &= {W^ + } \cdot \Gamma({{\textbf N}({x,y})}) \cdot {({{W^ + }})^T}\\ &= \textrm{diag}\left( \sum_j {({W_{1j}^ + })^2 \sigma_j^2},\; \sum_j {({W_{2j}^ + })^2 \sigma_j^2},\; \sum_j {({W_{3j}^ + })^2 \sigma_j^2},\; \sum_j {({W_{4j}^ + })^2 \sigma_j^2} \right) \end{aligned}$$
where $\textrm{diag}({\cdot} )$ denotes a diagonal matrix and $\Gamma ({{\textbf N}({x,y} )} )$ denotes the covariance matrix of the noise, which is also diagonal with elements $\sigma _j^2$, the variance of the noise in the jth intensity measurement. According to Eq. (13), we can calculate the estimation variance of each Stokes parameter as:
$$\textrm{Var}{[{\hat{{\textbf S}}({x,y} )} ]_i} = \sum\limits_j {{{({W_{ij}^ + } )}^2}\sigma _j^2}$$
These variances indeed depend on the measurement matrix, and the influence of the noise is passed from the intensities to the Stokes parameters through a linear operator.

Funding

National Natural Science Foundation of China (61775163).

Acknowledgments

Haofeng Hu and Xiaobo Li thank François Goudail for fruitful discussions.

Disclosures

The authors declare no conflicts of interest.

References

1. D. Goldstein, Polarized Light (Dekker, 2003).

2. B. Huang, T. Liu, H. Hu, J. Han, and M. Yu, “Underwater image recovery considering polarization effects of objects,” Opt. Express 24(9), 9826–9838 (2016).

3. X. Li, H. Hu, L. Zhao, H. Wang, Y. Yu, L. Wu, and T. Liu, “Polarimetric image recovery method combining histogram stretching for underwater imaging,” Sci. Rep. 8(1), 12430 (2018).

4. E. Garcia-Caurel, R. Ossikovski, M. Foldyna, A. Pierangelo, B. Drévillon, and A. D. Martino, “Advanced Mueller ellipsometry instrumentation and data analysis,” in Ellipsometry at the Nanoscale, M. Losurdo and K. Hingerl, eds. (Springer, 2013), Chap. 2.

5. J. Qi, C. He, and D. S. Elson, “Real time complete Stokes polarimetric imager based on a linear polarizer array camera for tissue polarimetric imaging,” Biomed. Opt. Express 8(11), 4933–4946 (2017).

6. Z. Guan, F. Goudail, M. Yu, X. Li, Q. Han, Z. Cheng, H. Hu, and T. Liu, “Contrast optimization in broadband passive polarimetric imaging based on color camera,” Opt. Express 27(3), 2444–2454 (2019).

7. M. Wan, G. Gu, W. Qian, K. Ren, and Q. Chen, “Stokes-vector-based polarimetric imaging system for adaptive target/background contrast enhancement,” Appl. Opt. 55(21), 5513–5519 (2016).

8. A. Carnicer and B. Javidi, “Polarimetric 3D integral imaging in photon-starved conditions,” Opt. Express 23(5), 6408–6417 (2015).

9. N. Hagen and Y. Otani, “Stokes polarimeter performance: general noise model and analysis,” Appl. Opt. 57(15), 4283–4296 (2018).

10. X. Li, H. Hu, T. Liu, B. Huang, and Z. Song, “Optimal distribution of integration time for intensity measurements in degree of linear polarization polarimetry,” Opt. Express 24(7), 7191–7200 (2016).

11. J. Zhang, H. Luo, R. Liang, W. Zhou, B. Hui, and Z. Chang, “PCA-based denoising method for division of focal plane polarimeters,” Opt. Express 25(3), 2391–2400 (2017).

12. W. Ye, S. Li, X. Zhao, A. Abubakar, and A. Bermak, “A K times singular value decomposition based image denoising algorithm for DoFP polarization image sensors with Gaussian noise,” IEEE Sens. J. 18(15), 6138–6144 (2018).

13. A. Abubakar, X. Zhao, S. Li, M. Takruri, E. Bastaki, and A. Bermak, “A Block-Matching and 3-D filtering algorithm for Gaussian noise in DoFP polarization images,” IEEE Sens. J. 18(18), 7429–7435 (2018). [CrossRef]  

14. T. Plotz and S. Roth, “Benchmarking denoising algorithms with real photographs,” in IEEE Conference on Computer Vision and Pattern Recognition, 1586–1595 (2017).

15. J. Xue, Y. Zhao, W. Liao, and J. Chan, “Nonlocal low-rank regularized tensor decomposition for hyperspectral image denoising,” IEEE Trans. Geosci. Remote Sens. 57(7), 5174–5189 (2019). [CrossRef]  

16. G. Barbastathis, A. Ozcan, and G. H. Situ, “On the use of deep learning for computational imaging,” Optica 6(8), 921–943 (2019). [CrossRef]  

17. K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising,” IEEE Trans. on Image Process. 26(7), 3142–3155 (2017). [CrossRef]  

18. S. Feng, Q. Chen, G. Gu, T. Tao, L. Zhang, Y. Hu, W. Yin, and C. Zuo, “Fringe pattern analysis using deep learning,” Adv. Photonics 1(2), 1–7 (2019). [CrossRef]  

19. D. W. Kim, J. Ryun Chung, and S. W. Jung, “GRDN: Grouped Residual Dense Network for Real Image Denoising and GAN-based Real-world Noise Modeling,” in CVPR Workshops (2019).

20. Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, “Residual dense network for image restoration,” http://arxiv.org/abs/1812.10477 (2019).

21. X. Li, T. Liu, B. Huang, Z. Song, and H. Hu, “Optimal distribution of integration time for intensity measurements in Stokes polarimetry,” Opt. Express 23(21), 27690–27699 (2015). [CrossRef]  

22. S. Roussel, M. Boffety, and F. Goudail, “On the optimal ways to perform full Stokes measurements with a linear division-of-focal-plane polarimetric imager and a retarder,” Opt. Lett. 44(11), 2927–2930 (2019). [CrossRef]  

23. K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising,” IEEE Trans. on Image Process. 26(7), 3142–3155 (2017). [CrossRef]  

24. F. Wang, H. Wang, H. Wang, G. Li, and G. Situ, “Learning from simulation: an end-to-end deep-learning approach for computational ghost imaging,” Opt. Express 27(18), 25560–25572 (2019). [CrossRef]  

25. X. Li, H. Hu, F. Goudail, and T. Liu, “Fundamental precision limits of full Stokes polarimeters based on DoFP polarization cameras for an arbitrary number of acquisitions,” Opt. Express 27(22), 31261–31272 (2019). [CrossRef]  

26. F. Goudail, X. Li, M. Boffety, S. Roussel, T. Liu, and H. Hu, “Precision of retardance autocalibration in full-Stokes division-of-focal-plane imaging polarimeters,” Opt. Lett. 44(22), 5410–5413 (2019). [CrossRef]  

27. K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in Proceedings of the IEEE International Conference on Computer Vision, 1026–1034 (2015).

28. D. Kingma and J. Ba, “Adam: a method for stochastic optimization,” http://arxiv.org/abs/1412.6980 (2014).

29. V. B. Semwal, K. Mondal, and G. C. Nandi, “Robust and accurate feature selection for humanoid push recovery and classification: deep learning approach,” Neural Comput. Appl. 28(3), 565–574 (2017). [CrossRef]  

30. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]  

31. F. Carmagnola, J. M. Sanz, and J. M. Saiz, “Development of a Mueller matrix imaging system for detecting objects embedded in turbid media,” J. Quant. Spectrosc. Radiat. Transfer 146, 199–206 (2014). [CrossRef]  

32. M. Alouini, F. Goudail, N. Roux, L. Le Hors, P. Hartemann, S. Breugnot, and D. Dolfi, “Active spectro-polarimetric imaging: signature modeling, imaging demonstrator and target detection,” Eur. Phys. J.: Appl. Phys. 42(2), 129–139 (2008). [CrossRef]  


Figures (8)

Fig. 1. The architectures of (a) Polarization denoising Residual Dense Network (PDRDN), (b) Residual Dense Block (RDB).
Fig. 2. Sample image pair of the dataset: (a) noisy image, (b) ground truth, (c) enlarged views of the regions in (a) and (b), (d) the corresponding noisy images and ground truths of DoLP and AoP.
Fig. 3. The average values of PSNR for intensity, DoLP, and AoP on the validation set (a) with different numbers of RDB blocks, (b) for Net-1, Net-2, Net-3, and PDRDN.
Fig. 4. The denoised images for intensity, DoLP, and AoP by different methods.
Fig. 5. The enlarged views of regions A and B in Fig. 4.
Fig. 6. Denoised results for the other four materials: wood (first row), fabric (second row), plastic (third row) and rubber (last row).
Fig. 7. Denoised results for the other two examples with different levels of noise.
Fig. 8. Denoised results for two regions (in the DoLP image of bridge) with different levels of noise.

Tables (3)

Table 1. The average values of PSNR on validation set for Net-1, Net-2, Net-3, and PDRDN.
Table 2. Average PSNRs of denoised results for intensity, DoLP and AoP on test set.
Table 3. Average SSIMs of denoised results for intensity, DoLP and AoP on test set.
