Remote sensing image cloud removal by deep image prior with a multitemporal constraint

Yuanpei Zhang; Ci Zhao; Ying Wu; Junqiang Luo

doi:10.1364/OPTCON.439671

1. Introduction

Optical remote sensing images contribute a lot to Earth observation tasks [1], such as ground classification [2,3] and ground change detection, due to the heavy spatial and spectral information they convey. However, missing information in remote sensing images, which is largely caused by clouds, severely goes against the effective utilization of images. According to [4], above half of the Earth is covered by clouds at any time. Reconstructing the missing information is of great importance for the further application of remote sensing images. Traditional cloud removal methods can be generally classified into two families: inpainting-based methods and multitemporal-based methods.

Traditional inpainting-based cloud removal methods, or the spatial-based methods mentioned in [5], fill the missing information with the remaining information. They can be further subclassified into four methods: interpolation methods [6,7], diffusion methods [8], patch-match methods [9,10], and sparse representation methods [11,12]. They attempt to estimate the missing pixels with surrounding pixels or copy the most similar patches from the remaining parts to the gaps. To be specific, traditional inpainting-based methods do not generative new information. In recent years, some generative inpainting methods [13] are proposed due to the development of Generative Adversarial Network. They [14] fill the missing areas with the generated information and can always obtain reconstruction results with semantic information. Compared with traditional inpainting methods, they can fill gaps with large sizes. However, both traditional and generative inpainting methods cannot guarantee the authority of reconstruction results.

Multitemporal-based cloud removal methods fill the missing areas with optical images from other periods as reference. According to [5], they can be further classified into two families which are respectively replacement methods [15,16] and learning methods [17–19]. Replacement methods [15,16] fill the missing gaps with patches from the reference image in a direct manner [15] or an indirect manner [16]. Learning methods [17,18] decompose the images into dictionaries and codes and recombine them to get the reconstruction results. Representative learning methods include compressed sensing [17] and sparse representation [18]. Multitemporal-based methods can fill gaps with large sizes and guarantee the spatial information accuracy of reconstruction results. However, they will not work if the gaps of reference and target images have overlapping areas.

Taking into consideration of the advantages and disadvantages of inpainting-based methods and multitemporal-based methods, we introduce Deep image prior, which is proposed in [20], into the cloud removal task and make full use of the remaining information of multitemporal optical images. First, a noise map serves as the input of a well-designed network to obtain an output. Then a loss function is formed between the output and target images with the assistance of the cloud mask. After optimizing the network, we will obtain the output as cloud removal results. The contribution of this paper is summarized as follows:

1. We propose a cloud removal method by introducing Deep Image Prior to the multitemporal cloud removal task. Compared with other methods, the method can make full use of the remaining information of multitemporal images, which will be presented in our experiments.
2. The proposed method overcome the drawbacks of both inpainting-based and existing multitemporal-based methods. Compared with inpainting methods, the proposed method can fill gaps with large sizes. Compared with multitemporal-based methods, the proposed method can deal with the situation where both multitemporal images have clouds that may even overlap with each other.
3. We conduct both simulation and real data experiments on several optical remote sensing images from different satellites. The experiment results demonstrate that the proposed method outperforms many other state-of-the-art methods.

Our paper is organized as follows: we introduce the related work which mainly contains inpainting-based methods, multitemporal-based methods, and Deep Image Prior in the second section. In the third section, we will present our method in detail. In the fourth section, simulation and real-data experiments are conducted and discussed. In the final section, we conclude our paper and discuss the potential future work.

2. Related work

2.1 Inpainting-based cloud removal methods

As is mentioned above, traditional inpainting-based cloud removal methods make use of geo-statistic rules to fill the cloud with the remaining information. They can be further classified into interpolation methods [6,7], diffusion-based methods [8], variation-based methods [11,12], and patch-matched methods [9,10]. Interpolation methods [6,7] add the surrounding pixels of missing pixels by weight to estimate the missing pixels. For example, Sampson, et al. [21] introduced Krigin interpolation algorithm to restore the PM2.5 maps corrupted by clouds. Seo, et al. [22] proposed co-Krigin interpolation algorithm to reconstruct the missing gaps of precipitation distribution maps. Diffusion-based methods remove the clouds by propagating the remaining pixels to the missing pixels with partial differential equations [8]. For example, Richard and Chang [23] introduced isotropic partial differential equations to reconstruct the missing pixels while Mendez-Rial, et al. [24] made use of anisotropic partial differential equations. Patch-match methods [9,10] attempt to find the most similar patches from the remaining areas and paste them to the cloud areas. A very popular patch-match method is PatchMatch [25]. Sparse representation methods establish the equations of cloudy images with some priors and solve the model to obtain the reconstruction results. Representative priors used in variation models are low-rankness [26]. Traditional inpainting methods deal well with clouds with small sizes but cannot work when it comes to clouds with large sizes. In recent years, some generative inpainting methods [13,27] based on generative adversarial network [28] have been proposed. Dong, et al. [13] introduced generative adversarial network to complete the sea temperature maps. Singh and Komodakis [29] made use of cycleGAN [30] to remove thin clouds in optical remote sensing images. Generative inpainting methods can fill the gaps with large sizes compared with those traditional inpainting methods but they need a large amount of data to train the network. In addition, both traditional and generative inpainting methods cannot guarantee the authority of reconstruction results because their results are based on the math/statistic rules.

2.2 Multitemporal-based cloud removal methods

In the multitemporal-based methods, optical remote sensing images from other periods serve as the reference image for the cloud removal of target images. Compared with inpainting-based methods, multitemporal-based methods can fill gaps with large sizes and guarantee the spatial information of reconstruction results. Practical multitemporal-based cloud removal methods can be mainly classified into two families. The first family is replacement methods [15,16]. For example, Zhang, et al. [15] attempted to find the most similar pixel from the time sequence of images and copied it to the missing pixels. Zhang, et al. [16] replaced the missing pixels with corresponding pixels from the reference image in an indirect manner. The second family is learning-based methods [17,18]. For example, Li, et al. [18] decomposed the multitemporal images to dictionaries, which contain the spectral information of images, and their corresponding codes, which have the spatial information of images. Then codes of the reference image and dictionary of the target image are recombined to obtain the final cloud removal results. Although multitemporal-based cloud removal methods can obtain satisfying results and guarantee the accuracy of spatial information, they cannot work if the reference images also have gaps.

2.3 Deep image prior

Deep image prior, which is proposed in [20], is an exceptional solution for many inverse problems such as super-resolution, denoising, and inpainting. Given the degradation model, deep image prior sets the noise map as input of network and constructs loss function between the output degraded images. After optimizing the network several times, we can obtain the output as the reconstruction results. Different from those deep learning methods, deep image prior does not need any training dataset in the network. However, if the inverse process is too hard, deep image prior may not obtain satisfying results. Moreover, deep image prior for inpainting can be classified to traditional inpainting methods. That means that it will not work when it comes to gaps with large sizes.

3. Method

Before we introduce our method, we give some notations of data for simplification. ${O_\textrm{1}} \in {{\mathbb R}^{H \times W \times C}}$ and ${O_\textrm{2}} \in {{\mathbb R}^{H \times W \times C}}$ denote optical remote sensing images obtained from ${t_1}$ and ${t_2}$ in the same area. H, W, and C respectively denote the height, width, and channel number of two images. ${M_1}$ and ${M_\textrm{2}}$ are their respective cloud masks where 1 means the remaining areas and 0 denotes the missing pixels. z means the map of random noise. The deep neural network we use in the method is denoted as ${G_\theta }$ where $\theta $ means the parameters of the network G.

3.1 Deep image prior

We first briefly introduce the application of deep image prior on inpainting tasks. Take ${O_1}$ and its corresponding mask ${M_1}$ as an example. With the random noise map z as the input of the network ${G_\theta }$, we first get an output $O \in {{\mathbb R}^{H \times W \times C}}$:

(1)$$O = {G_\theta }(z)$$

Then an energy function is constructed between O and ${O_\textrm{1}}$ with the cloud mask ${M_\textrm{1}}$ to find the optimal ${\theta ^\mathrm{\ast }}$ of network $G$:

(2)$$\theta \mathrm{\ast{=}\ }\mathop {\arg \min }\limits_\theta ||O - M \odot {O_1}||_2^{}$$

where ${\odot} $ denotes the operation of element product.

With the optimal parameter ${\theta ^\mathrm{\ast }}$, we can finally, get ${O^\mathrm{\ast }}$ as the cloud removal result.

(3)$${O^\mathrm{\ast }}\textrm{ = }{G_{{\theta ^\ast }}}(z)$$

However, same as the traditional inpainting methods, deep image prior cannot well fill the gaps with large sizes and guarantee the authority of cloud removal results.

3.2 Multitemporal cloud removal with deep image prior

Instead of using only one image, we introduce deep image prior to the cloud removal of multitemporal optical images ${O_1}$ and ${O_2}$. Again with the random noise map z as the input, we obtain an initial output from the network ${G_\theta }$:

(4)$$O\textrm{ = }{G_\theta }(z)$$

where the output $O \in {{\mathbb R}^{W \times H \times 2C}}$.

Then an energy function is constructed between O, ${O_1}$ and ${O_2}$ with the cloud masks ${M_1}$ and ${M_2}$ to obtain the optimal parameters ${\theta ^\mathrm{\ast }}$:

(5)$$\theta \mathrm{\ast{=}\ }\mathop {\arg \min }\limits_\theta ||O - {\Phi _3}({M_1},{M_2}) \odot {\Phi _3}({O_1},{O_2})||_2^{}$$

where ${\Phi _\textrm{3}}$ means the concatenation operation on the channel dimension.

With the optimal parameters ${\theta ^\mathrm{\ast }}$, we get the initial cloud removal results ${O^\ast }$:

(6)$${O^\mathrm{\ast }}\textrm{ = }{G_{{\theta ^\ast }}}(z)$$

Then ${O^\ast }$ is split in the third dimension to obtain $O_1^\ast $ and $O_2^\ast $ which are corresponding cloud removal results of ${O_1}$ and ${O_2}$:

(7)$$O_1^\ast ,O_2^\ast \textrm{ = }{\varphi _3}({O^\ast })$$

In this way, the final cloud removal results can fully make use of the remaining information from multitemporal images and will only depend on the overlapping cloud areas of multitemporal images We will introduce this phenomenon in the discussion part.

The termination of optimization process is determined by the loss function below:

(8)$$Loss = \frac{{sum(||M \odot (O - {O_1})||_2^{})}}{{sum(M)}}$$

When the $Loss$ is less than 0.02, the optimization process will terminate and we will obtain the optimal parameters.

3.3 Network structure

For the network structure, we follow the structure introduced in [31]. It is a U-net structure with 15 blocks. Each of the first seven blocks has a convolution operation whose stride is set as 2, a batch-normalization operation, and a ReLU operation. Each of the 8th to 14th blocks has a transpose convolution layer whose stride is set as 2, a batch-normalization operation, and a ReLU operation. Skip connection operations are conducted between the output of the n^th block and the (14-n)^th block to simultaneously make use of high-level information and low-level information. In the 15th block, there are a convolution operation and a ReLU operation. The main structure of the network is presented in Fig. 1 and the detailed parameters of the network are listed in Table 1.

Fig. 1. The framework of the proposed method

Download Full Size | PDF

Table 1. Network parameters.

View Table | View all tables in this article

4. Experiment

4.1 Settings

In the simulation experiment, we artificially draw the cloud masks. In the real data experiment, we make use of Fmask4.0 [32] to extract the cloud mask. The whole experiments are conducted on NVIDIA 1080Ti. Two state-of-the-art multitemporal-based methods are chosen as comparison methods in our experiments. They are respectively Modified Neighborhood Similar Pixel Interpolation (MNSPI) [33] and Weighted Linear Regression (WLR) [34].

To quantitatively evaluate the results of our method, we first denote the results and ground truth respectively as $RS$ and $GT$. Four commonly used evaluation indices are used to evaluate the cloud removal results. The first is peak signal to noise ratio (PSNR), the formula of which is described in Eq. (9):

(9)$$PSNR = 10 \cdot {\log _{10}}(\frac{{{{255}^2}}}{{\frac{1}{{WHC}}\sum\limits_{i = 1}^W {\sum\limits_{j = 1}^H {\sum\limits_{k = 1}^C {||R{S_{ijk}} - G{T_{ijk}}|{|^2}} } } }})$$

where W, H and C are width, height, and channel number of images. $R{S_{ijk}}$ is the pixel of the ith column and the jth row in band k on $RS$. The higher PSNR indicates the better result.

The second index we use is structure similarity index (SSIM), the formula of which is described in Eq. (10):

(10)$$SSI{M_k} = \frac{{(2\mu (R{S_k}) \cdot \mu (G{T_k}) + 0.01) \cdot (2\sigma (R{S_k},G{T_k}) + 0.03)}}{{(\mu {{(R{S_k})}^2} + \mu {{(G{T_k})}^2} + 0.01) \cdot (\sigma (R{S_k},R{S_k}) + \sigma (R{S_k},R{S_k}) + 0.03)}}$$

where $\mu $ is the mean value operation and $\sigma $ is the co-variance operation. $R{S_k}$ denotes the kth band of $RS$. $SSI{M_k}$ is the SSIM value of band k. The mean value of SSIM of all bands serves as the final SSIM. The higher SSIM indicates the better result.

The third index is correlation coefficient (CC). The formula of CC is described as Eq. (11):

(11)$$CC = \frac{{\sum\limits_{i = 1}^W {\sum\limits_{j = 1}^H {\sum\limits_{k = 1}^C {(R{S_{ijk}} - \mu (RS)) \cdot (G{T_{ijk}} - \mu (GT))} } } }}{{\sqrt {\sum\limits_{i = 1}^W {\sum\limits_{j = 1}^H {\sum\limits_{k = 1}^C {{{(R{S_{ijk}} - \mu (RS))}^2} \cdot {{(G{T_{ijk}} - \mu (GT))}^2}} } } } }}$$

A higher CC indicates a better result.

The fourth evaluation index is spectral angle mapper (SAM), whose formula is described as Eq. (12):

(12)$$SAM = {\cos ^{ - 1}}(\frac{{\mathop {R{S_{ij}}}\limits^ \to \cdot \mathop {G{T_{ij}}}\limits^ \to }}{{||\mathop {R{S_{ij}}}\limits^ \to ||\cdot ||\mathop {G{T_{ij}}}\limits^ \to ||}})$$

For SAM, a lower score indicates more accurate spectral information.

4.2 Simulation experiments

4.2.1 Landsat-8 simulation experiments

In this experiment, we crop 10 pairs of patches from ${t_1}$ and ${t_2}$ as our testing images. All images have the size of 256*256. For the sake of simplification, only the second, third and fourth bands are chosen for the experiment. The spatial resolution of images is 30m.

Figure 2 shows the simulation results of Landsat-8 images. Figure 2(a-b) displays the simulated cloudy image and the reference multitemporal image. Figure 2(c-e) are respectively results of WLR, MNSPI, and our method. Figure 2(f) presents the ground truth image. We magnify an area, which is a river, from the results for further evaluation. It can be seen from the magnified area that WLR and MNPSI just clone the spectral information from ${O_2}$ to cloud area of ${O_1}$, causing the spectral distortion of the magnified area. However, compared with two other methods, the proposed method can obtain results with the most similar spectral information in the river area with ground truth image, indicating that the proposed method does not just clone the information from ${O_2}$. Quantitative evaluation results are listed in Table 2. We mark the evaluation results with the best score in bald. The proposed method outperforms other methods by a large extent in terms of all four indices.

Fig. 2. Simulation experiment results of Landsat-8 data

Download Full Size | PDF

Table 2. Quantitative results of Landsat-8 simulation experiment results.

View Table | View all tables in this article

4.2.2 Sentinel-2 simulation experiments

10 pairs of patches, whose sizes are 256*256, are cropped respectively from Sentinel-2 images from ${t_1}$ and ${t_2}$ to test our method. Here we make use of the first, second, and third bands in the experiment for the sake of simplification. The spatial resolution of these images is 10m. It is a challenging task because the gap of two images’ acquiring time is large, causing significant ground information change.

Figure 3 presents the simulation results of Sentinel-2 images. Figure 3(a-b) are the simulated cloudy image and the reference multitemporal image. Figure 3(c-e) displays the results of WLR, MNSPI, and the proposed method. Figure 3(f) is the ground truth image. We again magnify a farmland area for further evaluation. The spectral information of the magnified area in the two periods is different from each other. MNSPI again just clones the patch from ${O_2}$ to the cloud areas. Although its result has sharp spatial information, and it cannot reflect the real-time spectral information in the result. WLR can obtain results with more accurate spectral information compared with MNSPI but the reconstruction area of its results is blurry. The proposed method can acquire results with both right spectral information and sharp spatial information compared with two other methods. The quantitative evaluation results are listed in Table 3. The proposed method outperforms the comparison methods to a large extent.

Fig. 3. Simulation experiment results of Landsat-8 data

Download Full Size | PDF

Table 3. Quantitative results of Sentinel-2 simulation experiment results.

View Table | View all tables in this article

4.3 Real-data experiment

In the real-data experiment, we assume that the reference multitemporal image has intact information. We make use of the real multitemporal Landsat-8 images from [35] as our testing data. The experiment results are displayed in Fig. 4. Figure 4(a-b) are the cloudy image and the reference multitemporal image. Figure 4(c) is the cloud mask. Figure 4(d-e) are the results of WLR and MNSPI. Figure 4(f) displays the result of our method. It can be seen from Fig. 4 that WLR and MNSPI can only obtain results with blur in the cloud removal areas. On the other hand, the proposed method not only restores the tiny texture but also gets accurate spectral information in the cloud removal results, largely outperforming the two comparison methods.

Fig. 4. Real data experiment results of Sentinel-2 data

Download Full Size | PDF

4.4 Overlapping

In this section, we test our method and comparison methods under the condition where both ${O_\textrm{1}}$ and ${O_\textrm{2}}$ have cloud areas. Moreover, the two cloud areas have overlapping areas, bringing a lot of challenges to the cloud removal work which can be viewed in Fig. 5(a-b). The cloud removal results of two comparison methods are displayed in Fig. 5(c-d) while the result of the proposed method is displayed in Fig. 5(e). The ground truth images are shown in Fig. 5(f). It can be seen from the results that the two comparison methods cannot remove clouds in the overlapping areas. On the other hand, the proposed method generates visually pleasant results which are very close to the ground truth. Quantitative evaluation of all methods on two images is listed in Table 4. The proposed method outperforms all comparison methods by a large extent in four indexes.

Fig. 5. Results of overlapping areas

Download Full Size | PDF

Table 4. Quantitative results of overlapping masks.

View Table | View all tables in this article

4.5 Time cost

In this section, we compare our method with other methods in terms of efficiency. The time cost of all methods on an image with a size of 256*256 is listed in Table 5. Although the proposed method obtains the results with the best accuracy, we have to spend much more time than the compared method to acquire the reconstruction results. Improving the efficiency of our method will be our future work.

Table 5. Time cost

View Table | View all tables in this article

4.6 Setting of hyper-parameter

In this section, we briefly introduce the setting of hyper-parameters. There is only one parameter in our method, which is the learning rate of network. Empirically speaking [20], the learning rate of deep image prior for inpainting task should be set between 0.1 and 0.01. We attempt to find the optimal learning rate within this range with an internal of 0.01. The PSNR value of different learning rates is listed in Table 6. We find that 0.04 is the optimal learning rate for reconstruction task of remote sensing images.

Table 6. Learning rate selection

View Table | View all tables in this article

5. Conclusion

In this paper, we introduce deep image prior to the cloud removal task of multitemporal optical remote sensing images. Different from those deep learning methods, the method in this paper can remove clouds by deep neural network without training datasets. Compared with inpainting-based cloud removal methods and traditional multitemporal-based cloud removal methods, the proposed method can at the same time guarantee the accuracy of spatial information and deal with overlapping cloud areas of multitemporal images. Moreover, the proposed method can make full use of the remaining information of multitemporal images and work in extreme conditions such as the remaining parts of two images do not overlap. Experiment results demonstrate that the proposed method can obtain results with higher quantitative evaluation and better visual inception compared with those traditional cloud removal methods.

Although the proposed method can obtain satisfying results, the cost is high calculation due to the long backward-propagation optimization process. In our future work, we will attempt to apply deep image prior to cloud removal tasks with multisource remote sensing images.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. J. Gao, J. Li, and M. Jiang, “Hyperspectral and Multispectral Image Fusion by Deep Neural Network in a Self-Supervised Manner,” Remote Sens. 13(16), 3226 (2021). [CrossRef]

2. B. T. Le and T and T. L. Ha, “Hyperspectral remote sensing image classification based on random average band selection and an ensemble kernel extreme learning machine,” Appl. Opt. 59(13), 4151–4157 (2020). [CrossRef]

3. D. A. LeMaster, S. Leung, and O. L. Mendoza-Schrock, “Joint object classification and turbulence strength estimation using convolutional neural networks,” Appl. Opt. 60(25), G40–G48 (2021). [CrossRef]

4. J. Ju and D. P. Roy, “The availability of cloud-free Landsat ETM+ data over the conterminous United States and globally,” Remote. Sens. Environ. 112(3), 1196–1211 (2008). [CrossRef]

5. H. Shen, X. Li, Q. Cheng, C. Zeng, G. Yang, H. Li, and L. Zhang, “Missing information reconstruction of remote sensing data: A technical review,” IEEE Geosci. Remote Sens. Mag. 3(3), 61–85 (2015). [CrossRef]

6. C. Ballester, M. Bertalmio, V. Caselles, G. Sapiro, and J. Verdera, “Filling-in by joint interpolation of vector fields and gray levels,” IEEE Trans. on Image Process. 10(8), 1200–1211 (2001). [CrossRef]

7. C. Yu, L. Chen, L. Su, M. Fan, and S. Li, “Kriging interpolation method and its application in retrieval of MODIS aerosol optical depth,” in 2011 19th International Conference on Geoinformatics, (IEEE, 2011), 1–6.

8. T. F. Chan and J. Shen, “Nontexture inpainting by curvature-driven diffusions,” J. Vis. Commun. Image Represent. 12(4), 436–449 (2001). [CrossRef]

9. A. A. Efros and T. K. Leung, “Texture synthesis by non-parametric sampling,” in Proceedings of the seventh IEEE international conference on computer vision, (IEEE, 1999), 1033–1038.

10. A. Criminisi, P. Pérez, and K. Toyama, “Region filling and object removal by exemplar-based image inpainting,” IEEE Trans. on Image Process. 13(9), 1200–1212 (2004). [CrossRef]

11. J. Shen and T. F. Chan, “Mathematical models for local nontexture inpaintings,” SIAM J. Appl. Math. 62(3), 1019–1043 (2002). [CrossRef]

12. A. Bugeau, M. Bertalmío, V. Caselles, and G. Sapiro, “A comprehensive framework for image inpainting,” IEEE Trans. on Image Process. 19(10), 2634–2645 (2010). [CrossRef]

13. J. Dong, R. Yin, X. Sun, Q. Li, Y. Yang, and X. Qin, “Inpainting of remote sensing SST images with deep convolutional generative adversarial network,” IEEE Geosci. Remote Sensing Lett. 16(2), 173–177 (2019). [CrossRef]

14. J. Gao, Q. Yuan, J. Li, H. Zhang, and X. Su, “Cloud removal with fusion of high resolution optical and SAR images using generative adversarial networks,” Remote Sens. 12(1), 191 (2020). [CrossRef]

15. J. Zhang, M. K. Clayton, and P. A. Townsend, “Functional concurrent linear regression model for spatial images,” J. Agric. Biol. Environ. Stat. 16(1), 105–130 (2011). [CrossRef]

16. J. Zhang, M. K. Clayton, and P. A. Townsend, “Missing data and regression models for spatial images,” IEEE Trans. Geosci. Remote Sensing 53(3), 1574–1582 (2015). [CrossRef]

17. L. Lorenzi, F. Melgani, and G. Mercier, “Missing-area reconstruction in multispectral images under a compressive sensing perspective,” IEEE Trans. Geosci. Remote Sensing 51(7), 3998–4008 (2013). [CrossRef]

18. X. Li, H. Shen, L. Zhang, H. Zhang, Q. Yuan, and G. Yang, “Recovering quantitative remote sensing products contaminated by thick clouds and shadows using multitemporal dictionary learning,” IEEE Trans. Geosci. Remote Sensing 52(11), 7086–7098 (2014). [CrossRef]

19. J. Gao, Y. Yi, T. Wei, and H. Zhang, “Sentinel-2 Cloud Removal Considering Ground Changes by Fusing Multitemporal SAR and Optical Images,” Remote Sens. 13(19), 3998 (2021). [CrossRef]

20. D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018), 9446–9454.

21. P. D. Sampson, M. Richards, A. A. Szpiro, S. Bergen, L. Sheppard, T. V. Larson, and J. D. Kaufman, “A regionalized national universal kriging model using Partial Least Squares regression for estimating annual PM2. 5 concentrations in epidemiology,” Atmos. Environ. 75, 383–392 (2013). [CrossRef]

22. D. J. Seo, W. F. Krajewski, and D. S. Bowles, “Stochastic interpolation of rainfall data from rain gages and radar using cokriging: 1. Design of experiments,” Water Resour. Res. 26(3), 469–477 (1990). [CrossRef]

23. M. Richard and M. Y.-S. Chang, “Fast digital image inpainting,” in Appeared in the Proceedings of the International Conference on Visualization, Imaging and Image Processing (VIIP 2001), Marbella, Spain, 2001), 106–107.

24. R. Mendez-Rial, M. Calvino-Cancela, and J. Martin-Herrero, “Anisotropic inpainting of the hypercube,” IEEE Geosci. Remote Sensing Lett. 9(2), 214–218 (2012). [CrossRef]

25. C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman, “PatchMatch: A randomized correspondence algorithm for structural image editing,” ACM Trans. Graph. 28(3), 1–11 (2009). [CrossRef]

26. K. H. Jin and J. C. Ye, “Annihilating filter-based low-rank Hankel matrix approach for image inpainting,” IEEE Trans. on Image Process. 24(11), 3498–3511 (2015). [CrossRef]

27. A. Kuznetsov and M. Gashnikov, “Remote sensing image inpainting with generative adversarial networks,” in 2020 8th International Symposium on Digital Forensics and Security (ISDFS), (IEEE, 2020), 1–6.

28. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in neural information processing systems27, (2014).

29. P. Singh and N. Komodakis, “Cloud-gan: Cloud removal for sentinel-2 imagery using a cyclic consistent generative adversarial networks,” in IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium, (IEEE, 2018), 1772–1775.

30. J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE international conference on computer vision, 2017), 2223–2232.

31. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), 234–241.

32. S. Qiu, Z. Zhu, and B. He, “Fmask 4.0: Improved cloud and cloud shadow detection in Landsats 4–8 and Sentinel-2 imagery,” Remote. Sens. Environ. 231, 111205 (2019). [CrossRef]

33. X. Zhu, F. Gao, D. Liu, and J. Chen, “A modified neighborhood similar pixel interpolator approach for removing thick clouds in Landsat images,” IEEE Geosci. Remote Sensing Lett. 9(3), 521–525 (2012). [CrossRef]

34. C. Zeng, H. Shen, and L. Zhang, “Recovering missing pixels for Landsat ETM+ SLC-off imagery using multi-temporal regression analysis and a regularization method,” Remote. Sens. Environ. 131, 182–194 (2013). [CrossRef]

35. J. Gao, Q. Yuan, J. Li, and X. Su, “Unsupervised missing information reconstruction for single remote sensing image with Deep Code Regression,” Int. J. Appl. Earth Obs. Geoinf. 105, 102599 (2021). [CrossRef]

Encoder				Decoder
	In_channel	Out_channel	Kernel		In_channel	Out_channel	Kernel
En_Layer1	1	8	3×3	De_Layer7	256	256	3×3
En_Layer2	8	32	3×3	De_Layer6	512	256	3×3
En_Layer3	32	64	3×3	De_Layer5	512	128	3×3
En_Layer4	64	128	3×3	De_Layer4	256	64	3×3
En_Layer5	128	256	3×3	De_Layer3	128	32	3×3
En_Layer6	256	256	3×3	De_Layer2	64	32	3×3
En_Layer7	256	256	3×3	De_Layer1	32	16	3×3
				Output	16	6	3×3

Mean/std	PSNR	SSIM	CC	SAM
WLR	36.672/1.916	0.981/0.011	0.992/0.008	0.272/0.024
MNSPI	34.899/2.338	0.975/0.009	0.989/0.008	0.338/0.033
ours	36.834/1.645	0.983/0.006	0.993/0.005	0.269/0.015

Mean/std	PSNR	SSIM	CC	SAM
WLR	32.886/2.472	0.916/0.012	0.959/0.009	0.397/0.025
MNSPI	31.162/2.084	0.915/0.010	0.939/0.010	0.485/0.022
ours	34.371/1.807	0.933/0.007	0.972/0.006	0.333/0.019

Mean/std	PSNR	SSIM	CC	SAM
WLR	33.877/2.477	0.989/0.013	0.983/0.007	0.048/0.034
MNSPI	34.470/1.942	0.989/0.011	0.986/0.006	0.068/0.029
ours	38.324/1.574	0.990/0.007	0.995/0.003	0.031/0.021

Learning rate	PSNR	Learning rate	PSNR
0.10	34.15	0.05	36.60
0.09	34.98	0.04	36.83
0.08	35.35	0.03	36.67
0.09	35.98	0.02	36.34
0.06	36.25	0.01	35.92

Remote sensing image cloud removal by deep image prior with a multitemporal constraint

Abstract

1. Introduction

2. Related work

2.1 Inpainting-based cloud removal methods

2.2 Multitemporal-based cloud removal methods

2.3 Deep image prior

3. Method

3.1 Deep image prior

3.2 Multitemporal cloud removal with deep image prior

3.3 Network structure

4. Experiment

4.1 Settings

4.2 Simulation experiments

4.2.1 Landsat-8 simulation experiments

4.2.2 Sentinel-2 simulation experiments

4.3 Real-data experiment

4.4 Overlapping

4.5 Time cost

4.6 Setting of hyper-parameter

5. Conclusion

Disclosures

Data availability

References

Data availability

Cited By

Figures (5)

Tables (6)

Equations (12)

Optics Continuum