Data-driven framework for high-accuracy color restoration of RGBN multispectral filter array sensors under extremely low-light conditions

Open Access

Abstract

The RGBN multispectral filter array provides a cost-effective, one-shot acquisition solution for capturing well-aligned RGB and near-infrared (NIR) images, which are useful for various optical applications. However, the signal responses of the R, G, B channels are inevitably distorted by undesirable spectral crosstalk from the NIR bands, and thus the captured RGB images are adversely desaturated. In this paper, we present a data-driven framework for effective spectral crosstalk compensation of RGBN multispectral filter array sensors. We set up a multispectral image acquisition system to capture RGB and NIR image pairs under various illuminations, which are subsequently utilized to train a multi-task convolutional neural network (CNN) architecture that performs simultaneous noise reduction and color restoration. Moreover, we present a technique for generating high-quality reference images and a task-specific joint loss function to facilitate the training of the proposed CNN model. Experimental results demonstrate the effectiveness of the proposed method: it outperforms state-of-the-art color restoration solutions and achieves more accurate color restoration for desaturated and noisy RGB images captured under extremely low-light conditions.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

It is highly desirable to build multispectral imaging systems that acquire complementary information about scenes/objects in the visible (400-700 nm) and near-infrared (700-1100 nm) spectral bands to facilitate various optical applications such as night vision [1], spectral analysis [2], precision measurement [3], photogrammetric enhancement [4], scene classification [5] and health monitoring [6]. Many existing solutions deploy two well-calibrated individual sensors and a beam splitter for simultaneous acquisition of red-green-blue (RGB) and near-infrared (NIR) images [7,8]. However, such dual-camera designs not only incur additional hardware cost but also involve complex calibration processes and decrease the overall stability of optical systems.

In recent years, multispectral filter array sensors have attracted increasing attention as a promising approach for simultaneously capturing high-quality and well-aligned RGB and NIR images in a single shot [9,10]. Fig. 1(a) shows the imaging process and spectral sensitivity of a standard Bayer color filter array (CFA) sensor [11]. Note that silicon-based sensors are sensitive to NIR bands up to 1100 nm. By removing the infrared cut-off filter (IRCF) and replacing one G filter with one N filter, multispectral information in both visible and NIR bands can be simultaneously acquired by a single multispectral filter array sensor [9]. The imaging process and spectral sensitivity of an RGBN multispectral sensor are shown in Fig. 1(b).

Fig. 1. Image acquisition process and spectral sensitivity: (a) A standard Bayer CFA sensor. (b) The RGBN multispectral sensor.

In order to achieve spectral sensitivity in the NIR band, the IRCFs are removed in RGBN multispectral filter array sensors. As a consequence, the signal intensities of the R, G, B channels are inevitably distorted by the undesirable contribution of the NIR bands. More specifically, the pixel values of the R, G, B channels become larger, and the captured raw images exhibit an increase in overall brightness and a decrease in color saturation. This phenomenon is typically referred to as spectral crosstalk [12], and it adversely desaturates colors in the captured RGB images. Therefore, color correction/restoration is an indispensable component for NIR spectral crosstalk compensation in high-quality RGBN multispectral imaging systems [13].
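Schematically, the distortion can be summarized with a simple additive model (our illustrative notation, not a formal model from this paper):

$$C_{\textrm{meas}} = C_{\textrm{vis}} + C_{\textrm{NIR}}, \qquad C \in \{R, G, B\},$$

where $C_{\textrm{vis}}$ is the desired visible-band response of channel $C$ and $C_{\textrm{NIR}}$ is the unwanted response to the NIR bands. As $C_{\textrm{NIR}}$ grows relative to $C_{\textrm{vis}}$ (e.g., under strong NIR illumination or high amplifier gain), the three measured channels converge and color saturation drops.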

Previously, a number of RGBN multispectral image color restoration methods have been presented. They can be divided into two major categories: color correction matrix (CCM)-based methods and deep learning-based ones. Chen et al. presented the first CCM-based color restoration solution for RGBN sensors [12], in which a $3\times 19$ CCM based on polynomial color correction (PCC) is calculated to restore color information by imposing quadratic and cubic polynomial constraints. Based on the assumption that the overall scene illumination remains constant, Monno et al. extended the linear color correction (LCC) [14], polynomial color correction (PCC) [15] and root-polynomial color correction (RPCC) [16] methods to RGBN multispectral image color restoration [17]. Park et al. proposed a spectral decomposition-based CCM method by dividing the NIR bands into chromatic and achromatic bands [18]. Inspired by the recent success of deep convolutional neural networks (CNN) in various machine vision tasks [19–21], Aguilera et al. presented the first deep learning-based method for RGBN multispectral color restoration, implementing a simple network structure consisting of fully connected layers to learn the mapping between distorted RGB images and color-corrected ones [22]. Soria et al. proposed two lightweight CNN models called ENDE-Net and CD-Net for daytime RGBN image color restoration [23]. Han et al. presented a data augmentation method to generate RGBN multispectral images for training a U-Net color restoration model [24]. However, the above-mentioned methods cannot handle the extreme color desaturation and severe noise disturbance that commonly occur when RGBN multispectral sensors are utilized in low-light conditions (e.g., security surveillance and night vision) [25,26].

High-accuracy color restoration of desaturated RGB images captured under extremely low-light conditions (below 1 lux) is a challenging task for two major reasons, as illustrated in Fig. 2. First, in the absence of visible light, the responses collected in the RGB channels are mostly caused by undesirable NIR spectral crosstalk. Since the spectral sensitivity curves of the RGB channels are almost identical beyond 850 nm, the pixel values of the RGB channels are almost identical and the captured color images are significantly desaturated to gray ones, as illustrated in Fig. 2(a). Second, the amplifier gains of RGBN multispectral sensors typically must be increased to enhance the weak visible signals for image acquisition in low-light scenes. However, such practice also aggravates the undesired NIR spectral crosstalk and causes severe noise disturbance in the captured images, as shown in Fig. 2(b).

Fig. 2. Two major challenges of high-accuracy color restoration under extremely low-light conditions: (a) Extreme desaturation. (b) Severe noise disturbance.

To address the above-mentioned challenges, we present a data-driven framework for effective spectral crosstalk compensation of RGBN multispectral filter array sensors and demonstrate its effectiveness for high-accuracy color restoration of extremely desaturated and noisy RGB images captured under low-light conditions (below 1 lux). Fig. 3 shows the overall pipeline of our proposed color restoration method, including RGBN multispectral image acquisition and color restoration model training. First, we set up a multispectral image acquisition system, consisting of a multispectral filter array sensor, a controllable visible light source, an NIR light source (850 nm), and an illuminometer, to capture RGB and NIR image pairs under various illuminations. Moreover, we design an end-to-end convolutional neural network (CNN) model and present a joint training scheme for simultaneous noise reduction and color restoration of RGB images captured under low-light conditions. The proposed framework takes the captured RGBN multispectral images as inputs and learns the complex mapping relationship between RGB images with and without NIR spectral crosstalk in the presence of severe color desaturation and noise disturbance. As a result, the proposed methodology can achieve high-accuracy color restoration of images captured using RGBN multispectral filter array sensors under extremely low-light conditions. The contributions of this paper are summarized as follows:

Fig. 3. The overall pipeline of our proposed color restoration method including RGBN multispectral image acquisition and color restoration model training.

(1) We present a multispectral image acquisition setup based on RGBN multispectral filter array sensors for capturing well-aligned RGB (with and without NIR spectral crosstalk) and NIR image pairs under controllable illumination conditions (between 0 lux and 200 lux). The newly built multispectral dataset could be utilized to facilitate the development of solutions for color restoration, demosaicing, denoising, and multispectral image fusion.

(2) We design a complete CNN-based framework that integrates two important modules including noise reduction and color restoration to process extremely desaturated and noisy RGB images captured under low-light conditions (below 1 lux). Moreover, we present an effective technique to generate noise-free images (RGB and NIR) and a task-specific joint loss function to facilitate the training of the proposed multi-task CNN model.

(3) We set up indoor and outdoor experiments to verify the effectiveness of the proposed methodology. Compared with the state-of-the-art calibration-based and deep learning-based color restoration solutions [12,16,17,23,24], our proposed method exhibits good generalization properties and achieves significantly improved color restoration results for images captured using RGBN multispectral filter array sensors under various illumination conditions.

2. Multispectral image acquisition system and dataset

In this section, we build a multispectral image acquisition system including a high-resolution RGBN multispectral filter array sensor and controllable visible and NIR light sources to capture well-aligned RGB (with and without NIR spectral crosstalk) and NIR image pairs under various illumination conditions.

2.1 Hardware configuration

Our image acquisition system consists of an RGBN camera built upon a multispectral filter array sensor, a controllable visible light source, an 850 nm NIR light source, and an illuminometer. The exposure time of the RGBN camera is set to 30 ms. The visible light source can be manually adjusted, by referring to the readings of the illuminometer, to simulate various illumination conditions between 0 lux and 200 lux. The NIR light source is placed 1 m away from the targets for uniform illumination. A schematic illustration of our multispectral imaging system is shown in Fig. 4.

Fig. 4. The experimental setup of our multispectral image acquisition system.

The core of this multispectral image acquisition system is a $1920\times 1080$ RGBN multispectral filter array sensor in which the IRCF is removed and half of the G filters are replaced with NIR filters. As illustrated in Fig. 5, the multispectral filter array sensor consists of $2\times 2$ patterned filters which are sensitive to the R, G, B, and NIR bands. The captured RAW images are first decomposed into sub-sampled images and then interpolated to generate well-aligned $1920\times 1080$ RGB and NIR images using the demosaicing algorithm presented in [13].
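As a minimal illustration of the decomposition step, the sketch below splits a RAW mosaic into four sub-sampled planes. It assumes a [[R, G], [B, N]] tile layout, which may differ from the actual pattern shown in Fig. 5, and it omits the interpolation stage of [13]:

```python
import numpy as np

def split_rgbn_mosaic(raw):
    """Decompose a RAW RGBN mosaic into half-resolution R, G, B, N planes.

    Assumes a 2x2 tile layout of [[R, G], [B, N]]; the actual sensor pattern
    (Fig. 5) may differ. Full-resolution channels would then be recovered by
    interpolation, e.g., with the demosaicing method of Ref. [13].
    """
    r = raw[0::2, 0::2]
    g = raw[0::2, 1::2]
    b = raw[1::2, 0::2]
    n = raw[1::2, 1::2]
    return r, g, b, n

raw = np.random.randint(0, 1024, (1080, 1920), dtype=np.uint16)  # synthetic RAW frame
r, g, b, n = split_rgbn_mosaic(raw)
print(r.shape)  # (540, 960): sub-sampled planes before interpolation
```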

Fig. 5. The filter array pattern of our RGBN sensor and the demosaicing process to obtain well-aligned $1920\times 1080$ RGB and NIR images.

2.2 Multispectral dataset

For the color restoration task, we utilize the built multispectral image acquisition system to capture well-aligned RGB (with and without NIR spectral crosstalk) and NIR image pairs under controllable illumination conditions according to the following steps: (1) adjust the controllable visible light source and set the reading of the illuminometer to a pre-defined value; (2) turn off the NIR light source and capture 10 consecutive RGB images without NIR spectral crosstalk distortion; (3) turn on the NIR light source and consecutively capture 10 distorted RGB images (with NIR spectral crosstalk) and 10 NIR ones; (4) adjust the environmental illuminance to another pre-defined value and repeat the operations in step (2) and (3).
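In pseudocode form, the four-step protocol looks as follows; `Camera`, `LightSource`, and `Illuminometer` are hypothetical stand-in interfaces for illustration only, not a real device SDK:

```python
# Hypothetical sketch of the four-step capture protocol. All device classes
# and methods here are stand-ins; the real acquisition system is described
# in Sec. 2.1 and Fig. 4.
ILLUMINANCE_LEVELS = [0.3, 0.5, 0.7, 1, 3, 5, 7, 10, 30]  # lux
N_FRAMES = 10

def capture_scene(camera, vis_light, nir_light, illuminometer):
    dataset = {}
    for lux in ILLUMINANCE_LEVELS:
        vis_light.adjust_until(illuminometer, target_lux=lux)         # step (1)
        nir_light.off()
        clean_rgb = [camera.grab_rgb() for _ in range(N_FRAMES)]      # step (2)
        nir_light.on()
        distorted_rgb = [camera.grab_rgb() for _ in range(N_FRAMES)]  # step (3)
        nir = [camera.grab_nir() for _ in range(N_FRAMES)]
        dataset[lux] = (clean_rgb, distorted_rgb, nir)                # step (4): repeat
    return dataset
```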

We captured images in 2 indoor scenes, which contain standard X-Rite ColorCheckers and objects made of different materials. In low-light conditions, we set the amplifier gain of the RGBN camera to the maximum and captured RGB images (with and without NIR spectral crosstalk) and NIR images under 9 different illuminations (0.3 lux, 0.5 lux, 0.7 lux, 1 lux, 3 lux, 5 lux, 7 lux, 10 lux, and 30 lux). Moreover, we switched the RGBN camera to the low-gain mode and captured images under a high illumination condition (200 lux) to generate high-quality RGB reference images to facilitate the training of the proposed color restoration model (more details are provided in Sec. 3.2.1). In total, we collected 200 sets of well-aligned undistorted/distorted RGB and NIR images in 2 indoor scenes under 10 illuminations between 0.3 lux and 200 lux. Moreover, we also captured 40 sets of undistorted/distorted RGB and NIR images of various outdoor scenes during nighttime for testing the robustness of our proposed color restoration method. Note that the outdoor objects (e.g., pedestrians, trees, or vehicles) could not remain completely static during the image acquisition process; therefore, distorted and undistorted RGB images captured in outdoor scenes are not perfectly aligned. Some sample images are shown in Fig. 6.

Fig. 6. Sample images of the built multispectral dataset. Note that the outdoor objects (e.g., pedestrians, trees, or vehicles) could not remain completely static during the image acquisition process; therefore, distorted and undistorted RGB images captured in outdoor scenes are not perfectly aligned.

To the best of our knowledge, this is the first RGBN multispectral image dataset captured under a wide range of illumination conditions that contains severe NIR spectral crosstalk and noise disturbance. The newly built dataset will be made publicly available to facilitate the development of solutions for color restoration, demosaicing, denoising, and multispectral image fusion.

3. Methodology

In this section, we present a multi-task CNN architecture to perform simultaneous noise reduction and color restoration of extremely desaturated and noisy RGB images captured under low-light conditions. Moreover, we present a technique for generating high-quality reference images and a task-specific joint loss function to facilitate the training of the proposed CNN model.

3.1 Multi-task network architecture

To achieve high-accuracy color restoration of desaturated RGB images captured under low illumination conditions, it is required to obtain the complex mapping relationship between RGB images with and without NIR spectral crosstalk in the presence of severe noise disturbance and color desaturation. We propose to divide this challenging task into two sub-tasks: noise reduction and color restoration. Accordingly, we present a Two-Stream RGB and NIR Deep Fusion (TS-RNDF) model which consists of two consecutive sub-networks specifically designed for the noise reduction and color restoration tasks, as illustrated in Fig. 7.

Fig. 7. The overall architecture of our proposed TS-RNDF model. Please zoom in to see network configuration details.

Noise reduction sub-network: Given a pair of RGB (with noise disturbance and NIR spectral crosstalk) and NIR images as input, two individual noise reduction sub-networks are deployed to suppress the severe noise disturbance present in RGB and NIR images captured under low-light conditions. As shown in Fig. 7, a noise reduction sub-network consists of two major processing steps: initial feature extraction and encoder-decoder feature aggregation. More specifically, we deploy three individual $3\times 3$ convolutional layers for initial feature extraction on the RGB and NIR input image pairs. The leaky rectified linear unit (Leaky ReLU) activation function is utilized to allow small negative values when the input is less than zero and to embed more nonlinearity into the network. Then, we adopt an encoder-decoder architecture to generate informative feature maps [27] for the noise reduction task. In each individual RGB or NIR stream, we stack two encoders and two decoders and add skip connections between each encoder and its corresponding mirrored decoder. Skip connections directly back-propagate gradient information between shallower and deeper layers, alleviating the gradient vanishing problem of training deep CNN models. Each encoder module consists of three consecutive residual blocks [19], which contain $1\times 1$, $3\times 3$, and $1\times 1$ convolutional layers to compute feature maps at different convolutional stages. Each decoder deploys a learnable up-sampling layer to perform a deconvolution operation, mapping the low-resolution feature maps of the encoders back to full input resolution for the subsequent pixel-wise image reconstruction task. The first sub-network thus learns the mapping functions that transfer noisy RGB and NIR input images to their denoised versions.
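A simplified PyTorch sketch of one noise-reduction stream is given below. The channel widths, activation slope, and exact layer counts are illustrative assumptions (the precise configuration is given in Fig. 7), but the structure — three $3\times 3$ layers for initial features, two strided encoders built from $1\times 1$/$3\times 3$/$1\times 1$ residual blocks, two learnable up-sampling decoders, and mirrored skip connections — follows the description above:

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Residual block with 1x1 -> 3x3 -> 1x1 convolutions, as in Ref. [19]."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch // 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch // 2, ch // 2, 3, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch // 2, ch, 1))
    def forward(self, x):
        return x + self.body(x)

class DenoiseStream(nn.Module):
    """One noise-reduction stream (RGB or NIR). Widths/depths are guesses."""
    def __init__(self, in_ch, feat=32):
        super().__init__()
        self.head = nn.Sequential(  # three 3x3 layers for initial features
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.LeakyReLU(0.2, True))
        self.enc1 = nn.Sequential(nn.Conv2d(feat, feat, 3, 2, 1),
                                  *[Bottleneck(feat) for _ in range(3)])
        self.enc2 = nn.Sequential(nn.Conv2d(feat, feat, 3, 2, 1),
                                  *[Bottleneck(feat) for _ in range(3)])
        self.dec2 = nn.ConvTranspose2d(feat, feat, 4, 2, 1)  # learnable upsampling
        self.dec1 = nn.ConvTranspose2d(feat, feat, 4, 2, 1)
        self.tail = nn.Conv2d(feat, in_ch, 3, padding=1)
    def forward(self, x):
        f0 = self.head(x)
        f1 = self.enc1(f0)
        f2 = self.enc2(f1)
        d2 = self.dec2(f2) + f1   # skip connection to mirrored encoder
        d1 = self.dec1(d2) + f0
        return self.tail(d1), d1  # denoised image and features for later fusion

rgb_stream, nir_stream = DenoiseStream(3), DenoiseStream(1)
denoised_rgb, rgb_feat = rgb_stream(torch.rand(1, 3, 256, 256))
print(denoised_rgb.shape)  # torch.Size([1, 3, 256, 256])
```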

Color restoration sub-network: After performing noise reduction on the individual RGB and NIR images, a color restoration sub-network is then deployed for deep fusion of the RGB and NIR features and high-accuracy color restoration of desaturated RGB images. As shown in Fig. 7, the color restoration sub-network consists of three major processing steps: two-stream feature fusion, encoder-decoder feature aggregation, and color-corrected RGB image reconstruction. It is important to incorporate both visible and NIR information for accurate spectral crosstalk compensation under different illumination conditions [24]. Accordingly, the computed feature maps of the RGB and NIR noise reduction sub-networks are first concatenated and fed to a $1\times 1$ convolutional layer for multispectral feature fusion. Then, we deploy another encoder-decoder architecture, consisting of three encoders and three decoders, to learn the non-linear mapping relationship between distorted and undistorted RGB images under various illumination conditions. Finally, the fused multispectral features are fed into $1\times 1$ and $3\times 3$ convolutional layers to generate the color-corrected RGB images, in which the crosstalk distortions of the NIR bands are well compensated.
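The fusion front end and reconstruction back end can likewise be sketched as follows; the three-level encoder-decoder body is abstracted to a placeholder here, and the feature width is again an assumption:

```python
import torch
import torch.nn as nn

class FusionColorRestoration(nn.Module):
    """Sketch of the color restoration sub-network: concatenate the RGB and
    NIR stream features, fuse them with a 1x1 convolution, run an
    encoder-decoder (abstracted to a placeholder), then reconstruct the
    color-corrected RGB image with 1x1 and 3x3 layers."""
    def __init__(self, feat=32):
        super().__init__()
        self.fuse = nn.Conv2d(2 * feat, feat, 1)   # multispectral feature fusion
        self.body = nn.Identity()                  # placeholder: 3-level encoder-decoder
        self.recon = nn.Sequential(nn.Conv2d(feat, feat, 1),
                                   nn.LeakyReLU(0.2, True),
                                   nn.Conv2d(feat, 3, 3, padding=1))
    def forward(self, rgb_feat, nir_feat):
        x = self.fuse(torch.cat([rgb_feat, nir_feat], dim=1))
        return self.recon(self.body(x))

cr = FusionColorRestoration()
out = cr(torch.rand(1, 32, 256, 256), torch.rand(1, 32, 256, 256))
print(out.shape)  # torch.Size([1, 3, 256, 256]): color-corrected RGB
```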

3.2 Network training strategies

3.2.1 Simulation of high-quality training labels

It is important to generate high-quality reference RGB images to facilitate the training of the proposed TS-RNDF model. We observe that RGB images captured under low illumination conditions without NIR crosstalk distortion (i.e., with the NIR light source turned off) contain a considerable amount of noise, which adversely affects the training of the color restoration model. We propose an effective technique to simulate high-quality reference RGB images for a low illumination condition by referring to images captured using a low-gain RGBN camera under a high illumination condition (200 lux), as shown in Fig. 8. Given two undistorted RGB images $I_{\textrm {H}}$ and $I_{\textrm {L}}$ captured under high and low illumination conditions, we select all pixels within the white squares and calculate the averaged values $V_{\textrm {H}}$ and $V_{\textrm {L}}$, respectively. Then, we calculate the ratio $k$ between $V_{\textrm {H}}$ and $V_{\textrm {L}}$ and use it as a linear mapping coefficient to simulate a high-quality reference RGB image for a low-light scene $\hat {I}_{\textrm {L}}$ by adjusting the brightness of the low-gain image captured under 200 lux illumination as $\hat {I}_{\textrm {L}} = \frac {I_{\textrm {H}}}{k}$. It is visually observed that the simulated and real-captured low-light RGB images have consistent color characteristics while the undesired noise is effectively suppressed. We evaluate the effectiveness of the proposed technique for simulating high-quality training labels in Sec. 4.5.
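The mapping itself is a single scalar rescaling; a minimal NumPy sketch, assuming float images and that the white-patch means $V_{\textrm{H}}$ and $V_{\textrm{L}}$ have already been measured, is:

```python
import numpy as np

def simulate_lowlight_reference(I_H, V_H, V_L):
    """Simulate a noise-free low-light reference from a low-gain 200 lux capture.

    I_H : undistorted RGB image captured at 200 lux (low gain), float array.
    V_H, V_L : mean values of the white reference patch in the 200 lux image
    and in the real low-light image, respectively (cf. Fig. 8).
    """
    k = V_H / V_L        # linear brightness mapping coefficient
    return I_H / k       # \hat{I}_L: same color characteristics, far less noise

I_H = np.random.rand(1080, 1920, 3).astype(np.float32)          # stand-in image
ref = simulate_lowlight_reference(I_H, V_H=0.82, V_L=0.041)     # illustrative patch means
```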

Fig. 8. The illustration of simulating reference RGB images (without NIR spectral crosstalk) for training the proposed TS-RNDF model.

For training the noise reduction sub-networks in our TS-RNDF model, we average 10 consecutively captured RGB and NIR images to generate the denoised references. As shown in Fig. 9, undesired noise in homogeneous image regions is suppressed while image details such as textures and edges are well preserved via inter-frame temporal averaging.
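A minimal sketch of this inter-frame temporal averaging, assuming the burst is stored as a list of float arrays:

```python
import numpy as np

def temporal_average(frames):
    """Average N consecutively captured frames of a static scene.

    Zero-mean noise is attenuated roughly by a factor of sqrt(N) while static
    textures and edges are preserved; the paper uses N = 10 frames.
    """
    stack = np.stack(frames).astype(np.float64)  # shape (N, H, W, C)
    return stack.mean(axis=0)

frames = [np.random.rand(256, 256, 3) for _ in range(10)]  # synthetic burst
denoised_label = temporal_average(frames)
```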

Fig. 9. The illustration of generating denoised RGB (with NIR spectral crosstalk) and NIR images for training the noise reduction sub-networks via inter-frame temporal averaging: (a) Label image generation process for the RGB noise reduction sub-network. (b) Label image generation process for the NIR noise reduction sub-network.

3.2.2 Task-specific joint loss function

We design a multi-term loss function $\mathcal {L}$ to drive the weight learning for training the proposed TS-RNDF model as

$$\mathcal{L} = \alpha \cdot \mathcal{L}_\textrm{nr}^\textrm{RGB} + \beta \cdot \mathcal{L}_\textrm{nr}^\textrm{NIR} + \gamma \cdot \mathcal{L}_\textrm{cr},$$
where $\mathcal {L}_{\textrm {nr}}$ and $\mathcal {L}_{\textrm {cr}}$ denote the loss terms for training the noise reduction and color restoration sub-networks, respectively. $\alpha$, $\beta$, and $\gamma$ are weight coefficients which are all set to 1 in our experiments.

It has been found that the choice of loss function is task-dependent and significantly affects the overall performance [28]. For instance, the $L1$ loss function is more sensitive to variances of color and illuminance in homogeneous image regions while disregarding local image structures/textures. In comparison, the SSIM loss function can better preserve contrast in high-frequency regions but often causes shifts in brightness or color. We propose to combine the advantages of the $L1$ and SSIM loss functions and adjust their weighting parameters for the specific tasks. For the noise reduction task, the loss terms $\mathcal {L}_{\textrm {nr}}^{\textrm {RGB}}$ and $\mathcal {L}_{\textrm {nr}}^{\textrm {NIR}}$ are defined as

$$\mathcal{L}_{\textrm{nr}}^{\textrm{RGB}} = \lambda_{1} \cdot (1-\mathcal{L}_{\textrm{SSIM}}^{\textrm{RGB}}) + (1-\lambda_{1}) \cdot \mathcal{L}_{L1}^{\textrm{RGB}},$$
$$\mathcal{L}_{\textrm{nr}}^{\textrm{NIR}} = \lambda_{1} \cdot (1-\mathcal{L}_{\textrm{SSIM}}^{\textrm{NIR}}) + (1-\lambda_{1}) \cdot \mathcal{L}_{L1}^{\textrm{NIR}}.$$

In comparison, the loss term $\mathcal {L}_{\textrm {cr}}$ for the color restoration task is defined as

$$\mathcal{L}_{\textrm{cr}} = (1-\lambda_{2}) \cdot (1-\mathcal{L}_{\textrm{SSIM}}) + \lambda_{2} \cdot \mathcal{L}_{L1},$$
where the parameters $\lambda _{1}$ and $\lambda _{2}$ adjust the weights of the $L1$ and SSIM loss functions for the different tasks. We set $\lambda _{1}=\lambda _{2}=0.9$ in our experiments so that the SSIM loss plays a more important role in the noise reduction task while the $L1$ loss dominates the color restoration task. Comparative results of different loss functions, including $L1$, SSIM, and our proposed task-specific loss functions, are provided in Sec. 4.5.
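As a concrete illustration, Eqs. (1)–(4) can be written in a few lines of PyTorch. The sketch below assumes the third-party pytorch_msssim package for the SSIM term and images normalized to [0, 1]; it is a minimal rendering of the loss, not the authors' released code:

```python
import torch
from pytorch_msssim import ssim  # third-party package, assumed available

def nr_loss(pred, target, lam1=0.9):
    """Noise reduction loss, Eqs. (2)-(3): SSIM-dominated with lambda_1 = 0.9."""
    l1 = torch.mean(torch.abs(pred - target))
    return lam1 * (1.0 - ssim(pred, target, data_range=1.0)) + (1.0 - lam1) * l1

def cr_loss(pred, target, lam2=0.9):
    """Color restoration loss, Eq. (4): L1-dominated with lambda_2 = 0.9."""
    l1 = torch.mean(torch.abs(pred - target))
    return (1.0 - lam2) * (1.0 - ssim(pred, target, data_range=1.0)) + lam2 * l1

def joint_loss(rgb_dn, rgb_ref, nir_dn, nir_ref, rgb_cc, rgb_gt,
               alpha=1.0, beta=1.0, gamma=1.0):
    """Task-specific joint loss of Eq. (1); alpha = beta = gamma = 1."""
    return (alpha * nr_loss(rgb_dn, rgb_ref)
            + beta * nr_loss(nir_dn, nir_ref)
            + gamma * cr_loss(rgb_cc, rgb_gt))

# Example with random stand-in tensors of shape (N, C, H, W):
loss = joint_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64),
                  torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64),
                  torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```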

4. Experiments

4.1 Training and testing datasets

We use 70 images captured under 7 different illuminations (0.5 lux, 1 lux, 3 lux, 5 lux, 7 lux, 10 lux, and 30 lux) of indoor scene 1 as the training dataset. Instead of training the CNN model on full-size $1920\times 1080$ images, we uniformly crop each image into a number of $256\times 256$ patches to increase the number of training samples and improve the computational efficiency of the training phase [26]. Images of indoor scene 2 captured under 9 different illuminations (0.3 lux, 0.5 lux, 0.7 lux, 1 lux, 3 lux, 5 lux, 7 lux, 10 lux, and 30 lux) are used as the testing images. As shown in Fig. 10, the RGB images captured in the two indoor scenes contain different targets/objects and crosstalk distortion. Moreover, the testing dataset contains 0.3 lux and 0.7 lux low-light images which are not included in the training dataset. The robustness of the proposed TS-RNDF model is further evaluated using desaturated RGB images of various outdoor scenes captured during nighttime.
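One plausible implementation of the uniform cropping step is sketched below; the paper does not specify how the non-divisible image border is handled, so dropping the remainder is our assumption:

```python
import numpy as np

def crop_patches(img, patch=256):
    """Uniformly crop a full-size frame into non-overlapping patches.

    1920x1080 is not an exact multiple of 256, so the leftover border is
    simply dropped here; other border policies are equally possible.
    """
    h, w = img.shape[:2]
    return [img[y:y + patch, x:x + patch]
            for y in range(0, h - patch + 1, patch)
            for x in range(0, w - patch + 1, patch)]

patches = crop_patches(np.zeros((1080, 1920, 3), dtype=np.float32))
print(len(patches))  # 28 patches (4 rows x 7 columns) per frame
```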

Fig. 10. Sample images of (a) The training dataset. (b) The testing dataset.

4.2 Implementation details

We implement the TS-RNDF model based on the PyTorch framework and train it on an NVIDIA RTX 2080Ti GPU with CUDA 10.1 and cuDNN 7.6.3 for 500 epochs. The Adam optimizer is utilized to optimize the weights with $\alpha =1\times 10^{-5}$, $\beta _{1}=0.9$, and $\beta _{2}=0.999$. We apply standard image flipping and rotation for data augmentation. The batch size is set to 8 and the learning rate is fixed at $1\times 10^{-3}$. The source code of the TS-RNDF model will be made publicly available in the future.
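A minimal PyTorch sketch of the reported optimizer and augmentation settings follows; the `model` is a stand-in for the TS-RNDF network, and the flip/rotate policy is one plausible reading of "standard image flipping and rotating":

```python
import random
import torch
import torchvision.transforms.functional as TF

model = torch.nn.Conv2d(3, 3, 3, padding=1)  # stand-in for the TS-RNDF model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,  # reported learning rate
                             betas=(0.9, 0.999))           # reported beta_1, beta_2

def augment(x):
    """Random flip/rotate augmentation applied to a (C, H, W) tensor."""
    if random.random() < 0.5:
        x = TF.hflip(x)
    if random.random() < 0.5:
        x = TF.vflip(x)
    return torch.rot90(x, random.choice([0, 1, 2, 3]), dims=(-2, -1))
```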

4.3 Evaluation metrics

We adopt the peak signal-to-noise ratio (PSNR) to evaluate the pixel-level difference between the predicted color-corrected results and the undistorted RGB images captured with the NIR light source turned off (ground truth, GT). We also utilize the CIEDE 2000 color difference formula $\Delta E_{00}$ to evaluate the performance of the various color restoration methods. $\Delta E_{00}$ takes into account the lightness, chroma, and hue differences in the CIELAB color space and is more consistent with human perception [29]. Note that higher PSNR and lower $\Delta E_{00}$ values indicate better color restoration results.
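For reference, both metrics are available in scikit-image; the following sketch is one plausible evaluation routine (not necessarily the exact protocol behind Tables 1 and 2), computing PSNR and the mean per-pixel $\Delta E_{00}$:

```python
import numpy as np
from skimage.color import rgb2lab, deltaE_ciede2000
from skimage.metrics import peak_signal_noise_ratio

def evaluate(pred, gt):
    """PSNR (dB, higher is better) and mean CIEDE2000 color difference
    (lower is better) between a restored image and its ground truth.
    Both inputs are float RGB images in [0, 1]."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    de00 = deltaE_ciede2000(rgb2lab(gt), rgb2lab(pred)).mean()
    return psnr, de00

gt = np.random.rand(64, 64, 3)                                   # stand-in GT
pred = np.clip(gt + 0.01 * np.random.randn(64, 64, 3), 0, 1)     # stand-in result
print(evaluate(pred, gt))
```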

4.4 Comparisons with state-of-the-art

We compare our proposed TS-RNDF model with a number of existing color restoration solutions qualitatively and quantitatively. We first consider three CCM-based methods, $3\times 19$ CCM [12], RPCC [16], and 2-order PCC [17], whose matrices are calculated by least-squares regression to minimize the mean colorimetric error between crosstalk-distorted and undistorted RGB images. We make use of the color squares in a standard X-Rite ColorChecker to calibrate the CCMs under different illuminations. We also consider two deep learning-based models (CD-Net [23] and U-Net [24]) which are specially designed for color restoration of images captured using RGBN multispectral filter array sensors. The source codes of the CD-Net and U-Net models are publicly available, and both models were re-trained using our own captured RGBN images.

4.4.1 Indoor scene

We first evaluate the performance of different color restoration methods on RGB images of indoor scene 2 captured under 9 different illuminations (0.3 lux, 0.5 lux, 0.7 lux, 1 lux, 3 lux, 5 lux, 7 lux, 10 lux, and 30 lux). Note that the CCM-based methods are only applicable to images captured under 0.5 lux, 1 lux, 3 lux, 5 lux, 7 lux, 10 lux, and 30 lux illuminations since calibration images are not provided for 0.3 lux and 0.7 lux in the training dataset. Table 1 summarizes the quantitative results and Figs. 11–14 show some qualitative comparisons.

Table 1. Quantitative color restoration results of RGB images captured under various illumination conditions. Red and blue indicate the best and the second-best performance, respectively. Note $3\times 19$ CCM, RPCC, 2-Order PCC methods are not applicable for 0.3 and 0.7 lux illuminations since the corresponding calibration images are not provided in the training dataset.

Fig. 11. Color restoration results of 5 lux RGB images using $3\times 19$ CCM, RPCC, 2-Order PCC, CD-Net, U-Net, and our proposed TS-RNDF model.

It is observed that both CCM-based and deep learning-based methods can generally achieve satisfactory color restoration results for RGB images captured under decent lighting conditions (e.g., 5 lux, typical of street-lamp illumination during nighttime). Compared with the other alternatives, our proposed TS-RNDF model achieves more accurate color restoration results and effectively suppresses undesired artifacts and noise in homogeneous image regions, as shown in Fig. 11.

As illustrated in Figs. 12, 13, and 14, the performance of the CCM-based methods drops significantly when processing desaturated and noisy RGB images captured under low-light conditions (below 1 lux). CD-Net [23] and U-Net [24] can better restore color and enhance contrast but introduce substantial artifacts. In comparison, our proposed multi-task TS-RNDF model performs simultaneous noise reduction and color restoration and thus achieves high-accuracy color restoration in the presence of severe color desaturation and noise disturbance.

Fig. 12. Color restoration results of 1 lux RGB images using $3\times 19$ CCM, RPCC, 2-Order PCC, CD-Net, U-Net, and our proposed TS-RNDF model.

Fig. 13. Color restoration results of 0.7 lux RGB images using CD-Net, U-Net, and our proposed TS-RNDF model. Note $3\times 19$ CCM, RPCC, 2-Order PCC methods are not applicable since 0.7 lux calibration images are not provided in the training dataset.

Fig. 14. Color restoration results of 0.3 lux RGB images using CD-Net, U-Net, and our proposed TS-RNDF model. Note $3\times 19$ CCM, RPCC, 2-Order PCC methods are not applicable since 0.3 lux calibration images are not provided in the training dataset.

Another advantage of our proposed method is that it can first be trained on RGB images captured under a number of illuminations (0.5 lux, 1 lux, 3 lux, 5 lux, 7 lux, 10 lux, and 30 lux) and then extended to unseen illuminations. Therefore, it can generate high-quality color restoration results for the 0.3 lux and 0.7 lux images which are not included in the training dataset. In comparison, CCM-based methods require capturing calibration images under each illumination to calculate the optimal CCMs.

4.4.2 Outdoor scenes

The proposed TS-RNDF model is further evaluated using nighttime RGB images with spectral crosstalk captured in various outdoor scenes. Note that these images are captured under moonlight illumination with the NIR light source turned on. Some color restoration results are shown in Fig. 15. Due to the absence of visible light during nighttime, the responses of the RGB channels are mostly caused by undesirable NIR spectral crosstalk and the captured color images are significantly desaturated to gray ones. Moreover, the real-captured low-light RGB images contain undesired noise which negatively affects the color restoration results. Our proposed TS-RNDF model incorporates RGB and NIR images as inputs and performs simultaneous noise reduction and color restoration of extremely desaturated and noisy RGB images captured under low-light conditions. As a result, the TS-RNDF model can accurately restore high-fidelity color information of clothes (e.g., black pants and an overcoat) and a ColorChecker in these nighttime RGB images in the presence of severe color desaturation and noise disturbance, as illustrated in Fig. 15.

Fig. 15. Color restoration results of our proposed TS-RNDF model for nighttime RGB images captured in various outdoor scenes. Note that the outdoor objects (e.g., pedestrians, trees, or vehicles) could not remain completely static during the image acquisition process; therefore, distorted and undistorted RGB images captured in outdoor scenes are not perfectly aligned.

4.5 Performance analysis

In this section, we set up experiments to evaluate the effectiveness of the proposed training strategy including (1) simulation of high-quality training labels and (2) task-specific joint loss function. For a fair comparison, the experiments are conducted using the same CNN architecture as shown in Fig. 7. Table 2 illustrates the experimental results in terms of PSNR and $\Delta E_{00}$.

Table 2. Quantitative evaluation comparative results for 3 selected illumination conditions (5 lux, 1 lux, and 0.7 lux) using different training labels and loss functions. Red indicates the best performance.

We first evaluate the impact of utilizing the simulated high-quality reference RGB images for training the TS-RNDF model. As shown in Table 2 (row 2), using the real-captured undistorted RGB images as the training labels seriously undermines the accuracy of the color restoration results. For instance, the PSNR significantly drops from 29.0209 dB to 20.4441 dB while the $\Delta E_{00}$ value increases from 4.0951 to 9.1656 under the 0.7 lux illumination condition. It is visually observed that the real-captured low-light RGB images contain a considerable amount of noise which negatively affects the performance of color restoration models.

In Table 2 (rows 3–5), we compare the proposed task-specific loss function ($\lambda _{1}=\lambda _{2}=0.9$) with a number of alternatives including the $L1$ loss function ($\lambda _{1}=0$, $\lambda _{2}=1$), the SSIM loss function ($\lambda _{1}=1$, $\lambda _{2}=0$), and the $L1+\textrm {SSIM}$ loss function ($\lambda _{1}=\lambda _{2}=0.5$). Our experiments illustrate that the choice of loss function significantly affects the quality of the color restoration results, which is consistent with previous studies [28]. The $L1$ loss function is more sensitive to color variances while the SSIM loss function is more suitable for the noise reduction task. Therefore, improved performance (higher PSNR and lower $\Delta E_{00}$ values) is achieved by equally combining the $L1$ and SSIM loss functions ($\lambda _{1}=\lambda _{2}=0.5$). It is worth mentioning that our proposed task-specific loss function, which sets $\lambda _{1}=\lambda _{2}=0.9$ so that the SSIM loss plays a more important role in the noise reduction task while the $L1$ loss dominates the color restoration task, further improves the performance.

We also set up experiments that add the denoising module to another deep learning-based color correction method [24]; the comparative results are shown in Fig. 16. This modification generally achieves better color restoration results, confirming that it is important to integrate noise reduction functionality into the overall color correction model; otherwise, severe noise disturbance adversely affects the training of the color restoration model. Our proposed end-to-end TS-RNDF model, which integrates two important and highly related sub-tasks (noise reduction and color restoration) and is trained using the task-specific joint loss function (with $\lambda _{1}=\lambda _{2}=0.9$), achieves significantly improved color restoration results for images captured using RGBN multispectral filter array sensors under various illumination conditions.

Fig. 16. Color restoration results of RGB images under 3 selected illumination conditions (5 lux, 1 lux, and 0.7 lux) using U-Net without Denoising, U-Net with Denoising, and our proposed TS-RNDF model. "w/o" means without and "w/i" means with. Red and blue indicate the best and the second-best performance, respectively.

5. Conclusions

Spectral crosstalk between the RGB and NIR bands adversely desaturates color images captured using RGBN multispectral filter array sensors. Therefore, color correction/restoration is an indispensable component for building high-quality RGBN multispectral imaging systems. In this paper, we present a data-driven framework for effective spectral crosstalk compensation of RGBN multispectral filter array sensors and demonstrate its effectiveness for high-accuracy color restoration of extremely desaturated and noisy RGB images captured under low-light conditions (below 1 lux). We build a multispectral image acquisition system, including a high-resolution RGBN multispectral filter array sensor and controllable visible and NIR light sources, to capture well-aligned RGB (with and without NIR spectral crosstalk) and NIR image pairs under various illumination conditions. To achieve high-accuracy color restoration of desaturated RGB images captured under low illumination conditions, we propose to divide this challenging task into two sub-tasks and design the TS-RNDF model to perform simultaneous noise reduction and color restoration. Moreover, we present an effective technique to generate noise-free RGB images and a task-specific joint loss function to facilitate the training of the proposed multi-task CNN model. Compared with state-of-the-art calibration-based and deep learning-based color restoration solutions, our proposed method exhibits good generalization properties and achieves significantly improved color restoration results for indoor and outdoor images captured using RGBN multispectral filter array sensors under various illumination conditions. The proposed method could potentially be utilized to build high-quality RGBN multispectral image acquisition systems that work well in low-light conditions for security surveillance and night vision applications.

Funding

National Natural Science Foundation of China (52075485).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. D. Hertel, H. Maréchal, D. A. Tefera, W. Fan, and R. Hicks, “A low-cost vis-nir true color night vision video system based on a wide dynamic range cmos imager,” in 2009 IEEE Intelligent Vehicles Symposium, (IEEE, 2009), pp. 273–278.

2. X. Hou, M. Zhang, G. Li, H. Tian, S. Yang, X. Feng, L. Lin, and Z. Fu, “Accuracy improvement of quantitative analysis in vis-nir spectroscopy using the gkf-wtef algorithm,” Appl. Opt. 58(28), 7836–7843 (2019). [CrossRef]  

3. C. Shen, A. C. S. Chan, J. Chung, D. E. Williams, A. Hajimiri, and C. Yang, “Computational aberration correction of vis-nir multispectral imaging microscopy based on fourier ptychography,” Opt. Express 27(18), 24923–24937 (2019). [CrossRef]  

4. M. Á. Martínez, E. M. Valero, J. L. Nieves, R. Blanc, E. Manzano, and J. L. Vílchez, “Multifocus hdr vis/nir hyperspectral imaging and its application to works of art,” Opt. Express 27(8), 11323–11338 (2019). [CrossRef]  

5. M. Brown and S. Süsstrunk, “Multi-spectral sift for scene category recognition,” in CVPR 2011, (IEEE, 2011), pp. 177–184.

6. G. Oh, H. J. Cho, S. Suh, D. Lee, and K. Kim, “Multicolor fluorescence imaging using a single rgb-ir cmos sensor for cancer detection with smurfp-labeled probiotics,” Biomed. Opt. Express 11(6), 2951–2963 (2020). [CrossRef]  

7. M. Kise, B. Park, G. W. Heitschmidt, K. C. Lawrence, and W. R. Windham, “Multispectral imaging system with interchangeable filter design,” Comput. Electron. Agric. 72(2), 61–68 (2010). [CrossRef]  

8. S. Hwang, J. Park, N. Kim, Y. Choi, and I. So Kweon, “Multispectral pedestrian detection: Benchmark dataset and baseline,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2015), pp. 1037–1045.

9. S. Koyama, Y. Inaba, M. Kasano, and T. Murata, “A day and night vision mos imager with robust photonic-crystal-based rgb-and-ir,” IEEE Trans. Electron Devices 55(3), 754–759 (2008). [CrossRef]  

10. L. Frey, L. Masarotto, L. El Melhaoui, S. Verrun, S. Minoret, G. Rodriguez, A. André, F. Ritton, and P. Parrein, “High-performance silver-dielectric interference filters for rgbir imaging,” Opt. Lett. 43(6), 1355–1358 (2018). [CrossRef]  

11. B. E. Bayer, “Color imaging array,” U.S. Patent 3,971,065 (1976).

12. Z. Chen, X. Wang, and R. Liang, “Rgb-nir multispectral camera,” Opt. Express 22(5), 4985–4994 (2014). [CrossRef]  

13. Y. Monno, H. Teranaka, K. Yoshizaki, M. Tanaka, and M. Okutomi, “Single-sensor rgb-nir imaging: High-quality system design and prototype implementation,” IEEE Sensors J. 19(2), 497–507 (2019). [CrossRef]  

14. H. R. Kang, Computational Color Technology, SPIE Press Monograph Vol. PM159 (SPIE Press, 2006).

15. G. Hong, M. R. Luo, and P. A. Rhodes, “A study of digital camera colorimetric characterization based on polynomial modeling,” Color Res. Appl. 26(1), 76–84 (2001).

16. G. D. Finlayson, M. Mackiewicz, and A. Hurlbert, “Color correction using root-polynomial regression,” IEEE Trans. on Image Process. 24(5), 1460–1470 (2015). [CrossRef]  

17. Y. Monno, M. Tanaka, and M. Okutomi, “N-to-srgb mapping for single-sensor multispectral imaging,” in Proceedings of the IEEE International Conference on Computer Vision Workshops, (2015), pp. 33–40.

18. C. H. Park, H. M. Oh, and M. G. Kang, “Color restoration for infrared cutoff filter removed rgbn multispectral filter array image sensor,” in VISAPP (1), (2015), pp. 30–37.

19. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2016), pp. 770–778.

20. R. Girshick, “Fast r-cnn,” in Proceedings of the IEEE international conference on computer vision, (2015), pp. 1440–1448.

21. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), pp. 234–241.

22. C. Aguilera, X. Soria, A. D. Sappa, and R. Toledo, “Rgbn multispectral images: A novel color restoration approach,” in International Conference on Practical Applications of Agents and Multi-Agent Systems, (Springer, 2017), pp. 155–163.

23. X. Soria, A. D. Sappa, and R. I. Hammoud, “Wide-band color imagery restoration for rgb-nir single sensor images,” Sensors 18(7), 2059 (2018). [CrossRef]  

24. Z. Han, L. Li, W. Jin, X. Wang, G. Jiao, and H. Wang, “Convolutional neural network training for rgbn camera color restoration using generated image pairs,” IEEE Photonics J. 12(5), 1–15 (2020). [CrossRef]  

25. R. Yamakabe, Y. Monno, M. Tanaka, and M. Okutomi, “Tunable color correction for noisy images,” J. Electron. Imag. 29(3), 033012 (2020). [CrossRef]  

26. F. Lv, Y. Zheng, Y. Li, and F. Lu, “An integrated enhancement solution for 24-hour colorful imaging,” in Proceedings of the AAAI conference on artificial intelligence, vol. 34 (2020), pp. 11725–11732.

27. X.-J. Mao, C. Shen, and Y.-B. Yang, “Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections,” in NIPS, (2016).

28. H. Zhao, O. Gallo, I. Frosio, and J. Kautz, “Loss functions for image restoration with neural networks,” IEEE Trans. on Comput. Imaging 3(1), 47–57 (2017). [CrossRef]  

29. G. Sharma, W. Wu, and E. N. Dalal, “The ciede2000 color-difference formula: Implementation notes, supplementary test data, and mathematical observations,” Color Res. Appl. 30(1), 21–30 (2005).
