
pHSCNN: CNN-based hyperspectral recovery from a pair of RGB images

Open Access

Abstract

To increase the fidelity of hyperspectral recovery from RGB images, we propose a pairwise-image-based hyperspectral convolutional neural network (pHSCNN) that recovers hyperspectral images from a pair of RGB images, obtained by the same color sensor with and without an optical filter in front of the imaging lens. The proposed method avoids the pitfall of requiring multiple color sensors to obtain different RGB images and achieves higher accuracy than recovery from a single RGB image. In addition, pHSCNN can optimize the optical filter to further improve performance. To experiment on real data, we built a dual-camera hyperspectral imaging system and created a real-captured hyperspectral-RGB dataset. Experimental results demonstrate the superiority of pHSCNN, which achieves the highest accuracy of the recovered hyperspectral signature both perceptually and numerically.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Hyperspectral imaging (HSI) obtains two-dimensional images across a multitude of wavelengths and has been widely applied in biomedical imaging [1] and computer vision tasks [2]. However, conventional HSI approaches compromise between spatial, spectral, and temporal resolutions. Scanning HSI systems can obtain full-resolution hyperspectral images by spatially scanning or spectrally switching bandpass filters, but the scanning mechanism is time-consuming and precludes dynamic measurements. Snapshot HSI can realize dynamic measurement by utilizing a prism for dispersion or a mosaic hyperspectral sensor, but these methods suffer from a heavy tradeoff between spectral and spatial resolutions [3]. With increasing studies on spectral sparsity and correlation, learning-based hyperspectral recovery from RGB images has drawn much attention, as it enables dynamic measurement without sacrificing spatial or spectral resolution. RGB images can be regarded as linear measurements of the original hyperspectral signature, possibly with added noise. Although retrieving the hypercube from a few RGB images is a severely ill-posed problem, robust recovery is possible because the hypercube is sparse both spatially and spectrally [4].

In recent years, many methods have been proposed to solve this inverse problem. In [4], sparse coding is applied to build a dictionary basis in which the hyperspectral prior is most representative. A test RGB image can be rewritten as a weighted superposition of the RGB projections of the learned hyperspectral dictionary, and the unknown hypercube is then retrieved as a superposition of the hyperspectral dictionary using the same weights. Manifold learning applies a nonlinear dimensionality reduction technique to the hyperspectral prior and learns a direct mapping between the RGB space and the embedded hyperspectral space [5]. Z. Xiong et al. [6] introduced convolutional neural networks (CNNs) into hyperspectral recovery and proposed a hyperspectral recovery network (HSCNN) that recovers the hypercube from a spectrally up-sampled RGB cube. Based on HSCNN, Z. Shi et al. [7] proposed HSCNN+ with added residual blocks and dense blocks; these very deep networks achieve higher accuracy. To increase computational efficiency and shorten the calculation time, J. Zhang et al. [8] proposed a network containing only one dense block with fewer convolution layers, which offers significant savings in computation power while exhibiting outstanding performance. Besides regression models, the selection of the camera spectral sensitivity (CSS) has also attracted growing attention in hyperspectral recovery: since the CSS decides how the hyperspectral signature is projected onto the RGB measurements, its optimization directly affects the recovery fidelity. B. Arad et al. [9] implemented an exhaustive search and showed that an optimal CSS can improve the performance by 30%. In [10], a learning-based CSS-selection hyperspectral recovery network is proposed to realize CSS selection and hyperspectral recovery at the same time. S. Nie et al. [11] designed a network that learns the optimal sensitivity profile instead of choosing from available CSS candidates. Besides selecting an optimal color filter array, illumination optimization has also been brought into consideration to achieve higher accuracy [12].

Most existing methods only consider recovering the hyperspectral signal from a single RGB image. Intuitively, adding more RGB images is an effective way to increase the recovery accuracy. However, using more RGB sensors greatly increases the cost and difficulty of data acquisition: the misalignment between the different RGB images must be small enough to be neglected, otherwise it severely degrades the recovery quality. In this paper, we propose a pairwise hyperspectral recovery convolutional neural network (pHSCNN) to reconstruct hyperspectral images from two different RGB images captured by a single sensor with and without an optical filter. By adding and removing this filter, two RGB images with different spectral information can be obtained by one sensor to retrieve the hyperspectral signature. We also add a filter-optimizing layer to pHSCNN to select the optimal filter. Most existing methods directly optimize the CSS of the color chips on the sensor, which is a costly solution because it requires manufacturing a custom mosaic color-filter-array sensor. In our study, only the transmittance spectrum of an optical filter needs to be learned, which greatly reduces the expense in application.

The rest of the paper is organized as follows. Section 2 illustrates the network structure and implementation details. Section 3 shows our dual-camera HSI system and experimental results with both simulated and real-captured RGB images. Finally, the conclusion is presented in Section 4.

2. Method

Hyperspectral recovery from RGB images is ill-posed because the forward projection from spectra to RGB is many-to-one. It is required to reconstruct high-dimensional hyperspectral information (in our case 440 nm to 670 nm with a 10 nm increment) from low-dimensional RGB images. Each RGB image can be seen as a noisy linear measurement of the true hyperspectral signature according to the CSS of the color sensor. Denoting $\mathbf{H}$ the hyperspectral cube, one RGB measurement $\mathbf{I}$ can be described as

$$I(x,y,c) = \sum_{i}^{N} H(x,y,i)\, r_c(i) + \eta$$
where $(x,y)$ represents the Cartesian coordinates of a pixel, $c \in \{R,G,B\}$ denotes the color channel, and $i \in \{440\,\mathrm{nm}, 450\,\mathrm{nm}, \ldots, 670\,\mathrm{nm}\}$ denotes the hyperspectral wavebands with total number $N = 24$. $r_c$ describes the camera's spectral response of channel $c$ and $\eta$ is the noise. Without prior information on $\mathbf{H}$, it is impossible to address this ill-posed problem. However, it is well studied that the hypercube $\mathbf{H}$ of natural scenes is sparse and can be well described using a few principal components [13]. The probability of metamerism, which is a tricky scenario for hyperspectral recovery, is relatively low in nature [14]. According to the principal component analysis (PCA) shown in Fig. 1, our hyperspectral prior can be well described using the first 3 to 5 principal components, with a total explained variance higher than 98%.

Fig. 1. Explained variance ratio of the PCA for our hyperspectral dataset.
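For illustration, the explained-variance curve in Fig. 1 can be reproduced from any collection of spectra drawn from the hyperspectral prior. The short sketch below assumes the spectra have been flattened into a (pixels x 24) array and uses scikit-learn; neither the array name nor the library is prescribed by the original implementation.

import numpy as np
from sklearn.decomposition import PCA

# spectra: (num_pixels, 24) array of hyperspectral samples from the prior,
# one 24-band spectrum (440-670 nm, 10 nm steps) per row (hypothetical name).
def cumulative_explained_variance(spectra, n_components=5):
    pca = PCA(n_components=n_components)
    pca.fit(spectra)
    return np.cumsum(pca.explained_variance_ratio_)

# Example call with random data standing in for real spectra:
# print(cumulative_explained_variance(np.random.rand(10000, 24)))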

2.1 Mechanisms to increase the accuracy

According to standard compressed sensing theory, increasing the number of measurements gives a stronger guarantee of accurate recovery [15]. In other words, adding more RGB images can effectively increase the hyperspectral recovery fidelity. One simple and direct way to obtain more RGB images is to add more sensors. However, adding sensors greatly increases hardware cost and complexity, and the alignment between sensors as well as the image registration require extra care, because all sensors must capture the same object pixel-wise. Other solutions achieve this goal either with different illumination conditions or with different imaging conditions [16]. Various methods, such as filters or a tunable light source, can be used in the illumination path to change the spectrum; in the imaging path, filters can be placed in front of the imaging lens to capture images with different spectral information. In this study, we employ a thin-film filter in front of the imaging system to manipulate the spectral transmission and generate a filtered RGB image $\mathbf{I}^f$, which takes the form of

$$I^f(x,y,c) = \sum_{i}^{N} H(x,y,i)\, f(i)\, r_c(i) + \eta$$
where $f$ describes the filter transmittance spectrum. One image of the RGB pair comes from a common color sensor (denoted $\mathbf{I}$); the other is simulated as being captured after adding the optimized filter $F_{opt}$ (denoted $\mathbf{I}^{F_{opt}}$).
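For clarity, Eqs. (1) and (2) can be evaluated directly when the CSS and the filter transmittance are known. The NumPy sketch below is illustrative only (the array names and the Gaussian noise model are our assumptions); note that in pHSCNN the filtered image is ultimately produced by the learned RGB module described in Section 2.2 rather than by this idealized projection.

import numpy as np

def project_to_rgb(H, css, f=None, noise_std=0.0):
    """Project a hypercube to an RGB image per Eqs. (1)-(2).

    H   : (X, Y, N) hypercube, N = 24 bands (440-670 nm, 10 nm steps)
    css : (N, 3) camera spectral sensitivity r_c(i)
    f   : optional (N,) filter transmittance spectrum; None = no filter
    """
    weights = css if f is None else css * f[:, None]   # apply filter band-wise
    I = np.tensordot(H, weights, axes=([2], [0]))       # sum over bands -> (X, Y, 3)
    if noise_std > 0:
        I = I + np.random.normal(0.0, noise_std, I.shape)
    return I

# Unfiltered / filtered pair from the same hypercube and sensor:
# I_plain    = project_to_rgb(H, css)
# I_filtered = project_to_rgb(H, css, f=filter_transmittance)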

Another mechanism to increase the accuracy is to optimize the transmittance spectrum of the added filter. Each RGB image should provide unique spectral information that contributes to a more accurate recovery. In our method, the optimal filter $F_{opt}$ introduces variations into the hyperspectral-to-RGB mapping of $\mathbf{I}^{F_{opt}}$. Hence, we add a filter-optimizing layer to pHSCNN so that the network learns $F_{opt}$ by searching for the best filter.

2.2 Structure of pHSCNN

Based on these two mechanisms, we propose pHSCNN, whose structure is illustrated in Fig. 2. It consists of an HS module, an RGB module, and a filter-optimizing layer. During training, both the hyperspectral ground truth $\mathbf{H}$ and the RGB image $\mathbf{I}$ are fed into pHSCNN as inputs. $\mathbf{H}$ first passes through the filter-optimizing layer to mimic the physical process of passing through an optical filter. The filtered hypercube is then fed into the pre-trained RGB module to generate the filtered RGB image, which together with the unfiltered input RGB image forms the RGB pair. This pair is passed to the HS module in parallel to output the hyperspectral recovery $\hat{\mathbf{H}}$. By minimizing the difference between $\mathbf{H}$ and $\hat{\mathbf{H}}$, the parameters of the filter-optimizing layer and the HS module are updated and optimized simultaneously. During validation, the parameters of the filter-optimizing layer are fixed to the optimal filter $F_{opt}$ learned in the training stage.

Fig. 2. Architecture of the proposed pHSCNN.

The HS module outputs a hyperspectral prediction $\hat{\mathbf{H}}$ from the RGB pair $\{\mathbf{I}, \mathbf{I}^{F_{opt}}\}$. It is modified from our previous network for hyperspectral recovery from a single RGB image [8]. Both $\mathbf{I}$ and $\mathbf{I}^{F_{opt}}$ undergo the same layers separately, and their intermediate outputs are combined. In detail, $\mathbf{I}$ or $\mathbf{I}^{F_{opt}}$ is first passed through a convolutional (Conv) layer with 16 filters followed by a rectified linear unit (ReLU) activation function. The output is fed into a dense block (DB) consisting of 6 Conv layers; every layer in the DB takes all the preceding outputs as input and concatenates the previous and new outputs. The stacked feature maps are passed through two Conv layers with 112 and 64 filters to obtain the intermediate output. The intermediate outputs from the two RGB images are summed pixel-wise and then fed into a final Conv layer with 24 filters to give the correct spectral dimension.
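A minimal PyTorch-style sketch of the HS module is given below. The kernel size (3 x 3), the dense-block growth rate (16 channels per layer, chosen so that the concatenated features reach the 112 channels mentioned above), and the weight sharing between the two branches are assumptions where the text is not explicit; the framework itself (PyTorch) is also an assumption.

import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Six conv layers; each takes all preceding feature maps as input."""
    def __init__(self, in_ch=16, growth=16, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, 3, padding=1), nn.ReLU(inplace=True)))
            ch += growth

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)            # 16 + 6*16 = 112 channels

class HSBranch(nn.Module):
    """One branch of the HS module, applied to a single RGB image."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                                  nn.ReLU(inplace=True))
        self.dense = DenseBlock(in_ch=16, growth=16, n_layers=6)
        self.tail = nn.Sequential(
            nn.Conv2d(112, 112, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(112, 64, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, rgb):
        return self.tail(self.dense(self.head(rgb)))

class HSModule(nn.Module):
    """Maps the RGB pair {I, I^F} to a 24-band hyperspectral estimate."""
    def __init__(self):
        super().__init__()
        self.branch = HSBranch()                  # assumed shared between the two inputs
        self.out = nn.Conv2d(64, 24, 3, padding=1)

    def forward(self, rgb_plain, rgb_filtered):
        fused = self.branch(rgb_plain) + self.branch(rgb_filtered)  # pixel-wise sum
        return self.out(fused)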

In contrast to the HS module, the RGB module learns the mapping from the high-dimensional hyperspectral space to the low-dimensional RGB space. In existing methods, the generation of an RGB image is commonly described as multiplying the hypercube by the CSS of the color sensor. However, this multiplication oversimplifies the image formation, as it ignores the nonlinear processing in the sensor as well as the aberrations introduced by the imaging optics; the resulting RGB image usually suffers from color distortion and looks different from an everyday RGB image. This motivates us to use an RGB module instead of the CSS to simulate $\mathbf{I}^{F_{opt}}$. It consists of three Conv layers containing 12, 6, and 3 filters sequentially, each followed by a ReLU activation function. The RGB module is pre-trained on the pairwise hyperspectral-RGB images we captured and is positioned right after the filter-optimizing layer as a generator of $\mathbf{I}^{F_{opt}}$.
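Under the same assumptions (PyTorch, 3 x 3 kernels), the RGB module could be sketched as follows.

import torch.nn as nn

class RGBModule(nn.Module):
    """Maps a (filtered) 24-band hypercube to a simulated RGB image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(24, 12, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(12, 6, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(6, 3, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, filtered_hypercube):
        return self.net(filtered_hypercube)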

The filter-optimizing layer consists of a 24-element vector, with each element ranging from 0 to 1, representing the optical transmittance spectrum of a thin-film filter. Before training, the vector is initialized as a constant; during training, it is treated as a trainable variable and updated continuously. Finally, the vector converges to a unique curve describing the desired optimal filter $F_{opt}$.
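The filter-optimizing layer is thus simply a learnable 24-element transmittance vector applied band-wise to the hypercube. In the sketch below, the clamping used to keep the values in [0, 1] and the constant initialization value of 0.5 are our assumptions.

import torch
import torch.nn as nn

class FilterLayer(nn.Module):
    """Learnable transmittance spectrum applied band-wise to the hypercube."""
    def __init__(self, n_bands=24, init_value=0.5):
        super().__init__()
        self.transmittance = nn.Parameter(torch.full((n_bands,), init_value))

    def forward(self, hypercube):                  # hypercube: (B, 24, H, W)
        t = self.transmittance.clamp(0.0, 1.0)     # enforce the physical range [0, 1]
        return hypercube * t.view(1, -1, 1, 1)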

After pHSCNN is trained, the optimal filter can either be fabricated or substituted with a commercial spectral filter, and a real-captured $\mathbf{I}^{F_{opt}}$ can then be obtained. To validate the superiority of the trained optimal filter in real applications, a modified pHSCNN is proposed to recover hyperspectral images from all real-captured RGB inputs. Its structure is shown in Fig. 3, with the RGB module and the filter-optimizing layer removed; the HS module shares the same structure as in the original pHSCNN.

Fig. 3. Architecture of the modified pHSCNN. The RGB module and filter-optimizing layer are removed, and both RGB inputs are real-captured with or without the optimal filter.

2.3 Implementation details

To search for the global minimum, the root mean squared error (RMSE), mean absolute error (MAE), and structural similarity index (SSIM) are combined in the loss function, which take the form

$$RMSE(\hat{H}, H) = \| \hat{H} - H \|_2$$
$$MAE(\hat{H}, H) = \| \hat{H} - H \|_1$$
$$SSIM(\hat{H}_i, H_i) = \frac{(2\bar{\omega}_{\hat{H}_i}\bar{\omega}_{H_i} + C_1)(2\sigma_{\omega_{\hat{H}_i}\omega_{H_i}} + C_2)}{(\bar{\omega}_{\hat{H}_i}^2 + \bar{\omega}_{H_i}^2 + C_1)(\sigma_{\omega_{\hat{H}_i}}^2 + \sigma_{\omega_{H_i}}^2 + C_2)}$$
where $\hat{H}_i, H_i$ represent the predicted and ground-truth hyperspectral images at the $i$-th wavelength, respectively, and $\|\cdot\|_2$ and $\|\cdot\|_1$ denote the L2 and L1 norms. $\omega_{H_i}$ defines a region of image $H_i$ within window $\omega$; $\bar{\omega}_{H_i}$, $\sigma_{\omega_{H_i}}^2$, and $\sigma_{\omega_{\hat{H}_i}\omega_{H_i}}$ are the mean and variance of $\omega_{H_i}$ and its covariance with $\omega_{\hat{H}_i}$. The final loss function for the HS module is the combination of the three components, which can be written as
$$loss_{HS}(\hat{H}, H) = RMSE(\hat{H}, H) + MAE(\hat{H}, H) + \lambda \sum_{i}^{N} SSIM(\hat{H}_i, H_i)$$
where $\lambda$ balances the weight of the SSIM component, which is taken as the mean value over all bands. The loss function for the RGB module takes a similar form,
$$loss_{RGB}(\hat{I}, I) = RMSE(\hat{I}, I) + MAE(\hat{I}, I) + \lambda \sum_{c}^{3} SSIM(\hat{I}_c, I_c).$$
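A possible implementation of the combined HS-module loss is sketched below. The per-pixel normalization of the norms, the use of global (rather than windowed) statistics for the SSIM term, and the sign convention that folds SSIM into the minimized loss as (1 − SSIM) are our assumptions.

import torch

def ssim_band(x, y, c1=1e-4, c2=9e-4):
    """Single-band SSIM from global statistics (the paper uses local windows;
    global statistics are a simplification for this sketch)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def loss_hs(H_hat, H, lam=0.5):
    """RMSE + MAE + lambda * SSIM term, following Eq. (6). SSIM is a similarity
    (higher is better), so it enters the minimized loss as (1 - SSIM) here."""
    rmse = torch.sqrt(torch.mean((H_hat - H) ** 2))
    mae = torch.mean(torch.abs(H_hat - H))
    ssim = torch.stack([ssim_band(H_hat[:, i], H[:, i])
                        for i in range(H.shape[1])]).mean()   # mean over bands
    return rmse + mae + lam * (1.0 - ssim)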

During training, we applied flips (up-down and left-right) and rotations ($0^\circ$, $90^\circ$, $180^\circ$, and $270^\circ$) for data augmentation. We chose the Adam optimizer [17] with a batch size of 64 and 100 epochs. The learning rate was initially set to 0.001 and exponentially decayed at a rate of 0.99. In the loss functions, the balance parameter $\lambda$ was set to 0.5 experimentally, and ${C_1}$, ${C_2}$ were set to $1 \times {10^{-4}}$ and $9 \times {10^{-4}}$, respectively.
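Putting the pieces together, a schematic training loop (reusing the module and loss sketches above) might look as follows; the data loader, the freezing of the pre-trained RGB module, and the per-epoch stepping of the learning-rate schedule are assumptions.

import torch

# Instantiate the sketched modules; only the filter layer and HS module are
# updated (Section 2.2), the pre-trained RGB module is kept fixed.
filter_layer, rgb_module, hs_module = FilterLayer(), RGBModule(), HSModule()
params = list(filter_layer.parameters()) + list(hs_module.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)

for epoch in range(100):
    for H, I_plain in train_loader:            # hypothetical loader of 40x40 patches, batch 64
        rgb_filtered = rgb_module(filter_layer(H))
        H_hat = hs_module(I_plain, rgb_filtered)
        loss = loss_hs(H_hat, H, lam=0.5)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                           # exponential decay, rate 0.99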

3. HSI system and experiments

Most existing hyperspectral-RGB datasets are half-captured and half-simulated: the hypercube is captured by a hyperspectral camera while the corresponding RGB images are mostly synthesized by multiplying the hypercube with a CSS. To obtain a fully real-captured hyperspectral-RGB dataset, we built a dual-camera HSI system.

3.1 Dual-camera HSI system

Figure 4(a) exhibits the optical layout of our dual-camera HSI system with a rotating filter wheel. There are three kinds of filters on the wheel: bandpass filters, a glass window, and a commercial spectral filter. The bandpass filters are hard-coated OD (optical density) 4.0, 10 nm bandpass filters from Edmund Optics; each transmits light within a narrow 10 nm band centered at a certain wavelength and blocks light at other wavelengths with an OD larger than 4.

Fig. 4. (a) Optical layout and (b) experimental setup of the dual-camera HSI system. BS: beam splitter; CS: color sensor; FW: filter wheel; L: imaging lens; MR: motorized rotator; MS: monochromatic sensor.

The transmitted light is collected by the monochromatic sensor (MS) after passing through the imaging lens (L) and beam splitter (BS) sequentially; this gives the hyperspectral ground-truth images. On the other hand, light passing through the glass window experiences no spectral filtering and is captured by the color sensor (CS) after being reflected by the BS, yielding the RGB image $\mathbf{I}$. A glass window is used instead of an empty slot to compensate for the optical path difference introduced by the bandpass filters. The commercial spectral filters have different transmission at different wavelengths; they are used as substitutes for the trained optimal filter to validate its superiority in real applications, and the RGB image obtained with the commercial spectral filter is used to train and validate the modified pHSCNN. Figure 4(b) demonstrates the experimental setup of our HSI system. In practice, the rotation of the wheel is implemented by a motorized rotator (Thorlabs ELL14 rotation mount). A plate positioner (Thorlabs SPT1/M-XY slip plate positioner) is used to mount the color sensor with a travel of ±1 mm on the horizontal and vertical axes, which ensures better alignment between the color sensor and the monochromatic sensor. The two sensors used in our system are a color sensor (Blackfly S BFS-U3-51S5C) and a monochromatic sensor (Blackfly S BFS-U3-51S5M), with dynamic ranges of 70.74 dB and 71.13 dB, respectively.

3.2 Image preprocessing

The raw images from the system still suffer from spatial and chromatic aberrations. We designed a preprocessing pipeline (shown in Fig. 5) to compensate for these imperfections. Because every bandpass filter mounted on the filter wheel experiences a slight but variable amount of tilt, there are different misalignments among images at different bands and between the hyperspectral and RGB images. The 13th hyperspectral image at 560 nm acts as the uniform reference, and the images at other wavelengths as well as the RGB images are spatially aligned to the reference using the single-step DFT registration algorithm [18]. After registration, the residual misalignment is reduced to the subpixel level. All RGB images are color corrected by the Gray World algorithm and a color correction matrix (CCM) calculated from a standard X-rite color checker [19].
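The registration and white-balance steps could be implemented, for example, with off-the-shelf routines; the sketch below uses the scikit-image implementation of the single-step DFT algorithm [18] and a simple Gray World correction, which is our choice of tooling rather than a description of the original pipeline.

import numpy as np
from scipy.ndimage import shift as nd_shift
from skimage.registration import phase_cross_correlation

def register_to_reference(band, reference, upsample=100):
    """Subpixel alignment of one band to the 560 nm reference image
    via single-step DFT registration [18]."""
    offset, _, _ = phase_cross_correlation(reference, band,
                                           upsample_factor=upsample)
    return nd_shift(band, offset)

def gray_world(rgb):
    """Gray World white balance: scale each channel to the global mean."""
    means = rgb.reshape(-1, 3).mean(axis=0)
    return rgb * (means.mean() / means)

# After white balance, a 3x3 color correction matrix derived from the
# X-rite color checker would be applied: rgb_corrected = rgb @ ccm.T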

Fig. 5. The preprocessing pipeline for our hyperspectral-RGB dataset.

Our hyperspectral dataset includes 60 scenes of colorful dried flowers and paintings without a strong bias towards specific colors. Every scene contains one RGB image with resolution $2400 \times 2000 \times 3$ and a hypercube of dimension $2400 \times 2000 \times 24$ with wavelengths ranging from 440 nm to 670 nm. 45 of the 60 scenes were selected as training samples and cropped into small patches of size $40 \times 40$, giving a total of 134976 patches in the training dataset. The remaining 15 scenes constitute the validation dataset.
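A possible patch-extraction routine is sketched below; the non-overlapping stride is an assumption, as the exact cropping scheme is not specified.

def extract_patches(cube, patch=40, stride=40):
    """Crop a (H, W, C) scene into patch x patch tiles (non-overlapping by default)."""
    h, w, _ = cube.shape
    return [cube[r:r + patch, c:c + patch]
            for r in range(0, h - patch + 1, stride)
            for c in range(0, w - patch + 1, stride)]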

3.3 Experiments with simulated filtered RGB

The first experiment was implemented with simulated $\mathbf{I}^{F_{opt}}$ generated by the RGB module after obtaining the optimal filter. To show the superiority of pHSCNN, two prior RGB-based hyperspectral recovery methods, the sparse coding of [4] (termed SC) and the deep-learning-based hyperspectral recovery network of [8] (termed HS net), were implemented for comparison. In SC, the hyperspectral dictionary was regenerated using our own training data, and the dictionary size was set to the maximum value reported in the original paper to ensure the recovery quality. Similarly, the HS net was retrained on our dataset. New images were used to validate the performance of the three methods; part of the results is displayed in Fig. 6. In addition, to isolate the contribution of the learned optimal filter $F_{opt}$, an ablation experiment was implemented (termed Ablation), in which the filter-optimizing layer was frozen and replaced by a constant curve (the transmittance spectrum of filter FEL0400 shown in Fig. 9) during training, so that only the parameters of the HS module were updated.

Fig. 6. Spectral difference maps between the recovered hyperspectral images and the ground truth in the experiments with simulated filtered RGB.

In Fig. 6, the spectral difference maps of two sample scenes are presented. The first sample shows colorful dried flowers and green leaves; the second shows hand-painted color patches. Compared with pHSCNN in the fourth row, both SC (first row) and HS net (second row) exhibit larger deviations from the ground truth. Specifically, SC suffers from greater errors, especially at long wavelengths, while HS net fails to give a good prediction for the green and blue patches around 470 nm in the second sample. The results from pHSCNN resemble the ground truth more closely in both short and long wavebands, indicating that the proposed method achieves robust performance with higher recovery fidelity. The third row shows the results of the Ablation experiment with an unoptimized filter; it is inferior to pHSCNN on some pixels with large error margins. For example, pHSCNN effectively shrinks the error range on the red flower in the first sample and strongly reduces the deviation around the yellow patch in the second sample. These comparisons indicate that both the pairwise-input mechanism and the filter optimization in pHSCNN greatly boost the performance of hyperspectral estimation.

Besides the spectral difference map, the spectral profile across the whole waveband gives another perspective for evaluating the hyperspectral recovery. As shown in Fig. 7, the full-band spectral responses of selected pixels from the different methods are plotted. Both SC (blue dashed) and HS net (magenta dashed) depart from the ground truth by a larger margin than pHSCNN (red solid). Specifically, the blue lines tend to deviate from the ground truth (black solid) at wavelengths above 600 nm, while the magenta lines suffer from larger errors at short wavelengths below 500 nm; only pHSCNN achieves accurate predictions across the full band. Besides perceptual quality, numerical metrics were calculated to evaluate performance quantitatively. In our study, the spectral angle mapper (SAM), peak signal-to-noise ratio (PSNR), and SSIM are employed, which take the form of

$$SAM = \| \arccos \langle \hat{H}(x,y), H(x,y) \rangle \|_1$$
$$PSNR = \frac{1}{N} \sum_{i}^{N} 20 \log_{10} \left( \frac{\max H_i}{\| \hat{H}_i - H_i \|_2} \right)$$
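These metrics can be computed, for example, as in the NumPy sketch below (bands along the last axis, and the L2 term of the PSNR expression taken as a per-band RMSE, are assumptions about the conventions).

import numpy as np

def sam(H_hat, H, eps=1e-12):
    """Mean spectral angle (radians) over all pixels."""
    dot = np.sum(H_hat * H, axis=-1)
    norm = np.linalg.norm(H_hat, axis=-1) * np.linalg.norm(H, axis=-1)
    angles = np.arccos(np.clip(dot / (norm + eps), -1.0, 1.0))
    return angles.mean()

def psnr(H_hat, H):
    """Band-averaged PSNR; the denominator is taken as the per-band RMSE."""
    vals = []
    for i in range(H.shape[-1]):
        rmse = np.sqrt(np.mean((H_hat[..., i] - H[..., i]) ** 2))
        vals.append(20 * np.log10(H[..., i].max() / rmse))
    return np.mean(vals)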

Fig. 7. Spectral responses of selected points from the different hyperspectral recovery methods.

Among the three metrics, SAM calculates the arccosine of the inner product of two spectral vectors; if the vectors are highly correlated, a small angle is obtained, so a smaller SAM indicates a more accurate reconstruction. Conversely, SSIM and PSNR accentuate spatial similarity, with higher values indicating better results. For each validation scene, SSIM and PSNR are averaged over all wavebands, while SAM is averaged over all spatial pixels. The average values of the three metrics over all validation images are listed in Table 1. pHSCNN achieves an outstanding increase in PSNR and SSIM and an obvious reduction in SAM, verifying that it significantly improves upon the previous methods.

Table 1. Numerical results of different hyperspectral recovery models.

Besides the spectrally averaged values, the SSIM curves from 440 nm to 670 nm with a 10 nm increment for two selected scenes are shown in Fig. 8. Across this entire range, our method outperforms the SC method and HS net with the highest SSIM.

Fig. 8. SSIM curves at different wavelengths for two selected scenes. (a),(b) The RGB images of samples 1 and 2. (c),(d) The SSIM curves at different wavelengths for samples 1 and 2.

3.4 Experiments with commercial candidate filter

In practice, the hyperspectral ground truth is difficult and expensive to obtain, and the original pHSCNN structure still needs the hyperspectral ground truth to simulate $\mathbf{I}^{F_{opt}}$, which limits its real application. After training pHSCNN, we already have prior information about the optimal filter $F_{opt}$ (shown in Fig. 9), which exhibits a zigzagging transmittance profile with two peaks around 550 nm and 600 nm. The zigzagging pattern is difficult to fabricate accurately, because the only constraint on $F_{opt}$ during training is that its values be nonnegative and upper bounded by one; there is no constraint on the gradient or smoothness of the transmittance spectrum. Limited by the hardware, we resorted to commercial candidate filters to evaluate the performance of the optimal filter in a real application. We selected a commercial multi-bandpass filter (termed FF01, Semrock bandpass filter FF01-465/537/623-25), which most resembles $F_{opt}$, to represent the optimal filter. Two other filters were also used for comparison with FF01, namely a longpass filter (termed FEL0400, Thorlabs longpass filter FEL0400) and a neutral density filter (termed NE03B, Thorlabs neutral density filter NE03B-B). The filtered RGB images were directly captured using our HSI system instead of being simulated; we collected the RGB images with the corresponding filters and retrained the modified pHSCNN. Part of the validation results is presented in Fig. 10. As shown there, the recovery corresponding to filter FF01 exhibits the highest accuracy among the three filters. In the yellow boxed area, the reconstruction using filter FEL0400 shows an obvious error, whereas the model with filter FF01 gives a successful prediction with negligible deviation from the ground truth. Similarly, in the red boxed region, the results using filter NE03B experience an error boost at 630 nm and 670 nm, while the model with filter FF01 outputs a robust recovery. An appropriate interpretation is that both the learned optimal filter and filter FF01 feature sharp variations in their transmitted spectra, providing extra spectral features beyond the unfiltered RGB image; in contrast, filters FEL0400 and NE03B exhibit more evenly distributed transmission with less additional spectral information. As expected, the reconstructions from the model with filter FF01 surpass the rest.

Fig. 9. The normalized spectral transmittance of the learned optimal filter and of FEL0400, NE03B, and FF01.

Fig. 10. Spectral difference maps between the recovered hyperspectral images and the ground truth in the experiments with the modified pHSCNN.

4. Conclusion

In conclusion, we proposed a pairwise-image-based hyperspectral convolutional neural network (pHSCNN) that reconstructs the hyperspectral signature from two RGB images and outputs the spectral transmittance of the optimal filter. Based on our dual-camera HSI system, a real-captured hyperspectral-RGB dataset was built and used to train pHSCNN. The experimental results show that pHSCNN outperforms prior methods in both perceptual quality and numerical metrics, with either simulated or real-captured filtered RGB. In the current study, we demonstrated the concept of improving the accuracy of hyperspectral recovery by using two different RGB images; the same concept can be extended to three RGB images to further improve performance. Moreover, instead of optimizing the filter in the imaging path, we could optimize and change the illumination spectrum to capture the additional images, which could be much more practical in applications with active illumination. In the future, we will incorporate manufacturing feasibility into the network design. One potential solution is to introduce constraint terms in the model to obtain optimal filters that can be easily and accurately manufactured. Another approach is to incorporate thin-film design parameters (total layer number, and the thickness and material of each layer) into the network variables, so that the network optimizes the filter and outputs the thin-film parameters at the same time.

Disclosures

The authors declare no conflicts of interest.

Data Availability

Data underlying the presented results are available from [20].

References

1. G. Lu and B. Fei, “Medical hyperspectral imaging: a review,” J. Biomed. Opt. 19(09), 1–24 (2014). [CrossRef]  

2. X. Yang, Y. Ye, X. Li, R. Y. K. Lau, X. Zhang, and X. Huang, “Hyperspectral Image Classification With Deep Learning Models,” IEEE Trans. Geosci. Remote Sens. 56(9), 5408–5423 (2018). [CrossRef]  

3. N. A. Hagen and M. W. Kudenov, “Review of snapshot spectral imaging technologies,” Opt. Eng. 52(9), 090901 (2013). [CrossRef]  

4. B. Arad and O. Ben-Shahar, “Sparse Recovery of Hyperspectral Signal from Natural RGB Images,” in ECCV, 19–34 (2016).

5. Y. Jia, Y. Zheng, L. Gu, A. Subpa-Asa, A. Lam, Y. Sato, and I. Sato, “From RGB to Spectrum for Natural Scenes via Manifold-Based Mapping,” in 2017 IEEE International Conference on Computer Vision (ICCV) (2017), pp. 4715–4723.

6. Z. Xiong, Z. Shi, H. Li, L. Wang, D. Liu, and F. Wu, "HSCNN: CNN-based hyperspectral image recovery from spectrally undersampled projections," in ICCV, 518–525 (2017).

7. Z. Shi, C. Chen, Z. Xiong, D. Liu, and F. Wu, "HSCNN+: Advanced CNN-based hyperspectral recovery from RGB images," in CVPR, 939–947 (2018).

8. J. Zhang, Y. Sun, J. Chen, D. Yang, and R. Liang, “Deep-learning-based hyperspectral recovery from a single RGB image,” Opt. Lett. 45(20), 5676–5679 (2020). [CrossRef]  

9. B. Arad and O. Ben-Shahar, “Filter selection for hyperspectral estimation,” in ICCV, 3153–3161 (2017).

10. Y. Fu, T. Zhang, Y. Zheng, D. Zhang, and H. Huang, “Joint camera spectral sensitivity selection and hyperspectral image recovery,” in ECCV, 788–804 (2018).

11. S. Nie, L. Gu, Y. Zheng, A. Lam, N. Ono, and I. Sato, “Deeply learned filter response functions for hyperspectral reconstruction,” in CVPR, 4767–4776 (2018).

12. Y. Fu, Y. Zou, Y. Zheng, and H. Huang, “Spectral reflectance recovery using optimal illuminations,” Opt. Express 27(21), 30502–30516 (2019). [CrossRef]  

13. A. Chakrabarti and T. Zickler, “Statistics of real-world hyperspectral images,” in CVPR 2011 (2011), pp. 193–200.

14. D. H. Foster, K. Amano, S. M. C. Nascimento, and M. J. Foster, “Frequency of metamerism in natural scenes,” J. Opt. Soc. Am. A 23(10), 2359–2372 (2006). [CrossRef]  

15. D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006). [CrossRef]  

16. W. Zhang, H. Song, X. He, L. Huang, X. Zhang, J. Zheng, W. Shen, X. Hao, and X. Liu, “Deeply learned broadband encoding stochastic hyperspectral imaging,” Light: Sci. Appl. 10(1), 1–7 (2021). [CrossRef]  

17. D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980 (2014).

18. M. Guizar-Sicairos, S. T. Thurman, and J. R. Fienup, “Efficient subpixel image registration algorithms,” Opt. Lett. 33(2), 156–158 (2008). [CrossRef]  

19. X. Tu, O. J. Spires, X. Tian, N. Brock, R. Liang, and S. Pau, “Division of amplitude RGB full-Stokes camera using micro-polarizer arrays,” Opt. Express 25(26), 33160–33175 (2017). [CrossRef]  

20. https://wp.optics.arizona.edu/ualiangaol/publications/experimental-data/
