
Lightweight computational spectrometer enabled by learned high-correlation optical filters


Abstract

A neural network (NN) computational spectrometer offers high reconstruction accuracy and fast operation; however, it also occupies a large amount of storage in an embedded system due to its excessive computation volume. Conversely, conventional algorithms such as gradient projection for sparse reconstruction (GPSR) take up less storage, but their spectral reconstruction accuracy is much lower than that of an NN. The major reason is that the performance of GPSR depends greatly on the non-correlation property of the optical filters, which poses challenges for optical filter design and fabrication. In this study, a GPSR algorithm, termed NN-GPSR, achieves high-precision spectral reconstruction enabled by NN-learned, highly correlated filters. A group of NN-learned filters exhibiting high correlation works as the encoder, and an optimized GPSR algorithm works as the decoder. In this way, the large computation volume is avoided, and prior knowledge from tens of thousands of images is exploited to obtain an appropriate optical filter design. The experimental data indicate that NN-GPSR reconstructs the spectrum well while requiring far less storage.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

In recent years, spectrometers have been widely used in remote sensing, meteorology, agriculture, and other fields [1]. The most popular approaches to spectral sensing are monochromators and Fourier transform interferometers, both of which require delicate instruments with long optical paths and bulky dimensions, limiting their use in low-cost applications and harsh environments. An alternative way to achieve a long optical path within a compact dimension is to let light bounce back and forth many times between interfaces on an integrated optical chip, such as in photonic crystal (PC) slabs [2]. To increase the spectral resolution, the compressed sensing (CS) method was introduced. In 2005, Candès and Tao [3] first proposed CS, proving mathematically that unknowns can be solved from fewer equations under certain conditions; this condition was formalized as the restricted isometry property (RIP). In 2007, Brady et al. [4] proposed a computational spectrum technology based on CS, which quickly became a research hotspot. This technology can complete spectrum reconstruction even when the number of spectral channels is much larger than the number of filters, which is known as sparse signal reconstruction.

Based on CS theory, a variety of signal reconstruction algorithms have been proposed, such as orthogonal matching pursuit (OMP) [5] and GPSR [6]. Because signal reconstruction guided by the RIP requires a large number of calculations [7], Donoho simplified the restricted isometry property and proposed the incoherence criterion for the measurement matrix (the filter transmission matrix) [8]. Under the guidance of the incoherence criterion, spectral reconstruction can be achieved by considering only the non-correlation of the filters, meaning that any two row vectors in the matrix of filter transmittance curves are nearly orthogonal. The incoherence criterion has promoted the development of various computational spectrometers [9–11].

However, satisfying the incoherence criterion requires minimal correlation between any two filters, which poses great challenges to the design and fabrication of broadband filters. In contrast, the training of filters in an NN only attends to the agreement between the reconstructed spectrum and the ground truth and does not require the incoherence criterion to be satisfied. The NN proposed in this study achieves very good reconstruction accuracy with highly correlated broadband optical filters. In computational spectroscopy, the encoding procedure is completed by random optical filters and focal-plane detectors, which are similar across spectrometers. The decoding procedure, completed by the reconstruction algorithm of an NN, often requires megabytes of storage, and the reconstruction consumes a large amount of computing resources. As a result, it is difficult to achieve embedded integration of miniature spectrometers with NNs. Conversely, GPSR, as a representative conventional reconstruction algorithm, takes up less storage and consumes far fewer computing resources, but its spectral reconstruction accuracy is relatively poor owing to its dependence on non-correlated optical filters. Therefore, this study proposes adopting broadband optical filters trained by an NN as the encoder and an optimized conventional GPSR as the decoder, which we refer to as the NN-GPSR method. Notably, the NN training of the broadband optical filters is performed only once and does not consume computing resources of the embedded system. NN-GPSR thus achieves high spectral reconstruction accuracy while taking up little storage in the spectrometer system.

The remainder of this paper is organized as follows. Section 2 introduces the GPSR method and the recovery of the spectrum with randomly designed PC slabs. Section 3 introduces the NN architecture and training process. Section 4 describes the construction of NN-GPSR and its performance. Section 5 concludes the paper.

2. GPSR in a computational spectrometer

The mathematical expression of CS is an NP-hard problem; solving it directly requires finding all non-zero elements, which is almost impossible in big-data reconstruction [7]. Consequently, many approximate methods have been proposed, the most typical of which recasts the problem as a convex optimization problem. GPSR is a relatively mature convex optimization algorithm in the field of CS that uses gradient projection to iteratively approach the extreme point of the problem until the gradient is smaller than a preset value. Compared with other CS reconstruction algorithms, GPSR offers higher reconstruction accuracy [6]. This section introduces the GPSR method and discusses how spectrum reconstruction is performed.

2.1 Basic principle of GPSR

If the target spectral signal x enters the camera through filters with transmittance $\phi $, the observed signal y is obtained. Ignoring the effect of noise, y can be expressed as follows:

$$\begin{array}{c} {y = \phi x} \end{array}$$

According to CS, most signals in nature can be represented as a sparse signal with a certain sparse base. Therefore, x can be expressed as follows:

$$\begin{array}{c} {x = \psi z} \end{array}$$
where $\psi $ is the sparse base matrix and z is the sparse representation of x in the basis $\psi $. In z, the absolute values of a few elements are much larger than zero, while most elements are zero or approximately zero. The observed signal y can then also be expressed as follows:
$$\begin{array}{c} {y = \phi x = \phi \psi z = Az} \end{array}$$
where A is the observation matrix and represents the product of the filter transmittance $\phi $ and the sparse base matrix $\psi $.

From Eq. (2), solving for x requires solving for z first. However, the number of rows of matrix $A$ is smaller than the number of columns, so solving for z is an ill-posed problem that cannot be handled with the least-squares method. When $\phi $ and $\psi $ satisfy the incoherence criterion and noise is present, the mathematical expression for solving z is as follows:

$$\begin{array}{c} {\mathop {\textrm{min}}\limits_z \frac{1}{2}\|y - Az\|_2^2 + \tau \|z\|_1} \end{array}$$

Equation (4) is a basis pursuit denoising (BPDN) problem, where $\tau $ is a hyperparameter greater than zero. The problem of solving z has thus been transformed into a joint optimization over the reconstruction error and the signal sparsity. GPSR is a conventional convex optimization algorithm for solving the BPDN problem.

Solving z with GPSR requires that z is expressed as the difference between two non-negative vectors:

$$\begin{array}{c} {z = u - v,u\mathrm{\geqslant }0,v\mathrm{\geqslant }0} \end{array}$$

This representation converts the BPDN problem into a standard bound-constrained quadratic program:

$$\begin{array}{c} {\mathop {\textrm{min}}\limits_Z \; \; {c^T}Z + \frac{1}{2}{Z^T}BZ \equiv F(Z )\; \; \; s.t.\; Z\mathrm{\geqslant }0} \end{array}$$
where $Z = \left[ {\begin{array}{c} u\\ v \end{array}} \right],{\; }b = {A^T}y,{\; }c = \tau {1_{2n}} + \left[ {\begin{array}{c} { - b}\\ b \end{array}} \right],{\; }B = \left[ {\begin{array}{cc} {{A^T}A}&{ - {A^T}A}\\ { - {A^T}A}&{{A^T}A} \end{array}} \right]$.

GPSR iterates in the direction of the negative gradient:

$$\begin{array}{c} {{Z_{k + 1}} = {Z_k} - \alpha \nabla F({{Z_k}} )} \end{array}$$
where the iteration step length $\alpha $ needs to ensure that the reduction of $F(Z )$ in the iteration direction is maximized:
$$\begin{array}{c} {\alpha = arg\textrm{min}F({{Z_k} - \alpha \nabla F({{Z_k}} )} )} \end{array}$$

Equation (8) has an exact solution:

$$\begin{array}{c} {\alpha = \frac{{\nabla F{{({{Z_k}} )}^T}\nabla F({{Z_k}} )}}{{\nabla F{{({{Z_k}} )}^T}{A^T}A\nabla F({{Z_k}} )}}} \end{array}$$

After the iteration step length and direction are determined, an arbitrary value ${Z_0}$ is given as the initial value, and ${Z_0}$ is moved by the initial step $\alpha$ along the iteration direction to obtain ${Z_1}$. The iteration step is then reduced exponentially, as follows:

$$\begin{array}{c} {{\alpha _k} = {\beta ^k}\alpha ,\beta \in ({0,1} ),k = 0,1, \ldots } \end{array}$$

The iteration stop condition is as follows:

$$\begin{array}{c} {F({{Z_{k + 1}}} )\mathrm{\leqslant }F({{Z_k}} )+ \mu \nabla F{{({{Z_k}} )}^T}({{Z_{k + 1}} - {Z_k}} ),\mu \in [{0,0.5} ]} \end{array}$$

The iteration is temporarily stopped when the stop condition is met. The above process is counted as one iteration and ${Z_k}$ is obtained. When the absolute value of the gradient $|{\nabla F({{Z_k}} )} |$ is small enough, ${Z_k}$ is the extreme point of the function:

$$\begin{array}{c} {|{\nabla F({{Z_k}} )} |\mathrm{\leqslant }tolP} \end{array}$$
where $tolP$ is a small positive hyperparameter close to zero. When Eq. (12) is satisfied, the whole algorithm terminates, ${Z_k}$ is transformed into the sparse signal z, and z is output. Otherwise, ${Z_k}$ is used as the initial value ${Z_0}$ for the next iteration, and the process repeats until Eq. (12) is satisfied.
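
For concreteness, the following is a minimal NumPy sketch of the gradient-projection loop above for the bound-constrained problem of Eq. (6). The function name, the default hyperparameter values, and the projected-gradient form of the stopping test are our own illustrative choices, not the reference implementation of [6].

```python
import numpy as np

def gpsr_basic(A, y, tau, beta=0.5, mu=0.1, tol_p=1e-6, max_iter=500):
    """Sketch of GPSR for min_z 0.5*||y - A z||_2^2 + tau*||z||_1,
    via the bound-constrained split z = u - v of Eqs. (5)-(6)."""
    m, n = A.shape
    b = A.T @ y
    c = tau * np.ones(2 * n) + np.concatenate([-b, b])

    def Bmul(Z):                        # B @ Z without forming the 2n x 2n matrix B
        d = Z[:n] - Z[n:]               # d = u - v
        Ad = A.T @ (A @ d)
        return np.concatenate([Ad, -Ad])

    def F(Z):                           # objective of Eq. (6)
        return c @ Z + 0.5 * Z @ Bmul(Z)

    Z = np.zeros(2 * n)
    for _ in range(max_iter):
        g = c + Bmul(Z)                 # gradient of F, as in Eq. (7)
        pg = Z - np.maximum(Z - g, 0.0) # projected gradient; zero at a solution
        if np.max(np.abs(pg)) <= tol_p: # stopping rule in the spirit of Eq. (12)
            break
        denom = g @ Bmul(g)
        alpha = (g @ g) / denom if denom > 0 else 1.0   # exact step of Eq. (9)
        while True:                     # shrink alpha until Eq. (11) holds
            Z_new = np.maximum(Z - alpha * g, 0.0)      # step + projection onto Z >= 0
            if F(Z_new) <= F(Z) + mu * g @ (Z_new - Z):
                break
            alpha *= beta               # exponential reduction of Eq. (10)
        Z = Z_new
    return Z[:n] - Z[n:]                # recover z = u - v
```

Given the filter matrix $\phi$ and sparse base $\psi$, one would call `z = gpsr_basic(phi @ psi, y, tau=0.05)` and recover the spectrum as `x = psi @ z`; the value of `tau` here is purely illustrative.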

2.2 Filter design

With the parameters of GPSR determined, the form of the observation matrix A has a significant effect on the spectral reconstruction performance. To be specific, the observation matrix A is the product of the filter transmittance $\phi $ and sparse base matrix $\psi $. Therefore, both the filter transmittance and the sparse base matrix contribute to the reconstruction performance. According to the incoherence criterion, spectral reconstruction can be performed by optimizing the non-correlation of the filters. Therefore, filter design is crucial in conventional CS algorithms such as GPSR.

Six hundred filters are designed in this study using PC as the material and a “random design + selection” approach. Four groups of filters are then selected with the correlation coefficient as the main criterion, each containing 15 optical filters. PC is a dielectric material with a photonic band gap structure that can modulate the transmittance at different wavelengths. The transmittance of a PC filter is closely related to the geometry and size of its lattice structure. Compared with optical thin films, PC has more parameters and more complex structures, which is advantageous for random filter design. In this study, twelve lattice structures are designed and simulated with COMSOL Multiphysics 5.6; the structure types are shown in Fig. 1. Different lattice structures have different photonic band gap distributions. As shown in Fig. 2, the band gaps in the square lattices show an aligned distribution, whereas in the hexagonal lattices, every row of band gaps is aligned with the gaps of the previous row.

Fig. 1. Lattice structures of PC slabs.

Fig. 2. Band gap distributions in different lattices.

Based on these 12 lattice structures, 600 PC filters are simulated by varying their lattice constants and pore parameters. The lattice constants range from 350 to 750 nm, and the pore structure parameters must be smaller than the lattice constants. The top layer of these filters is silicon with a thickness of 175 nm, and the substrate is silicon dioxide. Correlation coefficients, calculated with Eq. (13), are used to characterize the non-correlation property.

$$\begin{array}{c} {P({x,y} )= \frac{{\mathop \sum \nolimits_{i = 1}^N ({{x_i} - \bar{x}} )({{y_i} - \bar{y}} )}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^N {{({{x_i} - \bar{x}} )}^2}} \sqrt {\mathop \sum \nolimits_{i = 1}^N {{({{y_i} - \bar{y}} )}^2}} }}} \end{array}$$
where x and y each represent a filter transmittance curve, ${x_i}$ and ${y_i}$ represent the transmittance at the ith wavelength of each curve, and $\bar{x}$ and $\bar{y}$ represent the average transmittance of each curve. The correlation coefficient lies between -1 and 1; the smaller its absolute value, the better the non-correlation of the transmittances. With the correlation coefficient as the main criterion, we select four low-correlation filter groups, A, B, C, and D, each with fifteen filters. The transmittance curves of group A are shown in Fig. 3 as an example. The spectral range of the filters is 400–700 nm @ 1 nm. We calculate the absolute values of the correlation coefficients of the filters in each group and show them in Fig. 4. To present the correlation coefficients of the filter groups more clearly, we plot correlation coefficient distribution histograms for these four groups, with a bin width of 0.1 over the range 0–1, shown in Fig. 5. The mean correlation coefficients of the four filter groups are 0.2231, 0.2136, 0.2208, and 0.281.
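
For illustration, the group-selection criterion of Eq. (13) reduces to a few lines of NumPy; the random matrix below merely stands in for simulated transmittance curves, and 300 channels are used to match the encoder dimension adopted later in this study.

```python
import numpy as np

# T: one transmittance curve per row, sampled over 300 spectral channels
T = np.random.rand(15, 300)            # placeholder for simulated filter curves

P = np.corrcoef(T)                     # pairwise Pearson coefficients, Eq. (13)
iu = np.triu_indices_from(P, k=1)      # off-diagonal filter pairs only
mean_abs_corr = np.abs(P[iu]).mean()   # criterion used to rank candidate groups
print(f"mean |correlation| of the group: {mean_abs_corr:.4f}")
```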

Fig. 3. Transmittance curves of PC filter group A.

Fig. 4. Correlation coefficients of the filters in four filter groups.

Fig. 5. Correlation coefficient distribution histograms of four filter groups.

2.3 Sparse base and reconstruction performance

In real filter design and production, it is difficult to achieve good non-correlation between any two filters in a group, so the choice of sparse base has an important impact on the reconstruction results. Research on CS indicates that the sparse base matrix usually adopts a discrete cosine transform (DCT) matrix [12] or a discrete wavelet transform (DWT) matrix [13], both commonly used in image reconstruction. In this study, a DCT matrix, a DWT matrix, the unit diagonal matrix E, and a Gaussian base matrix are used for spectral reconstruction, and the reconstruction performances of the different sparse bases are compared and discussed.

The essence of the sparse base matrix is a projection domain in which the spectral signal is projected as a sparse signal, similar to a coordinate transformation matrix. A complete coordinate system therefore has a basic spectral reconstruction capability, and its corresponding matrix is the unit diagonal matrix E, so it is feasible to perform spectral reconstruction using E.

Following a Monte Carlo approach, we use nearly one thousand real spectra acquired with an ASD spectrometer (LiSen Optics, iSecField-HH) as the original spectrum data, at 1 nm resolution over the spectral range 400–700 nm. Filter group A is used for encoding, and these spectra are reconstructed with each of the four sparse bases. Figure 6 shows the reconstruction performance of the different sparse bases on the same original spectrum, and Table 1 lists the mean values of the reconstruction parameters for each sparse base. The GPSR algorithm runs on an Intel Core i5-11400.

Fig. 6. Reconstructed images of the same spectrum with different sparse bases.

Table 1. Comparison of the reconstruction performances of different sparse bases

As shown in Table 1, DCT has the smallest MSE, $8.0195 \times {10^{ - 4}}$, and the highest reconstruction accuracy among the four sparse bases. The MSE of DWT is $8.7309 \times {10^{ - 4}}$, slightly less accurate than DCT. The MSEs of E and the Gaussian base matrix are larger, reaching $1 \times {10^{ - 3}}$, indicating a clear accuracy gap relative to DCT and DWT. In addition, the single-reconstruction time of DCT is significantly shorter than that of the other sparse bases: 2/3 that of DWT and close to 1/3 that of the Gaussian base matrix. In summary, DCT offers fast reconstruction, relatively high accuracy, and strong spectral reconstruction capability.

Since the DCT sparse base matrix is essentially a two-dimensional DCT transform coefficient matrix whose size matches the number of spectral channels, M × N, DCT sparse base matrices of the same size are not necessarily identical. We therefore study the reconstruction performance of DCT sparse base matrices with different M and N. However, we find that the spectral reconstruction performance does not differ much across choices of M and N when randomly designed filters are used as the encoder. To correspond to the optimized DCT sparse base in Section 4.1.1, the parameters of the DCT sparse base in this section are set to M = 1, N = 300.

Additionally, we perform spectral reconstruction with the four filter groups designed in Section 2.2 to investigate their reconstruction performance. This experiment uses DCT as the sparse base matrix with a spectral range of 400–700 nm @ 1 nm. Table 2 compares the reconstruction parameters of the four filter groups.

Table 2. Comparison of the reconstruction performance of the four filter groups

In Table 2, the reconstruction times of the four filter groups are essentially the same, all on the order of $1 \times {10^3}\;\mu s$. The MSE of group D is the smallest at $5.3408 \times {10^{ - 4}}$, and that of group B is the largest at $1.0742 \times {10^{ - 3}}$. The MSEs of the four groups do not differ much, indicating that the reconstruction ability of different filter groups is similar for the same sparse base.

3. NNs in computational spectrometers

An NN is a complex computing structure based on the biological neuron model. NNs have solved many problems that were previously intractable, including the NP-hard problem mentioned in this study, which Hopfield first addressed with an NN [14]. Hopfield's work demonstrated the powerful computing capability of NNs and promoted their rapid development; NNs are now widely used in a variety of fields [15–17]. Kulkarni applied an NN to image compression reconstruction [18], the first study to solve the CS problem with an NN. Subsequently, Zhang used a feedforward neural network (FNN) for spectral reconstruction [19], and Bao used an NN to improve the reconstruction accuracy of a conventional spectral reconstruction algorithm [20].

In this study, a convolutional neural network (CNN) is constructed. The first layer is a fully connected (FC) layer without a bias term, used to simulate the transmittance of the filters; this layer plays the role of the encoder. The decoding network comprises two FC layers and three convolutional (Conv) layers and is used for spectral reconstruction. The first FC layer of the decoder converts the encoded signal into a matrix, three Conv layers then extract feature information from the output of the previous layer, and the last FC layer transforms the matrix output of the final Conv layer into the reconstructed spectrum. The ReLU activation function is used everywhere except the first and last layers. The structure can be summarized as (FC without bias) – FC – ReLU – (Conv – ReLU) × 3 – FC; the network structure diagram is shown in Fig. 7. The CNN parameter settings are as follows. Layer 1 is a $300 \times 15$ FC layer. Layer 2 is a $15 \times 169$ FC layer whose output is reshaped into a $13 \times 13$ matrix. Layer 3 has 64 convolution kernels of size $11 \times 11$. Layer 4 has 32 kernels of size $1 \times 1$. Layer 5 has one $7 \times 7$ kernel. Layer 6 is a $169 \times 300$ FC layer that flattens its input matrix into a $1 \times 169$ signal and maps it to the reconstructed spectrum.
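
To make the layer bookkeeping explicit, the following PyTorch sketch mirrors this architecture. The `padding='same'` choice, which keeps the $13 \times 13$ feature maps at a constant size through the $11 \times 11$ and $7 \times 7$ convolutions, is our assumption; the paper does not state how the spatial size is preserved.

```python
import torch
import torch.nn as nn

class SpectralNet(nn.Module):
    """Sketch of the (FC w/o bias) - FC - ReLU - (Conv - ReLU)x3 - FC structure."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(300, 15, bias=False)     # layer 1: filter transmittance
        self.fc1 = nn.Linear(15, 169)                     # layer 2: 15 -> 13x13
        self.conv1 = nn.Conv2d(1, 64, 11, padding='same') # layer 3: 64 kernels, 11x11
        self.conv2 = nn.Conv2d(64, 32, 1)                 # layer 4: 32 kernels, 1x1
        self.conv3 = nn.Conv2d(32, 1, 7, padding='same')  # layer 5: one 7x7 kernel
        self.fc2 = nn.Linear(169, 300)                    # layer 6: 13x13 -> spectrum
        self.relu = nn.ReLU()

    def forward(self, x):                 # x: (batch, 300) input spectrum
        y = self.encoder(x)               # encoded readings, one per filter
        h = self.relu(self.fc1(y)).view(-1, 1, 13, 13)
        h = self.relu(self.conv1(h))
        h = self.relu(self.conv2(h))
        h = self.relu(self.conv3(h))
        return self.fc2(h.view(-1, 169))  # reconstructed spectrum, no final activation
```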

Fig. 7. NN structure.

3.1 Dataset training

The NN is trained with the CAVE [21] and ICVL [22] hyperspectral image datasets. Both datasets have a spectral range of 400–700 nm @ 10 nm, amounting to 1,650,000 spectra in total. To achieve higher spectral resolution, we perform a data enhancement operation on the two datasets, increasing the number of spectral channels with least-squares fitting to obtain training sets with a spectral range of 400–700 nm @ 1 nm.
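
A possible form of this enhancement step is sketched below: each coarse spectrum is fitted by least squares and resampled on a 1 nm grid. The polynomial degree and the 300-channel fine grid are our assumptions; the paper states only that least-squares fitting is used.

```python
import numpy as np

wl_coarse = np.arange(400, 701, 10)   # 31 channels at 10 nm spacing (CAVE/ICVL)
wl_fine = np.arange(400, 700)         # 300 channels at 1 nm, matching the network input

def upsample_spectrum(s_coarse, deg=8):
    """Least-squares polynomial fit of one spectrum, resampled at 1 nm.
    The degree is an illustrative choice, not stated in the paper."""
    coef = np.polyfit(wl_coarse, s_coarse, deg)
    return np.polyval(coef, wl_fine)
```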

3.2 Loss function

The input and output of the NN are spectral data over 400–700 nm @ 1 nm, representing the original and reconstructed spectra. The goal of training the NN is to minimize the error between the original and the reconstructed spectrum:

$$\theta = \mathop{\arg\min}\limits_\theta \|S - \hat{S}\|_2$$
where $\theta $ is the set of NN parameters, S is the original spectrum, and $\hat{S}$ is the reconstructed spectrum.

Using Eq. (14) directly as the loss function of the NN, the error can in the ideal case be reduced to a small value and the most accurate filter transmittance obtained. However, according to CS, the most accurate transmittance matrix tends toward a random matrix when no other constraints are imposed. This has been demonstrated in past studies: the transmittance matrix obtained when only reconstruction accuracy is considered is a Gaussian random matrix [23]. Excessively random transmittance makes the design and fabrication of filters, especially broadband ones, very difficult. Therefore, the loss function must be optimized so that it decreases the error while ensuring the manufacturability of the filters.

To obtain manufacturable optical filters, the variation range of the weight parameters in the first layer must be strictly limited, which ensures the smoothness of the transmittance curves. The first-order and second-order derivatives of the row vectors of the first-layer weight matrix are driven to be minimal:

$$loss = \|S - \hat{S}\|_2 + {\delta _1}\sum |{W_{1,N - 1}} - {W_{2,N}}| + {\delta _2}\sum |{W^{\prime}_{1,N - 1}} - {W^{\prime}_{2,N}}|$$
where ${\delta _1}$ and ${\delta _2}$ are manually set hyperparameters, ${W_{1,N - 1}}$ is the matrix consisting of the first N-1 column vectors of the first-layer weight matrix, and ${W_{2,N}}$ is the matrix consisting of the last N-1 column vectors. $W^{\prime}_{1,N - 1}$ and $W^{\prime}_{2,N}$ are the derivative matrices of ${W_{1,N - 1}}$ and ${W_{2,N}}$, respectively. Figure 8 compares the filter transmittance obtained from our NN with that obtained from WER-Net [24]. Compared with WER-Net, our loss function has an additional second-order derivative term. This term amplifies the ratio between larger and smaller values, suppressing abrupt changes and reducing variations within a narrow amplitude range, which further increases the smoothness of the transmittance. With this variation limitation, the transmittance trained in the first layer becomes manufacturable and applicable.
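
A direct PyTorch rendering of Eq. (15) is given below; the `delta1` and `delta2` defaults are placeholders for the manually set hyperparameters, and `W` is the weight matrix of the first (encoder) layer with one filter per row.

```python
import torch

def smooth_encoding_loss(S, S_hat, W, delta1=1e-3, delta2=1e-3):
    """Loss of Eq. (15): reconstruction error plus first- and second-order
    difference penalties on the encoder weights W (filters x wavelengths)."""
    recon = torch.norm(S - S_hat, p=2)
    d1 = W[:, :-1] - W[:, 1:]            # first-order differences along wavelength
    d2 = d1[:, :-1] - d1[:, 1:]          # differences of the derivative (second order)
    return recon + delta1 * d1.abs().sum() + delta2 * d2.abs().sum()
```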

Fig. 8. Comparison between the filter transmittance of WER-Net and the NN in this study.

3.3 Encoding optimization

The neurons in this study use the classical M-P neuron model, which is structured as shown in Fig. 9.

Fig. 9. M-P neuron model.

The mathematical expression of the neuron model illustrated in Fig. 9 is:

$$\begin{array}{c} {y = f\left( {\mathop \sum \limits_{i = 1}^n {w_i}{x_i} + {b_i}} \right) = f({{W^T}X + B} )} \end{array}$$
where $X = [{{x_1},{x_2},\ldots ,{x_n}} ]$ represents the input, $W = [{{w_1},{w_2},\ldots ,{w_n}} ]$ represents the weight values corresponding to the different inputs, $B = [{{b_1},{b_2},\ldots ,{b_n}} ]$ represents the bias terms, and $f({\cdot} )$ represents the activation function. Ignoring the effect of the activation function, the mathematical expression of a fully connected layer is as follows:
$$\begin{array}{c} {Y = {W^T}X + B} \end{array}$$

The mathematical expression of the spectral encoding with the broadband spectrum filter is shown in Eq. (1). By discretizing Eq. (1), the following expression can be obtained:

$$\begin{array}{c} {Y = {\Phi ^T}X} \end{array}$$

If the bias term B in Eq. (17) is set to zero, Eqs. (17) and (18) have the same mathematical form; i.e., the weight matrix of a fully connected layer without a bias term is equivalent to the filter transmittance. Therefore, in this study, the first layer is set as a fully connected layer without a bias term to simulate the filter transmittance, which realizes the filter encoding design.
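
The equivalence can be checked in a few lines of PyTorch: a bias-free linear layer whose weight is set to ${\Phi ^T}$ reproduces the optical encoding of Eq. (18). The random transmittance here is purely illustrative (a physical transmittance would lie in [0, 1]).

```python
import torch
import torch.nn as nn

phi = torch.rand(300, 15)                 # transmittance: wavelengths x filters
layer = nn.Linear(300, 15, bias=False)    # bias-free FC layer (the encoder)
with torch.no_grad():
    layer.weight.copy_(phi.T)             # the weight matrix plays the role of Phi^T

x = torch.rand(1, 300)                    # one discretized input spectrum
assert torch.allclose(layer(x), x @ phi)  # Eq. (17) with B = 0 equals Eq. (18)
```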

As discussed in Section 3.2, we add a variation limitation to the loss function. However, when the NN is trained beyond a certain number of epochs, it outputs the transmittance shown in Fig. 10. This transmittance exhibits high-frequency perturbation across the full spectrum, which greatly increases the non-correlation but makes real filter production very difficult. To solve this problem, a hierarchical optimization method is used in this study. After the first fully connected layer has been trained to produce a sufficiently rich and producible transmittance, training of the first layer is stopped and the layer is saved; training of the reconstruction algorithm then continues. This hierarchical optimization further ensures the manufacturability of the filters and speeds up the training of the NN.
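
In a framework such as PyTorch, the hierarchical scheme amounts to freezing the encoder parameters once their transmittance is acceptable; `model` refers to the architecture sketch in Section 3 above.

```python
import torch

# Freeze the encoder once it yields a smooth, producible transmittance,
# then keep training only the decoder (illustrative idiom, not the authors' code).
for p in model.encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```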

Fig. 10. Perturbation transmittance curve.

3.4 CNN in spectral reconstruction

An NN has an extremely powerful computing capability and can approximate any function with only a single hidden layer and a finite number of neurons [25]. In this study, a CNN is used for spectral reconstruction, and the mathematical expression of the convolutional layer is:

$$\begin{array}{c} {y = \mathop \sum \limits_{i,j}^n {\theta _{i,j}}{x_{i,j}} + b} \end{array}$$
where ${\theta _{i,j}}$ represents the convolution kernel element in the i-th row and j-th column, ${x_{i,j}}$ represents the i-th row and j-th column input, and b represents the bias term.

In fact, an FNN can also perform spectral reconstruction, and some computational spectral reconstruction studies have used such networks [14,26], but FNNs have drawbacks. Training an FNN relies on finding correlated features in the data and optimizing the network parameters. For natural spectra, the data at adjacent wavelengths are strongly correlated, giving the spectral curve a slowly varying character, and this correlation facilitates NN training; spectral data at distant wavelengths, by contrast, show no obvious correlation and are useless for training. An FNN acquires features over the whole spectrum without a clear target, which leads to more iterations before convergence and occupies more computing resources. A CNN, in contrast, benefits from weight sharing and the sparse connections of local receptive fields [27], so it captures effective spectral features within adjacent ranges while avoiding the acquisition of useless features, giving faster convergence and higher reconstruction accuracy.

3.5 Activation function

The CS problem is NP-hard and is not a simple linear problem. As seen from Eqs. (16) and (19), when the activation function is not considered, the NN performs primarily linear computation and is not suitable for solving nonlinear problems. Therefore, according to the universal approximation theorem, an activation function is necessary to introduce nonlinearity into the NN [25].

Since the first layer simulates the filter encoding process, all hidden layers except the first use activation functions to achieve a nonlinear mapping. Among common activation functions, ReLU has strong nonlinear capability: as shown in Fig. 11, a four-layer CNN using ReLU reaches a 25% training error rate on CIFAR-10 six times faster than an equivalent network with the Tanh activation [28]. Therefore, the NN in this study uses ReLU as the activation function.

Fig. 11. Comparison of training speeds for ReLU and Tanh.

3.6 Training and simulation discussion

3.6.1 Training process

We use the data-enhanced hyperspectral dataset described in Section 3.1 as the training data, split into a training set and a test set at a ratio of 10:1. Because the training set is large, training on each spectrum individually would consume considerable computational resources and time; we therefore train the NN in batches, which greatly speeds up training while preserving accuracy. Figure 12 shows the training and test errors after each training epoch. After ten epochs the NN shows clear convergence, with the reconstruction MSE reaching $1.5 \times {10^{ - 4}}$ on the training set and $1.4 \times {10^{ - 4}}$ on the test set. Continuing to 300 epochs, the MSE decreases very slowly, indicating that training is close to its limit. Further training would be expensive, consuming excess time and computational resources, and could lead to overfitting, so we stop at this point. The final reconstruction MSE is $4.4 \times {10^{ - 5}}$ on the training set and $4.3 \times {10^{ - 5}}$ on the test set.
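
A minimal batch-training loop consistent with this description is sketched below, reusing `model`, `smooth_encoding_loss`, and `optimizer` from the earlier sketches; the batch size of 256 is an illustrative choice not given in the paper.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# spectra: (num_samples, 300) tensor of 1 nm training spectra
loader = DataLoader(TensorDataset(spectra), batch_size=256, shuffle=True)

for epoch in range(300):                  # training stops around 300 epochs in the text
    for (batch,) in loader:
        S_hat = model(batch)              # reconstructed spectra for the batch
        loss = smooth_encoding_loss(batch, S_hat, model.encoder.weight)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```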

Fig. 12. Training error and test error of NN training.

At 40 epochs, training is paused and the first-layer weight parameters are exported to obtain the filter transmittance. Figure 13 shows the transmittance curves of these 15 filters; the curves have excellent smoothness. Keeping the first-layer weights fixed, we then continue to train the reconstruction part of the NN until the MSE converges. Figure 14 shows the correlation coefficient distribution of the NN filters. Their mean correlation coefficient is 0.3314; compared with the filter groups of Section 2.2, the NN filters are relatively highly correlated.

Fig. 13. Spectral curves obtained from NN training.

Fig. 14. Correlation coefficient distribution histogram of NN filters.

3.6.2 Simulation

After training the NN, a large number of spectral reconstruction experiments are performed to verify its reliability. We use the nearly one thousand spectra discussed in Section 2.3 as input to the NN. Some reconstruction results are shown in Section 4.2 and compared with NN-GPSR. The mean MSE of the reconstructed spectra reaches $2.96 \times {10^{ - 6}}$. The memory occupied by the reconstruction process is 970 MByte, and the stored reconstruction algorithm files occupy 697 KByte. NN training and reconstruction are run on an Nvidia GeForce RTX 2060 platform.

To verify the physical realizability of these 15 spectral curves, we input the transmittance data into the inverse design network (IDN) of [19] to obtain the corresponding structural parameters, and we simulate these parameters with COMSOL Multiphysics 5.6 to obtain the real transmittance of the filters. The error between the COMSOL-simulated transmittance and the NN-trained transmittance is less than ${10^{ - 2}}$, indicating that the transmittance obtained from the NN can be produced.

In addition, we select two conventional CS algorithms, GPSR and OMP, and use them to reconstruct the spectral data described in Section 2.3. The single-spectrum reconstruction time and MSE are compared with those of the NN in Table 3.

Table 3. Comparison of the reconstruction performance for NN and the other algorithms

In Table 3, the MSE and reconstruction time of the NN are two to three orders of magnitude lower than those of the conventional algorithms, indicating that the spectral reconstruction capability of the NN is far superior. The NN has great potential in spectral reconstruction.

4. NN-GPSR method

An NN has a spectral reconstruction capability that is difficult to achieve with conventional algorithms, but it has problems of its own. The input is based on a transmittance curve with at least hundreds of variables, and the first layer simulates the encoding process with thousands of weight parameters, so each iteration of the first layer alone generates data on the order of 100,000 values. This dramatic increase in data volume demands a large system capacity, placing high requirements on the storage and computing capabilities of the platform. Moreover, the reconstruction process occupies 970 MByte of memory. This makes embedded integration of the network impossible on much hardware and makes it difficult to use the NN for in-situ measurements.

To solve these problems, we combine the NN with the conventional CS algorithm GPSR and propose a lightweight encoding and reconstruction method with highly correlated broadband optical filters. The encoding is performed with the filters trained by the NN, while the reconstruction uses the GPSR algorithm. This combined approach not only ensures spectral reconstruction accuracy but also greatly reduces the storage space of the reconstruction algorithm, opening the way to embedded integration and miniaturization.

4.1 Optimization of the reconstruction algorithm

We compare the reconstructed spectra of conventional GPSR with those of the NN; the results are shown in Fig. 15. The reconstructed spectrum of GPSR exhibits full-spectrum perturbations, a situation also reported in other studies [20]. This perturbation not only degrades the spectral reconstruction accuracy but also makes the reconstruction algorithm difficult to optimize, whereas the NN reconstructs the spectrum with an excellent fit and shows no such severe perturbation. After combining the NN with GPSR and performing spectral reconstruction, we find that the high-accuracy NN encoding avoids the full-spectrum perturbation of GPSR, allowing the reconstruction algorithm to be optimized further.

Fig. 15. Comparison of spectra reconstructed by conventional GPSR and the NN.

In this study, the GPSR reconstruction algorithm is optimized in two respects, improving its reconstruction accuracy.

4.1.1 Sparse base optimization

As discussed in Section 2.3, we use the DCT sparse base matrix for spectral reconstruction. The DCT matrix in this study is constructed from the two-dimensional discrete cosine transform, which is a separable linear transform: it is equivalent to performing a one-dimensional DCT along one dimension and then along the other. If the input matrix is A and the output matrix is B, the two-dimensional DCT is expressed as follows:

$$\begin{array}{c} {{B_{pq}} = {a_p}{a_q}\mathop \sum \limits_{m = 0}^{M - 1} \mathop \sum \limits_{n = 0}^{N - 1} {A_{mn}}\textrm{cos}\frac{{\pi ({2m + 1} )p}}{{2M}}\textrm{cos}\frac{{\pi ({2n + 1} )q}}{{2N}},} \end{array}$$
where $0\mathrm{\leqslant }p\mathrm{\leqslant }M - 1,{\; }0\mathrm{\leqslant }q\mathrm{\leqslant }N - 1,{\; }{a_p} = \left\{ {\begin{array}{c} {\frac{1}{{\sqrt M }},p = 0}\\ {\sqrt {\frac{2}{M}} ,1\mathrm{\leqslant }p\mathrm{\leqslant }M - 1} \end{array}} \right.,{\; }{a_q} = \left\{ {\begin{array}{c} {\frac{1}{{\sqrt N }},q = 0}\\ {\sqrt {\frac{2}{N}} ,1\mathrm{\leqslant }q\mathrm{\leqslant }N - 1} \end{array}} \right.$, and M and N represent the number of rows and columns in the input matrix A. A is a matrix of size $M \times N$.

Equation (20) can also be expressed as:

$$\begin{array}{c} {{B_{pq}} = {D_{mn,pq}}{A_{mn}}} \end{array}$$
where D is a two-dimensional DCT transform coefficient matrix, a square matrix for which the number of both rows and columns is $M \times N$. Each element in D is a product of two cosine values determined by different m, n, p, and q. When M and N are changed, the transformation coefficient matrix D is also changed.

Notably, the DCT sparse base matrix is the two-dimensional DCT transform coefficient matrix for which the number of spectral channels is equal to $M \times N$. For a spectrum dataset, the number of spectral channels remains the same, but M and N can be changed, and different M and N correspond to different DCT sparse base matrices, leading to changes in the reconstructed spectrum.
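
The construction of D for a given M and N can be written compactly with SciPy, using the separability of the 2-D DCT; the row-major flattening convention and the use of the orthonormal DCT-II are our assumptions.

```python
import numpy as np
from scipy.fft import dct

def dct_sparse_base(M, N):
    """2-D DCT transform coefficient matrix D of Eq. (21) for an M x N input,
    as an (M*N) x (M*N) matrix acting on the row-major flattened A."""
    C_M = dct(np.eye(M), axis=0, norm='ortho')   # 1-D orthonormal DCT-II matrix
    C_N = dct(np.eye(N), axis=0, norm='ortho')
    return np.kron(C_M, C_N)                     # vec(B) = D vec(A)

# sanity check against SciPy's separable 2-D DCT
M, N = 4, 6
A = np.random.rand(M, N)
B = dct(dct(A, axis=0, norm='ortho'), axis=1, norm='ortho')
assert np.allclose(dct_sparse_base(M, N) @ A.ravel(), B.ravel())
```

Since D is orthonormal, its transpose serves as the synthesis matrix, so `Psi = dct_sparse_base(1, 300).T` would give the sparse base with $M = 1, N = 300$ used later in this study.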

Due to the full-spectrum perturbation produced by GPSR alone, it is difficult to rank the reconstruction performance of different DCT sparse bases, whose MSEs all share the same order of magnitude. After combining the NN with GPSR, the NN filter transmittance significantly suppresses the perturbation of the spectral curve, allowing the reconstruction algorithm to be optimized according to the reconstruction accuracy of different DCT sparse base matrices.

In this study, we obtain different DCT sparse bases by varying M and N and use them to reconstruct the dataset described in Section 2.3 with the composite method. Figure 16 shows the mean MSE for different DCT sparse base matrices: the MSE is largest, $1.6861 \times {10^{ - 5}}$, at $M = 20,N = 15$, and smallest, $5.5188 \times {10^{ - 6}}$, at $M = 1,N = 300$. This order-of-magnitude difference shows that the choice of DCT sparse base can significantly affect spectral reconstruction. Figure 17 compares the reconstructed spectra with the original spectrum for the DCT sparse bases with the minimum and maximum MSE; the difference in reconstruction performance is significant.

Fig. 16. MSE of the reconstructed spectra with different DCT sparse bases.

Fig. 17. Comparison of the reconstructed and original spectra at the minimum and maximum MSEs.

Fig. 18. Samples of the reconstruction results.

Among the DCT sparse base matrices matching the number of spectral channels in this study, the one with $M = 1,N = 300$ has the minimum MSE. Although the spectral intensity is weak in the initial and final bands of the spectrum, this sparse base achieves the highest reconstruction accuracy. We therefore use it for spectral reconstruction in the following.

4.1.2 Applicability expansion of conjugate gradient method

The conjugate gradient (CG) method is an iterative method for solving systems of linear equations with sparse structure and is widely used in sparse signal reconstruction problems [29,30]. In GPSR, the CG method is applied after the algorithm described in Section 2.1 to further optimize the previously obtained sparse signal $z$ [6].

GPSR imposes an application condition on the conjugate gradient method to reduce computational complexity and time:

$$\begin{array}{c} {nonzero({z(: )} )\mathrm{\leqslant }length({y(: )} )} \end{array}$$

That is, the number of non-zero elements in the sparse signal z obtained as described in Section 2.1 must be smaller than the number of elements in the observed signal y. However, this condition is too strict and limits the use of the conjugate gradient method in GPSR. First, the reconstruction object of the conventional algorithm is a sparse signal in the strict sense, i.e., most elements of z are exactly zero apart from a few non-zero ones. Such signals are convenient to study, but the sparse projections of natural spectra are in most cases only relatively sparse: most elements are approximately zero rather than exactly zero. Under Eq. (22), such relatively sparse signals cannot be optimized as they should be, so the condition is not suited to the reconstruction of natural spectra. Additionally, during spectral reconstruction with the composite NN-GPSR method, the NN filter transmittance suppresses the perturbation of the spectral curve, and the elements approximating zero also contribute to the spectral reconstruction; Eq. (22), however, prevents these near-zero elements from being optimized.

For the above reasons, we remove the application condition of the conjugate gradient method so that it can directly optimize the obtained sparse signal z, as sketched below.
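
A loose sketch of the modified refinement step follows: the support-size guard of Eq. (22) is simply dropped, so the conjugate gradient refinement runs regardless of how many elements of z are active. This is our reading of the modification, not the authors' code.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def cg_refine(A, y, z):
    """CG refinement of the GPSR output with the guard of Eq. (22) removed:
    it runs no matter how many elements of z are active (illustrative sketch)."""
    active = z != 0                     # nonzero(z(:)), including near-zero entries
    if not active.any():
        return z
    As = A[:, active]
    n = As.shape[1]
    AtA = LinearOperator((n, n), matvec=lambda v: As.T @ (As @ v), dtype=A.dtype)
    z_act, _ = cg(AtA, As.T @ y, x0=z[active])   # normal equations over the support
    out = np.zeros_like(z)
    out[active] = z_act
    return out
```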

4.2 Simulation

We use the optimized GPSR as the reconstruction algorithm and the NN filters as the encoder to perform spectral reconstruction at 1 nm resolution over 400–700 nm, taking the real spectra described in Section 2.3 as the original spectra. Some reconstructed results are shown in Fig. 18. The mean MSE reaches $1.8409 \times {10^{ - 6}}$, and the MSE of some spectra even reaches $9 \times {10^{ - 7}}$, a high reconstruction accuracy. Figure 18 also compares the reconstructed spectra of NN-GPSR with those of the NN from Section 3.6; the NN-GPSR reconstructions fit better in some details. The memory occupied by the reconstruction process is 291.3 MByte. The NN-GPSR algorithm runs on an Intel Core i5-11400.

To test the robustness of the system, we add Gaussian random noise at different levels to the NN filter transmittance, simulating the manufacturing error introduced by the filter production process. Gaussian random noise with standard deviations $\sigma = 0.01$ and $0.001$ is added to the filter transmittance, and spectral reconstructions under these noise levels are compared with the noise-free case in Table 4. The MSE increases slightly with the noise level but stays on the order of ${10^{ - 6}}$, maintaining high reconstruction accuracy. Moreover, the single-reconstruction time does not increase significantly, demonstrating that the whole system is robust and highly tolerant of such errors.
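
The perturbation experiment can be expressed as follows; `Phi` and `Psi` refer to the NN filter transmittance and the DCT sparse base from the earlier sketches, and clipping the noisy transmittance to [0, 1] is our own assumption to keep it physical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Phi: (15, 300) NN filter transmittance; Psi: (300, 300) DCT sparse base
for sigma in (0.001, 0.01):
    Phi_noisy = np.clip(Phi + rng.normal(0.0, sigma, Phi.shape), 0.0, 1.0)
    A_noisy = Phi_noisy @ Psi   # perturbed observation matrix for NN-GPSR
    # ...rerun gpsr_basic(A_noisy, y, tau) and compare the MSE with the clean case
```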

Table 4. Comparison of the reconstruction performance for different random noises

4.3 Discussion

We perform spectral reconstruction with two conventional algorithms, GPSR and OMP, and two NN algorithms, PCSED [19] and WER-Net [24]. The MSE and reconstruction time are compared with those of the NN-GPSR composite method in Table 5.

Table 5. Comparison of NN-GPSR with other algorithms

Compared with the conventional algorithms, NN-GPSR has obvious advantages in spectral reconstruction: its reconstruction accuracy is 436 times that of conventional GPSR and 1924 times that of OMP, while its reconstruction time is only 53% of that of conventional GPSR and 35% of that of OMP. Compared with the other NNs, NN-GPSR also has higher reconstruction accuracy, 294 times that of PCSED and 51 times that of WER-Net, with reconstruction times of the same order of magnitude. These results indicate that the NN-GPSR composite method has an excellent reconstruction ability, comparable to or even better than that of other NNs.

In addition, we measure the storage of the NN reconstruction algorithm obtained as described in Section 3.6 and compare it with that of the reconstruction algorithm in NN-GPSR; the results are shown in Table 6. The reconstruction algorithm files of NN-GPSR occupy 25 KByte of storage, 3.59% of the NN reconstruction algorithm. The NN reconstruction process occupies 970 MByte of memory, which is prohibitive for embedded integration, whereas the NN-GPSR reconstruction process occupies 291.3 MByte, 29.33% of the NN figure, leaving considerable headroom and offering the possibility of in-situ measurement.

Table 6. Storage comparison of NN and NN-GPSR

5. Conclusion

In this study, we propose NN-GPSR, a composite method that applies optical filters trained by an NN as the encoder and an optimized GPSR as the decoder in computational spectroscopy. First, we introduce the GPSR method, design four PC filter groups, and perform spectral reconstruction with these filter groups and different sparse bases. Second, we construct the NN, setting the first layer as a fully connected layer without bias terms to simulate the filter encoding process; the filter transmittance is obtained by training the NN on the data-enhanced CAVE and ICVL spectral datasets over the range 400–700 nm @ 1 nm. Building upon the analysis of GPSR and the NN, we apply the learned high-correlation optical filters as the encoder and the optimized GPSR algorithm as the decoder: the sparse base matrix in the optimized GPSR is set to the DCT sparse base with $M = 1,N = 300$, and the application of the conjugate gradient method in GPSR is expanded. We reconstruct nearly one thousand real object spectra with the NN-GPSR method. The results show that the reconstruction accuracy of NN-GPSR is much better than that of the conventional algorithms, 436 times that of conventional GPSR and 1924 times that of OMP, with a reconstruction time only 53% that of conventional GPSR and 35% that of OMP. Compared with NNs, the reconstruction accuracy is 294 times that of PCSED and 51 times that of WER-Net, with reconstruction times of the same order of magnitude; the reconstruction capability of NN-GPSR is thus comparable to or better than that of other NNs. Equally importantly, the reconstruction algorithm of NN-GPSR occupies only 25 KByte, 3.59% of the NN reconstruction algorithm, and its reconstruction process occupies 291.3 MByte, 29.33% of the NN reconstruction process. As a result, NN-GPSR has great potential for the embedded design of spectrometers, broadening their application in real-time and in-situ scenarios.

Funding

National Natural Science Foundation of China (12203032); Natural Science Foundation of Shandong Province (ZR2022QA030).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in Ref. [21,22].

References

1. L. Huang, R. Luo, X. Liu, and X. Hao, “Spectral imaging with deep learning,” Light: Sci. Appl. 11, 61 (2022). [CrossRef]  

2. Z. Wang and Z. Yu, “Spectral analysis based on compressive sensing in nanophotonic structures,” Opt. Express 22(21), 25608 (2014). [CrossRef]  

3. E. J. Candes and T. Tao, “Decoding by Linear Programming,” IEEE Trans. Inform Theory 51(12), 4203–4215 (2005). [CrossRef]  

4. M. E. Gehm, R. John, D. J. Brady, R. M. Willett, and T. J. Schulz, “Single-shot compressive spectral imaging with a dual-disperser architecture,” Opt. Express 15(21), 14013–14027 (2007). [CrossRef]  

5. D. L. Donoho, M. Elad, and V. N. Temlyakov, “Stable recovery of sparse overcomplete representations in the presence of noise,” IEEE Trans. Inform Theory 52(1), 6–18 (2006). [CrossRef]  

6. M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, “Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems,” IEEE J. Sel. Top. Signal Process. 1(4), 586–597 (2007). [CrossRef]  

7. R. Baraniuk, “Compressive Sensing [Lecture Notes],” IEEE Signal Process. Mag. 24(4), 118–121 (2007). [CrossRef]  

8. D. L. Donoho and X. Huo, “Uncertainty principles and ideal atomic decomposition,” IEEE Trans. Inform Theory 47(7), 2845–2862 (2001). [CrossRef]  

9. C. C. Chang and H. N. Lee, “On the estimation of target spectrum for filter-array based spectrometers,” Opt. Express 16(2), 1056–1061 (2008). [CrossRef]  

10. E. Huang, Q. Ma, and Z. Liu, “Etalon Array Reconstructive Spectrometry,” Sci. Rep. 7(1), 40693 (2017). [CrossRef]  

11. J. Bao and M. G. Bawendi, “A colloidal quantum dot spectrometer,” Nature 523(7558), 67–70 (2015). [CrossRef]  

12. E. Candès, L. Demanet, D. Donoho, and L. Ying, “Fast Discrete Curvelet Transforms,” Multiscale Model. Simul. 5(3), 861–899 (2006). [CrossRef]  

13. M. Stéphane, A wavelet tour of signal processing (Academic Press, San Diego, 1999).

14. J. J. Hopfield, “Neural Networks and Physical Systems with Emergent Collective Computational Abilities,” Proc. Natl. Acad. Sci. 79(8), 2554–2558 (1982). [CrossRef]  

15. D. Kusumoto and S. Yuasa, “The application of convolutional neural network to stem cell biology,” Inflamm. Regen. 39(1), 14 (2019). [CrossRef]  

16. A. A. Shatskiy and I. Y. Evgeniev, “Neural Network Astronomy as a New Tool for Observing Bright and Compact Objects,” J. Exp. Theor. Phys. 128(4), 592–598 (2019). [CrossRef]  

17. M. Shi, W. Sun, T. Zhang, Y. Liu, S. Wang, and X. Song, “Geology prediction based on operation data of TBM: comparison between deep neural network and soft computing methods,” (IEEE, 2019), pp. 1–5.

18. K. Kulkarni, S. Lohit, P. Turaga, R. Kerviche, and A. Ashok, “ReconNet: Non-Iterative Reconstruction of Images from Compressively Sensed Measurements,” (IEEE, 2016), pp. 449–458.

19. W. Zhang, H. Song, X. He, L. Huang, X. Zhang, J. Zheng, W. Shen, X. Hao, and X. Liu, “Deeply learned broadband encoding stochastic hyperspectral imaging,” Light: Sci. Appl. 10(1), 108 (2021). [CrossRef]  

20. J. Zhang, X. Zhu, and J. Bao, “Solver-informed neural networks for spectrum reconstruction of colloidal quantum dot spectrometers,” Opt. Express 28(22), 33656 (2020). [CrossRef]  

21. F. Yasuma, T. Mitsunaga, D. Iso, and S. K. Nayar, “Generalized Assorted Pixel Camera: Postcapture Control of Resolution, Dynamic Range, and Spectrum,” IEEE Trans. Image Process. 19(9), 2241–2253 (2010). [CrossRef]  

22. B. Arad and O. Ben-Shahar, “Sparse Recovery of Hyperspectral Signal from Natural RGB Images,” (Springer International Publishing, Cham, 2016), pp. 19–34.

23. L. Wang, C. Sun, M. Zhang, Y. Fu, and H. Huang, “DNU: Deep Non-Local Unrolling for Computational Spectral Imaging,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020), pp. 1658–1668.

24. X. Ding, L. Yang, M. Yi, Z. Zhang, Z. Liu, and H. Liu, “WER-Net: A New Lightweight Wide-Spectrum Encoding and Reconstruction Neural Network Applied to Computational Spectrum,” Sensors 22(16), 6089 (2022). [CrossRef]  

25. K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Netw. 2(5), 359–366 (1989). [CrossRef]  

26. W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” Bull. Math. Biol. 52(1-2), 99–115 (1990). [CrossRef]  

27. Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE 86(11), 2278–2324 (1998). [CrossRef]  

28. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Commun. ACM 60(6), 84–90 (2017). [CrossRef]  

29. Z. Shi, S. Wang, and Z. Xu, “The convergence of conjugate gradient method with nonmonotone line search,” Appl. Math Comput. 217(5), 1921–1932 (2010). [CrossRef]  

30. Y. Ou and Y. Liu, “A nonmonotone superstorage gradient algorithm for unconstrained optimization,” J. Appl. Math. Comput. 46(1-2), 215–235 (2014). [CrossRef]  
