DQN based single-pixel imaging

Zhirun Wang; Wenjing Zhao; Aiping Zhai; Aiping Zhai; Peng He; Dong Wang; Dong Wang; Dong Wang

doi:10.1364/OE.422636

1. Introduction

Single-pixel imaging (SPI) allows for cost-efficiency and resolution enhancement, and thus it has potential in various imaging areas such as terahertz imaging [1,2], scattering imaging [3], infrared imaging [4,5], microscopic imaging [6], and 3D imaging [7,8]. SPI derived from quantum ghost imaging [9], which was followed by thermal light ghost imaging [10], until the computational ghost imaging [11], realized the authentic “single-pixel imaging”. However, a high sampling ratio is required for the reconstruction of a high-resolution image. How to reconstruct a high-quality image from fewer samplings, and thus reduce imaging time is one of the important issues of current researches.

To reduce the sampling ratio of SPI, the classic compressed sensing (CS) based on random subsampling of sparse or compressible signals was first proposed [12,13]. CS recovers the original signal from fewer samplings, which breaks the limitation of Shannon's sampling theorem. Although reducing the sampling ratio, CS increases the time of reconstruction because of the complexity of the reconstruction algorithm, so that rapid imaging is difficult to achieve.

Then, SPIs based on deterministic orthogonal transforms were presented which reconstructed the image relatively faster with good fidelity. There are SPIs based on Fourier transform (FSPI) [14–19], Hadamard transform (HSPI) [20–24], discrete cosine transform [25–27], wavelet transform [28,29], Krawtchouk moments transform [30] and so on. For this type of SPIs, to perfectly restore an $M \times N$ pixels image, at least $M \times N$ samplings are required. According to the characteristics of the orthogonal transforms, methods that artificially plan the sampling path to achieve high imaging quality at a low sampling ratio were proposed [22–24]. However, these methods are relatively complicated and not optimal, since different transforms and target objects demand different sampling paths, resulting in that it even could not work when the transform has no obvious characteristics. There is still a lack of a down-sampling strategy suitable for different orthogonal transforms based SPIs without the need for artificial planning.

Recently, the rapid development of artificial intelligence has brought subversive improvements in research methods for solving traditional problems in various fields, such as computational imaging to which SPI belongs. Deep learning has been used for the classification of moving objects based on single-pixel detection [31] and improving the imaging fidelity of SPI [32,33]. There also have been researches on deep learning based down-sampling methods for SPI [34–37]. In which, the deep learning algorithms utilized are data-driven, which require large datasets for the training, bringing problems of data collection and selection. Deep Reinforcement Learning (DRL) [38,39] is a rapidly developed branch of artificial intelligence, avoiding issues of data collection and selection. It has been successfully applied in the field of robotics and resource allocation [40]. DRL could be applied to optimize the sampling path in an orthogonal transform based SPI (OT-SPI).

Inspired by this, we proposed a down-sampling method for OT-SPIs using Deep Q-learning Network (DQN) [38,39] in DRL. We verified the proposed method on different OT-SPIs, e.g. HSPI and FSPI. The proposed method obtained the optimal sampling strategies in the transform domains for automatical down-sampling of the OT-SPIs, hence accelerating the imaging speed by setting the sampling ratio. Our method can achieve better imaging performance for different transforms and target objects, especially at low sampling ratios, while avoiding artificial planning the sampling strategies for the different transforms as the normal way does.

2. Principle

SPI uses a bucket or single-pixel detector to measure a series of intensity coefficients, and reconstructs the object through an inversion algorithm, which follows:

(1)$$y = \phi x + \eta ,$$

where $\phi $ is a matrix, each row of it represents a pattern to be projected, transforming x into y in the transform domain, y is the collected light intensity sequence and each element of it corresponds to a pattern in $\phi $, x is the target object and $\eta $. is ambient noise. The selection of the pattern depends on the transform employed. The acquisition time can be shortened by reducing the number of illumination patterns, however, this leads to loss of information, so that the imaging quality will degenerate. To accelerate imaging speed while degrading as little as possible of imaging quality, sampling strategies are elaborately planned in the normal way, and thus patterns that contribute more to reconstruction can be projected first. Nevertheless, the planned sampling strategies may not be the optimal ones.

Here, we take the HSPI and FSPI as examples to explain the proposed method. As shown in Fig. 1, the spectra of the two images after Hadamard transform (HT) and Fourier transform (FT) present significantly distinct distributions. For FT, the low-frequency components are focused on the center of the spectra as shown in Fig. 1(a) and (b), while for HT, the low-frequency components are concentrated on the upper left corner of the spectra as shown in Fig. 1(c) and (d). Besides, the distributions of the spectra of the ‘Cameraman’ image and ‘Pirate’ image after the same transform also show a slight discrepancy as can be seen from the comparison between Fig. 1(a) and (b) as wl as Fig. 1(c) and (d). The centralization characteristic of the spectra for the orthogonal transforms provides the possibility for the down-sampling reconstruction of a target object with the OT-SPIs, while the significant discrepancy in the spectra for different transforms, as well as the slight discrepancy in the spectra for different target objects, makes the sampling path artificially planning for accelerating OT-SPIs complicated since different transforms and target objects demand different sampling paths. As can be noted in Fig. 1, the zig-zag sampling path is more favorable for HT while the circular sampling path is more preferred for FT [41]. The shortcoming of the sampling path artificially planning as the normal way does is particularly prominent, since it may not work effectively when the spectra distribution of a target object after the transform has no obvious characteristics. Also, the circular sampling path may not be optimal for reconstructing a target object with a Fourier spectrum of radiation-like shape as shown in Fig. 1(a).

Fig. 1. Spectra of (a) ‘Cameraman’ image and (b) ‘Pirate’ image with the same resolution of 128×128 after FT, and spectra of (c) the ‘Cameraman’ image, and (d) ‘Pirate’ image after HT.

Download Full Size | PDF

To solve this problem, we proposed an optimized sampling method using DQN, which considers the sampling as decision-making, and the improvement of the reconstructed image as feedback, to obtain an optimal sampling strategy for an OT-SPI. The principle of DQN can be found in Appendix 1. Figure 2 depicts the schematic of the proposed sampling method using the DQN.

Fig. 2. Schematic of the proposed sampling method using the DQN. (a) Coefficient matrix generated for the training of the DQN based SPI. (b) Top: sampling path optimization process. Bottom: the single-step exploration of the DQN. (c) Down-sampling SPI using the optimized sampling path.

Download Full Size | PDF

As illustrated in Fig. 2(a), we first transformed the target object to obtain its spectrum in the transform domain, namely the coefficient matrix. In simulations, since the target image is available, the coefficient matrix can be obtained directly by an orthogonal transform of the image. As for experiments, the orthogonal transform can be achieved by projecting complete orthogonal patterns onto the target object and collecting the reflected intensities respectively to form the coefficient matrix.

Then, this coefficient matrix is utilized for training the DQN based SPI to obtain an optimized sampling path, which corresponds to the order of projection patterns as shown on the right side of Fig. 2(b). In the training process, the sampling path of the coefficient measurement of the target can be simply represented by sampling the coefficient matrix by the state matrix in DQN. The state of the DQN is in the form of a binary matrix, termed state matrix. State matrix is utilized as a flag of the current state of sampling by setting the coefficient of the sampled position to 1 and the unsampled position to 0. The optional action space of the DQN is the unsampled position, and the output of the action is the coordinates of the corresponding positions in the optional action space. The state matrix coefficient of the corresponding position of the current action is set to 1 when an action is taken, transforming the current state to the next.

As shown in Fig. 2(b), the state matrix is initialized to all 0 when an episode began. Each episode contains M${\times} $N steps for an image with a resolution of M${\times} $N. By training through steps and episodes, a sequence of state matrixes can be predicted by DQN, obtaining a sequence of patterns to be projected onto the target for optimal down-sampling of the OT-SPI. The order of projection patterns corresponds to the optimal sampling path of the coefficient measurement there.

One loop of a single step is shown at the bottom of Fig. 2(b). The agent of DQN decides the action according to the current state matrix, then takes the action to transit to the next state. After that, a partial coefficient matrix is obtained by calculating the Hadamard product of the next state matrix and the coefficient matrix. Then the reconstructed image of the target object can be obtained by an inverse transform of the partial coefficient matrix. Finally, the structural similarity (SSIM) [42] between the reconstructed image and image of the target object is fed back to the agent of DQN for calculating e reward following Eq. (2) below.

At a given sampling ratio $\beta $ set by users, the proposed method aims to reconstruct the target with as high as possible imaging fidelity following the predicted sampling path as shown in Fig. 2(b), so that every single-step sampling on the sampling path should contribute to the imaging fidelity as much as possible. Regarding the SSIM as the assessment metric for evaluating imaging fidelity, the reward function is related to the improvement of SSIM of current sampling, based on the fact that training of DQN inclines to maximize the reward:

(2)$$r = \alpha \times [SSIM(s + 1) - SSIM(s)],$$

where r is the reward, s and s+1 are the current and the next state, SSIM(s) and SSIM(s+1) are the SSIM value of the current and next state. The training process of the DQN is shown in Algorithm 1.

oe-29-10-15463-i001

After the training process above, an optimized sampling path that corresponds to the order of projection patterns, highlighted by the dash line rectangle is shown at the bottom of the right side of Fig. 2(b). It is shown in Fig. 2(c) that the target can be reconstructed at arbitrarily set sampling ratio based upon the optimal sampling path obtained, in the meanwhile guarantee the relatively optimal imaging fidelity.

3. Simulation

For comparing the proposed method with the artificial planning methods, SPIs using the two kinds of down-sampling methods were simulated, including HSPI using the optimal sampling path obtained by DQN (DQN-HSPI), HSPI using adaptive sampling along the zigzag path (AZ-HSPI), FSPI using adaptive sampling along the circular path (AC-FSPI) and FSPI using the optimal sampling path obtained by DQN (DQN-FSPI). The network of DQN is composed of two fully connected layers and the number of neurons in each hidden layer is 400. Figure 3 shows the reconstructed results of the ‘Cameraman’ image with a resolution of 64×64 at the sampling ratio of 5%, 10%, 20%, 40%, and 60%. Table 1 gives the calculated peak signal-to-noise ratio (PSNR) and SSIM of the reconstructed images for quantitative comparison.

Fig. 3. The reconstructed ‘Cameraman’ images at different sampling ratios using (a) AZ-HSPI, (b) AC-FSPI, (c) DQN-HSPI, and (d) DQN-FSPI.

Download Full Size | PDF

Table 1. PSNR and SSIM of the reconstructed images using the different methods

View Table | View all tables in this article

From the comparisons between Fig. 3(a) and (b) as well as Fig. 3(c) and (d), the DQN based methods i.e. DQN-HSPI and DQN-FSPI perform better than the artificial planning methods i.e. AZ-HSPI and AC-FSPI respectively, especially at low sampling ratios, since DQN based method can adopt a superior sampling path, leading to better effectiveness of reconstruction, as compared to the artificial planning method. For example, it can be seen that at a low sampling ratio of 5%, the shape of the tripod in the reconstructed image is hardly recognized for AZ-HSPI, while DQN-HSPI still can reconstruct the outline of the tripod.

As can be noted in Table 1 and Fig. 3, as the sampling ratio increases, the reconstructed results from both of the two kinds of down-sampling methods improve, overall the reconstructed results of the DQN based method are still better. At a high sampling ratio of 60%, the ‘Cameraman’ image all can be adorably reconstructed as shown in Fig. 3. As can be seen in Table 1, the differences in the SSIM values for the two kinds of down-sampling methods gradually decrease with the increase of the sampling ratio, which suggests the difference in the performance of the two kinds of down-sampling methods gradually shrinks. This is reasonable since the primary information of the natural image is mostly concentrated in the low-frequency area and the small details of it are in the high-frequency region.

To further study the influence of noise on the proposed method, DQN-HSPI and DQN-FSPI were simulated at sampling ratios of 5%, 10%, 20%, 40%, and 60% with the noise level ranging from 0dB to 20dB. Figure 4 shows the reconstructed results of the ‘Cameraman’ image with a resolution of 64×64. The calculated PSNR and SSIM are given in Table 2 and 3 for quantitative comparison.

Table 2. PSNR and SSIM of the reconstructed images at different noise levels using DQN-HSPI

View Table | View all tables in this article

Table 3. PSNR and SSIM of the reconstructed images at different noise levels using DQN-FSPI

View Table | View all tables in this article

Fig. 4. The reconstructed ‘Cameraman’ images using (a) DQN-HSPI and (b) DQN-FSPI at different sampling ratios with different levels of noise added.

Download Full Size | PDF

As shown in Fig. 4, Table 2 and Table 3, at the noise level of 20 dB, the SSIM values of the reconstructed images using DQN-HSPI and DQN-FSPI are increased when the sampling ratio increases from 5% to 20%. And then, they are gradually reduced a little bit. When the noise level is high, e.g. 0 dB, the quality of the reconstructed images using DQN-HSPI and DQN-FSPI are both gradually deteriorated as the sampling ratio increases. This is because when the sampling ratio is high, more noise is collected by DQN-HSPI and DQN-FSPI, which contaminates the coefficient matrices, especially the high-frequency components.

4. Experiment

Regarding the hypothesis above, experimental verifications were performed. The structure of the experimental system is shown in Fig. 5. Patterns are generated by PC and projected by a commercial projector (ACER V36X). The intensity sequence of the reflected light containing information of the target was collected by a single-pixel detector (KG-PR-200K-A-FS) and sampled by the DAQ (NI DAQ USB-6216), which was controlled by PC for OT-SPIs using the proposed method. The training of the DQN based OT-SPIs for generations of the optimal sampling paths, corresponding to the orders of projection patterns, was performed on the python tensorflow2.0 framework.

Fig. 5. Schematic diagram of the experimental setup.

Download Full Size | PDF

The reconstructed images of the target i.e. a tortoise model, obtained by AZ-HSPI and DQN-HSPI at the sampling ratios of 5%, 10%, 20%, 40%, and 60%, are shown in Fig. 6. To facilitate quantitative comparisons of the results, the PSNR and SSIM of the reconstructed images are calculated and shown in Table 4.

Fig. 6. The results with a resolution of 64×64 reconstructed at different sampling ratios using (a) AZ-HSPI and (b) DQN-HSPI, (c) and (d) are accordingly the coefficient matrices at the different sampling ratios.

Download Full Size | PDF

Table 4. PSNR and SSIM of the reconstructed images obtained by AZ-HSPI and DQN-HSPI

View Table | View all tables in this article

As can be seen in Figs. 6(a)-(b) as well as Table 4, similar to the simulation results, DQN-HSPI performs better at low sampling ratios than AZ-HSPI does while the relative high-quality images can be obtained at the sampling ratio of 60% using both methods.

As shown in Fig. 6(c), for AZ-HSPI, the distributions of the coefficient matrices are rudely truncated by the artificially planned sampling path at different sampling ratios, rendering regular triangles in different sizes. However, as noted in Fig. 6(d), unlike AZ-HSPI, the distributions of the coefficient matrices for DQN-HSPI are not artificially limited but freely distributed there, interestingly in spite of this, the overall distribution tendency of the coefficient matrices in Fig. 6(d) has some similarities to that for AZ-HSPI, which explains why the performance of DQN-HSPI is better than that of AZ-HSPI at a low sampling ratio since the proposed DQN-HSPI can give the optimal sampling path for collecting as much useful information as possible in whole transformation domain.

It is worth noting that when the sampling ratio increases, i.e. $\beta $=60% as shown in Table 4, AZ-HSPI performs better in SSIM than DQN-HSPI does in the experiments, which is a bit not agreed with the simulations shown in Table 1. This is because that in practice there exists inevitable noise that contaminates the coefficient matrices, especially the high-frequency components collected by DQN-HSPI are more likely to be contaminated by the noise during the experiment and therefore degrades the imaging fidelity, which is agreed well with the simulations in Fig. 4, Table 2, and Table 3. While the AZ-HSPI reduces the collection of unnecessary high-frequency components contaminated by the noise, and thus the influence of the high-frequency noise is eliminated. Besides, the simulations have suggested the differences in the SSIM values for the two methods should gradually shrink with the increase of the sampling ratio.

We also experimentally demonstrate SPI based on Fourier transform using the two kinds of down-sampling methods, i.e. AC-FSPI and DQN-FSPI. The reconstructed images of the same target, obtained by the methods at the sampling ratios of 3%, 5%, 10%, 20%, and 30%, are shown in Fig. 7. The PSNR and SSIM of the reconstructed images are calculated and shown in Table 5. As can be seen in Figs. 7(a)-(b) as well as Table 5, when the sampling ratio is less than 20%, the reconstruction results of DQN-FSPI are obviously better than that of AC-FSPI. With the increase of the sampling ratio, the differences in the SSIM values for the two methods gradually shrink which is agreed with the preceding simulations.

Fig. 7. The results with a resolution of 64×64 reconstructed at different sampling ratios using (a) AC-FSPI and (b) DQN-FSPI, (c) and (d) are accordingly the coefficient matrices at the different sampling ratios.

Download Full Size | PDF

Table 5. PSNR and SSIM of the reconstructed images using AC-FSPI and DQN-FSPI

View Table | View all tables in this article

As shown in Fig. 7(c), for AC-FSPI, the distributions of the coefficient matrices are rudely truncated by the artificially planned sampling path at different sampling ratios, rendering regular circulars in different sizes. However, as noted in Fig. 7(d), unlike AC-FSPI, the distributions of the coefficient matrices for DQN-FSPI are not artificially limited but freely distributed there. Interestingly, the distributions of the coefficient matrices for DQN-FSPI are diamond-shaped which seems to have some similarities to that for AC-FSPI, since the proposed DQN-HSPI can give the optimal sampling path for collecting as much useful information as possible in the whole transformation domain. The coefficient matrices in Fig. 7(d) clearly demonstrate that DQN-FSPI inclines to give a higher priority to the positions which have a large absolute coefficient in the transform domain. Positions of small absolute coefficient also include sometimes for its possible higher contribution under the metric of SSIM used in the training. This explains why the performance of DQN-FSPI is better than that of AC-FSPI at a low sampling ratio. Therefore, our proposed method is suitable for SPIs based on different orthogonal transforms, e.g. HT and FT, and is capable of obtaining an optimal sampling strategy directly, and thus it requires no artificial planning of the sampling path there, which eliminates the influence of the imperfect sampling path planning on the imaging performance.

It should be pointed out that the proposed DQN based OT-SPI technique is trained using a single target scene, which is normally consisted of objects and background. Changing the scene may result in the degradation of the performance of it. To test the adaptivity of the proposed technique, we first put four white bars on a black background and use the DQN based OT-SPIs, i.e. DQN-HSPI and DQN-FSPI, to image it at the sampling ratios of 10%, 20%, 40%, 60%, and 100%. The imaging results are shown in Fig. 8(a) and (c). Then, we put a cup ahead there to change the scene and use DQN-HSPI and DQN-FSPI following the same sampling path to image it. The imaging results are shown in Fig. 8(b) and (d). The targets can still be reconstructed which suggests that the technique has some adaptivity but with more or less a bit of degradation in the imaging quality there. The reasons are as follows: First, the large coefficients in the transform domain containing more information of the target scene concentrate on the low-frequency component mostly for orthogonal transforms. The sampling path in down-sampling demonstrates a similar tendency, making the adaptivity possible. Second, the target scene does not change drastically, thus the optimized sampling path is adaptable to the scene changed. Thus, the DQN-HSPI and DQN-FSPI still can work as shown in Fig. 8. The results suggest that the DQN based OT-SPIs have the potential for applications involving video monitoring, in which the target scene normally does not change drastically. To further make the proposed technique more generalized for completely different target scenes, we may resort to regularization [43] and meta-reinforcement learning (Meta-RL) [44–46], which is what we are going to do next.

Fig. 8. The reconstructed results of a target scene (a) and its changes (b) using DQN-HSPI, the results of the target scene (c), and its changes (d) using DQN-FSPI.

Download Full Size | PDF

5. Discussion

Through the above simulations and experiments, we have demonstrated SPIs based on HT and FT using our proposed DQN based method (i.e. DQN-HSPI and DQN-FSPI) and compared it with SPIs based on HT and FT using the artificial planning method (i.e. AZ-HSPI and AC-FSPI). The proposed DQN-HSPI and DQN-FSPI perform better than AZ-HSPI and AC-FSPI at low sampling ratios. Since, unlike AZ-HSPI and AC-FSPI, the distributions of the down-sampled coefficient matrices are confined within a partial regular region of the matrices, and the down-sampling is rudely forced to follow the artificially planned path which normally is not optimal. However, for DQN-HSPI and DQN-FSPI, the distributions of the down-sampled coefficient matrices are not confined within a partial regular region of the matrices but freely distributed on the matrices and the down-sampling is optimized by the DQN. Thanks to the DQN, the proposed technique is capable of obtaining an optimal sampling strategy directly, therefore, it requires no artificial planning of the sampling path there, which eliminates the influence of the imperfect sampling path planning on the imaging performance. As compared with the artificial planning method, in the proposed method, the inevitable training process of the DQN needs to take some time, however, the training process only needs to be done once before imaging, after the optimal sampling path is obtained, the imaging process is very fast and straightforward. Besides, to accelerate the training process, parallel processing, more robust training strategies for convergence, and more powerful processors could be used. Although the proposed method has been demonstrated using the Hadamard and Fourier transforms, it can also be applied to SPIs using other transforms such as discrete cosine transform, wavelet transform [28,29], Krawtchouk moments transform [30], and so on.

We speculate that the image quality assessment metric may be an important element in the sampling path acquirement. The reward in the proposed method is relevant to the improvement of the SSIM which is an image quality assessment metric concerning the structural information of objects. Therefore, DQN prefers the sampling path that increases the SSIM fast. A larger coefficient in the transform domain which represents the low-frequency component promotes the reconstruction more in most cases, while the high-frequency component with a smaller value of coefficient sometimes improves fidelity more for its possibly higher contribution for SSIM increasing then. Other image quality assessment metrics such as PSNR can be used as required.

There have been reported several deep learning based SPIs [32–37], in which the neural network is often used to fit the relationship between down-sampling image [35] or the coefficients in the transform domain [37] and the ground truth. However, a large data set is required in deep learning for its generalization and performance. The selection of the dataset has a great impact on reconstruction, and the preparation of a large dataset may be laborious. As compared with them, DQN is introduced to SPIs for optimizing the down-sampling in this work. In the training, only a single target scene is used. However, to some extent, it still has adaptivity to image changes of the target scene but with more or less degradation in imaging quality. To thoroughly solve this issue, we may resort to regularization and Meta-RL for improving the generalization of DQN in the future.

6. Conclusion

In summary, a DQN-based down-sampling method for SPI is proposed in which artificial planning is not needed and an optimal strategy for down-sampling for a target object can be obtained. The proposed method was verified on HSPI and FSPI through simulations and experiments. As shown in simulated and experimental results, the proposed method performs better than the artificial planning method especially at low sampling ratios following the sampling strategy optimized by DQN, eliminating the influence of the imperfect sampling path planning on the imaging performance. Besides HSPI and FSPI, the proposed method is also suitable for other OT-SPIs for down-sampling. The DQN based OT-SPIs have the potential for applications involving video monitoring, in which the target scene normally does not change drastically.

Appendix 1: Principle of DQN

In DRL, the learning agent interacts with the environment by executing actions and receiving observations and rewards. Every interaction with the environment, the agent selects an action $a \in A$ at state $s \in S$ where A is the optional action space and S is the state space, aiming to find an optimal strategy $\pi $. DQN is derived from Q-learning [47], in which the agent assigns a state-action value Q which is the function of the current state s and the action adopted a to estimate the value of an action in a given state under the optimal strategy $\pi $. The decision is made according to the Q value.

(3)$$a = \arg {\max\nolimits _a}{Q_\pi }(s,a),$$

The revision of the strategy is achieved by continuous optimization of the Q value. The Q value optimization process can be realized by iteration using the temporal difference [48]:

(4)$$Q(s,a) \leftarrow Q(s,a) + \alpha [r + \gamma \max Q(s + 1,a + 1) - Q(s,a)],$$

where s denotes the current state, a denotes current action, s+1 and a+1 are the next state and action, r denotes the reward of the current action, and $\gamma $ denotes the discount factor.

The Q-learning algorithm is incapable of solving high-dimensional issues while DQN can deal with this problem by combining Q-learning with the neural network. DQN describes the decision-making process by setting the current state and Q value as the input and output of the network respectively. The update of the Q value corresponds to the network update. The network parameters are optimized according to the loss function shown in Eq. (5).

(5)$$L(\theta ) = {E_{\{ s,a,r,s + 1\} \sim M}}[{(r + \gamma \max Q(s + 1,a + 1;{\theta ^ - }) - Q(s,a;\theta ))^2}],$$

where L is the loss function, M is the historical experience memory, which stores the previous status that the agent has gone through. Every interaction with the environment, the four-tuple {s, a, r, s+1} is stored in M so that the previous experience can be used to optimize the network. ${\theta ^ - }$ and $\theta $ are the parameters of two different networks, where $\theta $ is used for training, ${\theta ^ - }$ is used to store the previous network parameters only. Exchange the parameters of two networks i.e. ${\theta ^ - }$ and $\theta $ timely can speed up the convergence of the network.

Funding

National Natural Science Foundation of China (61805167); Taiyuan University of Technology (NO. TYUTRC-2019); .

Acknowledgment

We thank the support of the National Natural Science Foundation of China and Taiyuan University of Technology. We thank the Editors and Reviewers a lot for their efforts to help us improve the manuscript during this difficult time due to COVID-19.

Disclosures

The authors declare no conflicts of interest.

References

1. W. L. Chan, K. Charan, D. Takhar, K. F. Kelly, R. G. Baraniuk, and D. M. Mittleman, “A single-pixel terahertz imaging system based on compressed sensing,” Appl. Phys. Lett. 93(12), 121105 (2008). [CrossRef]

2. C. M. Watts, D. Shrekenhamer, J. Montoya, G. Lipworth, J. Hunt, T. Sleasman, S. Krishna, D. R. Smith, and W. J. Padilla, “Terahertz compressive imaging with metamaterial spatial light modulators,” Nat. Photonics 8(8), 605–609 (2014). [CrossRef]

3. E. Tajahuerce, V. Dur, P. Clemente, E. Irles, F. Soldevila, and P. Andr, “Image transmission through dynamic scattering media by single-pixel photodetection,” Opt. Express 22(14), 16945–16955 (2014). [CrossRef]

4. M. P. Edgar, G. M. Gibson, R. W. Bowman, B. Sun, N. Radwell, K. J. Mitchell, S. S. Welsh, and M. J. Padgett, “Simultaneous real-time visible and infrared video with single-pixel detectors,” Sci. Rep. 5(1), 10669 (2015). [CrossRef]

5. N. Radwell, K. J. Mitchell, G. M. Gibson, M. P. Edgar, R. Bowman, and M. J. Padgett, “Single-pixel infrared and visible microscope,” Optica 1(5), 285–289 (2014). [CrossRef]

6. V. Studer, J. Bobin, M. Chahid, H. S. Mousavi, E. Candes, and M. Dahan, “Compressive fluorescence microscopy for biological and hyperspectral imaging,” Proc. Natl. Acad. Sci. 109(26), E1679–E1687 (2012). [CrossRef]

7. R. J. Woodham, “Photometric method for determining surface orientation from multiple images,” Opt. Eng. 19(1), 191139 (1980). [CrossRef]

8. B. Sun, M. P. Edgar, R. Bowman, L. E. Vittert, S. Welsh, A. Bowman, and M. J. Padgett, “3D Computational imaging with single-pixel detectors,” Science 340(6134), 844–847 (2013). [CrossRef]

9. T. B. Pittman, Y. H. Shih, D. V. Strekalov, and A. V. Sergienko, “Optical imaging by means of two-photon quantum entanglement,” Phys. Rev. A 52(5), R3429–R3432 (1995). [CrossRef]

10. A. Gatti, E. Brambilla, M. Bache, and L. A. Lugiato, “Ghost imaging with thermal light: comparing entanglement and classical correlation,” Phys. Rev. Lett. 93(9), 093602 (2004). [CrossRef]

11. J. H. Shapiro, “Computational ghost imaging,” Phys. Rev. A 78(6), 061802 (2008). [CrossRef]

12. D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006). [CrossRef]

13. M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Signal Process. Mag. 25(2), 83–91 (2008). [CrossRef]

14. B. Xu, H. Jiang, H. Zhao, X. Li, and S. Zhu, “Projector-defocusing rectification for Fourier single-pixel imaging,” Opt. Express 26(4), 5005 (2018). [CrossRef]

15. Z. Zhang, X. Wang, G. Zheng, and J. Zhong, “Fast Fourier single-pixel imaging via binary illumination,” Sci. Rep. 7(1), 12029 (2017). [CrossRef]

16. L. Bian, J. Suo, X. Hu, F. Chen, and Q. Dai, “Efficient single pixel imaging in Fourier space,” J. Opt. 18(8), 085704 (2016). [CrossRef]

17. H. Jiang, S. Zhu, H. Zhao, B. Xu, and X. Li, “Adaptive regional single-pixel imaging based on the Fourier slice theorem,” Opt. Express 25(13), 15118–15130 (2017). [CrossRef]

18. J. Huang, D. Shi, K. Yuan, S. Hu, and Y. Wang, “Computational-weighted Fourier single-pixel imaging via binary illumination,” Opt. Express 26(13), 16547–16559 (2018). [CrossRef]

19. Z. Zhang, S. Liu, J. Peng, M. Yao, G. Zheng, and J. Zhong, “Simultaneous spatial, spectrum, and 3D compressive imaging via efficient Fourier single-pixel measurements,” Optica 5(3), 315–319 (2018). [CrossRef]

20. Q. Yi, Z. Lim, L. Li, G. Zhou, and F. Chau, “Hadamard-transform-based hyperspectral imaging using a single-pixel detector,” Opt. Express 28(11), 16126–16139 (2020). [CrossRef]

21. L. Martínez-León, P. Clemente, Y. Mori, V. Climent, J. Lancis, and E. Tajahuerce, “Single-pixel digital holography with phase-encoded illumination,” Opt. Express 25(5), 4975–4984 (2017). [CrossRef]

22. M. Sun, L. Meng, M. P. Edgar, M. J. Padgett, and N. Radwell, “Russian Dolls ordering of the Hadamard basis for compressive single-pixel imaging,” Sci. Rep. 7(1), 3464 (2017). [CrossRef]

23. W. Yu, “Super sub-Nyquist single-pixel imaging by means of cake-cutting Hadamard basis sort,” Sensors 19(19), 4122 (2019). [CrossRef]

24. X. Yu, R. I. Stantchev, F. Yang, and E. Pickwell-Macpherson, “Super sub-Nyquist single-pixel imaging by total variation ascending ordering of the Hadamard basis,” Sci. Rep. 10(1), 9338 (2020). [CrossRef]

25. B. Liu, Z. Yang, X. Liu, and L. Wu, “Coloured computational imaging with single-pixel detectors based on a 2D discrete cosine transform,” J. Mod. Opt. 64(3), 259–264 (2017). [CrossRef]

26. Q. Guo, H. Chen, Y. Wang, M. Chen, S. Yang, and S. Xie, “High-speed real-time image compression based on all-optical discrete cosine transformation,” Proc. SPIE 10076, 100760E (2017). [CrossRef]

27. Y. Chen, S. Liu, X. Yao, Q. Zhao, X. Liu, B. Liu, and G. Zhai, “Discrete cosine single-pixel microscopic compressive imaging via fast binary modulation,” Opt. Commun. 454, 124512 (2020). [CrossRef]

28. F. Rousset, N. Ducros, A. Farina, G. Valentini, C. D’Andrea, and F. Peyrin, “Adaptive basis scan by wavelet prediction for single-pixel imaging,” IEEE Trans. Comput. Imaging 3(1), 36–46 (2017). [CrossRef]

29. M. Alemohammad, J. R. Stroud, B. T. Bosworth, and M. A. Foster, “High-speed all-optical Haar wavelet transform for real-time image compression,” Opt. Express 25(9), 9802–9811 (2017). [CrossRef]

30. Y. Chen, X. Yao, Q. Zhao, Q. Zhao, S. Liu, X. Liu, C. Wang, and G. Zha, “Single-pixel compressive imaging based on the transformation of discrete orthogonal Krawtchouk moments,” Opt. Express 27(21), 29838–29853 (2019). [CrossRef]

31. Z. Zhang, X. Li, S. Zheng, M. Yao, G. Zheng, and J. Zhong, “Image-free classification of fast-moving objects using “learned” structured illumination and single-pixel detection,” Opt. Express 28(9), 13269–13278 (2020). [CrossRef]

32. T. Shimobaba, Y. Endo, T. Nishitsuji, T. Takahashi, Y. Nagahama, S. Hasegawa, M. Sano, R. Hirayama, T. Kakue, A. Shiraki, and T. Ito, “Computational ghost imaging using deep learning,” Opt. Commun. 413, 147–151 (2018). [CrossRef]

33. S. Rizvi, J. Cao, and Q. Hao, “Deep learning based projector defocus compensation in single-pixel imaging,” Opt. Express 28(17), 25134–25148 (2020). [CrossRef]

34. M. Lyu, W. Wang, H. Wang, H. Wang, G. Li, N. Chen, and G. Situ, “Deep-learning-based ghost imaging,” Sci. Rep. 7(1), 17865 (2017). [CrossRef]

35. Y. He, G. Wang, G. Dong, S. Zhu, H. Chen, A. Zhang, and Z. Xu, “Ghost imaging based on deep learning,” Sci. Rep. 8(1), 6469 (2018). [CrossRef]

36. C. F. Higham, R. Murray-Smith, M. J. Padgett, and M. P. Edgar, “Deep learning for real-time single-pixel video,” Sci. Rep. 8(1), 2369 (2018). [CrossRef]

37. F. Li, M. Zhao, Z. Tian, F. Willomitzer, and O. Cossairt, “Compressive ghost imaging through scattering media with deep learning,” Opt. Express 28(12), 17395–17408 (2020). [CrossRef]

38. Z. Wang, T. Schaul, M. Hessel, H. V. Hasselt, M. Lanctot, and D. Freitas, “Dueling network architectures for deep reinforcement learning,” arXiv preprint arXiv:1511.06581 (2015).

39. V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602 (2013).

40. S. S. Mousavi, M. Schukat, and E. Howley, “Deep reinforcement learning: an overview,” arXiv preprint arXiv: 1806.08894v1 (2018).

41. Z. Zhang, X. Wang, G. Zheng, and J. Zhong, “Hadamard single-pixel imaging versus Fourier single-pixel imaging,” Opt. Express 25(16), 19619–19639 (2017). [CrossRef]

42. W. Zhou, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]

43. J. Farebrother, M. C. Machado, and M. Bowling, “Generalization and regularization in DQN,” arXiv preprint arXiv:1810.00123 (2016).

44. J. Wang, Z. Kurth-Nelson, D. Tirumala, H. Soyer, J. Z. Leibo, R. Munos, C. Blundell, D. Kumaran, and M. Botvinick, “Learning to reinforcement learn,” arXiv preprint arXiv:1611.05763 (2016).

45. J. Wang, Z. Kurth-Nelson, D. Kumaran, D. Tirumala, H. Soyer, J. Z. Leibo, D. Hassabis, and M. Botvinick, “Prefrontal cortex as a meta-reinforcement learning system,” Nat. Neurosci. 21(6), 860–868 (2018). [CrossRef]

46. Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, L. Sutskever, and P. Abbee, “RL²: Fast reinforcement learning via slow reinforcement learning,” arXiv preprint arXiv:1611.02779 (2016).

47. R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning (MIT University, 1998).

48. R. S. Sutton, “Learning to predict by the method of temporal difference,” Mach Learn 3(1), 9–44 (1988). [CrossRef]

		Sampling ratio
	method	5%	10%	20%	40%	60%
PNSR (dB)	AZ-HSPI	15.04	15.80	16.43	22.06	23.53
	DQN-HSPI	19.68	21.23	23.91	27.84	32.63
	AC-FSPI	17.13	18.19	22.62	23.35	24.76
	DQN-FSPI	20.54	22.56	25.24	28.49	32.66
SSIM (%)	AZ-HSPI	45.8	57.0	70.5	82.3	90.8
	DQN-HSPI	55.5	65.2	76.6	88.2	94.2
	AC-FSPI	54.1	69.2	81.1	90.2	93.7
	DQN-FSPI	64.6	75.7	85.1	91.3	95.5

		Sampling ratio
	Noise level	5%	10%	20%	40%	60%
PNSR (dB)	0dB	15.75	14.71	12.61	9.72	7.85
	10dB	19.17	19.95	20.19	19.40	17.82
	20dB	19.62	21.11	23.39	25.70	26.71
SSIM (%)	0dB	22.4	23.6	22.0	15.5	12.0
	10dB	44.7	49.2	50.6	47.4	44.4
	20dB	54.2	62.7	70.8	74.8	73.9

		Sampling ratio
	Noise level	5%	10%	20%	40%	60%
PNSR (dB)	0dB	16.18	15.19	12.43	9.36	7.63
	10dB	19.96	20.90	20.50	19.19	17.62
	20dB	20.48	22.38	24.59	26.13	26.92
SSIM (%)	0dB	27.9	26.4	20.4	11.9	10.0
	10dB	52.8	55.8	51.3	45.7	41.4
	20dB	63.4	72.0	76.7	76.3	75.1

		Sampling ratio
	Method	5%	10%	20%	40%	60%
PNSR (dB)	AZ-HSPI	17.29	20.20	22.66	25.86	28.98
PNSR (dB)	DQN-HSPI	24.54	26.42	29.44	33.67	35.13
SSIM (%)	AZ-HSPI	49.2	65.4	83.6	91.7	97.2
SSIM (%)	DQN-HSPI	71.5	79.9	88.4	95.2	96.1

		Sampling ratio
	Method	3%	5%	10%	20%	30%
PNSR (dB)	AC-FSPI	18.57	19.27	20.36	23.54	25.62
PNSR (dB)	DQN-FSPI	24.36	25.47	27.54	30.44	32.20
SSIM (%)	AC-FSPI	57.4	63.5	76.8	89.3	93.3
SSIM (%)	DQN-FSPI	71.3	77.5	88.6	92.6	95.0

DQN based single-pixel imaging

Abstract

1. Introduction

2. Principle

3. Simulation

4. Experiment

5. Discussion

6. Conclusion

Appendix 1: Principle of DQN

Funding

Acknowledgment

Disclosures

References

Cited By

Figures (8)

Tables (5)

Equations (5)

Optics Express