## Abstract

Single-pixel imaging enables high-speed imaging, miniaturization of optical systems, and imaging over a broad wavelength range, which is difficult with conventional imaging sensors such as pixel arrays. However, a challenge in single-pixel imaging is the low image quality under undersampling. Deep learning is an effective way to address this challenge; however, it requires a large amount of memory for the internal network parameters. In this study, we propose single-pixel imaging based on a recurrent neural network. The proposed approach reduces the number of internal parameters, reconstructs images with higher quality, and is robust to noise.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Conventional imaging techniques often use pixel array devices, such as charge-coupled devices (CCDs). In contrast, single-pixel imaging (SPI) [1] uses only a single-element photodetector without spatial resolution. In SPI, engineered illumination patterns are sequentially projected onto target objects. The scattered (or transmitted) light from the target objects is collected by a lens and recorded by a single-element photodetector as measured data. Finally, signal processing is performed to reconstruct the target objects from the recorded data and the known illumination patterns. SPI has several advantages, including the ability to operate in low-light environments and to perform high-speed and broad-wavelength imaging, which is difficult for conventional CCD and complementary metal–oxide–semiconductor (CMOS) cameras. Owing to these advantages, SPI has been applied in various fields, such as bioimaging [2], remote sensing [3], object tracking [4], three-dimensional measurement [5,6], terahertz imaging [7], and holography [8,9].

The challenges of SPI are low image quality under undersampling and long measurement times for image reconstruction. Various methods have been proposed to address these challenges: inverse transform methods using orthogonal basis patterns, such as those of the Fourier and Hadamard transforms [10–12]; compressive sensing [13,14]; and ghost imaging based on correlation calculations [15–17]. In addition, deep-learning-based methods [18–22] have attracted significant attention. These methods can achieve high image quality even when the number of samples is low. Moreover, the calculation can easily be accelerated using graphics processing units that are tuned for deep-learning calculations.

In this study, we propose SPI using a recurrent neural network (RNN) [23] to improve the image quality and measurement time with fewer parameters than a deep neural network requires. In addition, our proposed method is more robust to noise than other neural-network approaches. An RNN is a network for handling time-series data, since it can take previous inputs into account. In SPI, the measured data can be considered time-series data because the illumination patterns are projected sequentially. We assume that the measured data are mutually related because they are obtained from a single object. In conventional methods using convolutional neural networks, the measured data are input all at once. In contrast, in our method, the measured data are divided into blocks and input sequentially. The information of the reconstructed image accumulated in the RNN is updated as each new block is entered. As a result, the quality of the final reconstructed image is improved.

The remainder of this paper is organized as follows. In Section 2, we describe the proposed method. In Section 3, we present the simulation and optical experimental results, and in Section 4, we provide our conclusions.

## 2. Proposed method

An RNN is a type of neural network that can efficiently handle time-series data owing to its recursive structure, as illustrated in Fig. 1. This recursive structure allows for fewer network parameters. The left part of the figure is a block diagram of an RNN with feedback, while the right part illustrates the RNN unrolled along the time direction. The unrolled view implies that an RNN naturally acquires a deep structure with few network parameters, since the unrolled time steps can be considered deep layers and the weights of all recursive layers are shared.

The mathematical expression of an RNN is

$$\boldsymbol{h}(t) = f(\boldsymbol{U}\boldsymbol{x}(t) + \boldsymbol{W}\boldsymbol{h}(t - 1) + \boldsymbol{b}), \qquad (1)$$

where $\boldsymbol{U}$ and $\boldsymbol{W}$ are the weight parameters for the current input and the previous hidden state, respectively, $\boldsymbol{b}$ is the bias parameter, $\boldsymbol{x}(t)$ is the input, $f(\cdot)$ is the activation function, and $\boldsymbol{h}(t)$ is the output via the activation function. The second term in (1) represents the feedback from the previous signal. RNNs have variants, such as long short-term memory (LSTM) [24] and gated recurrent units (GRUs) [25]. LSTMs and GRUs are effective for training on long time-series data; however, their network parameters tend to be larger than those of plain RNNs. In this study, we use a plain RNN because handling long time-series data is not required; thus, the number of network parameters can be reduced owing to the simple structure of the RNN.

Figure 2 presents our proposed method using an RNN. First, the illumination patterns are sequentially projected onto the target object. Subsequently, measured data of the light intensity are obtained from the scattered or transmitted light. The measured light intensities are expressed as

$$s_i = \sum\limits_{x,y} I_i(x,y)\, T(x,y), \qquad (2)$$

where $N$ denotes the number of patterns, $I_i(x,y)$ denotes the $i$th illumination pattern, $T(x,y)$ denotes the target object, and $s_i$ denotes the measured light intensity ($i = 1, \ldots, N$). The number of measured light intensities is the same as the number of patterns.
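Under these definitions, the measurement model can be sketched as an inner product between each flattened pattern and the flattened object (a NumPy sketch with dummy random patterns and a dummy object; the sizes match those used later in the paper):

```python
import numpy as np

def measure(patterns, obj):
    """Eq. (2): s_i = sum over (x, y) of I_i(x, y) * T(x, y), one value per pattern."""
    return patterns.reshape(patterns.shape[0], -1) @ obj.ravel()

# Illustrative sizes: N = 333 patterns of 64 x 64 pixels.
rng = np.random.default_rng(1)
patterns = rng.integers(0, 2, size=(333, 64, 64)).astype(float)  # 0/1 random patterns
obj = rng.random((64, 64))                                       # dummy target object
s = measure(patterns, obj)                                       # N measured intensities
```

Each entry of `s` corresponds to one single-pixel detector reading under one illumination pattern.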

Next, the measured light intensities are divided into blocks of length $T$ as follows:

$$\boldsymbol{s}^{(j)} = \left( s_{(j-1)T + 1},\; s_{(j-1)T + 2},\; \ldots,\; s_{jT} \right), \qquad j = 1, \ldots, M, \qquad (3)$$

where $M$ is the number of blocks.

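This block division amounts to a simple reshape (a sketch using the values from the paper, with dummy data standing in for the measured intensities):

```python
import numpy as np

# Divide the N = T * M measured intensities into M blocks of length T
# (N = 333, T = 37, M = 9 as in the paper; the data here are dummy values).
N, T, M = 333, 37, 9
s = np.arange(N, dtype=float)    # stand-in for the measured intensities s_1 .. s_N
blocks = s.reshape(M, T)         # block j holds s_{(j-1)T+1} .. s_{jT}
```

Block `j` is then fed to the RNN at step `j`, updating the accumulated reconstruction information.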
The structure of the proposed network, based on prior research [20], is presented in Fig. 3. $X$ and $Y$ denote the image size of the target object, while $T$ and $M$ denote the length of a block and the number of blocks, respectively; therefore, $N = T \times M$. In this study, we used $N = 333$, $T = 37$, $M = 9$, and an image size of $X = Y = 64$. We selected random patterns and the optimized patterns proposed in [20] as the illumination patterns. The network is an encoder-decoder neural network: during training, the encoder optimizes the projection patterns. When acquiring measured data, the optimized patterns are projected onto the objects, and the measured data are input to the decoder to obtain reconstructed images. Furthermore, we used the regularization function $\sum\nolimits_{i = 1}^{N} \sum\nolimits_{y = 1}^{Y} \sum\nolimits_{x = 1}^{X} (1 + I_i(x,y))^2 (1 - I_i(x,y))^2$ as a binary regularizer for the optimized illumination patterns. The value of this function decreases as training proceeds; as a result, the optimized pattern values are driven approximately to −1 or +1 thanks to the regularization function.
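A minimal NumPy sketch of this binary regularization term follows (the patterns here are dummy values; how the term is weighted against the reconstruction loss during training is not shown):

```python
import numpy as np

def binary_regularizer(patterns):
    """Sum over i, y, x of (1 + I_i(x, y))^2 * (1 - I_i(x, y))^2.
    The penalty is zero exactly when every pattern value is -1 or +1,
    so minimizing it drives the optimized patterns toward binary values."""
    return float(np.sum((1.0 + patterns) ** 2 * (1.0 - patterns) ** 2))

rng = np.random.default_rng(2)
binary = rng.choice([-1.0, 1.0], size=(4, 8, 8))   # perfectly binary patterns
relaxed = rng.uniform(-1.0, 1.0, size=(4, 8, 8))   # continuous-valued patterns
```

The penalty vanishes for `binary` and is strictly positive for `relaxed`, which is what pushes the optimized patterns toward ±1.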

The activation function of the RNN was the hyperbolic tangent, the activation function of the last convolutional layer was the leaky rectified linear unit (leaky ReLU), and the remaining activation functions were ReLUs.
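The RNN update of Eq. (1) with the hyperbolic tangent activation can be sketched as follows (a NumPy sketch with illustrative dimensions, not those of our network):

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, b):
    """One step of Eq. (1): h(t) = tanh(U x(t) + W h(t-1) + b)."""
    return np.tanh(U @ x_t + W @ h_prev + b)

# Illustrative dimensions: input size 4, hidden size 8.
rng = np.random.default_rng(0)
U = rng.standard_normal((8, 4)) * 0.1   # weights for the current input
W = rng.standard_normal((8, 8)) * 0.1   # weights for the previous hidden state
b = np.zeros(8)                         # bias

h = np.zeros(8)                          # initial hidden state
for x_t in rng.standard_normal((5, 4)):  # 5 time steps (e.g., measurement blocks)
    h = rnn_step(x_t, h, U, W, b)        # hidden state accumulates past inputs
```

Note that `U`, `W`, and `b` are shared across all time steps, which is why the unrolled network in Fig. 1 is deep yet has few parameters.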

## 3. Experiment

We compared the proposed network illustrated in Fig. 3 with the conventional networks [20] illustrated in Fig. 4. The number of illumination patterns *N* is 333 for all networks. For random illumination patterns, we compared Fig. 3 with Fig. 4(a), while for optimized illumination patterns, we compared Fig. 3 with Fig. 4(b). The method for obtaining the optimized patterns is the same for all networks; however, the optimization results differ for each network. We used the STL-10 dataset [26] to train the networks, with 90,000 training images and 10,000 validation images. The number of training epochs was determined by early stopping. We used Adam as the optimizer. The number of parameters was 873,425 for the network in Fig. 3, 898,225 for that in Fig. 4(a), and 1,376,193 for that in Fig. 4(b).
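Early stopping as used here can be sketched as follows (the patience value is a hypothetical choice for illustration; the paper does not specify one):

```python
def early_stop_epoch(val_losses, patience=5):
    """Return the epoch at which training stops: the first epoch at which the
    validation loss has failed to improve for `patience` consecutive epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1  # never triggered: train to the last epoch

# Validation loss improves, then stagnates; training stops `patience`
# epochs after the last improvement.
losses = [1.0, 0.8, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66]
stop = early_stop_epoch(losses)
```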

We evaluated the proposed network in numerical simulations using three test datasets: MNIST, Fashion-MNIST [27], and STL-10, with all images resized to $64 \times 64$. Figure 5 presents the reconstructed results of MNIST using each network, while Table 1 presents the quantitative evaluation. With random pattern illumination, the proposed network was able to reconstruct the images, whereas the conventional network was not. With optimized pattern illumination, the images reconstructed by the proposed network had less noise than those produced by the conventional network. In the quantitative evaluation, the proposed network demonstrated better performance than the conventional network on MNIST.

Figure 6 presents the reconstruction results using Fashion-MNIST, while Table 2 presents the quantitative evaluation. When using random pattern illumination, similar results were obtained to those in Fig. 5. The proposed network was able to reconstruct the images, whereas the conventional network was not. When using optimized pattern illumination, the proposed network was more effective in reconstructing sparse objects, whereas the conventional network was more effective in reconstructing dense objects.

Finally, Fig. 7 presents the reconstruction results using the STL-10 dataset, while Table 3 presents the quantitative evaluation. When using random pattern illumination, the proposed network was able to reconstruct the images, whereas the conventional network was not. When using optimized pattern illumination, both networks were able to reconstruct the images well, but the conventional network performed slightly better for almost all objects, as the images reconstructed by the proposed network were blurred.

The results of these experiments indicate that the proposed network was superior to the conventional network when using random pattern illumination and for sparse objects.

Next, we evaluated the robustness of the proposed network to noise through numerical simulations in which white noise was added to the illumination patterns. The white noise was drawn from a zero-mean normal distribution. The values of the random patterns were 1 and 0, while the values of the optimized patterns were 1 and −1. The standard deviation $\sigma$ of the white noise ranged from 0.0 to 1.0. Figures 8 and 9 present graphs of the robustness and the reconstructed images, respectively, when using random patterns with white noise. Similarly, Figs. 10 and 11 present graphs of the robustness and the reconstructed images, respectively, when using optimized patterns with white noise.
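The noise model can be sketched as follows (a NumPy sketch with dummy ±1 patterns; the $\sigma$ values follow the sweep described above):

```python
import numpy as np

def add_white_noise(patterns, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to the patterns."""
    return patterns + sigma * rng.standard_normal(patterns.shape)

rng = np.random.default_rng(3)
optimized = rng.choice([-1.0, 1.0], size=(10, 64, 64))  # dummy +1/-1 optimized patterns
# Sweep the noise level as in the robustness evaluation.
noisy = {sigma: add_white_noise(optimized, sigma, rng)
         for sigma in (0.0, 0.25, 0.5, 0.75, 1.0)}
```

At $\sigma = 0$ the patterns are unchanged; larger $\sigma$ corrupts the effective illumination seen by the detector.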

When using random pattern illumination, the robustness of the conventional network remained unchanged, as this network could hardly reconstruct the images, as illustrated in Fig. 9, whereas the proposed network maintained better image quality than the conventional network as the white noise increased. As illustrated in Figs. 10 and 11, when using optimized pattern illumination, the proposed network was less affected by noise than the conventional network.

Finally, we performed an experiment to reconstruct images of target objects with the actual optical system illustrated in Fig. 12. We used a DLP LightCrafter 6500 (Texas Instruments) to project the illumination patterns. Figure 13 presents the optical reconstruction results. The results indicate that the proposed network performed better than the conventional network for both random and optimized patterns. For optimized pattern illumination, the pattern values were +1 and −1, where the −1 values were realized by subtracting the measurement of the inverse pattern [20]. These results demonstrate that the proposed network was less affected by noise than the conventional network in an actual optical system.
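The differential measurement of ±1 patterns [20] can be sketched as follows (a NumPy sketch with a dummy pattern and object):

```python
import numpy as np

def signed_measure(pattern_pm1, obj):
    """Realize a +1/-1 pattern on a projector that can only display 0/1:
    measure the positive part and its inverse pattern separately,
    then subtract the two recorded intensities."""
    pos = (pattern_pm1 > 0).astype(float)  # displayable 0/1 pattern
    neg = 1.0 - pos                        # its inverse pattern
    return np.sum(pos * obj) - np.sum(neg * obj)

rng = np.random.default_rng(4)
pattern = rng.choice([-1.0, 1.0], size=(64, 64))  # dummy optimized pattern
obj = rng.random((64, 64))                        # dummy target object
s = signed_measure(pattern, obj)
```

Because `pos - neg` equals the original ±1 pattern, the subtracted intensity matches the ideal signed measurement $\sum_{x,y} I(x,y)\,T(x,y)$.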

## 4. Conclusion

In this study, we proposed SPI based on an RNN combined with convolutional layers. Our results demonstrated that the proposed network is effective in reconstructing sparse objects and is robust to noise. SPI cannot reconstruct the details of objects when the number of measurements is insufficient; as a result, the spatial resolution of the reconstructed images becomes low. In the proposed method, we divided the measured data into blocks, so the number of measurements per block is small. This may reduce the spatial resolution of the reconstruction contributed by each block; consequently, we consider that the spatial resolution of the reconstructed images is slightly reduced compared with the conventional convolutional networks. In contrast, the proposed method averages the reconstructed images by accumulating previous reconstruction information in the RNN. This averaging effect works well for noise reduction in sparse objects. In future work, we plan to improve the proposed network to reconstruct high-quality images of not only sparse objects but also more complex objects.

## Funding

Yazaki Memorial Foundation for Science and Technology.

## Acknowledgments

This research is supported by Yazaki Memorial Foundation for Science and Technology.

## Disclosures

The authors declare no conflicts of interest.

## References

**1. **M. P. Edgar, G. M. Gibson, and M. J. Padgett, “Principles and Prospects for Single-Pixel Imaging,” Nat. Photonics **13**(1), 13–20 (2019). [CrossRef]

**2. **S. Ota, R. Horisaki, Y. Kawamura, M. Ugawa, I. Sato, K. Hashimoto, and K. Waki, “Ghost cytometry,” Science **360**(6394), 1246–1251 (2018). [CrossRef]

**3. **B. I. Erkmen, “Computational ghost imaging for remote sensing,” J. Opt. Soc. Am. A **29**(5), 782–789 (2012). [CrossRef]

**4. **D. Shi, K. Yin, J. Huang, K. Yuan, W. Zhu, C. Xie, D. Liu, and Y. Wang, “Fast tracking of moving objects using single-pixel imaging,” Opt. Commun. **440**, 155–162 (2019). [CrossRef]

**5. **B. Sun, M. P. Edgar, R. Bowman, L. E. Vittert, S. Welsh, A. Bowman, and M. J. Padgett, “3D computational imaging with single-pixel detectors,” Science **340**(6134), 844–847 (2013). [CrossRef]

**6. **M. J. Sun and J. M. Zhang, “Single-Pixel Imaging and Its Application in Three-Dimensional Reconstruction: A Brief Review,” Sensors **19**(3), 732 (2019). [CrossRef]

**7. **L. Olivieri, J. S. Totero Gongora, L. Peters, V. Cecconi, A. Cutrona, J. Tunesi, R. Tucker, A. Pasquazi, and M. Peccianti, “Hyperspectral terahertz microscopy via nonlinear ghost imaging,” Optica **7**(2), 186–191 (2020). [CrossRef]

**8. **P. Clemente, V. Durán, E. Tajahuerce, P. Andrés, V. Climent, and J. Lancis, “Compressive holography with a single-pixel detector,” Opt. Lett. **38**(14), 2524–2527 (2013). [CrossRef]

**9. **Y. Endo, T. Tahara, and R. Okamoto, “Color single-pixel digital holography with a phase-encoded reference wave,” Appl. Opt. **58**(34), G149–G154 (2019). [CrossRef]

**10. **Z. Zhang, X. Ma, and J. Zhong, “Single-Pixel Imaging by Means of Fourier Spectrum Acquisition,” Nat. Commun. **6**(1), 6225 (2015). [CrossRef]

**11. **Z. Zhang, X. Wang, G. Zheng, and J. Zhong, “Fast Fourier Single-Pixel Imaging via Binary Illumination,” Sci. Rep. **7**(1), 12029 (2017). [CrossRef]

**12. **Z. Zhang, X. Wang, G. Zheng, and J. Zhong, “Hadamard Single-Pixel Imaging versus Fourier Single-Pixel Imaging,” Opt. Express **25**(16), 19619–19639 (2017). [CrossRef]

**13. **O. Katz, Y. Bromberg, and Y. Silberberg, “Compressive ghost imaging,” Appl. Phys. Lett. **95**(13), 131110 (2009). [CrossRef]

**14. **M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, “Single-Pixel Imaging via Compressive Sampling,” IEEE Signal Process. Mag. **25**(2), 83–91 (2008). [CrossRef]

**15. **T. B. Pittman, Y. H. Shih, D. V. Strekalov, and A. V. Sergienko, “Optical imaging by means of two-photon quantum entanglement,” Phys. Rev. A **52**(5), R3429–R3432 (1995). [CrossRef]

**16. **J. H. Shapiro, “Computational ghost imaging,” Phys. Rev. A **78**(6), 061802 (2008). [CrossRef]

**17. **F. Ferri, D. Magatti, L. A. Lugiato, and A. Gatti, “Differential ghost imaging,” Phys. Rev. Lett. **104**(25), 253603 (2010). [CrossRef]

**18. **M. Lyu, W. Wang, H. Wang, H. Wang, G. Li, N. Chen, and G. Situ, “Deep-learning-based ghost imaging,” Sci. Rep. **7**(1), 17865 (2017). [CrossRef]

**19. **T. Shimobaba, Y. Endo, T. Nishitsuji, T. Takahashi, Y. Nagahama, S. Hasegawa, and T. Ito, “Computational ghost imaging using deep learning,” Opt. Commun. **413**, 147–151 (2018). [CrossRef]

**20. **C. F. Higham, R. M. Smith, M. J. Padgett, and M. P. Edgar, “Deep Learning for Real-Time Single-Pixel Video,” Sci. Rep. **8**(1), 2369 (2018). [CrossRef]

**21. **A. L. Mur, F. Peyrin, and N. Ducros, “Recurrent Neural Networks for Compressive Video Reconstruction,” IEEE 17th International Symposium on Biomedical Imaging (ISBI), 1651–1654 (2020).

**22. **K. Komuro, T. Nomura, and G. Barbastathis, “Deep ghost phase imaging,” Appl. Opt. **59**(11), 3376–3382 (2020). [CrossRef]

**23. **J. L. Elman, “Finding Structure in Time,” Cognit. Sci. **14**(2), 179–211 (1990). [CrossRef]

**24. **S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput. **9**(8), 1735–1780 (1997). [CrossRef]

**25. **K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation,” Proc. Conf. on Empirical Methods in Natural Language Processing (EMNLP), 1724–1734 (2014).

**26. **A. Coates, H. Lee, and A. Y. Ng, “An analysis of Single-Layer Networks in Unsupervised Feature Learning,” Appearing in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), Vol 15 of Proceedings of Machine Learning Research, 215–223 (PMLR, Fort Lauderdale, FL, 2011).

**27. **H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms,” arXiv:1708.07747 (2017).