## Abstract

Ghost imaging (GI) usually requires a large number of samplings, which limits its performance, especially when dealing with moving objects. We investigate a deep learning method for GI, and the results show that it can enhance image quality at sampling rates as low as 3.7%. With a convolutional denoising auto-encoder network trained on numerical data, blurry images obtained from few samplings can be denoised. The denoised outputs are then used to reconstruct both the trajectory and a clear image of the moving object via cross-correlation based GI, with the number of required samplings reduced by two-thirds.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Ghost imaging (GI) [1,2] is an imaging method based on the second-order correlation between a set of reference patterns and the corresponding echoes from the object. It can be achieved in a lensless way [3], which opens possibilities for x-ray imaging and terahertz applications [4–7], where array sensors and/or lenses for the relevant spectrum are hard to obtain. With the illumination field actively controlled and calculated, GI can be realized with only a single-pixel detector [8–10]. Owing to its mathematical structure, GI is easily combined with compressed sensing and machine learning [11,12]. In addition, GI can reach better performance at low photon flux, since the echoes are collected by bucket detection [13–16]. However, a large number of samplings are required to obtain a correlation of fairly high quality, which limits the performance and application of GI, especially when dealing with moving objects.

Many schemes have been proposed for tracking and imaging moving objects via GI. The main approach is to reduce the sampling duration, and thereby the relative displacement within it. An additional precise aiming and tracking system can keep the imaging system gazing at the target, at the cost of optical and mechanical complexity. Alternatively, increasing the refresh frequency of the illumination source [17–19] reduces the time consumed by the samplings and thus improves the imaging quality for moving objects. However, the tolerable speed of the object remains limited, since the relative displacement during the sampling duration is expected to be smaller than the spatial resolution of the system. Beyond these two approaches, an image-free tracking method based on single-pixel imaging (SPI) has also been proposed [20]. By estimating the displacement from 6 Fourier basis patterns, this method enables real-time tracking, but no image is obtained. To achieve both tracking and imaging, cross-correlation based ghost imaging (CBGI) [21] has been demonstrated. From the cross-correlation between blurry images at different transients, the displacement, and thus the trajectory, of the object can be retrieved, after which an image of high quality can be obtained. Both tracking and imaging are thus realized with a standard GI setup and a linear algorithm. An acceptable correlation peak is required for displacement retrieval, which in turn requires a certain contrast-to-noise ratio (CNR) [22] in the blurry images. Therefore, if the quality of those temporary images can be raised with the same or even fewer samplings, the performance of CBGI can be further improved.

Different algorithms have been proposed to improve the imaging quality of GI, such as differential GI [23], ghost imaging with normalization [24,25], and ghost imaging with background subtraction [22]. These algorithms achieve better results than the conventional GI algorithm, but cannot further improve the CNR of the image for a given number of samplings. Situ and co-workers proposed a deep-learning-based GI scheme in which a deep neural network was trained to learn the sensing model and increase the quality of image reconstruction [11]. Shimobaba et al. proposed a deep-learning method for computational ghost imaging that predicts low-noise images from new noise-contaminated GI images [26]. Further, machine learning based on a denoising auto-encoder has been applied to medical imaging such as ultra-low-dose CT [27]. With those deep learning methods [11,27,28], images can be obtained with the sampling rate down to 5% [11]. In this paper, we investigate deep learning based GI with even fewer samplings and demonstrate its application to tracking and imaging.

To increase the CNR of images obtained with fewer samplings, we trained a convolutional denoising auto-encoder network with numerical inputs. The temporary blurry images are then denoised by this network, and the cleaned outputs are used for CBGI. The experiments show that the number of required samplings can be reduced to one third of that required in the previous CBGI scheme. Therefore, the tolerable speed of moving objects can be remarkably improved. The effects of noise are also discussed for different illumination intensities.

## 2. Theoretical analysis and model training

In a ghost imaging (GI) system, an image can be reconstructed in practice by

CBGI provides an effective way for tracking and imaging, by retrieving the displacement of a moving object between two different moments from blurry images, each of which can be obtained from a small number of samplings. Since the displacement is obtained from the cross-correlation result, the peak-sidelobe contrast-to-noise ratio (PSCNR) is crucial, which is

where $N_C$ is the number of samplings for each temporary image, $T$ is the number of pixels of the object, $E$ is the number of pixels occupied by the edge of the object, and $M$ is the number of pixels of the image. Comparing Eq. (3a) with Eq. (2), PSCNR grows faster than the CNR of the image. Therefore, CBGI can obtain the displacement from fewer samplings; that is, the number of samplings needed per blurry image to retrieve the displacement is lower than that needed for imaging. To improve the performance of CBGI, we try to further reduce the required number of samplings by introducing deep learning.

To improve image quality for a given number of samplings, or to reduce the number of samplings required for a given CNR, we investigate a deep learning method based on a convolutional denoising auto-encoder (CDAE). An auto-encoder [29] is an unsupervised learning model consisting of three layers: the input layer, the hidden layer, and the output layer. The input data is encoded into the hidden layer, and the decoder then maps the hidden layer to the output. The goal is to decrease the loss between the reconstruction and the ground truth. This is only a framework; it must be combined with other algorithms to express the encoder and decoder. Noise is added to the training data, so that the denoising auto-encoder is trained to denoise the image, as shown in Fig. 1(a), and thereby learns a robust representation of the input data. The task is to reconstruct $Z$ from the corrupted data $\tilde {X}$, with the ground truth being $X$. $q_D$ is the operation that corrupts the data for training. The corrupted $\tilde {X}$ is mapped to the hidden layer $Y$ by $f_\theta$, and the reconstruction $Z$ is then decoded from $Y$ by $g_{\theta '}$. During training, the reconstruction error $L_H(X,Z)$ is minimized. To preserve the spatial position of the object within the 2-dimensional image, a convolutional neural network is applied in our scheme.

To train the network, a data set $D_n =\left \{ X_1,X_2,\ldots ,X_n\right \}$ is first prepared, where each $X_i$ is a 2D image matrix. $\tilde {X_i}$ is the corrupted data generated by adding noise to $X_i$. The pairs of original and corrupted inputs, $P_n=\left \{(X_1,\tilde {X_1}),\ldots ,(X_n, \tilde {X_n})\right \}$, serve as the training set. The network takes $\tilde {X_i}$ as the input and maps it to the hidden representation $Y$ using the function $Y=f_\theta (\tilde {X_i}) = s(W*\tilde {X_i} + b)$, parameterized by $\theta = \left \{W, b\right \}$, where $*$ denotes convolution and $s$ is the rectified linear unit (ReLU), used as the activation function in our scheme. The hidden representation $Y$ is then used to reconstruct the clean output $Z$ by the reverse mapping $Z=g_{\theta '}(Y)=s(W'*Y+b')$ with parameters $\theta ' = \left \{ W', b' \right \}$. The parameters of the CDAE are optimized by minimizing the cost function between $X$ and $Z$ over the training set. Here, the cost function is the mean squared error $MSE = \frac {1}{n} \sum _{i=1}^{n} (X_i-Z_i)^2$.
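
The encoder and decoder mappings above can be sketched in a few lines of NumPy. This is a minimal illustration only, assuming a single channel, one $3\times 3$ kernel per layer, and hand-picked (untrained) weights; the helper names `conv2d_same`, `cdae_forward`, and `mse` are hypothetical and do not describe the trained network used in this work.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide an odd-sized kernel over a zero-padded image (output keeps the input size).

    The kernel is not flipped, as is conventional in CNN libraries.
    Requires NumPy >= 1.20 for sliding_window_view.
    """
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)

def relu(x):
    return np.maximum(x, 0.0)

def cdae_forward(x_noisy, W, b, W_dec, b_dec):
    y = relu(conv2d_same(x_noisy, W) + b)          # Y = s(W * X~ + b)
    z = relu(conv2d_same(y, W_dec) + b_dec)        # Z = s(W' * Y + b')
    return y, z

def mse(x, z):
    return np.mean((x - z) ** 2)

rng = np.random.default_rng(0)
x_clean = np.zeros((32, 32))
x_clean[8:24, 14:18] = 1.0                          # toy binary object (assumption)
x_noisy = x_clean + rng.normal(0.0, 0.5, x_clean.shape)   # corrupted input X~

# Placeholder weights: an averaging encoder and an identity decoder.
# A trained CDAE would learn these values by minimizing the MSE instead.
W = np.full((3, 3), 1.0 / 9.0)
W_dec = np.zeros((3, 3))
W_dec[1, 1] = 1.0

y, z = cdae_forward(x_noisy, W, 0.0, W_dec, 0.0)
loss_before = mse(x_clean, x_noisy)
loss_after = mse(x_clean, z)
print(f"MSE before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Even with these untrained placeholder weights the reconstruction error drops, because the averaging kernel suppresses pixel-wise Gaussian noise; training replaces the placeholders with kernels adapted to the object class.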

The data set for training is generated numerically. Based on the MNIST data set, we resized $1000$ pictures to $256\times 256$ pixels, which are taken as $\{X_i\}$ for training. Random Gaussian noise is added to generate $\{\tilde {X}_i\}$, simulating the statistical errors in GI with a small number of samplings. By training with these numerical data, we obtain the CDAE model for denoising images. The performance of the model is then tested, with some results shown in Fig. 2. The first row of Fig. 2 shows the temporary images obtained according to Eq. (1), using 300 reference patterns captured from the experimental setup described in Section 3 and bucket detection values calculated as the inner product between the patterns and the numerical binary objects. These images are treated as the input for testing. The denoised images are shown in the second row. Denoising each blurry image takes about 0.13 s (on an Intel Core i7-8700K CPU at 3.7 GHz and an RTX 2080 Ti GPU).
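
The way such temporary test images arise can be illustrated with a small simulation. The sketch below is written under stated assumptions: the patterns are generated numerically with negative-exponential statistics (rather than captured from the setup), the reconstruction uses the common correlation form $\langle BI\rangle - \langle B\rangle \langle I\rangle$, and `empirical_cnr` is an assumed empirical estimate of contrast-to-noise ratio, not the definition used for Eq. (2). The function names are hypothetical.

```python
import numpy as np

def simulate_gi(obj, n_samplings, rng):
    """Simulate GI of a static binary object and return the correlation image."""
    # Assumption: one independent speckle per pixel, exponential intensity statistics.
    patterns = rng.exponential(1.0, size=(n_samplings,) + obj.shape)
    # Bucket value = inner product between each pattern and the object.
    bucket = np.tensordot(patterns, obj, axes=([1, 2], [0, 1]))
    # Second-order correlation with the background term removed.
    correlation = (bucket[:, None, None] * patterns).mean(axis=0)
    return correlation - bucket.mean() * patterns.mean(axis=0)

def empirical_cnr(image, obj):
    """Assumed estimate: contrast between object and background over their spread."""
    inside, outside = image[obj == 1], image[obj == 0]
    return (inside.mean() - outside.mean()) / np.sqrt(inside.var() + outside.var())

rng = np.random.default_rng(1)
obj = np.zeros((16, 16))
obj[4:12, 6:10] = 1.0                      # toy binary object

blurry = simulate_gi(obj, 300, rng)        # few samplings: noisy image
cleaner = simulate_gi(obj, 3000, rng)      # ten times more samplings
cnr_300 = empirical_cnr(blurry, obj)
cnr_3000 = empirical_cnr(cleaner, obj)
print(f"CNR with 300 samplings: {cnr_300:.2f}, with 3000: {cnr_3000:.2f}")
```

The few-sampling image plays the role of the noisy network input; the gap between the two CNR values is the statistical error that the denoiser is meant to compensate.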

The trained model is then used for CBGI. The temporary blurry images obtained from a small number of samplings are denoised by the CDAE. From the output images, both the trajectory and the image of the moving object can be obtained, as shown schematically in Fig. 1(b). Since the images are denoised, the PSCNR of the cross-correlation is improved. Conversely, for a given PSCNR, a smaller number of samplings is required. Therefore, the sampling time required for each temporary image is reduced and CBGI can work for faster moving objects.
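
The displacement-retrieval step can be sketched as locating the peak of the cross-correlation between two temporary images. The example below is a minimal illustration, assuming circular (FFT-based) cross-correlation, integer-pixel shifts, and lightly noisy toy frames; `estimate_displacement` is a hypothetical helper, not the implementation of [21].

```python
import numpy as np

def estimate_displacement(img_a, img_b):
    """Return (dy, dx) such that img_b is approximately img_a shifted by (dy, dx)."""
    fa = np.fft.fft2(img_a - img_a.mean())
    fb = np.fft.fft2(img_b - img_b.mean())
    # Circular cross-correlation via the correlation theorem.
    xcorr = np.fft.ifft2(np.conj(fa) * fb).real
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Map peak indices above half the image size to negative shifts.
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, xcorr.shape))

rng = np.random.default_rng(2)
frame_a = np.zeros((64, 64))
frame_a[20:27, 30:33] = 1.0                                # toy object
frame_b = np.roll(frame_a, shift=(3, -5), axis=(0, 1))     # object moved by (3, -5)

# Mild Gaussian noise stands in for the residual noise after denoising (assumption).
frame_a_noisy = frame_a + rng.normal(0.0, 0.05, frame_a.shape)
frame_b_noisy = frame_b + rng.normal(0.0, 0.05, frame_b.shape)

shift = estimate_displacement(frame_a_noisy, frame_b_noisy)
print(shift)
```

The noisier the two frames, the lower the correlation peak stands above its sidelobes, which is why denoising the temporary images first allows the same peak quality to be reached with fewer samplings.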

## 3. Experiments and results

To demonstrate our method, experiments are performed with a standard GI setup, as shown in Fig. 3. A 532 nm laser is directed onto a rotating ground glass (RGG) to generate a pseudo-thermal source. A lens $L1$ with a focal length of 500 mm then performs a Fourier transform. The aperture $P$ is placed on the Fourier plane of $L1$ to control the field of view. A 4-$f$ system, consisting of two lenses $L2$ ($f=125\,\mathrm{mm}$) and $L3$ ($f=170\,\mathrm{mm}$), is employed to adjust the scale of the speckles on the object plane. The beam is then divided into the reference arm and the object arm by a polarization beam splitter (PBS). With the PBS, the illumination intensity is easier to adjust, and the polarization of the light on the surface of the digital micromirror device (DMD) is set to match the device. In the reference arm, the speckle is captured directly by the camera CCD1. In the object arm, the patterns are imaged onto the surface of the DMD, which is employed to simulate the scene to be imaged. The light reflected from the DMD is collected by the camera CCD2, which serves as the bucket detector.

A digit "7" is displayed on the DMD as the object. As the DMD refreshes, the digit appears to move within an area of $512\times 512$ pixels, which is illuminated by the patterns. The trace of the object is composed of 41 positions and is heart shaped, as shown in Fig. 4. The object moves from one position to the next in turn. The reference is recorded by the camera CCD1, with a size of $256\times 256$ pixels. The bucket detection is performed by the other camera, CCD2. The sampling frame rate is 25 Hz in the experiments.

To evaluate the performance and improvement of our method, we use two measures for comparison. Since the trajectory and the image of the object are obtained by estimating the displacement, the precision of this estimation affects the quality of the outputs. To measure the precision of displacement estimation, the mean deviation of the trajectory (MDT) is calculated as

$$MDT = \frac{1}{n}\sum_{i=1}^{n}\left| x_i - x_{0i}\right|,$$

with $n$ being the number of positions, $x_i$ being the experimentally estimated position at the $i^{th}$ transient, and $x_{0i}$ being the real position. The final reconstructed image is retrieved from the original detection data and the displacement information. The more accurate the motion information is, the better the quality of the final reconstruction will be. Therefore, the quality of the final reconstruction offers a more intuitive comparison of the two methods. We take CNR as the measure of imaging quality.

Firstly, the experiments are performed under illumination of sufficient intensity. With the signal-to-noise ratio (SNR) of all the detection being high enough, the statistical error is the dominant noise, since only a small number of samplings are acquired. The statistical noise increases as the number of samplings decreases. It influences the quality of the blurry images, which in turn affects the tracking precision and the final reconstruction. In the experiment, the SNR of the bucket detection is calculated as

The above discussion is based on the assumption that the illumination intensity is sufficient and the SNR of the detection is high enough, so that the noise is dominated by the statistical error, which obeys a Gaussian distribution. The quality of the final reconstruction improves with a larger number of samplings, as the results also show. However, if the illumination intensity is insufficient, the detection noise or background noise will affect the final results.

To test the performance of our method in the presence of detection noise, we attenuate the illumination intensity in the object arm. With the photon flux being low, the exposure time and the gain of CCD2 are increased to obtain acceptable detection results, so the detection noise is also relatively increased. The moving object is then tracked and imaged with CBGI and with our method. In this case, the noise is mainly composed of statistical noise and detector noise. With the SNR of the bucket detection being 0.55, the final images for different numbers of samplings are shown in Fig. 4(e). The results for MDT and CNR are shown in Fig. 5. Compared with the results under sufficient illumination, the tracking precision and the quality of the final images decrease when detection noise is involved. With 300 samplings, CBGI fails to reconstruct the image of the moving object under such noise. In contrast, our method provides good reconstruction of the trace and images, and does not fail until the number of samplings is reduced to 100. With a bucket-detection SNR of 0.55, the deep learning method still works at a sampling rate of about 5.6%. From the calculated CNR, our method requires about one third of the samplings needed by CBGI for acceptable reconstruction, with both statistical errors and detection noise considered.
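
The tracking-precision figure compared above can be computed from estimated and true positions as in the following sketch. It assumes that the deviation at each transient is the Euclidean distance, in pixels, between the estimated and the real position; the function name and the toy coordinates are illustrative only.

```python
import numpy as np

def mean_deviation_of_trajectory(estimated, actual):
    """Average distance between estimated and real positions over all transients."""
    estimated = np.asarray(estimated, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Assumption: deviation is the Euclidean norm of the 2-D position error.
    return np.linalg.norm(estimated - actual, axis=1).mean()

# Toy trajectory with three positions (not experimental data).
actual = [(0, 0), (10, 0), (10, 10)]
estimated = [(0, 0), (13, 4), (10, 10)]    # second position is off by (3, 4)

mdt = mean_deviation_of_trajectory(estimated, actual)
print(mdt)   # deviations are 0, 5 and 0 pixels, so the mean is 5/3
```

A single mis-located transient raises the mean only moderately, so the per-position deviations are worth inspecting alongside the average when a correlation peak is occasionally lost.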

It should be noted that the CDAE model is trained with simulated noise, since we mainly concentrate on denoising images obtained from a small number of samplings. As is known, the performance of a deep learning network depends strongly on the training set and the parameters. When the number of samplings goes up to 400 or 500, the CDAE network is not working at its best settings, whereas the quality of the temporary images is relatively high and CBGI works well. Therefore, CBGI performs better than our method with 400 or 500 samplings, as shown in Fig. 5(a).

Besides, since the number of samplings required for each temporary image is reduced, the tolerable speed of the moving object can be higher. To avoid motion blur, the displacement during the sampling time of each temporary image is expected to be smaller than the spatial resolution of the system. That is, the sampling rate of the system is expected to satisfy [18]

$$f > \frac{N_C v}{l\theta _r},$$

where $f$ is the sampling rate of the system, $N_C$ is the number of samplings for each temporary image, $v$ is the speed of the moving object, $l$ is the distance between the illumination system and the object, and $\theta _r$ is the angular resolution of the system. The tolerable speed of the object can then be estimated. In our experiments, the average size of speckles is $292.19\,\mu\mathrm{m}$. The distance between the illumination system and the object (the distance between $L3$ and the DMD in the experimental setup) is $230\,\mathrm{mm}$ and the angular resolution of the system is $1.27\,\mathrm{mrad}$. Limited by the bucket detector we used, the sampling rate is $f=25\,\mathrm{Hz}$. With $N_C=50$, and taking the speckle size as the spatial resolution $l\theta _r$, requiring the displacement $vN_C/f$ during one temporary image to stay below this resolution gives the tolerable speed $v_{max}=l\theta _r f/N_C = 292.19\,\mu\mathrm{m}\times 25\,\mathrm{Hz}/50 \approx 146.1\,\mu\mathrm{m/s}$ and the maximum angular speed $\omega _{max} = v_{max}/l$. In practice, devices with much faster response can be used. For example, using a DMD with a $20\,\mathrm{kHz}$ refresh rate to generate the pseudo-thermal source, and a bucket detector with a bandwidth higher than $20\,\mathrm{kHz}$, the tolerable angular speed becomes $511\,\mathrm{mrad/s}$ with the same experimental setup. Compared to CBGI, the number of required samplings is reduced by two thirds, so the tolerable speed is improved by a factor of three.

If the illumination intensity decreases further, to the level of a few photons per pixel, the echoes become very weak and even shot noise can affect the final results. We estimate the performance of our method in such cases by numerical simulation. The same scene size and the same moving object as in the above experiments are used. The illumination patterns are assumed to be ideal: they are generated numerically and satisfy the intensity distribution of pseudo-thermal light. The photons are assumed to arrive at the object plane individually, with the total number of illumination photons being $m$ for each sampling. Which specific pixel each photon arrives at is unknown, but the probability is determined by the intensity distribution of the illumination pattern, as

$$P_{z_i} = \frac{I_{z_i}}{\sum _{j=1}^{M} I_{z_j}},$$

where $z_i$ is the corresponding pixel on the reference arm, $I_{z_i}$ is the recorded intensity at $z_i$, and $M$ is the number of pixels of the scene/image. $P_{z_i}$ is then the probability that a photon arrives at the pixel $z_i$. In our simulation, the average size of speckles is set to one pixel, so the average number of speckles equals the number of pixels. With $m$ photons illuminating the scene, the average number of photons arriving at the object plane within the area of one speckle is $k = m/M$. The larger $k$ is, the better the quality of the reconstruction will be. To obtain the bucket detection results, the object is set as binary and propagation loss is not considered. The detector is also taken as ideal, with the quantum efficiency set to 1. With 500 samplings and different values of $k$, the results of CBGI and our method are shown in Fig. 6. The first row shows the results of conventional GI according to Eq. (1), for a static object. The second row shows the results of CBGI, with the object moving as in the above experiments. It can be seen that at very weak illumination, CBGI cannot reconstruct any image of the object due to the low SNR. The results of our method are shown in the third row; it works for $k$ as low as 1.

## 4. Conclusion

In summary, we trained a CDAE network to denoise the blurry GI images obtained from a small number of samplings. The results show that it can enhance the quality of images even when the sampling rate is as low as 3.7%. With the denoised images, the trace and a clear image of a moving object are reconstructed via CBGI. Using our method, the number of samplings required for CBGI can be reduced to one third. The performance of our method is tested under both statistical errors and detection noise. We also numerically simulated the case of weak illumination; the results can be helpful for understanding the effects of shot noise in GI. With our method, the performance of CBGI can be greatly improved, bringing tracking and imaging via GI closer to practice. Since the tracking is based on blurry images, it can also work for rotating objects. Further, this denoising method can also be used for other image-based tracking methods, not only for GI.

## Funding

National Natural Science Foundation of China (11774431, 61701511).

## Disclosures

The authors declare no conflicts of interest.

## References

**1.** T. B. Pittman, Y. H. Shih, D. V. Strekalov, and A. V. Sergienko, “Optical imaging by means of two-photon quantum entanglement,” Phys. Rev. A **52**(5), R3429–R3432 (1995).

**2.** A. Valencia, G. Scarcelli, M. D’Angelo, and Y. Shih, “Two-photon imaging with thermal light,” Phys. Rev. Lett. **94**(6), 063601 (2005).

**3.** J. Cheng and S. S. Han, “Incoherent coincidence imaging and its applicability in x-ray diffraction,” Phys. Rev. Lett. **92**(9), 093903 (2004).

**4.** H. Yu, R. H. Lu, S. S. Han, H. L. Xie, G. H. Du, T. Q. Xiao, and D. M. Zhu, “Fourier-transform ghost imaging with hard x rays,” Phys. Rev. Lett. **117**(11), 113901 (2016).

**5.** A. X. Zhang, Y. H. He, L. A. Wu, L. M. Chen, and B. B. Wang, “Tabletop x-ray ghost imaging with ultra-low radiation,” Optica **5**(4), 374–377 (2018).

**6.** L. Olivieri, J. S. T. Gongora, L. Peters, V. Cecconi, A. Cutrona, J. Tunesi, R. Tucker, A. Pasquazi, and M. Peccianti, “Hyperspectral terahertz microscopy via nonlinear ghost imaging,” Optica **7**(2), 186–191 (2020).

**7.** N. Radwell, K. J. Mitchell, G. M. Gibson, M. P. Edgar, R. Bowman, and M. J. Padgett, “Single-pixel infrared and visible microscope,” Optica **1**(5), 285–289 (2014).

**8.** J. H. Shapiro, “Computational ghost imaging,” Phys. Rev. A **78**(6), 061802 (2008).

**9.** Y. Bromberg, O. Katz, and Y. Silberberg, “Ghost imaging with a single detector,” Phys. Rev. A **79**(5), 053840 (2009).

**10.** B. Q. Sun, M. P. Edgar, R. Bowman, L. E. Vittert, S. Welsh, A. Bowman, and M. J. Padgett, “3D computational imaging with single-pixel detectors,” Science **340**(6134), 844–847 (2013).

**11.** M. Lyu, W. Wang, H. Wang, H. Wang, G. W. Li, N. Chen, and G. H. Situ, “Deep-learning-based ghost imaging,” Sci. Rep. **7**(1), 17865 (2017).

**12.** Y. C. He, G. Wang, G. X. Dong, S. T. Zhu, H. Chen, A. X. Zhang, and Z. Xu, “Ghost imaging based on deep learning,” Sci. Rep. **8**(1), 1–7 (2018).

**13.** R. S. Aspden, D. S. Tasca, R. W. Boyd, and M. J. Padgett, “EPR-based ghost imaging using a single-photon-sensitive camera,” New J. Phys. **15**(7), 073032 (2013).

**14.** P. A. Morris, R. S. Aspden, J. E. C. Bell, R. W. Boyd, and M. J. Padgett, “Imaging with a small number of photons,” Nat. Commun. **6**(1), 5913 (2015).

**15.** R. S. Aspden, N. R. Gemmell, P. A. Morris, D. S. Tasca, L. Mertens, M. G. Tanner, R. A. Kirkwood, A. Ruggeri, A. Tosi, R. W. Boyd, G. S. Buller, R. H. Hadfield, and M. J. Padgett, “Photon-sparse microscopy: visible light imaging using infrared illumination,” Optica **2**(12), 1049–1052 (2015).

**16.** X. L. Liu, J. H. Shi, X. Y. Wu, and G. H. Zeng, “Fast first-photon ghost imaging,” Sci. Rep. **8**(1), 1–8 (2018).

**17.** W. L. Gong, C. Q. Zhao, H. Yu, M. L. Chen, W. D. Xu, and S. S. Han, “Three-dimensional ghost imaging lidar via sparsity constraint,” Sci. Rep. **6**(1), 26133 (2016).

**18.** H. Li, J. Xiong, and G. H. Zeng, “Lensless ghost imaging for moving objects,” Opt. Eng. **50**(12), 127005 (2011).

**19.** Z. H. Xu, W. Chen, J. Penuelas, M. Padgett, and M. J. Sun, “1000 fps computational ghost imaging using LED-based structured illumination,” Opt. Express **26**(3), 2427–2434 (2018).

**20.** Z. B. Zhang, J. Q. Ye, Q. W. Deng, and J. G. Zhong, “Image-free real-time detection and tracking of fast moving object using a single-pixel detector,” Opt. Express **27**(24), 35394–35401 (2019).

**21.** S. Sun, J. H. Gu, H. Z. Lin, L. Jiang, and W. T. Liu, “Gradual ghost imaging of moving objects by tracking based on cross correlation,” Opt. Lett. **44**(22), 5594–5597 (2019).

**22.** K. W. C. Chan, M. N. O’Sullivan, and R. W. Boyd, “Optimization of thermal ghost imaging: high-order correlations vs. background subtraction,” Opt. Express **18**(6), 5562–5573 (2010).

**23.** F. Ferri, D. Magatti, L. Lugiato, and A. Gatti, “Differential ghost imaging,” Phys. Rev. Lett. **104**(25), 253603 (2010).

**24.** B. Q. Sun, S. S. Welsh, M. P. Edgar, J. H. Shapiro, and M. J. Padgett, “Normalized ghost imaging,” Opt. Express **20**(15), 16892–16901 (2012).

**25.** S. Sun, W. T. Liu, J. H. Gu, H. Z. Lin, L. Jiang, Y. K. Xu, and P. X. Chen, “Ghost imaging normalized by second-order coherence,” Opt. Lett. **44**(24), 5993–5996 (2019).

**26.** T. Shimobaba, Y. Endo, T. Nishitsuji, T. Takahashi, Y. Nagahama, S. Hasegawa, M. Sano, R. Hirayama, T. Kakue, and A. Shiraki, “Computational ghost imaging using deep learning,” Opt. Commun. **413**, 147–151 (2018).

**27.** M. Nishio, C. Nagashima, S. Hirabayashi, A. Ohnishi, K. Sasaki, T. Sagawa, M. Hamada, and T. Yamashita, “Convolutional auto-encoder for image denoising of ultra-low-dose CT,” Heliyon **3**(8), e00393 (2017).

**28.** H. Wu, R. Z. Wang, G. P. Zhao, H. P. Xiao, J. Liang, D. D. Wang, X. B. Tian, L. L. Cheng, and X. M. Zhang, “Deep-learning denoising computational ghost imaging,” Opt. Lasers Eng. **134**, 106183 (2020).

**29.** P. Vincent, H. Larochelle, Y. Bengio, and P. A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proceedings of the 25th International Conference on Machine Learning, (2008), pp. 1096–1103.