Because 2D ghost imaging requires a long acquisition time, current ghost imaging systems have so far been inapplicable to dynamic scenes. However, it has been demonstrated that natural images are spatiotemporally redundant and that the redundancy is scene dependent. Inspired by this, we propose a content-adaptive computational ghost imaging approach that achieves high reconstruction quality from a small number of measurements and thus enables ghost imaging of dynamic scenes. To exploit the content-adaptive inter-frame redundancy, we cast the reconstruction as an iteratively reweighted optimization, with non-uniform weights computed from temporally correlated frame sequences. The proposed approach achieves dynamic imaging at 16 fps with 64×64-pixel resolution.
© 2016 Optical Society of America
Since it was first demonstrated experimentally with quantum-entangled photon pairs, ghost imaging has progressed from quantum to classical [2–6] and then to computational [7–9] schemes. Computational ghost imaging uses programmable illumination patterns, which largely simplifies the implementation. Because the array sensor is replaced with a single-pixel detector, ghost imaging offers great advantages over traditional imaging techniques in signal-to-noise ratio and turbidity tolerance. Hence, ghost imaging holds great potential in multiple fields, such as remote sensing, radar detection, optical encryption [12,13] and turbulence-robust imaging [14,15]. Application in fluorescence microscopy has also been demonstrated.
To achieve satisfactory reconstruction quality in computational ghost imaging, a minimum number of 1D measurements must be acquired. However, the sampling rate of current setups is largely limited by the spatial light modulator (SLM), which keeps imaging of dynamic scenes from reaching real-time speed. Therefore, ghost imaging of dynamic scenes from a small number of measurements is a pressing problem. Preliminary studies on computational ghost imaging of dynamic scenes usually simplify the task. For example, Magana-Loaiza et al. present a compressive sensing protocol to track a moving object against a static background, and Li et al. report a method that recovers a target moving at an unknown constant speed. These approaches either reconstruct only the relative changes of the scene or impose strong assumptions on the object motion, and are therefore inapplicable to recording general dynamic scenes.
The single-pixel camera shares an almost identical imaging and reconstruction scheme with computational ghost imaging. For single-pixel imaging, several compressive sensing algorithms have been proposed for video reconstruction. The basic idea is to exploit temporal redundancy by applying a 3D discrete wavelet transform (3D-DWT) and enforcing sparsity of the representation coefficients [19, 20]. An alternative utilizes inter-frame smoothness together with 2D spatial redundancy for sparse representation [21,22]. Recently, Edgar et al. built a system that makes full use of the fastest available SLM as an initial attempt at dynamic ghost imaging under this model. In their work, 32×32-pixel images at a frame rate of 10 Hz and 64×64-pixel images at a frame rate of 2.5 Hz are achieved by minimizing the spatial and temporal total variation of the target video, which we call three-dimensional total variation (3DTV) minimization.
The above spatiotemporal approaches achieve better performance than frame-by-frame reconstruction by utilizing the temporal redundancy among video frames. However, these methods reconstruct all the pixels of each frame in a non-discriminative way and neglect the uneven sparsity of different image areas, and thus yield different reconstruction reliability for different pixels. It has been observed that temporal intensity changes among dynamic frames can be used to predict the reconstruction reliability of image pixels and lend more insight into the reconstruction of each frame. Therefore, further exploiting multi-frame redundancy and introducing content-adaptive weights according to the differing reliability is a promising way to make better use of the available information and gain higher reconstruction quality.
In this letter, rather than reconstructing each frame in one step as the methods above do, we place the reconstruction in an iteratively reweighted framework. Iteratively reweighted approaches have been extensively studied and used to obtain better performance and stronger robustness in various computational tasks such as linear regression and l1 minimization [25–27]. Operating in such a scheme allows the reliability information computed from multi-frame redundancy to be flexibly incorporated into the reconstruction, which yields better results. Specifically, we introduce an adaptive weight factor whose weights are determined by the reliability distribution calculated from the frames reconstructed in the previous iteration. By iteratively updating the reliability and optimizing the objective function, our method achieves higher reconstruction accuracy and efficiency than existing methods.
In this letter, we adopt a compressive sensing based reconstruction algorithm for better efficiency [16–18, 23, 28]. The commonly used compressive sensing based approach takes only intra-frame spatial redundancy into account. Denoting the measurement matrix, the target image, and the correlated measurements as A, x and y respectively, the objective function can be formulated as
Here D represents an operator that transforms the target image x into a domain with a sparse representation, such as total variation (TV), DCT, or a wavelet domain. For image reconstruction tasks, total variation regularization has been shown to preserve sharp edges and boundaries. Therefore, in this paper we use total variation as the regularizer in the objective function.
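The quantities in this formulation can be made concrete with a short sketch. It assumes the usual compressive sensing form, minimizing the l1 norm of Dx subject to Ax = y, with D taken as anisotropic finite differences; the function names, the test scene and the random seed are illustrative and not taken from the original work.

```python
import numpy as np

def total_variation(img):
    """Anisotropic TV: l1 norm of horizontal and vertical finite differences (one choice of D)."""
    return np.abs(np.diff(img, axis=1)).sum() + np.abs(np.diff(img, axis=0)).sum()

def bucket_measurements(patterns, img):
    """Model y = A x: each row of A is one flattened illumination pattern."""
    A = patterns.reshape(patterns.shape[0], -1)
    return A @ img.ravel()

rng = np.random.default_rng(0)

# Hypothetical 64x64 test scene: a bright square on a dark background.
scene = np.zeros((64, 64))
scene[20:40, 24:44] = 1.0

# 819 random binary patterns, i.e. roughly 20% of the 4096 pixels.
patterns = rng.integers(0, 2, size=(819, 64, 64)).astype(float)
y = bucket_measurements(patterns, scene)

# A piecewise-constant scene has a small TV; adding noise inflates it,
# which is why minimizing TV favors clean, sharp-edged reconstructions.
noisy = scene + 0.1 * rng.standard_normal(scene.shape)
print(total_variation(scene), total_variation(noisy))
```

The TV of the clean square counts only its boundary, whereas the noisy copy accumulates differences at every pixel.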
In our content-adaptive system, temporal redundancy among multiple frames is exploited to reduce the number of measurements required. Instead of directly extending TV minimization from 2D to 3D by adding the temporal dimension, we calculate the pixel-wise reconstruction reliability from the content of neighboring frames and conduct the reconstruction as an iteratively reweighted optimization. Based on the observation that correctly reconstructed pixels tend to be persistent and continuous with respect to their counterparts in adjacent frames, we compute the consistency with the aligned neighboring frames as a metric of reliability. Specifically, as illustrated in Fig. 1, each round of optimization yields a sequence of updated video frames. Each frame is aligned with its five neighboring frames through motion estimation. In our implementation, we use an optical flow algorithm for motion estimation [30, 31] and conduct the alignment bi-directionally to increase robustness to low reconstruction quality, especially in the initial iteration. The bi-directional mapping is illustrated in Fig. 1; for conciseness, only the mapping process in the first iteration is shown. Taking one frame from the video sequence as the current frame (outlined in red), we first calculate the motion vectors from the current frame to the five neighboring frames and map the current frame to each neighbor. We then calculate the motion vectors from these current-to-neighbor results back to the current frame and map them back. This yields five forward-and-backward counterparts of the current frame, one through each neighboring frame. Once the bi-directional mapping is completed, we compute the pixel-wise reconstruction reliability from the intensity variations among the counterparts in the forward-and-backward aligned frames. In this paper, we denote the reliability distribution matrix by R, a diagonal matrix of dimension N × N.
The reliability of each pixel in the current frame can be expressed as
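The step from aligned counterparts to a reliability map can be illustrated with a minimal sketch. It assumes the optical-flow alignment has already been done, and it uses a hypothetical mapping from intensity spread to reliability, eps / (eps + standard deviation). This mapping and the parameter `eps` are stand-ins chosen for illustration and are not the expression used in this work.

```python
import numpy as np

def reliability_from_counterparts(counterparts, eps=0.5):
    """Per-pixel reliability in (0, 1] from K aligned copies of the current frame.

    counterparts: array of shape (K, H, W), assumed already aligned to the
    current frame by forward-and-backward motion compensation.
    ASSUMPTION: reliability = eps / (eps + std); a hypothetical stand-in.
    """
    spread = np.std(counterparts, axis=0)   # intensity variation among counterparts
    return eps / (eps + spread)

# Five counterparts of a 4x4 frame: consistent everywhere except pixel (0, 0).
stack = np.full((5, 4, 4), 0.8)
stack[:, 0, 0] = [0.0, 1.0, 0.0, 1.0, 0.5]

r = reliability_from_counterparts(stack)
R_diag = r.ravel()   # diagonal of the N x N matrix R, stored as a vector
print(r[0, 0], r[1, 1])
```

Storing only the diagonal avoids building a dense N × N matrix, which would have 4096 × 4096 entries for a 64×64 frame.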
We choose to weight the difference between the current and previous reconstruction results in order to softly reduce the dimension of the unknown variables. The idea is that if a number of pixels have been accurately reconstructed, their values should remain unchanged in the next iteration’s optimization. The accurate pixels can therefore be eliminated from the unknown variables, which reduces the dimension of the optimization. Because the reliability computed in our algorithm is only an estimate of the real accuracy, we introduce it as a weighting parameter, which produces a soft-margin optimization. In particular, for pixels reconstructed with higher confidence, we penalize intensity deviations from the previous result more heavily in the current iteration. Mathematically, we add a confidence-based constraint on the target image, and the objective function can be formulated as
Here x^{k−1} and x^{k} denote the reconstructions of the current frame in the previous and current iterations respectively, and Rk is the weighting matrix defined by the reliability map calculated from the previous reconstruction. The coefficient µ balances the two constraints: the first is defined from the spatial redundancy, and the second is the soft dimension-reduction term, which incorporates the information learned from multi-frame redundancy. We can rewrite the problem as
In this equation, the weight λ balances the data fidelity term against the two constraints in the first two terms. Theoretically, it should be set inversely proportional to the noise level of the measurements; we choose it empirically in our experiments. The reconstruction is now a typical convex optimization problem, which we solve with the standard augmented Lagrangian method.
The complete reconstruction algorithm works iteratively, as illustrated in Fig. 1. The reliability distribution matrix R is initially set to zero. As the iterations proceed, the reconstructed frames are updated by Eq. (4) and the reliability is updated from the current reconstruction results by Eq. (2).
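To see how the three terms interact, the sketch below evaluates an objective of the form TV(x) + µ‖R(x − x_prev)‖² + (λ/2)‖Ax − y‖². The squared l2 norm on the reweighted term, the anisotropic TV and the factor 1/2 are assumptions made for illustration; only the roles of µ, λ and R follow the text. It evaluates the cost and does not implement the augmented Lagrangian solver.

```python
import numpy as np

def reweighted_objective(x, x_prev, r, A, y, shape, mu=500.0, lam=400.0):
    """Assumed objective: TV(x) + mu*||R (x - x_prev)||^2 + (lam/2)*||A x - y||^2.

    r holds the diagonal of R as a vector; x and x_prev are flattened frames.
    """
    img = x.reshape(shape)
    tv = np.abs(np.diff(img, axis=1)).sum() + np.abs(np.diff(img, axis=0)).sum()
    stay = mu * np.sum((r * (x - x_prev)) ** 2)       # soft dimension-reduction term
    fidelity = 0.5 * lam * np.sum((A @ x - y) ** 2)   # data fidelity term
    return tv + stay + fidelity

rng = np.random.default_rng(1)
shape = (8, 8)
truth = np.zeros(shape)
truth[2:6, 2:6] = 1.0
x_true = truth.ravel()
A = rng.integers(0, 2, size=(20, 64)).astype(float)   # hypothetical binary patterns
y = A @ x_true

# Perturb one pixel away from a (here perfect) previous estimate.
x_bad = x_true.copy()
x_bad[0] += 0.3

f_unweighted = reweighted_objective(x_bad, x_true, np.zeros(64), A, y, shape)
f_weighted = reweighted_objective(x_bad, x_true, np.ones(64), A, y, shape)
print(f_weighted - f_unweighted)
```

With zero reliability the deviation from the previous estimate costs nothing, while full reliability adds µ times the squared deviation, so reliable pixels are held close to their previous values.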
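The control flow of this iteration can be sketched as follows. The per-frame solver and the reliability estimator are passed in as callables, because their exact forms are not fixed here; `solve_frame`, `estimate_reliability` and the toy stand-ins in the usage part are hypothetical names and simplifications, not the solver or reliability measure of this work.

```python
import numpy as np

def iterate_reweighted(y_frames, A, shape, solve_frame, estimate_reliability, n_iter=6):
    """Alternate between per-frame reconstruction and reliability update.

    solve_frame(A, y, x_prev, weights) -> flattened frame estimate
    estimate_reliability(video) -> array with the same shape as video
    """
    n_frames, n_pix = len(y_frames), shape[0] * shape[1]
    frames = np.zeros((n_frames, n_pix))
    weights = np.zeros((n_frames, n_pix))   # R starts at zero: first pass ignores x_prev
    for _ in range(n_iter):
        frames = np.stack([
            solve_frame(A, y_frames[t], frames[t], weights[t]) for t in range(n_frames)
        ])
        video = frames.reshape(n_frames, *shape)
        weights = np.asarray(estimate_reliability(video)).reshape(n_frames, n_pix)
    return frames.reshape(n_frames, *shape)

# Toy stand-ins, only to exercise the loop (overdetermined, noise-free case).
def toy_solver(A, y, x_prev, w):
    x_ls = np.linalg.lstsq(A, y, rcond=None)[0]
    return (1.0 - w) * x_ls + w * x_prev    # reliable pixels stay near x_prev

def toy_reliability(video):
    return np.full(video.shape, 0.5)

rng = np.random.default_rng(3)
shape = (4, 4)
video_true = rng.uniform(size=(3, 4, 4))
A = rng.standard_normal((24, 16))
y_frames = [A @ f.ravel() for f in video_true]
result = iterate_reweighted(y_frames, A, shape, toy_solver, toy_reliability)
```

Keeping the solver and the reliability estimator behind function arguments mirrors the observation that the scheme is independent of the particular single-frame reconstruction algorithm.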
3. Simulation and experiment
To test the performance of the proposed ghost imaging approach, we first carry out numerical simulations on synthesized data. We use 64×64-pixel videos of different complexity as target dynamic scenes, and generate random binary patterns at the same resolution as the spatially modulated illumination. We collect 819 correlated measurements (i.e., a 20% sub-Nyquist sampling rate) for each frame. We empirically set the number of iterations to 6, which generally gives reasonably good reconstruction quality. As for the two penalty parameters in Eq. (4), we found experimentally that our algorithm is only slightly affected by their values as long as they fall within a proper, wide range. Based on experimental tests, we set µ = 500, λ = 400 and ε = 0.5. Our content-adaptive method is applied to three movie sequences of different complexity, and the results are shown in Fig. 2, which displays the iterative evolution of the reconstruction of a single frame. The evolution shows clearly that even though the initial reconstruction, which does not consider inter-frame redundancy, is of low quality, our method can identify the pixels with higher reliability from multiple frames and refine the reconstruction over the iterations. Comparing the initial and final reconstructions, a significant improvement in visual quality can be observed: noise is largely suppressed, and thin structures are reconstructed with cleaner and sharper edges. For quantitative evaluation, the peak signal-to-noise ratio (PSNR) of the reconstructions is computed as the accuracy metric and plotted in Fig. 2. The ascending curves show that the reconstruction quality of our iteratively reweighted scheme increases steadily despite an unsatisfactory initialization. Compared to the initial reconstruction (without the scene-adaptive reliability constraint), the PSNR increases by 14.8 dB for ‘String’ after six iterations, by 10.1 dB for ‘Fan’ and by 5.7 dB for ‘Fish’.
Scenes with higher sparsity (or lower complexity) such as ‘String’ show larger improvement, while less sparse ones such as ‘Fish’ improve less. This is reasonable because the reconstruction quality scales with the sparsity of the target object.
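The measurement counts and the PSNR metric used above can be checked with a few lines. The sketch assumes intensities normalized to a peak value of 1 and rounds the measurement count to the nearest integer; both conventions are assumptions, and the function names are illustrative.

```python
import numpy as np

def num_measurements(height, width, rate):
    """Number of patterns per frame for a given sub-Nyquist sampling rate."""
    return int(round(rate * height * width))

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is the assumed maximum intensity."""
    diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

print(num_measurements(64, 64, 0.2))   # simulation setting
print(num_measurements(64, 64, 0.3))   # experimental setting

reference = np.zeros((64, 64))
estimate = reference + 0.1             # uniform error of 0.1
print(psnr(reference, estimate))
```

On this scale, each 10 dB gain corresponds to a tenfold reduction in mean squared error.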
To further demonstrate the advantage of our method, we compare it with 3D-DWT reconstruction and the 3DTV method. The sampling rate is first fixed at 0.2, and the image sequences ‘Fan’ and ‘Fish’ are reconstructed by all approaches. Five reconstructed frames are shown in Fig. 3(a) and 3(b) for illustration. Under the same sampling rate, both 3D-DWT and 3DTV exhibit lower reconstruction quality than ours on scenes of different complexity. This result is reasonable because our method incorporates the multi-frame redundancy adaptively in the reconstruction, which makes better use of the temporal redundancy than directly imposing temporal smoothness across all frames or between adjacent frames. In addition, the iteratively reweighted framework offers the potential for continued improvement.
The effects of sampling rate and noise on the performance are also studied. First, we test the performance of our method together with the 3D-DWT and 3DTV approaches at sampling rates ranging from 0.15 to 0.3 in steps of 0.025, and report the average PSNR over the whole sequence in Fig. 3(c) and 3(d). For both sequences, the improvement of our method is more conspicuous at higher sampling rates, because at an extremely low sampling rate the initial reconstruction may be of too low quality for reliable estimation of the confidence, which degrades the final performance. We also investigate the effect of noise intensity on the performance of our method and compare it with that of 3D-DWT and 3DTV. Gaussian white noise is added to synthesize measurements with different signal-to-noise ratios (SNRs). Figures 3(e) and 3(f) show the PSNR of the reconstructions with respect to SNR for both examples and all three methods. As the SNR increases, the performance gain over 3D-DWT and 3DTV grows much larger. This comparison demonstrates that our method achieves better performance than 3D-DWT and 3DTV on noisy data over a large SNR range. Overall, our approach exhibits consistently higher performance, which indicates that our reconstruction framework makes better use of inter-frame redundancy. In other words, we can obtain the same quality with fewer measurements, and thus handle higher-resolution or faster videos under the computational ghost imaging scheme.
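Synthesizing noisy measurements at a prescribed SNR can be sketched as below. The SNR definition used here, mean signal power over noise variance expressed in dB, is an assumption, and the function name and the synthetic bucket signal are illustrative.

```python
import numpy as np

def add_gaussian_noise(y, snr_db, rng):
    """Add white Gaussian noise so that mean(y^2) / noise variance matches snr_db.

    ASSUMPTION: SNR is defined on signal power (not on the signal's variance).
    """
    signal_power = np.mean(np.square(y))
    noise_std = np.sqrt(signal_power / 10.0 ** (snr_db / 10.0))
    return y + rng.normal(0.0, noise_std, size=y.shape)

rng = np.random.default_rng(2)
y_clean = rng.uniform(1000.0, 1100.0, size=200_000)   # hypothetical bucket signal
y_noisy = add_gaussian_noise(y_clean, snr_db=20.0, rng=rng)

# Empirical check of the achieved SNR.
measured_snr = 10.0 * np.log10(
    np.mean(y_clean ** 2) / np.mean((y_noisy - y_clean) ** 2)
)
print(measured_snr)
```

Because bucket signals from binary patterns carry a large constant offset, an SNR defined on signal variance instead of signal power would give a noticeably different noise level for the same nominal value.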
We then apply the approach to data captured by the experimental setup shown in Fig. 4. Our setup follows the standard computational ghost imaging scheme, as described in . First, we generate a collimated light beam by guiding the laser source through a beam expander. The uniform beam is then spatially modulated with random binary patterns by the first digital micromirror device (DMD), before illuminating the second DMD, which displays the target dynamic scene. Finally, the correlated 1D measurements between the modulated patterns and the target scene are collected by a bucket detector and digitized using an acquisition card.
The spatial resolution of the illumination patterns is fixed at 64×64 and the first DMD switches patterns at 20 kHz. The acquisition board works at 2000 kHz, a frequency much higher than the pattern-switching rate, for effective suppression of sensor noise. The second DMD displays videos at 16 frames per second, and for each frame 1229 random patterns (i.e., a 30% sub-Nyquist sampling rate) and 21 blank patterns (used to synchronize the DMD and the detector) are projected. Figure 5 shows several frames from three video sequences reconstructed by our algorithm and by the 3DTV and 3D-DWT methods. The sequence number of each presented frame is labeled above it, and the complete reconstructed sequences are available in the supplementary material. The results show that our algorithm performs well on general dynamic scenes, including translation, rotation, and other non-rigid inter-frame changes. Compared with the 3DTV and 3D-DWT methods, boundaries and details such as the edges of the cube in (a) and the dots in (c) are preserved more sharply, while flat surfaces such as the thick lines in (b) and the semicircle in (c) are recovered with lower noise.
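The timing figures quoted for the setup are mutually consistent, as the following check shows. It assumes that blank synchronization patterns occupy the same time slot as random patterns; the function and key names are illustrative.

```python
def acquisition_budget(pattern_rate_hz=20_000, adc_rate_hz=2_000_000,
                       random_patterns=1229, blank_patterns=21, pixels=64 * 64):
    """Derive frame rate, oversampling and sampling ratio from the setup parameters.

    ASSUMPTION: every pattern, random or blank, is displayed for one slot
    of 1 / pattern_rate_hz seconds.
    """
    patterns_per_frame = random_patterns + blank_patterns
    return {
        "patterns_per_frame": patterns_per_frame,
        "frame_rate_hz": pattern_rate_hz / patterns_per_frame,
        "adc_samples_per_pattern": adc_rate_hz // pattern_rate_hz,
        "sampling_ratio": random_patterns / pixels,
    }

budget = acquisition_budget()
for name, value in budget.items():
    print(name, value)
```

The 1250 patterns per frame at 20 kHz give exactly 16 frames per second, and each pattern is sampled 100 times by the acquisition board, which is what allows averaging to suppress sensor noise.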
In conclusion, we propose a content-adaptive ghost imaging approach for dynamic scenes and verify its effectiveness with both numerical simulation and experiment. The main idea of our algorithm is to estimate the reconstruction reliability of each pixel based on temporal redundancy among frames, and to incorporate the reliability information in an iteratively reweighted optimization. Through comparison with other work on dynamic ghost imaging, we verify a large improvement in reconstruction quality when only a small number of measurements is available, which is a step toward practical dynamic ghost imaging.
It is worth noting that our content-adaptive reconstruction approach is a general scheme for gaining performance by moving from non-adaptive constraints to content-adaptive reconstruction. Such a scheme can incorporate advanced algorithms for either single-frame or video reconstruction. In addition, two protocols have been studied for improving the raw quality of ghost imaging: differential ghost imaging (DGI) [33, 35] and normalized ghost imaging (NGI), which help compensate for unstable illumination. In cases with light source fluctuations or environmental disturbances, these strategies can increase the accuracy of the initial reconstruction and of the data term in our optimization energy function, and thus improve the final reconstruction. If one uses a stable light source and captures the data in a dark room, as in our experiment, the improvement from these protocols is not large. It is also worth noting that our algorithm is a general approach not limited to computationally controllable illumination patterns, and is thus also applicable to two-arm ghost imaging systems with a pseudo-thermal light source.
This work was supported by the projects of National Natural Science Foundation of China (NSFC) (Nos. 61327902 and 61120106003), and National Science Foundation (NSF) award 1115680.
References and links
6. B. I. Erkmen and J. H. Shapiro, “Unified theory of ghost imaging with gaussian-state light,” Phys. Rev. A 77(4), 043809 (2008).
7. J. H. Shapiro, “Computational ghost imaging,” Phys. Rev. A 78(6), 061802 (2008).
8. Y. Bromberg, O. Katz, and Y. Silberberg, “Ghost imaging with a single detector,” Phys. Rev. A 79(5), 053840 (2009).
9. B. Sun, M. P. Edgar, R. Bowman, L. E. Vittert, S. Welsh, A. Bowman, and M. Padgett, “3D computational imaging with single-pixel detectors,” Science 340(6134), 844–847 (2013).
10. R. E. Meyers, K. S. Deacon, and Y. Shih, “Turbulence-free ghost imaging,” Appl. Phys. Lett. 98(11), 111115 (2011).
11. C. Zhao, W. Gong, M. Chen, E. Li, H. Wang, W. Xu, and S. Han, “Ghost imaging lidar via sparsity constraints,” Appl. Phys. Lett. 101(14), 141123 (2012).
13. M. Tanha, R. Kheradmand, and S. Ahmadi-Kandjani, “Gray-scale and color optical encryption based on computational ghost imaging,” Appl. Phys. Lett. 101(10), 101108 (2012).
15. N. D. Hardy and J. H. Shapiro, “Reflective ghost imaging through turbulence,” Phys. Rev. A 84(6), 3474 (2011).
16. S. Vincent, B. Jrome, C. Makhlad, M. H. Shams, C. Emmanuel, and D. Maxime, “Compressive fluorescence microscopy for biological and hyperspectral imaging,” in Proceedings of the National Academy of Sciences (Academic, 2012), pp. 1679–1687.
17. O. S. Magana-Loaiza, G. A. Howland, M. Malik, J. C. Howell, and R. W. Boyd, “Compressive object tracking using entangled photons,” Appl. Phys. Lett. 102(23), 231104 (2013).
18. E. Li, Z. Bo, M. Chen, W. Gong, and S. Han, “Ghost imaging of a moving target with an unknown constant speed,” Appl. Phys. Lett. 104(25), 251120 (2014).
19. M. Wakin, J. Laska, M. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. Kelly, and R. Baraniuk, “Compressive imaging for video representation and coding,” in Proceedings of Picture Coding Symposium (IEEE, 2006).
20. A. Thompson, “Compressive single-pixel imaging,” presented at the 2nd IMA Conference on Mathematics in Defence, Swindon, the United Kingdom, 20 October 2011.
21. R. F. Marcia and R. M. Willett, “Compressive coded aperture superresolution image reconstruction,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (IEEE, 2008), pp. 833–836.
22. T. Goldstein, L. Xu, K. F. Kelly, and R. Baraniuk, “The STONE transform: Multi-resolution image enhancement and real-time compressive video,” http://arxiv.org/abs/1311.3405.
23. M. P. Edgar, G. M. Gibson, R. W. Bowman, B. Sun, N. Radwell, K. J. Mitchell, S. S. Welsh, and M. J. Padgett, “Simultaneous real-time visible and infrared video with single-pixel detectors,” Sci. Reports 5, 10669 (2015).
25. P. W. Holland and R. E. Welsch, “Robust regression using iteratively reweighted least-squares,” Commun. Stat. Theory 6(9), 813–827 (1977).
26. D. P. O’Leary, “Robust regression computation using iteratively reweighted least squares,” Siam. J. Matrix. Anal. A 11(3), 466–480 (1990).
27. E. J. Candes, M. B. Wakin, and S. P. Boyd, “Enhancing sparsity by reweighted L1 minimization,” J. Fourier. Anal. Appl 14(5–6), 877–905 (2007).
28. O. Katz, Y. Bromberg, and Y. Silberberg, “Compressive ghost imaging,” Appl. Phys. Lett. 95(13), 131110 (2009).
29. C. Li, “An efficient algorithm for total variation regularization with applications to the single pixel camera and compressive sensing,” Master Thesis, Rice University (2009).
30. S. S. Beauchemin and J. L. Barron, “The computation of optical flow,” Acm. Comput. Surv 27(3), 433–466 (1995).
31. C. Liu, “Beyond pixels : exploring new representations and applications for motion analysis,” Mass. Inst. Technol (2010).
32. S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends. Mach. Learn 3(1), 1–122 (2011).
34. B. Sun, S. S. Welsh, M. P. Edgar, J. H. Shapiro, and M. J. Padgett, “Normalized ghost imaging,” Opt. Express 20(15), 16892–16901 (2012).
35. B. Sun, M. Edgar, R. Bowman, L. Vittert, S. Welsh, A. Bowman, and M. Padgett, “Differential computational ghost imaging,” in Imaging and Applied Optics, OSA Technical Digest (online) (Optical Society of America, 2013), paper CTu1C.4.