Abstract
Coherent diffraction imaging (CDI), as a lensless imaging technique, can achieve a high-resolution image with intensity and phase information from a diffraction pattern. To capture high-speed and high-spatial-resolution scenes, we propose a temporal compressive CDI system. A two-step algorithm using physics-driven deep-learning networks is developed for multi-frame spectra reconstruction and phase retrieval. Experimental results demonstrate that our system can reconstruct up to eight frames from a snapshot measurement. Our results offer the potential to visualize the dynamic process of molecules with large fields of view and high spatial and temporal resolutions.
© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
Coherent diffraction imaging (CDI) [1] has attracted significant interest in various fields such as material science [2] and biology [3], benefiting from the fact that a high-resolution image with intensity and phase information can be generated from a far-field coherent diffraction pattern. Although originally proposed for x rays, CDI is a general technique that can achieve much higher spatial resolution than its direct imaging counterparts. Existing CDI heavily relies on the prevalent phase retrieval (PR) approach [1,4], such as the “hybrid input–output” (HIO) [5] method, which recovers the intensity and phase information of an object from its power spectral density. Due to the ill-posed nature of PR, it is challenging to recover complicated objects without the support constraint, especially in high-detection-noise and low-dynamic-range scenarios [6–8]. Towards this end, various modifications [9,10] of CDI have been proposed to relax the support constraint while retaining the convergence of the reconstruction. Recently, coherent modulated imaging (CMI) [11] has placed a modulator behind the imaged object to obtain dynamic-range-modulated diffraction patterns and thereby recover complicated images.
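For reference, the HIO update can be sketched in a few lines of NumPy. The function below is a simplified, illustrative version (support constraint only, no positivity or noise handling); the function name and parameters are our own, not the exact implementation used in this Letter:

```python
import numpy as np

def hio(magnitude, support, beta=0.9, n_iter=200, seed=0):
    """Simplified hybrid input-output phase retrieval (after Fienup [5]).

    magnitude : measured Fourier magnitudes (square root of the diffraction intensity)
    support   : boolean mask of pixels where the object may be nonzero
    beta      : feedback parameter of the object-domain update
    """
    rng = np.random.default_rng(seed)
    g = rng.random(magnitude.shape)  # random initial object estimate
    for _ in range(n_iter):
        G = np.fft.fft2(g)
        # Fourier-domain constraint: keep the phase, impose the measured magnitude.
        G = magnitude * np.exp(1j * np.angle(G))
        g_prime = np.real(np.fft.ifft2(G))
        # Object-domain constraint: accept g' on the support,
        # damp the estimate with feedback parameter beta elsewhere.
        g = np.where(support, g_prime, g - beta * g_prime)
    return g
```

In practice the support is often unknown, which is exactly the difficulty the modulation-based variants discussed above aim to remove.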
Compressive sensing (CS) [12] provides an effective solution for recovering a high-dimensional signal from its low-dimensional measurements, offering a route to large field-of-view (FoV) and high-spatial-resolution imaging. Specifically, high-dimensional image information is randomly modulated onto a two-dimensional (2D) detection plane through a group of coded patterns and then reconstructed from the measurements using CS algorithms, as in temporal CS [13] and spectral CS [14]. Recently, deep learning [15] has flourished and been successfully applied to CS reconstruction. Inspired by CS and deep learning, the video snapshot compressive imaging (SCI) technique recovers high-speed frames from a single-shot encoded measurement and can enhance the frame rate of existing cameras by one to two orders of magnitude [13,16]. While video SCI can capture high-speed dynamic processes, its modulation has so far mainly been performed in the image domain (plane).
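The video-SCI forward model described above (modulate each high-speed frame with a coding pattern, then sum onto the detector) can be sketched as follows; all sizes and the random masks are illustrative:

```python
import numpy as np

# Minimal sketch of the video-SCI forward model:
# T high-speed frames are multiplied element-wise by binary coding
# patterns and summed into a single 2D snapshot on the detector.
rng = np.random.default_rng(1)
T, H, W = 8, 64, 64
frames = rng.random((T, H, W))             # high-speed scene, one frame per t
masks = rng.integers(0, 2, (T, H, W))      # random {0,1} coding patterns
snapshot = np.sum(masks * frames, axis=0)  # single coded measurement
assert snapshot.shape == (H, W)
```

In conventional video SCI the masks modulate the image plane; the method of this Letter instead applies them on the Fourier plane.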
This Letter proposes a temporal compressive CDI (TC-CDI) method to achieve multi-frame images from single-shot encoded measurement in CDI, where modulation happens in the frequency domain. The proposed method is demonstrated with near-infrared light, where the spatial spectrum of the moving object is modulated by a digital micromirror device (DMD) that projects a group of coding patterns on the Fourier plane. Experimental results show that a compression ratio of up to eight can be achieved, which demonstrates the effectiveness of our proposed method, and potentially advances CDI to visualize the dynamic process of molecules with large FoVs and high spatial and temporal resolutions simultaneously.
Although compressive diffraction imaging has been reported before [17] to investigate the material properties of objects along the longitudinal location using a single-pixel energy-sensitive detector, our work differs in that the compression of the proposed TC-CDI takes place in the temporal domain while the modulation happens in the frequency domain. Furthermore, our reconstruction algorithm is based on a physics-driven deep-learning framework, which combines the flexibility of optimization-based approaches with the high reconstruction quality of learning-based approaches.
The contributions of this work are two-fold: (1) the video SCI technique is generalized to frequency-domain modulation, and (2) the proposed physics-driven deep-learning algorithm enables TC-CDI to reconstruct high-quality video frames. Our work provides a novel paradigm to integrate CDI, deep learning, and PR to capture high-resolution high-dynamic scenes.
Figure 1 depicts a schematic diagram of TC-CDI. Without loss of generality, the moving object $\textbf{O}(\textbf{x},t)$ is illuminated by monochromatic light ${\textbf{U}_0}(\textbf{x})$, where $\textbf{x}$ denotes the spatial coordinate, and $t$ refers to the temporal coordinate. According to Fraunhofer diffraction, the light field ${\textbf{U}_d}(\textbf{x})$ on the detection plane is, up to a constant factor, the Fourier transform of the exit wave:
$${\textbf{U}_d}(\textbf{x},t) = {\cal F}\{{\textbf{U}_0}(\textbf{x})\,\textbf{O}(\textbf{x},t)\}.$$
Correspondingly, the detected intensity $\textbf{I}(\textbf{x})$ is
$$\textbf{I}(\textbf{x}) = \parallel {\textbf{U}_d}(\textbf{x}){\parallel ^2}.$$
We define $\textbf{U}_d^\prime (:,:,t) = \parallel {\textbf{U}_d}(:,:,t){\parallel ^2}$, so that the single-shot measurement in Eq. (4) is $\textbf{Y} = \sum\nolimits_{t = 1}^T \textbf{M}(:,:,t) \odot \textbf{U}_d^\prime(:,:,t) + \textbf{G}$, with $\textbf{M}$ the modulation patterns, $\odot$ the Hadamard (element-wise) product, and $\textbf{G}$ the noise. Vectorizing ${\boldsymbol Y}$, $\textbf{U}_d^\prime $, and $\textbf{G}$, namely, $\textbf{y} = {\rm vec}({\boldsymbol Y}) \in {{\mathbb R}^{\textit{WH}}}$, $\textbf{u} = {\rm vec}(\textbf{U}_d^\prime) \in {{\mathbb R}^{\textit{WHT}}}$, and $\textbf{g} = {\rm vec}(\textbf{G}) \in {{\mathbb R}^{\textit{WH}}}$, Eq. (4) turns into
$$\textbf{y} = {\boldsymbol \Phi} \textbf{u} + \textbf{g},$$
where ${\boldsymbol \Phi} \in {{\mathbb R}^{WH \times WHT}}$ denotes the sensing matrix, which is a concatenation of diagonal matrices. Specifically, ${\boldsymbol \Phi} = [{{\boldsymbol \Phi}_1}, \ldots ,{{\boldsymbol \Phi}_T}]$, where ${{\boldsymbol \Phi}_t} = {\rm Diag}({\rm vec}(\textbf{M}(:,:,t)))$ is a diagonal matrix whose diagonal elements are composed of ${\rm vec}(\textbf{M}(:,:,t))$.

Next, a two-stage reconstruction model is proposed to recover the image information from the measurement $\textbf{y}$. Specifically, as shown in Fig. 2, in the first stage, the frequency-domain frames are reconstructed through a physics-driven deep-unfolding structure based on a deep neural network (DNN). Concretely, the reconstruction can be modeled as the optimization problem
$$\hat{\textbf{u}} = \arg \mathop {\min}\limits_{\textbf{u}} \frac{1}{2}\parallel \textbf{y} - {\boldsymbol \Phi}\textbf{u}{\parallel ^2} + \lambda R(\textbf{u}),$$
where $R(\textbf{u})$ is a regularization term imposing image priors, and $\lambda$ balances data fidelity against regularization.
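As a sanity check on the vectorized model, the following sketch (with illustrative sizes and row-major vectorization) builds ${\boldsymbol \Phi}$ as the concatenation of diagonal matrices and verifies that ${\boldsymbol \Phi}\textbf{u}$ reproduces the mask-and-sum measurement:

```python
import numpy as np

# Verify that the concatenated-diagonal sensing matrix Phi = [Phi_1, ..., Phi_T]
# reproduces the element-wise modulation and temporal summation.
rng = np.random.default_rng(2)
T, H, W = 4, 8, 8
U = rng.random((H, W, T))            # frequency-domain frames U_d'
M = rng.integers(0, 2, (H, W, T))    # modulation patterns M

# Direct measurement: element-wise modulation, then sum over time.
Y = np.sum(M * U, axis=2)

# Vectorized form: y = Phi u with Phi_t = Diag(vec(M(:,:,t))).
Phi = np.hstack([np.diag(M[:, :, t].ravel()) for t in range(T)])
u = np.concatenate([U[:, :, t].ravel() for t in range(T)])
y = Phi @ u

assert np.allclose(y, Y.ravel())
```

Because each ${\boldsymbol \Phi}_t$ is diagonal, ${\boldsymbol \Phi}$ is never formed explicitly in practice; the forward and adjoint operators reduce to cheap element-wise products, which is what makes deep-unfolding iterations over this model tractable.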
The optical setup of our TC-CDI system is shown in Fig. 3. A laser source with 780 nm center wavelength and 50 kHz spectral linewidth is coupled into a single-mode fiber. The output light from the fiber is then collimated by two achromatic doublet lenses, AL1 ($f = 50\,\,{\rm mm}$) and AL2 ($f = 100\,\,{\rm mm}$), to a beam diameter of approximately $10\,\,{\rm mm}$. We adopt a classical convex lens to realize the Fourier transform of the object. Specifically, the moving object is located at the front focal plane of the achromatic doublet lens AL3 ($f = 100\,\,{\rm mm}$), a slit diaphragm ($0.56\,\,{\rm mm}$) behind the object controls the imaging FoV, and the detection plane is at the back focal plane of AL3. A 4F system, consisting of two biconvex lenses, AL4 ($f = 50\,\,{\rm mm}$) and AL5 ($f = 100\,\,{\rm mm}$), is used to magnify the spatial spectrum distribution so that it matches the DMD modulator (TI, $1024 \times 768\;{\rm pixels}$, $13.68\,\,{\unicode{x00B5}{\rm m}}$ pixel pitch). The modulation patterns are pre-stored in the DMD and change periodically over time, following a random binary $\{0,1\}$ distribution. The encoded measurement is projected onto the camera (MV-CA013-A0UM, $1024 \times 1280$ pixels, $4.8\,\,{\unicode{x00B5}{\rm m}}$ pixel pitch) through an imaging system consisting of AL6 ($f = 100\,\,{\rm mm}$) and an objective lens (OL, $4 \times$, ${\rm NA} = 0.2$).
To verify the advantage of the CDI technique in high-spatial-resolution imaging, we compare the imaging resolution of the CDI technique using the DNN-HIO algorithm with a direct imaging technique. As shown in Fig. 4(a), two selected sets of line pairs in the USAF 1951 resolution target are chosen as objects. The spatial resolutions of the two selected sets are 57.02 line pairs per millimeter (lp/mm) and $14.25\;{\rm lp/mm}$. As can be observed from Figs. 4(c) and 4(d), CDI through the proposed DNN-HIO algorithm has much higher spatial resolution than direct imaging. This is because the spatial resolution of direct imaging is limited by the pixel size of the detector, whereas the spatial resolution of CDI is limited by the detection area of the detector. Therefore, for the fine structures of the object in a small FoV, CDI generally has higher spatial resolution than direct imaging.
Next, we consider the dynamic scenario. As shown in Fig. 5(b), a fraction of the resolution target containing the numbers “2” and “3” is selected as the object. The object moves at high speed and is captured by the imaging system within a single snapshot (50 ms). The measurement is $512 \times 512$ pixels. Using the proposed network, $T = 8$ spatial-spectrum images are recovered, from which the spatial images are then reconstructed. As shown in Fig. 5(d), only two frames are displayed because the object is simple. Overall, the reconstruction results demonstrate that the proposed TC-CDI technique can recover dynamic video frames of moving targets from a single shot.
Last, we adopt TC-CDI to visualize the motion of complicated objects. The moving logo of Westlake University is observed (see Visualization 1). Figure 6 shows the reconstruction results at compression ratio $T = 8$, namely, eight frames are recovered from single-shot measurement, where each frame is $512 \times 512\;{\rm pixels}$ in size, cropped to $100 \times 100\;{\rm pixels}$ for better visualization. Clearly, different parts of the logo can be observed entering and leaving the FoV over time, which illustrates that the complete motion of the object can be visualized with a single exposure. In addition, thanks to deep learning, the proposed DNN-HIO algorithm achieves higher reconstruction quality than HIO. A detailed comparison with different PR algorithms is shown in Supplement 1, in which we also show $T = 20$ high-quality frames reconstructed from a snapshot measurement.
In summary, we have proposed an imaging approach integrating SCI into CDI. A system has been built, and we have developed a two-stage deep-learning model for the reconstruction. In the first stage, we estimate frequency-domain frames by a physics-driven network, and then feed these frames into the second stage for PR. Experimental results demonstrate the efficacy of the proposed system and algorithm.
Existing CDI techniques can achieve the finest spatial resolution of about 2 nm using x-ray beams [20], but the exposure time remains on the order of seconds. For example, the CMI technique demands an exposure time of 3 s [11], while 3D reconstruction through a partial CDI technique [21] requires 16 s. Our proposed TC-CDI technique achieved a spatial resolution of 17.5 µm using a 780 nm laser, with a frame rate of 160 fps. Although the spatial resolution of our TC-CDI is at the $\unicode{x00B5}{\rm m}$ level due to the near-infrared source, its temporal resolution is significantly improved by the frequency-domain modulation method. Our future work will employ other sources, such as x rays, to improve the spatial resolution.
For detectors with a fixed dynamic range, such as eight bits, it is challenging for TC-CDI to recover high-dynamic-range scenes, especially in the presence of noise. In general, the quality of images reconstructed by TC-CDI declines as the dynamic range of the detector decreases. Although our TC-CDI technique currently adopts random patterns, its usable dynamic range can be further improved by optimizing the modulation patterns, e.g., so that they block the bright reflected light in the central area.
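The dynamic-range issue can be illustrated numerically: a far-field diffraction pattern concentrates most of its energy in the central order, so an 8-bit quantizer pushes many weak high-frequency pixels below the quantization floor. The sketch below (object, sizes, and seed are arbitrary) compares 8-bit and 16-bit quantization of a simulated diffraction intensity:

```python
import numpy as np

# Simulate a far-field diffraction intensity of a small bright object;
# the DC order dominates, creating a huge intensity dynamic range.
rng = np.random.default_rng(3)
obj = np.zeros((128, 128))
obj[48:80, 48:80] = rng.random((32, 32))
intensity = np.abs(np.fft.fftshift(np.fft.fft2(obj))) ** 2

for bits in (8, 16):
    levels = 2 ** bits - 1
    # Uniform quantization relative to the brightest (central) pixel.
    q = np.round(intensity / intensity.max() * levels) / levels
    kept = np.count_nonzero(q) / q.size  # fraction surviving quantization
    print(f"{bits}-bit: {kept:.1%} of pixels above the quantization floor")
```

The weak high-frequency pixels lost at low bit depth carry the fine spatial detail, which is why reconstruction quality degrades as the detector dynamic range shrinks.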
Compared with existing techniques, which can only image objects that stay within the FoV and cannot localize them, our system can restore the motion of high-speed objects moving into and out of the FoV from a single exposure. Our method can also be used to improve the insufficient resolution of direct imaging for microscopic objects. The proposed method has broad applications in microscopic imaging and x-ray intensity CDI. In the future, more effort is needed to improve the acquisition range and compression ratio. A joint network combining temporal reconstruction and PR could also be built.
Funding
Lochn Optics.
Acknowledgment
We acknowledge the Research Center for Industries of the Future (RCIF) at Westlake University for supporting this work, and the funding from Lochn Optics. We thank Dr. Youzhen Gui at the Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, for providing the near-infrared, single-frequency laser in our experiments.
Disclosures
The authors declare no conflicts of interest.
Data availability
Data underlying the results presented in this Letter may be obtained at [22].
Supplemental document
See Supplement 1 for supporting content.
REFERENCES
1. J. Miao, P. Charalambous, J. Kirz, and D. Sayre, Nature 400, 342 (1999).
2. J. Miao, Y. Nishino, Y. Kohmura, B. Johnson, C. Song, S. H. Risbud, and T. Ishikawa, Phys. Rev. Lett. 95, 085503 (2005).
3. H. Jiang, C. Song, C.-C. Chen, R. Xu, K. S. Raines, B. P. Fahimian, C.-H. Lu, T.-K. Lee, A. Nakashima, J. Urano, T. Ishikawa, F. Tamanoi, and J. Miao, Proc. Natl. Acad. Sci. USA 107, 11234 (2010).
4. M. A. Pfeifer, G. J. Williams, I. A. Vartanyants, R. Harder, and I. K. Robinson, Nature 442, 63 (2006).
5. J. R. Fienup, Appl. Opt. 21, 2758 (1982).
6. M. M. Seibert, T. Ekeberg, F. R. Maia, et al., Nature 470, 78 (2011).
7. X. Huang, J. Nelson, J. Steinbrener, J. Kirz, J. J. Turner, and C. Jacobsen, Opt. Express 18, 26441 (2010).
8. A. Barty, J. Küpper, and H. N. Chapman, Annu. Rev. Phys. Chem. 64, 415 (2013).
9. R. Horisaki, R. Egami, and J. Tanida, Opt. Express 24, 3765 (2016).
10. I. Johnson, K. Jefimovs, O. Bunk, C. David, M. Dierolf, J. Gray, D. Renker, and F. Pfeiffer, Phys. Rev. Lett. 100, 155503 (2008).
11. F. Zhang, B. Chen, G. R. Morrison, J. Vila-Comamala, M. Guizar-Sicairos, and I. K. Robinson, Nat. Commun. 7, 13367 (2016).
12. D. L. Donoho, IEEE Trans. Inf. Theory 52, 1289 (2006).
13. P. Llull, X. Liao, X. Yuan, J. Yang, D. Kittle, L. Carin, G. Sapiro, and D. J. Brady, Opt. Express 21, 10526 (2013).
14. A. A. Wagadarikar, N. P. Pitsianis, X. Sun, and D. J. Brady, Opt. Express 17, 6368 (2009).
15. G. Barbastathis, A. Ozcan, and G. Situ, Optica 6, 921 (2019).
16. X. Yuan, D. J. Brady, and A. K. Katsaggelos, IEEE Signal Process. Mag. 38(2), 65 (2021).
17. J. Greenberg, K. Krishnamurthy, and D. Brady, Opt. Lett. 39, 111 (2014).
18. Z. Cheng, B. Chen, G. Liu, H. Zhang, R. Lu, Z. Wang, and X. Yuan, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021), pp. 16246–16255.
19. K. Zhang, W. Zuo, and L. Zhang, IEEE Trans. Image Process. 27, 4608 (2018).
20. Y. Takahashi, Y. Nishino, R. Tsutsumi, N. Zettsu, E. Matsubara, K. Yamauchi, and T. Ishikawa, Phys. Rev. B 82, 214102 (2010).
21. J. Clark, X. Huang, R. Harder, and I. Robinson, Nat. Commun. 3, 993 (2012).
22. Z. Chen, S. Zheng, Z. Tong, and X. Yuan, “Code for temporal compressive coherent diffraction imaging,” GitHub (2022), https://github.com/zsm1211.