Abstract
Compressed ultrafast photography (CUP) is a computational imaging technology capable of capturing transient scenes on the picosecond scale with a sequence depth of hundreds of frames. Since the inverse problem of CUP is ill-posed, it is challenging to further improve the reconstruction quality under high noise levels and compression ratios. In addition, many works add an external charge-coupled device (CCD) camera to the CUP system to form a time-unsheared view, because the added constraint can improve the reconstruction quality of the images. However, since the images are collected by different cameras, a slight affine transformation between them can severely degrade the reconstruction quality. Here, we propose an algorithm that combines the time-unsheared image constraint CUP system with unsupervised neural networks. An image registration network is also introduced into the framework to learn the affine transformation parameters of the input images. The proposed algorithm effectively exploits both the implicit image prior of the neural network and the extra hardware prior brought by the time-unsheared view. Combined with the image registration network, this joint learning model further improves the quality of the reconstructed images without any training datasets. Simulation and experimental results demonstrate the application prospects of our algorithm in ultrafast event capture.
© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
1. Introduction
Capturing transient scenes at high speed is crucial for understanding the physical phenomena of various ultrafast processes. The streak camera is an ultrafast imaging tool with high spatio-temporal resolution. However, because it converts time information into spatial information, its imaging field of view (FOV) is limited to a single line to avoid signal superposition. To view the space-time evolution of an entire 2D dynamic scene, the event must follow the same spatio-temporal pattern in every shot: the whole spatial distribution can only be obtained through repeated shooting, so the dynamic scene needs to be a repeatable phenomenon. For non-repeatable phenomena, such as supernova explosions and synchrotron radiation, this imaging method is not appropriate. Compressed ultrafast photography (CUP) [1–3] completely opens the entrance slit of the streak camera and places pseudo-random binary masks in front of the slit to superimpose the 2D dynamic scene at different moments; it is a passive computational imaging technology with a frame rate of up to 10 trillion frames per second (fps) and a sequence depth of hundreds of frames [4]. Based on compressed sensing (CS) [5,6], CUP collects the data cube in a single shot and restores the dynamic scene through a reconstruction algorithm, which gives it great advantages in recording non-repeatable and self-luminous phenomena. However, the high compression ratio induced by the large sequence depth makes restoring the dynamic scene an ill-posed problem. For CUP reconstruction, the accuracy of the two-step iterative shrinkage/thresholding (TwIST) [1,7] algorithm is unsatisfactory, which limits the practical application of CUP.
Currently, optimization efforts fall into two categories: hardware-based and algorithm-based. In terms of hardware, increasing the sampling rate by adding channels is an effective way to improve the reconstruction quality, as in the complementary dual-channel lossless CUP proposed by Liang et al. [8], the dual-channel CUP proposed by Yao et al. [9] and the multi-channel coupled CUP proposed by Yao et al. [10]. The genetic algorithm proposed by Yang et al. [11] optimizes the coding masks and can also be regarded as a means of hardware optimization. In current recovery approaches, exploiting prior information is crucial to solving ill-posed problems, whether through model-based optimization or data-driven deep learning (DL) algorithms. Model-based optimization narrows the range of potential solutions by designing various hand-crafted regularizers, so as to obtain better results. For example, TV [12] assumes the sparsity of the data gradients, BM3D [13] uses the non-local similarity of natural images, weighted nuclear norm minimization (WNNM) [14] models the image as a low-rank structure to promote its low-dimensionality, and the plug-and-play (PnP) structure [15,16] combines the ADMM framework with a trained deep denoising network, implicitly embedding the nonlinear prior of an advanced image denoiser into the model. There is also a reconstruction algorithm that constrains images with space and intensity information [17]. However, these model-based optimizations are inadequate for the wide variety of CUP applications. Data-driven deep learning algorithms [18–23] usually build a suitable network structure and learn the nonlinear mapping between input and output, acquiring prior information by training on large datasets. However, this approach is still limited in actual applications.
First, the datasets required for training are very expensive to acquire. Second, data-driven deep learning algorithms have a narrow field of applicability and lack sufficient generalizability in the absence of transfer learning. For example, DUN-3DUnet and EfficientSCI proposed in [20,21] are designed for the coding pattern of CACTI [24] and are not suitable for the coding pattern of CUP. Changes in the masks or in the number of image frames also require retraining, which is expensive, especially for large pixel sizes and high compression ratios. In addition, data-driven DL algorithms are difficult to apply in situations where training datasets are hard to obtain, such as medical imaging and laser inertial confinement fusion. These schemes can improve image fidelity to a certain extent, but measuring complex dynamic scenes remains a great challenge.
It is worth noting that for methods which increase the sampling rate by adding channels, the use of multiple channels presupposes that the images of all channels match each other exactly. However, taking [19] as an example, the paper adopts an additional external charge-coupled device (CCD) camera to form the time-unsheared view. When imaging with different cameras, this matching premise is not easy to satisfy. First, the pixel sizes of images taken by different cameras are generally not identical. Second, the image may be rotated when passing through optical components such as an improperly positioned reflector. Finally, since the image size of the cameras is typically larger than the scene size, the images generally need to be cropped manually before reconstruction, and the corresponding pixels of different cameras may not match and may be offset during the cropping procedure. Such a slight affine transformation means that the CCD camera constraint not only fails to improve the reconstruction accuracy but noticeably degrades the performance of the reconstruction algorithm (Section 3.4). Recent studies have revealed that even without datasets, the convolutional neural network (CNN) structure itself has some regularization ability to capture a large number of low-level image statistical priors, which is called the deep image prior (DIP) [25]. DIP employs random noise as the input and learns appropriate network parameters from degraded images; it has proven to be an effective tool for reconstruction problems in spectral imaging [26], SIM [27], low-light imaging [28], coherent phase imaging [29] and other computational imaging technologies. Different from data-driven DL algorithms, and inspired by non-data-driven DL approaches [30] such as DIP [25] and MMES [31], we propose a new approach to solve the above image mismatch problem.
In this paper, an untrained neural network is combined with the CS model to present a time-unsheared image constraint unsupervised (TUICU) learning algorithm based on image registration (IR) for CUP, called TUICU-IR. The proposed algorithm adopts an unsupervised deep learning framework that utilizes an autoencoder network to learn the encoding and decoding of the underlying image. A spatial transformer network (STN) [32] is introduced to jointly learn, in an unsupervised manner, the affine transformation of the CCD image, thereby registering the images of the different views. Meanwhile, the reconstruction accuracy is further improved by more effectively incorporating the hardware improvement [19] (the CCD camera) into the algorithm. In addition, the proposed unsupervised DL algorithm is suitable for scenes where training datasets are difficult to obtain. Simulation and experimental results demonstrate that the proposed algorithm outperforms existing CUP reconstruction algorithms and is highly robust to noise, achieving state-of-the-art CUP reconstruction results.
2. Principles
The equipment diagram of the CCD camera constraint CUP system is shown in Fig. 1. The dynamic scene first passes through a beam splitter, which divides the light into two beams: one is directly imaged by the external CCD camera, and the other is spatially encoded by the coding plate in front of the streak camera, which can also be replaced by a digital micromirror device (DMD). The encoded dynamic scene enters the streak camera and is converted into an electron beam by the photocathode. When passing through the scanning electric field, electrons with different times of flight (ToF) are sheared by the scanning voltage, resulting in different deflections according to their time of arrival. After being amplified by the micro-channel plate (MCP), the electrons reach the phosphor screen and are converted into an optical signal, which is collected by the internal CCD camera to form a single 2D snapshot. Mathematically, the process of the streak camera can be expressed as:
$$\boldsymbol{E} = \textbf{TSC}\,\boldsymbol{I}(x,y,t) + \boldsymbol{n}, \tag{1}$$
where ${\boldsymbol{I}}$ denotes the original dynamic scene, $x$ and $y$ are the space dimensions and $t$ is the time dimension, ${\boldsymbol{E}}$ denotes the snapshot finally collected by the internal CCD camera of the streak camera after this series of processes, $\textbf{C}$ denotes the spatial encoding process, $\textbf {S}$ denotes the temporal shearing step of the streak camera, $\textbf {T}$ denotes the spatio-temporal integration in the internal CCD camera, and ${\boldsymbol{n}}$ denotes the noise in the collection process. Expressing $\textbf {TSC}$ as $\textbf {O}$, the equation becomes
$$\boldsymbol{E} = \textbf{O}\boldsymbol{I} + \boldsymbol{n}, \tag{2}$$
where ${\boldsymbol{E}}\in \mathbb {R}^{N_{xy}}$, $\textbf {O}\in \mathbb {R}^{N_{xy}\times N_{xyt}}$, ${\boldsymbol{I}}\in \mathbb {R}^{N_{xyt}}$ and $\textbf {n}\in \mathbb {R}^{N_{xy}}$; $N_{x}$, $N_{y}$ and $N_{t}$ denote the numbers of discretized pixels in the $x$, $y$ and $t$ coordinates, and $N_{xy}$ and $N_{xyt}$ denote $N_{x} N_{y}$ and $N_{x} N_{y} N_{t}$, respectively. Correspondingly, the projection of the dynamic scene onto the external CCD camera is parallel to the $t$ coordinate, which can be expressed as
$$\boldsymbol{E}_{ccd} = \textbf{T}\,\boldsymbol{I}(x,y,t) + \boldsymbol{n}_{ccd}. \tag{3}$$
It can be seen from Eqs. (2) and (3) that the number of elements in ${\boldsymbol{I}}$ is much larger than the number of elements in ${\boldsymbol{E}}$ and ${\boldsymbol{E}}_{ccd}$; solving the inverse problem of the streak camera's signal acquisition process is therefore ill-posed. According to CS theory [5], the original dynamic scene can be obtained by solving the following least squares optimization problem:
$$\hat{\boldsymbol{I}} = \mathop{\arg\min}_{\boldsymbol{I}} \frac{1}{2}\left\| \boldsymbol{E} - \textbf{O}\boldsymbol{I} \right\|_2^2 + \lambda \Phi(\boldsymbol{I}),$$
where $\Phi(\cdot)$ is a regularization term with weight $\lambda$.
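To make the measurement model concrete, the operators $\textbf{C}$, $\textbf{S}$ and $\textbf{T}$ can be sketched matrix-free in a few lines of Python. This is a simplified illustration, not the paper's code; the function names, the unit shear step, and the mask layout are our assumptions.

```python
import numpy as np

def cup_forward(I, mask, shear_step=1):
    """Matrix-free sketch of E = TSC*I: encode each frame with the mask (C),
    shear frame t by t*shear_step pixels along x (S), then integrate over
    time (T)."""
    Nt, Ny, Nx = I.shape
    # Shearing widens the snapshot along x by (Nt - 1) * shear_step pixels.
    E = np.zeros((Ny, Nx + (Nt - 1) * shear_step))
    for t in range(Nt):
        coded = I[t] * mask                 # C: spatial encoding
        s = t * shear_step                  # S: temporal shearing offset
        E[:, s:s + Nx] += coded             # T: spatio-temporal integration
    return E

def ccd_forward(I):
    """Time-unsheared view E_ccd = T*I: projection along t only."""
    return I.sum(axis=0)
```

Applying `cup_forward` to a data cube yields the single 2D streak-camera snapshot, while `ccd_forward` yields the external CCD view of the same scene.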
For the regularization term, total variation (TV) [33] is a widely used image prior that promotes sparsity in the gradient domain. Many applications [1,33] demonstrate the effectiveness of TV. A common (anisotropic) TV form for 3D images is as follows:
$$\text{TV}(\boldsymbol{I}) = \sum_{i,j,k} \left( \left| I_{i+1,j,k} - I_{i,j,k} \right| + \left| I_{i,j+1,k} - I_{i,j,k} \right| + \left| I_{i,j,k+1} - I_{i,j,k} \right| \right),$$
where $i$, $j$ and $k$ index the $x$, $y$ and $t$ coordinates.
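As an illustration, an anisotropic 3D TV of this kind can be computed with finite differences; `tv3d` is a hypothetical helper, and the paper's exact TV discretization may differ.

```python
import numpy as np

def tv3d(I):
    """Anisotropic 3D total variation: sum of absolute finite differences
    along the t, y and x axes of a (Nt, Ny, Nx) data cube."""
    dt = np.abs(np.diff(I, axis=0)).sum()   # differences along t
    dy = np.abs(np.diff(I, axis=1)).sum()   # differences along y
    dx = np.abs(np.diff(I, axis=2)).sum()   # differences along x
    return dx + dy + dt
```

A constant volume has zero TV, while any spatial or temporal variation increases it, which is what drives the reconstruction toward piecewise-smooth solutions.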
Currently, supervised DL performs well because neural networks learn their parameters from a large number of datasets, aiming to solve the following problem through extensive training:
$$\Theta^* = \mathop{\arg\min}_{\Theta} \sum\nolimits_i \left\| f_{\Theta}(\boldsymbol{E}_i) - \boldsymbol{I}_i \right\|_2^2,$$
where $\Theta$ denotes the parameters in the neural network and ${f_\Theta }(\cdot )$ represents the CNN parameterized by $\Theta$. Ulyanov et al. utilized CNN-based networks in a different way [25]. They discovered that the signal component of an image is subjected to a low impedance when passing through the CNN structure while the noise component is subjected to a high impedance, which can be employed for image restoration. In this case, the optimization for DIP can be expressed as:
$$\Theta^* = \mathop{\arg\min}_{\Theta} \left\| \boldsymbol{E} - \textbf{O}{f_\Theta }( {\boldsymbol{Z}}) \right\|_2^2, \qquad \hat{\boldsymbol{I}} = f_{\Theta^*}(\boldsymbol{Z}). \tag{9}$$
DIP does not require any training datasets; it adopts random noise ${\boldsymbol{Z}}$ as the input and initializes the network with random parameters. Through iterative optimization, the output approaches the measured value and $\left \| { {\boldsymbol{E}} - \textbf {O}{f_\Theta }( {\boldsymbol{Z}})} \right \|_2^2$ declines. However, if we apply Eq. (9) directly to solve the CUP problem, we find that it is prone to overfitting and the optimal solution cannot be obtained. By comparing the DIP objective function with our CS optimization problem, we can find that $\left \| { {\boldsymbol{E}} - \textbf {O}{f_\Theta }( {\boldsymbol{Z}})} \right \|_2^2$ in Eq. (9) actually corresponds to the data fidelity term in Eq. (5). Therefore, by combining the hardware system (the CCD camera) with our algorithm, the DIP form of our optimization task can be represented as:
$$\Theta^* = \mathop{\arg\min}_{\Theta} \left\| \boldsymbol{E} - \textbf{O}{f_\Theta }( {\boldsymbol{Z}}) \right\|_2^2 + \rho \left\| \boldsymbol{E}_{ccd} - \textbf{T}{f_\Theta }( {\boldsymbol{Z}}) \right\|_2^2. \tag{10}$$
The relationship between the compressed image captured by the streak camera and the spatiotemporal information of the dynamic scene is established through the implicit prior of DIP. Additionally, the time-unsheared image taken by the external CCD camera supervises the neural network, and the image fidelity can be improved under the external CCD camera constraint. The weight of the external CCD camera term, $\rho$ in Eq. (10), is generally set to 0.1. In addition, we combine the DIP optimization function with the traditional TV regularizer [34,35] to promote the sparsity of the image gradient on top of DIP, which yields a smoother final image. Therefore, Eq. (10) becomes:
$$\Theta^* = \mathop{\arg\min}_{\Theta} \left\| \boldsymbol{E} - \textbf{O}{f_\Theta }( {\boldsymbol{Z}}) \right\|_2^2 + \rho \left\| \boldsymbol{E}_{ccd} - \textbf{T}{f_\Theta }( {\boldsymbol{Z}}) \right\|_2^2 + \lambda\,\text{TV}\!\left( {f_\Theta }( {\boldsymbol{Z}}) \right). \tag{11}$$
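This composite objective (streak-camera fidelity, a $\rho$-weighted CCD term, and TV) can be sketched as a single loss routine. The names here are hypothetical: `op_streak` and `op_ccd` stand in for the operators $\textbf{O}$ and $\textbf{T}$, $\rho$ = 0.1 follows the text, and $\lambda$ is an assumed default.

```python
import numpy as np

def tuicu_loss(I_hat, E, E_ccd, op_streak, op_ccd, rho=0.1, lam=0.01):
    """Sketch of the composite objective: streak-camera data fidelity,
    time-unsheared (CCD) constraint weighted by rho, and an anisotropic
    TV regularizer on the estimated data cube I_hat."""
    fidelity = np.sum((E - op_streak(I_hat)) ** 2)
    ccd_term = rho * np.sum((E_ccd - op_ccd(I_hat)) ** 2)
    tv = lam * sum(np.abs(np.diff(I_hat, axis=a)).sum() for a in range(3))
    return fidelity + ccd_term + tv
```

In the actual algorithm `I_hat` would be the network output $f_\Theta(\boldsymbol{Z})$ and the loss would be minimized over $\Theta$ by a gradient-based optimizer.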
This optimization problem can be realized with various autoencoder structures. As shown in Fig. 2, the autoencoder adopted in this paper is composed of multiple fully connected layers. By feeding the vectorized image patches into the autoencoder, they are mapped to a low-dimensional space and then restored, which suppresses the noise and thereby achieves image restoration. It is worth mentioning that the difference between the input and output of the autoencoder is adopted as an additional loss term with a tradeoff coefficient $\tau$ for balancing the losses.
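The patch-based fully connected autoencoder can be sketched as follows. This is an untrained toy with hypothetical layer sizes, shown only to illustrate patch vectorization, the low-dimensional mapping, and the $\tau$-weighted input-output penalty; the paper's actual architecture is in Fig. 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(img, p):
    """Split an image into non-overlapping p x p patches and vectorize them."""
    H, W = img.shape
    patches = img[:H - H % p, :W - W % p].reshape(H // p, p, W // p, p)
    return patches.transpose(0, 2, 1, 3).reshape(-1, p * p)

class TinyAutoencoder:
    """Fully connected encoder/decoder pair that maps vectorized patches
    to a low-dimensional code and back (illustrative, untrained)."""
    def __init__(self, d_in, d_code):
        self.We = rng.normal(0, 0.1, (d_in, d_code))   # encoder weights
        self.Wd = rng.normal(0, 0.1, (d_code, d_in))   # decoder weights

    def forward(self, x):
        code = np.tanh(x @ self.We)   # encode to the low-dimensional space
        return code @ self.Wd         # decode back to patch space

def ae_penalty(ae, x, tau=0.1):
    """Input-output difference term with tradeoff coefficient tau."""
    return tau * np.mean((ae.forward(x) - x) ** 2)
```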
The schematic diagram of the streak camera in the CUP system is shown in Fig. 3. Assuming the image is sheared in the $x$ direction, the shearing is independent of the $y$ coordinate and merely redistributes signal along $x$, so the $\textbf {S}$-process (shearing) has no effect on the result of integrating the measurement along the $x$-axis. Using $\textbf {T}_x$ to denote integration along the $x$-axis, the above process can be expressed as
$$\textbf{T}_x \textbf{TSC}\,\boldsymbol{I}(x,y,t) = \textbf{T}_x \textbf{TC}\,\boldsymbol{I}(x,y,t). \tag{12}$$
It can be noted that the $\textbf {C}$ and $\textbf {T}$ processes project parallel to the $t$-axis and have no effect along the $x$- and $y$-axes. Therefore, without the $\textbf {S}$-process the order of the two operations does not affect the final result of Fig. 3, and it is easy to obtain the following equation:
$$\textbf{TC}\,\boldsymbol{I}(x,y,t) = \textbf{CT}\,\boldsymbol{I}(x,y,t). \tag{13}$$
By combining Eq. (12) and Eq. (13), we can get:
$$\textbf{T}_x \textbf{TSC}\,\boldsymbol{I}(x,y,t) = \textbf{T}_x \textbf{CT}\,\boldsymbol{I}(x,y,t). \tag{14}$$
As shown in Fig. 4, the results of the two processes are the same. $\textbf {TSC}\boldsymbol {I}(x,y,t)$ can be obtained from the compressed image collected by the streak camera, i.e., ${\boldsymbol{E}}$ in Eq. (1), and $\textbf {T}\boldsymbol {I}(x,y,t)$ can be obtained from the external CCD camera, i.e., ${\boldsymbol{E}}_{ccd}$ in Eq. (3). $\textbf {C}$ in Fig. 4(b) encodes the image using the encoding mask of the streak camera. The energy deposition along the $x$-axis of the compressed image captured by the streak camera is therefore the same as that of the encoded time-unsheared image captured by the external CCD camera. Equation (14) can be rewritten as
$$\textbf{T}_x \boldsymbol{E} = \textbf{T}_x \textbf{C}\,\boldsymbol{E}_{ccd}. \tag{15}$$
Equation (15) applies when there is no noise and the two images match exactly, but in practice it is difficult for the two images to match perfectly due to rotations and shifts as well as inconsistent pixel sizes between the cameras. We can perform an affine transformation on the time-unsheared image ${\boldsymbol{E}}_{ccd}$ collected by the external CCD camera, so as to achieve the following purpose:
$$\textbf{T}_x \boldsymbol{E} = \textbf{T}_x \textbf{C}\,\textbf{A}\!\left( \boldsymbol{E}_{ccd} \right), \tag{16}$$
where $\textbf{A}(\cdot)$ denotes the affine transformation to be learned.
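The column-sum identity behind Eqs. (14) and (15) is easy to verify numerically in the noiseless, perfectly matched case. The following is a self-contained check with a random scene and mask; the unit shear step is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
Nt, Ny, Nx, shift = 5, 6, 8, 1
I = rng.random((Nt, Ny, Nx))
mask = (rng.random((Ny, Nx)) > 0.5).astype(float)

# Left side: build the coded, sheared, time-integrated snapshot E,
# then apply T_x (integration along x).
E = np.zeros((Ny, Nx + (Nt - 1) * shift))
for t in range(Nt):
    E[:, t * shift:t * shift + Nx] += I[t] * mask
lhs = E.sum(axis=1)

# Right side: code the time-unsheared CCD view T*I, then apply T_x.
E_ccd = I.sum(axis=0)
rhs = (E_ccd * mask).sum(axis=1)

# Shearing only redistributes energy along x, so the row sums agree.
assert np.allclose(lhs, rhs)
```

The check relies on the shear not clipping any pixels, which matches the widened-snapshot geometry of the streak camera measurement.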
However, transforming the images without damaging the image features is itself a challenging problem. Fortunately, the STN network [32,36] offers a solution.
In linear algebra, operations such as translation, scaling, and rotation of images can be represented by matrix operations:
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \tag{17}$$
where $(x, y)$ and $(x', y')$ are the pixel coordinates before and after the transformation and $a$ to $f$ are the affine transformation parameters.
STN [32] is a spatial transformer network that enables explicit spatial transformation operations on images. The STN is divided into three parts. The first part is the localization network, which takes the input feature map and generates the spatial transformation parameters, i.e., $a$, $b$, $c$, $d$, $e$, $f$, through its hidden layers. The second part is the grid generator, which constructs a sampling grid from the predicted affine transformation parameters and computes, via Eq. (17), the source position in the input image for each output pixel; this is essentially a coordinate mapping. The pixel coordinates obtained in this part, i.e., $x^{\prime }$ and $y^{\prime }$, are not necessarily integers. The third part is the sampler, which fills each pixel of the output image according to the sampling grid obtained in the second part, using bilinear interpolation to take the values of neighbouring pixels into account.
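The grid generator and bilinear sampler can be reproduced in a few lines of plain numpy. This is a sketch of the second and third STN parts only; `affine_warp` is a hypothetical helper, and a practical implementation would use a framework's differentiable equivalents (e.g. PyTorch's `affine_grid`/`grid_sample`).

```python
import numpy as np

def affine_warp(img, theta):
    """Grid generator + bilinear sampler of an STN, in plain numpy.
    theta = [[a, b, c], [d, e, f]] maps each output pixel to a (possibly
    fractional) source location in the input image."""
    H, W = img.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    # Grid generator: Eq. (17)-style coordinate mapping.
    xsrc = theta[0, 0] * xs + theta[0, 1] * ys + theta[0, 2]
    ysrc = theta[1, 0] * xs + theta[1, 1] * ys + theta[1, 2]
    # Bilinear sampler: interpolate between the four neighbouring pixels.
    x0 = np.clip(np.floor(xsrc).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(ysrc).astype(int), 0, H - 2)
    wx = np.clip(xsrc - x0, 0, 1)
    wy = np.clip(ysrc - y0, 0, 1)
    return (img[y0, x0] * (1 - wx) * (1 - wy)
            + img[y0, x0 + 1] * wx * (1 - wy)
            + img[y0 + 1, x0] * (1 - wx) * wy
            + img[y0 + 1, x0 + 1] * wx * wy)
```

Because the sampling weights are continuous in `theta`, the warp is differentiable with respect to the affine parameters, which is what lets the localization network learn them by gradient descent.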
The STN was originally designed as a component of supervised learning to improve model generalization by increasing the spatial invariance of CNNs. With an STN, differences in image location have little effect on subsequent operations, so tasks such as image classification can be better realized. In this work, we instead make it a part of the untrained neural network, utilizing its ability to spatially transform images without changing image features, using ${\boldsymbol{E}}_{ccd}$ as the input of the network and Eq. (16) as the loss function. The network learns the affine transformation parameters between the time-unsheared image and the compressed image in an unsupervised manner and performs the corresponding affine transformations on the image to achieve image registration. Our algorithm is called the time-unsheared image constraint unsupervised image registration (TUICU-IR) algorithm, and its flowchart is shown in Fig. 5.
In general, the loss function is expressed as the mean square error (MSE), which takes the form of an L2-norm. This form assumes that the data errors are Gaussian and linear, under which the MSE can be derived by incorporating the probability density function into the maximum likelihood estimation (MLE). When there are large outliers, however, the effectiveness of an MSE-based algorithm deteriorates significantly. Correntropy is a nonlinear measure of local similarity [37], and here we employ a loss function that is robust to large outliers, called the correntropy-induced loss function (CLF) [38]. The empirical CLF between two samples $A$ and $B$ can be calculated as
$$L_{CLF}(A, B) = \beta \left[ 1 - \frac{1}{N} \sum_{i=1}^{N} \exp\!\left( -\frac{(a_i - b_i)^2}{2\sigma^2} \right) \right],$$
where $a_i$ and $b_i$ are the elements of $A$ and $B$, $N$ is their number, $\sigma$ is the Gaussian kernel bandwidth and $\beta$ is a scaling coefficient.
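A minimal implementation of such a correntropy-induced loss might look as follows; this is a sketch, and the kernel bandwidth `sigma` and the exact normalization are our assumptions.

```python
import numpy as np

def clf_loss(a, b, beta=1.0, sigma=1.0):
    """Correntropy-induced loss: a Gaussian kernel measures local
    similarity, so large outliers saturate instead of dominating."""
    e = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return beta * np.mean(1.0 - np.exp(-e ** 2 / (2.0 * sigma ** 2)))
```

Unlike MSE, each residual's contribution is bounded by `beta`, so a single large outlier cannot dominate the loss.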
In addition, it can be seen from Fig. 6(a) that the number of frames superimposed on each column of the compressed image collected by the streak camera varies: fewer frames are superimposed on regions a and c than on region b. Therefore, when calculating the first term of the loss function in Eq. (11), we assign different weights by column, proportional to the number of superimposed frames. The relationship between the relative weights and the column index is shown in Fig. 6(b).
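These column-dependent weights follow directly from the shearing geometry; the sketch below counts how many frames overlap each snapshot column, with normalization by the maximum count as our assumption.

```python
import numpy as np

def column_weights(Nx, Nt, shift=1):
    """Relative weight per snapshot column, proportional to how many
    frames overlap that column after shearing by `shift` pixels/frame."""
    W = Nx + (Nt - 1) * shift
    counts = np.zeros(W)
    for t in range(Nt):
        counts[t * shift:t * shift + Nx] += 1   # frame t covers these columns
    return counts / counts.max()
```

The resulting profile rises linearly at the edges and plateaus in the middle, matching the trapezoidal shape sketched in Fig. 6(b).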
3. Results and discussion
3.1 Comparison with state-of-the-art methods
The proposed TUICU-IR algorithm is implemented in Python, employing the Adam optimizer [40] with a learning rate of 0.015 and 2000 iterations, on a computer with an Intel Core i5-12400K CPU and an NVIDIA GeForce RTX 3090 GPU. We adopt CLF as the loss function with $\beta$ = 1. We chose four motion scenes from public datasets: Drop and Aerial, each containing 16 consecutive 256×256 images, and Runner and Crash, each containing 24 consecutive 256×256 images. During the simulation, we set the transmittance of the coding plate to 0.8 and the transmittance of the opaque region to 0.2 in order to be closer to the actual situation. Inspired by [41], we designed a coding plate with a sampling rate of 30%, indicating that 30% of the plate area is transparent and 70% is opaque. The compressed image is obtained through the forward process illustrated in Fig. 3. Specifically, each image is encoded by the same pseudo-random binary mask with elements {0.2, 0.8}. Subsequently, every encoded image of the dataset is horizontally displaced by 1 pixel relative to the previous image, emulating the shearing operation of the streak camera. Finally, projection integration is performed along the $t$-axis and the 2D measurement of the streak camera is acquired. The external CCD measurement is acquired with only the projection integration step. In the simulations, we also use GAP-TV [12], FFDNet [42], FastDVDnet [15] and DeSCI [14] for comparison. It is worth mentioning that only our proposed algorithm uses two images, while all other algorithms use the streak camera image only. For the parameters of these algorithms, we take the default values in the literature. Peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) [43] are used as metrics to evaluate the quality of the reconstructed 3D data cubes.
Table 1 presents the reconstruction results on the four datasets without any noise, and the best results are highlighted in bold font. It can be seen that our algorithm outperforms all other CUP reconstruction algorithms in terms of PSNR and SSIM.
We selected a single frame from each of the four datasets to evaluate the performance of the different algorithms. Figure 7 displays the reconstruction results of the different algorithms, together with the ground truths for comparison. From the presented results, GAP-TV, FFDNet, and DeSCI lose many image details and generate various degrees of artifacts and blurriness; FFDNet even recovers spatial structures that do not exist. For FastDVDnet, although the overall reconstruction quality appears satisfactory, the enlarged image blocks in the red box show that this is because FastDVDnet over-smooths the images, resulting in subpar performance on fine details. Our proposed algorithm recovers most of the spatial details, and the slight blurring in some local areas is due to the large number of reconstructed frames.
3.2 Image registration analysis
We construct four different modes of unregistered situations, named M1, M2, M3 and M4, considering pixel size mismatch, pixel size mismatch + rotation, pixel size mismatch + pixel shifting and pixel size mismatch + rotation + pixel shifting, respectively, as in real scenarios. The unregistered operations are all performed on the time-unsheared image collected by the external CCD camera. M1 reduces the time-unsheared image to 0.9375 times its original size using the "imresize" function in MATLAB. M2 adds a 3$^{\circ }$ rotation on the basis of M1, realized with the "imrotate" function in MATLAB. M3 adds a 3-pixel offset to M1. M4 adds a 1$^{\circ }$ rotation on the basis of M3. Taking the Drop dataset as an example, Fig. 8 displays the effect of these mismatching modes.
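The four modes can be reproduced with `scipy.ndimage` in place of MATLAB's `imresize`/`imrotate`; this is a sketch, and the interpolation order and boundary handling differ slightly between the two toolboxes.

```python
import numpy as np
from scipy import ndimage

def make_mismatch(img, mode):
    """Reproduce the four unregistered modes M1-M4 applied to the
    time-unsheared image: M1 = 0.9375x resize; M2 = M1 + 3 deg rotation;
    M3 = M1 + 3-pixel offset; M4 = M3 + 1 deg rotation."""
    out = ndimage.zoom(img, 0.9375, order=1)          # M1: pixel-size mismatch
    if mode == "M2":
        out = ndimage.rotate(out, 3, reshape=False, order=1)
    elif mode in ("M3", "M4"):
        out = ndimage.shift(out, (3, 3), order=1)     # 3-pixel offset
        if mode == "M4":
            out = ndimage.rotate(out, 1, reshape=False, order=1)
    return out
```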
Figure 9 displays the PSNR between the original time-unsheared image and the results registered by our image registration algorithm in these four modes. The comparison is conducted under several noise conditions: no noise; additive white Gaussian noise (AWGN) with $\sigma _0$ = 5, 10 and 20; Poisson noise; and Poisson noise with $\sigma _0$ = 5 AWGN. It is worth noting that the same noise level is added to the time-unsheared image and the compressed image in each simulation. The horizontal lines in each graph indicate the PSNR between the original time-unsheared image and the noisy time-unsheared image under the corresponding noise conditions. When $\sigma _0$ = 10 or 20, the PSNR of most images registered by the algorithm is higher than that of images to which AWGN of the same level is added directly, in all modes. Under the other noise conditions, the PSNR of some registered images is still higher than that of images noised directly at the same level. This demonstrates the effectiveness of the image registration algorithm. Figure 10 compares the Drop dataset before and after registration with $\sigma _0$ = 10 AWGN in mismatching mode M2, together with the original time-unsheared image and the time-unsheared image with $\sigma _0$ = 10 AWGN added directly.
3.3 Noise robustness analysis
In order to be closer to the actual situation, we also take into account the presence of noise in the image reconstruction process. AWGN with noise levels of $\sigma _0$ = 5, 10 and 20, Poisson noise, and Poisson noise with $\sigma _0$ = 5 AWGN are added simultaneously to both the compressed image obtained by the streak camera and the time-unsheared image obtained by the external CCD camera. The noise level of the two images is identical: if noise with $\sigma _0$ = 10 is added to ${\boldsymbol{E}}$ in Eq. (1), noise of the same level is added to ${\boldsymbol{E}}_{ccd}$ in Eq. (3). To demonstrate the effectiveness of our image registration algorithm in the reconstruction process, we also take the four unregistered modes into account. The iteration count of our algorithm is set to 1500. Table 2 presents the results of GAP-TV, FFDNet, FastDVDnet, DeSCI and our proposed TUICU (undamaged time-unsheared image) and TUICU-IR (four unregistered modes) algorithms under different noise conditions on the four datasets. The reconstruction results of TUICU-IR differ little from, and are sometimes even slightly superior to, those of TUICU with the undamaged time-unsheared image. In addition, the proposed algorithms are more robust to noise, while the reconstruction accuracy of the other algorithms decreases rapidly with increasing noise level.
It is worth mentioning that the results of FFDNet and FastDVDnet are obtained with carefully adjusted parameters. We select two frames each from the reconstruction results of the Crash and Aerial datasets with noise level $\sigma _0$ = 5 and compare the performance of the different algorithms, including the TUICU-IR algorithm with image registration, together with the ground truth for comparison. Figure 11 shows the reconstruction results of the different algorithms with noise level $\sigma _0$ = 5. Even with an unregistered ${\boldsymbol{E}}_{ccd}$, the reconstruction results of the TUICU-IR algorithm with the registration step show little difference from those of the TUICU algorithm using the undamaged ${\boldsymbol{E}}_{ccd}$, and may even recover more details (Crash#19 and Aerial#12). In summary, our proposed algorithm has higher reconstruction accuracy and stronger robustness to various kinds of noise than the traditional algorithms.
3.4 Ablation study
We perform several ablation studies. First, we evaluate the impact of the image registration algorithm on the CS reconstruction results. Specifically, we test what happens when the unregistered time-unsheared image collected by the external CCD is applied directly to our TUICU algorithm without registration, even when the only distortion is a rotation of 1$^{\circ }$. As shown in Fig. 12, a 1$^{\circ }$ rotation is barely distinguishable visually. The results of applying the TUICU algorithm directly to the unregistered time-unsheared image are shown in Table 3, together with the results of the TUICU algorithm with $\rho$ = 0 in Eq. (11), i.e., the unsupervised CUP reconstruction algorithm without the external CCD image constraint.
It can be seen from Table 3 that directly applying the TUICU algorithm to the unregistered time-unsheared image can be even worse than reconstructing without the time-unsheared image constraint at all. As shown in Fig. 13, we select a single frame from the noiseless reconstruction results of the Drop and Runner datasets, together with the ground truths and the error maps. Without the registration step, the reconstruction results are prone to larger errors in the areas where the time-unsheared image does not match the actual data, leading to a sharp reduction of the SSIM value.
We also evaluate the reconstruction results using MSE instead of CLF as the loss function. Table 4 displays the reconstruction results of the TUICU-IR algorithm using MSE as the loss function with AWGN $\sigma _0$ = 10 and 20 in M1 mode. It is worth noting that CLF is still used as the loss function in the image registration step; otherwise the registration operation may fail.
Compared with Table 2, we can find that the PSNR (SSIM) of the reconstruction results using CLF as the loss function with noise levels $\sigma _0$ = 10 and 20 shows an improvement of 0.17 dB (0.0239) and 0.24 dB (0.0578), respectively, relative to the reconstruction results using MSE as the loss function.
In addition, as shown in Table 5, we evaluate the reconstruction results using the general compressed-image loss function instead of our proposed weighted compressed-image loss function of Fig. 6(b), at noise levels $\sigma _0$ = 10 and 20 in M1 mode.
Compared with Table 2, we can find that the PSNR (SSIM) of the reconstruction results using our proposed weighted compressed-image loss function with noise levels $\sigma _0$ = 10 and 20 shows an improvement of 0.17 dB (0.0239) and 0.24 dB (0.0578), respectively, relative to the reconstruction results using the general compressed-image loss function.
3.5 Experiment
In order to verify the reliability of the proposed algorithm in practical experiments, a nanosecond laser pulse illuminating a manually created "E" pattern is imaged. The experimental system is shown in Fig. 1. The laser pulse is divided into two beams by a beam splitter, with one beam directed toward the streak camera and the other toward the external CCD camera. We successfully measure the spatiotemporal intensity evolution of the nanosecond laser pulse by combining our proposed TUICU-IR algorithm with the time-unsheared image constraint CUP system.
The images collected by the two cameras are shown in Fig. 14; there is an obvious mismatch between them. The correlation coefficient between the two vectors obtained by integrating the images along the $x$-axis is 0.987.
A laser pulse with a full width at half maximum (FWHM) of 15 ns is employed, and 43 frames are reconstructed with an interval of 0.89 ns per frame. A movie of the results reconstructed by our proposed algorithm is available in Visualization 1. We select one representative frame every 5 frames, corresponding to a spacing of 4.45 ns. Figure 15(a) compares the reconstruction results of the different methods. Our algorithm clearly reveals the spatio-temporal evolution of the laser pulse, while the other algorithms exhibit noticeable offset traces and lack an apparent spatio-temporal intensity evolution. Figure 15(b) displays the normalized intensity as a function of time for the images reconstructed by each method. The FWHM values obtained from the curves of GAP-TV, FFDNet, FastDVDnet and our proposed TUICU-IR algorithm are 22.36 ns, 21.46 ns, 22.73 ns and 15.03 ns, respectively, indicating that the FWHM recovered by our algorithm is closest to the real situation.
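Extracting the FWHM from a sampled normalized-intensity curve can be done by locating the half-maximum crossings and interpolating linearly between samples. A minimal sketch (the interpolation scheme is an assumption; the paper does not specify how its FWHM values were computed):

```python
import numpy as np

def fwhm(t, intensity):
    """Full width at half maximum of a sampled intensity curve,
    with linear interpolation at the half-maximum crossings."""
    y = np.asarray(intensity, dtype=float)
    half = y.max() / 2.0
    above = np.where(y >= half)[0]
    i0, i1 = above[0], above[-1]
    # rising edge: y increases through the half-maximum
    if i0 > 0:
        t_rise = np.interp(half, [y[i0 - 1], y[i0]], [t[i0 - 1], t[i0]])
    else:
        t_rise = t[i0]
    # falling edge: y decreases, so flip the pair for np.interp
    if i1 < len(y) - 1:
        t_fall = np.interp(half, [y[i1 + 1], y[i1]], [t[i1 + 1], t[i1]])
    else:
        t_fall = t[i1]
    return t_fall - t_rise
```

Applied to the curves of Fig. 15(b), such a measurement yields the widths quoted above, with only the TUICU-IR curve recovering a width close to the true 15 ns pulse.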
4. Conclusion
We propose a time-unsheared image constraint unsupervised DL image registration algorithm for CUP. The proposed method registers the mismatched time-unsheared image with the compressed image; the registered time-unsheared image and the compressed image are then integrated into an end-to-end unsupervised DL framework that reconstructs the dynamic scene from a single compressed image by using the CNN model as the image prior. Extensive simulations and experiments demonstrate the superior performance and strong noise robustness of the proposed method compared with widely used CUP reconstruction algorithms. Furthermore, more prior information can be integrated into the proposed unsupervised DL framework to improve the reconstruction accuracy, providing a better platform for CUP reconstruction of ultrafast events.
Funding
National Natural Science Foundation of China (11975184).
Disclosures
The authors declare no conflict of interest.
Data availability
Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.
References
1. L. Gao, J. Liang, C. Li, et al., “Single-shot compressed ultrafast photography at one hundred billion frames per second,” Nature 516(7529), 74–77 (2014). [CrossRef]
2. D. Qi, S. Zhang, and C. Yang, “Single-shot compressed ultrafast photography: a review,” Adv. Photonics 2(01), 1 (2020). [CrossRef]
3. P. Wang, J. Liang, and L. V. Wang, “Single-shot ultrafast imaging attaining 70 trillion frames per second,” Nat. Commun. 11(1), 2091 (2020). [CrossRef]
4. J. Liang, L. Zhu, and L. V. Wang, “Single-shot real-time femtosecond imaging of temporal focusing,” Light: Sci. Appl. 7(1), 42 (2018). [CrossRef]
5. I. Orovic, V. Papic, and C. Ioana, “Compressive sensing in signal processing: Algorithms and transform domain formulations,” Math Probl. Eng. 2016, 1–16 (2016). [CrossRef]
6. E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inform. Theory 52(2), 489–509 (2006). [CrossRef]
7. J. M. Bioucas-Dias and M. A. T. Figueiredo, “A new twist: Two-step iterative shrinkage/thresholding algorithms for image restoration,” IEEE Trans. on Image Process. 16(12), 2992–3004 (2007). [CrossRef]
8. J. Liang, C. Ma, and L. Zhu, “Single-shot real-time video recording of a photonic mach cone induced by a scattered light pulse,” Sci. Adv. 3(1), e1601814 (2017). [CrossRef]
9. Z. M. Yao, L. Sheng, and Y. Song, “Dual-channel compressed ultrafast photography for z-pinch dynamic imaging,” Rev. Sci. Instrum. 94(3), 035106 (2023). [CrossRef]
10. J. Yao, D. Qi, and C. Yang, “Multichannel-coupled compressed ultrafast photography,” J. Opt. 22(8), 085701 (2020). [CrossRef]
11. C. Yang, D. Qi, and X. Wang, “Optimizing codes for compressed ultrafast photography by the genetic algorithm,” Optica 5(2), 147–151 (2018). [CrossRef]
12. X. Yuan, “Generalized alternating projection based total variation minimization for compressive sensing,” in IEEE International Conference on Image Processing, (2016), pp. 2539–2543.
13. K. Dabov, A. Foi, V. Katkovnik, et al., “Image denoising by sparse 3-d transform-domain collaborative filtering,” IEEE Trans. on Image Process. 16(8), 2080–2095 (2007). [CrossRef]
14. Y. Liu, X. Yuan, and J. Suo, “Rank minimization for snapshot compressive imaging,” IEEE Trans. Pattern Anal. Mach. Intell. 41(12), 2990–3006 (2019). [CrossRef]
15. X. Yuan, Y. Liu, and J. Suo, “Plug-and-play algorithms for video snapshot compressive imaging,” IEEE Trans. Pattern Anal. Mach. Intell. 44(10), 7093–7111 (2022). [CrossRef]
16. C. Z. Jin, D. Qi, and J. Yao, “Weighted multi-scale denoising via adaptive multi-channel fusion for compressed ultrafast photography,” Opt. Express 30(17), 31157–31170 (2022). [CrossRef]
17. L. Zhu, Y. Chen, and J. Liang, “Space- and intensity-constrained reconstruction for compressed ultrafast photography,” Optica 3(7), 694–697 (2016). [CrossRef]
18. A. Zhang, J. Wu, and J. Suo, “Single-shot compressed ultrafast photography based on u-net network,” Opt. Express 28(26), 39299–39310 (2020). [CrossRef]
19. Y. Ma, X. Feng, and L. Gao, “Deep-learning-based image reconstruction for compressed ultrafast photography,” Opt. Lett. 45(16), 4400–4403 (2020). [CrossRef]
20. Z. Wu, J. Zhang, and C. Mou, “Dense deep unfolding network with 3d-cnn prior for snapshot compressive imaging,” in IEEE International Conference on Computer Vision (ICCV), (2021).
21. L. Wang, M. Cao, and X. Yuan, “Efficientsci: Densely connected network with space-time factorization for large-scale video snapshot compressive imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2023), pp. 18477–18486.
22. C. Yang, S. Zhang, and X. Yuan, “Ensemble learning priors driven deep unfolding for scalable video snapshot compressive imaging,” in IEEE European Conference on Computer Vision (ECCV), (2022).
23. Y. Li, M. Qi, R. Gulve, et al., “End-to-end video compressive sensing using anderson-accelerated unrolled networks,” in 2020 IEEE international conference on computational photography (ICCP), (2020), pp. 1–12.
24. P. Llull, X. Liao, and X. Yuan, “Coded aperture compressive temporal imaging,” Opt. Express 21(9), 10526–10545 (2013). [CrossRef]
25. D. Ulyanov, A. Vedaldi, and V. S. Lempitsky, “Deep image prior,” Int. J. Comput. Vis. 128(7), 1867–1888 (2020). [CrossRef]
26. Z. Meng, Z. Yu, K. Xu, et al., “Self-supervised neural networks for spectral snapshot compressive imaging,” in IEEE/CVF International Conference on Computer Vision, (2021), pp. 2602–2611.
27. Y. He, Y. Yao, and Y. He, “Untrained neural network enhances the resolution of structured illumination microscopy under strong background and noise levels,” Adv. Photonics Nexus 2(04), 046005 (2023). [CrossRef]
28. H. Lee, K. Sohn, and D. Min, “Unsupervised low-light image enhancement using bright channel prior,” IEEE Signal Process. Lett. 27, 251–255 (2020). [CrossRef]
29. F. Wang, Y. Bian, and H. Wang, “Phase imaging with an untrained neural network,” Light, Science & Applications 9(1), 77 (2020). [CrossRef]
30. A. Qayyum, I. Ilahi, F. Shamshad, et al., “Untrained neural network priors for inverse imaging problems: A survey,” IEEE Trans. Pattern Anal. Mach. Intell. 45(5), 6511–6536 (2022). [CrossRef]
31. T. Yokota, H. Hontani, Q. Zhao, et al., “Manifold modeling in embedded space: An interpretable alternative to deep image prior,” IEEE Trans. Neural Netw. Learning Syst. 33(3), 1022–1036 (2022). [CrossRef]
32. M. Jaderberg, K. Simonyan, A. Zisserman, et al., “Spatial transformer networks,” arXiv, arXiv:1506.02025 (2015). [CrossRef]
33. L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D 60(1-4), 259–268 (1992). [CrossRef]
34. J. Liu, Y. Sun, X. Xu, et al., “Image restoration using total variation regularized deep image prior,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (2019), pp. 7715–7719.
35. M. Qiao, X. Liu, and X. Yuan, “Snapshot temporal compressive microscopy using an iterative algorithm with untrained neural networks,” Opt. Lett. 46(8), 1888–1891 (2021). [CrossRef]
36. J. Nie, L. Zhang, W. Wei, et al., “Unsupervised deep hyperspectral super-resolution with unregistered images,” in 2020 IEEE International Conference on Multimedia and Expo (ICME), (2020), pp. 1–6.
37. W. Liu, P. P. Pokharel, and J. C. Príncipe, “Correntropy: Properties and applications in non-gaussian signal processing,” IEEE Trans. Signal Process. 55(11), 5286–5298 (2007). [CrossRef]
38. L. Chen, H. Qu, and J. Zhao, “Efficient and robust deep learning with correntropy-induced loss function,” Neural Comput. Appl. 27(4), 1019–1031 (2016). [CrossRef]
39. V. N. Vapnik, The Nature of Statistical Learning Theory (Springer, 1995).
40. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv, arXiv:1412.6980 (2014). [CrossRef]
41. M. Zhao and S. Jalali, “Theoretical analysis of binary masks in snapshot compressive imaging systems,” in 2023 59th Annual Allerton Conference on Communication, Control, and Computing (Allerton), (2023), pp. 1–8.
42. X. Yuan, Y. Liu, J. Suo, et al., “Plug-and-play algorithms for large-scale snapshot compressive imaging,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), pp. 1444–1454.
43. Z. Wang, A. C. Bovik, H. R. Sheikh, et al., “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]