Unsupervised reconstruction with a registered time-unsheared image constraint for compressed ultrafast photography


Abstract

Compressed ultrafast photography (CUP) is a computational imaging technology capable of capturing transient scenes on the picosecond scale with a sequence depth of hundreds of frames. Since the inverse problem of CUP is ill-posed, it is challenging to further improve the reconstruction quality under high noise levels and compression ratios. In addition, many works add an external charge-coupled device (CCD) camera to the CUP system to form a time-unsheared view, because the added constraint can improve the reconstruction quality. However, since the images are collected by different cameras, a slight affine transformation between them may severely degrade the reconstruction quality. Here, we propose an algorithm that combines the time-unsheared image constraint CUP system with unsupervised neural networks. An image registration network is also introduced into the framework to learn the affine transformation parameters of the input images. The proposed algorithm effectively exploits the implicit image prior of the neural network as well as the extra hardware prior provided by the time-unsheared view. Combined with the image registration network, this joint learning model further improves the quality of the reconstructed images without any training datasets. Simulation and experimental results demonstrate the application prospects of our algorithm in ultrafast event capture.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Capturing transient scenes at high speed is crucial for understanding the physical phenomena behind various ultrafast processes. The streak camera is an ultrafast imaging tool with high spatio-temporal resolution. However, because it converts time information into spatial information, its imaging field of view (FOV) is limited to a single line to avoid signal superposition. To view the space-time evolution of an entire 2D dynamic scene, the event must follow the same spatio-temporal pattern in every acquisition; the whole spatial distribution can only be obtained through repeated shooting, so the dynamic scene needs to be a repeatable phenomenon. For non-repeatable phenomena, such as supernova explosions and synchrotron radiation, this imaging method is not appropriate. Compressed ultrafast photography (CUP) [1–3] completely opens the entrance slit of the streak camera and places pseudo-random binary masks in front of the slit to superimpose the 2D dynamic scene at different moments; it is a passive computational imaging technology with a frame rate of 10 trillion frames per second (fps) and a sequence depth of hundreds of frames [4]. Based on compressed sensing (CS) [5,6], CUP collects the data cube in a single shot and restores it into a dynamic scene through a reconstruction algorithm, which gives it great advantages in recording non-repeatable and self-luminous phenomena. However, the high compression ratio induced by the large sequence depth makes restoring the dynamic scene an ill-posed problem. For CUP reconstruction, the accuracy of the two-step iterative shrinkage/thresholding (TwIST) [1,7] algorithm is unsatisfactory, which limits the practical applications of CUP.

Currently, optimization efforts fall into two categories: hardware-based and algorithm-based. In terms of hardware, increasing the sampling rate by adding channels is an effective way to improve the reconstruction quality, as in the complementary dual-channel lossless CUP proposed by Liang et al. [8], the dual-channel CUP proposed by Yao et al. [9] and the multi-channel coupled CUP proposed by Yao et al. [10]. The genetic algorithm proposed by Yang et al. [11] optimizes the coding masks and can also be regarded as a form of hardware optimization. In current recovery approaches, prior information is crucial to solving ill-posed problems, via either model-based optimization or data-driven deep learning (DL) algorithms. Model-based optimization narrows the range of potential solutions by designing various hand-crafted regularizers. For example, TV [12] assumes sparsity of the data in the gradient domain, BM3D [13] uses the non-local similarity of natural images, weighted nuclear norm minimization (WNNM) [14] models the image as a low-rank structure to promote low-dimensionality, and the plug-and-play (PnP) framework [15,16] combines ADMM with a trained deep denoising network, implicitly embedding the nonlinear prior of an advanced image denoiser into the model. There is also a reconstruction algorithm that constrains images with space and intensity information [17]. However, these model-based optimizations are inadequate for the wide variety of CUP applications. Data-driven deep learning algorithms [18–23] usually build a suitable network structure and learn the nonlinear mapping between input and output by training on large datasets. However, this approach remains limited in practice. First, the datasets required for training are very expensive to acquire. Second, data-driven deep learning algorithms have a narrow field of applicability and lack sufficient generalizability without transfer learning. For example, DUN-3DUnet and EfficientSCI proposed in [20,21] are designed for the coding pattern of CACTI [24] and are not suitable for the coding pattern of CUP. Changes in the masks or in the number of image frames also require retraining, which is expensive, especially for large pixel sizes and high compression ratios. In addition, data-driven DL algorithms are difficult to apply in situations where training datasets are hard to obtain, such as medical imaging and laser inertial confinement fusion. These schemes can improve image fidelity to a certain extent, but measuring complex dynamic scenes remains challenging.

It is worth noting that methods which increase the sampling rate by adding channels presuppose that the images of all channels match each other exactly. However, taking [19] as an example, the paper adopts an additional external charge-coupled device (CCD) camera to form the time-unsheared view. When imaging with different cameras, this matching premise is not easy to satisfy. First, the pixel sizes of images taken by different cameras are generally not identical. Second, the image may be rotated when passing through optical components such as an improperly positioned reflector. Finally, since the camera image size is typically larger than the scene size, the images generally need to be cropped manually before reconstruction, and the corresponding pixels of different cameras may be offset during the cropping procedure. This slight affine transformation means that the CCD camera constraint not only fails to improve the reconstruction accuracy but also noticeably degrades the performance of the reconstruction algorithm (Section 3.4). Recent studies have revealed that, even without datasets, the convolutional neural network (CNN) structure itself has a regularization ability that captures a large number of low-level image statistical priors, which is called the deep image prior (DIP) [25]. DIP employs random noise as the input and learns appropriate network parameters from degraded images; it has proven to be an effective tool for reconstruction problems in spectral imaging [26], SIM [27], low-light imaging [28], coherent phase imaging [29] and other computational imaging technologies. Different from data-driven DL algorithms, and inspired by non-data-driven DL approaches [30] such as DIP [25] and MMES [31], we propose a new approach to solve the above image mismatch problem. In this paper, an untrained neural network is combined with the CS model to form a time-unsheared image constraint unsupervised (TUICU) learning algorithm based on image registration (IR), called TUICU-IR, for solving CUP problems. The proposed algorithm adopts an unsupervised deep learning framework that uses an autoencoder network to learn the encoding and decoding of the underlying image. A spatial transformer network (STN) [32] is introduced to jointly learn the affine transformation of the CCD image in an unsupervised manner, achieving image registration between the views. Meanwhile, the reconstruction accuracy can be further improved by more effectively incorporating the hardware improvement of [19] (the external CCD camera) into the algorithm. In addition, the proposed unsupervised DL algorithm is suitable for scenes in which training datasets are difficult to obtain. Simulation and experimental results demonstrate that the proposed algorithm outperforms existing CUP reconstruction algorithms and has strong noise robustness, achieving state-of-the-art CUP reconstruction results.

2. Principles

The equipment diagram of the CCD camera constraint CUP system is shown in Fig. 1. First, the dynamic scene passes through a beam splitter that divides the light into two beams: one is imaged directly by the external CCD camera, and the other is spatially encoded by the coding plate in front of the streak camera (the coding plate can also be replaced by a digital micromirror device, DMD). The encoded dynamic scene enters the streak camera and is converted into an electron beam by the photocathode. When passing through the scanning electric field, electrons with different times of flight (ToF) are sheared by the scanning voltage and thus deflected according to their time of arrival. After being amplified by the micro-channel plate (MCP), the electrons reach the phosphor screen and are converted into an optical signal, which is collected by the internal CCD camera to form a single 2D snapshot. Mathematically, the process of the streak camera can be expressed as:

$${\boldsymbol{E}}(x^{\prime},y^{\prime})= \textbf{TSC}\boldsymbol{I}(x,y,t)+{\boldsymbol{n}},$$
where ${\boldsymbol{I}}$ denotes the original dynamic scene, $x$ and $y$ are the spatial dimensions and $t$ is the time dimension, ${\boldsymbol{E}}$ denotes the snapshot finally collected by the internal CCD camera of the streak camera after this series of processes, $\textbf{C}$ denotes the spatial encoding, $\textbf{S}$ denotes the temporal shearing of the streak camera, $\textbf{T}$ denotes the spatio-temporal integration in the internal CCD camera, and ${\boldsymbol{n}}$ denotes the noise in the collection process. Writing $\textbf{TSC}$ as $\textbf{O}$, the equation becomes
$${\boldsymbol{E}}(x^{\prime},y^{\prime})= \textbf{O}\boldsymbol{I}(x,y,t)+{\boldsymbol{n}},$$
where ${\boldsymbol{E}}\in \mathbb {R}^{N_{xy}}$, $\textbf {O}\in \mathbb {R}^{N_{xy}\times N_{xyt}}$, ${\boldsymbol{I}}\in \mathbb {R}^{N_{xyt}}$ and ${\boldsymbol{n}}\in \mathbb {R}^{N_{xy}}$; $N_{x}$, $N_{y}$ and $N_{t}$ denote the numbers of discretized pixels along the $x$, $y$ and $t$ coordinates, and $N_{xy}$ and $N_{xyt}$ denote $N_{x}\times N_{y}$ and $N_{x}\times N_{y}\times N_{t}$, respectively. Correspondingly, the projection of the dynamic scene onto the external CCD camera is parallel to the $t$ axis, which can be expressed as
$${\boldsymbol{{{E}}}}_{ccd}(x^{\prime},y^{\prime})= \textbf{T}\boldsymbol{I}(x,y,t)+{\boldsymbol{n}},$$
where ${\boldsymbol{{{E}}}}_{ccd}$ represents the scene collected by the external CCD camera, ${\boldsymbol{I}}$ denotes the original dynamic scene and $\textbf {T}$ denotes the process of spatio-temporal integration in the external CCD camera.
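For concreteness, the forward model of Eqs. (1)–(3) can be sketched in a few lines of NumPy. The helper below is an illustrative assumption (frame-major data cube, one-pixel shear per frame along the $x$ axis, the same mask applied to every frame), not the exact implementation used in this work.

import numpy as np

def cup_forward(I, C):
    """Toy forward model: return the streak-camera measurement E (Eq. (2))
    and the time-unsheared view E_ccd (Eq. (3)), noise omitted."""
    Nt, Ny, Nx = I.shape
    coded = I * C[None, :, :]              # C: spatial encoding with the same mask for every frame
    E = np.zeros((Ny, Nx + Nt - 1))        # sheared measurement is wider along x
    for t in range(Nt):
        E[:, t:t + Nx] += coded[t]         # S then T: shift frame t by t pixels, then integrate over t
    E_ccd = I.sum(axis=0)                  # external CCD view: spatio-temporal integration only
    return E, E_ccd

# toy usage with a hypothetical 30% transparent binary mask
I = np.random.rand(16, 256, 256)
C = (np.random.rand(256, 256) < 0.3).astype(float)
E, E_ccd = cup_forward(I, C)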


Fig. 1. The equipment diagram of the CCD camera constraint CUP system, CCD: charge-coupled device, MCP: micro-channel plate.


It can be seen from Eqs. (2) and (3) that the number of elements in ${\boldsymbol{I}}$ is much larger than the number of elements in ${\boldsymbol{E}}$ and ${\boldsymbol{{{E}}}}_{ccd}$. Obviously, inverting the streak camera's signal acquisition process is an ill-posed problem. According to CS theory [5], the original dynamic scene can be obtained by solving the following least squares optimization problem:

$$\hat{{\boldsymbol{I}}}=\mathop {\arg \min }_{{\boldsymbol{I}}}\left\| {{\boldsymbol{E}} - \textbf{O}{\boldsymbol{I}}} \right\|_2^2 + \lambda \mathit{R}({\boldsymbol{I}}),$$
where the fidelity term ensures consistency with the measurement, and $\mathit {R}( {\boldsymbol{I}})$ is the regularization term that constrains the signal space to the proper range. $\lambda$ denotes the coefficient of the regularization term, which balances the fidelity term and the regularization term. The information acquired by the external CCD camera can be regarded as both a fidelity term and prior information. Therefore, we add the constraint of the external CCD camera to Eq. (4), which becomes
$$\hat{{\boldsymbol{I}}}=\mathop {\arg \min }_{{\boldsymbol{I}}}\left\| {{\boldsymbol{E}} - \textbf{O}{\boldsymbol{I}}} \right\|_2^2 +{\rho \left\| {{\boldsymbol{{{E}}}}_{ccd}} - \textbf{T}{\boldsymbol{I}} \right\|_2^2}+ \lambda \mathit{R}({\boldsymbol{I}}),$$
where $\rho$ represents the weight of the external CCD camera constraint.

For the regularization term, total variation (TV) [33] is a widely used image prior that promotes sparsity in the gradient domain, and many applications [1,33] demonstrate its effectiveness. For 3D images, the TV takes the form

$${TV_{3D}}\left( {\boldsymbol{x}} \right) = \sum_P^{} {\sqrt {\left( {{{\boldsymbol{D}}_{\boldsymbol{h}}}{\boldsymbol{x}}} \right){{\left( P \right)}^2} + \left({{{\boldsymbol{D}}_{\boldsymbol{v}}}{\boldsymbol{x}}} \right){{\left( P \right)}^2} + \left( {{{\boldsymbol{D}}_{\boldsymbol{t}}}{\boldsymbol{x}}} \right){{\left( P \right)}^2}} },$$
where ${{ {\boldsymbol{D}}_ {\boldsymbol{h}}} {\boldsymbol{x}}}$, ${{ {\boldsymbol{D}}_ {\boldsymbol{v}}} {\boldsymbol{x}}}$ and ${{ {\boldsymbol{D}}_ {\boldsymbol{t}}} {\boldsymbol{x}}}$ denote the discrete derivative operators in the horizontal, vertical and temporal directions, respectively, and $P$ traverses all voxels of the three-dimensional cube.
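A minimal NumPy sketch of Eq. (6), assuming forward differences with replicated boundaries and a small constant added for numerical stability:

import numpy as np

def tv_3d(x, eps=1e-8):
    """Isotropic 3D total variation of a data cube x with shape (Nt, Ny, Nx)."""
    dh = np.diff(x, axis=2, append=x[:, :, -1:])   # horizontal differences D_h x
    dv = np.diff(x, axis=1, append=x[:, -1:, :])   # vertical differences   D_v x
    dt = np.diff(x, axis=0, append=x[-1:, :, :])   # temporal differences   D_t x
    return np.sqrt(dh ** 2 + dv ** 2 + dt ** 2 + eps).sum()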

Currently, supervised DL performs well because neural networks learn their parameters from large datasets, aiming to solve the following problem through extensive training:

$$\hat \Theta = \mathop {\arg \min }_\Theta Loss({f_\Theta }(y),x),$$
where $\Theta$ denotes the parameters of the neural network and ${f_\Theta }(\cdot )$ represents the CNN parameterized by $\Theta$. Ulyanov et al. used CNN-based networks in a different way [25]. They discovered that the signal component of an image experiences low impedance when passing through the CNN structure while the noise component experiences high impedance, which can be exploited for image restoration. In this case, the DIP optimization can be expressed as:
$$\hat \Theta = \mathop {\arg \min }_\Theta E({f_\Theta }({\boldsymbol{Z}}),{{\boldsymbol{x}}_0}), \quad {s.t.} \quad {\hat {{\boldsymbol{x}}} = {f_{\hat \Theta }}({\boldsymbol{Z}})},$$
where $E({f_\Theta }({\boldsymbol{Z}}),{{\boldsymbol{x}}_0})$ is a task-specific cost function and ${\boldsymbol{Z}}$ is a random input noise. For our CS recovery task, Eq. (8) can be written as:
$$\hat \Theta = \mathop {\arg \min }_\Theta \left\| {{\boldsymbol{E}} - \textbf{O}{f_\Theta }({\boldsymbol{Z}})} \right\|_2^2, \quad {s.t.} \quad {\hat {{\boldsymbol{I}}} = {f_{\hat \Theta }}({\boldsymbol{Z}})}.$$

DIP does not require any training datasets; it adopts random noise as the input and initializes the network with random parameters. Through iterative optimization, the output approaches the measured value and $\left \| { {\boldsymbol{E}} - \textbf {O}{f_\Theta }( {\boldsymbol{Z}})} \right \|_2^2$ decreases. However, if we apply Eq. (9) directly to the CUP problem, it is prone to overfitting and the optimal solution cannot be obtained. Comparing the DIP objective with our CS optimization problem, we find that $\left \| { {\boldsymbol{E}} - \textbf {O}{f_\Theta }( {\boldsymbol{Z}})} \right \|_2^2$ in Eq. (9) corresponds to the data fidelity term in Eq. (5). Therefore, by combining the hardware system (the external CCD camera) with our algorithm, the DIP form of our optimization task can be represented as:

$$\hat \Theta = \mathop {\arg \min }_\Theta \left( {\left\| {{\boldsymbol{E}} - \textbf{O}{f_\Theta }({\boldsymbol{Z}})} \right\|_2^2 + \rho \left\| {{{{\boldsymbol{E}}}_{ccd}} - \textbf{T}{f_\Theta }({\boldsymbol{Z}})} \right\|_2^2 + \lambda R\left( {{f_\Theta }({\boldsymbol{Z}})} \right)} \right), \quad {s.t.} \quad {\hat {{\boldsymbol{I}}} = {f_{\hat \Theta }}({\boldsymbol{Z}})}.$$

The relationship between the compressed image captured by the streak camera and the spatio-temporal information of the dynamic scene is established through the implicit prior of DIP. Additionally, the time-unsheared image taken by the external CCD camera supervises the neural network, and the image fidelity can be improved as long as the external CCD camera constraint is satisfied. The weight of the external CCD camera constraint, $\rho$ in Eq. (10), is generally set to 0.1. In addition, we combine the DIP optimization with the traditional TV regularizer [34,35] to promote the sparsity of the image gradient on top of DIP, which yields a smoother final image. Therefore, Eq. (10) becomes:

$$\hat \Theta = \mathop {\arg \min }_\Theta \left( {\left\| {{\boldsymbol{E}} - \textbf{O}{f_\Theta }({\boldsymbol{Z}})} \right\|_2^2 + \rho \left\| {{{{\boldsymbol{E}}}_{ccd}} - \textbf{T}{f_\Theta }({\boldsymbol{Z}})} \right\|_2^2 + \lambda TV_{3D}\left( {{f_\Theta }({\boldsymbol{Z}})} \right)} \right), \quad {s.t.} \quad {\hat {{\boldsymbol{I}}} = {f_{\hat \Theta }}({\boldsymbol{Z}})}.$$

This optimization problem can be realized by various autoencoder structures. As shown in Fig. 2, the autoencoder adopted in this paper is composed of multiple fully connected layers. The vectorized image patches fed into the autoencoder are mapped to a low-dimensional space and then restored, which suppresses noise and thus restores the image. It is worth mentioning that the difference between the input and output of the autoencoder is added to the loss function as a term with a tradeoff coefficient $\tau$ for balancing the losses.
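A minimal PyTorch sketch of this idea follows. The layer widths, the patch dimension and the way the loss terms are assembled are illustrative assumptions; O, T and tv_3d stand in for the sensing operator, the integration operator and the regularizer of Eq. (11), and are not the exact modules used in this work.

import torch
import torch.nn as nn

class PatchAutoencoder(nn.Module):
    """Fully connected autoencoder acting on vectorized image patches (cf. Fig. 2)."""
    def __init__(self, patch_dim=64, hidden=32, bottleneck=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(patch_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, bottleneck), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(bottleneck, hidden), nn.ReLU(),
                                     nn.Linear(hidden, patch_dim))

    def forward(self, patches):            # patches: (num_patches, patch_dim)
        return self.decoder(self.encoder(patches))

def tuicu_loss(I_hat, patches_in, patches_out, E, E_ccd, O, T, tv_3d,
               rho=0.1, lam=1e-3, tau=1e-2):
    """Eq. (11) plus the input/output consistency term of the autoencoder."""
    fid_streak  = ((O(I_hat) - E) ** 2).sum()        # streak-camera fidelity term
    fid_ccd     = ((T(I_hat) - E_ccd) ** 2).sum()    # time-unsheared (CCD) constraint
    reg         = tv_3d(I_hat)                       # 3D TV regularizer, Eq. (6)
    consistency = ((patches_out - patches_in) ** 2).sum()
    return fid_streak + rho * fid_ccd + lam * reg + tau * consistency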


Fig. 2. Network architecture of the autoencoder, $W_1$, $W_2$, $W_3$, $W_4$: parameters of every two layers, $X_1$, $X_2$, $X_3$, $X_4$, $X_5$: the elements of input, $X_{1}^{\prime }$, $X_2^{\prime }$, $X_3^{\prime }$, $X_4^{\prime }$, $X_5^{\prime }$: the elements of output.


The schematic diagram of the streak camera in the CUP system is shown in Fig. 3. Assuming the image is sheared in the $x$ direction, the shear only redistributes the signal along $x$ within each row, which indicates that the $\textbf {S}$-process (shearing) has no effect on the result of integrating the measurement along the $x$ axis. Denoting by $\textbf {T}_x$ the integration along the $x$ axis, the above process can be expressed as

$${\textbf{T}_x}\textbf{TSC}\boldsymbol{I}(x,y,t) = {\textbf{T}_x}\textbf{TC}\boldsymbol{I}(x,y,t).$$


Fig. 3. The schematic diagram of the streak camera in CUP system, $\textbf {C}$: coding, $\textbf {S}$: shearing, $\textbf {T}$: spatiotemporal integration.


Note that the $\textbf {C}$ process applies the same mask to every frame and the $\textbf {T}$ process integrates along the $t$ axis, so neither affects the $x$ and $y$ coordinates. Therefore, in the absence of the $\textbf {S}$-process, the order of the two processes does not affect the final result in Fig. 3, and it is easy to obtain the following equation:

$$\textbf{TC}\boldsymbol{I}(x,y,t) = \textbf{CT}\boldsymbol{I}(x,y,t).$$

By combining Eq. (12) and Eq. (13), we can get:

$${\textbf{T}_x}\textbf{TSC}\boldsymbol{I}(x,y,t) = {\textbf{T}_x}\textbf{CT}\boldsymbol{I}(x,y,t).$$

As shown in Fig. 4, the two processes give the same result. $\textbf {TSC}\boldsymbol {I}(x,y,t)$ is the compressed image collected by the streak camera, i.e., ${\boldsymbol{E}}$ in Eq. (1), and $\textbf {T}\boldsymbol {I}(x,y,t)$ is obtained from the external CCD camera, i.e., ${\boldsymbol{E}}_{ccd}$ in Eq. (3). $\textbf {C}$ in Fig. 4(b) encodes the image with the coding mask of the streak camera. The energy deposition along the $x$ axis of the compressed image captured by the streak camera equals that of the encoded time-unsheared image captured by the external CCD camera. Equation (14) can therefore be rewritten as

$${\textbf{T}_x}{\boldsymbol{E}} = {\textbf{T}_x}\textbf{C}{\boldsymbol{E}}_{ccd}.$$
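Under the assumptions of the toy forward model sketched after Eq. (3), Eq. (15) can be checked numerically in a few self-contained lines (noiseless and perfectly registered case; scene size and mask are arbitrary):

import numpy as np

I = np.random.rand(16, 64, 64)                     # toy scene (Nt, Ny, Nx)
C = (np.random.rand(64, 64) < 0.3).astype(float)   # toy binary mask
Nt, Ny, Nx = I.shape

E = np.zeros((Ny, Nx + Nt - 1))                    # streak camera: encode, shear, integrate
for t in range(Nt):
    E[:, t:t + Nx] += C * I[t]
E_ccd = I.sum(axis=0)                              # external CCD: integrate only

lhs = E.sum(axis=1)                                # T_x E
rhs = (C * E_ccd).sum(axis=1)                      # T_x C E_ccd
print(np.allclose(lhs, rhs))                       # True: no noise, no mismatch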

Equation (15) holds when there is no noise and the two images match exactly; in practice, however, it is difficult for the two images to match perfectly because of rotations, shifts and inconsistent pixel sizes between the cameras. We can therefore perform an affine transformation on the time-unsheared image ${\boldsymbol{E}}_{ccd}$ collected by the external CCD camera so as to achieve the following:

$${\hat {{\boldsymbol{E}}_{ccd}}} = \mathop {\arg \min }_{{{\boldsymbol{E}}_{ccd}}} \left\| {{\textbf{T}_x}\textbf{C}{{\boldsymbol{E}}_{ccd}} - {\textbf{T}_x}{\boldsymbol{E}}} \right\|_2^2.$$

However, transforming the image without damaging its features is also challenging. Fortunately, the STN [32,36] provides us with a solution.


Fig. 4. (a) Process of ${\textbf {T}_x}\textbf {TSC}\boldsymbol {I}(x,y,t)$. (b) Process of ${\textbf {T}_x}\textbf {CT}\boldsymbol {I}(x,y,t)$.


In linear algebra, operations such as translation, scaling and rotation of an image can be represented by matrix operations:

$$\left[ \begin{array}{l} x^{\prime}\\ y^{\prime} \end{array} \right] = \left[ \begin{array}{cc} a & b \\ c & d \end{array} \right]\left[ \begin{array}{l} x\\ y \end{array} \right] + \left[ \begin{array}{l} e\\ f \end{array} \right]$$
where $x$ and $y$ are the pixel indices before the affine transformation, $x^{\prime }$ and $y^{\prime }$ are the pixel indices after the affine transformation, $a, b, c, d$ are the scaling and rotation parameters, and $e, f$ are the translation parameters.

The STN [32] is a spatial transformation network that enables explicit spatial transformation of images. It is divided into three parts. The first part is the localization network, which takes the feature map of the input and generates the spatial transformation parameters, i.e., $a, b, c, d, e, f$, through hidden layers. The second part is the grid generator, which constructs a sampling grid from the predicted affine transformation parameters and computes the position of the output pixels from each pixel in the input image via Eq. (17); this is essentially a coordinate mapping. Obviously, the pixel coordinates obtained in this step, i.e., $x^{\prime }$ and $y^{\prime }$, are not necessarily integers. The third part is the sampler, which fills each output pixel according to the sampling grid obtained in the second part, using bilinear interpolation so that the values of neighbouring pixels are taken into account.
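A minimal PyTorch sketch of this registration step is shown below. It assumes the six parameters of Eq. (17) are learned directly as a 2×3 matrix, uses torch.nn.functional.affine_grid and grid_sample as the grid generator and (bilinear) sampler, and takes Eq. (16) as the loss; the Tx helper (integration along the $x$ axis) and the tensor shapes are illustrative assumptions rather than the exact network of Fig. 5.

import torch
import torch.nn.functional as F

def register(E_ccd, E, C, Tx, iters=500, lr=1e-2):
    """Learn an affine warp of E_ccd minimising || T_x C W(E_ccd) - T_x E ||_2^2 (Eq. (16))."""
    theta = torch.tensor([[[1., 0., 0.],
                           [0., 1., 0.]]], requires_grad=True)          # identity initialization
    opt = torch.optim.Adam([theta], lr=lr)
    img = E_ccd[None, None]                                             # (1, 1, Ny, Nx)
    for _ in range(iters):
        grid = F.affine_grid(theta, list(img.shape), align_corners=False)   # grid generator
        warped = F.grid_sample(img, grid, align_corners=False)[0, 0]        # bilinear sampler
        loss = ((Tx(C * warped) - Tx(E)) ** 2).sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return warped.detach(), theta.detach()

# usage sketch, with Tx integrating along the x axis:
# warped_E_ccd, theta = register(E_ccd, E, C, Tx=lambda x: x.sum(dim=-1))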

The STN was initially designed as a component of supervised learning to improve model generalization by making CNNs spatially invariant: with an STN, differences in image location have little effect on subsequent operations, so tasks such as image classification can be handled better. Here, we instead take it as a part of the untrained neural network, using its ability to spatially transform images without changing the image features, with ${\boldsymbol{E}}_{ccd}$ as the network input and Eq. (16) as the loss function. The network learns the affine transformation parameters between the time-unsheared image and the compressed image in an unsupervised manner and applies the corresponding affine transformations to the image, achieving image registration. Our algorithm is called the time-unsheared image constraint unsupervised image registration (TUICU-IR) algorithm, and its flowchart is shown in Fig. 5.


Fig. 5. Flowchart of the TUICU-IR algorithm. Loss: Loss Function, $\Psi ( {\boldsymbol{Z}})$: the image obtained after spatiotemporal integration, $\Phi ( {\boldsymbol{Z}})$: the image obtained after coding, shearing and spatiotemporal integration, $\textbf {T}_x$: integration along the $x$-axis, ${\boldsymbol{E}}_{ccd}$: image collected by the external CCD camera, ${\boldsymbol{E}}$: image collected by the streak camera.


In general, the loss function is the mean square error (MSE), i.e., an L2-norm. However, this form assumes that the data errors are Gaussian and linear, in which case the MSE can be derived by inserting the probability density function into the maximum likelihood estimation (MLE). When there are large outliers, the effectiveness of an MSE-based algorithm deteriorates significantly. Correntropy is a nonlinear measure of local similarity [37], and here we employ a loss function that is robust to large outliers, called the correntropy-induced loss function (CLF) [38]. The empirical CLF between two samples $A$ and $B$ is

$$CLF(A,B) = \beta \left[ {1 - \frac{1}{N}\sum\nolimits_{i = 1}^N {{\kappa _\sigma }({a_i},{b_i})} } \right],$$
where $a_i$ and $b_i$ denote the elements of $A$ and $B$, and $\kappa _\sigma$ is a kernel function satisfying Mercer's theorem [39]. The Gaussian kernel is the most commonly used kernel for correntropy [37,38]. It is defined by
$${\kappa _\sigma }({a_i},{b_i}) = exp \left[ { - \frac{{{{({a_i} - {b_i})}^2}}}{{2{\sigma ^2}}}} \right],$$
where $\sigma$ denotes the kernel bandwidth. Because of the $\textbf {TSC}$ steps, Gaussian noise imposed on the compressed image can no longer be treated as simply Gaussian when mapped back to the dynamic scene. The value of CLF lies in $[0, \beta ]$, and CLF plays an important role in the case of non-Gaussian noise.
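A minimal NumPy sketch of Eqs. (18) and (19), with $\beta$ and the kernel bandwidth $\sigma$ as parameters:

import numpy as np

def clf(a, b, sigma=1.0, beta=1.0):
    """Correntropy-induced loss between samples A and B (Eq. (18)) with the Gaussian kernel (Eq. (19))."""
    a, b = np.ravel(a), np.ravel(b)
    kernel = np.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))
    return beta * (1.0 - kernel.mean())        # value lies in [0, beta]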

In addition, as can be seen from Fig. 6(a), the number of frames superimposed on each column of the compressed image collected by the streak camera varies: fewer frames are superimposed in regions a and c than in region b. Therefore, when calculating the first term of the loss function in Eq. (11), we assign different weights to the columns, proportional to the number of superimposed frames. The relationship between the relative weight and the column index is shown in Fig. 6(b), and a sketch of these weights is given below.
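A minimal sketch of the column weights of Fig. 6(b), assuming a one-pixel shear per frame so that column $j$ of the compressed image receives contributions from min($j$+1, $N_t$, $N_{cols}-j$, $N_x$) frames:

import numpy as np

def column_weights(Nx, Nt):
    """Relative weights of Fig. 6(b): proportional to the number of frames superimposed on each column."""
    Ncols = Nx + Nt - 1                                    # width of the sheared measurement
    counts = np.array([min(Nt - 1, j) - max(0, j - Nx + 1) + 1 for j in range(Ncols)])
    return counts / counts.max()

w = column_weights(256, 16)        # ramps up over the first Nt columns, plateaus at 1, ramps down
# weighted streak-camera fidelity term (sketch): (w[None, :] * (O_I - E) ** 2).sum()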


Fig. 6. (a) The projection of the frames along the $t$-axis direction. (b) The relationship between the relative weights and the column index in our algorithm.


3. Results and discussion

3.1 Comparison with state-of-the-art methods

The proposed TUICU-IR algorithm is implemented in Python, employing the Adam optimizer [40] with a learning rate of 0.015 and 2000 iterations, on a computer with an Intel Core i5-12400K CPU and an NVIDIA GeForce GTX 3090 GPU. We adopt CLF as the loss function with $\beta = 1$. We chose four motion scenes from public datasets: Drop and Aerial, each containing 16 consecutive images of 256×256 pixels, and Runner and Crash, each containing 24 consecutive images of 256×256 pixels. During the simulation, we set the transmittance of the transparent region of the coding plate to 0.8 and that of the opaque region to 0.2 in order to be closer to the actual situation. Inspired by [41], we designed a coding plate with a sampling rate of 30%, meaning that 30% of the plate area is transparent and 70% is opaque. The compressed image is obtained through the forward process illustrated in Fig. 3. Specifically, each image is encoded by the same pseudo-random binary mask with elements {0.2, 0.8}. Subsequently, every encoded image of the dataset is horizontally displaced by 1 pixel relative to the previous image, emulating the shearing operation of the streak camera. Finally, projection integration is performed along the $t$ axis to obtain the 2D measurement of the streak camera. The external CCD measurement is acquired with only the projection integration step. In the simulations, we use GAP-TV [12], FFDNet [42], FastDVDnet [15] and DeSCI [14] for comparison. It is worth mentioning that only our proposed algorithm uses two images, while all other algorithms use the streak camera image only. For the parameters of these algorithms, we take the default values in the literature. Peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) [43] are used as metrics to evaluate the quality of the reconstructed 3D data cube. Table 1 presents the reconstruction results on the four datasets without any noise, with the best results highlighted in bold. Our algorithm outperforms all the other CUP reconstruction algorithms in terms of PSNR and SSIM.
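For reference, a minimal sketch of the mask generation described above (a 30% sampling rate with the transparent/opaque regions mapped to transmittances 0.8/0.2); the random seed is an arbitrary assumption.

import numpy as np

rng = np.random.default_rng(0)
binary = rng.random((256, 256)) < 0.3          # 30% of the coding plate is transparent
mask = np.where(binary, 0.8, 0.2)              # pseudo-random mask with elements {0.2, 0.8}
# The 2D streak-camera measurement is then obtained by encoding every frame with this mask,
# shifting frame t by t pixels along x and integrating along t (cf. the forward-model sketch
# after Eq. (3)); the external CCD measurement uses the integration step only.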


Table 1. Average PSNR (dB) and SSIM on the datasets by different methods (noiseless).

We selected a single frame from each of the four datasets to evaluate the performance of the different algorithms. Figure 7 displays the reconstruction results of the different algorithms, together with the ground truths for comparison. From the presented results, GAP-TV, FFDNet and DeSCI lose many image details and generate various degrees of artifacts and blurring; FFDNet even recovers spatial structures that do not exist. For FastDVDnet, although the overall reconstruction quality appears satisfactory, the enlarged image blocks in the red boxes show that this is because FastDVDnet over-smooths the images, resulting in subpar performance in capturing fine details. Our proposed algorithm recovers most of the spatial details, and the slight blurring in some local areas is due to the large number of reconstructed frames.


Fig. 7. Reconstruction results by different algorithms, together with the ground truths for comparison. The sub-image at the bottom right corner is the enlarged area in the corresponding red box.


3.2 Image registration analysis

We construct four different unregistered modes, named M1, M2, M3 and M4, corresponding to pixel size mismatch, pixel size mismatch + rotation, pixel size mismatch + pixel shift, and pixel size mismatch + rotation + pixel shift in real scenarios, respectively. All the unregistered cases are applied to the time-unsheared image collected by the external CCD camera. In M1, the time-unsheared image is reduced to 0.9375 times its original size with the MATLAB "imresize" function. M2 adds a 3$^{\circ }$ rotation on top of M1, realized with the MATLAB "imrotate" function. M3 adds a 3-pixel offset to M1. M4 adds a 1$^{\circ }$ rotation on top of M3. Taking the Drop dataset as an example, Fig. 8 displays the effect of these mismatching modes; a Python equivalent of these operations is sketched below.
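The sketch below reproduces the four modes with SciPy as a Python stand-in for the MATLAB "imresize"/"imrotate" operations described above; the shift direction used for M3 is an assumption.

import numpy as np
from scipy import ndimage

def make_mismatch_modes(E_ccd):
    """Construct the unregistered time-unsheared images M1-M4 (cf. Fig. 8)."""
    m1 = ndimage.zoom(E_ccd, 0.9375, order=1)               # M1: resize to 0.9375x
    m2 = ndimage.rotate(m1, 3.0, reshape=False, order=1)    # M2: M1 + 3 deg rotation
    m3 = ndimage.shift(m1, (3, 3), order=1)                 # M3: M1 + 3-pixel offset (direction assumed)
    m4 = ndimage.rotate(m3, 1.0, reshape=False, order=1)    # M4: M3 + 1 deg rotation
    return m1, m2, m3, m4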


Fig. 8. The effect of 4 different mismatching modes on the time-unsheared image, together with the original image.


Figure 9 displays the PSNR between the original time-unsheared image and the results registered by our image registration algorithm for these four modes. The comparison is conducted under several noise conditions: no noise, additive white Gaussian noise (AWGN) with $\sigma _0$ = 5, 10 and 20, Poisson noise, and Poisson noise combined with $\sigma _0$ = 5 AWGN. It is worth noting that the same noise level is added to the time-unsheared image and the compressed image in each simulation. The horizontal lines in each graph indicate the PSNR between the original time-unsheared image and the noisy time-unsheared image under the corresponding noise conditions. When $\sigma _0$ = 10 or 20, the PSNR of most images registered by the algorithm is higher than that of images to which AWGN of the same level is added directly, in all modes. For the other noise conditions, the PSNR of some registered images is still higher than that of the directly degraded images of the same noise level. This demonstrates the effectiveness of the image registration algorithm. Figure 10 compares the Drop dataset before and after registration with $\sigma _0$ = 10 AWGN in mismatching mode 2, together with the original time-unsheared image and the time-unsheared image with $\sigma _0$ = 10 AWGN added directly.


Fig. 9. The registration effect of our image registration algorithm on 4 modes with different noise levels. The horizontal line represents the PSNR between the noisy image and the ground truth. (a) Aerial, (b) Crash, (c) Drop, (d) Runner.


Fig. 10. From left to right is the original time-unsheared image, the time-unsheared image with $\sigma _0$ = 10 AWGN before registration in mismatching mode 2, the time-unsheared image with $\sigma _0$ = 10 AWGN after registration in mismatching mode 2, and the time-unsheared image with $\sigma _0$ = 10 AWGN.


3.3 Noise robustness analysis

To be closer to the actual situation, we also take noise into account in the image reconstruction process. AWGN with noise levels of $\sigma _0$ = 5, 10 and 20, Poisson noise, and Poisson noise with $\sigma _0$ = 5 AWGN are added simultaneously to both the compressed image obtained by the streak camera and the time-unsheared image obtained by the external CCD camera. The noise level of both images is identical: if noise with $\sigma _0$ = 10 is added to ${\boldsymbol{E}}$ in Eq. (1), then noise of the same level is added to ${\boldsymbol{{{E}}}}_{ccd}$ in Eq. (3); a small sketch of these noise models is given after this paragraph. To demonstrate the effectiveness of our image registration algorithm in the reconstruction process, we also take the four unregistered modes into account. The iteration count of our algorithm is set to 1500. Table 2 presents the results of GAP-TV, FFDNet, FastDVDnet, DeSCI and our proposed TUICU (undamaged time-unsheared image) and TUICU-IR (four unregistered modes) algorithms under different noise conditions on the four datasets. The reconstruction results of TUICU-IR differ little from, and are sometimes even slightly better than, those of TUICU with the undamaged time-unsheared image. In addition, the proposed algorithms are more robust to noise, while the reconstruction accuracy of the other algorithms decreases rapidly with increasing noise level.
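A minimal sketch of the noise models used here, applied identically to both measurements; the intensity scale of the images is an assumption.

import numpy as np

rng = np.random.default_rng()

def add_awgn(img, sigma0):
    """Additive white Gaussian noise of level sigma_0."""
    return img + rng.normal(0.0, sigma0, img.shape)

def add_poisson(img):
    """Poisson (shot) noise; intensities are clipped to be non-negative first."""
    return rng.poisson(np.clip(img, 0, None)).astype(float)

# the same noise level is applied to both views, e.g. sigma_0 = 10:
# E_noisy, E_ccd_noisy = add_awgn(E, 10), add_awgn(E_ccd, 10)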


Table 2. Average PSNR and SSIM on the datasets by different methods (with noise). Undamaged: using undamaged time-unsheared image and using the proposed TUICU algorithm. M1/M2/M3/M4: using time-unsheared image on the condition of mismatched modes and using the proposed TUICU-IR algorithm. P&G: Poisson noise and additive white Gaussian noise ($\sigma _0$=5).

It is worth mentioning that the results of FFDNet and FastDVDnet are obtained with carefully adjusted parameters. We select two frames each from the reconstruction results of the Crash and Aerial datasets with noise level $\sigma _0$ = 5 and compare the performance of the different algorithms, including the TUICU-IR algorithm with image registration, together with the ground truths. Figure 11 shows the reconstruction results with noise level $\sigma _0$ = 5. Even with an unregistered ${\boldsymbol{{{E}}}}_{ccd}$, the reconstruction results of the TUICU-IR algorithm with the registration step show little difference from those of the TUICU algorithm, and may even recover more details than TUICU with the undamaged ${\boldsymbol{{{E}}}}_{ccd}$ (Crash #19 and Aerial #12). In summary, our proposed algorithm has higher reconstruction accuracy and stronger robustness to various kinds of noise than the traditional algorithms.


Fig. 11. Reconstruction results of dataset Aerial and Crash by different algorithms with noise level $\sigma _0$ = 5, together with the ground truths for comparison. Undamaged: using undamaged time-unsheared image and using TUICU algorithm, M1/M2: using time-unsheared image with mismatched modes and using TUICU-IR algorithm. The sub-image at the bottom right corner is the enlarged area in the corresponding red box.


3.4 Ablation study

We perform several ablation studies. First, we evaluate the impact of the image registration algorithm on the CS reconstruction results. Specifically, we test the case in which the unregistered time-unsheared image collected by the external CCD is applied directly to our TUICU algorithm without registration, even when the damage is only a 1$^{\circ }$ rotation. As shown in Fig. 12, the difference between rotating by 1$^{\circ }$ or not is visually negligible. We apply the TUICU algorithm directly to the unregistered time-unsheared image; the results are shown in Table 3, together with the results of the TUICU algorithm with $\rho$ = 0 in Eq. (11), i.e., the unsupervised CUP reconstruction algorithm without the external CCD image constraint.


Fig. 12. The effect of rotating 1$^{\circ }$ on dataset Drop and Runner, the error map is placed on the right.


Table 3. Average PSNR (dB) and SSIM of the datasets by TUICU with different noise levels on condition of unregistered time-unsheared image or non-usage of time-unsheared image, (’w/o’ means without).

Table 3 shows that applying the TUICU algorithm directly to the unregistered time-unsheared image may even be worse than not using the time-unsheared image constraint at all. As shown in Fig. 13, we select a single frame from the noiseless reconstruction results of the Drop and Runner datasets, together with the ground truths and the error maps. Without the registration step, the reconstruction produces more errors in the areas where the time-unsheared image does not match the actual data, leading to a sharp drop in SSIM.


Fig. 13. Reconstruction results (noiseless) of dataset Runner and Drop by TUICU algorithm on condition of unregistered time-unsheared image or non-usage of time-unsheared image, together with the ground truth and the error map, (’w/o’ means without).


We also evaluate the reconstruction results obtained with MSE instead of CLF as the loss function. Table 4 displays the reconstruction results of the TUICU-IR algorithm using MSE as the loss function with AWGN of $\sigma _0$ = 10 and 20 in M1 mode. It is worth noting that CLF is still used as the loss function in the image registration step, otherwise the registration may fail.


Table 4. Average PSNR (dB) and SSIM of the datasets by TUICU-IR algorithm using MSE as the loss function in M1 mode (with noise level $\sigma _0$=10 and 20).

Compared with Table 2, the PSNR (SSIM) of the reconstruction results using CLF as the loss function at noise levels $\sigma _0$ = 10 and 20 improves by 0.17 dB (0.0239) and 0.24 dB (0.0578), respectively, relative to the results using MSE as the loss function.

In addition, as shown in Table 5, we evaluate the reconstruction results obtained with the general compressed image loss function instead of our proposed weighted compressed image loss function of Fig. 6(b), at noise levels $\sigma _0$ = 10 and 20 in M1 mode.


Table 5. Average PSNR (dB) and SSIM of the datasets by TUICU-IR algorithm using the general compressed image loss function instead of the weighted compressed image loss function in M1 mode (with noise level $\sigma _0$=10 and 20).

Compared with Table 2, the PSNR (SSIM) of the reconstruction results using our proposed weighted compressed image loss function at noise levels $\sigma _0$ = 10 and 20 improves by 0.17 dB (0.0239) and 0.24 dB (0.0578), respectively, relative to the results using the general compressed image loss function.

3.5 Experiment

To verify the reliability of the proposed algorithm in practical experiments, we image a nanosecond laser pulse illuminating a manually created "E" pattern. The experimental system is shown in Fig. 1. The laser pulse is divided into two beams by a beam splitter, one directed to the streak camera and the other to the external CCD camera. By combining the proposed TUICU-IR algorithm with the time-unsheared image constraint CUP system, we successfully measure the spatio-temporal intensity evolution of the nanosecond laser pulse.

The images collected by the two cameras are shown in Fig. 14; there is an obvious mismatch between them. As shown in Fig. 14, after registration the correlation coefficient between the two vectors obtained by integrating along the $x$ axis reaches 0.987.


Fig. 14. The registration effect of the proposed image registration algorithm in experiment.


A laser with a full width at half maximum (FWHM) of 15 ns is employed, and 43 frames are reconstructed with an interval of 0.89 ns per frame. The movie of the results reconstructed by our proposed algorithm is available in Visualization 1. We select one representative frame every 5 frames, corresponding to a frame spacing of 4.45 ns. Figure 15(a) compares the reconstruction results of the different methods. Our algorithm clearly demonstrates the spatio-temporal evolution of the laser pulse, while the other algorithms exhibit noticeable offset traces and lack an apparent spatio-temporal intensity evolution. Figure 15(b) displays the relationship between normalized intensity and time for the images reconstructed by the different methods. The FWHM values obtained from the curves of GAP-TV, FFDNet, FastDVDnet and our proposed TUICU-IR algorithm are 22.36 ns, 21.46 ns, 22.73 ns and 15.03 ns, respectively, which indicates that the FWHM of our proposed algorithm is closest to the real value.


Fig. 15. (a) Comparison of the reconstruction results by different methods. (b) The relationship between normalized intensities and time for reconstruction results by different methods.


4. Conclusion

We propose a time-unsheared image constraint unsupervised DL image registration algorithm for CUP. The proposed method registers the mismatched time-unsheared image with the compressed image; the registered time-unsheared image and the compressed image are then integrated into an end-to-end unsupervised DL framework that reconstructs the scene from a single compressed image by using the CNN model as the image prior. Extensive simulations and experiments demonstrate the superior performance of the proposed method compared with widely used CUP reconstruction algorithms, and the method shows strong robustness to noise. Furthermore, more prior information can be integrated into the proposed unsupervised DL framework to improve the reconstruction accuracy, providing a better platform for CUP reconstruction of ultrafast events.

Funding

National Natural Science Foundation of China (11975184).

Disclosures

The authors declare no conflict of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. L. Gao, J. Liang, C. Li, et al., “Single-shot compressed ultrafast photography at one hundred billion frames per second,” Nature 516(7529), 74–77 (2014). [CrossRef]  

2. D. Qi, S. Zhang, and C. Yang, “Single-shot compressed ultrafast photography: a review,” Adv. Photonics 2(01), 1 (2020). [CrossRef]  

3. P. Wang, J. Liang, and L. V. Wang, “Single-shot ultrafast imaging attaining 70 trillion frames per second,” Nat. Commun. 11(1), 2091 (2020). [CrossRef]  

4. J. Liang, L. Zhu, and L. V. Wang, “Single-shot real-time femtosecond imaging of temporal focusing,” Light: Sci. Appl. 7(1), 42–475 (2018). [CrossRef]  

5. I. Orovic, V. Papic, and C. Ioana, “Compressive sensing in signal processing: Algorithms and transform domain formulations,” Math Probl. Eng. 2016, 1–16 (2016). [CrossRef]  

6. E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inform. Theory 52(2), 489–509 (2006). [CrossRef]  

7. J. M. Bioucas-Dias and M. A. T. Figueiredo, “A new twist: Two-step iterative shrinkage/thresholding algorithms for image restoration,” IEEE Trans. on Image Process. 16(12), 2992–3004 (2007). [CrossRef]  

8. J. Liang, C. Ma, and L. Zhu, “Single-shot real-time video recording of a photonic mach cone induced by a scattered light pulse,” Sci. Adv. 3(1), e1601814 (2017). [CrossRef]  

9. Z. M. Yao, L. Sheng, and Y. Song, “Dual-channel compressed ultrafast photography for z-pinch dynamic imaging,” Rev. Sci. Instrum. 94(3), 035106 (2023). [CrossRef]  

10. J. Yao, D. Qi, and C. Yang, “Multichannel-coupled compressed ultrafast photography,” J. Opt. 22(8), 085701 (2020). [CrossRef]  

11. C. Yang, D. Qi, and X. Wang, “Optimizing codes for compressed ultrafast photography by the genetic algorithm,” Optica 5(2), 147–151 (2018). [CrossRef]  

12. X. Yuan, “Generalized alternating projection based total variation minimization for compressive sensing,” in IEEE International Conference on Image Processing, (2016), pp. 2539–2543.

13. K. Dabov, A. Foi, V. Katkovnik, et al., “Image denoising by sparse 3-d transform-domain collaborative filtering,” IEEE Trans. on Image Process. 16(8), 2080–2095 (2007). [CrossRef]  

14. Y. Liu, X. Yuan, and J. Suo, “Rank minimization for snapshot compressive imaging,” IEEE Trans. Pattern Anal. Mach. Intell. 41(12), 2990–3006 (2019). [CrossRef]  

15. X. Yuan, Y. Liu, and J. Suo, “Plug-and-play algorithms for video snapshot compressive imaging,” IEEE Trans. Pattern Anal. Mach. Intell. 44(10), 7093–7111 (2022). [CrossRef]  

16. C. Z. Jin, D. Qi, and J. Yao, “Weighted multi-scale denoising via adaptive multi-channel fusion for compressed ultrafast photography,” Opt. Express 30(17), 31157–31170 (2022). [CrossRef]  

17. L. Zhu, Y. Chen, and J. Liang, “Space- and intensity-constrained reconstruction for compressed ultrafast photography,” Optica 3(7), 694–697 (2016). [CrossRef]  

18. A. Zhang, J. Wu, and J. Suo, “Single-shot compressed ultrafast photography based on u-net network,” Opt. Express 28(26), 39299–39310 (2020). [CrossRef]  

19. Y. Ma, X. Feng, and L. Gao, “Deep-learning-based image reconstruction for compressed ultrafast photography,” Opt. Lett. 45(16), 4400–4403 (2020). [CrossRef]  

20. Z. Wu, J. Zhang, and C. Mou, “Dense deep unfolding network with 3d-cnn prior for snapshot compressive imaging,” in IEEE International Conference on Computer Vision (ICCV), (2021).

21. L. Wang, M. Cao, and X. Yuan, “Efficientsci: Densely connected network with space-time factorization for large-scale video snapshot compressive imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2023), pp. 18477–18486.

22. C. Yang, S. Zhang, and X. Yuan, “Ensemble learning priors driven deep unfolding for scalable video snapshot compressive imaging,” in IEEE European Conference on Computer Vision (ECCV), (2022).

23. Y. Li, M. Qi, R. Gulve, et al., “End-to-end video compressive sensing using anderson-accelerated unrolled networks,” in 2020 IEEE international conference on computational photography (ICCP), (2020), pp. 1–12.

24. P. Llull, X. Liao, and X. Yuan, “Coded aperture compressive temporal imaging,” Opt. Express 21(9), 10526–10545 (2013). [CrossRef]  

25. D. Ulyanov, A. Vedaldi, and V. S. Lempitsky, “Deep image prior,” Int. J. Comput. Vis. 128(7), 1867–1888 (2020). [CrossRef]  

26. Z. Meng, Z. Yu, K. Xu, et al., “Self-supervised neural networks for spectral snapshot compressive imaging,” in IEEE/CVF International Conference on Computer Vision, (2021), pp. 2602–2611.

27. Y. He, Y. Yao, and Y. He, “Untrained neural network enhances the resolution of structured illumination microscopy under strong background and noise levels,” Adv. Photonics Nexus 2(04), 046005 (2023). [CrossRef]  

28. H. Lee, K. Sohn, and D. Min, “Unsupervised low-light image enhancement using bright channel prior,” IEEE Signal Process. Lett. 27, 251–255 (2020). [CrossRef]  

29. F. Wang, Y. Bian, and H. Wang, “Phase imaging with an untrained neural network,” Light, Science & Applications 9(1), 77 (2020). [CrossRef]  

30. A. Qayyum, I. Ilahi, F. Shamshad, et al., “Untrained neural network priors for inverse imaging problems: A survey,” IEEE Trans. Pattern Anal. Mach. Intell. 45(5), 6511–6536 (2022). [CrossRef]  

31. T. Yokota, H. Hontani, Q. Zhao, et al., “Manifold modeling in embedded space: An interpretable alternative to deep image prior,” IEEE Trans. Neural Netw. Learning Syst. 33(3), 1022–1036 (2022). [CrossRef]  

32. M. Jaderberg, K. Simonyan, A. Zisserman, et al., “Spatial transformer networks,” arXiv:1506.02025 (2015). [CrossRef]  

33. L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D 60(1-4), 259–268 (1992). [CrossRef]  

34. J. Liu, Y. Sun, X. Xu, et al., “Image restoration using total variation regularized deep image prior,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (2018), pp. 7715–7719.

35. M. Qiao, X. Liu, and X. Yuan, “Snapshot temporal compressive microscopy using an iterative algorithm with untrained neural networks,” Opt. Lett. 46(8), 1888–1891 (2021). [CrossRef]  

36. J. Nie, L. Zhang, W. Wei, et al., “Unsupervised deep hyperspectral super-resolution with unregistered images,” in 2020 IEEE International Conference on Multimedia and Expo (ICME), (2020), pp. 1–6.

37. W. Liu, P. P. Pokharel, and J. C. Príncipe, “Correntropy: Properties and applications in non-gaussian signal processing,” IEEE Trans. Signal Process. 55(11), 5286–5298 (2007). [CrossRef]  

38. L. Chen, H. Qu, and J. hong Zhao, “Efficient and robust deep learning with correntropy-induced loss function,” Neural Comput. Appl. 27(4), 1019–1031 (2016). [CrossRef]  

39. V. N. Vapnik, The Nature of Statistical Learning Theory (Springer, 1995).

40. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980 (2014). [CrossRef]  

41. M. Zhao and S. Jalali, “Theoretical analysis of binary masks in snapshot compressive imaging systems,” in 2023 59th Annual Allerton Conference on Communication, Control, and Computing (Allerton), (2023), pp. 1–8.

42. X. Yuan, Y. Liu, J. Suo, et al., “Plug-and-play algorithms for large-scale snapshot compressive imaging,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), pp. 1444–1454.

43. Z. Wang, A. C. Bovik, H. R. Sheikh, et al., “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]  

Supplementary Material (1)

Visualization 1: The movie of the results reconstructed by our proposed algorithm in the experiment.
