
Multi-phase FZA lensless imaging via diffusion model


Abstract

Lensless imaging shifts the burden of imaging from bulky and expensive hardware to computation, which enables new architectures for portable cameras. However, the twin-image effect caused by the missing phase information of the light wave is a key factor limiting the quality of lensless imaging. Conventional single-phase encoding methods and independent reconstruction of separate channels make it difficult to remove twin images and preserve the color fidelity of the reconstructed image. To achieve high-quality lensless imaging, multi-phase FZA lensless imaging via diffusion model (MLDM) is proposed. A multi-phase FZA encoder integrated on a single mask plate is used to expand the data channels of a single-shot image. The association between the color-image pixel channels and the encoded phase channels is established by extracting prior information of the data distribution based on multi-channel encoding, and the reconstruction quality is further improved by an iterative reconstruction method. The results show that, compared with traditional methods, the proposed MLDM effectively removes the influence of twin images and produces high-quality reconstructed images with higher structural similarity and peak signal-to-noise ratio.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Traditional optical systems achieve point-to-point imaging from the object to the image plane through bulky and expensive optical lenses [1]. As an emerging imaging technology, lensless imaging systems replace the lens with a lightweight planar encoding device, or place the target directly on the surface of the detector, which can effectively reduce system size and cost. Lensless imaging reduces the complexity of optical system design by transforming the aberration-optimization problem of lenses into a computational imaging problem [2]. In the past few years, a variety of coherent lensless systems have been proposed, such as coherent diffraction imaging [3,4] and on-chip microscopy [5,6]. However, the imaging quality of these systems depends closely on the coherence of the light source, which greatly limits their application.

As a light source common in daily life, incoherent light effectively avoids problems such as the speckle noise caused by optical components under coherent illumination, and interference and holographic imaging of objects can still be achieved through suitable optical-path design, which has extended holography to more fields. Incoherent lensless imaging technology originated in the early 1960s. Incoherent correlation holography, proposed by Mertz et al. [7], overcomes the limitation of requiring coherent light sources in the holographic recording process; it has been widely adopted in various fields and has become a hot research topic [8–12]. Rosen et al. proposed Fresnel incoherent correlation holography in 2007 [13]. This technology utilizes the correlation between the object light wave and the Fresnel zone aperture (FZA) to record incoherent holograms. The use of incoherent light can expand the application of holographic technology to more fields.

Since the image sensor is only sensitive to the intensity of incident light, the phase information is lost during image acquisition, which causes a twin image of the object to appear in the reconstruction [14,15]. The traditional back propagation (BP) reconstruction algorithm is disturbed by the twin image and has poor reconstruction quality. Over the past few decades, researchers have tried various ways to eliminate twin images [16]. Traditional approaches include off-axis holography, phase shifting, and compressive sensing (CS). Wu et al. [2] used a single-phase FZA mask for encoding under incoherent illumination and proposed a CS-based iterative reconstruction method, which treats the measurement as an embedded hologram and leverages compressed sensing to remove the undesirable effects of twin images. However, the reconstruction quality of CS-based methods is affected by calibration errors as well as manually selected parameters. Ma et al. [17] proposed an explicit-restriction convolutional framework for lensless imaging; by tailoring the framework to specific degradation factors, better perceived image quality can be achieved, and the framework can be extended to lensless imaging systems with different masks or structures. Moreover, it is difficult to obtain sufficient information from a single FZA phase encoding, and independent reconstruction of each color channel makes it difficult to ensure color fidelity.

To address the problems of the above methods, Shimano et al. [18] used at least four images coded by zone plates with different phases to achieve twin-image-free reconstruction [19,20]. Nakamura et al. arranged four FZAs with different phases on one mask to acquire multiple encoded images and eliminated the twin-image influence inherent to in-line holograms through color-channel synthesis [21]. However, their approach synthesizes the modulation of multiple phases into one point-spread function, which does not fully exploit the multi-phase encoded information. Chen et al. [22] proposed a linear least-mean-square-error (LMSE) method based on wave optics, achieving super-resolution reconstruction by synthesizing high-resolution wavefront propagation functions. However, the multi-phase FZA coding of that system is implemented by a phase-modulated spatial light modulator, which greatly increases the cost of lensless imaging systems. Although these two multi-phase FZA lensless imaging approaches achieve better reconstruction quality, obtaining an efficient high-dimensional prior on the image gradient distribution and using it to constrain the reconstruction process can further improve reconstruction quality.

Deep learning provides new ideas for computational optical imaging. Wu et al. [23] presented a prototype lensless camera that uses a deep neural network (DNN) to realize rapid reconstruction for FZA imaging. The network is designed with an error-feedback mechanism that realizes self-correction of features to recover image detail. Compared with the traditional ADMM, this method achieves comparable reconstruction quality while being two orders of magnitude faster. In recent years, deep generative models [24–27] based on score-matching generative models [24,25] and the Denoising Diffusion Probabilistic Model (DDPM) [26,27] have attracted widespread interest, as they can generate high-quality samples without adversarial training. Song et al. [24,28] proposed a framework that incorporates generalized discrete score-matching methods and diffusion models into a unified formulation. The model constructs the observed samples through the probability density function and shows excellent performance in modeling the data distribution using known input invariances. Because the data are modeled through the gradient of the logarithmic probability density, there is no need for normalizing constants, and the model can be learned directly using score matching [20]. The score model perturbs the data distribution by adding random noise in the forward process; to sample from the data distribution, a neural network can be trained to estimate the gradient of the logarithmic data distribution, i.e., the score function. The observed samples are then modeled by traversing the gradient of the target probability density function with Langevin dynamics [24]. The generative model maximizes the similarity to the data distribution while fitting its parameters, so that prior information can be learned.

To address the deficiencies of existing techniques, this paper proposes a multi-phase FZA lensless imaging algorithm based on a diffusion model (MLDM). The training process uses a score-based generative model that repeatedly superimposes random noise on the original image to obtain noisy images; training thus amounts to learning the data gradient of the prior distribution, from which random samples can be obtained by simulating the reverse of the stochastic process. In this work, the MLDM algorithm is used to decode and reconstruct the encoded image: it uses the multi-channel data distribution to encode image samples and samples data from the conditional distribution given the observations. The MLDM algorithm effectively characterizes the three-channel data distribution of color images and its inter-channel relationships, thereby reconstructing high-quality images. During reconstruction, noisy images are predicted using the score function obtained in training, with a fidelity step added to enhance data consistency [29]. The correction process uses annealed Langevin dynamics; the denoised image then enters the next prediction iteration, and the data prior term and fidelity term are updated in each loop.

The rest of the manuscript is organized as follows. The second section introduces the principles of multi-phase lensless imaging and the background of score-based diffusion models. Detailed procedures and algorithms are described in Section 3. The experimental results and specifications regarding the implementation are given in Section 4. A discussion of the experimental results is arranged in Section 5. A conclusion is given in Section 6.

2. Preliminary

2.1 Lensless imaging principle

As shown in Fig. 1, lensless imaging directly applies an image sensor to record scene information encoded by an FZA mask, and then reconstructs the scene image through computer algorithms.


Fig. 1. The main processes of lensless imaging include optical encoding and computer decoding processes.


The amplitude transfer function of an FZA is as follows [18]:

$$t({x_p},{y_p};\phi ) = \frac{1}{2}[1 + \cos \{ \frac{\pi }{{r_1^2}}({x_p}^2 + {y_p}^2) + \phi \} ]$$
where ${r_1}$ is a constant representing the radius of the innermost zone of the FZA, $({x_p},{y_p})$ is the spatial coordinate on the FZA plane, and $\phi$ is the initial phase of the FZA. It is important to note that the intensity transmittance is often approximated as the amplitude transmittance to represent the mask function [2,7]. The geometric shadow of an FZA on the sensor can be represented by the following equation:
$$T(x,y;\phi ) = \frac{1}{2}[1 + \cos \{ \frac{\pi }{{{{(1 + M)}^2}r_1^2}}({x^2} + {y^2}) + \phi \} ]$$
where $M = d/z$ stands for the magnification of the system, d represents the distance between the FZA and the sensor, and z represents the distance between the object and the FZA. $(x,y)$ is the spatial coordinate on the sensor plane. According to Euler's formula, Eq. (2) can be converted as follows:
$$T(x,y;\phi ) = \frac{1}{2} + \frac{1}{4}[h(x,y;\phi ) + {h^\ast }(x,y;\phi )]$$
$$h(x,y;\phi ) = \textrm{exp} [\frac{{i\pi }}{{r_1^2{{(1 + M)}^2}}}({x^2} + {y^2}) + i\phi ]$$
where ${h^\ast }(x,y;\phi )$ is the conjugate of $h(x,y;\phi )$. When $r_1^2 = \lambda d$, $h(x,y;\phi )$ has the same effect as the Fresnel propagation kernel function $\textrm{exp} (i[\pi /\lambda d{(1 + M)^2}]({x^2} + {y^2}) + i\phi )$, where $\lambda$ stands for wavelength.
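As a concrete illustration of Eqs. (1)-(2), the following minimal NumPy sketch evaluates the geometric shadow of an FZA on a sensor grid. The parameter values mirror those used later in Section 4.4; the grid construction and function names are illustrative assumptions, not code from this work.

```python
import numpy as np

def fza_shadow(n_pix=256, pitch=0.014, r1=0.4, d=3.0, z=300.0, phi=0.0):
    """Geometric shadow T(x, y; phi) of an FZA on the sensor plane, Eq. (2).

    n_pix: sensor pixels per side; pitch: pixel pitch (mm); r1: innermost
    zone radius (mm); d, z: FZA-to-sensor and object-to-FZA distances (mm);
    phi: initial phase of the FZA.
    """
    M = d / z  # system magnification, M = d / z
    coords = (np.arange(n_pix) - n_pix / 2) * pitch
    x, y = np.meshgrid(coords, coords)
    r2 = x ** 2 + y ** 2
    return 0.5 * (1.0 + np.cos(np.pi * r2 / ((1.0 + M) ** 2 * r1 ** 2) + phi))

# Two of the phases used in this paper: phi = 0 and phi = pi / 2.
T0 = fza_shadow(phi=0.0)
T1 = fza_shadow(phi=np.pi / 2)
```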

The set of all point sources from the object modulated by the FZA will be captured by the sensor under illumination of incoherent light. The imaging process can also be represented by the convolution of the target image and the FZA transmission function:

$$I(x,y;\phi ) = T(x,y;\phi ) \otimes O(x,y)$$
where I represents the intensity of a captured coded image, ${\otimes}$ is the convolution operator, $O(x,y)$ is the intensity of the magnified subject on the sensor plane. Equation (6) can be obtained from Eq. (3) and Eq. (5):
$$\begin{aligned} I(x,y;\phi ) &= \frac{1}{2}O(x,y) + \frac{1}{4}O(x,y) \otimes [h(x,y;\phi ) + \textrm{ }{h^\ast }(x,y;\phi )]\\ \textrm{ } &= C + \frac{1}{4}[U(x,y;\phi ) + {U^\ast }(x,y;\phi )]\\ \textrm{ } &= C + \frac{1}{2}\textrm{Re} \{ U(x,y;\phi )\} \end{aligned}$$
where $C = O(x,y)/2$ is a constant term that can be removed by filtering out the direct-current component. $U(x,y;\phi ) = h(x,y;\phi ) \otimes O(x,y)$ can be thought of as the diffracted wavefront of the incident light field of wavelength $\lambda$ propagated over a distance d. ${U^\ast }(x,y;\phi )$ and $U(x,y;\phi )$ are complex conjugates of each other. In the Fourier frequency domain, Eq. (6) can be expressed as follows:
$$I = \frac{1}{2}\textrm{Re} \{ {\mathrm{{\cal F}}^{ - 1}}{\cal H}{\cal F}O\}$$
where $I \in {{\mathbb R}^{{N_{xy}}}}$ denotes the image captured by the sensor, $O \in {{\mathbb R}^{{N_{xy}}}}$ is the target image. ${\cal F}$ and ${\mathrm{{\cal F}}^{ - 1}}$ are the Fourier transform operator and the inverse Fourier transform operator, respectively, and ${\cal H}$ indicates the operator that multiplies by the transfer function $H = i\textrm{exp} [ - i\pi r_1^2{(1 + M)^2}({u^2} + {v^2}) + i\phi ]$.

Under conditions of multi-phase imaging, a lensless encoding imaging system can be thought of as a linear transformation process which can be expressed as:

$${I_k} = {H_k}O$$
where ${H_k}$ represents the matrix of transfer functions for the k-th encoding phase ${\phi _k}$, and ${I_k}$ is the projection of the target image on the sensor after the k-th phase modulation. Because the image sensor is only sensitive to the intensity of the incident light, only the intensity information can be acquired during image acquisition while the phase information is lost, which leads to the appearance of twin images [30,31]. The twin image is an inherent problem of in-line holograms: a conjugate image of the object is superimposed on the original image in the reconstruction, degrading its quality. The key to solving this problem is to obtain useful prior information and use it to constrain the reconstruction process.
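To make the forward model concrete, the sketch below simulates the encoded measurements of Eqs. (7)-(8) in the Fourier domain using the transfer function H defined above, for the two phases (0 and π/2) used later in Section 4.4. The FFT frequency scaling and all function names are assumptions of this sketch.

```python
import numpy as np

def fza_transfer(n_pix=256, pitch=0.014, r1=0.4, M=0.01, phi=0.0):
    """Transfer function H(u, v; phi) from Section 2.1 (u, v in cycles/mm)."""
    freqs = np.fft.fftfreq(n_pix, d=pitch)
    u, v = np.meshgrid(freqs, freqs)
    return 1j * np.exp(-1j * np.pi * r1 ** 2 * (1 + M) ** 2 * (u ** 2 + v ** 2)
                       + 1j * phi)

def forward(O, H):
    """Encoded image I = (1/2) Re{F^{-1}[H F[O]]}, per Eq. (7)."""
    return 0.5 * np.real(np.fft.ifft2(H * np.fft.fft2(O)))

# Simulate the two-phase measurements I_k = H_k O of Eq. (8), k = 1, 2.
O = np.random.rand(256, 256)          # stand-in for one color channel
H1, H2 = fza_transfer(phi=0.0), fza_transfer(phi=np.pi / 2)
I1, I2 = forward(O, H1), forward(O, H2)
```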

2.2 Score-based SDE

Generative models have achieved great success in generating realistic and diverse data samples [24,32]. The forward stochastic differential equation (SDE) of the generative model perturbs the data distribution by adding Gaussian noise. The model can be trained to estimate the gradient of the logarithmic data distribution, which is then used to numerically solve the reverse SDE and generate data from noise, i.e., to sample from the data distribution.

To elaborate on the above process, consider a continuous diffusion process $\{ x(t)\} _{t = 0}^T$ with $x(t) \in {{\mathbb R}^{{M_x}}}$, indexed by a continuous-time variable $t \in [0,T]$, where $x(0)\sim {p_0}$ and ${p_0}$ represents the data distribution of the target image, while $x(T)\sim {p_T}$ and ${p_T}$ is the prior distribution. The diffusion process can be modeled as the solution of the following SDE:

$$dx = f(x,t)dt + g(t)dw$$
where $f \in {{\mathbb R}^{{M_x}}}$ and $g \in {\mathbb R}$ correspond to the drift and diffusion coefficients of $x(t)$, respectively, and $w \in {{\mathbb R}^{{M_x}}}$ denotes a standard Wiener process (Brownian motion).

Different SDEs can be constructed by selecting different functions f and g. Since the Variance Exploding (VE) SDE yields higher sample quality, Song et al. developed the following VE-SDE:

$$f = 0,\textrm{ }g = \sqrt {d[{\sigma ^2}(t)]/dt}$$
where $\sigma (t) > 0$ is a monotonically increasing function, usually set to a geometric progression [24,28]. Starting from samples of $x(T)\sim {p_T}$ and reversing the process, samples $x(0)\sim {p_0}$ can be obtained. It is known that the reverse of a diffusion process is also a diffusion process [33]. Therefore, it can be expressed as the reverse-time VE-SDE:
$$\begin{aligned} dx &= [f(x,t) - g{(t)^2}{\nabla _x}\log {p_t}(x)]dt + g(t)d\bar{w}\\ &={-} \frac{{d[{\sigma ^2}(t)]}}{{dt}}{\nabla _x}\log {p_t}(x)dt + \sqrt {\frac{{d[{\sigma ^2}(t)]}}{{dt}}} d\bar{w} \end{aligned}$$

Since the true score ${\nabla _x}\log {p_t}(x)$ is unknown for all t, it can be estimated by training a time-dependent score network ${S_\theta }(x,t) \simeq {\nabla _x}\log {p_t}(x)$. Samples can then be generated by a variety of numerical SDE solvers, such as the Euler-Maruyama method and the stochastic Runge-Kutta method [34].
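As a concrete example of the forward perturbation, the following PyTorch sketch implements a geometric noise schedule and the VE-SDE Gaussian perturbation kernel ${p_t}(x(t)|x(0)) = \mathcal{N}(x(0),{\sigma ^2}(t)I)$; the noise range (0.01, 380) follows Section 4.2, and the function names are assumptions.

```python
import torch

def sigma_t(t, sigma_min=0.01, sigma_max=380.0):
    """Geometric noise schedule sigma(t) for t in [0, 1]."""
    return sigma_min * (sigma_max / sigma_min) ** t

def perturb(x0, t):
    """Sample x(t) ~ p_t(x(t) | x(0)) = N(x(0), sigma(t)^2 I) for the VE-SDE."""
    sigma = sigma_t(t).view(-1, 1, 1, 1)   # broadcast over (B, C, H, W)
    z = torch.randn_like(x0)
    return x0 + sigma * z, z, sigma
```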

Obtaining the score function for all t is necessary to solve Eq. (11), and estimating it with a time-conditioned neural network ${S_\theta }(x,t) \simeq {\nabla _x}\log {p_t}(x)$ is an effective way to do so; all weights and biases of the interpolation and denoising networks are collected in the training parameter $\theta$. Notably, the true score ${\nabla _x}\log {p_t}(x)$ is unknown, so it is replaced by ${\nabla _x}\log {p_t}(x(t)|x(0))$, where ${p_t}(x(t)|x(0))$ is a Gaussian perturbation kernel centered on $x(0)$. Under mild conditions, a network ${S_\theta }(x,t)$ trained by denoising score matching satisfies ${S_{\hat{\theta }}}(x,t) = {\nabla _x}\log {p_t}(x(t))$. Formally, the score network parameter $\theta$ is optimized using the following equation:

$$\hat{\theta } = \mathop {\arg \min }\limits_\theta {\mathrm{\mathbb{E}}_t}\{ \lambda (t){\mathrm{\mathbb{E}}_{x(0)}}{\mathrm{\mathbb{E}}_{x(t)|x(0)}}[||{S_\theta }(x(t),t) - {\nabla _{x(t)}}\log {p_t}(x(t)|x(0))|{|^2}]\} $$
where t is sampled uniformly on $[0,T]$, ${p_t}(x(t)|x(0))$ is a Gaussian perturbation kernel centered on $x(0)$, and ${\nabla _{x(t)}}\log {p_t}(x(t)|x(0))$ represents the gradient of the perturbation kernel. When the network trained by Eq. (12) satisfies ${S_\theta }(x(t),t) \simeq {\nabla _{x(t)}}\log {p_t}(x(t))$, it can be substituted into the reverse-time VE-SDE of Eq. (11) and simulated for sampling.
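Building on the `perturb` helper above, a minimal sketch of the objective in Eq. (12) follows. For the VE-SDE kernel, ${\nabla _{x(t)}}\log {p_t}(x(t)|x(0)) = -(x(t) - x(0))/{\sigma ^2}(t)$, and the common weighting $\lambda (t) = {\sigma ^2}(t)$ yields the loss below; the architecture of `score_net` is left unspecified and is an assumption.

```python
import torch

def dsm_loss(score_net, x0):
    """One mini-batch of the VE-SDE denoising score-matching loss, Eq. (12)."""
    t = torch.rand(x0.shape[0], device=x0.device)   # t ~ Uniform[0, 1]
    xt, z, sigma = perturb(x0, t)                   # perturbation kernel sample
    score = score_net(xt, t)                        # S_theta(x(t), t)
    # With lambda(t) = sigma(t)^2 and target -z / sigma, the weighted residual
    # || sigma * S_theta + z ||^2 is averaged over the batch.
    return ((sigma * score + z) ** 2).flatten(1).sum(-1).mean()
```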

3. Proposed MLDM model

In the lensless imaging system, under incoherent illumination, light from countless point sources in the scene is transmitted through the FZA, projected onto the sensor, and incoherently superimposed to form an encoded image. However, during image acquisition the sensor can only record intensity and loses the phase information, which leads to the appearance of twin images of objects in the reconstruction [30,31]. To address this problem, this work uses the correlation of different channels in the color space to extract color multi-channel information, which solves the problem that traditional single-channel reconstruction ignores inter-channel correlation. When high-dimensional prior information is used for image processing, it can be introduced into the processing model as a prior distribution, thereby optimizing tasks such as image restoration and enhancement. Secondly, inspired by the phase-shifting techniques of the past few decades [18,19], encoded image information from different phases is used in the iterative reconstruction process to further eliminate the twin-image effect.

As analyzed above, a score-based generative model can obtain richer information and high-quality results by estimating the prior distribution of the image data. Therefore, the learning of prior information is modeled with a VE-SDE, which learns more effective image distribution features. Two different schemes are proposed to solve the twin-image problem in lensless imaging.

MLDM-I: Due to the high correlation of the red (R), green (G), and blue (B) components in natural scene images, components of the other channels can effectively compensate for information lost in a given channel, thereby improving reconstruction quality. Therefore, the distribution of RGB objects is modeled using a score-based generative model to learn prior information about the image distribution. Finally, the encoded information of different phases and the learned prior distribution are used in the iterative reconstruction process to improve image reconstruction quality.

MLDM-II: To avoid restricting the data to low-density regions [35], this scheme improves the diffusion model by increasing the dimensionality of the processed objects. By broadening the input channels of the network and learning representations in a higher-dimensional space, diversity of generation is more easily achieved and the quality of image reconstruction is further improved. Details are presented in the following subsections.

3.1 MLDM-I: reconstruction in RGB object

The multi-phase lensless imaging problem boils down to solving Eq. (8), which is an underdetermined inverse problem. To solve it, various priors are incorporated into a regularized objective function, expressed as:

$$O = \arg \mathop {\min }\limits_O \sum\limits_{k = 1}^K {{\nu _k}||{H_k}O - {I_k}|{|^2}} + \gamma R(O)$$
where the first and second terms are the data consistency term and the regularization term, respectively. K is the number of FZA mask phases, ${\nu _k}$ is the weight of the k-th phase encoding information, $||\cdot|{|^2}$ represents the squared ${\ell _2}$ norm, and $\gamma$ is the factor that balances the data consistency term and the regularization term. With a suitable regularization term, the objective function converges rapidly to a good result.

Therefore, a score-based model trained on a target image dataset is used to introduce the learned prior into the lensless imaging inverse problem. Specifically, the images in the training dataset are fed one by one to the score-matching generative model for training. The model perturbs the data distribution by slowly adding Gaussian noise through an SDE. A continuous time-dependent score function ${S_\theta }(O,t)$ is trained to estimate the gradient ${\nabla _O}\log {p_t}(O(t))$ of the logarithmic data distribution using the denoising score matching method of the VE-SDE; Eq. (12) is further expressed as:

$$\hat{\theta } = \arg \mathop {\min }\limits_\theta {\mathrm{\mathbb{E}}_t}\{ \lambda (t){\mathrm{\mathbb{E}}_{O(0)}}{\mathrm{\mathbb{E}}_{O(t)|O(0)}}[||{S_\theta }(O(t),t) - {\nabla _{O(t)}}\log {p_t}(O(t)|O(0))|{|^2}]\} $$

Through the above training, the reverse time VE-SDE can be solved once the trained score network ${S_\theta }(O,t)$ is brought into Eq. (11):

$$dO ={-} \frac{{d[{\sigma ^2}(t)]}}{{dt}}{S_\theta }(O,t) + \sqrt {\frac{{d[{\sigma ^2}(t)]}}{{dt}}} d\bar{w}$$

Comprehensively, the bidirectional SDE process is described in Fig. 2. First, the forward VE-SDE diffuses the image data distribution $O(0) \sim {p_0}$ into a prior distribution $O(T) \sim {p_T}$. Second, to find the score of the distribution at each intermediate time step, a score function is estimated with a time-conditional neural network ${S_\theta }(O(t),t) \simeq {\nabla _O}\log {p_t}(O(t))$. Finally, by reversing the forward process and starting from samples of $O(T) \sim {p_T}$, samples $O(0) \sim {p_0}$ can be obtained.


Fig. 2. Forward and reverse-time process of VE-SDE on RGB object.


Then, the Euler discretization method is used to solve the VE-SDE numerically [24]. A predictor-corrector (PC) sampler is introduced to correct errors in the evolution of the discretized reverse VE-SDE. The score-matching-based generative model first obtains a preliminary predicted reconstruction from the learned prior distribution of the target image by numerically solving the reverse-time SDE, and then corrects this preliminary prediction using annealed Langevin dynamics as a corrector [36]. In addition, when solving the reverse-time SDE and during annealed Langevin sampling, a data-consistency step is required after each unconditional sampling update.

Specifically, the predictor is a numerical solver for the reverse-time SDE, which gives an estimate of the sample at the next time step. To effectively enhance data consistency, the predictor output ${V_i}$ is inserted into the data-consistency (DC) formulation after each predictor update.

$${V_i} = {O_{i + 1}} + (\sigma _{i + 1}^2 - \sigma _i^2){S_\theta }({O_{i + 1}},{\sigma _{i + 1}}) + \sqrt {\sigma _{i + 1}^2 - \sigma _i^2} z$$
$${O_i} = \arg \mathop {\min }\limits_O \sum\limits_{k = 1}^K {{\nu _k}||{H_k}{O_{i + 1}} - {I_k}|{|^2}} + \gamma ||{O_{i + 1}} - {V_i}|{|^2}$$
where $i = N - 1,\textrm{ } \cdots ,\textrm{ }1,\textrm{ }0$ indexes the discretization steps of the reverse-time SDE, ${\sigma _i}$ is the noise level at the i-th iteration, $z \sim \mathcal{N}(0,1)$ is standard normal noise, and ${\nu _k}$ is the weight of the k-th phase encoding information in the objective function. The minimization in Eq. (17) is a standard least-squares problem, which can be solved as follows:
$${O_i} = [\sum\limits_{k = 1}^K {{\nu _k}{H_k}^T({O_{i + 1}} - {I_k})} + \gamma ({O_{i + 1}} - {V_i})]/\textrm{[}\sum\limits_{k = 1}^K {{\nu _k}{H_k}^T} {H_k} + \gamma \textrm{] }$$
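Because each ${H_k}$ is a convolution, the normal equations of Eq. (17) diagonalize under the Fourier transform and can be solved elementwise. The sketch below implements the standard closed-form least-squares minimizer of Eq. (17), of which Eq. (18) is the arrangement printed above; the variable names are illustrative assumptions.

```python
import numpy as np

def dc_update(V, Is, Hs, nus, gamma):
    """Minimize sum_k nu_k ||H_k O - I_k||^2 + gamma ||O - V||^2 per channel.

    V: predictor output V_i; Is/Hs: encoded images I_k and transfer functions
    H_k (in the Fourier domain); nus: phase weights nu_k; gamma: balance factor.
    """
    num = gamma * np.fft.fft2(V)          # gamma * V in the Fourier domain
    den = gamma * np.ones(Hs[0].shape)
    for I, H, nu in zip(Is, Hs, nus):
        num += nu * np.conj(H) * np.fft.fft2(I)   # nu_k * H_k^H I_k
        den += nu * np.abs(H) ** 2                # nu_k * |H_k|^2
    return np.real(np.fft.ifft2(num / den))
```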

The variable ${O_i}$ obtained from Eq. (18) is then input to the corrector, where the Langevin Markov-chain correction algorithm adjusts the gradient-ascent direction, and the data-consistency item is again inserted.

$${U_{i,j}} = {O_{i,j - 1}} + {\varepsilon _i}{S_\theta }({O_{i,j - 1}},{\sigma _i}) + \sqrt {2{\varepsilon _i}} z$$
$${O_{i,j}} = \arg \mathop {\min }\limits_O \sum\limits_{k = 1}^K {{\nu _k}||{H_k}{O_{i,j - 1}} - {I_k}|{|^2}} + \gamma ||{O_{i,j - 1}} - {U_{i,j}}|{|^2}$$
where $i = N - 1,\textrm{ } \cdots ,\textrm{ }0$ and $j = 1,\textrm{ }2,\textrm{ } \cdots ,\textrm{ }M$ index the outer and inner loops, respectively. Equation (20) can be further reduced to:
$${O_{i,j}} = [\sum\limits_{k = 1}^K {{\nu _k}{H_k}^T} ({O_{i,j - 1}} - {I_k}) + \gamma ({O_{i,j - 1}} - {U_{i,j}})]/[\sum\limits_{k = 1}^K {{\nu _k}{H_k}^T} {H_k} + \gamma ]$$
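The corrector of Eqs. (19)-(21) can be sketched as one annealed-Langevin move followed by the DC solve, reusing `dc_update` from above. The SNR-based step-size rule (with SNR = 0.16 as in Section 4.2) is the usual choice from [36] and, like the generic `score_net` callable, is an assumption here.

```python
import numpy as np

def corrector_step(O, score_net, sigma_i, Is, Hs, nus, gamma, snr=0.16):
    """One inner-loop update: Langevin move (Eq. (19)) then DC solve (Eq. (20))."""
    score = score_net(O, sigma_i)            # S_theta(O_{i,j-1}, sigma_i)
    z = np.random.randn(*O.shape)
    # Step size eps_i set from the target signal-to-noise ratio, as in [36].
    eps = 2.0 * (snr * np.linalg.norm(z) / np.linalg.norm(score)) ** 2
    U = O + eps * score + np.sqrt(2.0 * eps) * z   # Eq. (19)
    return dc_update(U, Is, Hs, nus, gamma)        # Eqs. (20)-(21)
```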

Due to the inherent characteristics of lensless imaging, the reconstructed image exhibits an obvious gridding artifact, and a certain gap remains between the reconstructed image and the original. To address this issue, total variation (TV) regularization is applied to the reconstruction results.

$${O_i} = ||{O_i}|{|_{TV}} = \sum\limits_i^{{N_{xy}}} {\sqrt {|\Delta _i^h{O_i}{|^2} + |\Delta _i^v{O_i}{|^2}} }$$
where $\Delta _i^h$ and $\Delta _i^v$ denote the horizontal and vertical first-order local difference operations, respectively, and ${N_{xy}}$ is the total number of pixels.
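For reference, a short sketch of the TV term of Eq. (22) with forward differences is given below; the small constant added inside the square root for numerical stability is an assumption, and the paper does not specify its TV solver.

```python
import numpy as np

def tv_norm(O, eps=1e-8):
    """Isotropic total variation of Eq. (22), using forward differences."""
    dh = np.diff(O, axis=1, append=O[:, -1:])   # horizontal local difference
    dv = np.diff(O, axis=0, append=O[-1:, :])   # vertical local difference
    return np.sum(np.sqrt(dh ** 2 + dv ** 2 + eps))
```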

Figure 3 shows the flowchart of the MLDM-I algorithm proposed in this work. The whole iterative reconstruction procedure consists of a two-level loop. Each outer-loop iteration updates the data prior term and the data fidelity term of the prediction and correction processes, and finally performs TV denoising; the number of outer loops equals the number of discretization steps of the reverse-time SDE. The inner-loop correction is performed by annealed Langevin iterations. The pseudo-code of the MLDM-I reconstruction algorithm is given in Algorithm 1.


Fig. 3. Reconstruction flow chart of MLDM-I algorithm. Top: Training stage to learn the gradient distribution of RGB object via denoising score matching. Bottom: Iterate between numerical SDE solver and data-consistency step to achieve reconstruction.


Algorithm 1. Pseudo-code of the MLDM-I reconstruction algorithm.

3.2 MLDM-II: reconstruction in synthetic object

Inspired by the appealing performance of homotopic gradients of generative density priors over the naive score-based model NCSN, this work concludes that prior information learned from a high-dimensional tensor is more effective for image reconstruction than its low-dimensional counterpart [37]. Therefore, a channel-replication strategy is used to promote denoising score matching in the VE-SDE generative model. In detail, a channel-copy transformation ${O^\ast } = B(O)$ is utilized to establish an $N$-fold higher-dimensional tensor; in this work ${O^\ast } = {(O,O)^T}$ with $N = 2$. The goal of this stacking is to create an object on a high-dimensional manifold [38] that is favorable for subsequent network learning, which helps avoid potential difficulties in both score-estimation accuracy and sampling with Langevin dynamics. Subsequently, MLDM-II is trained with ${O^\ast }$ in the high-dimensional space as the network input, and the parameterized ${S_\theta }({O^\ast },t)$ is obtained, as shown in the upper part of Fig. 5. Eq. (14) is rewritten as:

$$\hat{\theta } = \arg \mathop {\min }\limits_\theta {\mathrm{\mathbb{E}}_t}\{ \lambda (t){\mathrm{\mathbb{E}}_{{O^\ast }(0)}}{\mathrm{\mathbb{E}}_{{O^\ast }(t)|{O^\ast }(0)}}[||{S_\theta }({O^\ast }(t),t) - {\nabla _{{O^\ast }(t)}}\log {p_t}({O^\ast }(t)|{O^\ast }(0))|{|^2}]\} $$

Once the network is trained with Eq. (23), the approximation ${S_\theta }({O^\ast },t) \simeq {\nabla _{{O^\ast }}}\log {p_t}({O^\ast }(t))$ can be plugged in to solve the reverse SDE of Eq. (11):

$$d{O^\ast } ={-} \frac{{d[{\sigma ^2}(t)]}}{{dt}}{S_\theta }({O^\ast },t) + \sqrt {\frac{{d[{\sigma ^2}(t)]}}{{dt}}} d\bar{w}$$

Eq. (24) can be solved numerically with Euler discretization. This involves discretizing t over the range [0, 1], uniformly separated into N intervals such that $0 = {t_0} < \cdots < {t_N} = 1$ and $\Delta t = 1/N$. Additionally, the direction of gradient ascent can be corrected with the Langevin Markov chain Monte Carlo algorithm [36].

Here, the details of implementing Eq. (24) are given. An alternating optimization algorithm is used to decouple the minimization of the prior-information term and the data-consistency term. A predictor estimates the value of the sample at the next time step, and a DC step is added to enhance data consistency.

$$V_i^\ast{=} O_{i + 1}^\ast{+} (\sigma _{i + 1}^2 - \sigma _i^2){S_\theta }(O_{i + 1}^\ast ,{\sigma _{i + 1}}) + \sqrt {\sigma _{i + 1}^2 - \sigma _i^2} z$$
$$O_i^\ast{=} \arg \mathop {\min }\limits_O ||\left( {\begin{array}{cc} {{\nu_1}{H_1}}&0\\ 0&{{\nu_2}{H_2}} \end{array}} \right)O_{i + 1}^\ast{-} \left( \begin{array}{l} {I_1}\\ {I_2} \end{array} \right)|{|^2} + \gamma ||O_{i + 1}^\ast{-} V_i^\ast |{|^2}$$
where $\left( {\begin{array}{cc} {{\nu_1}{H_1}}&0\\ 0&{{\nu_2}{H_2}} \end{array}} \right)$ is the weight matrix of the different-phase Fresnel diffraction propagation kernel functions, and $\left( \begin{array}{l} {I_1}\\ {I_2} \end{array} \right)$ is the matrix of the different phase-encoded images.

Equation (26) can be further reduced as:

$$O_i^\ast{=} \frac{{{{\left( {\begin{array}{cc} {{\nu_1}{H_1}}&0\\ 0&{{\nu_2}{H_2}} \end{array}} \right)}^T}(O_{i + 1}^\ast{-} \left( \begin{array}{l} {I_1}\\ {I_2} \end{array} \right)) + \gamma (O_{i + 1}^\ast{-} V_i^\ast )E}}{{{{\left( {\begin{array}{cc} {{\nu_1}{H_1}}&0\\ 0&{{\nu_2}{H_2}} \end{array}} \right)}^T}\left( {\begin{array}{cc} {{\nu_1}{H_1}}&0\\ 0&{{\nu_2}{H_2}} \end{array}} \right) + \gamma E}}$$
where E is the identity matrix. Then, the score-based MCMC approach [36] corrects the marginal distribution of the estimated sample, playing the role of a corrector, and the DC step is inserted after it:
$$U_{i,j}^\ast{=} O_{i,j - 1}^\ast{+} {\varepsilon _i}{S_\theta }(O_{i,j - 1}^\ast ,{\sigma _i}) + \sqrt {2{\varepsilon _i}} z$$
$$O_{i,j}^\ast{=} \arg \mathop {\min }\limits_O ||\left( {\begin{array}{cc} {{\nu_1}{H_1}}&0\\ 0&{{\nu_2}{H_2}} \end{array}} \right)O_{i,j - 1}^\ast{-} \left( \begin{array}{l} {I_1}\\ {I_2} \end{array} \right)|{|^2} + \gamma ||O_{i,j - 1}^\ast{-} U_{i,j}^\ast |{|^2}$$

Next, a TV operation is performed on the reconstruction result:

$$O_i^\ast{=} ||O_i^\ast |{|_{TV}} = \sum\limits_i^{{N_{xy}}} {\sqrt {|\Delta _i^hO_i^\ast {|^2} + |\Delta _i^vO_i^\ast {|^2}} }$$

After the virtual variable $O_i^\ast $ is updated, it is converted into a three-channel RGB output via a channel-averaging operator.
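A minimal sketch of the channel-copy transform ${O^\ast } = B(O)$ for $N = 2$ and of the channel-averaging operator just described; both function names are assumptions.

```python
import numpy as np

def channel_copy(O, N=2):
    """B(O): stack N copies of the (3, H, W) object into a (3N, H, W) tensor."""
    return np.concatenate([O] * N, axis=0)

def channel_average(O_star, N=2):
    """Average the N copies back to a single three-channel RGB output."""
    return O_star.reshape(N, 3, *O_star.shape[1:]).mean(axis=0)
```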

Annealed Langevin dynamics is used to further sample from the high-dimensional noisy data distribution with multi-view noise. To illustrate the procedure of annealed Langevin dynamics, intermediate samples are provided in Fig. 4, where each row shows how samples evolve from pure random noise to high-resolution images.


Fig. 4. Iterative generation of synthetic objects by annealed Langevin dynamics.



Fig. 5. Reconstruction flow chart of the MLDM-II algorithm. Top: Training stage to learn the gradient distribution of the synthetic object via denoising score matching. Bottom: The reconstruction flow chart used for encoded-image calculation and decoding.


Algorithm 2. Pseudo-code of the MLDM-II reconstruction algorithm.

The flowcharts of the training phase and the iterative reconstruction phase of MLDM-II are shown in Fig. 5. Specifically, a score-based generative model is first trained to capture the image prior distribution. Then, at the inference stage, the numerical SDE solver and the data-consistency step are applied iteratively to achieve the image reconstruction. Since the Langevin dynamics update guarantees convergence, the overall MLDM-II algorithm converges after finitely many iterations. Algorithm 2 describes the reconstruction algorithm in detail.

4. Experiments

4.1 Data specification

Since there are currently no public datasets for FZA lensless imaging, all training and testing in this experiment were carried out on the LSUN dataset [39]. LSUN is a large color image dataset with 10 scene categories and 20 object categories, each containing about one million labeled images. The outdoor scene church and the indoor scene bedroom were chosen, and the training and testing performance of the network model was observed on the two training sets separately. In addition, to further verify the performance of MLDM, 100 images from each dataset were selected as the experimental test set.

4.2 Model training and parameter selection

The bedroom and church datasets of LSUN were used for training separately. As preprocessing, each image was reshaped to $256 \times 256$ pixels and its pixel values were scaled to [0,1]. The data distribution was perturbed by Gaussian noise with noise levels ranging from 0.01 to 380. Meanwhile, to avoid numerical problems, the parameter $\varepsilon$ was set to 10−5. The signal-to-noise ratio (SNR) of the sampler was set to 0.16, and the maximum value of gradient clipping was set to 1.0. An exponential moving average (EMA) of the model parameters was maintained with a rate of 0.999, and each dataset was trained for 500,000 iterations. For the PC sampling algorithm, $N = 1000$ and $M = 1$ iterations were used in this experiment. The largest peak signal-to-noise ratio (PSNR) and its corresponding structural similarity index measure (SSIM) were recorded for each test image, and the average over the 100 sets of data was then calculated. A fixed learning rate of 0.00001 was used in the simulation experiments, and the Adam optimizer with ${\beta _1} = 0.9$ and ${\beta _2} = 0.999$ was utilized to optimize the network. The training and testing experiments were performed on an NVIDIA 1080Ti GPU with 12 GB of memory.
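As a small illustration of the parameter EMA mentioned above (rate 0.999), the following PyTorch sketch applies one update after an optimizer step; the surrounding training loop is assumed.

```python
import torch

@torch.no_grad()
def ema_update(ema_params, model_params, rate=0.999):
    """theta_ema <- rate * theta_ema + (1 - rate) * theta, after each step."""
    for p_ema, p in zip(ema_params, model_params):
        p_ema.mul_(rate).add_(p, alpha=1.0 - rate)
```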

4.3 Quantitative indices

Because of the visual characteristics of the human eye, evaluation results often depend on subjective perception. For an effective quantitative evaluation of the experiments in this work, peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) are chosen as metrics.

The PSNR is based on the error between corresponding pixels and is expressed in dB. It is an error-sensitive, objective image quality metric: the larger the value, the smaller the distortion. It is defined via the MSE as follows:

$$\textrm{MSE} = \frac{1}{{H \times W}}\sum\limits_{i = 1}^H {\sum\limits_{j = 1}^W {{{({\hat{O}(i,j) - O(i,j)} )}^2}} }$$
$$\textrm{PSNR}(\hat{O},O)\textrm{ = 10} \times \textrm{lo}{\textrm{g}_{\textrm{10}}}\frac{{{{\max }_{(i,j)}}\{{{{\hat{O}}^2}(i,j)} \}}}{{\textrm{MSE}(\hat{O},O)}}$$
where $\hat{O}$ and O represent the target image and the reconstructed image respectively. $H \times W$ is the size of image $\hat{O}$ and O. $\hat{O}(i,j)$ and $O(i,j)$ are the pixels of $\hat{O}$ and O at position $(i,j)$, respectively. The larger the PSNR value, the better the quality of the reconstructed image.

The SSIM is an index measuring the structural similarity of two images. It measures the similarity between images from three aspects: luminance, contrast, and structure:

$$\textrm{SSIM}(\hat{O},O)\textrm{ = }\frac{{(2{\mu _{\hat{O}}}{\mu _O} + {c_1})(2{\sigma _{\hat{O}O}} + {c_2})}}{{(\mu _{\hat{O}}^2 + \mu _O^2 + {c_1})(\sigma _{\hat{O}}^2 + \sigma _O^2 + {c_2})}}$$
where ${\mu _{\hat{O}}}$ and ${\mu _O}$ represent the means of the images, $\sigma _{\hat{O}}^2$ and $\sigma _O^2$ represent their variances, and ${\sigma _{\hat{O}O}}$ represents their covariance. ${c_1}$ and ${c_2}$ are constants used to maintain stability. The value range of SSIM is $[0,1]$; the larger the value, the smaller the image distortion and the more acceptable the image to the human eye.
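Both metrics can be sketched directly from Eqs. (31)-(33). The peak value in the PSNR follows the definition above; the SSIM constants $c_1$, $c_2$ and the use of global image statistics rather than a sliding window are common defaults and assumptions of this sketch.

```python
import numpy as np

def psnr(ref, rec):
    """Eqs. (31)-(32): PSNR in dB with the reference maximum as the peak."""
    mse = np.mean((ref - rec) ** 2)
    return 10.0 * np.log10(np.max(ref) ** 2 / mse)

def ssim(ref, rec, c1=0.01 ** 2, c2=0.03 ** 2):
    """Eq. (33) with global image statistics (no sliding window)."""
    mu1, mu2 = ref.mean(), rec.mean()
    var1, var2 = ref.var(), rec.var()
    cov = ((ref - mu1) * (rec - mu2)).mean()
    return ((2 * mu1 * mu2 + c1) * (2 * cov + c2)) / \
           ((mu1 ** 2 + mu2 ** 2 + c1) * (var1 + var2 + c2))
```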

4.4 Simulation results

In the simulation experiments, a multi-phase FZA mask was used as the encoder; the radius ${r_1}$ of the innermost zone of the mask was set to 0.4 mm, and two phases, 0 and π/2, were used for encoding. The FZA mask was placed 3 mm away from the sensor. The sensor had 256 × 256 pixels with a pixel pitch of 0.014 mm. A target with dimensions of 200 mm × 200 mm was placed 300 mm away from the FZA mask. Each pixel recorded a superposition of images from multiple apertures on the encoding mask. MLDM-I and MLDM-II were used to reconstruct the simulated data, respectively. In these experiments, the weight coefficients ${\nu _1}$ and ${\nu _2}$ were 0.05 and 1 for MLDM-I, and 0.1 and 1.5 for MLDM-II, respectively.

Experiments on the LSUN bedroom dataset. For visual comparison, Fig. 6 shows two sets of experimental results from the LSUN bedroom dataset, where each set includes the ground truth and the images reconstructed by BP [2], CS [2], LSGM [40], MLDM-I, and MLDM-II. It can be seen intuitively that the reconstruction quality of BP is the worst, with an obvious twin-image effect. Besides, compared with the ground truth, the CS reconstructions show obvious color distortion and poor image clarity. For example, in the second row, the CS result has an obvious color gap with respect to the ground truth, with noticeable color diffusion in the middle and discoloration at the image edges. Compared with BP and CS, LSGM is much better in clarity and color accuracy, but a relatively obvious gridding effect remains. Clearly, the image quality in the fifth and sixth columns of Fig. 6 is significantly better than that of the other three sets of results, and the reconstructions of MLDM-II are the best among all algorithms.


Fig. 6. Visual comparison of reconstruction images on the LSUN bedroom dataset. (a) Ground Truth (b) BP (c) CS (d) LSGM (e) MLDM-I (f) MLDM-II.


For further quantitative analysis, the average PSNR and SSIM of all algorithms are shown in Table 1. The average PSNR of the traditional BP is only 7.40 dB. In contrast, the other algorithms achieve better results in an iterative manner: the PSNR of CS and LSGM are nearly 10.28 dB and 17.06 dB higher than that of BP, respectively. The MLDM-II proposed in this work has the best imaging quality; after iterative correction, its PSNR reaches 29.91 dB and its SSIM reaches 0.941, which are 5.45 dB and 0.23 higher than those of LSGM, respectively. In addition, the reconstruction results in the fifth and sixth columns of Fig. 6 show that the proposed MLDM performs well with both three channels (MLDM-I) and six channels (MLDM-II): as shown in Table 1, the PSNR exceeds 29 dB and the SSIM exceeds 0.93 in both cases. Nevertheless, compared with MLDM-I, the PSNR and SSIM of the MLDM-II reconstructions still show a certain improvement, which proves that increasing the number of channels can indeed improve the quality of the reconstructed images to some extent.


Table 1. Comparison of test results on LSUN bedroom dataset.

Experiments on the LSUN church dataset. To better verify the robustness of MLDM, the LSUN church dataset, whose images have richer grayscale, brightness, and structural variations, was also selected for the experiments.

Figure 7 shows the comparison of the five algorithms on the LSUN church dataset. As in the previous section, two groups of reconstructed images are shown; each group contains the ground truth and the reconstructions of BP, CS, LSGM, MLDM-I, and MLDM-II. Similar to the results on the LSUN bedroom dataset, the BP reconstructions are the worst, and the CS reconstructions contain spurious colors unrelated to the images. For example, obvious purple and green twin images can be observed in the CS results of the first row of Fig. 7, especially in the middle and at the edges of the images. Although the CS results do not prevent recognition of the image content, the images remain blurred. Compared with CS and BP, LSGM greatly improves image quality and definition, but the MLDM-I and MLDM-II proposed in this work achieve a better reconstruction effect than LSGM, and MLDM-I suppresses the gridding effect better than LSGM. Comparing the LSGM and MLDM-II reconstructions in the second row of Fig. 7, MLDM-II has higher definition, and according to the PSNR and SSIM given in Fig. 7, its quantitative evaluation indices are also the highest. Comparing the reconstructions of each group in Fig. 7, the images in the last column are better than those in the second to fifth columns in terms of color, brightness, and sharpness. Therefore, MLDM-II is the best performer: it clearly reconstructs image details while ensuring color accuracy, and its results are closest to the ground truth.


Fig. 7. Visual comparison of reconstruction images on the LSUN church dataset. (a) Ground Truth (b) BP (c) CS (d) LSGM (e) MLDM-I (f) MLDM-II.


The quantitative comparison of PSNR and SSIM in Table 2 shows that, among the five algorithms, MLDM-II has the highest PSNR and SSIM, and its average PSNR on the LSUN church dataset is close to that on the LSUN bedroom dataset. After switching to a dataset with richer grayscale, brightness, and structural variations, the evaluation indices of BP are almost unchanged. In conclusion, the MLDM-II proposed in this work performs stably and best on both indices, with the highest PSNR reaching 28.87 dB and the SSIM reaching 0.933. Compared with CS and LSGM, the average PSNR of MLDM-II is 13.52 dB and 4.51 dB higher, and its average SSIM is 0.682 and 0.149 higher, respectively. Therefore, the proposed method reconstructs the best images: it not only maintains the color, brightness, and clarity of the images but also significantly removes the artifact noise of the twin image.


Table 2. Comparison of test results on LSUN church dataset.

Combining the two sets of experimental results, it can be concluded that, across different datasets, MLDM-II achieves higher reconstruction quality than BP, CS, LSGM, and MLDM-I, and its performance is more stable.

4.5 Experimental validation

This section experimentally verifies the proposed concept. As shown in Fig. 8, a self-designed multi-phase FZA mask is placed tightly onto the surface of the CMOS sensor (QHY163M, Light Speed Vision (Beijing) Co., Ltd, China) to capture the target displayed on an LCD screen. The resolution of the CMOS sensor is 4650 × 3522. A chrome layer with a thickness of 0.1 mm is plated onto a substrate with dimensions of 127 × 127 × 2.3 mm. The entire mask plate is divided into 6 rows and 5 columns, resulting in a total of 30 regions, each measuring 20 × 20 mm. The ${r_1}$ parameters of the FZAs range from 0.2 mm to 0.8 mm, and the FZA phases include 0, π/2, π, and 3π/2. Accounting for the thickness of the CMOS cover glass and the FZA mask, the distance between the FZA and the sensor is 3.4 mm.


Fig. 8. The multi-phase lensless imaging system. The insets show the self-designed FZA mask and the CMOS sensor, respectively.


Since this system uses a monochrome camera, the three color channels were acquired by exposing each channel separately. The imaging target was loaded on the LCD screen at a distance of 300 mm from the FZA zone plate, and the captured coded images were saved to a local computer. The 0 and π/2 phase-modulated sub-images, each with dimensions of 768 × 768, were cropped from the original images acquired by the CMOS sensor. They were then downsampled to 256 × 256 pixels by pixel averaging and fed into the network for image reconstruction through the proposed MLDM methods.

To further verify the performance of the proposed MLDM algorithms, three images with significant color distributions and one monochrome image were used as imaging targets in the practical experiments. The ${r_1}$ parameter of the FZA was chosen as 0.4 mm, and the phases were chosen as 0 and π/2. The BP, CS, LSGM, and ADMM algorithms were also used for comparison. Figure 9 shows the comparative results of the different methods in the optical experiment. The BP algorithm produced the lowest-quality results among all algorithms, with significant information loss and severe color distortion. While the CS algorithm showed notable improvement over BP, significant color distortion was still present, along with uneven color distribution, excessive noise, overly smooth images, and poor sharpness. The LSGM reconstruction improved clarity but still exhibited a noticeable grid effect, despite reduced color unevenness and noise. The grid effect of ADMM is improved compared with LSGM, but its clarity is inferior, and artifacts are noticeable at the edges. The MLDM-I and MLDM-II proposed in this work exhibited excellent performance in restoring the target images, eliminating the twin-image effect, and improving image clarity and color saturation. The MLDM-II reconstructions had more noticeable noise at the edges and lower clarity than those of MLDM-I, which can be attributed to the extra noise introduced by the additional channels. Overall, the proposed MLDM effectively eliminated the twin-image effect of the lensless imaging system and produced reconstructed images of higher quality than the other algorithms.


Fig. 9. Visual comparison of reconstructed images on optical experiment. (a) Ground Truth (b) BP (c) CS (d) LSGM (e) ADMM (f) MLDM-I (g) MLDM-II.


To quantitatively analyze the performance of the system, the PSNR and SSIM of the image reconstruction results of the practical experiment were calculated. As shown in Table 3, MLDM has the best reconstruction effect, with the highest PSNR value of 24.35 dB, while BP has the worst. The PSNR and SSIM of MLDM-I are 3.15 dB and 0.267 higher than those of CS for the third image, and 4.50 dB and 0.194 higher than those of LSGM for the second image. For all four images, the PSNRs of MLDM-I are higher than those of ADMM; in particular, for the second image, the PSNR of MLDM-I is 5.30 dB higher than that of ADMM.


Table 3. PSNR and SSIM of different methods in the optical experiments.

In the comparison of SSIM, the SSIMs of ADMM for the first and second images are higher than those of the other algorithms, while the SSIMs of MLDM-I are higher for the third and fourth images. This is because the ADMM reconstructions are dark and the edges of the images are black after pinhole imaging, which drives the SSIM of the ADMM results above that of the other algorithms. However, judging from the reconstructed images themselves, the visual quality of MLDM is significantly better than that of ADMM. In summary, the proposed MLDM can reconstruct higher-quality multi-channel color images than traditional algorithms.

5. Discussion

5.1 Effect of the number of FZA phases on the reconstruction results

To test the influence of the number of phases on reconstruction quality, this section shows the reconstruction results of MLDM-I and MLDM-II under different FZA phase encodings. The LSUN church dataset was selected for the simulation experiments, and the objects were encoded with FZA phases of (0, π/2), (0, π), (π/2, 3π/2), and (0, π/2, π, 3π/2), respectively.

As shown in Fig. 10, the first column is the ground truth, the second to fourth columns are the reconstructions under bi-phase conditions, and the fifth column shows the reconstructions under four-phase conditions. It can be clearly seen from the third column that the bi-phase (0, π) reconstructions have an obvious color difference compared with the ground truth, along with an obvious gridding effect. In the fourth column, the color-difference problem is resolved, and the color of the reconstructed image is basically the same as the ground truth. The second to fourth columns of Fig. 10 show that, among the bi-phase encodings, the phases (0, π/2) give the best reconstruction performance, with color fidelity and image clarity superior to the other bi-phase reconstructions. The fifth column of Fig. 10 shows that the quality of the four-phase reconstructions is better than that of the reconstructions under the bi-phase conditions.


Fig. 10. Visual comparison of reconstructed images with different phase numbers on the LSUN church dataset. (a) Ground Truth (b) 0, π/2 (c) 0, π (d) π/2, 3π/2 (e) 0, π/2, π, 3π/2.


As shown in Table 4, the phases (0, π/2) achieve the best reconstruction results among the bi-phase encodings, and the four-phase reconstruction has the highest PSNR and SSIM, which are 0.07 dB and 0.001 higher than those of the phases (0, π/2) for the first image. For the second image, the four-phase reconstruction still has the highest SSIM, while the phases (0, π/2) have the highest PSNR. It can be seen that there is little difference between the (0, π/2) bi-phase and four-phase reconstructions.


Table 4. PSNR and SSIM for different numbers of phases on the LSUN church dataset.

In conclusion, increasing the number of phases improves the quality of the recovered image, with four-phase encoding producing the best results. However, since the improvement over the bi-phase (0, π/2) encoding is slight and increasing the number of phases also increases the computational cost, the bi-phase (0, π/2) encoding is the better choice in practical scenarios.

5.2 Resolution analysis

To verify the minimum resolution that the proposed system can achieve, a USAF-1951 target was used for testing; the image was magnified to a size of 300 × 300 mm and displayed on the LCD monitor. The target was placed 500 mm away from the FZA.

The image reconstruction results are shown in Fig. 11. The first column of images is the ground truth of the USAF-1951 test chart, and the second to seventh columns are the reconstruction results of BP, CS, LSGM, ADMM, MLDM-I, and MLDM-II.


Fig. 11. Visual comparison of resolution test reconstructions. (a) Ground Truth (b) BP (c) CS (d) LSGM (e) ADMM (f) MLDM-I (g) MLDM-II.


The reconstruction results of BP are the worst, and CS and LSGM have difficulty resolving the line elements of group -1. With ADMM and MLDM, the 4th element of group -2 can be resolved, corresponding to an actual line-pair width of 10 mm. Overall, MLDM yields a clear reconstruction in which even the edge details of the image are well recovered. The reconstruction quality is also affected by the sensor and the FZA encoding parameters; by reducing the ${r_1}$ of the FZA, the image reconstruction quality can be further improved.

6. Conclusion

In this study, a multi-phase FZA lensless imaging method based on a score model, termed MLDM, is proposed. A multi-phase FZA integrated on a single mask plate is used to expand the data channels of a single-shot image. Based on a multi-channel-encoding strategy for extracting prior information about the data distribution, the association between the pixel channels and the encoding-phase channels of the color image is obtained, and the iterative reconstruction method effectively improves the reconstruction quality. Compared with the results of traditional experiments, the PSNR and SSIM indices are significantly improved, showing that multi-phase encoding can effectively eliminate the influence of the twin image and is more conducive to generating high-quality images. In the simulations, the highest PSNR and SSIM of MLDM-I and MLDM-II were 29.42 dB and 0.933, and 29.91 dB and 0.941, respectively. The model exhibits good modeling performance and accurately captures the data distribution without the need for adversarial training to generate high-quality samples. By utilizing the multi-phase method to collect more encoded information, the extraction of prior information is made more effective, leading to an improvement in reconstruction quality. The method proposed in this paper is expected to reduce the cost and size of cameras while ensuring imaging quality, and to find applications in biomedical imaging, digital holography, and other fields.

Funding

National Natural Science Foundation of China (62105138, 62122033); Key Research and Development Program of Jiangxi Province (20212BBE53001).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in Ref. [41].

References

1. C. Hu and S. Han, “On Ghost Imaging Studies for Information Optical Imaging,” Appl. Sci. 12(21), 10981 (2022). [CrossRef]  

2. J. Wu, H. Zhang, W. Zhang, G. Jin, L. Cao, and G. Barbastathis, “Single-shot lensless imaging with fresnel zone aperture and incoherent illumination,” Light: Sci. Appl. 9(1), 53 (2020). [CrossRef]  

3. H. N. Chapman and K. A. Nugent, “Coherent lensless X-ray imaging,” Nat. Photonics 4(12), 833–839 (2010). [CrossRef]

4. S. Witte, V. T. Tenner, D. W. Noom, and K. S. Eikema, “Lensless diffractive imaging with ultra-broadband table-top sources: from infrared to extreme-ultraviolet wavelengths,” Light: Sci. Appl. 3(3), e163 (2014). [CrossRef]  

5. W. Bishara, T. W. Su, A. F. Coskun, and A. Ozcan, “Lensfree on-chip microscopy over a wide field-of-view using pixel super-resolution,” Opt. Express 18(11), 11181–11191 (2010). [CrossRef]  

6. A. Greenbaum, W. Luo, T. W. Su, Z. Göröcs, L. Xue, S. O. Isikman, and A. Ozcan, “Imaging without lenses: achievements and remaining challenges of wide-field on-chip microscopy,” Nat. Methods 9(9), 889–895 (2012). [CrossRef]  

7. L. Mertz and N. O. Young, “Fresnel transformations of images,” SPIE Milestone Series MS 128, 44–49 (1996).

8. M. K. Kim, “Adaptive optics by incoherent digital holography,” Opt. Lett. 37(13), 2694–2696 (2012). [CrossRef]

9. M. K Kim, “Incoherent digital holographic adaptive optics,” Appl. Opt. 52(1), A117–A130 (2013). [CrossRef]  

10. I. Moon and B. Javidi, “3-D visualization and identification of biological microorganisms using partially temporal incoherent light in-line computational holographic imaging,” IEEE Trans. Med. Imaging 27(12), 1782–1790 (2008). [CrossRef]  

11. J. Rosen and G. Brooker, “Non-scanning motionless fluorescence three-dimensional holographic microscopy,” Nat. Photonics. 2(3), 190–195 (2008). [CrossRef]  

12. T. Yanagawa, R. Abe, and Y. Hayasaki, “Three-dimensional mapping of fluorescent nanoparticles using incoherent digital holography,” Opt. Lett. 40(14), 3312–3315 (2015). [CrossRef]  

13. J. Rosen and G. Brooker, “Digital spatially incoherent Fresnel holography,” Opt. Lett. 32(8), 912–914 (2007). [CrossRef]  

14. M. I. Gnetto, Y. T. A. Kossonou, Y. Koffi, K. A. Kaduki, and J. T. Zoueu, “Solving the twin image problem in in-line holography by using multiple defocused intensity images reconstructed from a single hologram,” J. Mod. Opt. 69(3), 121–129 (2022). [CrossRef]  

15. W. Y. Xu, D. A. Pommet, F. C. Lin, M. A. Fiddy, and M. D. Drake, “The dual twin image effect from a waveguide hologram,” Opt. Lasers. Eng. 23(2-3), 137–143 (1995). [CrossRef]  

16. T. Latychevskaiat and H. W. Fink, “Solution to the twin image problem in holography,” Phys. Rev. Lett. 98(23), 233901 (2007). [CrossRef]  

17. Y. Ma, J. Wu, S. Chen, and L. Cao, “Explicit-restriction convolutional framework for lensless imaging,” Opt. Express 30(9), 15266–15278 (2022). [CrossRef]  

18. T. Shimano, Y. Nakamura, K. Tajima, M. Sao, and T. Hoshizawa, “Lensless light-field imaging with Fresnel zone aperture: quasi-coherent coding,” Appl. Optics 57(11), 2841–2850 (2018). [CrossRef]  

19. M. Sao, Y. Nakamura, K. Tajima, and T. Shimano, “Lensless close-up imaging with Fresnel zone aperture,” Jpn. J. Appl. Phys. 57(9S1), 09SB05 (2018). [CrossRef]  

20. A. Hyvarinen and P. Dayan, “Estimation of non-normalized statistical models by score matching,” J. Mach. Learn. Res. 6(4), 695–709 (2005).

21. T. Nakamura, T. Watanabe, S. Igarashi, X. Chen, K. Tajima, K. Yamaguchi, and M. Yamaguchi, “Superresolved image reconstruction in FZA lensless camera by color-channel synthesis,” Opt. Express 28(26), 39137–39155 (2020). [CrossRef]  

22. X. Chen, X. Pan, T. Nakamura, S. Takeyama, T. Shimano, K. Tajima, and M. Yamaguchi, “Wave-optics-based image synthesis for super resolution reconstruction of a FZA lensless camera,” Opt. Express 31(8), 12739–12755 (2023). [CrossRef]  

23. J. Wu, L. Cao, and G. Barbastathis, “DNN-FZA camera: a deep learning approach toward broadband FZA lensless imaging,” Opt. Lett. 46(1), 130–133 (2021). [CrossRef]  

24. Y. Song and S. Ermon, “Generative modeling by estimating gradients of the data distribution,” arXiv, arXiv:2203.01801 (2019). [CrossRef]  

25. Y. Song, S. Garg, J. Shi, and S. Ermon, “Sliced score matching: A scalable approach to density and score estimation,” arXiv, arXiv:1905.07088 (2020). [CrossRef]  

26. J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, “Deep unsupervised learning using nonequilibrium thermodynamics,” arXiv, arXiv:1503.03585 (2015). [CrossRef]  

27. J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).

28. Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” arXiv, arXiv:2011.13456 (2020). [CrossRef]  

29. H. Rootzén and J. Olsson, “On the influence of the prior distribution in image reconstruction,” Comput. Stat. 21(3-4), 431–444 (2006). [CrossRef]  

30. B. M. Hennelly, D. P. Kelly, N. Pandey, and D. S. Monaghan, “Review of twin reduction and twin removal techniques in holography,” Proc. CIICT 2009, 241–245 (2009).

31. L. Denis, C. Fournier, T. Fournel, and C. Ducottet, “Twin-image noise reduction by phase retrieval in in-line digital holography,” Proc. SPIE 5914, 59140J (2005). [CrossRef]  

32. Y. Song and S. Ermon, “Improved techniques for training score-based generative models,” arXiv, arXiv:2006.09011 (2020). [CrossRef]  

33. B. D. Anderson, “Reverse-time diffusion equation models,” Stochastic Proc. Appl. 12(3), 313–326 (1982). [CrossRef]  

34. P. E. Kloeden and E. Platen, “Stochastic differential equations,” Numer. Solution of Stochastic Differ. Equ. 23, 103–160 (1992). [CrossRef]  

35. H. Narayanan and S. Mitter, “Sample complexity of testing the manifold hypothesis,” Proc. Adv. Neural Inf. Process. Syst. 23, 1–9 (2010).

36. G. Parisi, “Correlation functions and computer simulations,” Nucl. Phys. B 180(3), 378–384 (1981). [CrossRef]  

37. C. Quan, J. Zhou, Y. Zhu, Y. Chen, S. Wang, D. Liang, and Q. Liu, “Homotopic gradients of generative density priors for MR image reconstruction,” IEEE Trans. Med. Imaging 40(12), 3265–3278 (2021). [CrossRef]  

38. T. Hofmann, B. Scholkopf, and A. J. Smola, “Kernel methods in machine learning,” Ann. Statist. 36(3), 1171–1220 (2008). [CrossRef]  

39. F. Yu, A. Seff, Y. Zhang, S. Song, T. Funkhouser, and J. Xiao, “Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop,” arXiv preprint arXiv:1506.03365 (2015).

40. C. Wu, H. Peng, Q. Liu, W. Wan, and Y. Wang, “Lens-less imaging via score-based generative model,” Opt. Precision Eng. 30(18), 2280 (2022). [CrossRef]  

41. W. Wan, “Multi-phase FZA Lensless Imaging via Diffusion Model,” GitHub (2023) [accessed 11 Feb 2023], https://github.com/yqx7150/MLDM.



Figures (11)

Fig. 1. The main processes of lensless imaging: the optical encoding and computational decoding processes.
Fig. 2. Forward and reverse-time processes of the VE-SDE on an RGB object.
Fig. 3. Reconstruction flow chart of the MLDM-I algorithm. Top: training stage, which learns the gradient distribution of the RGB object via denoising score matching. Bottom: iteration between the numerical SDE solver and the data-consistency step to achieve reconstruction.
Fig. 4. Iterative generation of synthetic objects via annealed Langevin dynamics.
Fig. 5. Reconstruction flow chart of the MLDM-II algorithm. Top: training stage, which learns the gradient distribution of the synthetic object via denoising score matching. Bottom: reconstruction flow chart for encoded-image calculation and decoding.
Fig. 6. Visual comparison of reconstructed images on the LSUN bedroom dataset. (a) Ground truth, (b) BP, (c) CS, (d) LSGM, (e) MLDM-I, (f) MLDM-II.
Fig. 7. Visual comparison of reconstructed images on the LSUN church dataset. (a) Ground truth, (b) BP, (c) CS, (d) LSGM, (e) MLDM-I, (f) MLDM-II.
Fig. 8. The multi-phase lensless imaging system. The insets show the self-designed FZA mask and the CMOS sensor, respectively.
Fig. 9. Visual comparison of reconstructed images from the optical experiment. (a) Ground truth, (b) BP, (c) CS, (d) LSGM, (e) ADMM, (f) MLDM-I, (g) MLDM-II.
Fig. 10. Visual comparison of reconstructed images with different phase numbers on the LSUN church dataset. (a) Ground truth, (b) 0, π/2, (c) 0, π, (d) π/2, 3π/2, (e) 0, π/2, π, 3π/2.
Fig. 11. Visual comparison of resolution-test reconstructions. (a) Ground truth, (b) BP, (c) CS, (d) LSGM, (e) ADMM, (f) MLDM-I, (g) MLDM-II.

Tables (4)

Table 1. Comparison of test results on the LSUN bedroom dataset.
Table 2. Comparison of test results on the LSUN church dataset.
Table 3. PSNR and SSIM of the different methods in the optical experiments.
Table 4. PSNR and SSIM for different numbers of phases on the LSUN church dataset.

Equations (33)

(1) $t(x_p, y_p; \phi) = \frac{1}{2}\left[ 1 + \cos\left\{ \frac{\pi}{r_1^2}(x_p^2 + y_p^2) + \phi \right\} \right]$

(2) $T(x, y; \phi) = \frac{1}{2}\left[ 1 + \cos\left\{ \frac{\pi}{(1+M)^2 r_1^2}(x^2 + y^2) + \phi \right\} \right]$

(3) $T(x, y; \phi) = \frac{1}{2} + \frac{1}{4}\left[ h(x, y; \phi) + h^{*}(x, y; \phi) \right]$

(4) $h(x, y; \phi) = \exp\left[ \frac{i\pi}{r_1^2 (1+M)^2}(x^2 + y^2) + i\phi \right]$

(5) $I(x, y; \phi) = T(x, y; \phi) \ast O(x, y)$

(6) $I(x, y; \phi) = \frac{1}{2}O(x, y) + \frac{1}{4}O(x, y) \ast \left[ h(x, y; \phi) + h^{*}(x, y; \phi) \right] = C + \frac{1}{4}\left[ U(x, y; \phi) + U^{*}(x, y; \phi) \right] = C + \frac{1}{2}\mathrm{Re}\{ U(x, y; \phi) \}$

(7) $I = \frac{1}{2}\mathrm{Re}\{ F^{-1} H F O \}$

(8) $I_k = H_k O$

(9) $\mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w$

(10) $f = 0, \quad g = \sqrt{\mathrm{d}[\sigma^2(t)]/\mathrm{d}t}$

(11) $\mathrm{d}x = \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{w} = -\frac{\mathrm{d}[\sigma^2(t)]}{\mathrm{d}t}\nabla_x \log p_t(x)\,\mathrm{d}t + \sqrt{\frac{\mathrm{d}[\sigma^2(t)]}{\mathrm{d}t}}\,\mathrm{d}\bar{w}$

(12) $\hat{\theta} = \arg\min_\theta \mathbb{E}_t\left\{ \lambda(t)\,\mathbb{E}_{x(0)}\mathbb{E}_{x(t)|x(0)}\left[ \left\| S_\theta(x(t), t) - \nabla_{x(t)} \log p_t(x(t)|x(0)) \right\|^2 \right] \right\}$

(13) $O = \arg\min_O \sum_{k=1}^{K} \nu_k \left\| H_k O - I_k \right\|^2 + \gamma R(O)$

(14) $\hat{\theta} = \arg\min_\theta \mathbb{E}_t\left\{ \lambda(t)\,\mathbb{E}_{O(0)}\mathbb{E}_{O(t)|O(0)}\left[ \left\| S_\theta(O(t), t) - \nabla_{O(t)} \log p_t(O(t)|O(0)) \right\|^2 \right] \right\}$

(15) $\mathrm{d}O = -\frac{\mathrm{d}[\sigma^2(t)]}{\mathrm{d}t} S_\theta(O, t)\,\mathrm{d}t + \sqrt{\frac{\mathrm{d}[\sigma^2(t)]}{\mathrm{d}t}}\,\mathrm{d}\bar{w}$

(16) $V_i = O_{i+1} + (\sigma_{i+1}^2 - \sigma_i^2) S_\theta(O_{i+1}, \sigma_{i+1}) + \sqrt{\sigma_{i+1}^2 - \sigma_i^2}\, z$

(17) $O_i = \arg\min_O \sum_{k=1}^{K} \nu_k \left\| H_k O_{i+1} - I_k \right\|^2 + \gamma \left\| O_{i+1} - V_i \right\|^2$

(18) $O_i = \left[ \sum_{k=1}^{K} \nu_k H_k^T (O_{i+1} - I_k) + \gamma (O_{i+1} - V_i) \right] \Big/ \left[ \sum_{k=1}^{K} \nu_k H_k^T H_k + \gamma \right]$

(19) $U_{i,j} = O_{i,j-1} + \varepsilon_i S_\theta(O_{i,j-1}, \sigma_i) + \sqrt{2\varepsilon_i}\, z$

(20) $O_{i,j} = \arg\min_O \sum_{k=1}^{K} \nu_k \left\| H_k O_{i,j-1} - I_k \right\|^2 + \gamma \left\| O_{i,j-1} - U_{i,j} \right\|^2$

(21) $O_{i,j} = \left[ \sum_{k=1}^{K} \nu_k H_k^T (O_{i,j-1} - I_k) + \gamma (O_{i,j-1} - U_{i,j}) \right] \Big/ \left[ \sum_{k=1}^{K} \nu_k H_k^T H_k + \gamma \right]$

(22) $R(O_i) = \left\| O_i \right\|_{TV} = \sum_i^{N_{xy}} \sqrt{ \left| \Delta_i^h O_i \right|^2 + \left| \Delta_i^v O_i \right|^2 }$

(23) $\hat{\theta} = \arg\min_\theta \mathbb{E}_t\left\{ \lambda(t)\,\mathbb{E}_{O(0)}\mathbb{E}_{O(t)|O(0)}\left[ \left\| S_\theta(O(t), t) - \nabla_{O(t)} \log p_t(O(t)|O(0)) \right\|^2 \right] \right\}$

(24) $\mathrm{d}O = -\frac{\mathrm{d}[\sigma^2(t)]}{\mathrm{d}t} S_\theta(O, t)\,\mathrm{d}t + \sqrt{\frac{\mathrm{d}[\sigma^2(t)]}{\mathrm{d}t}}\,\mathrm{d}\bar{w}$

(25) $V_i = O_{i+1} + (\sigma_{i+1}^2 - \sigma_i^2) S_\theta(O_{i+1}, \sigma_{i+1}) + \sqrt{\sigma_{i+1}^2 - \sigma_i^2}\, z$

(26) $O_i = \arg\min_O \left\| \begin{pmatrix} \nu_1 H_1 & 0 \\ 0 & \nu_2 H_2 \end{pmatrix} O_{i+1} - \begin{pmatrix} I_1 \\ I_2 \end{pmatrix} \right\|^2 + \gamma \left\| O_{i+1} - V_i \right\|^2$

(27) $O_i = \left[ \begin{pmatrix} \nu_1 H_1 & 0 \\ 0 & \nu_2 H_2 \end{pmatrix}^{T} \left( O_{i+1} - \begin{pmatrix} I_1 \\ I_2 \end{pmatrix} \right) + \gamma (O_{i+1} - V_i) E \right] \Big/ \left[ \begin{pmatrix} \nu_1 H_1 & 0 \\ 0 & \nu_2 H_2 \end{pmatrix}^{T} \begin{pmatrix} \nu_1 H_1 & 0 \\ 0 & \nu_2 H_2 \end{pmatrix} + \gamma E \right]$

(28) $U_{i,j} = O_{i,j-1} + \varepsilon_i S_\theta(O_{i,j-1}, \sigma_i) + \sqrt{2\varepsilon_i}\, z$

(29) $O_{i,j} = \arg\min_O \left\| \begin{pmatrix} \nu_1 H_1 & 0 \\ 0 & \nu_2 H_2 \end{pmatrix} O_{i,j-1} - \begin{pmatrix} I_1 \\ I_2 \end{pmatrix} \right\|^2 + \gamma \left\| O_{i,j-1} - U_{i,j} \right\|^2$

(30) $R(O_i) = \left\| O_i \right\|_{TV} = \sum_i^{N_{xy}} \sqrt{ \left| \Delta_i^h O_i \right|^2 + \left| \Delta_i^v O_i \right|^2 }$

(31) $\mathrm{MSE} = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( \hat{O}(i,j) - O(i,j) \right)^2$

(32) $\mathrm{PSNR}(\hat{O}, O) = 10 \times \log_{10} \frac{\max_{(i,j)}\{ \hat{O}^2(i,j) \}}{\mathrm{MSE}(\hat{O}, O)}$

(33) $\mathrm{SSIM}(\hat{O}, O) = \frac{ (2\mu_{\hat{O}} \mu_O + c_1)(2\sigma_{\hat{O}O} + c_2) }{ (\mu_{\hat{O}}^2 + \mu_O^2 + c_1)(\sigma_{\hat{O}}^2 + \sigma_O^2 + c_2) }$
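As a worked example of the evaluation metrics in Eqs. (31)-(33), the following numpy sketch computes PSNR and a global (single-window) SSIM. The constants c1 and c2 follow the common SSIM convention for 8-bit images and are an assumption; the paper does not state the values it uses.

import numpy as np

def psnr(O_hat, O):
    mse = np.mean((O_hat - O) ** 2)                   # Eq. (31)
    return 10.0 * np.log10(np.max(O_hat**2) / mse)    # Eq. (32)

def ssim_global(O_hat, O, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    mu1, mu2 = O_hat.mean(), O.mean()
    cov = np.mean((O_hat - mu1) * (O - mu2))          # cross-covariance sigma_{O_hat O}
    return ((2 * mu1 * mu2 + c1) * (2 * cov + c2)) / (
        (mu1**2 + mu2**2 + c1) * (O_hat.var() + O.var() + c2)
    )                                                 # Eq. (33)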