Wavefront coding image reconstruction via physical prior and frequency attention

Qinghan Zhang; Meng Bao; Liujie Sun; Yourong Liu; Jihong Zheng

doi:10.1364/OE.503026

1. Introduction

Large-field fluorescence microscopic imaging can be used for imaging, detection, and analysis of digital polymerase chain reaction (dPCR) and biological images [1–4]. However, the large field of view is difficult to combine with a large depth of field, limiting imaging application.

Wavefront coding (WFC) is a depth-of-field (DOF) extension technique that combines optical coding and digital decoding [5]. Compared with other extended methods, such as reduced aperture, central obscuration, or apodization [6,7], this method does not affect the throughput and resolution, as we as suppress phase differences, color aberrations and reduces the errors associated with the assembly [8]. However, it requires subsequent wavefront decoding to sharpen and denoise the encoded image [9].

For an optical system, optical coding inserts a phase mask at the pupil plan to ensure the PSF and the modulation transfer function (MTF) are consistent over the extended bokeh range [10]. The wavefront coding was first proposed by W. Thomas Cather and Edward Dowski, who constructed the classic cubic phase mask. Subsequently, various functional forms of phase masks have been proposed, such as logarithmic phase masks, cosine phase masks, polynomial phase masks, and higher-order phase masks. Recent studies designed multilevel phase masks based on diffraction theory [11,12]. Some other studies used the computational imaging-based technique to design a phase mask jointly [13]. The cubic phase mask has a good depth of field expansion effect, and the structure is simple with a strong processing practicality, making it easy to analyze the theory.

Traditional image reconstruction methods applied for wavefront decoding consist of the Wiener filtering and Lucy Richardson algorithm (LR). Both are difficult to handle noise well, usually generating artifacts and blurring [14–17]. Deep learning has been employed in image reconstruction applications, e.g., super-resolution [18], motion deblurring [19], image denoising [20], image inpainting [21], and depth estimation [22].

Recent studies have shown that deep learning-based models utilized for WFC decoding, which uses Resnet [23] or U-net [24] as the baseline, improve image reconstruction quality [9,25,26]. However, these models have some common problems, which do not make good use of physical prior information. And the constructed models are insufficient for extracting and fusing details and semantic features, thus not handling noise and ambiguity well enough.

For these problems, we apply the physical prior information and frequency domain module to the WFC decoding, proposing a reconstruction method. Specifically, inspired by the Transformer, we first build a generative adversarial network. Then, the PSF attention layer is proposed, end-to-end, explicitly learning the deviation between in-focus PSF and out-of-focus PSF at each layer scale. We construct a fusion module that adaptively determines which learned PSF information should be retained and incorporates it into the image's high and low frequency, mitigating the artifacts. Finally, a frequency domain self-attention block is constructed to separate and label the high-frequency, low-frequency, and PSF feature information in the frequency domain, further reconstructing the image feature information. We design a cubic phase mask with the genetic algorithm, which extends the DOF by 20 times to ±200$\mu m$, and place it on the pupil plane of the microscopy system for generating the encoded images. The experimental results show that our method reduces noise and artifacts, causing more reasonable textures and lines of encoded images.

This work constructs a frequency-domain attention model that utilizes PSF information, which is not a part of the image feature information, effectively reconstructing the encoded image. For the large-field fluorescence microscopy systems, we provide a deep learning method that improves reconstruction image quality while considering the large depth of field, which has good potential to apply in studies where PSF can be used for computational imaging.

2. Method

2.1 Phase mask design and analysis

To illustrate the validity and generalizability of the method, we use the Genetic algorithm to design a cubic phase mask. We place it on the pupil plane of our large field-of-view fluorescence microscopy system with a f-number of 3.066 [1]. As shown on the left of Fig. 1, the system consists of an infinitely far microscope objective, an infinitely far imaging lens, and a phase mask (indicators of the microscopic system are provided in Appendix A). According to the focal depth formula, the depth of focus of the conventional large field of view fluorescence microscopy system is $4\lambda {F^2} \approx 21\mu m$, which is the DOF of the image space [26].

Fig. 1. Diagram of WFC, which contains two steps: optical encoding and digital decoding.

Download Full Size | PDF

To quantify the performance of phase masks [5], we constructed an optimization function ${\textrm{S}_{\textrm{Ga}}}$ to compare the difference between the in-focus MTF and different out-of-focus MTFs as following:

(1)$$\left\{ \begin{array}{l} {\textrm{S}_{\textrm{Ga}}}\textrm{ = }\arg \min \frac{{\sum\limits_{i = 1}^N {\textrm{(MT}{\textrm{F}_{\textrm{out}}}\textrm{ - MT}{\textrm{F}_{\textrm{in}}})/N} }}{{\textrm{MT}{\textrm{F}_{\textrm{mean}}}}}\textrm{ = }\arg \min \frac{{\sum\limits_{i = 1}^N {\textrm{(MT}{\textrm{F}_{\textrm{out}}}\textrm{ - MT}{\textrm{F}_{\textrm{in}}})/N} }}{{\textrm{MT}{\textrm{F}_{\textrm{all}}}/N}}\\ \min (\textrm{MTF}) > 0.12 \end{array} \right.$$

where $\textrm{MT}{\textrm{F}_{\textrm{mean}}}$ is the mean value of MTF, $N$ is the number of samples. Let $\lambda (i)$ donates the wavelength, $FOV(j)$ denotes the field of view. $\psi$ is the out-of-focus distance, $\textrm{M(}\lambda ,FOV(i),\psi \textrm{)}$ is the matrix containing the out-of-focus MTF values. For our optical system, Eq. (1) could be written as:

(2)$$\left\{ \begin{array}{l} {\textrm{S}_{\textrm{Ga}}}\textrm{ = }\arg \min \sum\limits_{\lambda (i)} {\sum\limits_{FOV(j)} {\sum\limits_\psi {\sqrt {{{\left( {\frac{{\textrm{M(}\lambda (i),FOV(j),\psi \textrm{) - M(}\lambda (i),FOV(j),0\textrm{)}}}{{\textrm{M(}\lambda (i),FOV(j),\psi \textrm{)}}}} \right)}^2}} } } } \\ \min (M(\lambda (i),FOV(j),\psi )) > 0.12 \end{array} \right.$$

where $\min (M(\lambda (i),FOV(j),\psi )) > 0.12$ is a minimum constraint function, which must be higher than 0.1 to ensure the optical system can be used after placing the phase mask [26].

Genetic algorithm (GA) is an optimization method that simulates evolution, finding the relative optimal parameter for a determined range. Through the digital interface, we realized the dynamic interaction between the parameters of the Genetic algorithm (GA) and the values of Zemax OpticStudio, shown in Fig. 2. The first step in the optimization process is to generate an initial population with the random parameters generated within a reasonable range. Zemax OpticStudio receives the optimization parameters returned by the genetic algorithm (GA) and calculates the MTF values for different wavelengths, fields of view, and out-of-focus. Then, use the MTF values to evaluate the individuals in the population and select the excellent individuals (parameters) for the next iteration. We also use the parameter crossover and mutation to generate new individuals (parameters) to prevent falling into an optimal solution. Finally, after several iterations, the value and corresponding parameters can be output.

Fig. 2. Process of dynamic optimization of phase mask parameters by genetic algorithm.

Download Full Size | PDF

The individual is evaluated by calculating the evaluation function (fitness function), which utilizes Eq. (2) to calculate the difference between the in-focus MTF and out-of-focus MTF. If the minimum value of MTF is lower than 0.12, the fitness function returned NaN (not a number), and the optimized parameters were invalid.

For the cubic phase mask, whose surface formula is $\alpha ({x^3} + {y^3})$, the optimization results of the value $\alpha$ is $2.698 \times {10^{ - 6}}$, and the consistency parameter calculated is 0.355. During optimization, the four fields of view of the optical system are 0°, 3.3°, 4.6°, and 6.6°, and the Nyquist frequency is 80 lp/mm [1]. The phase mask extends the DOF to ±200 $\mu m$ (Simulated MTF curves of the phase mask are provided in Appendix A).

After the optical system, placed with the phase mask, makes the WFC large field fluorescence microscope system, we describe the imaging process simply:

(3)$${I_{\textrm{sensor}}} = {I_{\textrm{origin}}} \otimes \textrm{PS}{\textrm{F}_{\textrm{out}}} + n$$

where ${I_{\textrm{origin}}}$ is the origin image, $\textrm{PS}{\textrm{F}_{\textrm{out}}}$ is the out-of-focus PSF, ${I_{\textrm{sensor}}}$ is the image received by the detector, $n$ is noise. Then, the image reconstruction problem can be written as an optimization formula:

(4)$${\textrm{S}_I}\textrm{ = arg}\mathop {\textrm{min}}\limits_{\textrm{s} \ge 0} \left. {\left\{ {\frac{1}{2}||{{I_{\textrm{sensor}}} - {I_{\textrm{origin}}} \otimes \textrm{PS}{\textrm{F}_{\textrm{out}}}} ||_2^2} \right. + \lambda R(n)} \right\}$$

where ${\textrm{S}_I}$ represents the error between real imaging and ideal imaging. R(n) is a regularization function with tuning parameter λ, which ensure the method is valid on the test set. Because the out-of-focus PSF is unknown, the image is usually decoded by in-focus PSF, which brings some blurred and artifacts in the reconstructed images, requiring digital decoding for sharpening.

2.2 Analysis of reconstruction method

Traditional decoding methods consist of the Wiener filtering and Lucy Richardson algorithm. The former aims to construct a linear recovery filter to minimize the mean square error (MSE) between the generated and origin images. This method makes it hard to handle the noise while ensuring the clarity of the reconstruction image, causing the blur when the noise is suppressed. The latter is an iterative nonlinear recovery algorithm, and the reconstruction results of this method are affected by the noise in the image. We can get reasonable results with negligible noise, but the processed images may further amplify the noise as the noise increases. For the decoding of WFC, the reconstructing effect of these two methods is relatively limited as the out-of-focus distance increases.

In contrast, the deep learning-based reconstruction method has better recovery quality. Du et al. first attempt to apply convolutional neural networks (CNN) in WFC decoding [9]. This method is more effective for solving fuzzy problem. Jin et al. propose DeepDOF, which uses the U-net [24] to reconstruct the encoded images [25]. And Li et al. use the Resnet [23] as the baseline to build a generative model to restore more visible results [26].

However, the existing deep learning-based WFC decoding methods do not utilize physical prior information well, which has been applied in traditional methods and some studies [27,28]. For further analysis of the imaging process, let $\Delta \textrm{PSF}$ denotes the deviation between in-focus PSF $\textrm{PS}{\textrm{F}_{\textrm{on}}}$ and out-of-focus PSF $\textrm{PS}{\textrm{F}_{\textrm{out}}}$:

(5)$$\textrm{PS}{\textrm{F}_{\textrm{out}}}\textrm{ = PS}{\textrm{F}_{\textrm{on}}} + \Delta \textrm{PSF}$$

Then, Eq. (4) can be rewritten as:

(6)$${\textrm{S}_I}\textrm{ = arg}\mathop {\textrm{min}}\limits_{\textrm{s} \ge 0} \left. {\left\{ {\frac{1}{2}||{{I_{orgin}} \otimes \Delta PSF} ||_2^2} \right. + \lambda R(n)} \right\}$$

As seen in Eq. (6), the decoding part needs to solve the problem of generating mismatched images caused by the existence of $\Delta \textrm{PSF}$ and the noise in the imaging process.

Unlike other prior information, e.g., detailed and semantic prior information, PSF information affects imaging but is not a part of the image feature information. Therefore, we propose three modules for end-to-end learning of the PSF feature information from $\Delta \textrm{PSF}$ and use this information fusion and reconstruction of image feature information, respectively.

2.3 PSF attention layer

Why emphasize the physical prior information in WFC decoding? Deep learning methods do not need to infer hidden variables but through data fitting, so it is used in many applications. However, those approaches do not construct a theoretical model, lacking the constraints. When face with more complex problems, those methods have difficulties in generating satisfactory results. The physical prior information guides the construction of deep learning-based models through imaging theory, which can effectively improve the quality of model generation results.

The PSF information is sparse and not a part of the image feature information. The model network cannot process both PSF information and image feature information. Therefore, it is necessary to construct the attention layer to explicitly end-to-end learn the deviation $\Delta \textrm{PSF}$. At the same time, it is difficult for CNN to model the long-term relationship, and the PSF feature information is hard to propagate to the deep network layers. It is also need to retain the PSF feature information in each layer and pass it to the image features of the network.

The specific construction of the PSF attention block is shown in Fig. 3(a). In this layer, let P denotes the learned PSF feature information:

(7)$$P = L(L(\textrm{PCON}{\textrm{V}_{4 \times 4}}))$$

where $\textrm{PCON}{\textrm{V}_{4 \times 4}}$ indicates the 4 × 4 convolution processing, $L$ denotes the linear layer. Two linear layers amplify the number of channels and map back to the original, which can further extract features. After each feature layer, use the activation function to obtain the updated PSF feature information ${P_{\textrm{upd}}}$:

(8)$${P_{\textrm{upd}}} = {({\textrm{Re}} \textrm{LU(P)})^\theta }$$

where the hyper-parameter $\theta \textrm{ = }0.9$[21]. The PSF feature information propagate to the image feature is expressed as ${P_F} = {P_{\textrm{gA}}}(F)$. ${P_{\textrm{gA}}}()$ is the Gaussian activation function, presented as:

(9)$${P_{\textrm{gA}}}\textrm{(}P\textrm{) = }\left\{ \begin{array}{l} \alpha \,\exp [ - ({(P)^2} - \mu )/{\sigma^2}]\;\;\;\;\textrm{ }fP \ge \mu \\ \textrm{1 + }( \alpha \textrm{ - 1}) \exp [ - ({(P)^2} - \mu )/{\sigma^2}]\textrm{ }else \end{array} \right.$$

where the learnable parameters are set as $\alpha = 1.1$, $\mu \textrm{ = }1.2$, $\sigma = 1$. The Gaussian activation function ensures that PSF sparse information effectively propagates to the image feature information.

Fig. 3. Left is the overview of our proposed decoding model. The model includes the encoder, decoder, and a bottleneck layer. Right is the details of the modules. The gray squares represent the reconstructed baseline (details are provided in Appendix A). (a) PSF Attention layer (PAB). (b) Multi-feature fusion block (MFFB). (c) Frequency domain self-attention block (FDSAB).

Download Full Size | PDF

2.4 Multi-feature fusion block

It needs to fuse the extracting PSF feature information into image features. The PSF feature information affects the generation of image details and semantic information, but if directly propagated to the image features, it is not conducive to the fusion of information.

To adaptively determine which information should be retained and reasonably integrated into the image feature information, we construct a multi-feature fusion block in Fig. 3(b). In the shallow and deep layers of the model network, the fusion block focuses on the details and semantic information of the image, respectively. We use the $\textrm{D}{\textrm{W}_{5 \times 5}}$ to learn which information to retain:

(10)$${P_{\textrm{EF}}} = \overline w L(\textrm{LN}(\textrm{D}{\textrm{W}_{5 \times 5}}({P_F})))$$

(11)$${F_{\textrm{EF}}} = \overline w L(\textrm{LN}(\textrm{D}{\textrm{W}_{5 \times 5}}({F_F})))$$

where $\overline w $ is a learnable quantization matrix, which is used for weighting determines how the PSF information is integrated into the image details and semantic feature information. $\overline {({\cdot} )} $ denotes the conjugate transpose operation. Then, output the results of the module:

(12)$${R_{out}} = L(G({P_{EF}} + {F_{EF}}))$$

where L is the linear layer. In Eq. (10), (11), and (12), it completes the extraction and reconstruction of information to make the PSF feature information which can be incorporated into the image features more effectively.

2.5 Frequency domain self-attention block

After fusing with PSF information, we construct the frequency domain self-attention block to further segmentation and reconstruction of image information. The block separates the high-frequency and low-frequency information, facilitating meaningful information generation after converting to the frequency domain. Use the self-attention mechanism to label the information, including high-frequency detail information, low-frequency semantic information, and fused PSF feature information. And then complete the reconstruction of that information, further mitigating noise, artifacts, and blur generation. In addition, the construction method similar to the Transformer [29] further improves the generation of semantic information, thus improving the rationality of the generation result.

As shown in Fig. 3(c), we use the Convolution $\textrm{Con}{\textrm{v}_{1 \times 1}}$ and three Depthwise Convolution $\textrm{D}{\textrm{W}_{7 \times 7}}$ to extract the image feature information from the image features ${F_F}$, obtain feature map ${F_q}$, ${F_k}$ and ${F_v}$:

(13)$${F_{q,k,v}} = \textrm{D}{\textrm{W}_{7 \times 7}}(\textrm{Con}{\textrm{v}_{1 \times 1}}({F_F}))$$

Then, using the Fast Fourier Transform (FFT), we calculate the similarity score of ${F_q}$ and ${F_k}$ in the frequency domain:

(14)$${F_{cor}} = {{\cal F}^{ - 1}}({\cal F}({F_q})\overline {{\cal F}({F_k})} )$$

where ${{\cal F}^{ - 1}}$ is inverse FFT. The similarity scores ${F_{cor}}$ is used to reconstruct image feature map $\textrm{ }{F_v}$:

(15)$${F_{re}} = \textrm{LN(}{F_{cor}}\textrm{) }{F_v}$$

where LN is the layer normalization. Finally, we build a short-skip connection that uses the image feature map ${F_F}$:

(16)$${F_{\textrm{FEA}}} = {F_F} + L(G(L({F_{re}})))$$

where ${F_{\textrm{FEA}}}$ denotes the output feature of the Frequency domain self-attention block, L is the linear layer, $G$ is the GELU activation function.

2.6 Discriminator and loss function of decoding model

As shown in Fig. 1, our model also contains a discriminator, which is built by PatchGAN [30]. This discriminator, a fully convolutional GAN, effectively determines whether the 70 × 70 patches area is true or not. Our loss function defined as:

(17)$${L_G} = {\alpha _a}{L_{\textrm{adv}}} + {\alpha _p}{L_{\textrm{perc}}} + {\alpha _s}{L_{\textrm{style}}} + {\alpha _{l1}}{L_{l1}}$$

where ${L_{\textrm{adv}}}$ is the adversarial loss [31], ${L_{\textrm{perc}}}$ is the perceptual loss [32], ${L_{\textrm{style}}}$ is the style loss [33], ${L_{l1}}$ is the l₁ loss [34]. We set the hyper-parameter ${\alpha _a}\textrm{ = }0.1$, ${\alpha _p} = 1$, ${\alpha _s} = 250$, ${\alpha _{l1}} = 1$ [21].

3. Experiments and analysis

3.1 Decoding experiment setup

The original image dataset consists of DIV2K [35], Flickr2K [36], and some of our produced data. DIV2K and Flickr2K are high-contrast images containing people, animals, landscapes, and our own data as line or shape images. To simulate real imaging, we acquire the PSF at different out-of-focus distances within ±200 $\mu m$, slightly panned or rotated from -2 mm to 2 mm and rotated from -5 degrees to 5 degrees, to generate the dataset [26]. After randomly adding Gaussian white noise from 0 to 20 dB, the dataset was divided into the training and test sets with 57600 and 2400 images, respectively.

To illustrate the effectiveness of the image reconstruction model in this paper, we compare our method with several others methods applied to the WFC decoding: Wiener filtering, Lucy Richardson algorithm (LR), and DeblurGAN. The loss function of DeblurGAN includes Adversarial loss and L1 loss, and the image size of the experiment is 256 × 256 pixels. To provide an equal term, we add the style loss and perceptual loss to DeblurGAN's loss function. The training and test images in this paper are resized to 256 × 256 pixels. For the experiments, the Adam optimizer is used for optimization with a learning rate of 0.0001 [21]. All reconstruction experiments are performed on a PC with a single NVIDIA 3090 GPU.

3.2 Quantitative and qualitative evaluations

The mean values of quantitative results of different out-of-focus distances (mainly the large out-of-focus) are summarized in Table 1. We adopt four metrics, peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), Mean Absolute Error (MAE), and correlation of coefficient (CORR), to evaluate the performance of four different methods.

Table 1. Quantitative comparison results

View Table | View all tables in this article

Among the four metrics, higher PSNR, SSIM, and CORR indicate that the method generates lower noise and blur, while lower MAE demonstrates that the method generates fewer artifacts. Compared with other methods, our method achieves the best in all four metrics, showing that the model reconstructs better results.

As shown in Fig. 4, we offer qualitative comparisons of the different methods (These qualitative comparisons shown in Fig. 4, Fig. 5, Fig. 6, and Fig. 7 are experienced at the max out-of-focus distance ±200 $\mu m$).

Fig. 4. Qualitative comparisons of decoding results.

Download Full Size | PDF

Fig. 5. The denoising results of several decoding methods.

Download Full Size | PDF

Fig. 6. Qualitative results of the ablation study.

Download Full Size | PDF

Fig. 7. Experiment performance of several decoding methods.

Download Full Size | PDF

Although the Wiener filtering reconstructs the general outline, the results have visible artifacts and noise due to the limited reconstruction effect. It can generate a certain quality of results. However, when the out-of-focus distances increase, it is difficult for this method to recover structural information and deal with noise.

The least-squares method reduces the generation of noise and artifacts. But this method is a linear estimation. When dealing with the complex problem of WFC reconstruction, the noise in the image limits the processing performance, and it is also easy to generate visible artifacts.

The result of DeblurGAN further reduces noise and enhances details, but there are still some artifacts and unreasonable textures. This method does not utilize the feature information of PSF well and results in poor image consistency. It also affects the generation of high-frequency details, thus causing some blurring.

In contrast, the reconstruction results of our method are better, which also illustrates the importance of PSF feature information. PSF feature information affects the generation of images’ high-frequency detail information and low-frequency semantic information. We construct these models to complete the extraction of PSF feature information, fuse it into the image features, subsequent separation and re-normalization. It helps distinguish the foreground and background of the image, generating more meaningful textures and sharper line details, thus reducing image blur.

3.3 Noise sensitivity

In the real WFC imaging process, the noise level is easily affected by the environment. To further test the denoising results of several methods, we add 40 dB white noise to the data, and the reconstructed results are shown in Fig. 5.

Under high noise-level, the traditional method restores reasonable structure. However, these methods further intensify artifact generation, and the noise suppression effect is limited. Deep learning-based methods are more effective in noise removal. Without training with the high noise-level, DeblurGAN and our model have reduced noise and blur, generating more realistic results.

3.4 Ablation study

We conduct ablation studies for our method and analyze the effectiveness of the proposed three modules contributing to the final performance of WFC image construction. We gradually add modules to the Baseline (BL) until form the whole model. The three modules contain the PSF attention layer (PAL), Multi-feature fusion block (MFF), and Frequency domain self-attention block (FSA). The quantitative and qualitative results of the ablation study are shown in Table 2 and Fig. 6.

Table 2. Quantitative results of the ablation study

View Table | View all tables in this article

As seen from Table 2, the metrics of the constructed model are gradually increasing with the addition of each module. Among them, the MFF module has the effect of re-normalization, and the reconstruction results improved via adding this module to BL. However, after adding the PAL module, the model does not improve much, because the PSF information is not the information of the image itself, which needs the subsequent FSA to complete the information reorganization. The changes in the indicators also further illustrate the effectiveness of the FSA module.

As shown in Fig. 6, BL generates consistent results and effectively suppresses the noise when enlarging the image. However, the reconstruction results are fuzzy for the stamens in the upper part of Fig. 6 and the cable in the below. This is because the information overlaps in the encoded image so that details and semantic information cannot be well distinguished, making the high-frequency information difficult to generate. With the gradual introduction of the proposed modules, the foreground and background could be further distinguished, generating more meaningful texture and structure. It also helps the reconstruction of more realistic details, alleviating blur generation.

3.5 Performance on test target image

As shown in Fig. 7, We perform tests on usaf1951 and focused star maps to visualize the reconstruction performance of several methods for high-frequency details. The results show that the Wiener and LR hardly suppress the noise. They recover high-frequency details but also introduce certain artifacts and blurring. The generated lines of DeblurGAN have little curves and artifacts. In contrast, our method generates more consistent results.

4. Conclusion

In this paper, we proposed an effective reconstruction method using the physical prior information and frequency domain model in the wavefront decoding. We rebuild the baseline and propose a PSF attention layer, which explicitly learns PSF feature information, and uses a fusion module to integrate it into image details and semantic information. In addition, we introduce the frequency domain self-attention block that separates the PSF feature information, high-frequency and low-frequency of image, for further reconstruction. These models contribute to precisely separating image information, reducing blur, and increasing image sharpness. In the wavefront encoding, we designed a phase mask in a large field-of-view fluorescence microscope system, with F-number 3.066, to extend the DOF by 20 times to ±200 $\mu m$. Compared with the state-of-the-art WFC decoding methods, our approach improves PSNR by 9.38%, SSIM by 2.07%, CORR by 0.35%, and reduces MAE by 26.1%. A deep learning-based wavefront decoding model is proposed to reconstruct images for a large DOF of the large field-of-view system, with good potential in studies that could use PSF for computational imaging.

Future work could be focused on reducing the size of the large field-of-view fluorescence microscopy system to increase portability and, if able, scrutinize the phase mask itself to extend depth-of-field. For the particular kind of chips, we will research deep learning-based algorithms for super-resolution imaging by using the feature information of PSF and adaptive detection methods for complex environments.

Funding

National Key Research and Development Program of China (2018YFA0701802); Program of Shanghai Academic Research Leader (22XD1401000); National Natural Science Foundation of China (61975122).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

The publicly available part of the code and data underlying the results presented in this paper are available at [37].

Supplemental document

See Supplement 1 for supporting content.

References

1. J Shen, J Zheng, and Z Li, “A rapid nucleic acid concentration measurement system with large field of view for a droplet digital PCR microfluidic chip,” Lab Chip 21(19), 3742–3747 (2021). [CrossRef]

2. M Nie, M Zheng, and C Li, “Assembled step emulsification device for multiplex droplet digital polymerase chain reaction,” Anal. Chem. 91(3), 1779–1784 (2019). [CrossRef]

3. E Ortyn W, J Perry D, and V Venkatachalam, “Extended depth of field imaging for high speed cell analysis,” Cytometry 71A(4), 215–231 (2007). [CrossRef]

4. Z Li, X Xu, and D Wang, “Recent advancements in nucleic acid detection with microfluidic chip for molecular diagnostics,” TrAC, Trends Anal. Chem. 158, 116871 (2023). [CrossRef]

5. E. R. Dowski and W. T. Cathey, “Extended depth of field through wave-front coding,” Appl. Opt. 34(11), 1859–1866 (1995). [CrossRef]

6. J, Pieper R and A Korpel, “Image processing for extended depth of field,” Appl. Opt. 22(10), 1449–1453 (1983). [CrossRef]

7. J Ojeda-Castañeda, R Ramos, and A Noyola-Isgleas, “High focal depth by apodization and digital restoration,” Appl. Opt. 27(12), 2583–2586 (1988). [CrossRef]

8. R. N. Zahreddine, R. H. Cormack, and C. J. Cogswell, “Noise removal in extended depth of field microscope images through nonlinear signal processing,” Appl. Opt. 52(10), D1–D11 (2013). [CrossRef]

9. H Du, L Dong, and M Liu, “Image restoration based on deep convolutional network in wavefront coding imaging system,” Digital Image Computing: Techniques and Applications1–8 (IEEE, 2018).

10. S Bradburn, T Cathey W, and R Dowski E, “Realizations of focus invariance in optical–digital systems with wave-front coding,” Appl. Opt. 36(35), 9157–9166 (1997). [CrossRef]

11. S Banerji, M Meem, A Majumder, B Sensale-Rodriguez, and R Menon, “Extreme-depth-of-focus imaging with a flat lens,” Optica 7(3), 214–217 (2020). [CrossRef]

12. R M Rostami S, S Pinilla, I Shevkunov, V Katkovnik, and K Egiazarian, “Power-balanced hybrid optics boosted design for achromatic extended depth-of-field imaging via optimized mixed OTF,” Appl. Opt. 60(30), 9365–9378 (2021). [CrossRef]

13. S Elmalem, R Giryes, and E Marom, “Learned phase coded aperture for the benefit of depth of field extension,” Opt. Express 26(12), 15316–15331 (2018). [CrossRef]

14. F Yan, L Zheng, and X Zhang, “Image restoration of an off-axis three-mirror anastigmatic optical system with wavefront coding technology,” Opt. Eng. 47(1), 017006 (2008). [CrossRef]

15. N Boddeti V and V K V Kumar B, “Extended-depth-of-field iris recognition using unrestored wavefront-coded imagery,” IEEE Trans. Syst., Man, Cybern. A 40(3), 495–508 (2010). [CrossRef]

16. H Richardson W, “Bayesian-based iterative method of image restoration,” J. Opt. Soc. Am. 62(1), 55–59 (1972). [CrossRef]

17. B Lucy L, “An iterative technique for the rectification of observed distributions,” Astron. J. 79, 745 (1974). [CrossRef]

18. C Ledig, L Theis, and F Huszár, “Photo-realistic single image super-resolution using a generative adversarial network,” Conference on Computer Vision and Pattern Recognition, 4681–4690 (IEEE, 2017).

19. O Kupyn, V Budzan, and M Mykhailych, “Deblurgan: Blind motion deblurring using conditional adversarial networks,” Conference on Computer Vision and Pattern Recognition, 8183–8192 (IEEE, 2018).

20. T Remez, O Litany, and R Giryes, “Deep class-aware image denoising,” International Conference on Sampling Theory and Applications138–142 (IEEE, 2017).

21. L Sun, Q Zhang, W Wang, et al., “Image inpainting with learnable edge-attention maps,” IEEE Access 9, 3816–3827 (2021). [CrossRef]

22. C Godard, O Mac Aodha, and J Brostow G, “Unsupervised monocular depth estimation with left-right consistency,” Conference on Computer Vision and Pattern Recognition, 270–279 (IEEE, 2017).

23. K He, X Zhang, and S Ren, “Deep residual learning for image recognition,” Conference on Computer Vision and Pattern Recognition, 770–778 (IEEE, 2016).

24. O Ronneberger, P Fischer, and T Brox, “U-net: Convolutional networks for biomedical image segmentation,” Medical Image Computing and Computer-Assisted Intervention, 18th International Conference, 234–241 (2015).

25. L Jin, Y Tang, and Y Wu, “Deep learning extended depth-of-field microscope for fast and slide-free histology,” Proc. Natl. Acad. Sci. 117(52), 33051–33060 (2020). [CrossRef]

26. Y Li, J Wang, and X Zhang, “Extended depth-of-field infrared imaging with deeply learned wavefront coding,” Opt. Express 30(22), 40018–40031 (2022). [CrossRef]

27. D Rego J, K Kulkarni, and S Jayasuriya, “Robust lensless image reconstruction via PSF estimation,” IEEE/CVF Winter Conference on Applications of Computer Vision403–412 (2021).

28. T, Zeng and Y Lam E, “Robust reconstruction with deep learning to handle model mismatch in lensless imaging,” IEEE Trans. Comput. Imaging 7, 1080–1092 (2021). [CrossRef]

29. S Roy, G Koehler, and C Ulrich, “MedNeXt: Transformer-driven Scaling of ConvNets for Medical Image Segmentation,” arXiv, arXiv:2303.09975 (2023). [CrossRef]

30. U, Demir and G Unal, “Patch-based image inpainting with generative adversarial networks,” arXiv, arXiv:1803.07422 (2018). [CrossRef]

31. I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, “Improved training of wasserstein gans,” arXiv, arXiv:1704.00028 (2017). [CrossRef]

32. J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for realtime style transfer and super-resolution,” in European Conference on Computer Vision pp.694–711 (Springer, 2016).

33. L. A. Gatys, A. S. Ecker, and M. Bethge, “Image style transfer using convolutional neural networks,” in Conference on Computer Vision and Pattern Recognition, pp.2414–2423 (IEEE, 2016).

34. X. Wang, K. Y. u, C. Dong, and C. C. Loy, “Recovering realistic texture in image super-resolution by deep spatial feature transform,” in Conference on Computer Vision and Pattern Recognition, pp.606–615 (IEEE, 2018).

35. E. Agustsson and R. Timofte, “NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study,” in Conference on Computer Vision and Pattern Recognition Workshops1122–1131 (IEEE, 2017).

36. E. Agustsson and R. Timofte, “NTIRE 2017 Challenge on Single Image Super-Resolution: Methods and Results,” in Conference on Computer Vision and Pattern Recognition Workshops, 1110–1121 (IEEE, 2017).

37. Q. Zhang, M. Bao, L. Sun, Y. Liu, and J. Zheng, “LPPFDAB,” GitHub (2023), https://github.com/manlupanshan/LPPFDAB.

	PSNR	SSIM	MAE	CORR
Wiener	22.5877	0.8884	0.0737	0.9749
L-R	23.1321	0.8934	0.0677	0.9742
DeblurGAN*	27.7900	0.9533	0.0391	0.9895
Ours	30.3985	0.9730	0.0289	0.9930

	PSNR	SSIM	MAE	CORR
BL	28.0172	0.9561	0.0397	0.9909
BL + MFF	28.3368	0.9581	0.0382	0.9917
BL + MFF + PAL	28.4648	0.9597	0.0374	0.9920
BL + MFF + PAL + FSA(ALL)	30.4002	0.9735	0.0303	0.9936

	PSNR	SSIM	MAE	CORR
Wiener	22.5877	0.8884	0.0737	0.9749
L-R	23.1321	0.8934	0.0677	0.9742
DeblurGAN*	27.7900	0.9533	0.0391	0.9895
Ours	30.3985	0.9730	0.0289	0.9930

	PSNR	SSIM	MAE	CORR
BL	28.0172	0.9561	0.0397	0.9909
BL + MFF	28.3368	0.9581	0.0382	0.9917
BL + MFF + PAL	28.4648	0.9597	0.0374	0.9920
BL + MFF + PAL + FSA(ALL)	30.4002	0.9735	0.0303	0.9936

Wavefront coding image reconstruction via physical prior and frequency attention

Abstract

1. Introduction

2. Method

2.1 Phase mask design and analysis

2.2 Analysis of reconstruction method

2.3 PSF attention layer

2.4 Multi-feature fusion block

2.5 Frequency domain self-attention block

2.6 Discriminator and loss function of decoding model

3. Experiments and analysis

3.1 Decoding experiment setup

3.2 Quantitative and qualitative evaluations

3.3 Noise sensitivity

3.4 Ablation study

3.5 Performance on test target image

4. Conclusion

Funding

Disclosures

Data availability

Supplemental document

References

Supplementary Material (1)

Data availability

Cited By

Figures (7)

Tables (2)

Equations (17)

Optics Express