
Alternative deep learning method for fast spatial-frequency shift imaging microscopy

Open Access

Abstract

Spatial-frequency shift (SFS) imaging microscopy can break the diffraction limit for fluorescently labeled and label-free samples by transferring high spatial-frequency information into the passband of the microscope. However, the resolution improvement comes at the cost of reduced temporal resolution, since dozens of raw SFS images are needed to expand the frequency spectrum. Although some deep learning methods have been proposed to solve this problem, none of them is compatible with both labeled and label-free SFS imaging. Here, we propose the joint spatial-Fourier channel attention network (JSFCAN), which learns the general connection between the spatial domain and the Fourier frequency domain from complex samples. We demonstrate that JSFCAN can achieve a resolution similar to that of the traditional algorithm using nearly 1/4 of the raw images and increase the reconstruction speed by two orders of magnitude. Subsequently, we prove that JSFCAN can be applied to both fluorescently labeled and label-free samples without architecture changes. We also demonstrate that, compared with the typical spatial domain optimization network U-net, JSFCAN is more robust when dealing with deep-SFS images and noisy images. The proposed JSFCAN provides an alternative route for fast SFS imaging reconstruction, enabling future applications for real-time living cell research.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Spatial frequency shift (SFS) imaging microscopy, such as structured illumination microscopy (SIM) [1,2], Fourier ptychographic microscopy (FPM) [3,4], and tunable virtual-wave-vector spatial frequency shift (TVSFS) microscopy [5], can break the diffraction limit for the imaging of both fluorescently labeled and label-free samples. The high spatial-frequency information of the sample is shifted into the passband of a conventional microscope by illumination with a large lateral wave-vector. The super-resolution (SR) image is then obtained by shifting the high spatial-frequency spectrum back to its correct position in the Fourier frequency domain and performing an inverse Fourier transform on the extended spectrum. The drawback is that the reconstruction of SR images requires dozens of raw SFS images with an overlapping rate of over 25% [6] between neighboring spatial-frequency spectra to avoid artifacts or failed restoration [7]. Thus, the temporal resolution of SFS imaging decreases dramatically as the spatial resolution increases, which limits the further application of SFS imaging to dynamic processes such as tissue activity and drug action in living cells.

Thankfully, the development of deep learning has enabled fast and robust reconstruction in SFS imaging, such as denoising [8,9] and accelerated super-resolution [10–12] in SIM, improved imaging speed [13–15] in FPM, and others [16–20]. However, on the one hand, most of these networks are trained for specific kinds of samples and lack generalization to both fluorescently labeled and label-free samples. On the other hand, most current deep learning super-resolution architectures rely mainly on leveraging the structural differences between the ground truth (GT) image and the raw low-resolution (LR) images in the spatial domain, and such differences have been shown to be hardly prominent. When resolving highly complex samples, more network parameters may be required for models based on spatial structure differences. For labeled and label-free datasets, differences in physical mechanisms may also lead to inconsistent model performance [21,22].

In this paper, we propose an alternative deep-learning method for fast SFS imaging named the joint spatial-Fourier channel attention network (JSFCAN), which is designed to learn the general connection between the spatial domain and the Fourier frequency domain from complex samples. As a comparison, a typical spatial domain optimization network, U-net with densely connected convolutional blocks [14], is trained to focus on the structural differences between LR images and GT images in the spatial domain. We demonstrate that JSFCAN can reduce the number of raw SFS images by removing redundant high-frequency information in both fluorescently labeled and label-free imaging models. Compared with traditional iterative algorithms, JSFCAN achieves similar resolution while the number of input raw SFS images is reduced to nearly 1/4, improving the imaging speed by dozens of times or more, depending on the total number of input images. We also demonstrate that, compared with U-net, JSFCAN is superior in reconstructing fine structures corresponding to high spatial frequencies and in handling raw SFS images with severe noise. Therefore, we believe JSFCAN can improve the temporal resolution of SFS imaging, enabling future applications in sub-cellular real-time imaging.

2. Methods

2.1 Data generation and processing

In this work, we generate raw input data by mimicking the label-free and fluorescently labeled imaging process of SFS microscopy using common image datasets [5,17]. A chip-based SFS imaging system compatible with both fluorescently labeled and label-free samples is displayed in Fig. 1(a); it is composed of a high-index SFS chip with a series of diffractive gratings, an objective lens, a tube lens, and a camera. Compared with conventional algorithms that require sufficient spectral overlap and omnidirectional SFS illumination, the number of raw SFS images needed for SR image reconstruction is reduced by adopting the deep-learning method and the prior knowledge of the SFS mechanism, as shown in the right column of Fig. 1(b). The yellow circle at the center represents the spectrum of a wide-field (WF) image taken by conventional microscopy, while the other colored circles represent the spectra of high-frequency information obtained using illumination with different SFS magnitudes. kc is the cut-off frequency of the given objective's passband, which is simulated with the ideal filtering function (CTF = NA/λ for label-free imaging or OTF = 2NA/λem for labeled imaging, where λ is the illumination wavelength, NA is the numerical aperture, and λem is the fluorescence emission wavelength). ksn refers to the different SFS magnitudes.
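As a quick numerical illustration of these cut-off expressions (a back-of-the-envelope sketch using the parameter values quoted later in Section 3, not additional results from the paper):

```python
# Illustrative check of the passband cut-off kc; parameters are those quoted in Section 3.
NA_labelfree, wavelength_um = 0.85, 0.660      # label-free simulation: NA = 0.85, lambda = 660 nm
NA_labeled, em_wavelength_um = 1.49, 0.421     # labeled simulation: NA = 1.49, lambda_em = 421 nm

kc_coherent = NA_labelfree / wavelength_um         # CTF cut-off for label-free (coherent) imaging
kc_incoherent = 2 * NA_labeled / em_wavelength_um  # OTF cut-off for labeled (incoherent) imaging

print(f"label-free cut-off kc ~ {kc_coherent:.2f} cycles/um")   # ~1.29 cycles/um
print(f"labeled cut-off    kc ~ {kc_incoherent:.2f} cycles/um")  # ~7.08 cycles/um
```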


Fig. 1. (a) Schematic of the SFS super-resolution imaging setup. (b) The Fourier spectra of the conventional algorithm and JSFCAN for labeled and label-free imaging, respectively, according to the SFS mechanism. kc and ksn denote the cutoff frequency and the different SFS magnitudes. (c) The architecture of JSFCAN used in this work. The raw image stack containing multiple SFS images and a WF image is used as the input of JSFCAN; the output of the network is an SR image with 2× upscaling. The loss function of the network is defined as a combination of (i) the spatial domain loss and (ii) the frequency domain loss.


For label-free imaging, the GT images were generated in MATLAB and contain 1600 images (512 × 512 pixels) of different patterns with extensive structural complexity. For labeled imaging, we obtained the GT images from the public biological image dataset BioSR [22], which consists of experimental images of four biological objects with high-quality benchmarks and diverse structures. We acquired ∼50 GT images (1024 × 1024 pixels) from each type of specimen. All data were divided into a training set (∼80%), a validation set (∼10%), and a test set (∼10%). The raw SFS images are simulated using a coherent model for label-free samples and an incoherent model for labeled samples [5] according to the set basic parameters. By random cropping, the LR images for training were preprocessed to 128 × 128 pixels and the corresponding GT images to 256 × 256 pixels. Horizontal/vertical flipping and rotation transformations were applied to further enrich the dataset, eventually generating ∼20000 pairs of LR (128 × 128 pixels) and GT (256 × 256 pixels) images for the training set and ∼2500 LR-GT pairs for the validation set. We also generated noisy datasets by adding Gaussian noise to the raw LR images.
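A minimal sketch of the preprocessing described above (random cropping of LR/GT pairs, flip/rotation augmentation, and Gaussian noise injection), assuming NumPy arrays and an illustrative noise level sigma; this is not the authors' exact pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop_pair(lr_stack, gt, lr_size=128, scale=2):
    """Randomly crop a 128x128 LR patch (all SFS channels) and the matching 256x256 GT patch."""
    h, w = lr_stack.shape[:2]
    y = rng.integers(0, h - lr_size + 1)
    x = rng.integers(0, w - lr_size + 1)
    lr_patch = lr_stack[y:y + lr_size, x:x + lr_size]
    gt_patch = gt[y * scale:(y + lr_size) * scale, x * scale:(x + lr_size) * scale]
    return lr_patch, gt_patch

def augment_pair(lr_patch, gt_patch):
    """Random horizontal/vertical flips and 90-degree rotations applied identically to LR and GT."""
    if rng.random() < 0.5:
        lr_patch, gt_patch = np.flip(lr_patch, axis=1), np.flip(gt_patch, axis=1)
    if rng.random() < 0.5:
        lr_patch, gt_patch = np.flip(lr_patch, axis=0), np.flip(gt_patch, axis=0)
    k = rng.integers(0, 4)
    return np.rot90(lr_patch, k), np.rot90(gt_patch, k)

def add_gaussian_noise(lr_patch, sigma=0.05):
    """Add zero-mean Gaussian noise (sigma is an assumed level) and clip to [0, 1]."""
    noisy = lr_patch + rng.normal(0.0, sigma, lr_patch.shape)
    return np.clip(noisy, 0.0, 1.0)
```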

2.2 Deep learning architecture

Motivated by the excellent performance of the deep Fourier channel attention network (DFCAN) [22] in reconstructing high-frequency information of objects, we build JSFCAN on the SFS mechanism to recover SR images from a raw image stack containing LR images at different SFS magnitudes with different illumination orientations and a WF image. SFS imaging microscopy can capture precise representations of all frequency components of diverse structures in the Fourier frequency domain. Therefore, JSFCAN is designed to learn the Fourier spectrum coverage difference between the GT image and the raw SFS images by introducing the Fourier channel attention block (FCAB) architecture, thereby extracting the general connection between the spatial domain and the Fourier frequency domain from complex samples and then effectively reconstructing the predicted image. The JSFCAN network consists of three parts: a shallow feature representation, a high-frequency feature extraction module, and an upscaling module, as shown in Fig. 1(c). A convolutional layer (64 × 3 × 3) with a Gaussian Error Linear Unit (GELU) activation function is adopted to extract the shallow feature representation of the SFS images. Next, these feature maps flow into the high-frequency feature extraction module, which consists of 4 residual groups, each containing 4 Fourier channel attention (FCA) blocks. The network depth is chosen as a trade-off between reconstruction performance and training efficiency. In each FCA block, feature maps first flow through two convolution (Conv) layers (64 × 3 × 3) activated by the GELU function for deep feature extraction. Then, a fast Fourier transform (FFT) layer is introduced to calculate the Fourier spectrum of each feature map, and a convolution layer (64 × 3 × 3) with a Rectified Linear Unit (ReLU) activation function is added to obtain the frequency feature maps. Subsequently, these frequency feature maps are sent into a global average pooling layer and a gating mechanism consisting of Conv (4 × 1 × 1)-ReLU-Conv (64 × 1 × 1)-sigmoid to fully capture interdependencies among the feature channels from the aggregated frequency information. To keep the FCA blocks focused on learning high-frequency features, skip connections are applied in each residual group. Finally, the upscaling module starts with a convolutional layer (256 × 3 × 3) with a GELU activation function, and a pixel shuffle layer is added to upsample the feature maps according to the desired upscaling factor, which is set to 2 in this work. The SR image is then generated by a convolution layer (1 × 3 × 3) with a sigmoid function.
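The following is a minimal TensorFlow/Keras sketch of a single FCA block following the description above; the layer sizes come from the text, but the exact ordering, normalization, and skip-connection placement in the authors' implementation may differ:

```python
import tensorflow as tf
from tensorflow.keras import layers

def fca_block(x, channels=64):
    """One Fourier channel attention (FCA) block: two Conv-GELU layers for deep
    features, an FFT layer plus Conv-ReLU for frequency feature maps, and a gating
    mechanism that rescales the feature channels. Assumes x already has `channels`
    feature maps in channels-last layout."""
    feat = layers.Conv2D(channels, 3, padding="same", activation=tf.nn.gelu)(x)
    feat = layers.Conv2D(channels, 3, padding="same", activation=tf.nn.gelu)(feat)

    # Magnitude of the 2D FFT of each feature map (fft2d acts on the last two axes,
    # so transpose to channels-first and back).
    spec = tf.transpose(tf.cast(feat, tf.complex64), [0, 3, 1, 2])
    spec = tf.abs(tf.signal.fft2d(spec))
    spec = tf.transpose(spec, [0, 2, 3, 1])
    spec = layers.Conv2D(channels, 3, padding="same", activation="relu")(spec)

    # Gating: global average pooling -> Conv(4x1x1)-ReLU -> Conv(64x1x1)-sigmoid.
    w = tf.reduce_mean(spec, axis=[1, 2], keepdims=True)
    w = layers.Conv2D(4, 1, activation="relu")(w)
    w = layers.Conv2D(channels, 1, activation="sigmoid")(w)

    # Channel-wise rescaling of the deep features plus a residual connection.
    return x + feat * w
```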

2.3 Loss function

In particular, the loss function of our deep learning model is composed of a spatial domain loss and a Fourier frequency domain loss:

$$L = L_{spatial} + \alpha \times L_{frequency}$$
where α is a hyper-parameter used to balance the contributions of the spatial domain loss and the frequency domain loss. The spatial domain loss Lspatial measures the difference between the predicted image and the GT image, while the difference in spatial-frequency content is represented by the Fourier frequency domain loss Lfrequency. This combined loss function helps the network learn accurate features from the connection between the spatial domain and the Fourier frequency domain. Moreover, some studies also indicate that this combination can refine the reconstructed images by avoiding significant deviations of high-frequency information and suppressing artifacts in both the Fourier frequency domain and the spatial domain [23,24]. Both the spatial domain loss and the frequency domain loss are defined as a combination of the mean squared error loss (also called L2) and the structural similarity index (SSIM) loss, which can be expressed as:
$$L_{spatial} = L_2 + \beta_1 \times (1 - SSIM)$$
$$L_{frequency} = L_2 + \beta_2 \times (1 - SSIM)$$
$$L_2(Y, Y') = \frac{1}{w \times h}\sum_{i = 1}^{w \times h} (Y_i' - Y_i)^2$$
$$SSIM(Y, Y') = \frac{(2\mu_Y \mu_{Y'} + c_1)(2\sigma_{YY'} + c_2)}{(\mu_Y^2 + \mu_{Y'}^2 + c_1)(\sigma_Y^2 + \sigma_{Y'}^2 + c_2)}$$
where the hyper-parameters β1 and β2 determine the relative weights of L2 and SSIM in the spatial domain loss and the frequency domain loss, respectively. A common grid search strategy was used for hyperparameter optimization, and we chose α = 10 for the label-free dataset, α = 5 for the labeled dataset, and β1 = β2 = 0.1. Y′ is defined as the output of the network and Y as the corresponding GT, where i is the pixel index. w and h are the width and height of the image. µY and µY′ are the average grayscales of Y and Y′, σY and σY′ are their standard deviations, σYY′ is the covariance of Y and Y′, and c1 and c2 are constants that prevent the denominators in the above equation from being zero.
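As a minimal TensorFlow sketch, Eqs. (1)-(5) can be assembled as follows; the frequency-domain terms are computed here on per-image normalized Fourier magnitudes, which is an assumption of this illustration rather than a detail given in the text:

```python
import tensorflow as tf

ALPHA, BETA1, BETA2 = 10.0, 0.1, 0.1  # label-free setting from the text; ALPHA = 5 for labeled data

def _l2_ssim(y_true, y_pred, beta, max_val=1.0):
    # L2 + beta * (1 - SSIM), the common form of both loss terms.
    l2 = tf.reduce_mean(tf.square(y_pred - y_true))
    ssim = tf.reduce_mean(tf.image.ssim(y_true, y_pred, max_val=max_val))
    return l2 + beta * (1.0 - ssim)

def fourier_magnitude(img):
    """Per-image normalized magnitude spectrum of a (batch, H, W, 1) image;
    the normalization is an assumption of this sketch."""
    spec = tf.signal.fft2d(tf.cast(img[..., 0], tf.complex64))
    mag = tf.abs(spec)[..., tf.newaxis]
    return mag / (tf.reduce_max(mag, axis=[1, 2, 3], keepdims=True) + 1e-8)

def joint_loss(y_true, y_pred):
    l_spatial = _l2_ssim(y_true, y_pred, BETA1)
    l_freq = _l2_ssim(fourier_magnitude(y_true), fourier_magnitude(y_pred), BETA2)
    return l_spatial + ALPHA * l_freq
```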

2.4 Training

During the training stage, the networks were initialized randomly and trained for about 350,000 mini-batch iterations, using the Adam optimizer with a typical initial learning rate of 1 × 10−4 and a batch size of four. U-net is a popular deep learning architecture based on feature structure differences in the spatial domain and has been extensively used in SIM and FPM reconstruction. Therefore, we compared the JSFCAN model with a U-net model based on densely connected convolutional blocks, which has shown prominent performance in learning the desired features for FPM imaging. The U-net model was trained for 100 epochs with the Adam optimizer and an initial learning rate of 1 × 10−4. The batch size was 4 due to the limitation of GPU memory. Once training was finished, the best model was selected based on the multi-scale structural similarity (MS-SSIM) metric calculated on the validation set, enabling fast reconstruction of a high-quality SR image from a few raw LR images. The programs were run on a computer workstation equipped with 64 GB of memory, an AMD Ryzen Threadripper 3960X 24-core processor @ 3.79 GHz, and an NVIDIA GeForce GTX 1080Ti GPU (11 GB). The training and testing of the networks were implemented in Python based on the TensorFlow framework, while the image processing and assessment-metric calculations were implemented in MATLAB 2020b.
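A minimal sketch of this training setup in TensorFlow/Keras is given below; `model`, `train_ds`, and `val_ds` are placeholders for the network and unbatched tf.data pipelines of LR-GT pairs, and the epoch count is illustrative rather than the exact iteration budget reported above:

```python
import tensorflow as tf

def ms_ssim_metric(y_true, y_pred):
    # Multi-scale SSIM, used here for model selection on the validation set.
    return tf.reduce_mean(tf.image.ssim_multiscale(y_true, y_pred, max_val=1.0))

def compile_and_train(model, train_ds, val_ds, loss_fn, epochs=100):
    """Training configuration from the text: Adam optimizer, initial lr 1e-4,
    batch size 4, best model kept according to validation MS-SSIM. `loss_fn`
    would be the combined loss of Section 2.3; `epochs` is illustrative."""
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss=loss_fn, metrics=[ms_ssim_metric])
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        "best_model.h5", monitor="val_ms_ssim_metric", mode="max", save_best_only=True)
    return model.fit(train_ds.batch(4), validation_data=val_ds.batch(4),
                     epochs=epochs, callbacks=[checkpoint])
```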

3. Results

3.1 JSFCAN for label-free SFS imaging

We first verify the feasibility of our JSFCAN network for label-free SFS imaging. The network is trained and tested with simulated simple samples, such as dots, lines, and curves of random sizes and distributions, and complex samples, such as gray animation patterns. As a comparison, we generate the WF images by multiplying the ideal filtering function with the Fourier spectrum of the GT and then performing an inverse Fourier transform. The illumination wavelength λ is set to 660 nm and the objective's NA is 0.85. The GS-full and GS-missing images are reconstructed using the conventional Gerchberg-Saxton (GS) algorithm [25] with the full and incomplete spectrum, respectively, as displayed in Fig. 1(b). The SFS magnitudes are set as ks1 = 1.3k0, ks2 = 2.3k0, and ks3 = 3.0k0, where k0 = NA/λ. Both JSFCAN and U-net are trained with 13 raw SFS images as input, corresponding to the spectrum in Fig. 1(b). This selected spectrum removes as much of the redundant high-frequency information as possible while still filling in the spectrum missing after the centrosymmetric operation, since, according to prior knowledge, centrosymmetric positions in the Fourier spectrum have the same effect on image reconstruction.
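A minimal NumPy sketch of this wide-field simulation (an ideal circular low-pass filter of radius kc = NA/λ applied to the GT spectrum, then inverse transformed) is shown below; the pixel size is an assumed value used only for illustration:

```python
import numpy as np

def ideal_lowpass_wf(gt, pixel_size_um=0.05, NA=0.85, wavelength_um=0.660):
    """Simulate a wide-field image by keeping only spatial frequencies below
    kc = NA / lambda in the GT spectrum (pixel size is an assumed value)."""
    h, w = gt.shape
    fy = np.fft.fftfreq(h, d=pixel_size_um)   # spatial frequencies in cycles/um
    fx = np.fft.fftfreq(w, d=pixel_size_um)
    kr = np.sqrt(fx[np.newaxis, :] ** 2 + fy[:, np.newaxis] ** 2)
    ctf = (kr <= NA / wavelength_um).astype(float)   # ideal circular passband

    spectrum = np.fft.fft2(gt)
    wf = np.fft.ifft2(spectrum * ctf)
    return np.abs(wf)
```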

The output images are displayed in Fig. 2. As can be seen, fine structure is unobservable in the WF images because only low spatial-frequency information can be collected by the objective lens. For the full spectrum, the GS algorithm can reconstruct structures with low spatial frequency correctly, but ringing artifacts are introduced in structures with high spatial frequency because of the abrupt cut-off at high spatial frequency, as shown in the region of the green rectangle in the third row of Fig. 2(b). For the incomplete spectrum, the reconstruction results of the GS algorithm exhibit inevitable artifacts and reconstruction errors, as shown in the GS-missing spectrum column. In contrast, the deep learning methods can restore images with better quality using the same incomplete spectrum. A statistical comparison of wide-field, GS, U-net, and JSFCAN in terms of PSNR and MS-SSIM over various samples is shown in Table 1. Compared to the WF and GS-missing images, the reconstruction results from U-net and JSFCAN based on the incomplete spectrum input have significantly higher values in both peak signal-to-noise ratio (PSNR) and MS-SSIM. Additionally, compared with U-net, JSFCAN has better fidelity, as indicated by the reconstruction results in Fig. 2. This demonstrates that JSFCAN can learn the connection between small structures and high spatial frequencies from the SFS images more accurately and thus performs better in the reconstruction of small structures.


Fig. 2. Comparison of label-free SR image reconstruction of simulated (a) dots, (b) lines, (c) curves, and (d) eagle structures inferred by the GS-full spectrum approach (third column), GS-missing spectrum approach (fourth column), U-net (fifth column), and JSFCAN (sixth column). The WF images (first column) and GT images (second column) are shown for reference. Lower rows show magnified images of the rectangle regions in the upper images. Scale bars: 2 µm, and 1 µm for all magnified regions.



Table 1. Mean PSNR and MS-SSIM values for the methods calculated on label-free test images

Although we have demonstrated that JSFCAN can reconstruct SR images of label-free samples, the influence of the number of input SFS images on the reconstruction quality still needs to be investigated. Thus, the eagle pattern and the Chinese oracle bone script character “light” are constructed to test the performance of JSFCAN with different numbers of input SFS images. JSFCAN is trained using only gray animation patterns, which have complex structures, since we find that simple structures contribute little to learning the connection between the spatial domain and the Fourier domain. 25, 17, 13, and 10 raw SFS images with designed spectra are used as the input of JSFCAN, respectively. The principle of the spectrum design is to fill the extended spectrum with as few raw SFS images as possible, using the knowledge that centrosymmetric positions in the spectrum have the same effect on image reconstruction. The designed frequency spectra of the different inputs are shown at the bottom left of Fig. 3(b-e, g-j), respectively, where the SFS magnitudes are set to ks1 = 1.3k0, ks2 = 2.3k0, and ks3 = 3.0k0. It can be observed that the reconstruction quality degrades only slightly as the coverage of the Fourier spectrum decreases. That is because our JSFCAN focuses on learning the connection between the spatial domain and the Fourier frequency domain rather than guessing unknown information. Therefore, taking the quality of SR reconstruction into account, 13 raw SFS images are a good choice for cases that need three SFS magnitudes.
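The prior knowledge invoked here is the conjugate symmetry of the spectrum of a real-valued image, F(−k) = F*(k), which is why centrosymmetric SFS components carry no extra information. A minimal NumPy check of this property:

```python
import numpy as np

img = np.random.rand(64, 64)          # any real-valued image
F = np.fft.fft2(img)

# The spectrum value at -k is the complex conjugate of the value at +k,
# so the centrosymmetric half of the SFS coverage is redundant.
F_at_minus_k = np.conj(np.roll(np.flip(F), 1, axis=(0, 1)))
print(np.allclose(F, F_at_minus_k))   # True
```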


Fig. 3. Reconstruction of label-free SFS images of the Chinese oracle bone script character “light” and the eagle pattern based on JSFCAN (simulated). (a, f) The GT images and WF images (bottom left) are shown for reference. (b-e, g-j) show the reconstructed images of “light” and the eagle pattern with 25, 17, 13, and 10 raw input images; the corresponding Fourier spectra are shown in the bottom left corners. The image quality metrics shown in brackets are the PSNR and the MS-SSIM, respectively. Scale bars: 1 µm. (k) Computation time for reconstruction of a single raw image stack of 10, 13, 17, 25, and 49 frames for JSFCAN, U-net, and GS.


The comparison of the reconstruction times of JSFCAN, U-net, and the GS algorithm with different numbers of input SFS images is displayed in Fig. 3(k). During the testing process, the size of all LR images is set to 256 × 256 pixels and the size of the reconstructed SR images is 512 × 512 pixels. Testing was repeated independently on 50 samples. An excellent imaging quality is achieved in Fig. 4(a) by JSFCAN using 13 raw SFS images in 0.13 s, while the traditional GS algorithm needs 49 raw SFS images and 43.60 s, saving nearly 3/4 of the acquisition time and 99.7% of the reconstruction time. Compared to the GS algorithm, the reconstruction speed of JSFCAN is improved by almost two orders of magnitude. Moreover, the speed advantage of JSFCAN over the GS algorithm will be further highlighted as the number of SFS magnitudes increases. The reconstruction speed of JSFCAN is also nearly 100 times faster than U-net with dense blocks, demonstrating the effectiveness of JSFCAN with limited parameters in learning the high spatial-frequency information for SFS imaging. Although the reconstruction speed of JSFCAN decreases slightly as the number of input SFS images increases, it still meets the requirement of fast imaging.


Fig. 4. (a) Reconstructions of the eagle pattern without noise and with Gaussian noise based on different methods; each image is reconstructed using 13 raw images. The Fourier spectrum of each reconstructed image is shown directly below. To illustrate the recovery capacity of each model in the Fourier spectrum, we mark two circles in Fourier space: the yellow circle corresponds to the highest detectable frequency component with a radius of 1× NA·k0 limited by the numerical aperture of the objective, and the green circle corresponds to the highest detectable frequency component with a radius of 4.5× NA·k0 reached by SFS imaging. Scale bar: 2 µm. (b) Intensity profiles of the region indicated by the dashed line in (a).


Noise is an inevitable problem in SFS imaging, so we generated the eagle pattern with Gaussian noise to check the anti-noise ability of the GS algorithm, U-net, and JSFCAN. The two deep learning methods are both trained using the same complex gray animation patterns, with and without added Gaussian noise, and the number of input SFS images is set to 13 as shown in Fig. 1(b). As can be seen in Fig. 4(a), the MS-SSIM of the images reconstructed by the traditional GS algorithm drops by more than half after adding noise, for both the full and the incomplete spectrum. In contrast, the deep-learning methods show great noise resistance, and the imaging quality even improves slightly. As shown in Fig. 4(b), a dashed line is selected to show the intensity change before and after adding noise. It can be observed that JSFCAN reconstructs the high-frequency information more accurately than U-net, and thus better output quality is obtained by JSFCAN whether noise is added or not. JSFCAN shows great robustness in the reconstruction of SFS images with severe noise.

3.2 JSFCAN for fluorescently labeled SFS imaging

To further demonstrate the generalization of JSFCAN, the fluorescently labeled BioSR dataset is used to train the neural networks. The raw fluorescently labeled SFS images are generated according to the incoherent model in TVSFS. The full-FOV image is cropped to produce the GT image, and the WF image is obtained with excitation/emission wavelengths of 405 nm/421 nm through a 1.49-NA objective. The SIM image is reconstructed by the tunable SIM algorithm [26] used in TVSFS with 28 SFS images. The number of input SFS images for the deep learning methods is reduced from 28 to 7 by lowering the overlapping rate and using only one phase-shifted image. The spectra of the input images of tunable SIM and the deep learning methods are shown in the lower row of Fig. 1(b), where the magnitudes of the wave-vectors for SFS image acquisition are set to ks1 = 2.44k0, ks2 = 4.34k0, and ks3 = 6.50k0 for the traditional approach and ks1 = 3.48k0 and ks2 = 6.50k0 for the deep learning approaches, respectively. The reconstruction results of clathrin-coated pits (CCPs), endoplasmic reticulum (ER), microtubules (MTs), and filamentous actin (F-actin) are shown in Fig. 5.


Fig. 5. Comparison of fluorescently-labeled SFS imaging of (a) CCPs, (b) ER, (c) MTs, and (d) F-actin from the BioSR dataset reconstructed by conventional SIM imaging algorithm (fourth column), U-net (fifth column) and JSFCAN (sixth column). The GT images and WF images are shown for reference. Insets in the top right corners of these figures show the corresponding Fourier spectra. The white dashed line in each image indicates the cross-sectional profile (bottom left). Scale bar: 2 µm, and 0.5 µm for magnified images.


To better display the reconstruction quality, the top right and bottom left corners of each recovered image in Fig. 5 show the corresponding Fourier spectrum and the cross-sectional intensity profile, respectively. As expected, fine structures cannot be observed in the WF images because of the lack of high-frequency information. The tunable SIM also suffers from resolution degradation because part of the high-frequency information in the recovered spectrum is missing. In contrast, the deep learning methods obtain a higher resolution by using fewer SFS images and the designed spectrum, as shown in the last three columns of Fig. 5. However, since U-net relies only on the diversity of spatial structure differences as a constraint, artifacts readily appear in its output because of the stripe illumination, as shown in Fig. 5(b)–(d). The irregular bumps in the white intensity profiles and the unexpected frequency content in the Fourier spectrum indicate the wrong information restored by U-net. In contrast, by incorporating the frequency content difference, JSFCAN can precisely resolve smaller biological structures and has better resolution than U-net. The reconstruction speed for fluorescently labeled samples is improved by dozens to hundreds of times, as with the label-free samples. A reconstructed image using JSFCAN takes only 0.035 s, while the reconstruction times of U-net with dense blocks and the tunable SIM algorithm are 3.45 s and 7.09 s, respectively. The size of the LR images is set to 128 × 128 pixels and the size of the reconstructed SR images is 256 × 256 pixels.

To further check the anti-noise ability of JSFCAN, Gaussian noise is added to the MTs cell datasets. As before, we selected a portion of the original image as the GT image, as shown in Fig. 6(a). The low signal-to-noise ratio WF image is generated with the same parameters as above, with Gaussian noise added afterwards. The results of the conventional tunable SIM algorithm are severely affected by the noise and many artifacts are introduced. In contrast, U-net and JSFCAN are trained with 7 raw SFS images containing Gaussian noise. The intensity distribution of the region indicated by the dashed line is shown in Fig. 6(f). The reconstruction result of U-net shows great improvement, but an inevitable loss of resolution remains. Nevertheless, JSFCAN shows better resolution and robustness for fine structures than U-net, as expected. The image reconstructed by JSFCAN has fewer artifacts and no honeycomb noise caused by the stripe illumination, as shown in the region of the white rectangle in Fig. 6(d)–(e).


Fig. 6. Reconstructions of an MTs cell with Gaussian noise based on different methods. (a) A full-FOV GT image is shown for reference. (b-e) Cropped regions of the reconstruction output corresponding to the area enclosed by the blue rectangle. (f) Intensity profiles of the region indicated by the dashed line in each magnified image. Insets in the bottom left corners of these figures show the corresponding Fourier spectra. Scale bar: 2 µm, and 0.5 µm for magnified images.


4. Conclusions

In conclusion, we proposed an alternative deep learning method named JSFCAN, which learns the general connection between the spatial domain and the Fourier frequency domain from complex samples to improve the temporal resolution of SFS super-resolution imaging. Thanks to the prior knowledge that symmetric frequency spectra refer to the same structure, almost 3/4 of the raw SFS images can be omitted by decreasing the overlapping rate and removing the redundant centrosymmetric spectrum. The reconstruction speed is improved by up to two orders of magnitude, depending on the number of input raw SFS images. JSFCAN achieves better imaging quality for both label-free and fluorescently labeled models than U-net, which focuses on leveraging the structural differences between the GT image and the raw LR images in the spatial domain, and realizes a resolution similar to that of traditional algorithms. Additionally, we demonstrate that JSFCAN has better anti-noise ability than the other methods when Gaussian noise is added to the raw SFS images. With its fast and robust super-resolution imaging ability, we anticipate the JSFCAN-based SFS method will find wide applications in real-time living cell research. We hope this article can inspire more scientists to optimize super-resolution imaging from the perspective of the Fourier domain.

Funding

National Natural Science Foundation of China (61735017, 61822510, 62020106002, 92250304, T2293751); National Key Research and Development Program of China (2021YFC2401403); Major Scientific Research Project of Zhejiang Laboratory (2019MC0AD02).

Disclosures

The authors declare no conflicts of interest.

Data availability

Codes in this paper are available through this link: https://github.com/Teckzhang/JSFCAN-for-SFS-imaging. Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. M. G. Gustafsson, “Surpassing the lateral resolution limit by a factor of two using structured illumination microscopy,” J. Microsc. 198(2), 82–87 (2000). [CrossRef]  

2. E. H. Rego, L. Shao, J. J. Macklin, L. Winoto, G. A. Johansson, N. Kamps-Hughes, M. W. Davidson, and M. G. Gustafsson, “Nonlinear structured-illumination microscopy with a photo-switchable protein reveals cellular structures at 50-nm resolution,” Proc. Natl. Acad. Sci. U. S. A. 109(3), E135–E143 (2012). [CrossRef]  

3. G. Zheng, R. Horstmeyer, and C. Yang, “Wide-field, high-resolution Fourier ptychographic microscopy,” Nat. Photonics 7(9), 739–745 (2013). [CrossRef]  

4. L. Tian, X. Li, K. Ramchandran, and L. Waller, “Multiplexed coded illumination for fourier ptychography with an led array microscope,” Biomed. Opt. Express 5(7), 2376–2389 (2014). [CrossRef]  

5. M. Tang, Y. Han, D. Ye, Q. Zhang, C. Pang, X. Liu, W. Shen, Y. Ma, C. F. Kaminski, X. Liu, and Q. Yang, “High-Refractive-Index Chip with Periodically Fine-Tuning Gratings for Tunable Virtual-Wavevector Spatial Frequency Shift Universal Super-Resolution Imaging,” Adv. Sci. 9(9), 2103835 (2022). [CrossRef]  

6. M. Tang, X. Liu, Q. Yang, and X. Liu, “Chip-based wide-field 3D nanoscopy through tunable spatial-frequency-shift effect,” in SPIE/COS Photonics Asia (SPIE, 2020), Vol. 11549.

7. X. Liu, M. Tang, C. Meng, C. Pang, C. Kuang, W. Chen, C. F. Kaminski, Q. Yang, and X. Liu, “Chip-compatible wide-field 3D nanoscopy through tunable spatial frequency shift effect,” Sci. China-Phys. Mech. Astron. 64(9), 294211 (2021). [CrossRef]  

8. Z. H. Shah, M. Müller, T.-C. Wang, P. M. Scheidig, A. Schneider, M. Schüttpelz, T. Huser, and W. Schenck, “Deep-learning based denoising and reconstruction of super-resolution structured illumination microscopy images,” Photonics Res. 9(5), B168–B181 (2021). [CrossRef]  

9. C. Qiao, D. Li, Y. Liu, S. Zhang, K. Liu, C. Liu, Y. Guo, T. Jiang, C. Fang, N. Li, Y. Zeng, K. He, X. Zhu, J. Lippincott-Schwartz, Q. Dai, and D. Li, “Rationalized deep learning super-resolution microscopy for sustained live imaging of rapid subcellular processes,” Nat. Biotechnol. 1–11 (2022).

10. L. Jin, B. Liu, F. Zhao, S. Hahn, B. Dong, R. Song, T. C. Elston, Y. Xu, and K. M. Hahn, “Deep learning enables structured illumination microscopy with low light levels and enhanced speed,” Nat. Commun. 11(1), 1934 (2020). [CrossRef]  

11. C. Ling, C. Zhang, M. Wang, F. Meng, L. Du, and X. Yuan, “Fast structured illumination microscopy via deep learning,” Photonics Res. 8(8), 1350 (2020). [CrossRef]  

12. W. Ouyang, A. Aristov, M. Lelek, X. Hao, and C. Zimmer, “Deep learning massively accelerates super-resolution localization microscopy,” Nat. Biotechnol. 36(5), 460–468 (2018). [CrossRef]  

13. Y. F. Cheng, M. Strachan, Z. Weiss, M. Deb, D. Carone, and V. Ganapati, “Illumination pattern design with deep learning for single-shot Fourier ptychographic microscopy,” Opt. Express 27(2), 644–656 (2019). [CrossRef]  

14. T. Nguyen, Y. Xue, Y. Li, L. Tian, and G. Nehmetallah, “Deep learning approach for Fourier ptychography microscopy,” Opt. Express 26(20), 26470 (2018). [CrossRef]  

15. T. Nguyen, Y. Xue, Y. Li, L. Tian, and G. Nehmetallah, “Convolutional neural network for Fourier ptychography video reconstruction: learning temporal dynamics from spatial ensembles,” arXiv, arXiv:1805.00334 (2018).

16. E. Nehme, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Deep-storm: super-resolution single-molecule microscopy by deep learning,” Optica 5(4), 458–464 (2018). [CrossRef]  

17. C. N. Christensen, E. N. Ward, M. Lu, P. Lio, and C. F. Kaminski, “ML-SIM: universal reconstruction of structured illumination microscopy images using transfer learning,” Biomed. Opt. Express 12(5), 2720–2733 (2021). [CrossRef]  

18. J. Lim, A. Ayoub, and D. Psaltis, “Three-dimensional tomography of red blood cells using deep learning,” Adv. Photonics 2(02), 1 (2020). [CrossRef]  

19. H. Wang, Y. Rivenson, Y. Jin, Z. Wei, R. Gao, H. Gunaydin, L. A. Bentolila, C. Kural, and A. Ozcan, “Deep learning enables cross-modality super-resolution in fluorescence microscopy,” Nat. Methods 16(1), 103–110 (2019). [CrossRef]  

20. Y. Zhang, X. Chen, B. Li, S. Jiang, T. Zhang, X. Zhang, X. Yuan, P. Qin, G. Zheng, and X. Ji, “Accelerated Phase Shifting for Structured Illumination Microscopy based on Deep Learning,” IEEE Trans. Comput. Imaging 7, 1–12 (2021). [CrossRef]  

21. K. Xu, M. Qin, F. Sun, Y. Wang, Y.-K. Chen, and F. Ren, “Learning in the frequency domain,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), 1740–1749.

22. C. Qiao, D. Li, Y. Guo, C. Liu, T. Jiang, Q. Dai, and D. Li, “Evaluation and development of deep neural networks for image super-resolution in optical microscopy,” Nat. Methods 18(2), 194–202 (2021). [CrossRef]  

23. X. Li, J. Dong, B. Li, Y. Zhang, Y. Zhang, A. Veeraraghavan, and X. Ji, “Fast confocal microscopy imaging based on deep learning,” in 2020 IEEE International Conference on Computational Photography (ICCP), (IEEE, 2020), pp. 1–12.

24. G. Yang, S. Yu, H. Dong, G. Slabaugh, P. L. Dragotti, X. Ye, F. Liu, S. Arridge, J. Keegan, Y. Guo, and D. Firmin, “Dagan: Deep de-aliasing generative adversarial networks for fast compressed sensing mri reconstruction,” IEEE Trans. Med. Imaging 37(6), 1310–1321 (2018). [CrossRef]  

25. R. Gerchberg and W. Saxton, “A practical algorithm for the determination of the phase from image and diffraction plane pictures,” Optik 35, 237–246 (1972).

26. R. Cao, Y. Chen, W. Liu, D. Zhu, C. Kuang, Y. Xu, and X. Liu, “Inverse matrix based phase estimation algorithm for structured illumination microscopy,” Biomed. Opt. Express 9(10), 5037–5051 (2018). [CrossRef]  
