Image reconstruction with a deep convolutional neural network in high-density super-resolution microscopy

Open Access

Abstract

An accurate and fast reconstruction algorithm is crucial for improving the temporal resolution of high-density super-resolution microscopy, particularly in view of the challenges associated with live-cell imaging. In this work, we design a deep network based on a convolutional neural network to take advantage of its enhanced ability in high-density molecule localization, and introduce a residual layer into the network to reduce noise. The proposed scheme is also robust against variations in both the full width at half maximum (FWHM) and the pixel size. We validate our algorithm on both simulated and experimental data, showing improvements in loss value and image quality, and demonstrate live-cell imaging with a temporal resolution of 0.5 seconds by recovering mitochondria dynamics.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Single-molecule localization microscopy (SMLM), such as stochastic optical reconstruction microscopy (STORM) [1] and fluorescence photoactivated localization microscopy (fPALM) [2], has been considered a promising technique for resolving biological structures at a typical scale of 20 nm resolution [3]. It first labels specific proteins or oligonucleotides with a fluorophore or an autofluorescent protein to light up the molecules of interest [4], and then constructs a super-resolution image by localizing the molecules and merging their positions. Reconstruction algorithms are required to detect molecules quickly and locate them precisely.

Although a variety of localization algorithms have provided rich techniques for solving SMLM over the past decade [5–7], the following obstacles still limit its adoption for practical use in observing dynamic processes within live cells. First, thousands of frames of raw images are generally required to achieve high spatial resolution, resulting in a long acquisition time. To improve temporal resolution, one feasible way is to increase the density of activated fluorophores in each raw image so that fewer raw images are required. This poses a challenge to the algorithmic capability of resolving overlapped molecules. Second, since each frame of live-cell imaging collects more molecules and higher noise, denoising must be incorporated into the reconstruction process. Despite its necessity, commonly used pre-processing algorithms tend to discard valid information and add undesirable computation time.

To address the above issues, we propose an efficient reconstruction algorithm based on deep learning. Deep learning is a class of machine learning techniques that uses more complex ways of connecting layers and a larger amount of computing power in training than previous networks. Another remarkable advantage that deep learning has over conventional machine learning algorithms is automatic feature extraction [8,9]. All these advances move machine learning closer to its original goal of artificial intelligence (AI). Deep convolutional neural networks (CNNs) have been applied successfully to image processing in different research fields, including image classification [10], segmentation [11,12], computed tomography [13], optical microscopy [14] and other areas [15–18]. Deep learning has also been applied to localization microscopy (Deep-STORM) [19], showing significantly higher speed than existing approaches. Nevertheless, that method has not been applied to live-cell imaging, particularly under conditions of higher molecule densities and more noise.

This work aims to develop a reconstruction algorithm suited to live-cell STORM imaging, and its contributions are twofold. First, it improves the temporal resolution of SMLM. We demonstrate favorable algorithmic performance at a temporal resolution of 0.5 seconds, and the dynamic structures of live mitochondria can be observed with greater clarity. This is achieved by elaborately designing the network architecture and taking advantage of its enhanced ability to localize high-density data. More importantly, the proposed deep convolutional neural network offers algorithmic insight into how a residual layer in the architecture exploits the computing power of both depth and width. The residual layer is incorporated into the network to separate redundant information, including noise, from the image, achieving denoising without an additional procedure. Accordingly, we term the method deep residual learning STORM (DRL-STORM). The reconstructed result is not only robust against noise but also preserves sharp structures.

2. Deep residual learning STORM

As in other deep learning applications, the design of a proper network architecture, training data and loss function is crucial for STORM. A high-performing architecture, comprehensive training data and a robust loss function all assist deep learning in further improvement. In this section, we describe how to apply a deep learning method to the reconstruction of images with high-density fluorescent molecules. Our network architecture and its components are first specified, and then the network is trained taking into account the range of ratios of the FWHM of the point spread function (PSF) to the pixel size. Optimal parameters of the network are found by optimizing a loss function so that the network’s prediction is as close as possible to the desired output.

2.1 Architecture

Inspired by the previous works of DAOSTORM [20], ResNets [21] and Deep-STORM, we propose a network architecture based on deep convolutional neural networks. It can be divided into two parts connected by a residual layer, as shown in Fig. 1, and there are five types of blocks with different colors for distinction.

  • (1) Conv + BN + ReLU: Convolutional filters of size $3 \times 3$ are first used to generate feature maps, followed by batch normalization (BN) to alleviate the internal covariate shift and speed up training [22,23], and then element-wise rectified linear units (ReLU) are added for nonlinearity [24]. Mathematically, $\textrm{ReLU}({\cdot} )= \max ({ \cdot \textrm{ },\textrm{ }0} )$. The width of each block denotes the number of filters in the corresponding convolutional layer, which is also the number of features in the representation; it increases from 35 up to 563 and then reduces back to 35 in our network.
  • (2) Conv + BN: This block, consisting of convolutional filters of size $3 \times 3$ and batch normalization, is used to increase the depth of the network. It omits ReLU to retain more information and improve the performance of the network, because ReLU is often used before down-sampling or up-sampling and induces sparsity in the activation output.
  • (3) Max Pooling: We insert max pooling between successive convolutional layers to reduce the spatial dimension of the feature representation and to help control overfitting [8]. Max pooling performs a down-sampling operation by taking the maximal element in the corresponding $2 \times 2$ region and discarding redundant information.
  • (4) Up-sampling: With a filter of size $2 \times 2$, the elements of the input feature map are replicated over the filter area to recover the spatial size of the data.
  • (5) Residual: The Residual layer predicts a residual image containing molecules that have not yet been identified, by computing the deviation between the raw data and the convolution of the first part’s output with a Gaussian filter (see the sketch after this list). Given the raw data $\boldsymbol{x}$ and the output feature map of the first part ${\boldsymbol{x}_{\boldsymbol{t}}}$, the output of the Residual layer $\boldsymbol{y}$ is formulated as
    $$\boldsymbol{y} = \boldsymbol{x} - {\boldsymbol{x}_{\boldsymbol{t}}} \ast {\boldsymbol{g}_1},$$
    where ${\boldsymbol{g}_1}$ is a Gaussian kernel of size $3 \times 3$ and standard deviation of 1 pixel, and ${\ast}$ is the convolution operator. Here we choose to pre-define ${\boldsymbol{g}_1}$ rather than let the network learn it, because the reconstruction results of the two approaches are almost the same and pre-definition slightly decreases the number of trainable parameters.
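To make Eq. (1) concrete, the following is a minimal NumPy/SciPy sketch of the Residual computation rather than the network's actual TensorFlow layer; the function name and the use of gaussian_filter (with truncate chosen to give roughly a $3 \times 3$ support) are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def residual_layer(x_raw, x_t, sigma=1.0):
    """Eq. (1): y = x - x_t * g1. The first part's estimate x_t is re-blurred
    with a Gaussian of 1-pixel standard deviation and subtracted from the raw
    frame, leaving the not-yet-identified molecules for the second part of the
    network to localize.

    x_raw : 2-D raw input image.
    x_t   : output feature map of the first part, same shape as x_raw.
    """
    blurred = gaussian_filter(x_t, sigma=sigma, truncate=1.0)  # ~3x3 kernel support
    return x_raw - blurred
```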

The whole network is composed of 14 Conv + BN + ReLU blocks, 2 Conv + BN blocks and 1 Residual block. The first part of the architecture identifies redundant information, including noise, which is then separated from the original raw image by the Residual block. This facilitates molecule localization in the second part of the network. The benefit of the two-part design is not merely that it introduces more parameters: it exploits the computing power of deep learning along both depth and width, which a single-part network cannot do, since performance can hardly keep improving by increasing the number of parameters alone. Although a single-part network can achieve similar performance with the same number of parameters, it takes a longer time. We take advantage of this architecture to enhance the capacity for denoising and high-density molecule localization, thereby improving temporal resolution.

Fig. 1. The architecture of the proposed DRL-STORM network.

2.2 Training

As a critical factor, the training data determines what we want the neural network to learn. To obtain the training examples, 20 frames of $128 \times 128$ pixel images with randomly distributed molecules are first generated by the ImageJ [25] ThunderSTORM plugin [26]. Next, 500 random $30 \times 30$ regions are extracted from each frame of simulated image, and each region is up-sampled by a factor of 4. Our network can also use 8 as the up-sampling factor, which results in improved performance. Table 1 summarizes the training time and the inference time per frame of a $30 \times 30$ pixel image for the two up-sampling factors using Deep-STORM and DRL-STORM. For both up-sampling factors, our processing time is less than twice that of Deep-STORM, even though our network consists of ∼9.4M parameters while Deep-STORM has ∼1.3M parameters. To balance the tradeoff between performance and computation time, we use 4 as the up-sampling factor. 70% of the up-sampled regions are used as training data and the rest as validation data. To compare the methods fairly, DRL-STORM and Deep-STORM are trained on the same data for the same number of epochs. A sketch of this patch-extraction step is given after Table 1.


Table 1. Comparison of processing time with different up-sampling factors and methods.
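As a rough illustration of how such training patches could be produced, here is a hedged sketch under the assumptions that the simulated frames are available as NumPy arrays and that nearest-neighbour up-sampling via scipy.ndimage.zoom is acceptable; the function name and the sequential 70/30 split are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import zoom

def extract_training_patches(frames, n_patches=500, patch=30, up=4, rng=None):
    """Crop random patch x patch regions from each simulated frame and
    up-sample each region by the factor `up`."""
    rng = rng or np.random.default_rng(0)
    patches = []
    for frame in frames:                      # e.g. 20 frames of 128 x 128 pixels
        h, w = frame.shape
        for _ in range(n_patches):
            r = rng.integers(0, h - patch + 1)
            c = rng.integers(0, w - patch + 1)
            crop = frame[r:r + patch, c:c + patch]
            patches.append(zoom(crop, up, order=0))   # 30x30 -> 120x120
    patches = np.stack(patches)
    split = int(0.7 * len(patches))           # 70% training, 30% validation
    return patches[:split], patches[split:]
```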

For deep learning methods, in addition to the molecule density, the ratio of the FWHM to the pixel size is another important parameter affecting the ability to distinguish overlapped fluorophores. A larger ratio indicates more overlapped fluorophores and more blurred images, as illustrated in Fig. 2, even at the same density. Figures 2(a) and 2(b) are simulated images at FWHMs of 300 nm and 500 nm, respectively, both with a pixel size of 110 nm, while Fig. 2(c) has the same FWHM as Fig. 2(b) but a pixel size of 150 nm. It is evident that overlapped fluorophores are more difficult to resolve at larger FWHM and smaller pixel size. To achieve the best reconstruction results, the ratio of FWHM to pixel size of the training data should in principle be precisely matched to that of the experimental data. However, training the network for each different ratio value would be time consuming. In our work, after evaluating an approximate ratio for the experimental data, we include it within the range used for generating the simulated training examples. As long as the ratio lies in this range, there is no need to retrain the network even if both the FWHM and the pixel size change. For example, assuming the ratio of FWHM to pixel size equals 2, we set the ratio range to 0.91-4.37 for the simulated data generator, and the capacity of the trained network is then improved for resolving both more heavily overlapped molecules and sparser molecules. In addition, since STORM data often have variable molecule densities in different regions of an image, we train over a range of ratios rather than a fixed value, which helps localize both higher-density and lower-density molecules. This also means the network needs to be trained only once, and the resulting weights and trained network are appropriate for reconstructing super-resolution images over different ratios of FWHM to pixel size. Although this advantage also holds for CARE [18] and Deep-STORM, our method achieves better performance under different ratios, as illustrated in the next section.

Fig. 2. Simulated images at (a) FWHM of 300 nm and pixel size of 110 nm, (b) FWHM of 500 nm and pixel size of 110 nm, and (c) FWHM of 500 nm and pixel size of 150 nm.

2.3 Loss function

Generally, the loss function consists of a data-fidelity term and a regularization term chosen for a specific purpose. For the former, this work measures the ${l_2}$ norm of the difference between the network’s prediction ${\tilde{{\boldsymbol {x}}}_i}$ and the desired ground-truth image ${{\boldsymbol x}_i}$, both convolved with a Gaussian kernel ${\boldsymbol{g}_2}$ of size $7 \times 7$ and standard deviation of 1 pixel. Regularization guards against overfitting, which renders the network incapable of predicting unseen data [8]. As in Ref. [19], we minimize the ${l_1}$ norm of the network’s output ${\tilde{{\boldsymbol x}}_i}$ to control sparsity. The ${l_1}$ penalty also performs automatic feature selection and helps interpret the weights by driving many weights to zero while allowing a few to grow large [8]. Assuming that the number of images in the training set is N, we therefore have the loss function

$${\cal L} = \frac{1}{N}\sum\limits_{i = 1}^N {||{{{\tilde{{\boldsymbol x}}}_i} \ast {{\boldsymbol g}_2} - {{\boldsymbol x}_i} \ast {{\boldsymbol g}_2}} ||_2^2 + {{||{{{\tilde{{\boldsymbol x}}}_i}} ||}_1}} .$$
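For concreteness, below is a minimal TensorFlow sketch of Eq. (2), assuming a fixed $7 \times 7$, one-pixel-standard-deviation Gaussian kernel and image tensors of shape [batch, H, W, 1]; the function and variable names are illustrative, not the authors' code.

```python
import numpy as np
import tensorflow as tf

def gaussian_kernel(size, sigma):
    """Fixed (non-trainable) 2-D Gaussian kernel shaped for tf.nn.conv2d."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return tf.constant((g / g.sum())[:, :, None, None], dtype=tf.float32)

G2 = gaussian_kernel(7, 1.0)   # 7x7 kernel, standard deviation of 1 pixel

def drl_storm_loss(x_true, x_pred):
    """Eq. (2): l2 fidelity between the blurred prediction and the blurred
    ground truth, plus an l1 sparsity penalty on the prediction itself."""
    pred_blur = tf.nn.conv2d(x_pred, G2, strides=1, padding="SAME")
    true_blur = tf.nn.conv2d(x_true, G2, strides=1, padding="SAME")
    fidelity = tf.reduce_sum(tf.square(pred_blur - true_blur), axis=[1, 2, 3])
    sparsity = tf.reduce_sum(tf.abs(x_pred), axis=[1, 2, 3])
    return tf.reduce_mean(fidelity + sparsity)   # average over the batch (1/N)
```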

The loss function reframes training a convolutional neural network as an optimization problem, so the optimal parameters, such as the weights, can be found by iterative optimization algorithms [8]. During each epoch, the weights of the network are updated in the direction that minimizes the training loss, and the validation loss is computed on the validation data. We notice that more epochs are required when the training conditions become more severe, such as higher molecule density, so the network is trained for 100 epochs on batches of 16 samples with Adam [27] implemented in TensorFlow [28]; 100 epochs are generally sufficient for different conditions. The weights at the minimum validation loss are saved and employed for the subsequent STORM data reconstruction. Both training and testing are performed on a workstation equipped with 512 GB of memory, two Intel Xeon E5-2687W v4 3.00 GHz CPUs, and an NVIDIA Quadro P6000 GPU with 24 GB of video memory. The execution time depends on the device; in our experiments, DRL-STORM typically spends about one hour on the whole training process.
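Assuming a Keras model built from the blocks of Fig. 1 (not reproduced here) and the training/validation patches from the earlier sketch, the stated training configuration, i.e. Adam, 100 epochs, batches of 16, keeping the weights at minimum validation loss, could be expressed as follows; the function name and checkpoint filename are illustrative.

```python
import tensorflow as tf

def train_network(model, train_x, train_y, val_x, val_y):
    """Train for 100 epochs on batches of 16 with Adam, saving the weights
    that reach the minimum validation loss."""
    model.compile(optimizer=tf.keras.optimizers.Adam(), loss=drl_storm_loss)
    ckpt = tf.keras.callbacks.ModelCheckpoint(
        "drl_storm_best.h5", monitor="val_loss", save_best_only=True)
    return model.fit(train_x, train_y,
                     validation_data=(val_x, val_y),
                     epochs=100, batch_size=16, callbacks=[ckpt])
```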

3. Results

3.1 Model validation

We begin by evaluating the performance of the proposed DRL-STORM model on the validation data. The reconstructed results are analyzed and compared with those obtained by the Deep-STORM method in terms of the loss value, which measures the difference between the reconstructed image and the desired image.

Figure 3(a) depicts the validation loss during 100 epochs for different molecule densities with Deep-STORM and DRL-STORM, respectively. The ratio range of FWHM to pixel size is 4.37-6.10. For both methods the validation loss decreases as the epochs progress over the molecule density range of 8 µ${\textrm{m}^{ - 2}}$ to 13 µ${\textrm{m}^{ - 2}}$, which indicates that both models effectively suppress overfitting. It is worth noting that our method provides a noticeably lower validation loss for densely distributed molecules. The minimum loss value using Deep-STORM increases from 3.5 to 5.7, illustrated by the dashed lines, while DRL-STORM obtains smaller loss values of 2.5 to 5. The rationale behind this result is that the deeper DRL-STORM model is beneficial for high-density molecule localization accuracy. The reduced validation loss is mainly attributable to the architecture of DRL-STORM, notably the residual layer, which reduces noise effectively. To validate the effectiveness of the residual layer, we also tested the network without it, and neither the loss value nor the reconstructed image was as good as with the residual layer. Here we only show the comparison for the density range of 8 µ${\textrm{m}^{ - 2}}$ to 13 µ${\textrm{m}^{ - 2}}$, which is generally high enough for real experimental STORM data. Results of applying our method to lower-density data are presented later.

Fig. 3. Loss value comparison between Deep-STORM method (dashed lines) and the proposed DRL-STORM (solid lines) on validation data for (a) different molecule densities and (b) different ratio ranges of FWHM to pixel size.

Another evaluation is conducted on validation data with variable ratios of FWHM to pixel size, and the resulting loss profiles for different ratio ranges are plotted in Fig. 3(b). We fix the density at 13 µ${\textrm{m}^{ - 2}}$, and generate several ratio ranges according to commonly used wavelengths $\lambda$ (380 nm-780 nm), numerical apertures NA (1.3-1.5) and pixel sizes (60 nm-170 nm). The ratio is calculated according to the following formula

$$ratio = \frac{{0.61 \times \lambda }}{{\textrm{NA}}} \cdot \frac{1}{{\textrm{pixel size}}},$$
and the possible minimum and maximum ratio values are given by $\frac{{0.61 \times 380}}{{\textrm{1}\textrm{.5} \times \textrm{170}}}\textrm{ = }0.91$ and $\frac{{0.61 \times 780}}{{\textrm{1}\textrm{.3} \times \textrm{60}}}\textrm{ = }6.10$, respectively. We also select five smaller ratio ranges for comparison. It can be seen that the validation loss rises with increasing ratio, meaning that molecules become more difficult to resolve, as analyzed above. For instance, with our method the minimum validation loss for the ratio range of 0.91-6.10 reaches 3.2 after 100 epochs, while the minimum loss for the ratio range of 0.91-1.78 is about 1.5, as shown by the solid lines. This result reflects the fact that the localization accuracy for more heavily overlapped molecules tends to be lower. For all validated ratio ranges, our method achieves better performance by reducing the validation loss, for example a 25% reduction for the ratio range of 0.91-5.24.
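As a quick check of Eq. (3) and the quoted extremes, the ratio can be computed directly; the helper name below is an assumption for illustration.

```python
def fwhm_to_pixel_ratio(wavelength_nm, numerical_aperture, pixel_size_nm):
    """Eq. (3): ratio of the diffraction-limited FWHM to the camera pixel size."""
    return 0.61 * wavelength_nm / (numerical_aperture * pixel_size_nm)

print(round(fwhm_to_pixel_ratio(380, 1.5, 170), 2))   # 0.91, the minimum of the range
print(round(fwhm_to_pixel_ratio(780, 1.3, 60), 2))    # 6.1, the maximum of the range
```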

To quantify the quality of the results, we compare the images in terms of the normalized mean square error (NMSE), which is defined as

$$\textrm{NMSE} = \frac{{||{\tilde{\boldsymbol x} - \boldsymbol{x}} ||_2^2}}{{||\boldsymbol{x} ||_2^2}} \times \textrm{100} \%.$$
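Equation (4) translates directly into a short NumPy function; the name is illustrative.

```python
import numpy as np

def nmse(x_rec, x_true):
    """Eq. (4): normalized mean square error of the reconstruction, in percent."""
    return 100.0 * np.sum((x_rec - x_true) ** 2) / np.sum(x_true ** 2)
```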

Figure 4(a) displays the NMSE comparison between Deep-STORM (dashed lines) and DRL-STORM (solid lines) for different density conditions under varying backgrounds of 10, 50, 100, 150 and 200 photons per pixel. The intensity is fixed at 1000 photons per molecule, and 3000 frames of images are used to obtain average measurements. From a low background of 10 photons per pixel to a very high background of 200 photons per pixel, our method consistently exhibits a smaller NMSE than Deep-STORM. DRL-STORM is also superior to Deep-STORM at a high molecule density of 13 µ${\textrm{m}^{ - 2}}$ and a high background of 200 photons per pixel, with an NMSE of 30% compared with 48%. The robustness of the proposed DRL-STORM against noise is confirmed again by the reduced NMSE. Similar improvements in NMSE can be observed in Fig. 4(b), where simulations are performed under different ratios of FWHM to pixel size. Note that NMSE is more sensitive to variations in the ratio of FWHM to pixel size than to density and SNR variations; nevertheless, our method remains more accurate and obtains a relatively stable NMSE of 20%-30%. This agrees with our expectations for incorporating the residual layer into the network.

Fig. 4. NMSE comparison between Deep-STORM method (dashed lines) and the proposed DRL-STORM (solid lines) on validation data for (a) different background photons and (b) different ratio ranges of FWHM to pixel size.

3.2 Reconstruction of high-density tubulin

Next, we verify the practicability of the algorithm using a publicly accessible dataset from the single-molecule localization challenge (EPFL) website [29]. The tubulin dataset is composed of 500 raw images of $128 \times 128$ pixels with a pixel size of 100 nm. Figure 5(a) is the sum of the 500 frames of raw images. We set the parameters of the training data as a FWHM range of 300 nm to 500 nm and a molecule density of 4 µ${\textrm{m}^{ - 2}}$. Since the signal-to-noise ratio (SNR) is unknown, the intensity and background of the raw images are roughly estimated as 1000-4000 photons per molecule and 200 photons per pixel, respectively. Figures 5(c) and 5(d) display the images reconstructed from the 500 frames of raw data with the two methods. The magnified reconstructions of the yellow squares are given in the upper right corners. We observe that Deep-STORM and our proposed DRL-STORM produce results of similar quality in relatively low-density regions; however, our method resolves two closely spaced microtubules slightly better in high-density regions, and reduces the spurious and noisy peaks at positions between adjacent microtubules. This is also consistent with the intensity profiles along the white dotted line in Fig. 5(b).

Fig. 5. Comparison of Deep-STORM and DRL-STORM on experimental STORM data from EPFL website. (a) Sum of 500 frames of raw images. (b) Plots of intensity profiles along the white dotted lines. (c) and (d) are reconstructed images respectively using Deep-STORM and DRL-STORM. Yellow squares in upper right corners are correspondingly magnified reconstructed results.

To assess the performance of the proposed algorithm on high-density data, every 10 images are combined into one new raw image, each containing more overlapped molecules and more noise [30] (a sketch of this step is given below). Consequently, we adjust the parameters of the training data to a molecule density of 12 µ${\textrm{m}^{ - 2}}$ and a background of 400 photons per pixel, and the final STORM image is reconstructed from 50 frames of low-resolution images. Both methods employ high-density data as training data, yet Deep-STORM is less competent at learning high-density data localization, as indicated in Fig. 6. The images reconstructed from 50 frames using Deep-STORM and DRL-STORM are presented in Figs. 6(a) and 6(b), respectively. The reconstructed image quality using the proposed DRL-STORM is improved compared with Deep-STORM, especially in the denser regions such as filament crossings, as illustrated by the magnified results of the green and yellow squares in Figs. 6(c) and 6(d), as well as in Figs. 6(f) and 6(g). There are spurious peaks between adjacent microtubules and noise in the images reconstructed by Deep-STORM. This is expected, because the fewer frames are used, the more molecules each frame contains. Deep-STORM suffers from noise and a limited capability of learning high-density localization, while DRL-STORM yields better results in both denoising and localization accuracy at high molecule density. The configuration of the DRL-STORM architecture, especially the residual layer and the greater depth, makes our method well suited to high-density molecule localization.
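A simple way to build this 50-frame high-density dataset from the 500 raw frames, assuming they are stacked in a NumPy array and that "combined" means summed as stated in the caption of Fig. 6, is sketched below; the function name is illustrative.

```python
import numpy as np

def combine_frames(raw_stack, group=10):
    """Sum every `group` consecutive raw frames into one denser frame.

    raw_stack : array of shape (n_frames, H, W), e.g. the 500 tubulin frames;
    returns an array of shape (n_frames // group, H, W), here 50 frames.
    """
    n, h, w = raw_stack.shape
    usable = (n // group) * group
    return raw_stack[:usable].reshape(-1, group, h, w).sum(axis=1)
```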

Fig. 6. Comparison of Deep-STORM and DRL-STORM on 50 frames of high-density data with each frame being a summation of 10 raw images in Fig. (5). (a) and (b) are reconstructed images respectively using Deep-STORM and DRL-STORM. (c) and (d) are magnified reconstructed results of green squares respectively using Deep-STORM and DRL-STORM. (f) and (g) are magnified reconstructed results of yellow squares respectively using Deep-STORM and DRL-STORM. (e) and (h) are plots of intensity profiles along the white dotted lines.

The intensity profiles along the white dotted lines are measured and plotted in Figs. 6(e) and 6(h). Note that there are missing microtubules as well as spurious and noisy peaks at positions between adjacent microtubules in the result reconstructed by Deep-STORM, owing to its failure to denoise and to localize dense molecules. Our method, by contrast, is capable of reconstructing the tubulin structures and clearly resolving the two microtubules from 50 frames of raw images. These results also reveal the potential of DRL-STORM for live-cell imaging, using fewer frames to obtain an acceptable super-resolution image.

As analyzed previously, our proposed DRL-STORM achieves better performance through the design of the network rather than merely an increase in the number of parameters. To demonstrate this point, we modify the number of filters in Deep-STORM so that its overall architecture has the same number of parameters as DRL-STORM, and again apply it to the 50 frames of high-density data. The comparison in Fig. 7 is further evidence that the reconstructed image quality is indeed improved by our method. These results validate that the two-part architecture of DRL-STORM makes better use of the computing power of the CNN. Since training with an upper bound on density potentially limits performance and causes edge-like artifacts, highlighted by white arrows in Fig. 6(a), the molecule density of the training data is now set to 6 µ${\textrm{m}^{ - 2}}$ and the artifacts disappear in Fig. 7(a). In addition, introducing more parameters into the single-part network causes a longer processing time; Deep-STORM now spends an hour and a half training the network for 100 epochs. The two-part architecture of DRL-STORM can be extended to a multiple-part architecture to produce a more powerful model at the cost of a longer training time. The computation times of the single-part and multiple-part networks with the same number of parameters are listed in Table 2. Note that, as the number of parameters increases, the runtime of the single-part architecture grows much faster than that of the proposed multiple-part network.

Fig. 7. Performance comparison of Deep-STORM and DRL-STORM with the same number of parameters on 50 frames of high-density data. (a) and (b) are reconstructed images respectively using Deep-STORM and DRL-STORM. (c) and (d) are magnified reconstructed results of green squares respectively using Deep-STORM and DRL-STORM. (f) and (g) are magnified reconstructed results of yellow squares respectively using Deep-STORM and DRL-STORM. (e) and (h) are plots of intensity profiles along the white dotted lines.


Table 2. Comparison of processing time with different parameter numbers and models.

3.3 Live-cell imaging

We finally apply the proposed method to reconstruct STORM images acquired from live-cell imaging. Specifically, imaging of the live mitochondria samples is performed on a custom-built STORM microscopy system with a self-designed Cy5-based STORM probe. We make a reconstructed movie in which each frame is a single STORM image acquired over 0.5 seconds (60 frames of raw data), which demonstrates that DRL-STORM improves temporal resolution without degrading performance. The network is retrained with a molecule density of 4 µ${\textrm{m}^{ - 2}}$ and a background of 300 photons per pixel. DRL-STORM takes approximately 0.07 s to reconstruct a 128×128 image, while Deep-STORM takes 0.04 s for an image of the same size.

The result is also compared with another super-resolution fluorescence microscopy method, a fast localization algorithm based on a continuous-space formulation (FALCON) [31]. We run FALCON with the default options except for a low-resolution image pixel pitch of CCD_pitch = 100, a PSF width of Gsigma = 1.2 and an EM gain of EM = 1. Figures 8(a)–8(c) show snapshots of the results reconstructed with FALCON at 0-0.5 s, 9-9.5 s and 13.5-14 s, respectively. The corresponding results using Deep-STORM and DRL-STORM are given in the following two rows. The dynamic motions of mitochondria are visualized, as marked by the white, magenta, green and yellow arrows in Figs. 8(g), 8(h) and 8(i). The motions of these structures can be divided into two kinds: one is a change in the shape of tubules (magenta and yellow arrows), and the other is a change at the junction point of tubules (white and green arrows). For the first kind, we capture tubules varying distinctly over time; for the second kind, original junctions of two tubules disappear as the tubules shrink and disconnect, forming two separate tubules. Furthermore, the results in Fig. 8 show that DRL-STORM enhances reconstructed image quality; the regions highlighted by blue ellipses are a few examples. DRL-STORM produces tubule structures with greater clarity and less noise, while Deep-STORM retains structural information at the expense of retaining more noise. By contrast, only a fraction of the overlapped molecules can be localized by FALCON, resulting in obvious discontinuities and unstructured regions in dense areas, which can be perceived as over-denoising. Accordingly, our proposed method reaches a compromise between retaining useful information and denoising. Live-cell imaging, particularly at high temporal resolution, tends to introduce more noise, making denoising important for STORM. Our method realizes moderate denoising by means of the residual layer. Moreover, DRL-STORM has a deeper and wider network, which allows a super-resolution image to be reconstructed from fewer frames of raw data, thus improving temporal resolution.

Fig. 8. Comparison of FALCON, Deep-STORM and DRL-STORM on live mitochondria data. (a), (b) and (c) are snapshots of reconstructed results using FALCON respectively in 0-0.5 second, 9-9.5 second and 13.5-14 second (see Visualization 1). (d), (e) and (f) are snapshots of reconstructed results using Deep-STORM respectively in 0-0.5 second, 9-9.5 second and 13.5-14 second (see Visualization 2). (g), (h) and (i) are snapshots of reconstructed results using DRL-STORM respectively in 0-0.5 second, 9-9.5 second and 13.5-14 second (see Visualization 3). The dynamic motions of these structures are indicated by white, magenta, green and yellow arrows. Several image quality comparisons are highlighted in blue ellipses.

4. Conclusion

In this paper, an image reconstruction algorithm based on deep convolutional neural networks is presented for high-density data reconstruction in STORM. We apply the approach to real experimental data as well as to live mitochondrial imaging data. Evaluations against the sparse-recovery method FALCON and the deep-learning method Deep-STORM demonstrate that our network delivers advantages such as reduced noise, improved temporal resolution and better image quality. These results make the proposed algorithm a prime candidate for super-resolution microscopy, especially given the increasing demand for highly accurate and fast live-cell imaging applications. In future work, we intend to continue exploring multiple-part architectures, which extend the computing power of deep learning by increasing both the depth and width of the model.

Funding

National Natural Science Foundation of China (11774242, 61525503, 61605120, 61875131); Department of Education of Guangdong Province (2016KCXTD007); Natural Science Foundation of Guangdong Province (2014A030312008); Science, Technology and Innovation Commission of Shenzhen Municipality (JCYJ20170412105003520, JCYJ20170818100931714, JCYJ20170818142804605); Shenzhen International Cooperation Project (GJHZ20180928161811821).

Disclosures

The authors declare no conflicts of interest.

References

1. M. Rust, M. Bates, and X. Zhuang, “Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM),” Nat. Methods 3(10), 793–796 (2006). [CrossRef]  

2. E. Betzig, G. H. Patterson, R. Sougrat, O. W. Lindwasser, S. Olenych, J. S. Bonifacino, M. W. Davidson, J. Lippincott-Schwartz, and H. F. Hess, “Imaging intracellular fluorescent proteins at nanometer resolution,” Science 313(5793), 1642–1645 (2006). [CrossRef]  

3. D. Sage, H. Kirshner, T. Pengo, N. Stuurman, J. Min, S. Manley, and M. Unser, “Quantitative evaluation of software packages for single-molecule localization microscopy,” Nat. Methods 12(8), 717–724 (2015). [CrossRef]  

4. A. Diezmann, Y. Shechtman, and W. E. Moerner, “Three-dimensional localization of single molecules for super resolution imaging and single-particle tracking,” Chem. Rev. 117(11), 7244–7275 (2017). [CrossRef]  

5. L. Zhu, W. Zhang, D. Elnatan, and B. Huang, “Faster STORM using compressed sensing,” Nat. Methods 9(7), 721–723 (2012). [CrossRef]  

6. K. Agarwal and R. Macháň, “Multiple signal classification algorithm for super-resolution fluorescence microscopy,” Nat. Commun. 7(1), 13752 (2016). [CrossRef]  

7. A. Lee, K. Tsekouras, C. Calderon, C. Bustamante, and S. Pressé, “Unraveling the thousand word picture: an introduction to super-resolution data analysis,” Chem. Rev. 117(11), 7276–7330 (2017). [CrossRef]  

8. J. Patterson and A. Gibson, Deep Learning: A Practitioner’s Approach (O’Reilly Media, 2017).

9. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

10. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proceedings of the 25th International Conference on Neural Information Processing Systems, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, eds. (Curran Associates Inc., 2012), pp. 1097–1105.

11. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2015), pp. 3431–3440.

12. O. Ronneberger, P. Fischer, and T. Brox, “U-net: convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, eds. (Springer, 2015), pp. 234–241.

13. K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process. 26(9), 4509–4522 (2017). [CrossRef]  

14. Y. Rivenson, Z. Göröcs, H. Günaydin, Y. Zhang, H. Wang, and A. Ozcan, “Deep learning microscopy,” Optica 4(11), 1437–1443 (2017). [CrossRef]  

15. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv: 1409.1556 (2015).

16. W. Ouyang, A. Aristov, M. Lelek, X. Hao, and C. Zimmer, “Deep learning massively accelerates super-resolution localization microscopy,” Nat. Biotechnol. 36(5), 460–468 (2018). [CrossRef]  

17. D. S. Kermany, M. Goldbaum, W. Cai, C. C. S. Valentim, H. Liang, S. L. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan, J. Dong, M. K. Prasadha, J. Pei, M. Ting, J. Zhu, C. Li, S. Hewett, J. Dong, I. Ziyar, A. Shi, R. Zhang, L. Zheng, R. Hou, W. Shi, X. Fu, Y. Duan, V. A. N. Huu, C. Wen, E. D. Zhang, C. L. Zhang, O. Li, X. Wang, M. A. Singer, X. Sun, J. Xu, A. Tafreshi, M. A. Lewis, H. Xia, and K. Zhang, “Identifying medical diagnoses and treatable diseases by image-based deep learning,” Cell 172(5), 1122–1131.e9 (2018). [CrossRef]  

18. M. Weigert, U. Schmidt, T. Boothe, A. Müller, A. Dibrov, A. Jain, B. Wilhelm, D. Schmidt, C. Broaddus, S. Culley, M. Rocha-Martins, F. Segovia-Miranda, C. Norden, R. Henriques, M. Zerial, M. Solimena, J. Rink, P. Tomancak, L. Royer, F. Jug, and E. W. Myers, “Content-aware image restoration: pushing the limits of fluorescence microscopy,” Nat. Methods 15(12), 1090–1097 (2018). [CrossRef]  

19. E. Nehme, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Deep-STORM: super-resolution single-molecule microscopy by deep learning,” Optica 5(4), 458–464 (2018). [CrossRef]  

20. S. J. Holden, S. Uphoff, and A. N. Kapanidis, “DAOSTORM: an algorithm for high-density super-resolution microscopy,” Nat. Methods 8(4), 279–280 (2011). [CrossRef]  

21. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2016), pp. 770–778.

22. K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising,” IEEE Trans. Image Process. 26(7), 3142–3155 (2017). [CrossRef]  

23. S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on Machine Learning, F. Bach and D. Blei, eds. (JMLR. org, 2015), pp. 448–456.

24. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Proceedings of the 25th International Conference on Neural Information Processing Systems, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, eds. (Curran Associates Inc., 2012), pp. 1097–1105.

25. C. T. Rueden, J. Schindelin, M. C. Hiner, B. E. DeZonia, A. E. Walter, E. T. Arena, and K. W. Eliceiri, “ImageJ2: ImageJ for the next generation of scientific image data,” BMC Bioinf. 18(1), 529 (2017). [CrossRef]  

26. M. Ovesný, P. Křížek, J. Borkovec, Z. Švindrych, and G. M. Hagen, “ThunderSTORM: a comprehensive ImageJ plug-in for PALM and STORM data analysis and super-resolution imaging,” Bioinformatics 30(16), 2389–2390 (2014). [CrossRef]  

27. D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv: 1412.6980 (2014).

28. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: large-scale machine learning on heterogeneous systems,” arXiv: 1603.04467 (2016).

29. Biomedical Imaging Group, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, “Benchmarking of single-molecule localization microscopy software,” http://bigwww.epfl.ch/smlm/.

30. J. Li, D. Chen, and J. Qu, “Efficient image reconstruction of high-density molecules with augmented Lagrangian method in super-resolution microscopy,” Opt. Express 26(19), 24329–24342 (2018). [CrossRef]  

31. J. Min, C. Vonesch, H. Kirshner, L. Carlini, N. Olivier, S. Holden, S. Manley, J. C. Ye, and M. Unser, “FALCON: fast and unbiased reconstruction of high-density super-resolution microscopy data,” Sci. Rep. 4(1), 4577 (2014). [CrossRef]  

Supplementary Material (3)

Visualization 1: Reconstructed movie of live mitochondria sample by the FALCON method.
Visualization 2: Reconstructed movie of live mitochondria sample by the Deep-STORM method.
Visualization 3: Reconstructed movie of live mitochondria sample by the proposed DRL-STORM method.
