We demonstrate that a deep neural network can significantly improve optical microscopy, enhancing its spatial resolution over a large field of view and depth of field. After its training, the only input to this network is an image acquired using a regular optical microscope, without any changes to its design. We blindly tested this deep learning approach using various tissue samples that are imaged with low-resolution and wide-field systems, where the network rapidly outputs an image with better resolution, matching the performance of higher numerical aperture lenses and also significantly surpassing their limited field of view and depth of field. These results are significant for various fields that use microscopy tools, including, e.g., the life sciences, where optical microscopy is considered one of the most widely used and deployed techniques. Beyond such applications, the presented approach might be applicable to other imaging modalities, also spanning different parts of the electromagnetic spectrum, and can be used to design computational imagers that get better as they continue to image specimens and to establish new transformations among different modes of imaging.
© 2017 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
Deep learning is a class of machine learning techniques that uses multilayered artificial neural networks for the automated analysis of signals or data [1,2]. The name comes from the general structure of deep neural networks, which consist of several layers of artificial neurons stacked on top of each other. One type of deep neural network is the deep convolutional neural network (CNN). Typically, an individual layer of a deep convolutional network is composed of a convolutional layer and a nonlinear operator. The kernels (filters) in these convolutional layers are randomly initialized and can then be trained to perform specific tasks using supervised or unsupervised machine learning techniques. CNNs form a rapidly growing research field with various applications in, e.g., image classification [3], annotation [4], style transfer [5], compression [6], and deconvolution in photography [7–10], among others [11–14]. Recently, deep neural networks have also been successfully applied to solve numerous imaging-related problems in, e.g., computed tomography [15], magnetic resonance imaging [16], photoacoustic tomography [17], and phase retrieval [18], among others.
Here, we demonstrate the use of a deep neural network to significantly enhance the performance of an optical microscope without changing its design or hardware. This network uses a single image that is acquired under a standard microscope as input, and quickly outputs an improved image of the same specimen, e.g., in less than 1 s using a laptop, matching the resolution of higher-numerical-aperture (NA) objectives, while at the same time surpassing their limited field of view (FOV) and depth of field (DOF). The first step in this deep-learning-based microscopy framework involves learning the statistical transformation between low-resolution and high-resolution microscopic images, which is used to train a CNN. Normally, this transformation can be physically understood as a spatial convolution operation followed by an under-sampling step (going from a high-resolution and high-magnification microscopic image to a low-resolution and low-magnification one). However, the proposed CNN framework instead focuses on training multiple layers of artificial neural networks to statistically relate low-resolution images (input) to high-resolution images (output) of a specimen. In fact, to train and blindly test this deep-learning-based imaging framework, we chose bright-field microscopy with spatially and temporally incoherent broadband illumination, for which an exact analytical or numerical modeling of the light–sample interaction and the related physical image formation process is challenging, making the relationship between high-resolution images and low-resolution ones significantly more complicated to exactly model or predict. Although bright-field microscopic imaging has been our focus in this paper, the same deep learning framework might be applicable to other microscopy modalities, including, e.g., holography, dark-field, fluorescence, multi-photon, and optical coherence tomography, among others.
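The physical forward model mentioned above (a spatial convolution followed by an under-sampling step) can be sketched as follows; this is an illustrative toy model with a hypothetical box-shaped PSF and 2× under-sampling, not the paper's method, which instead learns the inverse low-to-high-resolution mapping statistically from registered image pairs:

```python
import numpy as np

def degrade(high_res, psf, factor):
    """Toy forward model: blur a high-resolution image with a PSF,
    then under-sample the result to emulate a lower-magnification,
    lower-NA acquisition.  (Illustrative only.)"""
    # circular 2D convolution via FFT; the PSF is zero-padded to image size
    blurred = np.real(np.fft.ifft2(np.fft.fft2(high_res) *
                                   np.fft.fft2(psf, s=high_res.shape)))
    # under-sampling: keep every `factor`-th pixel along each axis
    return blurred[::factor, ::factor]

# hypothetical values: 128x128 image, normalized 4x4 box PSF, 2x sampling
rng = np.random.default_rng(0)
hr = rng.random((128, 128))
lr = degrade(hr, np.ones((4, 4)) / 16.0, 2)   # yields a 64x64 low-res image
```

Since the network is trained directly on registered low- and high-resolution image pairs, no such explicit model is required at inference time.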
Sample Preparation: A de-identified formalin-fixed paraffin-embedded (FFPE) hematoxylin and eosin (H&E)-stained human breast tissue section from a breast cancer patient, Masson’s-trichrome-stained lung tissue sections from two pneumonia patients, and a Masson’s-trichrome-stained kidney tissue section from a moderately advanced diabetic nephropathy patient were obtained from the Translational Pathology Core Laboratory at UCLA. Sample staining was done at the Histology Lab at UCLA. All the samples were obtained after de-identification of the patients and related information and were prepared from existing specimens. Therefore, this work did not interfere with standard practices of care or sample collection procedures.
Microscopic Imaging: Image data acquisition was performed using an Olympus IX83 microscope equipped with a motorized stage and controlled by MetaMorph microscope automation software (Molecular Devices, LLC). The images were acquired using a set of Super Apochromat objectives (UPLSAPO 40X2/0.95NA and UPLSAPO 100XO/1.4NA, oil immersion). The color images were obtained using a QImaging Retiga 4000R camera with a pixel size of 7.4 μm.
To initially train the deep neural network, we acquired microscopy images of Masson’s-trichrome-stained lung tissue sections using a pathology slide, obtained from an anonymous pneumonia patient. The lower-resolution input images were acquired with the 40×/0.95NA objective lens, while the higher-resolution training images were acquired with the 100×/1.4NA oil-immersion objective lens, whose FOV per image is 6.25-fold smaller in area. Both the low-resolution and high-resolution images were acquired with 0.55-NA condenser illumination, and the resulting diffraction-limited resolution was in each case adequately sampled by the image sensor chip. Following a digital registration procedure to match the corresponding FOVs of each set of images (Section 2 in Supplement 1), we generated 179 low-resolution images corresponding to different regions of the lung tissue sample, which were used as input to our network, together with their corresponding high-resolution labels for each FOV. Out of these images, 149 low-resolution input images and their corresponding high-resolution labels were randomly selected as our training image set, 10 low-resolution images and their labels were used for selecting and validating the final network model, and the remaining 20 low-resolution inputs and their labels formed the test images used to blindly quantify the average performance of the final network (see the structural similarity index, SSIM, reported in Table S1 in Supplement 1). This training dataset was further augmented by extracting corresponding low-resolution and high-resolution image patches with 40% overlap, which effectively increased our training data size by more than 6-fold. As shown in Fig.
1(a) and further detailed in Section 1 in Supplement 1, these training image patches were randomly assigned to 149 batches, each containing 64 randomly drawn low- and high-resolution image pairs, forming a total of 9,536 input patches for the network training process (Section 3 and Table S2 in Supplement 1). The pixel count and the number of image patches were empirically determined to allow rapid training of the network, while at the same time retaining distinct sample features in each patch. In this training phase, as further detailed in the supplementary information, we utilized an optimization algorithm to adjust the network’s parameters using the training image set and used the validation image set to determine the best network model, which also helped to avoid overfitting to the training image data.
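The patch-based augmentation described above (overlapping tiles extracted from each registered image) can be sketched as follows; the 60-pixel patch side and the 300×300 field size below are placeholders, with the actual values given in Supplement 1:

```python
import numpy as np

def extract_patches(img, patch, overlap=0.4):
    """Tile an image into square patches of side `patch` with the given
    fractional overlap between neighboring patches, as used here to
    augment the training set.  (Patch sizes are placeholders.)"""
    step = max(1, int(round(patch * (1 - overlap))))  # stride between tiles
    patches = []
    for y in range(0, img.shape[0] - patch + 1, step):
        for x in range(0, img.shape[1] - patch + 1, step):
            patches.append(img[y:y + patch, x:x + patch])
    return np.stack(patches)

# e.g., a hypothetical 300x300 low-resolution field tiled into 60x60 patches
field = np.zeros((300, 300))
p = extract_patches(field, 60)   # 7x7 = 49 overlapping patches
```

The same tiling, with a proportionally larger patch size, would be applied to the registered high-resolution labels so that input and label patches stay co-registered.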
After this training procedure, which needs to be performed only once, the CNN is fixed (Fig. 1(b), Sections 1 and 4 in Supplement 1) and ready to blindly output high-resolution images of samples of any type, i.e., not necessarily of the same tissue type that the CNN was trained on. To demonstrate the success of this deep-learning-enhanced microscopy approach, we first blindly tested the network model on entirely different sections of Masson’s-trichrome-stained lung tissue, which were not used in our training process and were in fact taken from another anonymous patient. These samples were imaged using the same 40× and 100× objective lenses with 0.55-NA condenser illumination, generating various input images for our CNN. The output images of the CNN for these input images are summarized in Fig. 2, which clearly demonstrates the ability of the network to significantly enhance the spatial resolution of the input images, whether they were initially acquired with the 40× or the 100× objective lens. For the network output image shown in Fig. 2(a), we used an input image acquired with the 40× objective lens, and therefore it has a FOV that is 6.25-fold larger than that of the 100× objective lens, which is highlighted with a red box in Fig. 2(a). Zoomed-in regions of interest (ROI) corresponding to various input and output images are also shown in Figs. 2(b)–2(p), better illustrating the fine spatial improvements in the network output images compared to the corresponding input images. To give an example of the computational load of this approach, the smaller-FOV network output images shown in Figs. 2(c), 2(h), and 2(m) each took on average 0.037 s to compute, and even the much larger FOV of Fig. 2(a) was processed in less than 1 s, using a dual graphics processing unit (GPU) configuration running on a laptop computer (see Section 5 and Table S3 in Supplement 1).
In Fig. 2, we also illustrate that “self-feeding” the output of the network as its new input significantly improves the resulting output image, as demonstrated in Figs. 2(d), 2(i), and 2(n). A minor disadvantage of this self-feeding approach is increased computation time, since the network must be run a second time on its own output: on the same laptop computer, Figs. 2(d), 2(i), and 2(n) took longer to compute than the 0.037-s average of Figs. 2(c), 2(h), and 2(m) (see Section 5 and Table S3 in Supplement 1). After one cycle of feeding the network with its own output, subsequent self-feeding cycles do not change the output images in a noticeable manner, as also highlighted in Fig. S6 in Supplement 1.
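Conceptually, self-feeding is just repeated application of the trained network to its own output; the sketch below uses a toy contraction mapping as a stand-in for the CNN to illustrate why repeated cycles quickly stop changing the result:

```python
def self_feed(network, image, cycles=1):
    """Apply an (already trained) network repeatedly, feeding its output
    back in as the new input.  `network` is any callable; here it stands
    in for the CNN's forward pass."""
    for _ in range(cycles):
        image = network(image)
    return image

# toy stand-in with a fixed point at 1.0: each cycle moves the "image"
# closer to the fixed point, with diminishing changes per cycle
toy = lambda x: 0.5 * x + 0.5
out1 = self_feed(toy, 0.0, cycles=1)   # -> 0.5
out2 = self_feed(toy, 0.0, cycles=2)   # -> 0.75
```

This is only an analogy for the observed convergence; the actual CNN operates on images, and its fixed-point behavior is an empirical observation (Fig. S6 in Supplement 1), not a proven property.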
Quite interestingly, when we use the same deep neural network model on input images acquired with the 100× objective lens, the network output also demonstrates significant enhancement of spatial details that appear blurry in the original input images. These results are demonstrated in Figs. 2(f), 2(k), and 2(p) and in Fig. S8 in Supplement 1, revealing that the same learned model (which was trained on the transformation of 40× images into 100× images) can also be used to super-resolve images captured with higher-magnification and higher-numerical-aperture lenses than the input images used for training. This feature suggests that the image transformation that the CNN is trained on (from lower-resolution input images to higher-resolution ones) is scale-invariant.
Next, we blindly applied the same lung-tissue-trained CNN to improve the microscopic images of a Masson’s-trichrome-stained kidney tissue section obtained from an anonymous patient with moderately advanced diabetic nephropathy. The network output images shown in Fig. 3 emphasize several important features of our deep-learning-based microscopy framework. First, this tissue type, although stained with the same dye (Masson’s trichrome), is entirely new to our lung-tissue-trained CNN, and yet the output images clearly show a similarly outstanding performance as in Fig. 2. Second, similar to the results shown in Fig. 2, self-feeding the output of the same lung tissue network back to the network as a fresh input further improves our reconstructed images, even for a kidney tissue that was not part of our training process; see, e.g., Figs. 3(d), 3(i), and 3(n). Third, the output images of our deep learning model also exhibit a significantly larger DOF. To better illustrate this, the output image of the lung-tissue-trained CNN on a kidney tissue section imaged with the 40× objective was compared to an extended-DOF image, obtained from a depth-resolved stack of five images acquired using the 100× objective lens (with 0.4-μm axial increments). To create the gold standard, i.e., the extended-DOF image used for comparison to our network output, we merged these five depth-resolved 100× images using a wavelet-based depth-fusion algorithm [19]. The network output images, shown in Figs. 3(d), 3(i), and 3(n), clearly demonstrate that several spatial features of the sample that appear in focus in the deep network output can otherwise only be inferred by acquiring a depth-resolved stack of 100× objective images, because of the shallow DOF of such high-NA objective lenses; see also the yellow pointers in Figs. 3(n) and 3(p) to better visualize this DOF enhancement.
Stated differently, the network output image not only has a 6.25-fold larger FOV compared to the images of the 100× objective lens, but it also exhibits a significantly enhanced DOF. The same extended-DOF feature of the deep neural network image inference is further demonstrated using the lung tissue samples shown in Figs. 2(n) and 2(o).
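The wavelet-based depth fusion of Ref. [19], used to build the extended-DOF gold standard, can be approximated by a much simpler per-pixel sharpness-selection rule; the sketch below implements this naive alternative (not the algorithm of Ref. [19]) on two synthetic slices that are each “in focus” on one half of the field:

```python
import numpy as np

def fuse_stack(stack):
    """Naive extended-DOF fusion: for each pixel, take the value from
    the slice with the strongest local Laplacian (sharpness) response.
    A simple stand-in for the wavelet-based fusion of Ref. [19]."""
    def laplacian(img):
        # |4*center - 4 neighbors|, with periodic boundaries via np.roll
        return np.abs(4 * img
                      - np.roll(img, 1, 0) - np.roll(img, -1, 0)
                      - np.roll(img, 1, 1) - np.roll(img, -1, 1))
    sharpness = np.stack([laplacian(s) for s in stack])   # (N, H, W)
    best = np.argmax(sharpness, axis=0)                   # sharpest slice per pixel
    rows, cols = np.indices(best.shape)
    return np.stack(stack)[best, rows, cols]

# two synthetic slices: each carries high-contrast stripes on one half only
a = np.zeros((8, 8)); a[:, :4] = np.tile([0.0, 1.0], (8, 2))
b = np.zeros((8, 8)); b[:, 4:] = np.tile([0.0, 1.0], (8, 2))
fused = fuse_stack([a, b])   # keeps the sharp half of each slice
```

Real focus stacks would use a proper multi-scale transform, as in Ref. [19], to avoid the seam and noise artifacts this per-pixel rule can produce.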
Until now, we have focused on bright-field microscopic images of different tissue types, all stained with the same dye (Masson’s trichrome), and used a deep neural network to blindly transform lower-resolution images of these tissue samples into higher-resolution ones, also showing significant enhancement in the FOV and DOF of the output images. Next, we tested whether a CNN that is trained on one type of stain can be applied to tissue types that are stained with another dye. To investigate this, we trained a new CNN model (with the same network architecture) using microscopic images of a hematoxylin and eosin (H&E)-stained human breast tissue section obtained from an anonymous breast cancer patient. As before, the training pairs were created from lower-resolution input images and their corresponding high-resolution labels (see Tables S1 and S2 in Supplement 1 for specific implementation details). First, we blindly tested this trained deep neural network on images of breast tissue samples (which were not part of the network training process) acquired using the 40× objective lens. Figure 4 illustrates the success of this blind testing phase, which is expected since this network was trained on the same type of stain and tissue (i.e., H&E-stained breast tissue). For comparison, in the same Fig. 4 we also report the output images of the previously used deep neural network model (trained using lung tissue sections stained with Masson’s trichrome) for the same input images reported in Fig. 4. Except for a relatively minor color distortion, all the spatial features of the H&E-stained breast tissue sample were resolved using the CNN trained on Masson’s-trichrome-stained lung tissue. These results, together with the earlier ones discussed so far, clearly demonstrate the universality of the deep neural network approach, and how it can be used to output enhanced microscopic images of various types of samples from different patients and organs, imaged using different types of stains.
A similarly outstanding result, with the same conclusion, is provided in Fig. S9 in Supplement 1, where the deep learning network trained on H&E-stained breast tissue images was applied to Masson’s-trichrome-stained lung tissue samples imaged using the 40× objective lens, representing the opposite case of Fig. 4. To mitigate possible color distortions when inferring images that are stained differently compared to the training image set, one can train a universal network with various types of samples, as demonstrated in, e.g., Ref. [18] for holography and phase recovery. Such an approach would, however, increase the number of feature maps and the overall complexity of the network.
To quantify the effect of our deep neural network on the spatial frequencies of the output image, we applied the CNN that was trained using the lung tissue model to a resolution test target, which was imaged using the 100×/1.4NA objective lens with the 0.55-NA condenser. The objective lens was oil-immersed, as depicted in Fig. 5(a), while the interface between the resolution test target and the sample cover glass was not oil-immersed, reducing the effective NA of the system and correspondingly coarsening its lateral diffraction-limited resolution. The modulation transfer function (MTF) was evaluated by calculating the contrast of different elements of the resolution test target (Section 6 in Supplement 1). Based on this experimental analysis, the MTFs for the input image and the output image of the deep neural network that was trained on lung tissue are compared in Fig. 5(e) and Table S4 in Supplement 1. The output image of the deep neural network, despite the fact that the network was trained only on tissue samples, shows an increased modulation contrast over a significant portion of the spatial frequency spectrum, especially at high frequencies, while also resolving a period of 0.345 μm (Table S4 in Supplement 1).
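Per resolution-target element, the contrast measurement underlying this MTF analysis reduces to a Michelson contrast of the line profile; the sketch below shows that single-element computation on hypothetical profiles, omitting the per-element averaging and spatial-frequency bookkeeping of the actual analysis (Section 6 in Supplement 1):

```python
import numpy as np

def modulation_contrast(profile):
    """Michelson contrast of a line profile across one resolution-target
    element: (Imax - Imin) / (Imax + Imin).  Evaluating this per element
    (i.e., per spatial frequency) traces out the MTF."""
    i_max, i_min = float(np.max(profile)), float(np.min(profile))
    return (i_max - i_min) / (i_max + i_min)

# hypothetical profiles: a sharper (output-like) line pattern retains
# more modulation than a blurred (input-like) one at the same frequency
sharp = np.array([0.1, 0.9, 0.1, 0.9, 0.1])
blurred = np.array([0.4, 0.6, 0.4, 0.6, 0.4])
c_sharp = modulation_contrast(sharp)     # -> 0.8
c_blur = modulation_contrast(blurred)    # -> 0.2
```

In practice the profiles would be extracted from the imaged target elements, and the contrast of the network output would be compared against that of the raw input at each element, as in Fig. 5(e).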
To conclude, we have demonstrated that deep learning can significantly enhance optical microscopy images by improving their resolution, FOV, and DOF. This deep learning approach outputs an improved image extremely quickly, e.g., in less than 1 s per image even using a laptop computer, and needs only a single image taken with a standard optical microscope, without extra hardware or user-specified post-processing. After appropriate training, this framework and its derivatives might be applicable to other forms of optical microscopy and imaging techniques, and can be used to transform images acquired with low-resolution systems into high-resolution and wide-field images, significantly extending the space–bandwidth product of the output images. Furthermore, using the same deep learning approach, we have also demonstrated an extension of the spatial frequency response of the imaging system along with an extended DOF. In addition to optical microscopy, this entire framework can also be applied to other computational imaging approaches, also spanning different parts of the electromagnetic spectrum, and can be used to design computational imagers with improved resolution, FOV, and DOF.
Presidential Early Career Award for Scientists and Engineers (PECASE); Army Research Office (ARO) (W911NF-13-1-0419, W911NF-13-1-0197); ARO Life Sciences Division; National Science Foundation (NSF) (0963183); Division of Chemical, Bioengineering, Environmental, and Transport Systems (CBET) Division Biophotonics Program; Division of Emerging Frontiers in Research and Innovation (EFRI), NSF EAGER Award, NSF INSPIRE Award, NSF Partnerships for Innovation: Building Innovation Capacity (PFI:BIC) Program; Office of Naval Research (ONR); National Institutes of Health (NIH); Howard Hughes Medical Institute (HHMI); Vodafone Foundation; Mary Kay Foundation (TMKF); Steven & Alexandra Cohen Foundation; King Abdullah University of Science and Technology (KAUST); American Recovery and Reinvestment Act of 2009 (ARRA). European Union’s Horizon 2020 Framework Programme (H2020); H2020 Marie Skłodowska-Curie Actions (MSCA) (H2020-MSCA-IF-2014-65959).
See Supplement 1 for supporting content.
1. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521, 436–444 (2015). [CrossRef]
2. J. Schmidhuber, “Deep learning in neural networks: an overview,” Neural Netw. 61, 85–117 (2015). [CrossRef]
3. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, eds. (Curran Associates, Inc., 2012), pp. 1097–1105.
4. V. N. Murthy, S. Maji, and R. Manmatha, “Automatic image annotation using deep learning representations,” in 5th ACM on International Conference on Multimedia Retrieval, ICMR‘15 (ACM, 2015), pp. 603–606.
5. L. A. Gatys, A. S. Ecker, and M. Bethge, “Image style transfer using convolutional neural networks,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 2414–2423.
6. C. Dong, Y. Deng, C. Change Loy, and X. Tang, “Compression artifacts reduction by a deep convolutional network,” in IEEE International Conference on Computer Vision (ICCV) (2015), pp. 576–584.
7. J. Kim, J. Kwon Lee, and K. Mu Lee, “Accurate image super-resolution using very deep convolutional networks,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 1646–1654.
8. C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell. 38, 295–307 (2016). [CrossRef]
9. W. Shi, J. Caballero, F. Huszar, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 1874–1883.
10. M. Gharbi, J. Chen, J. T. Barron, S. W. Hasinoff, and F. Durand, “Deep bilateral learning for real-time image enhancement,” ACM Trans. Graph. 36, 118 (2017). [CrossRef]
11. V. Gulshan, L. Peng, M. Coram, M. C. Stumpe, D. Wu, A. Narayanaswamy, S. Venugopalan, K. Widner, T. Madams, J. Cuadros, R. Kim, R. Raman, P. C. Nelson, J. L. Mega, and D. R. Webster, “Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs,” JAMA, J. Am. Med. Assoc. 316, 2402–2410 (2016). [CrossRef]
12. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature 529, 484–489 (2016). [CrossRef]
13. N. Jean, M. Burke, M. Xie, W. M. Davis, D. B. Lobell, and S. Ermon, “Combining satellite imagery and machine learning to predict poverty,” Science 353, 790–794 (2016). [CrossRef]
14. A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, “Dermatologist-level classification of skin cancer with deep neural networks,” Nature 542, 115–118 (2017). [CrossRef]
15. K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process. 26, 4509–4522 (2017).
16. S. Wang, Z. Su, L. Ying, X. Peng, S. Zhu, F. Liang, D. Feng, and D. Liang, “Accelerating magnetic resonance imaging via deep learning,” in IEEE 13th International Symposium on Biomedical Imaging (ISBI) (2016), pp. 514–517.
17. S. Antholzer, M. Haltmeier, and J. Schwab, “Deep learning for photoacoustic tomography from sparse data,” arXiv preprint arXiv:1704.04587 (2017).
18. Y. Rivenson, Y. Zhang, H. Gunaydin, D. Teng, and A. Ozcan, “Phase recovery and holographic image reconstruction using deep learning in neural networks,” Light: Sci. Appl. 7, e17141 (2018). [CrossRef]
19. B. Forster, D. Van De Ville, J. Berent, D. Sage, and M. Unser, “Complex wavelets for extended depth-of-field: a new method for the fusion of multichannel microscopy images,” Microsc. Res. Tech. 65, 33–42 (2004). [CrossRef]