Super-resolution image reconstruction has become a hot topic with the development of deep learning methods, which have been applied to medical images and have shown great potential. The commonly used simple and uniform bicubic down-sampling cannot reflect the actual degradation of OCT images. We propose a more realistic approach to generating low-resolution OCT images for training deep neural networks: OCT images with high and low resolutions are obtained by multiplying the interference spectrum by two Gaussian windows of different spectral widths. Three classical deep learning networks are trained to super-resolve the OCT images, and the preliminary results demonstrate their effectiveness. Super-resolution of these more realistic low-resolution images is of significance for improving the resolution of OCT systems in practice.
© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
As a non-destructive, high-resolution imaging technology, optical coherence tomography (OCT) has developed rapidly since it was proposed in 1991 [1]. Many physical methods have been used to enhance OCT resolution, such as micro-OCT [2], synthetic aperture [3], structured illumination for lateral super-resolution [4], and multi-frame super-resolution [5]. To date, axial resolution at the micron level has been achieved with the ultra-broad spectra of Ti:sapphire lasers [6] or supercontinuum (SC) light sources [7–8]. However, limited by cost and the available technology, it is difficult to keep improving the resolution substantially by physical methods alone.
Benefiting from increased computing power, digital signal and image processing methods have also been used to improve the resolution of OCT images. In 2013, L. Fang et al. [9] introduced sparsity-based simultaneous denoising and interpolation (SBSDI), which used coupled orthogonal matching pursuit to construct sparse representation dictionaries from previously collected datasets and reconstruct retinal OCT images. In 2017, they further proposed a segmentation-based sparse reconstruction (SSR) framework, which used automatically segmented retinal layer information to build layer-specific structural dictionaries for retinal OCT image reconstruction [10]. In 2018, A. Abbasi et al. [11] proposed a method based on nonlocal weighted sparse representation, which combined multiple sparse representations of similar noisy and denoised patches to better estimate the sparse representation of each patch. However, sparsity-based methods may over-smooth the reconstructed high-resolution (HR) images, causing sharp boundaries to be lost.
Deep learning has strong learning ability and has also been applied to image super-resolution. In 2014, C. Dong et al. [12] proposed the first image super-resolution method based on a convolutional neural network (CNN). Since then, many deep-learning super-resolution networks have been proposed, such as the very deep convolutional network (VDSR) [13], the enhanced deep residual network (EDSR) [14], super-resolution using a generative adversarial network (SRGAN) [15], blind super-resolution with iterative kernel correction (IKC) [16], and deep plug-and-play super-resolution for arbitrary blur kernels (DPSR) [17]. Deep-learning super-resolution methods have also been studied for medical images such as MRI [18–19] and microscopy [20].
Y. Huang et al. [21] were the first to apply deep-learning super-resolution to OCT images, simultaneously denoising and super-resolving low-resolution (LR) retinal OCT images; their method outperformed the sparsity-based methods. They obtained HR images by registering and averaging several B-scans acquired at the same position, and generated LR images by bicubically down-sampling the noisy patches of each patch pair by factors of 2×, 4×, and 8×. In 2020, V. Das et al. [22] introduced an unsupervised OCT image super-resolution method, which used two generators and two discriminators and did not require one-to-one alignment between the LR and HR images during training. Their LR images were also generated synthetically, by bicubic down-sampling of the noisy images by factors of 2× and 4×.
Both studies used public retinal OCT datasets, and both generated LR data by bicubic down-sampling, which does not follow the imaging principle of OCT. To better match real OCT imaging, in this study we degrade the OCT axial resolution in a way that mimics OCT acquisition: the interference spectrum is multiplied by a window whose spectral width is narrower than that of the light source. Three super-resolution networks, the deep residual network for super-resolution (SRResNet) [15], the residual-in-residual dense block network (RRDBNet) [23], and super-resolution using a generative adversarial network (SRGAN) [15], are then used to improve the resolution of the OCT images.
2.1 Collection of OCT images
OCT signals are obtained by low-coherence interferometry, which collects light reflected from different depths of the sample. B-scan images from an OCT system show the morphological features of tissue microstructure, and their theoretical axial resolution is determined mainly by the center wavelength and the full width at half maximum (FWHM) of the light source spectrum. For a Gaussian source spectrum, the theoretical axial resolution of an OCT system is [24]

$$\Delta z = \frac{2\ln 2}{\pi}\,\frac{\lambda_0^{2}}{\Delta\lambda},$$

where $\lambda_0$ is the center wavelength and $\Delta\lambda$ is the spectral FWHM.
In our study, HR and LR images were obtained with a home-made spectral-domain OCT (SD-OCT) system [25], whose light source is a superluminescent diode (SLD) (BLM2-D, Superlum) with a center wavelength of 840 nm and an FWHM of 100 nm. Its axial resolution in air and transverse resolution in the focal plane are 3.4 µm and 13 µm, respectively.
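As a quick numerical check (our own sketch, not the authors' code), the Gaussian-spectrum formula Δz = (2 ln 2/π)·λ0²/Δλ can be evaluated for the 840 nm center wavelength and the spectral widths used in this work. It assumes an ideal Gaussian spectrum in air, so for the full 100 nm source it gives about 3.1 µm, somewhat below the stated 3.4 µm, and for the 64 nm and 17 nm spectra it gives the roughly 5 µm and 18 µm quoted for the HR and LR images.

```python
import math

def axial_resolution_um(center_nm: float, fwhm_nm: float) -> float:
    """Theoretical OCT axial resolution (FWHM, in air) for a Gaussian spectrum.

    delta_z = (2 ln 2 / pi) * lambda0**2 / delta_lambda
    Inputs are in nanometres; the result is in micrometres.
    """
    dz_nm = (2 * math.log(2) / math.pi) * center_nm ** 2 / fwhm_nm
    return dz_nm / 1000.0

# Full source, HR spectrum, LR spectrum (widths taken from the text)
for fwhm in (100, 64, 17):
    print(f"FWHM {fwhm:>3} nm -> {axial_resolution_um(840, fwhm):.1f} um")
```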
After the OCT signal of the sample is collected, the original interference spectrum is multiplied by two Gaussian functions of different widths to obtain HR and LR signals with spectral FWHMs of 64 nm and 17 nm, respectively. HR and LR images are then obtained by inverse Fourier transform, with axial resolutions of about 5 µm and 18 µm, respectively. The HR images serve as the ground truth. An advantage of this degradation method is that HR and LR image pairs of the same size are computed directly from the same acquisition, which avoids the pixel registration that would be needed if they were acquired separately.
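The spectral-windowing degradation can be illustrated on a simulated single reflector. This is a hypothetical sketch under stated assumptions, not the authors' processing code: the spectrum is assumed to be already resampled linearly in wavenumber k, the source and both windows are modelled as Gaussians in k (using the first-order conversion Δk ≈ 2πΔλ/λ0²), and the 64 nm and 17 nm figures are taken to be the FWHMs of the windowed spectra, so the window width follows from the Gaussian product rule 1/Δλ_t² = 1/Δλ_s² + 1/Δλ_w². The helper names (`window_fwhm`, `windowed_psf_fwhm`, and so on) are illustrative only.

```python
import numpy as np

LAMBDA0 = 840e-9      # center wavelength (m), from the text
SOURCE_FWHM = 100e-9  # source spectral FWHM (m), from the text

def dk_from_dlambda(dlam, lam0=LAMBDA0):
    """First-order conversion of a wavelength FWHM to a wavenumber FWHM."""
    return 2 * np.pi * dlam / lam0 ** 2

def gaussian(k, k0, fwhm):
    """Unit-peak Gaussian parameterised by its FWHM."""
    return np.exp(-4 * np.log(2) * ((k - k0) / fwhm) ** 2)

def window_fwhm(target, source=SOURCE_FWHM):
    """Assumed: window FWHM such that source x window has FWHM `target`."""
    return 1.0 / np.sqrt(1.0 / target ** 2 - 1.0 / source ** 2)

def fwhm(x, y):
    """FWHM of a single peak, linearly interpolating the half-max crossings."""
    p = int(np.argmax(y))
    half = y[p] / 2
    l, r = p, p
    while y[l] > half:
        l -= 1
    while y[r] > half:
        r += 1
    xl = np.interp(half, [y[l], y[l + 1]], [x[l], x[l + 1]])
    xr = np.interp(half, [y[r], y[r - 1]], [x[r], x[r - 1]])
    return xr - xl

def windowed_psf_fwhm(target_fwhm, n=8192, z0=200e-6):
    """Axial PSF FWHM (m) of one reflector at depth z0 after spectral windowing."""
    k0 = 2 * np.pi / LAMBDA0
    span = 10 * dk_from_dlambda(SOURCE_FWHM)        # sampled wavenumber range
    k = k0 + span * (np.arange(n) / n - 0.5)        # linear-in-k sampling
    fringes = gaussian(k, k0, dk_from_dlambda(SOURCE_FWHM)) * np.cos(2 * k * z0)
    window = gaussian(k, k0, dk_from_dlambda(window_fwhm(target_fwhm)))
    a_line = np.abs(np.fft.ifft(fringes * window))[: n // 2]  # positive depths
    z = np.pi * np.arange(n // 2) / span            # depth of each FFT bin (m)
    return fwhm(z, a_line)

for target in (64e-9, 17e-9):
    print(f"{target * 1e9:.0f} nm -> {windowed_psf_fwhm(target) * 1e6:.1f} um")
```

Running it should print axial PSF widths close to the theoretical values of about 4.9 µm and 18.3 µm. Applying the same two windows to every A-line spectrum of a B-scan yields HR and LR images on an identical pixel grid, which is the registration-free property noted above.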
2.2 Generation of dataset
Zebrafish skin is rich in microstructural detail, which makes it well suited for evaluating image super-resolution. In our study, zebrafish older than 90 days were scanned. As described in Ref. [25], the zebrafish were anesthetized by successive immersion in 0.024% and 0.012% tricaine solutions. They were then scanned with the SD-OCT system, and HR and LR OCT images were obtained as described in Section 2.1.
We collected 25 pairs of HR and LR OCT images of zebrafish, each 1000 × 2048 pixels (width × height), and extracted from each image a 1000 × 1004-pixel region containing useful information. The training and test datasets consist of 20 and 5 pairs of HR and LR OCT images, respectively.
For comparison, we also generated bicubic LR images by 4× and 8× bicubic interpolation of the HR images in the axial direction. Bicubic interpolation is widely used for both up-sampling and down-sampling in super-resolution studies; for consistency with that literature, we refer to this degradation as bicubic down-sampling. To reduce the axial resolution, each image was first resized by bicubic interpolation with a lateral scale of 1 and an axial scale of 1/4 or 1/8, and then resized back to its original size, again by bicubic interpolation, with a lateral scale of 1 and an axial scale of 4 or 8. This yields OCT images whose axial resolution is reduced by bicubic interpolation (4× and 8×).
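The two-step, axial-only resize can be sketched in NumPy as follows. The text does not say which bicubic implementation was used, so this sketch assumes the conventions of MATLAB's `imresize` (Keys cubic kernel with a = −0.5, kernel stretched for anti-aliasing when shrinking, mirrored borders); the intermediate height is rounded up when the row count is not divisible by the factor (for example 1004 rows with the 8× factor). The function names are hypothetical.

```python
import numpy as np

def cubic_kernel(x, a=-0.5):
    """Keys cubic convolution kernel (a = -0.5, as in MATLAB's imresize)."""
    ax = np.abs(x)
    near = (a + 2) * ax**3 - (a + 3) * ax**2 + 1
    far = a * ax**3 - 5 * a * ax**2 + 8 * a * ax - 4 * a
    return np.where(ax <= 1, near, np.where(ax < 2, far, 0.0))

def resize_axial(img, out_rows):
    """Bicubic resize of a 2-D image along axis 0 (axial) only."""
    in_rows = img.shape[0]
    scale = out_rows / in_rows
    kscale = min(scale, 1.0)               # stretch kernel when shrinking (anti-aliasing)
    width = 4.0 / kscale
    u = np.arange(1, out_rows + 1)         # 1-based output coordinates
    x = u / scale + 0.5 * (1 - 1 / scale)  # matching 1-based input coordinates
    left = np.floor(x - width / 2)
    taps = int(np.ceil(width)) + 2
    idx = left[:, None] + np.arange(taps)[None, :]
    w = kscale * cubic_kernel(kscale * (x[:, None] - idx))
    w /= w.sum(axis=1, keepdims=True)      # normalise weights per output row
    # Mirror (symmetric) padding at the top and bottom borders
    mirror = np.concatenate([np.arange(in_rows), np.arange(in_rows)[::-1]])
    src = mirror[np.mod(idx.astype(int) - 1, 2 * in_rows)]
    return (w[:, :, None] * img[src]).sum(axis=1)

def bicubic_axial_degrade(img, factor):
    """Shrink the axial axis by `factor`, then resize back to the original height."""
    small = resize_axial(img, int(np.ceil(img.shape[0] / factor)))
    return resize_axial(small, img.shape[0])
```

Because only axis 0 is resampled, the lateral scale is exactly 1, matching the procedure above; a library routine such as MATLAB's `imresize` with a `[rows cols]` output size would play the same role.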
The datasets were expanded by rotation and mirror flipping. Each full-size image was then cropped into 64 × 64-pixel patches, yielding 2566 and 439 patch pairs for the training and test datasets, respectively.
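A minimal sketch of paired augmentation and patch extraction is given below, assuming rotations by multiples of 90°, a horizontal mirror flip, and non-overlapping 64 × 64 tiles. The text does not specify the rotation angles, the crop stride, or how augmentation was split between training and test data, so the patch counts this produces will not necessarily match 2566 and 439. The essential point is that every transform and crop is applied identically to the HR and LR images, so the pairs stay pixel-aligned. The function names are hypothetical.

```python
import numpy as np

def augment_pair(hr, lr, ks=(0, 1, 2, 3)):
    """Yield rotated (k x 90 deg) and mirrored copies of an aligned HR/LR pair."""
    for k in ks:
        h, l = np.rot90(hr, k), np.rot90(lr, k)
        yield h, l
        yield np.fliplr(h), np.fliplr(l)   # horizontal mirror flip

def tile_pair(hr, lr, size=64):
    """Cut an aligned pair into non-overlapping size x size patches."""
    for i in range(0, hr.shape[0] - size + 1, size):
        for j in range(0, hr.shape[1] - size + 1, size):
            yield hr[i:i + size, j:j + size], lr[i:i + size, j:j + size]

def make_patches(pairs, size=64, ks=(0, 1, 2, 3)):
    """Augment each full-size pair, then tile it; HR and LR always share a transform."""
    return [patch
            for hr, lr in pairs
            for ah, al in augment_pair(hr, lr, ks)
            for patch in tile_pair(ah, al, size)]
```

A 90° rotation swaps the axial and lateral axes of a patch; passing `ks=(0, 2)` keeps the degradation along the vertical axis in every patch, which may matter when only axial super-resolution is being learned.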
Three classical super-resolution networks, SRResNet, RRDBNet, and SRGAN, were trained to produce super-resolution images. Because the LR and HR images in each pair have the same size, the up-sampling modules of these networks were removed; all other parameters followed the original networks. All networks were trained with the Adam optimizer. The learning rate was initialized to 10⁻⁴ and decayed as the number of iterations increased. The models were trained with a batch size of 16 for 400,000 iterations. All work was implemented in PyTorch and run on an NVIDIA TITAN RTX GPU with 24 GB of memory.
The images obtained with the different degradation methods are shown in Fig. 1. Figure 1(a) is the HR image. Figures 1(b) and 1(c) are LR images obtained by 4× and 8× bicubic down-sampling, respectively. Figure 1(d) is the LR image obtained by the actual OCT degradation. Zoomed views of the regions marked by red rectangles are shown below each image for better visualization. As shown in Fig. 1(b), after 4× bicubic down-sampling the OCT image becomes slightly blurred, but the thin white lines can still be clearly distinguished. After 8× bicubic down-sampling, these white lines become more blurred [Fig. 1(c)]. Although the actual OCT degradation reduces the axial resolution only from 5 µm to 18 µm, a factor of about 3.6 that is smaller than either bicubic factor, the white lines in Fig. 1(d) are severely widened in the axial direction compared with Figs. 1(b) and 1(c), making them difficult to distinguish.
The super-resolution results of SRGAN, SRResNet, and RRDBNet are shown in Fig. 2, with zoomed views of the regions marked by yellow and red rectangles shown at the bottom of each image for better visualization. Figure 2(a) is the HR image. To compare the three degradation methods, the profiles of the A-lines marked by red dashed lines in Figs. 2(a), 2(c1), 2(d1), and 2(e1) are plotted in Fig. 2(b). As shown in Fig. 2(b), the red curve (4× bicubic) and the green curve (8× bicubic) stay close to the blue curve (HR image) in the axial direction, whereas the purple curve (actual OCT degradation) deviates greatly from it. This explains why the images produced by the actual OCT degradation look more different from the HR images than the bicubic ones do. Thus, even though its nominal resolution loss (5 µm to 18 µm, about 3.6×) is smaller than either bicubic factor, Fig. 2(b) shows that the actual OCT degradation loses more information than 4× and 8× bicubic interpolation.
Figures 2(c1)–(c4) and 2(d1)–(d4) show the LR images produced by 4× and 8× bicubic interpolation and their reconstructions by SRGAN, SRResNet, and RRDBNet, respectively. Figures 2(e1)–(e4) show the LR image produced by the actual OCT degradation and its reconstructions by SRGAN, SRResNet, and RRDBNet, respectively. The results of SRGAN [Fig. 2(c2)], SRResNet [Fig. 2(c3)], and RRDBNet [Fig. 2(c4)] on the 4× bicubic dataset are all good. As indicated by the yellow arrows in Figs. 2(c3) and 2(c4), the trajectory and thickness of the reconstructed zebrafish scales are very close to those in the HR image [Fig. 2(a)], because the LR image [Fig. 2(c1)] has not lost much detail relative to the HR image. As shown in Figs. 2(d2), 2(d3), and 2(d4), the three networks also perform well on the 8× bicubic LR image, and the trajectory and thickness of the scales are again reconstructed well. For the LR image generated by the actual OCT degradation, SRGAN [Fig. 2(e2)] gives the best reconstruction of the three networks, recovering most of the micro-texture of the scales. The images from SRResNet [Fig. 2(e3)] and RRDBNet [Fig. 2(e4)] are better than the LR image [Fig. 2(e1)], but they do not fully reconstruct the zebrafish scales.
Three commonly used metrics, peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and mean opinion score (MOS), were used to evaluate the super-resolution performance quantitatively.
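PSNR is based on the error between corresponding pixels of two images and is defined through the mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\bigl[I(i,j)-K(i,j)\bigr]^{2},\qquad \mathrm{PSNR} = 10\log_{10}\frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}},$$

where $I$ and $K$ are the reference (HR) and evaluated images of $m \times n$ pixels, and $\mathrm{MAX}_I$ is the maximum possible pixel value (255 for 8-bit images).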
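A direct NumPy transcription of MSE (mean of squared pixel differences) and PSNR = 10·log10(MAX²/MSE) is shown below. It assumes 8-bit images by default (`data_range=255`); for floating-point OCT data the actual dynamic range should be passed instead. This is a generic implementation, not necessarily the one used to produce Table 1.

```python
import numpy as np

def mse(ref, test):
    """Mean squared error between two images of equal shape."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    return float(np.mean((ref - test) ** 2))

def psnr(ref, test, data_range=255.0):
    """Peak signal-to-noise ratio in dB; data_range is MAX_I (assumed 8-bit)."""
    err = mse(ref, test)
    if err == 0:
        return float("inf")   # identical images
    return float(10 * np.log10(data_range ** 2 / err))
```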
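SSIM compares two images $x$ and $y$ in terms of luminance ($l$), contrast ($c$), and structure ($s$):

$$\mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha}\,[c(x,y)]^{\beta}\,[s(x,y)]^{\gamma},$$
$$l(x,y)=\frac{2\mu_x\mu_y+C_1}{\mu_x^{2}+\mu_y^{2}+C_1},\quad c(x,y)=\frac{2\sigma_x\sigma_y+C_2}{\sigma_x^{2}+\sigma_y^{2}+C_2},\quad s(x,y)=\frac{\sigma_{xy}+C_3}{\sigma_x\sigma_y+C_3},$$

where $\mu_x$, $\mu_y$ are the local means, $\sigma_x$, $\sigma_y$ the local standard deviations, $\sigma_{xy}$ the local cross-covariance, and $C_1$, $C_2$, $C_3$ small constants that stabilize the divisions.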
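The sketch below computes SSIM with the settings commonly used since Wang et al.: an 11 × 11 Gaussian window with σ = 1.5, K1 = 0.01, K2 = 0.03, α = β = γ = 1 and C3 = C2/2, under which the three terms collapse to ((2μxμy + C1)(2σxy + C2))/((μx² + μy² + C1)(σx² + σy² + C2)). The text does not report its SSIM parameters, so these defaults are an assumption; the function names are illustrative, and libraries such as scikit-image offer optimized equivalents.

```python
import numpy as np

def gaussian_window(size=11, sigma=1.5):
    """Normalised 1-D Gaussian; the 2-D window is its separable outer product."""
    x = np.arange(size) - (size - 1) / 2
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    return g / g.sum()

def filter2_valid(img, g):
    """Separable 'valid' Gaussian filtering along both axes (local weighted means)."""
    rows = np.apply_along_axis(lambda r: np.convolve(r, g, mode="valid"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode="valid"), 0, rows)

def ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Mean SSIM over all window positions (alpha = beta = gamma = 1, C3 = C2/2)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    g = gaussian_window()
    mu_x, mu_y = filter2_valid(x, g), filter2_valid(y, g)
    sxx = filter2_valid(x * x, g) - mu_x ** 2      # local variance of x
    syy = filter2_valid(y * y, g) - mu_y ** 2      # local variance of y
    sxy = filter2_valid(x * y, g) - mu_x * mu_y    # local cross-covariance
    num = (2 * mu_x * mu_y + c1) * (2 * sxy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sxx + syy + c2)
    return float(np.mean(num / den))
```

Images must be at least 11 × 11 pixels for the 'valid' filtering; the 64 × 64 patches used here satisfy that.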
MOS is a subjective measure obtained from observers' evaluations of image quality. We performed a mean-opinion-score test to quantify the visual quality of the different methods: 20 volunteers compared the output images of the different networks with the HR images and assigned each a score from 1 (bad) to 5 (excellent). The scores were averaged to give the final result.
The mean values of these metrics on the test dataset are listed in Table 1 (PSNR in dB). For the 4× and 8× bicubic datasets, SRResNet achieves the highest mean PSNR and SSIM. However, the improvement over the LR images is very small, because those LR images differ little from the HR images. For the dataset generated by the actual OCT degradation, RRDBNet has the highest mean PSNR and SRResNet the highest mean SSIM.
In each row, SRGAN has the lowest PSNR and SSIM of the three networks. Although SRResNet and RRDBNet have higher PSNR and SSIM than SRGAN for the actual OCT degradation, their super-resolution images are over-smoothed, fail to reconstruct the details, and look visually poor. SRGAN gives the best visual quality among these methods.
In each column, the LR images of the actual OCT degradation have the worst PSNR and SSIM, meaning they differ most from the HR images, and consequently all three networks also have their lowest PSNR and SSIM on this dataset. However, relative to the corresponding LR images, the PSNR and SSIM improve most markedly for the actual OCT degradation.
Table 1 also shows that, for the 4× and 8× bicubic datasets, the PSNR and SSIM of SRGAN are lower than those of the unprocessed LR images. We believe the main reason is that SRGAN is trained with a combination of content loss and adversarial loss, and the competition between the MSE-based content loss and the adversarial loss can reduce the PSNR and SSIM of its outputs. In addition, we found that PSNR and SSIM do not always faithfully reflect improvements in visual quality; high PSNR and SSIM do not necessarily mean better visual performance.
The subjective MOS is therefore often used as a complementary measure of super-resolution performance. As shown in Table 1, SRGAN achieves the highest MOS for all kinds of super-resolution images.
4. Discussion and conclusion
Based on the actual resolution degradation in OCT imaging, we proposed a more realistic approach to generating low-resolution OCT images for training deep neural networks. We showed that the resolution of such LR OCT images can be improved by deep learning, and that the improvement over the LR input is larger than for the 4× and 8× bicubic datasets.
Many high-performance network structures have been proposed in previous super-resolution research to obtain better results, but their down-sampling is almost always predefined as simple and uniform bicubic interpolation, which differs considerably from the degradation in OCT imaging. The qualitative and quantitative comparisons above show that, for the actual OCT degradation, super-resolution is more challenging and training more difficult than for simple bicubic interpolation. We therefore believe that a more realistic degradation model is of great significance for practical OCT imaging, as it determines the super-resolution performance actually achieved on OCT images.
In the past two years, new networks such as IKC [16], DPSR [17], and USRNet [26] have been proposed for image super-resolution and have achieved excellent performance. In addition, the discriminator of a generative adversarial network [27] can help the generator recover realistic textures. In future work we will compare these networks and select the best ones for OCT image super-resolution. We will also use larger and richer datasets to study OCT image super-resolution and improve the robustness of the model.
Another important scenario for super-resolution is in vivo clinical application, such as surgical guidance, where low-resolution acquisition is preferred to minimize motion artefacts and save time. We will further investigate the applicability of this approach to in vivo OCT images acquired with our handheld probe [28].
The axial and lateral resolutions of an OCT system are determined by different factors, which allows them to be super-resolved separately. Only axial super-resolution is studied here. In practice, lateral super-resolution is even more important, because it is key to obtaining both high lateral resolution and a large depth of focus. Our ultimate goal is to improve the resolution in both directions, so that existing OCT systems can reach higher resolution by digital methods at the lowest cost.
National Natural Science Foundation of China (61875092); Science and Technology Support Program of Tianjin (17YFZCSY00740); the Beijing-Tianjin-Hebei Basic Research Cooperation Special Program (19JCZDJC65300); and the Fundamental Research Funds for the Central Universities (63201178).
The authors declare no conflicts of interest.
1. D. Huang, E. A. Swanson, C. P. Lin, J. S. Schuman, W. G. Stinson, W. Chang, M. R. Hee, T. Flotte, K. Gregory, C. A. Puliafito, and J. G. Fujimoto, “Optical coherence tomography,” Science 254(5035), 1178–1181 (1991).
2. L. Liu, J. A. Gardecki, S. K. Nadkarni, J. D. Toussaint, Y. Yagi, B. E. Bouma, and G. J. Tearney, “Imaging the subcellular structure of human coronary atherosclerosis using micro–optical coherence tomography,” Nat. Med. 17(8), 1010–1014 (2011).
3. T. S. Ralston, D. L. Marks, P. S. Carney, and S. A. Boppart, “Interferometric synthetic aperture microscopy,” Nat. Phys. 3(2), 129–134 (2007).
4. J. Yi, Q. Wei, H. F. Zhang, and V. Backman, “Structured interference optical coherence tomography,” Opt. Lett. 37(15), 3048–3050 (2012).
5. K. Shen, H. Lu, S. Baig, and M. R. Wang, “Improving lateral resolution and image quality of optical coherence tomography by the multi-frame superresolution technique for 3D tissue imaging,” Biomed. Opt. Express 8(11), 4887–4918 (2017).
6. J. Xi, A. Zhang, Z. Liu, W. Liang, L. Y. Lin, S. Yu, and X. D. Li, “Diffractive catheter for ultrahigh-resolution spectral-domain volumetric OCT imaging,” Opt. Lett. 39(7), 2016–2019 (2014).
7. W. Yuan, J. Mavadia-Shukla, J. Xi, W. Liang, X. Yu, S. Yu, and X. D. Li, “Optimal operational conditions for supercontinuum-based ultrahigh-resolution endoscopic OCT imaging,” Opt. Lett. 41(2), 250–253 (2016).
8. M. Maria, I. B. Gonzalo, T. Feuchter, M. Denninger, P. M. Moselund, L. Leick, O. Bang, and A. Podoleanu, “Q-switch-pumped supercontinuum for ultra-high resolution optical coherence tomography,” Opt. Lett. 42(22), 4744–4747 (2017).
9. L. Fang, S. Li, R. P. McNabb, Q. Nie, A. N. Kuo, C. A. Toth, J. A. Izatt, and S. Farsiu, “Fast acquisition and reconstruction of optical coherence tomography images via sparse representation,” IEEE Trans. Med. Imaging 32(11), 2034–2049 (2013).
10. L. Fang, S. Li, D. Cunefare, and S. Farsiu, “Segmentation based sparse reconstruction of optical coherence tomography images,” IEEE Trans. Med. Imaging 36(2), 407–421 (2017).
11. A. Abbasi, A. Monadjemi, L. Fang, and H. Rabbani, “Optical coherence tomography retinal image reconstruction via nonlocal weighted sparse representation,” J. Biomed. Opt. 23(3), 036011 (2018).
12. C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in Proceedings of the European Conference on Computer Vision (ECCV), (2014), pp. 184–199.
13. J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), pp. 1646–1654.
14. B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, “Enhanced deep residual networks for single image super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), (2017), pp. 136–144.
15. C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), pp. 4681–4690.
16. J. Gu, H. Lu, W. Zuo, and C. Dong, “Blind super-resolution with iterative kernel correction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), pp. 1604–1613.
17. K. Zhang, W. Zuo, and L. Zhang, “Deep plug-and-play super-resolution for arbitrary blur kernels,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), pp. 1671–1681.
18. Y. Chen, F. Shi, A. G. Christodoulou, Y. Xie, Z. Zhou, and D. Li, “Efficient and accurate MRI super-resolution using a generative adversarial network and 3D multi-level densely connected network,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), (2018), pp. 91–99.
19. J. Shi, Q. Liu, C. Wang, Q. Zhang, S. Ying, and H. Xu, “Super-resolution reconstruction of MR image with a novel residual learning network algorithm,” Phys. Med. Biol. 63(8), 085011 (2018).
20. H. Zhang, C. Fang, X. Xie, Y. Yang, W. Mei, D. Jin, and P. Fei, “High-throughput, high-resolution deep learning microscopy based on registration-free generative adversarial network,” Biomed. Opt. Express 10(3), 1044–1063 (2019).
21. Y. Huang, Z. Lu, Z. Shao, M. Ran, J. Zhou, L. Fang, and Y. Zhang, “Simultaneous denoising and super-resolution of optical coherence tomography images based on a generative adversarial network,” Opt. Express 27(9), 12289–12307 (2019).
22. V. Das, S. Dandapat, and P. K. Bora, “Unsupervised super-resolution of OCT images using generative adversarial network for improved age-related macular degeneration diagnosis,” IEEE Sens. J. 20(15), 8746–8756 (2020).
23. X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. C. Loy, “ESRGAN: Enhanced super-resolution generative adversarial networks,” in Proceedings of the European Conference on Computer Vision Workshops (ECCVW), (2018), pp. 1–16.
24. W. Drexler and J. G. Fujimoto, Optical Coherence Tomography: Technology and Applications (Springer-Verlag, 2008), p. 26.
25. D. Yang, M. Hu, M. Zhang, and Y. Liang, “High-resolution polarization-sensitive optical coherence tomography for zebrafish muscle imaging,” Biomed. Opt. Express 11(10), 5618–5632 (2020).
26. K. Zhang, L. V. Gool, and R. Timofte, “Deep unfolding network for image super-resolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), pp. 3217–3226.
27. I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proceedings of Advances in Neural Information Processing Systems (NIPS), (2014), pp. 2672–2680.
28. K. Li, Z. Yang, W. Liang, J. Shang, Y. Liang, and S. Wan, “Low-cost, ultracompact handheld optical coherence tomography probe for in vivo oral maxillofacial tissue imaging,” J. Biomed. Opt. 25(4), 046003 (2020).