
Depixelation and image restoration with meta-learning in fiber-bundle-based endomicroscopy

Open Access

Abstract

In order to efficiently remove honeycomb artifacts and restore images in fiber-bundle-based endomicroscopy, we develop a meta-learning algorithm in this work. Two sub-networks are used to extract different levels of features. Meta-training is employed to train the network with a small amount of simulated training data, enabling the optimal model to generalize to new tasks not seen in the training set. Numerical results on both a USAF target and endomicroscopy images of living mouse tissues demonstrate that the algorithm restores high-contrast images without pixelated noise in a shorter time. Additionally, no prior information on the shape of the underlying tissues or the distribution of fiber bundles is required, making the method applicable to a variety of fiber-bundle-based endomicroscopy imaging conditions.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Laser-scanning confocal endomicroscopy has been considered a promising technique for real-time, nondestructive in-vivo imaging of subcellular structures [1]. It combines laser-scanning confocal fluorescence imaging with endoscopic technology, which enables it to produce images with high contrast and high resolution [2]. Since endomicroscopy uses fiber bundles (FB), the different light-transmitting properties of the individual cores and the surrounding cladding cause honeycomb-like artifacts, also known as pixelated pattern noise, which are superimposed on the imaging results and prevent precise analysis of the object [3]. Accordingly, an effective algorithm for removing honeycomb patterns is crucial to enhance spatial resolution and to improve image quality and diagnostic accuracy.

A variety of methods have been developed over the past decades to eliminate these artifacts [4]. For a single FB image input, early methods include interpolation in the spatial domain [5] and filtering in the Fourier domain [6,7]. The former removes the undesired pixelated patterns without substantively improving the spatial resolution, while the latter is inherently susceptible to blurring the underlying image [4]. Recent methods employ computationally iterative techniques to suppress the honeycomb patterns, represented by image retrieval from speckle-based correlations [8], ${l_1}$ norm minimization in the wavelet domain [3], and compressive sensing reconstruction [9,10], and their feasibility has been demonstrated by evaluations. However, these iterative methods maintain high spatial resolution and high contrast at the cost of heavy computation. Another category of methods uses multiple images as input and typically poses artifact removal as an image registration or image alignment problem, as in superimposition methods [11,12], video image mosaicking [13,14], generation of a transmission matrix [15], and maximum-a-posteriori (MAP) estimation [16]. Although these methods effectively reduce the honeycomb artifacts and increase the image resolution by combining a sequence of frames, accurate real-time alignment and compounding continue to present a challenge [4].

To address the above issues, we propose a fast image restoration algorithm based on meta-learning, which has attracted considerable attention in recent years. Conventional deep learning algorithms rely heavily on abundant examples to train the network, whereas meta-learning is closer to the human visual system, recognizing new objects after learning from a limited number of labeled instances. Another remarkable advantage is that meta-learning can extract and propagate transferable knowledge from a collection of tasks to prevent overfitting and improve generalization [17]. Previous research has presented deep learning applications for removing the superimposed artifact from single or multiple FB images, building a generative adversarial restoration neural network (GARNN) or a 3D convolution network to remove fixed patterns and recover hidden information [18,19]. Nevertheless, the GARNN must be trained with well-registered pairs of FB images and corresponding ground-truth data acquired simultaneously from the same imaging system, and the 3D convolution network needs multiple frames of FB images to estimate a high-resolution image. In particular, when the core diameter of the FB or the distribution of fiber bundles changes, preparing corresponding training data and retraining the network is time consuming.

This work focuses on developing a fast image restoration algorithm suited to endomicroscopy imaging, and its important contributions are twofold. First, it improves image quality: the restored results not only remove honeycomb artifacts but also preserve more structure and detail. We demonstrate better algorithmic performance with higher image contrast, larger peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), based on simulation results and real endomicroscopy data of living mice tissues. Second, it affords algorithmic insight into how a meta-learning algorithm can be applied to endomicroscopy image restoration and the depixelation problem. We design the network architecture carefully, using two sub-networks with different depths to capture features at various scales; we therefore term the proposed network JoinNet. More importantly, the model is trained by meta-training to achieve rapid adaptation to new tasks. Our algorithm produces good results on test images that contain different types of tissues and have different fiber bundle distributions from those of the training examples.

2. Method

In this section, we describe how the meta-learning method is applied to honeycomb artifact removal and endomicroscopy image restoration. Proper design of a high-performing architecture, an efficient meta-training process and a robust loss function all contribute to improving the network's capability. We first specify the network architecture and its components, and then train the network using simulated training samples and test samples. The optimal parameters of the network are found by gradient descent so that the network outputs a high-resolution image without artifacts.

2.1 Architecture

Inspired by model-agnostic meta-learning [20] and automated relational meta-learning [21], we propose a network architecture based on meta-learning, as shown in Fig. 1. It can be divided into two parts, with five types of blocks distinguished by different colors. The first part is composed of two sub-networks which compute feature representations at different levels, and the second part restores the image without honeycomb artifacts via a simple encoder-decoder model.

  • (1) Conv + ReLU: Convolutional filters of size $3 \times 3$ are first used to generate feature maps, and element-wise rectified linear units (ReLU) are then added for nonlinearity [22]. Mathematically, $\textrm{ReLU}(\cdot) = \max(\cdot, 0)$. The width of each block denotes the number of features in the representation, which is also the filter count in the corresponding convolutional layer.

Fig. 1. The framework of the proposed network. Green dashed lines and solid lines show the process of meta-training.

Although features of higher abstraction are captured by higher layers [23], and deeper networks often have better representation capacity and can learn more complex input features, deeper networks tend to suffer from vanishing gradients and become difficult to train [24]. Hence, we build two sub-network architectures with different depths to extract sufficient features at various levels of abstraction.

  • (2) Average Pooling: We insert average pooling between successive convolutional layers to reduce the spatial dimension of the feature representation and to help control overfitting [25]. Average pooling performs a down-sampling operation by taking the spatial average of the feature maps in the corresponding 2 $\times$ 2 region and discarding redundant information, which facilitates more complete information flow to the next stage.
  • (3) Sum + Up-sampling: This block first maps 128 channels or 64 channels into 1 channel by direct addition, and then recovers the spatial size of the data by up-sampling, replicating the elements of the input feature map over the filter area. The different depths of the two sub-networks produce feature maps of different spatial resolutions. The first sub-network captures features with relatively coarser resolution, and these features are enhanced with higher-resolution features from the other sub-network. After up-sampling they have the same spatial size and hence can be concatenated to form the input of the next part.
  • (4) Concatenate Layer: Concatenation is an important operation in network architecture design, able to strengthen feature propagation and improve efficiency [26]. Despite requiring less memory and fewer parameters, the add operation cannot change the number of channels and loses information, leading to a decline in image quality. Consequently, we combine features by concatenating them instead of summing them.
  • (5) Sepaconv + ReLU: Because the separable convolutional layer applies convolutional kernels to each channel separately [27], we utilize it to further fuse the features of different levels after concatenation and to improve the representational efficiency of the model.
  • (6) Deconv + ReLU: The deconvolutional layer reconstructs images by mapping features to pixels, the opposite of what a convolutional layer does [25]. Here, we let the network learn the deconvolution filter in this block rather than pre-defining it, in order to restore more accurate information from the preceding multiple levels of feature maps. A minimal sketch of how these blocks might be assembled follows this list.
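The following TensorFlow/Keras sketch shows how two sub-networks of different depths could be built from the block types described above and merged. The depths, filter counts, and function names are illustrative assumptions, not the exact JoinNet configuration reported in the paper.

```python
# Minimal sketch of a two-branch network built from the block types described
# above: Conv+ReLU, average pooling, channel sum + up-sampling, concatenation,
# separable convolution, and a learned deconvolution. Depths and filter counts
# are illustrative assumptions, not the exact JoinNet configuration.
import tensorflow as tf
from tensorflow.keras import layers

def sub_network(x, depth, filters):
    """Stack of Conv+ReLU blocks with average pooling between them."""
    for d in range(depth):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        if d < depth - 1:
            x = layers.AveragePooling2D(pool_size=2)(x)
    # Sum all channels into one feature map, then up-sample back to the input
    # resolution by replicating elements (nearest-neighbour up-sampling).
    x = layers.Lambda(lambda t: tf.reduce_sum(t, axis=-1, keepdims=True))(x)
    x = layers.UpSampling2D(size=2 ** (depth - 1))(x)
    return x

def build_joinnet_sketch(input_shape=(64, 64, 1)):
    inp = layers.Input(shape=input_shape)
    deep = sub_network(inp, depth=3, filters=128)     # coarser features
    shallow = sub_network(inp, depth=2, filters=64)   # finer features
    merged = layers.Concatenate()([deep, shallow])    # combine both levels
    x = layers.SeparableConv2D(32, 3, padding="same", activation="relu")(merged)
    out = layers.Conv2DTranspose(1, 3, padding="same", activation="relu")(x)
    return tf.keras.Model(inp, out)

model = build_joinnet_sketch()
model.summary()
```

The two branches keep the same input resolution after their sum + up-sampling blocks, so their single-channel outputs can be concatenated and fused before the final deconvolution maps features back to pixels.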

2.2 Meta-training

We prepare the training dataset by simulation, which dispenses with the need for an extra imaging system to provide well-registered pairs of fiber bundle images and their corresponding ground-truth data. Note that each optical fiber in the fiber bundle of our confocal fluorescent endomicroscopy has a small core diameter of 3 $\mathrm{\mu m}$ and a pixel size of 0.3 $\mathrm{\mu m}$ [28]; the diffraction effect is therefore distinct enough that it must be taken into account to establish a more accurate simulation. The fiber cores are then regularly arranged on a discrete grid with the image pixel size, represented by the mask in Fig. 2(b). This mask is superimposed on a simulated pattern, Fig. 2(a), to generate a fiber bundle image with honeycomb artifacts. The synthetic endomicroscopy image in Fig. 2(c) and its corresponding ground-truth data in Fig. 2(a) comprise one training data pair. A sketch of this simulation step follows.
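The snippet below is a simplified sketch of generating one training pair. The hexagonal core layout, Gaussian core profile, and parameter values are assumptions chosen only for illustration; the diffraction modeling described above is omitted.

```python
# Sketch of simulating one training pair: a random pattern (ground truth,
# Fig. 2(a)) multiplied by a fiber-core mask (Fig. 2(b)) to give a synthetic
# honeycomb-corrupted image (Fig. 2(c)). The hexagonal core layout, Gaussian
# core profile and parameter values below are assumptions for illustration;
# the diffraction modeling described in the text is omitted.
import numpy as np

def make_core_mask(size=64, core_spacing=10.0, core_sigma=2.0):
    """Hexagonally packed fiber cores rendered as Gaussian spots."""
    yy, xx = np.mgrid[0:size, 0:size].astype(float)
    mask = np.zeros((size, size))
    row_step = core_spacing * np.sqrt(3) / 2
    for r, cy in enumerate(np.arange(0.0, size, row_step)):
        offset = (core_spacing / 2) * (r % 2)        # stagger alternate rows
        for cx in np.arange(offset, size, core_spacing):
            mask += np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * core_sigma ** 2))
    return np.clip(mask, 0.0, 1.0)

rng = np.random.default_rng(0)
ground_truth = rng.random((64, 64))   # simulated random pattern
mask = make_core_mask()               # distribution of fiber bundle cores
fb_image = ground_truth * mask        # synthetic endomicroscopy image
```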

Fig. 2. Simulated training data and test data. (a) Simulated random pattern. (b) Mask represents a distribution of fiber bundle cores. (c) Synthetic endomicroscopy image is generated by multiplication of the mask in (b) with the original image in (a). (d), (e), (f) and (g) are respectively simulated random pattern, synthetic image, restored result with meta-learning and without meta-learning. (h), (i), (j) and (k) are respectively simulated random pattern, synthetic image, restored result with meta-learning and without meta-learning.

We first generate 36000 frames of random pattern images ($64 \times 64$ pixels) as the dataset for meta-training, masked by 8 different types of fiber bundle distributions. Each type of distribution can be considered a learning task and has 4500 frames of images. In meta-learning, the dataset is often split into two parts, a training set ${D_{\textrm{train}}}$ and a test set ${D_{\textrm{test}}}$. Then, in every step, 50 frames of simulated images are randomly drawn from the $j$th task to constitute the training set $D_{\textrm{train}}^j$, and another 50 frames are randomly drawn from the same task as $D_{\textrm{test}}^j$, as sketched below.
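A minimal sketch of this per-task sampling is shown here. The array layout and the reduced frame count are assumptions made only to keep the example self-contained; in practice the images would be the simulated pairs described above.

```python
# Sketch of drawing the per-task batches used in each meta-training step:
# 50 frames for D_train^j and 50 for D_test^j from the j-th of K = 8 tasks.
# The (task, frame, H, W, channel) array layout is an assumption, and FRAMES
# is reduced from the 4500 frames per task used in the paper to keep this
# toy example small.
import numpy as np

rng = np.random.default_rng(1)
K, FRAMES, H, W = 8, 200, 64, 64
fb_images = rng.random((K, FRAMES, H, W, 1), dtype=np.float32)  # synthetic FB images
gt_images = rng.random((K, FRAMES, H, W, 1), dtype=np.float32)  # ground truths

def sample_task_batches(j, n_train=50, n_test=50):
    """Randomly draw disjoint D_train^j and D_test^j batches for task j."""
    idx = rng.choice(FRAMES, size=n_train + n_test, replace=False)
    tr, te = idx[:n_train], idx[n_train:]
    return (fb_images[j, tr], gt_images[j, tr]), (fb_images[j, te], gt_images[j, te])
```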

Given a model represented by a parameterized function ${f_\theta}$ with parameters $\theta$, we minimize the training loss ${\cal L}_j({f_{\theta'_j}},D_{\textrm{train}}^j)$ to obtain the temporary parameters $\theta'_j$ of the $j$th task, as indicated by the green dashed lines in Fig. 1. This work adopts the mean square error (MSE) loss function, which measures the squared ${l_2}$ norm of the difference between the network's prediction ${f_{\theta'_j}}$ and the desired ground-truth image ${y_k}$. Letting $N$ denote the number of images in the training set $D_{\textrm{train}}^j$, with $N = 50$, we have the training loss function

$${\cal L}_j({f_{\theta'_j}},D_{\textrm{train}}^j) = \frac{1}{N}\sum\limits_{k = 1}^N \|{y_k} - {f_{\theta'_j}}\|_2^2 .$$
The loss function reframes training the neural network as an optimization problem, so the optimal parameters can be found by iterative optimization algorithms [25]. We compute the temporary parameters $\theta'_j$ of the $j$th task using one gradient descent update, formulated as
$$\theta'_{j,\,i} = \theta'_{j,\,i-1} - \alpha \nabla_{\theta'} {\cal L}_j({f_{\theta'_j}},D_{\textrm{train}}^j),$$
where the subscript $i$ denotes the $i$th step. In our experiments, the learner learning rate $\alpha$ and the number of steps are set to 0.001 and 180, respectively.

Afterward, we reload the temporary parameters $\theta' = \{\theta'_1, \theta'_2, \cdots, \theta'_j, \cdots, \theta'_8\}$ into the network, as indicated by the green solid lines in Fig. 1, and compute the meta-objective

$${\cal L}_j({f_{\theta'_j}},D_{\textrm{test}}^j) = \frac{1}{M}\sum\limits_{k = 1}^M \|{y_k} - {f_{\theta'_j}}\|_2^2 .$$
In the above equation, $M$ represents the total number of images in the test sets $D_{\textrm{test}}^j$ over all tasks, and $M = 400$.

The meta-optimization across tasks is performed via gradient descent again, given by

$$\theta_i = \theta_{i-1} - \beta \nabla_\theta \sum\limits_{j = 1}^K {\cal L}_j({f_{\theta'_j}},D_{\textrm{test}}^j),$$
in which the meta-learner learning rate $\beta$ is fixed at 0.001, and $K$ is the number of tasks, with $K = 8$ in our work. The pseudo-code in Table 1 summarizes the meta-training procedure of the proposed algorithm, and a runnable sketch of one meta-training step is given below.
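The sketch below illustrates one meta-training step corresponding to Eqs. (1)-(4), reusing `build_joinnet_sketch` and `sample_task_batches` from the earlier sketches (both illustrative assumptions). It performs a single first-order inner update per task with learning rate $\alpha$ and a plain gradient meta-update with $\beta$, whereas the actual training uses Adam and multiple inner steps, so it is only an approximation of the procedure in Table 1.

```python
# Sketch of one meta-training step following Eqs. (1)-(4): for each task j,
# one inner gradient step on D_train^j (Eq. (2)), then a meta-update of the
# shared parameters from the losses on D_test^j (Eq. (4)). First-order
# gradients are used and the inner loop is truncated to one step, which is a
# simplification of the full procedure.
import tensorflow as tf

ALPHA, BETA, K = 0.001, 0.001, 8          # learner rate, meta rate, task count
mse = tf.keras.losses.MeanSquaredError()
model = build_joinnet_sketch()            # from the architecture sketch above

def meta_train_step():
    theta = [tf.identity(v) for v in model.trainable_variables]   # save theta
    meta_grads = [tf.zeros_like(v) for v in theta]
    for j in range(K):
        (x_tr, y_tr), (x_te, y_te) = sample_task_batches(j)
        # Inner update: theta'_j = theta - alpha * grad_theta L_j(f_theta, D_train^j)
        with tf.GradientTape() as tape:
            loss_tr = mse(y_tr, model(x_tr, training=True))
        grads = tape.gradient(loss_tr, model.trainable_variables)
        for v, t, g in zip(model.trainable_variables, theta, grads):
            v.assign(t - ALPHA * g)
        # Meta-objective L_j(f_theta'_j, D_test^j), Eq. (3)
        with tf.GradientTape() as tape:
            loss_te = mse(y_te, model(x_te, training=True))
        grads_te = tape.gradient(loss_te, model.trainable_variables)
        meta_grads = [mg + g for mg, g in zip(meta_grads, grads_te)]
        for v, t in zip(model.trainable_variables, theta):
            v.assign(t)                                            # restore theta
    # Outer update (Eq. (4)): theta <- theta - beta * sum_j grad L_j(f_theta'_j, D_test^j)
    for v, mg in zip(model.trainable_variables, meta_grads):
        v.assign_sub(BETA * mg)
    return float(loss_te)
```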

Table 1. Pseudo-code of the meta-training process of JoinNet

As a critical factor, the training data determine what we want the neural network to learn. The goal of meta-learning is to train a model that can quickly adapt to a new task using only a small number of training examples. From a feature learning standpoint, meta-training a model's parameters can be viewed as building an internal representation that is broadly suitable for many tasks [20]. Our tests show that the proposed network is robust to different fiber bundle distributions and various FB core diameters. For instance, although the pixelated patterns in Figs. 2(e) and 2(i) differ from those in the training dataset, Figs. 2(f) and 2(j) illustrate that JoinNet is still able to alleviate the pixelated artifacts and recover the image. By contrast, Figs. 2(g) and 2(k) are the images restored by JoinNet without meta-training, which are much worse than the inference results of the JoinNet trained with meta-training. This result reflects the fact that meta-learning optimizes the parameters so that they lie in a region that is amenable to fast adaptation and sensitive to the loss function [20], rather than overfitting to parameters that only improve after one-time training as in conventional deep learning. Meta-learning can thus redirect the parameter gradients to fit all tasks well and produces good generalization on new tasks, whereas traditional deep learning and transfer learning do not work well when the target task is not sufficiently similar [29]. Accordingly, our method does not require the parameters of the training examples to precisely match those of the real confocal laser endomicroscopy, which makes it easier to extend our simple and effective network to more FB-based imaging systems.

Meta-learning does not necessarily require more training time. In our experiments, JoinNet reaches its best performance within 300 steps, and each step takes 945 ms on our hardware. Each step needs only 800 frames of images, the number of trainable parameters is small, and training is therefore fast. The key idea of meta-learning is to train the model's parameters such that the model achieves maximal performance on a new task, learning and adapting quickly from a small amount of training data and few training iterations.

Our network is trained with Adam [30] implemented in TensorFlow [31]. Both training and testing are performed on a workstation equipped with 512 GB of memory, two Intel Xeon E5-2687W v4 3.00 GHz CPUs, and an NVIDIA Quadro P6000 GPU with 24 GB of video memory.

3. Results

3.1 Model validation

We begin by evaluating the performance of the proposed JoinNet model using testing data under 8 types of FB distributions. These distributions are not only different from those in the training dataset but also irregular. We conduct evaluations on the test data to quantify the quality of the results, and compare the restored results against those of a very deep residual encoder-decoder network (RED30) [24] in terms of the peak signal-to-noise ratio, calculated according to the following formula

$$\textrm{PSNR} = 10{\log _{10}}\left( {\frac{{{{({{2^n} - 1} )}^2}}}{{\textrm{MSE}}}} \right).$$
In Eq. (5), $n$ depends on the range of the data type; for example, it is set to 8 for a uint8 image and 1 for floating-point data. MSE measures the difference between the restored image and the reference image. We also compute and compare the structural similarity (SSIM), which is well suited to assessing perceptual image quality [32]. A short sketch of both metrics follows.
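As a concrete illustration, the snippet below computes PSNR per Eq. (5) for floating-point data in $[0, 1]$ (so $n = 1$ and the peak value is 1) and SSIM via TensorFlow's built-in implementation; the random test tensors are placeholders standing in for a restored image and its ground truth.

```python
# Sketch of the evaluation metrics: PSNR per Eq. (5) for floating-point data
# in [0, 1] (n = 1, so the peak value 2^n - 1 is 1), and SSIM [32] via
# TensorFlow's built-in implementation. The random tensors are placeholders.
import tensorflow as tf

def psnr(restored, reference, peak=1.0):
    mse = tf.reduce_mean(tf.square(restored - reference))
    return 10.0 * tf.math.log(peak ** 2 / mse) / tf.math.log(10.0)

restored = tf.random.uniform((1, 64, 64, 1))
reference = tf.random.uniform((1, 64, 64, 1))
print("PSNR (dB):", float(psnr(restored, reference)))
print("SSIM:", float(tf.image.ssim(restored, reference, max_val=1.0)[0]))
```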

Table 2 summarizes objective measurements and inference time on the test data using RED30 and JoinNet. To compare the two methods fairly, RED30 is trained with all 36000 frames of images in a traditional deep learning way, while JoinNet is meta-trained with 400 frames of images in each step. Our JoinNet obtains larger average PSNR and SSIM values; specifically, it outperforms RED30 by 10% in SSIM and achieves about 4 dB higher PSNR. The rationale behind this result is that the honeycomb pattern noise is removed effectively by our network, so the concealed images are restored very close to their ground truth, with higher PSNR and SSIM values representing better results. As also listed in Table 2, our network comprises ∼0.2M parameters, while RED30 has ∼1.1M parameters. This indicates that the proposed JoinNet achieves better performance through network design rather than by increasing the number of parameters. A faster inference time for each frame of $64 \times 64$ pixels reveals the potential of our method for real-time endomicroscopy image processing, obtaining an image of acceptable quality in a shorter time.

Table 2. Comparison of performance with different models.

3.2 Simulated data

We use a USAF resolution target to quantitatively validate the performance of our method in terms of image resolution and contrast. The original image of the USAF target, shown in Fig. 3(a), is $512 \times 512$ pixels. Simulated honeycomb-like artifacts are then imposed on it, generating the synthetic resolution target image in Fig. 3(b). Figures 3(c) and 3(d) present the results processed by RED30 and JoinNet, respectively.

Fig. 3. Comparison of RED30 and JoinNet on synthetic USAF resolution target. (a) Original USAF target image (ground truth). (b) Synthetic USAF target image. (c) and (d) are restored images respectively using RED30 and JoinNet. (e) Plots of intensity profiles along red and blue lines.

Both methods remove the artifacts and similarly resolve up to the second element of Group 6. It is worth noting that RED30 leads to intensity fluctuations across the bars and numbers, while our method maintains uniform intensity. The improvement is primarily due to an effective decrease in the pixelated pattern noise by JoinNet, so that the intensities of the restored bars and numbers are closer to those of the original resolution image. Moreover, we notice that edges are better preserved by JoinNet. There are obvious remaining pixelated patterns along the edges of the bars in the image restored with RED30, such as the fourth element of Group 4, whose recovered edges are jagged rather than smooth. In contrast, our method removes these artifacts along the edges. This is consistent with the intensity profiles along the red and blue lines in Fig. 3(e), where the spurious and noisy peaks between adjacent bars are suppressed by JoinNet (blue curve). The curves illustrate that our proposed method achieves not only an intensity closer to that of the original image but also higher image contrast.

3.3 Real data

Since there is no objective metric for evaluating performance on real data, we first capture a real endomicroscopy image of the USAF resolution target and restore it with our method. The result in Fig. 4(c) reveals that the proposed JoinNet recovers hidden information instead of generating artificial artifacts. Compared with the RED30 result in Fig. 4(b), similar improvements in image contrast and quality can be observed in Fig. 4(c), particularly for the numbers in the left column. It should be noted that the endomicroscope used for the USAF resolution target is different from that used for the tissues in the following work. This provides another demonstration of the feasibility of the network for restoring images acquired with various FB-based imaging systems.

Fig. 4. Experimental results with the USAF resolution target: (a) raw endomicroscopy image, (b) restored image from RED30, and (c) restored image from JoinNet.

Next, we verify the practicability of the algorithm for artifact removal and image restoration using different types of real endomicroscopy images. Imaging of the intestine, kidney and adipose tissues of living mice is performed on a custom-built multi-laser scanning confocal fluorescent endomicroscopy system [28]. The endomicroscopy images are composed of $512 \times 512$ pixels with a pixel size of 300 nm. It is important to note that the network needs to be trained only once, and the resulting weights and trained network are appropriate for restoring different types of tissues.

Figure 5(a) is the raw image of the intestine tissues, and Figs. 5(f) and 5(g) display the restored images using RED30 and JoinNet, respectively. Both results indicate that deep learning methods work well by recovering hidden information instead of indiscriminately producing unattainable information. Take the dark spots marked by magenta arrows as an example: they are caused by damaged fiber pixels, and the corresponding locations in the restored results remain dark, with no artificial information introduced. In light of this, it is reliable to suppress honeycomb patterns and restore endomicroscopy images by applying meta-learning methods.

Fig. 5. Experimental results with the intestine tissues: (a) raw endomicroscopy image, (b) magnified result of green rectangle using RED30, (c) magnified result of green rectangle using JoinNet, (d) magnified result of yellow rectangle using RED30, (e) magnified result of yellow rectangle using JoinNet, (f) restored image from RED30, and (g) restored image from JoinNet.

Despite the fact that both methods are successful in eliminating the repeated pixelated patterns, the image from our method appears brighter and has higher contrast. This greatly assists visualization, as many structures are no longer hidden from view; the region enclosed by the green rectangle is a prominent example. In contrast to the RED30 result in Fig. 5(b), the outline of the intestine tissues in Fig. 5(c) becomes clearer after removing the bundle mask pattern, and the curved edges of the previously obscured dark areas are better captured and resolved. Furthermore, as exemplified by the magnified results in Figs. 5(d) and 5(e), it is noticeable that RED30 removes pixelated noise at the expense of blurring the image, resulting in obviously low signal intensity and unstructured regions, while our proposed method reaches a compromise between retaining useful information and denoising. This is in accordance with our expectations in devising sub-networks with different depths, which is beneficial for processing the visual information at various scales so that the next stage can map the combined features to pixels and restore the fiber-bundle-based endomicroscopy image without degradation in image quality.

As further evidence that the image quality is indeed improved by our proposed JoinNet, another experiment is conducted with the kidney tissues in Fig. 6(a) as input. The corresponding results are given in Fig. 6 with the same layout as Fig. 5. The recovered image in Fig. 6(h), obtained by applying JoinNet, has higher image contrast than that of Fig. 6(g) from RED30. Tissue details are not discernible in the raw FB-based endomicroscopy image or in the results from RED30, while our method reduces blur and resolves finer details, as presented in Figs. 6(b) to 6(d). JoinNet adequately promotes restoration of hidden information but not of the pixelated noise. The signal consequently becomes much more pronounced over the noise, producing a high-contrast image. The improvements in image quality can also be clearly observed in the magnified results of the yellow rectangles in Figs. 6(e) and 6(f). The images restored using JoinNet show sharper features compared with those restored using RED30.

Fig. 6. Experimental results with the kidney tissues: (a) raw endomicroscopy image, (b) magnified result of green rectangle in raw image, (c) magnified result of green rectangle using RED30, (d) magnified result of green rectangle using JoinNet, (e) magnified result of yellow rectangle using RED30, (f) magnified result of yellow rectangle using JoinNet, (g) restored image from RED30, and (h) restored image from JoinNet.

We finally apply the proposed method to restore an image of adipose tissues acquired from endomicroscopy imaging. Figures 7(b) and 7(c) exhibit the recovered results using RED30 and JoinNet, respectively. We observe that JoinNet eliminates the artifacts more effectively, thereby restoring a more continuous image than RED30. The structures are sharper and more obvious in the results restored by our proposed method, whereas there are unrecovered regions in the resulting image from RED30. All the above results demonstrate that JoinNet can remove honeycomb artifacts in real endomicroscopy imaging and effectively improve image quality for different types of tissues. This can be largely attributed to the meta-training process, which trains the model such that it can solve new tasks and produce good performance. Our network only needs to be trained with simulated patterns and simulated FB distributions, yet it is able to restore different types of real endomicroscopy images.

Fig. 7. Experimental results with the adipose tissues: (a) raw endomicroscopy image, (b) restored image from RED30, and (c) restored image from JoinNet.

4. Conclusion

In this paper, a meta-learning based algorithm is developed for honeycomb artifact removal and image restoration in endomicroscopy. We take advantage of meta-training to shorten the training time and to improve the adaptability of the model to new tasks. The approach is applied to a synthetic USAF resolution target image, as well as to real imaging data of intestine, kidney and adipose tissues of living mice. Evaluations against the deep learning method RED30 demonstrate that our network can reduce pixelated pattern noise and provide images with favorable image quality. These advantages make the proposed algorithm a prime candidate for endomicroscopy applications, especially with the increasing demand for accurate real-time in-vivo imaging of subcellular structures.

Funding

National Natural Science Foundation of China (81727804); Shenzhen Science and Technology R&D and Innovation Foundation (JCYJ20200109105608771); Shenzhen International Cooperation Project (GJHZ20180928161811821).

Acknowledgments

We thank Dr. Jianbo Shao and Prof. Rongguang Liang (University of Arizona, Tucson, Arizona 85721, USA) for sharing the training dataset for comparison study.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in Ref. [28].

References

1. D. Shin, M. H. Lee, A. D. Polydorides, M. C. Pierce, P. M. Vila, N. D. Parikh, D. G. Rosen, S. Anandasabapathy, and R. R. Richards-Kortum, “Quantitative analysis of high-resolution microendoscopic images for diagnosis of neoplasia in patients with Barrett’s esophagus,” Gastrointest. Endosc. 83(1), 107–114 (2016). [CrossRef]  

2. A. F. Gmitro and D. Aziz, “Confocal microscopy through a fiber-optic imaging bundle,” Opt. Lett. 18(8), 565–567 (1993). [CrossRef]  

3. X. Liu, L. Zhang, M. Kirby, R. Becker, S. Qi, and F. Zhao, “Iterative l1-min algorithm for fixed pattern noise removal in fiber-bundle-based endoscopic imaging,” J. Opt. Soc. Am. A 33(4), 630–636 (2016). [CrossRef]  

4. A. Perperidis, K. Dhaliwal, S. McLaughlin, and T. Vercauteren, “Image computing for fibre-bundle endomicroscopy: a review,” Med. Image Anal. 62, 101620 (2020). [CrossRef]  

5. S. Rupp, C. Winter, and M. Elter, “Evaluation of spatial interpolation strategies for the removal of comb-structure in fiber-optic images,” in Annual International Conference of the IEEE Engineering in Medicine and Biology Society(EMBC) (2009), pp. 3677–3680.

6. C. Winter, S. Rupp, M. Elter, C. Münzemayer, H. Gerhäuser, and T. Wittenberg, “Automatic adaptive enhancement for images obtained with fiberscopic endoscopes,” IEEE Trans. Biomed. Eng. 53(10), 2035–2046 (2006). [CrossRef]  

7. M. Dumripatanachod and W. Piyawattanametha, “A fast depixelation method of fiber bundle image for an embedded system,” in 8th Biomedical Engineering International Conference (BMEiCON) (2015), pp. 1–4.

8. A. Porat, E. R. Andresen, H. Rigneault, D. Oron, S. Gigan, and O. Katz, “Widefield lensless imaging through a fiber bundle via speckle correlations,” Opt. Express 24(15), 16835–16855 (2016). [CrossRef]  

9. J.-H. Han, S. M. Yoon, and G.-J. Yoon, “Decoupling structural artifacts in fiber optic imaging by applying compressive sensing,” Optik 126(19), 2013–2017 (2015). [CrossRef]

10. S. P. Mekhail, N. Abudukeyoumu, J. Ward, G. Arbuthnott, and S. N. Chormaic, “Fiber-bundle-basis sparse reconstruction for high resolution wide-field microendoscopy,” Biomed. Opt. Express 9(4), 1843–1851 (2018). [CrossRef]  

11. C.-Y. Lee and J.-H. Han, “Elimination of honeycomb patterns in fiber bundle imaging by a superimposition method,” Opt. Lett. 38(12), 2023–2025 (2013). [CrossRef]

12. G. W. Cheon, J. Cha, and J. U. Kang, “Random transverse motion-induced spatial compounding for fiber bundle imaging,” Opt. Lett. 39(15), 4368–4371 (2014). [CrossRef]  

13. N. Bedard, T. Quang, K. Schmele, R. R. Kortum, and T. S. Tkaczyk, “Real-time video mosaicing with a high-resolution microendoscope,” Biomed. Opt. Express 3(10), 2428–2435 (2012). [CrossRef]  

14. K. Vyas, M. Hughes, and G.-Z. Yang, “Electromagnetic tracking of handheld high-resolution endomicroscopy probes to assist with real-time video mosaicking,” Proc. SPIE 9304, 93040Y (2015). [CrossRef]

15. C. Yoon, M. Kang, J. H. Hong, T. D. Yang, J. Xing, H. Yoo, Y. Choi, and W. Choi, “Removal of back-reflection noise at ultrathin imaging probes by the single-core illumination and wide-field detection,” Sci. Rep. 7(1), 6524 (2017). [CrossRef]  

16. J. Shao, W.-C. Liao, R. Liang, and K. Barnard, “Resolution enhancement for fiber bundle imaging using maximum a posteriori estimation,” Opt. Lett. 43(8), 1906–1909 (2018). [CrossRef]

17. W.-Y. Chen, Y.-C. Liu, Z. Kira, Y.-C. F. Wang, and J.-B. Huang, “A closer look at few-shot classification,” arXiv: 1904.04232v2 (2020).

18. J. Shao, J. Zhang, X. Huang, R. Liang, and K. Barnard, “Fiber bundle image restoration using deep learning,” Opt. Lett. 44(5), 1080–1083 (2019). [CrossRef]  

19. J. Shao, J. Zhang, R. Liang, and K. Barnard, “Fiber bundle image resolution enhancement using deep learning,” Opt. Express 27(11), 15880–15890 (2019). [CrossRef]  

20. C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in 34th International Conference on Machine Learning (ICML) (2017), pp. 1126–1135.

21. H. Yao, X. Wu, Z. Tao, Y. Li, B. Ding, R. Li, and Z. Li, “Automated relational meta-learning,” arXiv: 2001.00745v1 (2020).

22. A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in 30th International Conference on Machine Learning (ICML) (2013), pp. 6.

23. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015), pp. 1–9.

24. X.-J. Mao, C. Shen, and Y.-B. Yang, “Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections,” in 30th Annual Conference on Neural Information Processing Systems (NIPS) (2016), pp. 2810–2818.

25. J. Patterson and A. Gibson, Deep Learning: A Practitioner’s Approach (O’Reilly Media, 2017).

26. G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), pp. 2261–2269.

27. A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: efficient convolutional neural networks for mobile vision application,” arXiv: 1704.04861v1 (2017).

28. X. Zhang, X. Li, Q. Lin, J. Chen, Y. Gu, and Y. Shao, “Multi-laser scanning confocal fluorescent endoscopy scheme for subcellular imaging,” Prog. Electromagn. Res. 169, 17–23 (2020). [CrossRef]  

29. F. Hutter, L. Kotthoff, and J. Vanschoren, Automated Machine Learning: Methods, Systems, Challenges (Springer, 2019)

30. D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv: 1412.6980 (2014).

31. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: large-scale machine learning on heterogeneous systems,” arXiv: 1603.04467 (2016).

32. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process. 13(4), 600–612 (2004). [CrossRef]  
