
Selfrec-Net: self-supervised deep learning approach for the reconstruction of Cherenkov-excited luminescence scanned tomography

Open Access

Abstract

As an emerging imaging technique, Cherenkov-excited luminescence scanned tomography (CELST) can recover a high-resolution 3D distribution of quantum emission fields within tissue using X-ray excitation for deep penetrance. However, its reconstruction is an ill-posed and under-conditioned inverse problem because of the diffuse optical emission signal. Deep learning based image reconstruction has shown considerable potential for solving these types of problems; however, such methods suffer from a lack of ground-truth image data for validation when used with experimental data. To overcome this, a self-supervised network cascading a 3D reconstruction network and the forward model, termed Selfrec-Net, was proposed to perform CELST reconstruction. Under this framework, the boundary measurements are input to the network to reconstruct the distribution of the quantum field, and the predicted measurements are subsequently obtained by feeding the reconstructed result to the forward model. The network was trained by minimizing the loss between the input measurements and the predicted measurements rather than between the reconstructed distributions and the corresponding ground truths. Comparative experiments were carried out on both numerical simulations and physical phantoms. For single luminescent targets, the results demonstrate the effectiveness and robustness of the proposed network, with performance comparable to a state-of-the-art supervised deep learning algorithm and with emission yield and localization accuracy far superior to iterative reconstruction methods. Reconstruction of multiple objects remains reasonable, with high localization accuracy, although emission yield accuracy is limited as the distribution becomes more complex. Overall, Selfrec-Net provides a self-supervised way to recover the location and emission yield of molecular distributions in murine model tissues.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Cherenkov-excited luminescence scanned imaging (CELSI) is an emerging imaging modality that utilizes sheet-shaped megavolt (MV) x-ray or electron beams generated by a medical linear accelerator (LINAC) to induce Cherenkov radiation within tissue. This Cherenkov light excites luminescence of optical probes across the sheet plane [1,2], providing a highly localized source location, and the emitted luminescent photons can be captured by a time-domain gated intensified charge-coupled device (ICCD) camera [2]. CELSI can yield a high spatial resolution of about 100 µm within 20-30 mm thick tissue [3]. In addition, it has demonstrated potential for monitoring physiological changes of tumors during radiotherapy treatment [4–6]. However, CELSI does not provide a 3D distribution of the probes, only their 2D luminescence images. Therefore, the tomographic extension of CELSI (CELST) has been developed to reconstruct the 3D internal distribution of the luminescent probe from the captured surface images [1–3,7].

Due to the diffusive nature of light propagation in tissue, CELST reconstruction is a severely ill-posed inverse problem, and many regularization-based approaches have been published to solve it. Based on Tikhonov regularization, Feng et al. developed a reconstruction algorithm for CELST [7]; nevertheless, over-smoothing appeared on some sharp features of the reconstructed images, and artifacts could be introduced, especially when optical probes were located deeper than 20 mm [8]. To enhance the image quality of CELST, lp-regularization [9] and total variation (TV) constraints [10] can be extended to CELST reconstruction. However, the non-differentiability of the lp and TV norm penalties makes the computation difficult. In addition, the above regularization-based algorithms must be solved iteratively, which usually imposes a large computational burden on the inverse problem of CELST because the forward light propagation is recalculated at each iteration [11]. Manually tuning the regularization parameter is also time-consuming [12].

In the past few years, deep learning (DL) has been increasingly used for image reconstruction in optical tomography [13–20], and has shown promising reconstruction performance compared to conventional iterative reconstruction algorithms [21,22]. For example, convolutional neural networks (CNNs) have been used to achieve superior image quality relative to traditional reconstruction operators such as filtered back-projection [16]. Most of the DL-based reconstruction methods tested have been supervised, attempting to train a mapping model that directly reconstructs the luminescent concentration from the surface measurements. For instance, Zhang et al. proposed an end-to-end framework to reconstruct fluorescent sources, which achieved an outstanding spatial resolution of 0.5 mm [17]. Such DL-based reconstruction can replace explicit modeling of light propagation, so the computational time can be reduced to a few seconds [23]. These supervised learning methods succeed when trained on sufficient data pairs (surface measurements and their corresponding ground truths) [24,25], yet such pairs are often unavailable, especially in live animal imaging studies, so applying phantom-trained reconstruction models to real in vivo imaging, with all its tissue complexity, is really a leap of faith.

To help address this issue, self-supervised learning can potentially alleviate the lack of in vivo training data because it only utilizes the acquired measurements rather than ground truths as labels. Consequently, it is well suited to image reconstruction in optical tomography [26–28], and the results obtained are competitive with supervised learning algorithms [29,30]. For instance, Ulyanov et al. presented the deep image prior (DIP) framework and achieved results similar to state-of-the-art super-resolution methods [30].

Inspired by the performance of self-supervised mechanisms in image reconstruction problems, we introduce them here, for the first time, to reconstruct 3D CELST images. A deep self-supervised reconstruction network (Selfrec-Net), cascading a 3D-UNet with the forward model, is constructed. The output of the 3D-UNet is the 3D reconstructed image obtained from the captured measurement data, while the output of the forward model is the set of measurements estimated from the reconstructed image. As the loss between the estimated and the captured measurements decreases during training, the parameters of the 3D-UNet approach their optimal values. Ideally, the more similar the predicted measurements are to the acquired ones, the more similar the reconstructed results are to the ground truths. In this study, the proposed Selfrec-Net is compared with two iterative algorithms and one supervised DL approach to validate its effectiveness.

2. Methods

2.1 Forward modeling

CELST aims to resolve the 3D distribution of an optical probe from the secondary Cherenkov-induced fluorescence detected on the tissue surface. Light propagation in biological tissue can be modeled by the following coupled diffusion equations [7,31]:

$$\left\{ \begin{array}{l} - \nabla {D_x}(r)\nabla {\Phi_x}(r) + {\mu_{ax}}(r){\Phi_x}(r) = Q(r)\\ - \nabla {D_m}(r)\nabla {\Phi_m}(r) + {\mu_{am}}(r){\Phi_m}(r) = {\Phi_x}(r)\chi(r) \end{array} \right. \quad (r \in \Omega)$$
with the Robin-type boundary conditions [32,33]:
$${\Phi_{x,m}}(r) + 2A\,\vec{n}(r)\,{D_{x,m}}(r) \cdot \nabla {\Phi_{x,m}}(r) = 0 \quad (r \in \partial \Omega)$$
where the subscripts $x$ and $m$ denote the excitation and emission processes, respectively; ${\Phi_{x,m}}(r)$ represents the photon flux density at location $r$ in the imaging domain $\Omega$; ${D_{x,m}}(r) = 1/[3({\mu_{ax,am}}(r) + \mu_{sx,sm}^{\prime}(r))]$ denotes the diffusion coefficient; ${\mu_{ax,am}}(r)$ and $\mu_{sx,sm}^{\prime}(r)$ are the absorption and reduced scattering coefficients, respectively; $Q(r)$ is the Cherenkov excitation source term induced by the sheet-shaped LINAC beams; $\chi(r)$, the quantity to be reconstructed, is the unknown distribution of luminescence yield [2]; $\vec{n}(r)$ is the outward unit normal vector at the boundary $\partial\Omega$ of $\Omega$; and $A$ is a constant accounting for the refractive-index mismatch at the boundary.

Based on the finite element method (FEM) [34], the relationship between boundary measurements y and the unknown distribution of luminescence yield $\chi $ can be obtained by discretizing Eqs. (1) and (2):

$$y = f(\chi )$$
where $f({\cdot} )$ represents the forward model.
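For intuition, once the optical properties are fixed, the FEM-discretized forward model reduces to a linear map from the nodal yield vector to the boundary data. The following is a minimal sketch under that assumption; the matrix A is a random stand-in for the FEM sensitivity matrix that a solver such as NIRFAST would assemble, and the detector count per sheet is an assumed placeholder.

```python
# Minimal sketch of the discretized forward model y = f(chi). Assumes the FEM
# discretization of Eqs. (1)-(2) reduces, for fixed optical properties, to a
# linear sensitivity matrix A (rows: sheet-position/detector pairs, columns:
# mesh nodes). A is a random stand-in; NIRFAST would assemble the real one.
import numpy as np

rng = np.random.default_rng(0)
n_nodes = 18496                 # nodes in the mesh of Fig. 2(b)
n_meas = 36 * 64                # 36 scan sheets x an assumed 64 detectors
A = 1e-3 * rng.random((n_meas, n_nodes))

def forward(chi: np.ndarray) -> np.ndarray:
    """Map the nodal luminescence-yield vector chi to boundary measurements y."""
    return A @ chi

chi = np.zeros(n_nodes)
chi[5000:5100] = 2.0e-4         # a small region of elevated quantum yield
y = forward(chi)
```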

2.2 Deep self-supervised learning algorithm

Since the CELST reconstruction is ill-posed and under-conditioned, the distribution of $\chi $ cannot be obtained by directly inverting Eq. (3). Following the supervised DL methodology, the CELST reconstruction can be formulated as:

$$\hat{\chi } = g({y^ \ast })$$
where $g({\cdot} )$ denotes the reconstruction network, which maps the acquired boundary measurements ${y^ \ast }$ to the reconstructed distribution of the luminescence yield $\hat{\chi }$.

Different from supervised learning approaches, a self-supervised deep learning method, named Selfrec-Net, is developed as shown in Fig. 1(a). Specifically, the reconstructed distribution $\hat{\chi }$ is fed into the forward model to generate the predicted measurements $\hat{y}$. The proposed Selfrec-Net can be summarized briefly as:

$$\hat{y} = f(\hat{\chi }) = f[{g({y^ \ast })} ]$$

 figure: Fig. 1.

Fig. 1. The framework of the developed Selfrec-Net. (a) Schematic for CELST reconstruction, and (b) details of the reconstruction Selfrec-Net network.

Download Full Size | PDF

The network parameters can thus be optimized under the supervision of the acquired measurements, avoiding the need for ground-truth images. The mean squared error (MSE) loss between the acquired data ${y^ \ast }$ and the predicted data $\hat{y}$ is used to train the network:

$$\mathrm{MSE} = \frac{1}{N}\sum\limits_{i = 1}^N {\| y_i^{\ast} - \hat{y}_i \|}^2$$
where N is the number of training samples.
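A minimal PyTorch sketch of this training scheme is given below, assuming `recon_net` is the reconstruction network g(·) and `forward_model` is a differentiable implementation of f(·) (for a fixed FEM discretization, essentially a matrix multiply). Names, shapes, and the data loader are illustrative, not the authors' exact implementation.

```python
# Sketch of self-supervised training per Eqs. (5)-(6): only the acquired
# measurements y* supervise the network; no ground-truth images are used.
import torch

def train_selfrec(recon_net, forward_model, loader, epochs=500, lr=1e-4):
    opt = torch.optim.Adam(recon_net.parameters(), lr=lr)
    mse = torch.nn.MSELoss()
    for _ in range(epochs):
        for y_star in loader:               # batch of boundary measurements
            chi_hat = recon_net(y_star)     # g(y*): reconstructed 3D yield
            y_hat = forward_model(chi_hat)  # f(chi_hat): predicted data
            loss = mse(y_hat, y_star)       # Eq. (6), evaluated per batch
            opt.zero_grad()
            loss.backward()
            opt.step()
    return recon_net
```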

The reconstruction network in Selfrec-Net is designed based on the 3D-UNet [35], which consists of a contracting encoder module that analyzes the boundary measurements and an expansive decoder module that produces a distribution of the luminescence yield, as shown in Fig. 1(b). The encoder module contains 4 convolution blocks and 3 down-sampling layers. Each convolution block contains two 3×3×3 convolutions, each followed by a leaky rectified linear unit (ReLU), with batch normalization (BN) added between the convolution and the leaky ReLU [35]. In addition, the number of feature channels is doubled before down-sampling to avoid representational bottlenecks [36], and convolutions with a 2×2×2 kernel and a stride of 2 are used for the down-sampling operations to reduce information loss [8,37]. The decoder module contains 3 convolution blocks and 3 up-sampling layers. Mirroring the encoder module, three transposed convolutions, each with a 2×2×2 kernel and a stride of 2, are used for the up-sampling operations in the decoder network [38]. A final 1×1×1 convolution projects the output to the desired number of channels, which is 1 here. The low-level features extracted by the encoder are fused with the corresponding high-level features in the decoder through skip connections to reconstruct the distribution of luminescence yield. In total, the network has 21 layers; further details are compiled in Supplement 1.
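The following compact PyTorch sketch mirrors this architecture: four encoder blocks, three strided 2×2×2 down-sampling convolutions, three transposed-convolution up-sampling steps with skip connections, and a final 1×1×1 projection. The base channel width and input shape are assumptions; see Supplement 1 for the authors' exact configuration.

```python
# Sketch of the 3D-UNet backbone described above; channel widths are assumed.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # Two 3x3x3 convolutions, each with batch norm and leaky ReLU.
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, 3, padding=1), nn.BatchNorm3d(c_out), nn.LeakyReLU(0.1),
        nn.Conv3d(c_out, c_out, 3, padding=1), nn.BatchNorm3d(c_out), nn.LeakyReLU(0.1),
    )

class UNet3D(nn.Module):
    def __init__(self, c0=16):
        super().__init__()
        chs = [c0, 2 * c0, 4 * c0, 8 * c0]
        self.enc = nn.ModuleList([conv_block(1, chs[0])] +
                                 [conv_block(chs[i], chs[i + 1]) for i in range(3)])
        # Strided 2x2x2 convolutions halve the spatial resolution; channels
        # are doubled in the next encoder block.
        self.down = nn.ModuleList([nn.Conv3d(chs[i], chs[i], 2, stride=2)
                                   for i in range(3)])
        self.up = nn.ModuleList([nn.ConvTranspose3d(chs[i + 1], chs[i], 2, stride=2)
                                 for i in range(3)])
        self.dec = nn.ModuleList([conv_block(2 * chs[i], chs[i]) for i in range(3)])
        self.head = nn.Conv3d(chs[0], 1, 1)   # 1x1x1 projection to one channel

    def forward(self, x):
        skips = []
        for i in range(3):
            x = self.enc[i](x)
            skips.append(x)                   # low-level features for fusion
            x = self.down[i](x)
        x = self.enc[3](x)
        for i in reversed(range(3)):
            x = self.up[i](x)
            x = self.dec[i](torch.cat([x, skips[i]], dim=1))  # skip connection
        return self.head(x)

net = UNet3D()
out = net(torch.zeros(1, 1, 32, 32, 32))      # spatial dims divisible by 8
```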

Numerical simulation and phantom experiments were performed to evaluate the performance of the Selfrec-Net. To synthesize the simulation datasets, a commercially available mouse phantom (XFM-2, PerkinElmer Health Sciences) was used, as shown in Fig. 2(a). The absorption coefficient ${\mu _a}$ and reduced scattering coefficient $\mu _s^{\prime}$ at the excitation and emission wavelengths were assumed to be 0.007 mm-1 and 1.0 mm-1, respectively [3]. The quantum yield of the phantom was set to 2.0×10−4 mm-1 [2,3,39]. A range of test cases was created with one or two spherical interior targets of varied size and varied target-to-background contrast of quantum yield, placed randomly inside the phantom. For phantoms with a single target, the target diameter ranged from 3 mm to 6 mm and the target-to-background contrast of quantum yield from 2:1 to 5:1. For phantoms with two targets, the targets had the same diameter (3, 4, or 5 mm) but varied edge-to-edge distances (from 0 to 10 mm) and contrasts (from 2:1 to 5:1). A total of 2000 phantoms were created. To synthesize the forward data, these phantoms were discretized into a mesh containing 18,496 nodes and 93,818 elements, as shown in Fig. 2(b). Detectors were placed on the top surface of the phantoms to mimic the acquired fluorescence signals, as presented in Fig. 2(c), and 36 sheets with a step size of approximately 0.5 mm were used to laterally scan the phantoms [3]. The forward data for each phantom were generated with the open-source software NIRFAST [31,40], and 1% Gaussian noise was added to the boundary measurements. The whole cohort of 2000 datasets was split into 1500 for training and 500 for validation. For a non-biased evaluation of the Selfrec-Net, testing datasets were created separately and never used during network training.
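As a rough illustration of this data synthesis, the sketch below embeds random spherical targets in a homogeneous volume and adds 1% Gaussian noise to forward data. It uses a voxel grid for simplicity, whereas the paper discretizes the phantom on an FEM mesh and runs NIRFAST; grid size and voxel spacing are assumptions.

```python
# Illustrative phantom/data synthesis: spherical targets with random size,
# position, and contrast in a homogeneous background, plus 1% Gaussian noise.
import numpy as np

rng = np.random.default_rng(42)
BG_YIELD = 2.0e-4                             # background quantum yield, mm^-1

def make_phantom(shape=(64, 64, 64), n_targets=1, voxel_mm=0.5):
    chi = np.full(shape, BG_YIELD)
    zz, yy, xx = np.indices(shape)
    for _ in range(n_targets):
        r = rng.uniform(1.5, 3.0) / voxel_mm      # 3-6 mm diameter targets
        c = rng.uniform(r, np.array(shape) - r)   # keep target inside volume
        contrast = rng.uniform(2.0, 5.0)          # contrast from 2:1 to 5:1
        mask = (zz - c[0])**2 + (yy - c[1])**2 + (xx - c[2])**2 <= r**2
        chi[mask] = contrast * BG_YIELD
    return chi

def add_noise(y, level=0.01):
    """Add 1% (default) multiplicative Gaussian noise to forward data y."""
    return y + level * y * rng.standard_normal(y.shape)
```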

 figure: Fig. 2.

Fig. 2. The illustration of the phantom and the corresponding finite element mesh used in the experiments. (a) XFM-2 phantom, (b) the finite element mesh used in forward modeling, and (c) the placement of detectors.

Download Full Size | PDF

For comparative purposes, a deep supervised learning algorithm (DSL) was also applied to the above data; details of the DSL can be found in Supplement 1. The proposed method was also compared with two iterative algorithms, Tikhonov [7] and Graph-TV [10]. For the two DL algorithms, the reconstruction network was implemented in Python 3.8 with PyTorch [41], and the Adam optimizer [42] was used to train for 500 epochs with a learning rate of 10−4 and a batch size of 8. All computations were run on a 64-bit PC with an Intel Core i7-9700 CPU at 3.00 GHz, 32 GB RAM, and two NVIDIA GeForce RTX 3090 graphics cards, each with 24 GB of memory. For the Tikhonov and Graph-TV algorithms, CELST reconstructions were performed with regularization parameter values ranging from 10−5 to 0.1, and the results with the best image quality in terms of MSE, PSNR, and SSIM are reported.

3. Results

3.1 Reconstruction from simulated data

To quantitatively analyze the reconstruction performance in simulations, the mean squared error (MSE) [43], the peak signal-to-noise ratio (PSNR) [44], the structural similarity (SSIM) [45], and the full width at half-maximum (FWHM) [46] were calculated. Larger PSNR and SSIM, smaller MSE, and more accurate FWHM values signify a better reconstruction result.
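A sketch of how these volume metrics can be computed is shown below. The PSNR peak and SSIM data range follow common conventions; the paper's exact normalization choices may differ.

```python
# Sketch of the image-quality metrics, computed on a reconstructed volume
# against its ground truth (both 3D arrays with each dimension >= 7 so the
# default SSIM window fits).
import numpy as np
from skimage.metrics import structural_similarity as ssim

def evaluate(recon: np.ndarray, truth: np.ndarray):
    mse = np.mean((recon - truth) ** 2)
    peak = truth.max()
    psnr = 10.0 * np.log10(peak ** 2 / mse)       # peak taken as data range
    s = ssim(recon, truth, data_range=peak - truth.min())
    return mse, psnr, s
```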

3.1.1 Single target

A single target with a diameter of 6 mm was placed at different depths inside the XFM-2 phantom to evaluate the proposed Selfrec-Net. The depth below the top surface varied from 6 mm to 12 mm, the target-to-background contrast of quantum yield was 4:1, and 1% Gaussian noise was added. Visual results for Tikhonov, Graph-TV, DSL, and Selfrec-Net are shown in Fig. 3, where the intensity profiles provide a more quantitative assessment of the reconstructed images. The specific metrics are compiled in Table 1.

Fig. 3. Results for a single target located at different depths. (a)-(d) are the reconstructed results of the Tikhonov, Graph-TV, DSL, and Selfrec-Net methods for target depths of 6 mm, 8 mm, 10 mm, and 12 mm, in turn. Intensity profiles are viewed along the z-axis direction (yellow dotted line in the second row), while the small red circles in the 2D images represent the actual positions of the target.


Table 1. Quantitative results for mean squared error (MSE), peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and full width at half-maximum (FWHM), for the case of a single target located at different depths inside the phantom

From Fig. 3, we can observe that over-smoothed images were obtained by Tikhonov regularization, with artifacts around the targets. Among the four algorithms, Tikhonov gives the poorest image quality in terms of average MSE (1.2×10−3), average PSNR (16.7 dB), and average SSIM (0.65). Graph-TV yields better results: average MSE is reduced to 9.7×10−4, and average PSNR and SSIM are improved to 19.3 dB and 0.71, respectively. In contrast, the improvement from deep learning over the iterative algorithms is obvious. Average MSE was reduced to 3.3×10−6 with DSL, which is 99.7% lower than the value yielded by Graph-TV, while PSNR and SSIM were improved to 34.8 dB and 0.97, which are 79.9% and 35.9% higher than the Graph-TV values, respectively. Selfrec-Net yields results comparable to DSL, with average MSE, PSNR, and SSIM of 5.3×10−6, 33.3 dB, and 0.95, respectively: MSE 58.1% higher, and PSNR and SSIM 4.3% and 1.8% lower, than those of DSL. As depth increased, the performance of the Tikhonov and Graph-TV methods decreased and image quality degraded further. However, both DSL and Selfrec-Net still yield notably high performance, with PSNR of 30 dB and SSIM of 0.9, even when the target is located at a depth of 12 mm.

We also examined the effect of target size. A single target with size varied from 3 mm to 5 mm was placed at a depth of 10 mm; the corresponding results are presented in Fig. 4. Reducing the target size leads to poorer image quality for all four algorithms because the CELST reconstruction becomes much harder. For the Selfrec-Net, reducing the target size from 5 mm to 3 mm caused MSE to increase from 8.16×10−6 to 6.37×10−5, and PSNR and SSIM to decrease by 10.7% and 5.4%, respectively. However, DSL and Selfrec-Net produce similar results, and both significantly outperform the other two methods.

Fig. 4. Effect of target size on the four algorithms. (a) MSE, (b) PSNR, and (c) SSIM.

3.1.2 Robustness to noise

We further tested the performance of the DSL and Selfrec-Net under different levels of Gaussian noise. A single target was placed at depths from 8 mm to 15 mm, while 1%–8% Gaussian noise was added to the forward data. Representative results are shown in Fig. 5, and the corresponding quantitative results are shown in Table 2. The results reveal that image quality degrades with increased noise level for both algorithms. When the noise level increased from 3% to 5%, the average MSE over the four depths increased from 1.7×10−5±1.5×10−5 and 2.4×10−5±2.3×10−5 to 3.5×10−5 and 8.1×10−5±9.1×10−5 for the DSL and Selfrec-Net, respectively, and the average PSNR was reduced from 31.5±2.0 dB and 30.9±2.1 dB to 29.9±3.0 dB and 28.2±3.7 dB, respectively. When the noise level is below 3%, Selfrec-Net can accurately recover the distribution of targets; but when the noise level is increased to 5% and the depth is greater than 10 mm, Selfrec-Net fails to recover the shape of the targets. In addition, at a 5% noise level, the average SSIM of Selfrec-Net decreases from 0.91 to 0.87, which is 4.9% lower than that obtained with 3% noise. Similar trends were observed for the DSL. Overall, the DSL obtains better image quality in terms of MSE, PSNR, and SSIM.

Fig. 5. CELST reconstruction results with the DSL and the Selfrec-Net under 3% and 5% noise. (a)-(d) 3D renderings of the reconstructed distribution of quantum yield and the corresponding 2D cross-sections taken at the $X = 50$ mm and $Y = 65$ mm planes, respectively.


Table 2. Quantitative assessment metrics for the DSL and the Selfrec-Net under different noise levels

Figure 6 further shows heat maps of the three evaluation metrics versus noise level for the Selfrec-Net, where the horizontal axis is the noise level and the vertical axis is the depth of the targets. The results again show that the performance of Selfrec-Net degrades with increased noise level or depth. As shown in Fig. 6, when the noise level and depth lie within the shaded area, Selfrec-Net achieves better performance, with MSE below 3×10−5 and PSNR and SSIM above 30 dB and 0.90, respectively. These results indicate that Selfrec-Net is relatively robust to noise.

Fig. 6. The heat map of quantitative results with different levels of noise and different depths. (a) MSE, (b) PSNR, and (c) SSIM.

3.1.3 Spatial resolution test

In this subsection, experiments were performed to test the ability of the four algorithms to discriminate two adjacent targets. Two targets with the same diameter of 4 mm and the same target-to-background contrast of 3:1 were placed at a depth of 10 mm, with their edge-to-edge distance (EED) varied from 1 mm to 4 mm. Furthermore, 1% Gaussian noise was added to the simulated measurements. 3D distributions, 2D sections, and 1D profiles along the x-axis direction in the cross-sections at $Y = 60$ mm are presented in Fig. 7. The Tikhonov and Graph-TV algorithms could not discriminate the two targets when the EED was smaller than 4 mm, while the two DL algorithms were able to resolve them at an EED of 2 mm. However, when the EED was decreased to 1 mm, DSL achieved better separation than Selfrec-Net. Overall, Selfrec-Net achieved performance comparable to DSL when the EED was larger than 1 mm.

Fig. 7. Results as the EED decreases from 4 mm to 1 mm. (a)-(d) 3D renderings of the ground-truth images, the corresponding 2D cross-sections, and intensity profiles.

The effect of target depth on spatial resolution was also studied. The two targets were placed at two depths (8 mm and 12 mm), and the corresponding results are shown in Fig. 8. The results reveal that the spatial resolution depends on target depth. When the targets are located at a depth of 8 mm, the Selfrec-Net can separate them even at an EED of 1 mm. At greater depths, however, the ability of the Selfrec-Net to discriminate the two targets degrades. For targets at a depth of 12 mm, the target locations were not recovered accurately, and only one target was recovered when the EED was reduced to 1 mm.

Fig. 8. Reconstructed results when targets are located at different depths. (a)-(d) are the results as the edge-to-edge distance decreases from 4 mm to 1 mm, respectively.

3.1.4 Generalization test

To test the generalization ability of the trained Selfrec-Net, experiments with three targets were performed. The three targets had the same diameter (4 mm) and were placed at different edge-to-edge distances, as shown in Fig. 9. The target-to-background contrast values were fixed at 3:1, and 1% Gaussian noise was added to the boundary measurements. Figure 9 shows the typical results, and the quantitative results are listed in Table 3. The Selfrec-Net could still recover the distribution of three targets when the EED was 2 mm, but the recovered quantum field and the spatial resolution degraded compared to the results reported above. When the EED was reduced to 1 mm, both DL algorithms failed to resolve the three targets (Fig. 9(d)). A similar conclusion can be drawn from the PSNR and SSIM in Table 3; namely, Selfrec-Net has generalization ability similar to that of DSL.

Fig. 9. Reconstructed results with three targets. (a)-(d) are the results with edge-to-edge distances of 2 mm or 1 mm.


Table 3. Quantitative results for DSL and Selfrec-Net with three targets with an EED of 2 mm

3.1.5 Reconstruction efficiency

The Selfrec-Net requires a long training time: more than 26 hours on our unlabeled 3D datasets, compared with about 17 hours for training the DSL on the same datasets. The reason is that Selfrec-Net repeatedly evaluates the forward light propagation model during training. Once trained, however, both deep learning algorithms take about 2 seconds per CELST reconstruction, whereas the Tikhonov and Graph-TV algorithms take 1.1 and 1.4 hours, respectively. The iterative algorithms are slower because they require several iterations with inversion and multiplication of large matrices; for example, each involves computing the Jacobian matrix $J$ and the inverse of ${J^T}J$ at each iteration, where the size of $J$ is 35,604 × 35,604 in our experiments. The reconstruction time of Selfrec-Net is thus about three orders of magnitude shorter than that of the iterative reconstruction algorithms, showing that Selfrec-Net has potential for real-time imaging.
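To make the iterative cost concrete, below is a minimal sketch of a single Tikhonov-regularized update of the kind such solvers repeat at every iteration; the matrix contents are placeholders, and only the complexity is the point.

```python
# Sketch of one Tikhonov-regularized Gauss-Newton update. Forming J^T J costs
# O(m n^2) and solving the dense n x n system costs O(n^3); with n = 35,604
# (as in the paper's experiments) this dominates each iteration.
import numpy as np

def tikhonov_update(J, residual, lam):
    """Solve (J^T J + lam * I) d = J^T residual for the update d."""
    n = J.shape[1]
    H = J.T @ J + lam * np.eye(n)
    return np.linalg.solve(H, J.T @ residual)
```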

3.2 Reconstruction from experiment data

To further assess the performance of the proposed algorithm, the XFM-2 mouse phantom was imaged with 1 mM Platinum Oxyphor G4 (PtG4) in a 7 µl volume at the tip of a capillary tube inserted into the central part of the phantom. 36 sheet-shaped beams were generated to laterally scan the phantom using a clinical radiotherapy LINAC (Varian LINAC 2100CD, Varian Medical Systems, Palo Alto, USA), and a gated intensified charge-coupled device (ICCD, PI-MAX4 1024i, Princeton Instruments, USA) was used to acquire the CELSI images. More detailed information about the system and experimental acquisition can be found in [3]. Note that the acquired images were normalized by their maximum values before being used as network input. Figure 10(a) shows the fluorescent image overlaid on the microCT image, and Figs. 10(b)-(e) show the results reconstructed by the Tikhonov, Graph-TV, DSL, and Selfrec-Net algorithms, respectively. Significant artifacts exist in the image reconstructed by Tikhonov regularization; Graph-TV yielded better localization with fewer artifacts. In contrast, the two deep learning algorithms obtain images with sharp edges and significantly reduced artifacts. In addition, the location error, defined as the distance between the center of the reconstructed quantum-field distribution and the center of the true distribution (i.e., the capillary tube) determined from the CT images, was calculated to evaluate the positioning performance of the two methods. The location errors for DSL and Selfrec-Net are 1.26 mm and 1.54 mm, respectively, which again shows that Selfrec-Net obtains results comparable to DSL.
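A sketch of this location-error computation, using the intensity-weighted centroid of the reconstruction against the CT-derived center, is given below; the voxel size is an assumed parameter.

```python
# Sketch of the location-error metric: Euclidean distance between the
# intensity-weighted centroid of the reconstructed yield volume and the
# CT-derived true center (both expressed in mm).
import numpy as np

def location_error(recon: np.ndarray, true_center_mm: np.ndarray,
                   voxel_mm: float = 0.5) -> float:
    w = np.clip(recon, 0, None)                     # non-negative weights
    coords = np.indices(recon.shape).reshape(3, -1)
    centroid = (coords * w.ravel()).sum(axis=1) / w.sum()
    return float(np.linalg.norm(centroid * voxel_mm - true_center_mm))
```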

Fig. 10. CELST reconstructed images with experimental data. (a) Luminescent image overlaid on the microCT image. (b)-(e) are the reconstructed results and the corresponding 2D cross-sections for the four considered algorithms, respectively.

4. Discussion

In contrast with the conventional iterative algorithms, the DSL and Selfrec-Net can significantly improve image quality; however, their performance still depends somewhat on the training datasets. When the number of training datasets was reduced from 1500 to 800, the average PSNR of DSL dropped from about 34 dB to 28.2 dB and SSIM from 0.96 to 0.87, which are respectively 17.1% and 9.4% lower than the values yielded with the training dataset of 1500. Similar degradation occurred for the Selfrec-Net: PSNR and SSIM were reduced from 31.6 dB to 27.2 dB and from 0.94 to 0.86, respectively. This degradation was probably caused by overfitting, so the appropriate choice of training dataset is critical, as would be expected. When properly trained, however, the algorithm reconstructs a single object very well, with little depth sensitivity, as seen in Fig. 3. This contrasts with the explicit Tikhonov algorithm, which is well known to have extreme depth sensitivity, making diffuse image reconstruction always problematic. The value of the DSL and Selfrec-Net algorithms is most obvious in this aspect of the reconstruction.

As shown in Fig. 1, we used the 3D-UNet as the reconstruction backbone to demonstrate the feasibility of self-supervised learning for CELST reconstruction. The 3D-UNet maps the acquired boundary measurements to the reconstructed distribution of luminescence yield, and was adopted because it has relatively few parameters and a low computational cost. From a reconstruction point of view, the 3D-UNet could be replaced by other CNN structures, such as ResNet-based structures, which we have not yet explored. Our previous work showed that incorporating residual blocks into UNet can achieve superior performance over the original UNet [47]. Future work will therefore assess the performance of other CNN structures.

As shown in Table 1, DSL yields relatively better image quality and fewer artifacts than the Selfrec-Net in terms of MSE, PSNR, and SSIM. However, the merit of the Selfrec-Net is that it is free of ground-truth images, so when applied to experimental images, as in Fig. 10, it has stronger potential to retain its accuracy. One way to improve the performance of the Selfrec-Net on experimental data is to include experimental data in the training/validation set. However, to date we have acquired only a few phantom and in vivo datasets, so they were not included in the training/validation datasets. Future work will collect more experimental data to augment the dataset and compare the performance of other CNN structures.

The accuracy of the forward model can influence the performance of the Selfrec-Net. We explored this by simulating the effect of optical property errors of different magnitudes on the forward model. Errors in the assumed absorption and reduced scattering coefficients were expressed as percentages of the baseline values used in generating the training datasets. A single 4 mm target was positioned at a depth of 11 mm. The quantitative results are shown in Table 4. For ±5% errors in optical properties, MSE increased by less than 23.7%, while PSNR and SSIM were reduced by less than 3.7% and 1%, respectively. For larger optical property errors, the quality of the reconstructed image degraded further. The corresponding simulations with the DSL produced similar results.


Table 4. The effect of different magnitude optical property errors on reconstruction performance of the Selfrec-Net
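A minimal sketch of how such perturbations can be drawn around the baseline optical properties (0.007 mm-1 and 1.0 mm-1, as above) is shown below; the per-property random sign choice is an assumption about the experimental design.

```python
# Sketch of the model-mismatch setup: perturb the absorption and reduced
# scattering coefficients by +/- frac of their baselines before rebuilding
# the forward model used at test time.
import numpy as np

def perturb_optics(mu_a=0.007, mu_s_prime=1.0, frac=0.05, rng=None):
    """Return (mu_a, mu_s') perturbed by +/- frac of their baseline values."""
    rng = rng or np.random.default_rng()
    sign = rng.choice([-1.0, 1.0], size=2)
    return mu_a * (1 + sign[0] * frac), mu_s_prime * (1 + sign[1] * frac)
```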

There are clear limits to the reconstruction, though, which need to be interpreted carefully, as seen in Figs. 7, 8, and 9 with the two- and three-object reconstructions. Accuracy is retained reasonably well with two targets but degrades with three, so as the complexity of the imaging field grows, the accuracy of the recovered intensity decreases. The strengths of the reconstruction are its spatial resolution and its localization of the object centroids, both of which are superior to the iterative methods. More generally, an obvious limitation of deep learning algorithms is their ability to generalize to arbitrary objects. Although the well-trained network works on a dataset with single or two targets, it may not be directly applicable to cases with multiple targets or distributed complex target regions, as the quantitative accuracy and image quality are reduced. However, most mouse imaging in CELST involves singular regions of tissue with medium-high contrast relative to the surrounding normal tissue, and reconstruction algorithms with higher accuracy in recovering the intensity value and in spatial resolution are desirable. In this context, the Selfrec-Net does appear to do a good job, as shown in Fig. 10.

A transfer learning approach [48,49] would enable cross-study adaptation and information sharing, allowing knowledge learned on one type of data to be applied to another. In future work, such an approach could be incorporated to improve generalizability to more complex situations.

5. Conclusions

In this study, a deep self-supervised learning based Selfrec-Net was proposed for CELST reconstruction, in which a 3D-UNet is followed by the forward model to recover a 3D distribution of the quantum yield from the acquired measurements in an input-to-input manner. The reconstruction network could thus be trained by minimizing the MSE between the boundary measurements and the model output. The Selfrec-Net approach avoids the need for ground-truth images during the training phase, allowing it to be applied directly to more complex experimental data. To the best of our knowledge, this is the first effort to introduce a self-supervised mechanism for CELST reconstruction. Its performance was verified with numerical simulations and physical phantom experiments. The results show that the proposed network achieved strong performance in image reconstruction, noise robustness, spatial resolution, and scene generalization for single objects at various depths inside the tissue. It achieved reconstruction performance comparable to DSL, with the added advantage of not requiring ground-truth data.

Funding

National Natural Science Foundation of China (81871394, 82171992, 62105010).

Disclosures

The authors have no relevant financial interests in this work.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. R. Zhang, A. V. D’souza, J. R. Gunn, T. V. Esipova, S. A. Vinogradov, A. K. Glaser, L. A. Jarvis, D. J. Gladstone, and B. W. Pogue, “Cherenkov-excited luminescence scanned imaging,” Opt. Lett. 40(5), 827–830 (2015). [CrossRef]  

2. P. Brůža, H. Lin, S. A. Vinogradov, L. A. Jarvis, D. J. Gladstone, and B. W. Pogue, “Light sheet luminescence imaging with Cherenkov excitation in thick scattering media,” Opt. Lett. 41(13), 2986–2989 (2016). [CrossRef]  

3. B. W. Pogue, J. Feng, E. P. LaRochelle, P. Brůža, H. Lin, R. Zhang, J. R. Shell, H. Dehghani, S. C. Davis, S. A. Vinogradov, D. J. Gladstone, and L. A. Jarvis, “Maps of in vivo oxygen pressure with submillimeter resolution and nanomolar sensitivity enabled by Cherenkov-excited luminescence scanned imaging,” Nat. Biomed. Eng. 2(4), 254–264 (2018). [CrossRef]  

4. K. Tanha, A. M. Pashazadeh, and B. W. Pogue, “Review of biomedical Čerenkov luminescence imaging applications,” Biomed. Opt. Express 6(8), 3053–3065 (2015). [CrossRef]  

5. R. Zhang, S. C. Davis, J. H. Demers, A. K. Glaser, D. J. Gladstone, T. V. Esipova, S. A. Vinogradov, and B. W. Pogue, “Oxygen tomography by Cerenkov-excited phosphorescence during external beam irradiation,” J. Biomed. Opt 18(5), 050503 (2013). [CrossRef]  

6. H. Lin, R. Zhang, J. R. Gunn, T. V. Esipova, S. Vinogradov, D. J. Gladstone, L. A. Jarvis, and B. W. Pogue, “Comparison of Cherenkov excited fluorescence and phosphorescence molecular sensing from tissue with external beam irradiation,” Phys. Med. Biol. 61(10), 3955–3968 (2016). [CrossRef]  

7. J. Feng, P. Bruza, H. Dehghani, S. C. Davis, and B. W. Pogue, “Cherenkov-excited luminescence sheet imaging (CELSI) tomographic reconstruction,” Proc. SPIE 10049 (2017).

8. W. Zhang, J. Feng, Z. Li, Z. Sun, and K. Jia, “Reconstruction for Cherenkov-excited luminescence scanned tomography based on Unet network,” Chin. J. Lasers 48(17), 1707001 (2021).

9. M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, “Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems,” IEEE J. Sel. Topics Signal Process. 1(4), 586–597 (2007).

10. W. Lu, J. Duan, D. Orive-Miguel, L. Herve, and I. B. Styles, “Graph- and finite element-based total variation models for the inverse problem in diffuse optical tomography,” Biomed. Opt. Express 10(6), 2684–2707 (2019). [CrossRef]  

11. J. Feng, S. Jiang, J. Xu, Y. Zhao, B. W. Pogue, and K. D. Paulsen, “Multiobjective guided priors improve the accuracy of near-infrared spectral tomography for breast imaging,” J. Biomed. Opt 21(9), 090506 (2016). [CrossRef]  

12. P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM Monographs on Mathematical Modeling and Computation (SIAM, Philadelphia, PA, 1998).

13. J. Feng, W. Zhang, Z. Li, K. Jia, S. Jiang, H. Dehghani, B. W. Pogue, and K. D. Paulsen, “Deep-learning based image reconstruction for MRI-guided near-infrared spectral tomography,” Optica 9(3), 264–267 (2022). [CrossRef]  

14. K. Wang, J. Dou, Q. Kemao, J. Di, and J. Zhao, “Y-Net: a one-to-two deep learning framework for digital holographic reconstruction,” Opt. Lett. 44(19), 4765–4768 (2019). [CrossRef]  

15. J. Yoo, S. Sabir, D. Heo, K. H. Kim, A. Wahab, Y. Choi, S. I. Lee, E. Y. Chae, H. H. Kim, Y. M. Bae, Y. W. Choi, S. Cho, and J. C. Ye, “Deep learning diffuse optical tomography,” IEEE Trans. Med. Imaging 39(4), 877–887 (2020). [CrossRef]  

16. K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. on Image Process. 26(9), 4509–4522 (2017). [CrossRef]  

17. P. Zhang, G. Fan, T. Xing, F. Song, and G. Zhang, “UHR-DeepFMT: Ultra-high spatial resolution reconstruction of fluorescence molecular tomography based on 3-D fusion dual-sampling deep neural network,” IEEE Trans. Med. Imaging 40(11), 3217–3228 (2021). [CrossRef]  

18. H. Ben Yedder, A. BenTaieb, M. Shokoufi, A. Zahiremami, F. Golnaraghi, and G. Hamarneh, “Deep learning based image reconstruction for diffuse optical tomography,” in Machine Learning for Medical Image Reconstruction (MLMIR), (Springer, 2018), pp. 112–119.

19. Y. Zou, Y. Zeng, S. Li, and Q. Zhu, “Machine learning model with physical constraints for diffuse optical tomography,” Biomed. Opt. Express 12(9), 5720–5735 (2021). [CrossRef]  

20. X. Fang, C. Gao, Y. Li, and T. Li, “Solving heterogenous region for diffuse optical tomography with a convolutional forward calculation model and the inverse neural network,” Proc. SPIE 11549 (2020). [CrossRef]  

21. L. Zhang, Y. Zhao, S. Jiang, B. W. Pogue, and K. D. Paulsen, “Direct regularization from co-registered anatomical images for MRI-guided near-infrared spectral tomographic image reconstruction,” Biomed. Opt. Express 6(9), 3618–3630 (2015). [CrossRef]  

22. J. Prakash, C. B. Shaw, R. Manjappa, R. Kanhirodan, and P. K. Yalavarthy, “Sparse recovery methods hold promise for diffuse optical tomographic image reconstruction,” IEEE J. Select. Topics Quantum Electron. 20(2), 74–82 (2014). [CrossRef]  

23. L. Zhang and G. Zhang, “Brief review on learning-based methods for optical tomography,” J. Innov. Opt. Health Sci. 12(06), 1930011 (2019). [CrossRef]  

24. C. Belthangady and L. A. Royer, “Applications, promises, and pitfalls of deep learning for fluorescence image reconstruction,” Nat. Methods 16(12), 1215–1225 (2019). [CrossRef]  

25. M. Weigert, U. Schmidt, T. Boothe, A. Müller, A. Dibrov, A. Jain, B. Wilhelm, D. Schmidt, C. Broaddus, S. Culley, M. R. Martins, F. S. Miranda, C. Norden, R. Henriques, M. Zerial, M. Solimena, J. Rink, P. Tomancak, L. Royer, F. Jug, and E. W. Myers, “Content-aware image restoration: pushing the limits of fluorescence microscopy,” Nat. Methods 15(12), 1090–1097 (2018). [CrossRef]  

26. Y. Wang, H. Pinkard, E. Khwaja, S. Zhou, L. Waller, and B. Huang, “Image denoising for fluorescence microscopy by supervised to self-supervised transfer learning,” Opt. Express 29(25), 41303–41312 (2021). [CrossRef]  

27. J. Lehtinen, J. Munkberg, J. Hasselgren, S. Laine, T. Karras, M. Aittala, and T. Aila, “Noise2Noise: learning image restoration without clean data,” in Proceedings of the 35th International Conference on Machine Learning (ICML), (2018), pp. 2965–2974.

28. X. Li, G. Zhang, J. Wu, Y. Zhang, Z. Zhao, X. Lin, H. Qiao, H. Xie, H. Wang, L. Fang, and Q. Dai, “Reinforcing neuron extraction and spike inference in calcium imaging using deep self-supervised denoising,” Nat. Methods 18(11), 1395–1400 (2021). [CrossRef]  

29. S. Shurrab and R. Duwairi, “Self-supervised learning methods and applications in medical imaging analysis: a survey,” arXiv:2109.08685 (2022). [CrossRef]  

30. D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” Int J Comput Vis 128(7), 1867–1888 (2020). [CrossRef]  

31. H. Dehghani, M. E. Eames, P. K. Yalavarthy, S. C. Davis, S. Srinivasan, C. M. Carpenter, B. W. Pogue, and K. D. Paulsen, “Near infrared optical tomography using NIRFAST: Algorithm for numerical model and image reconstruction,” Commun. Numer. Meth. Engng. 25(6), 711–732 (2009). [CrossRef]  

32. A. Soubret, J. Ripoll, and V. Ntziachristos, “Accuracy of fluorescent tomography in the presence of heterogeneities: study of the normalized Born ratio,” IEEE Trans. Med. Imaging 24(10), 1377–1386 (2005). [CrossRef]  

33. S. R. Arridge, “Optical tomography in medical imaging,” Inverse Probl. 15(2), R41–R93 (1999). [CrossRef]  

34. A. Cong and G. Wang, “A finite-element-based reconstruction method for 3D fluorescence tomography,” Opt. Express 13(24), 9847–9857 (2005). [CrossRef]  

35. Ö. Çiçek, A. Abdulkadir, S. Lienkamp, T. Brox, and O. Ronneberger, “3D U-Net: learning dense volumetric segmentation from sparse annotation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), (Springer, 2016), pp. 424–432.

36. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), pp. 2818–2826.

37. J. Fu, J. B. Dong, and F. Zhao, “A deep learning reconstruction framework for differential phase-contrast computed tomography with incomplete data,” IEEE Trans. on Image Process. 29(1), 2190–2202 (2020). [CrossRef]  

38. X. Ding, Y. Guo, G. Ding, and J. Han, “ACNet: strengthening the kernel skeletons for powerful CNN via asymmetric convolution blocks,” in 2019 IEEE International Conference on Computer Vision (ICCV), (2019), pp. 1911–1920.

39. J. Axelsson, A. K. Glaser, D. J. Gladstone, and B. W. Pogue, “Quantitative Cherenkov emission spectroscopy for tissue oxygenation assessment,” Opt. Express 20(5), 5133–5142 (2012). [CrossRef]  

40. M. Jermyn, H. Ghadyani, M. A. Mastanduno, W. Turner, S. C. Davis, H. Dehghani, and B. W. Pogue, “Fast segmentation and high-quality three-dimensional volume mesh creation from medical images for diffuse optical tomography,” J. Biomed. Opt 18(8), 086007 (2013). [CrossRef]  

41. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. Devito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in Advances in Neural Information Processing Systems (NIPS), 2017 Workshop Autodiff (2017).

42. D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv:1412.6980 (2014). [CrossRef]  

43. B. W. Pogue, X. Song, T. D. Tosteson, T. O. McBride, S. Jiang, and K. D. Paulsen, “Statistical analysis of nonlinearly reconstructed near-infrared tomographic images. I. Theory and simulations,” IEEE Trans. Med. Imaging 21(7), 755–763 (2002). [CrossRef]  

44. A. P. Cuadros and G. R. Arce, “Coded aperture optimization in compressive X-ray tomography: a gradient descent approach,” Opt. Express 25(20), 23833–23849 (2017). [CrossRef]  

45. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]  

46. G. Zhang, S. Tzoumas, K. Cheng, F. Liu, J. Liu, J. Luo, J. Bai, and L. Xing, “Generalized adaptive Gaussian Markov random field for X-ray luminescence computed tomography,” IEEE Trans. Biomed. Eng. 65(9), 2130–2133 (2018). [CrossRef]  

47. J. Feng, J. Deng, Z. Li, Z. Sun, H. Dou, and K. Jia, “End-to-end Res-Unet based reconstruction algorithm for photoacoustic imaging,” Biomed. Opt. Express 11(9), 5321–5340 (2020). [CrossRef]  

48. H. Wang, Y. Rivenson, Y. Jin, Z. Wei, R. Gao, H. Günaydın, L. A. Bentolila, C. Kural, and A. Ozcan, “Deep learning enables cross-modality super-resolution in fluorescence microscopy,” Nat. Methods 16(1), 103–110 (2019). [CrossRef]  

49. J. Wang, D. Agarwal, M. Huang, G. Hu, Z. Zhou, C. Ye, and N. R. Zhang, “Data denoising with transfer learning in single-cell transcriptomics,” Nat. Methods 16(9), 875–878 (2019). [CrossRef]  

Supplementary Material

Supplement 1: Details of Selfrec-Net and the framework of DSL.


