Optica Publishing Group

Deep learning-based fusion of widefield diffuse optical tomography and micro-CT structural priors for accurate 3D reconstructions

Open Access

Abstract

Widefield illumination and detection strategies leveraging structured light have enabled fast and robust probing of tissue properties over large surface areas and volumes. However, when applied to diffuse optical tomography (DOT), they still require a time-consuming and expert-centric solution of an ill-posed inverse problem. Deep learning (DL) models have recently been proposed to facilitate this challenging step. Herein, we expand on a previously reported deep neural network (DNN)-based architecture (Modified AUTOMAP, ModAM) for accurate and fast reconstruction of the absorption coefficient in 3D DOT based on a structured light illumination and detection scheme. Furthermore, we evaluate the improved performance obtained by incorporating a micro-CT structural prior in the DNN-based workflow, named Z-AUTOMAP. Z-AUTOMAP significantly improves the spatial resolution of the widefield imaging process, especially in the transverse direction. The reported DL-based strategies are validated both in silico and in experimental phantom studies using spectral micro-CT priors. Overall, this is the first successful demonstration of micro-CT and DOT fusion using deep learning, greatly enhancing the prospect of the rapid data-integration strategies often demanded in challenging preclinical scenarios.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Diffuse optical tomography (DOT) remains an attractive imaging modality thanks to its unique potential to reveal the functional state of tissues by quantifying intrinsic tissue optical properties (OPs) such as the absorption coefficient ($\mu _{a}$) [1,2]. Functional signatures of tissues have been demonstrated to be clinically useful biomarkers in oncology, particularly in breast cancer management [3]. Despite the appeal of DOT, its wide dissemination to the bedside has been hampered by several limiting factors. Along with several biological considerations, the fast acquisition of dense optical datasets enriched with spectral coding or temporal information is challenging. In that regard, there have been notable strides in developing fast 3D DOT systems thanks to widefield illumination and detection strategies [4–8]. Still, a notoriously ill-posed inverse problem hinders the application of 3D DOT in its various incarnations [9,10]. Conventional inverse solvers such as least squares (LSQ), the conjugate gradient solver (CGS), and Total Variation minimization using an Augmented Lagrangian (TVAL) are plagued by the need for parameter optimization and heavy regularization to obtain satisfactory results. Hence, these model-based techniques are time-consuming and require lab-centric, expert-driven optimization. Therefore, with high-powered Graphical Processing Units (GPUs) becoming widely available, the trend in recent years has been to move towards a Deep Learning (DL)-based workflow for solving the inverse problem [11,12]. In this regard, we have reported on the development of Deep Neural Network (DNN)-based solutions in Fluorescence Molecular Tomography (FMT), as well as widefield DOT using a Modified AUTOMAP (ModAM) architecture [13,14].
The work in [13] employed an efficient Monte Carlo (MC)-based in silico training scheme that was shown to accurately replicate experimental settings and provided excellent 3D reconstructions on experimental tissue-mimicking phantom datasets not seen by the network at the training/validation stage. The ModAM architecture initially proposed in [13] for k-space reflectance FMT (employing point detection) was seamlessly transferred to widefield transmission DOT in [14]. However, the reconstruction results suffered from poorer resolution than those in [13], even with our proposed DL-based approach, because widefield probing of large volumes introduces redundancy in the data features.

A well-established method to improve the resolution in DOT is to use structural templates of the imaged sample. These templates are usually obtained via a complementary modality such as CT, ultrasound, or MRI [7,15–19]. Moreover, these approaches find utility in clinical scenarios such as in vivo breast imaging through Digital Breast Tomosynthesis-guided DOT [20,21]. Typically, the templates are used to derive informed regularization maps that preserve tissue structural features, leading to improved spatial resolution [22–24]. Such prior-guided strategies have also been recently proposed for DL-based optical reconstructions [25,26]. Of particular interest, the work in [26] proposed an architecture called Z-Net that utilized MRI structural priors to improve the contrast of 2D optical tomography. Thus, inspired by our previous work with the ModAM architecture and the recently proposed Z-Net, we report an architecture called Z-AUTOMAP that integrates structural priors from high-resolution spectral micro-CT with the measurements obtained from widefield DOT to accurately reconstruct, in 3D, the map of the difference in absorption coefficient between the embedding(s) and the background ($\delta \mu _{a}$).

Herein, we expand on the previously proposed ModAM architecture for widefield DOT by demonstrating 3D reconstruction results both in silico and for experimental phantoms. The results show that the ModAM architecture can predict the $\delta \mu _{a}$ values with higher accuracy than a traditional regularized LSQ-based technique, with a drastically shorter reconstruction time. However, for more complicated in silico and experimental phantoms, that is, multiple structures in the same Field-of-View (FOV), the ModAM architecture, while outperforming the regularized LSQ, still reconstructs with losses in structural integrity and resolution. To overcome this issue, we employ the newly designed Z-AUTOMAP architecture to illustrate the benefit of multimodal fusion in DL. Z-AUTOMAP utilizes micro-CT structural priors in the DNN training workflow and is, therefore, able to resolve more complicated structures, both in silico and experimental, while still being accurate in reconstructing the $\delta \mu _{a}$ values. The training process for both ModAM and Z-AUTOMAP, as discussed already, is entirely in silico and makes use of a previously proposed Enhanced EMNIST (EEMNIST) dataset [13]. Moreover, the micro-CT scans for experimental phantoms are obtained from a high-powered MARS Spectral CT Scanner that provides high-resolution structural priors to reconstruct complicated phantoms. All the obtained results are benchmarked against a traditional regularized LSQ-based method and evaluated quantitatively in terms of the Mean Squared Error (MSE), Volumetric Error (VE), and Multi-Scale Structural Similarity Index (MS-SSIM). Additionally, we present an experimental phantom result obtained using a Laplacian-prior regularized LSQ-based technique for comparison.

2. Deep learning workflow

2.1 Training data generation

The in silico data generation workflow follows the methodology introduced in [27] and later successfully expanded upon and implemented in [28] and [13]. To better capture the spatial heterogeneity associated with many cancers, and to move away from traditional workflows in the literature that mainly use simple geometric spherical shapes for training, we developed a training dataset called EEMNIST. The EEMNIST dataset consists of characters from different databases such as MNIST, FashionMNIST, and Bengali.AI [29]. Examples of some of the characters present in the dataset are shown in Fig. 1(a). These characters are converted into 3D in silico phantoms having a range of $\delta \mu _{a}$ values (relevant to soft tissues) and a specific reduced scattering coefficient ($\mu _{s}^{'}$) value. The background OPs ($\mu ^{b}_{a}$, $\mu ^{'}_{s}$, and the anisotropy factor, $g$) are kept fixed for this work but may be varied depending upon the application. An example in silico optical phantom with a single random embedding (with random OPs) placed at a random depth from the illumination plane is illustrated in Fig. 1(b). After creating the optical phantoms, we leverage the open-source Monte Carlo (MC)-based software Monte Carlo eXtreme (MCX) [30,31] to generate DOT measurement vectors. MCX is unique in its ability to model widefield illumination and detection patterns, whether generated in silico or acquired experimentally. For this paper, we employ half-space quantized patterns (quantized low-frequency, phase encoding) for both illumination and detection, first deployed in [4]. The patterns were later optimized for single-pixel DOT in [32] to reflect specific instrument features. The experimental patterns are obtained from the CCD camera that is part of our single-pixel hyperspectral system used for imaging the experimental phantoms in this paper. The resulting structured light bases for illumination and detection are illustrated in Fig. 1(c).
At the same time, the MCX setup (for transmission widefield DOT) with a sample embedding is presented in Fig. 1(d). From the MCX simulation, two sets of measurement vectors are generated. The unperturbed measurement vector ($\phi _o$) is obtained with only the background OPs present, while the perturbed measurement vector ($\phi$) is obtained with the embedding(s) present. The $\phi _o$ and $\phi$ vectors obtained from MCX for the sample embedding in Fig. 1(b) are shown in Fig. 1(e). The measurement vectors are then used to generate a Rytov-normalized measurement for each sample ($\log \frac {\phi _o}{\phi }$). These Rytov measurements are used as inputs for the DNNs. Additionally, we vary the depth and strength of the contrast embedded in the medium.
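The Rytov normalization step above reduces to a few lines of array arithmetic. The sketch below assumes the perturbed and unperturbed vectors are already available as NumPy arrays; the function name and the clipping floor are illustrative choices, not part of the authors' code.

```python
import numpy as np

def rytov_normalize(phi_0, phi, eps=1e-12):
    """Return the Rytov-normalized measurement log(phi_0 / phi).

    phi_0 : unperturbed measurement vector (background OPs only)
    phi   : perturbed measurement vector (embedding(s) present)
    eps   : small floor to avoid log(0) for patterns with no detected photons
    """
    phi_0 = np.asarray(phi_0, dtype=float)
    phi = np.asarray(phi, dtype=float)
    return np.log(np.clip(phi_0, eps, None) / np.clip(phi, eps, None))

# An absorbing embedding attenuates the perturbed signal, so the Rytov
# measurement is positive wherever phi < phi_0 and zero where they match.
phi_0 = np.array([1.00, 0.80, 0.50])
phi = np.array([0.90, 0.80, 0.40])
y = rytov_normalize(phi_0, phi)
```

This vector (one entry per illumination/detection pattern pair) is what the DNNs take as input.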


Fig. 1. (a) Snapshot of some of the EEMNIST characters used for training. (b) An example in silico phantom with a random embedding with a reduced scattering coefficient, $\mu _{s}^{'e}$, of $1$ $mm^{-1}$ and an absorption coefficient, $\mu _{a}^{e}$, of $0.008$ $mm^{-1}$. The background reduced scattering coefficient, $\mu _{s}^{'b}$, and background absorption coefficient, $\mu _{a}^{b}$, are $1$ $mm^{-1}$ and $0.004$ $mm^{-1}$, respectively. Hence, the $\delta \mu _{a}$ value is $0.004$ $mm^{-1}$. These values are relevant to soft tissues. The embedding is placed at a depth of $6$ mm from the illumination plane. (c) Illumination and detection bar patterns acquired experimentally and used in MCX to generate the measurement vectors. (d) Widefield transmission setup in MCX, showing the illumination and detection planes. Two cylinders (with different OPs) have been placed as embeddings for visualization. (e) Simulated perturbed ($\phi$, shown in red) and unperturbed ($\phi _{o}$, shown in blue) measurement vectors obtained from MCX for the sample embedding in (b).


2.2 ModAM network

Inspired by the work in [33] that proposed AUTOMAP for k-space data in MRI, we developed the DNN ModAM for k-space reflectance fluorescence tomography in [13] and later extended that work to widefield transmission DOT in [14]. Herein, we display the architecture of the ModAM network in Fig. 2(a). The network consists of fully connected dense and convolutional layers that translate the information from a 1D measurement vector to a 3D reconstruction. In addition, we add a dropout layer to the network to prevent overfitting. The dataset is split 80/20 into training and validation sets, and the network is trained with the Adam optimizer and ReLU activations. As an additional step to prevent overfitting, the network adopts a “patience” parameter for early stopping. The loss function deployed for training is the MSE. The details of the network parameters can be found in Table 1. A typical pair of training and validation loss curves is shown in Fig. 2(b). Additionally, we carry out a network stability study, similar to one our group previously performed [34], where the network is trained $10$ times (by randomly shuffling the training dataset) and a set of validation curves is generated. The results are shown in Fig. 2(c). The network converges consistently in the range of $700-800$ iterations with a very low standard deviation between the curves, demonstrating the stability of the network and the selected loss function (MSE) in terms of convergence.
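The "patience" early-stopping rule mentioned above can be stated precisely: training halts once the validation loss has not improved for a fixed number of consecutive epochs. The following is a framework-agnostic sketch of that rule; the function name and the toy loss curve are illustrative, not taken from the authors' training code.

```python
def stop_epoch(val_losses, patience):
    """Return the epoch at which patience-based early stopping triggers.

    Scans a sequence of validation losses and returns the first epoch that
    lies `patience` epochs past the last improvement, or None if training
    would run to completion.
    """
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch       # new best: reset the counter
        elif epoch - best_epoch >= patience:
            return epoch                         # no improvement for `patience` epochs
    return None

# A validation curve that improves until epoch 5 and then plateaus:
# with patience 10, stopping triggers 10 epochs after the last improvement.
losses = [1 / (e + 1) for e in range(6)] + [0.2] * 20
```

In practice the same behavior is available as a callback in most DL frameworks; the sketch just makes the stopping criterion explicit.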


Fig. 2. (a) The Modified AUTOMAP (ModAM) architecture used for 3D reconstructions in widefield DOT. For our work, the network inputs a Rytov-normalized measurement vector and outputs a 3D $\delta \mu _{a}$ reconstruction map of dimension $30\times 40\times 20$ $mm^3$. ReLU activation follows each convolutional layer. (b) Typical training and validation loss curves obtained by training the ModAM network. (c) A set of $10$ validation curves obtained by training the same network $10$ times while shuffling the training dataset.



Table 1. Details of the Training Dataset and ModAM and Z-AUTOMAP Networks (GPU: NVIDIA GeForce RTX 2080Ti)

2.3 Z-AUTOMAP network

In [26], the authors designed a DNN architecture called Z-Net, inspired by the Y-Net architecture proposed in [35]. The architecture used structural priors obtained from 2D dynamic contrast-enhanced MRI images to guide reconstructions of oxy- and deoxyhemoglobin and water concentrations for 2D near-infrared spectral tomography. Z-Net combines the features extracted from its two inputs, the optical signals and the MRI images, in a bi-directional flow and provides three 2D spectral images as output. Based on the structure of Z-Net, we propose a novel, hybrid network called Z-AUTOMAP that takes two inputs, the DOT Rytov measurements and the 3D structure segmented from micro-CT, and outputs a single 3D map of the $\delta \mu _{a}$ values of the sample. For training in silico, the 3D Ground Truth (GT) structure (derived from the same EEMNIST character from which the Rytov measurement is generated) is normalized to an intensity map and input into the network. As we will discuss later on, a similar protocol is followed when we use experimental 3D micro-CT scans. The network structure is presented in Fig. 3(a). Unlike Z-Net, the network’s backbone is still the ModAM architecture; however, the layers have been slightly modified for the purposes of DOT-micro-CT fusion. The idea of Z-AUTOMAP is to blend the features of the DOT measurements and the 3D structural priors by making them flow bi-directionally (à la Z-Net). Concatenation blocks merge the features at each level. In summary, the Rytov-normalized measurements are passed through three convolutional layers (the forward-flowing ModAM branch), and the output of the final layer is concatenated with the 3D micro-CT mask. The 3D micro-CT mask is also fed into a convolutional layer that connects to further convolutional layers flowing in the direction opposite to the Rytov branch (hence, a backward-flowing ModAM branch). The outputs of matching convolutional layers (flowing in either direction) are concatenated at each level.
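At the tensor level, the fusion at each level is simply a channel-wise concatenation of the two branches' feature maps over a shared spatial grid. The shape-only sketch below illustrates this; the channel count and grid size are placeholders, not the actual layer dimensions of Z-AUTOMAP.

```python
import numpy as np

# Feature maps from the two branches at one level of the network:
# the forward branch is driven by the Rytov measurement, the backward
# branch by the normalized 3D micro-CT mask. Layout: (channels, x, y, z).
rng = np.random.default_rng(0)
forward_feat = rng.random((16, 30, 40, 20))   # from the Rytov branch
backward_feat = rng.random((16, 30, 40, 20))  # from the micro-CT branch

# The concatenation block stacks the two along the channel axis, so the
# next convolution sees both optical and structural features on the
# same spatial grid.
fused = np.concatenate([forward_feat, backward_feat], axis=0)
```

The spatial grid is untouched; only the channel count doubles, which is why the two branches must operate at matching resolutions at each level.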


Fig. 3. (a) The proposed Z-AUTOMAP architecture. It consists of a bi-directional flow where the convolutional layers are concatenated to combine features extracted from the normalized 3D micro-CT mask (for training purposes, these are the GT intensity masks derived from the same EEMNIST character from which the Rytov measurement is calculated) and the Rytov measurement vector. The output of the network is again a 3D $\delta \mu _{a}$ reconstruction map of dimension $30\times 40\times 20$ $mm^3$. ReLU activation is applied after each convolutional layer. (b) Typical training and validation loss curves obtained by training the Z-AUTOMAP network. (c) A set of $10$ validation curves obtained by training the same network $10$ times while shuffling the training dataset.


Hence, the network can map the OPs (i.e., the $\delta \mu _{a}$ values) from the DOT measurements, while the 3D micro-CT acts as a structural prior and guides the 3D reconstruction. Splitting the learning process in Z-AUTOMAP (the OPs and the 3D structure are learned separately) instead of direct mapping (as was the case with ModAM) enables the network to effectively reconstruct more complicated structures (even with widefield illumination and detection) compared to ModAM. In this case, we again employ an 80/20 training/validation split with the Adam optimizer and ReLU activation while using the MSE as the loss function. Representative curves obtained while training Z-AUTOMAP are presented in Fig. 3(b). A study similar to that of Fig. 2(c) is carried out for the Z-AUTOMAP network as well. Again, we obtain very stable validation curves with low standard deviations between them (albeit slightly higher than for the ModAM network), as exhibited in Fig. 3(c). The curves validate the network’s stable convergence with MSE as the loss function.

A complete description of the network parameters and training dataset of both the ModAM and Z-AUTOMAP networks are displayed in Table 1.

3. Results

In this section, we present the reconstruction results, obtained in silico and from experimental phantoms, for both ModAM and the proposed Z-AUTOMAP network. We compare the results against a traditional regularized LSQ-based technique, since LSQ remains the most widely used model-based DOT reconstruction technique. For the regularized LSQ-based inverse solver, we adopt the process outlined in [36]. Herein, we demonstrate a progressive improvement in results as we move from LSQ to ModAM and, finally, to Z-AUTOMAP. Lastly, for comparison, we present results obtained from an experimental phantom using a traditional Laplacian-prior regularized LSQ, following the method described in [24].

3.1 In silico results

At first, to demonstrate the suitability of DL for widefield DOT reconstructions, we show in Fig. 4(a) the reconstruction results for a single embedding present at a shallow depth of 2 mm from the plane of illumination for the regularized LSQ-based technique and the ModAM network. The embedding was not part of the dataset used to train the network. The results are presented using iso-volumes, 2D cross-sections at a depth of 2 mm, and a graph showing the distribution of the reconstructed $\delta \mu _a$ values (averaged over the 2D cross-sections). In all cases, the GT is presented as well. The reconstruction results are quantified in terms of the MSE, the VE, and the MS-SSIM; the GT 3D iso-volume is used to calculate the VE. It is evident from the reconstruction results that ModAM outperforms LSQ, both qualitatively and quantitatively, as reflected by lower MSE and VE and higher MS-SSIM values.
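Two of the evaluation metrics are straightforward to sketch. The MSE is the voxel-wise mean squared error; for the VE we assume one plausible definition, the relative mismatch of the iso-volumes thresholded at 30% of the maximum (the iso-level used for the figures). The exact formula used in the paper may differ, so treat this as an illustration.

```python
import numpy as np

def mse(recon, gt):
    """Voxel-wise mean squared error between reconstruction and ground truth."""
    return float(np.mean((recon - gt) ** 2))

def volumetric_error(recon, gt, iso=0.3):
    """Relative iso-volume mismatch (assumed VE definition, not the paper's).

    Counts voxels above `iso` times each volume's maximum and returns the
    relative difference with respect to the ground-truth volume.
    """
    v_r = np.count_nonzero(recon >= iso * recon.max())
    v_g = np.count_nonzero(gt >= iso * gt.max())
    return abs(v_r - v_g) / v_g

# Toy example on the paper's grid size: a block-shaped delta_mu_a inclusion
# and a reconstruction that underestimates it by 10% but keeps its support.
gt = np.zeros((30, 40, 20))
gt[10:20, 10:20, 5:10] = 0.004
recon = gt * 0.9
```

Here the VE is zero (the 30% iso-volumes coincide) while the MSE is nonzero, which is exactly why both metrics are reported together.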


Fig. 4. (a) The in silico results obtained for a phantom with a single embedding placed at a depth of $2$ mm from the plane of illumination, in terms of the iso-volume, 2D cross-sections at a depth of $2$ mm, and a graph showing the average distribution of $\delta \mu _{a}$ values over the 2D cross-sections. The results are tabulated quantitatively in terms of the MSE, the VE, and the MS-SSIM. (b) The results for a phantom with 3 embeddings placed at the same depth of $2$ mm and having the same $\delta \mu _{a}$ value. The results for LSQ are produced at an iso-volume of $30\%$ of the maximum value. (c) Reconstruction results for 3 embeddings placed at 3 different depths from the plane of illumination. Each embedding has a different $\delta \mu _{a}$ value. The black lines represent the GT $\delta \mu _{a}$ values for each embedding. All the results are produced at an iso-volume of $30\%$ of the maximum value.


Regarding the reconstruction time, the regularized LSQ technique takes approximately 20 minutes to produce the output, while the corresponding time for ModAM is approximately 3 ms. Next, we present similar results for a more complicated in silico phantom with multiple letters in the same FOV, albeit all placed at the same shallow depth of 2 mm (Fig. 4(b)). The reconstruction results for the regularized LSQ deteriorate. In contrast, the results for ModAM are superior in terms of the 3D structural integrity and the reconstructed $\delta \mu _a$ values (as seen in the $\delta \mu _a$ distribution graph). The advantages gained in the MSE, VE, and MS-SSIM using ModAM are now more marked than in the more straightforward single-embedding case. Finally, in Fig. 4(c), we move on to a very complex in silico phantom with multiple letters in the same FOV but now at different depths. Each embedding also has a different $\delta \mu _a$ value. For this case, at greater depths, the results of ModAM also deteriorate in terms of structural integrity, although it is evident from the $\delta \mu _a$ distribution plot that ModAM reconstructs $\delta \mu _a$ with much higher accuracy than the regularized LSQ-based method. Therefore, in such a challenging case, we deploy the Z-AUTOMAP network trained using structural priors. From the results and the $\delta \mu _a$ plot, it is clear that the proposed Z-AUTOMAP network can reconstruct the phantom with a high degree of structural accuracy while maintaining accuracy in the reconstructed $\delta \mu _{a}$ values. This observation is quantitatively reflected in a much lower MSE and VE and a higher MS-SSIM than regularized LSQ and ModAM. Additionally, for the cases in (b) and (c), the gain in reconstruction times for the DNNs (both ModAM and Z-AUTOMAP) over the regularized LSQ is similar to the case in (a).
Hence, the in silico results confirm the appropriateness of using multimodal fusion (with 3D micro-CT priors) in training the DNN.

3.2 Experimental phantom results

To further validate the performance of our proposed workflow, we carried out two experiments using agar-based phantoms. The DOT measurements of the experimental phantoms are acquired using our lab’s single-pixel hyperspectral system [37]. The setup employs a supercontinuum laser and digital micromirror devices (DMDs) to modulate the structured light patterns for illumination and detection. The detection is carried out using a 16-wavelength-channel Photo-Multiplier Tube (PMT). However, for this work, we do not use the instrumentation’s hyperspectral (multi-wavelength) features and instead acquire the measurements (perturbed and unperturbed) at a single wavelength of $740$ nm. For each experimental phantom, a Rytov-normalized measurement vector (as in the in silico case) is produced to carry out the 3D reconstructions. For the convenience of the reader, a simplified schematic of our system is shown in Fig. 5. A more detailed description of the experimental setup can be found elsewhere [37].


Fig. 5. A simplified schematic of the experimental setup. It consists of an illumination DMD and a detection DMD in transmission configuration (along with their associated optics). The illumination DMD is fed a single wavelength (of $740$ nm) by the Acousto-Optic Tunable Filter (AOTF) connected to a supercontinuum laser (SuperK-EXR20). The detection DMD applies the detection patterns and feeds the transmitted light to a 16-channel PMT-based spectrophotometer (PML-16C), which connects to a computer containing the associated data-acquisition cards and software (SPC-150, DCC-100, and PCIe) for detecting and recording the data.


The agar-based phantoms are embedded with thin capillary tubes (diameter $= 3$ mm) filled with $\mu _a$ contrast. India ink and Intralipid are used to generate and control the $\mu _a$ and $\mu _{s}^{'}$ values of the embeddings and the background. For the experiments, we use phantoms of dimension $30\times 40\times 20$ $mm^3$. The first phantom is embedded with three thin capillaries at a depth of $8.5$ mm, with a distance of $7$ mm between the centers of the capillaries. Each of the three capillaries has a different $\delta \mu _{a}$ contrast. We use a more complex phantom with three embeddings for the second experiment. In this case, one capillary is embedded at a depth of $6$ mm, while another is embedded at a depth of $16$ mm. The central capillary is slanted, extending from $6$ mm to $16$ mm. For this phantom, there is a distance of $10$ mm between the centers of the capillaries. Here, the two side capillaries have the same $\mu _{a}$ while the slanted, central capillary has a different $\mu _{a}$ value. Detailed schematics of the experimental phantoms are shown in Figs. 6(a) and (b).
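The first phantom's geometry can be voxelized with a few lines of array logic, which is also how in silico analogues of such phantoms can be built for training or evaluation. The sketch below assumes a 1 mm voxel grid (an illustrative resolution) and stamps three 3 mm diameter cylinders, axes along $y$, at a depth of 8.5 mm with 7 mm center spacing; all helper names are hypothetical.

```python
import numpy as np

nx, ny, nz = 30, 40, 20                       # 30 x 40 x 20 mm^3 at 1 mm voxels
x, z = np.meshgrid(np.arange(nx), np.arange(nz), indexing="ij")

def add_capillary(vol, x_center, depth, delta_mu_a, radius=1.5):
    """Stamp a cylinder (axis along y) of delta_mu_a contrast into `vol`.

    The cross-section in the x-z plane is a disk of the given radius (mm)
    centered at (x_center, depth); the cylinder spans the full y extent.
    """
    mask = (x - x_center) ** 2 + (z - depth) ** 2 <= radius ** 2   # (nx, nz)
    vol[np.broadcast_to(mask[:, None, :], vol.shape)] = delta_mu_a
    return vol

# Three capillaries at depth 8.5 mm, centers 7 mm apart, with the three
# delta_mu_a contrasts used for the first phantom.
phantom = np.zeros((nx, ny, nz))
for xc, dmua in zip((8, 15, 22), (0.004, 0.008, 0.012)):
    add_capillary(phantom, xc, 8.5, dmua)
```

The same stamping routine, applied to EEMNIST character masks instead of disks, is in spirit how arbitrary 3D training phantoms can be generated.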


Fig. 6. (a) Reconstruction results for the first experimental phantom having three thin capillaries embedded at a depth of $8.5$ mm from the plane of illumination. Each embedding has a different $\delta \mu _{a}$ value ($0.004$ $mm^{-1}$, $0.008$ $mm^{-1}$, and $0.012$ $mm^{-1}$). (b) The results for the second experimental phantom with two of the tubes placed at depths of $6$ mm and $16$ mm from the plane of illumination. The middle tube is slanted and extends from $6$ mm to $16$ mm. The two tubes at the sides have a $\delta \mu _{a}$ value of $0.004$ $mm^{-1}$, while the middle tube has a $\delta \mu _{a}$ value of $0.008$ $mm^{-1}$. The results are shown in terms of the reconstructed iso-volumes, the averaged $\delta \mu _{a}$ distribution over the 2D planes, and quantitatively in terms of the MSE and the VE. The results for LSQ, ModAM, and Z-AUTOMAP are produced at an iso-volume of $30\%$ of the maximum value. The black lines represent the GT $\delta \mu _{a}$ values for each tube. The segmented micro-CT volumes are also shown in both cases.


Furthermore, to obtain the structural priors for these phantoms, micro-CT scans are acquired with a MARS spectral micro-CT scanner (AXIS Lab, RPI). Finally, an intensity-based manual segmentation is applied to the 2D slices of the micro-CT scans to produce the normalized 3D volumes used as inputs to the Z-AUTOMAP network. The segmented 3D micro-CT volumes of the two phantoms are shown in Figs. 6(a) and (b).
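The intensity-based segmentation step amounts to normalizing the CT intensities and binarizing them into a structural prior mask. The minimal sketch below uses a single global threshold in place of the slice-by-slice manual step described above; the threshold value, the synthetic CT volume, and the function name are all illustrative assumptions.

```python
import numpy as np

def segment_ct(ct_volume, threshold=0.5):
    """Binarize a micro-CT volume into a normalized [0, 1] prior mask.

    Intensities are first min-max normalized, then thresholded; the result
    is the kind of 3D structural prior fed to Z-AUTOMAP.
    """
    ct = np.asarray(ct_volume, dtype=float)
    ct = (ct - ct.min()) / (ct.max() - ct.min())   # normalize intensities
    return (ct >= threshold).astype(float)          # 0/1 structural prior

# Synthetic stand-in for a micro-CT scan: a bright capillary-shaped region
# (arbitrary CT units) inside an otherwise empty volume.
ct = np.zeros((30, 40, 20))
ct[12:18, 5:35, 7:11] = 1000.0
prior = segment_ct(ct)
```

A real micro-CT scan would additionally need slice-wise cleanup (the manual step the paper describes), which is exactly the part flagged later as a bottleneck for real-time use.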

Importantly, the networks need not be retrained to reconstruct the experimental data, owing to the spatial heterogeneity present in our dataset. Instead, the networks trained in silico are applied directly to the experimental phantoms, provided that the phantom dimensions and background OPs remain unchanged.

In Fig. 6(a), we display the reconstruction results for the first phantom in terms of the iso-volume and the distribution of the averaged reconstructed $\delta \mu _a$ values over the 2D planes. The GTs are also shown. The reconstruction results are far superior for ModAM compared to LSQ, both qualitatively and quantitatively. However, the best results are clearly obtained from the Z-AUTOMAP network, which outperforms both LSQ and ModAM. The gain in performance from deploying Z-AUTOMAP is further evident when we perform the reconstruction on the second, more complicated phantom, as presented in Fig. 6(b). Although ModAM is much more successful in resolving the three tubes than regularized LSQ, it fails to preserve the structural integrity and depth information of the middle, slanted tube. However, Z-AUTOMAP, boosted by the structural priors gleaned from micro-CT, is again successful in almost perfectly reconstructing the phantom. Furthermore, the MSE and VE (calculated using the micro-CT segmented volume as the reference) are lower for ModAM than for regularized LSQ, while those of Z-AUTOMAP are lower still. Concurrently, the MS-SSIM for Z-AUTOMAP is the highest of the three techniques. Moreover, for both cases, the gain in reconstruction times using the DNNs instead of the traditional LSQ is manifold. Therefore, the experimental results illustrate the importance of utilizing priors from a complementary modality like micro-CT in DL, especially for complex scenarios.

As mentioned previously, methods exist to incorporate priors in the traditional regularized LSQ formulation [24]. These methods usually involve encoding the prior information into a Laplacian matrix, which is used in the regularized LSQ-based inversion process to guide the reconstruction. In Fig. 7, we show the reconstruction results obtained with this approach for the first experimental phantom (as shown in Fig. 6(a)). Although the Laplacian formulation (because of the strict constraint it imposes at the boundary) reduces the MSE and VE relative to the regularized LSQ-based technique while improving the MS-SSIM, it comes at the cost of a higher reconstruction time ($\sim 65$ minutes for the inversion), which stems from incorporating the large Laplacian matrix into the inversion equation. This technique also tends to underestimate the reconstructed $\delta \mu _a$ values (compared to the other techniques) because of the smoothing effect imposed by the Laplacian-based regularization, as seen from the distribution of the $\delta \mu _a$ values.
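The Laplacian-prior LSQ inversion can be illustrated on a toy 1D problem: solve $(J^{T}J + \lambda L^{T}L)\,x = J^{T}y$, where $J$ is the sensitivity (Jacobian) matrix, $y$ the Rytov measurements, and $L$ a first-difference Laplacian whose rows are zeroed at the boundary of the prior region, so that smoothing is applied within, not across, segmented regions. The problem sizes, seed, regularization weight, and small ridge term below are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_meas, n_vox = 50, 100
J = rng.standard_normal((n_meas, n_vox))      # toy sensitivity matrix
x_true = np.zeros(n_vox)
x_true[40:60] = 0.008                         # a delta_mu_a inclusion
y = J @ x_true                                # noiseless Rytov measurements

# First-difference Laplacian; rows at the segmented boundary are zeroed so
# that jumps into and out of the prior region are not penalized.
L = np.zeros((n_vox - 1, n_vox))
for i in range(n_vox - 1):
    L[i, i], L[i, i + 1] = 1.0, -1.0
L[39] = 0.0                                   # allow a jump entering the prior
L[59] = 0.0                                   # ... and leaving it

# Regularized normal equations; the tiny ridge keeps the system well-posed.
lam = 0.1
A = J.T @ J + lam * (L.T @ L) + 1e-8 * np.eye(n_vox)
x_hat = np.linalg.solve(A, J.T @ y)
```

With a correct prior the piecewise-constant inclusion is recovered well even though the system is underdetermined; the full-scale cost noted above comes from $L$ being as large as the voxel grid, which is what makes this inversion slow in practice.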


Fig. 7. Reconstruction results for the first experimental phantom, having three thin capillaries embedded at a depth of $8.5$ mm from the plane of illumination, obtained using a Laplacian-guided regularized LSQ-based technique. Each embedding has a different $\delta \mu _{a}$ value ($0.004$ $mm^{-1}$, $0.008$ $mm^{-1}$, and $0.012$ $mm^{-1}$). As before, the 3D reconstruction results are obtained using an iso-volume of $30\%$ of the maximum value.


4. Discussion

Despite the inherent advantages of widefield DOT using structured light, the process is limited by the high computational times required to optimize the inverse-solving process. To address this disadvantage, the field has been shifting towards DL-based reconstructions to speed up the process. Previously, our group successfully introduced ModAM for k-space reflectance fluorescence tomography. In this paper, we have demonstrated and expanded the applicability of this network to quantitative widefield DOT in transmittance and for sample geometries relevant to preclinical settings. Besides improved accuracy in terms of localization and quantification, the DL approach demonstrates a significant gain in the time needed for image reconstruction after robust training.

Still, widefield tomography is inherently characterized by sensitivity profiles covering large volumes [5], which, in turn, typically leads to a loss of resolution in the transverse direction of the illumination and detection basis. This loss of resolution is highlighted in the results for the tilted capillary for both regularized LSQ and ModAM. Although the LSQ reconstructions can be improved by incorporating micro-CT structural priors in a Laplacian-based regularization technique [24], this comes at the cost of an even higher reconstruction time than the general LSQ and the need for further parameter optimization. Moreover, through the EEMNIST dataset, we have sought to enable the network to reconstruct a large class of structures while also accounting for variations in OPs (in terms of $\delta \mu _{a}$) and physical properties (depth and number of embeddings). Other OPs (such as the scattering coefficient) and physical properties (like thickness) may also be easily included in the dataset by adjusting our in silico data-generation workflow.

Furthermore, as preclinical imaging is preferably performed with the animal in a prone position to reduce physiological stress (or planar compression for breast imaging), this can limit widefield DOT’s ability to monitor small focal pathologies. Therefore, following well-established precepts in the field, we have investigated the DL-based fusion of widefield DOT and spectral micro-CT in optical image formation. Incorporating structural priors in DL has led to successfully reconstructing more complicated phantoms, especially with greatly enhanced fidelity in the transverse direction. Further, enhanced by our novel in silico MC-based data generation pipeline and a flexible training dataset (EEMNIST), the proposed Z-AUTOMAP architecture should be translatable to other applications of optical tomography, such as Fluorescence Lifetime Imaging (FLI) and Mesoscopic Fluorescence Molecular Tomography (MFMT).

However, there are some limitations to the proposed workflow. Firstly, we have employed the classical Rytov formulation, in which the background OPs are assumed constant. This simplification allows us to efficiently generate large datasets for training the DL models. Nevertheless, in our proposed workflow, the measurement vectors are generated directly from the MCX simulation rather than by committing an inverse crime (direct multiplication of a simulated Jacobian matrix with the image). Hence, we could account for more physiologically accurate background OPs in the data generation and evaluate the robustness of the DL methods in such cases. Additionally, in its current state, the DL-based workflow is better suited to pre-clinical applications, such as imaging of ex vivo tumors or in vivo small animal models. Hence, the dimensions of the 3D dataset used were selected with these applications in mind. If the work is to be extended to clinical applications, for instance breast imaging, the dataset may be modified to account for the dimensions and thickness associated with such imaging. The simulation studies may also be enhanced with databases like DigiBreast (where X-ray mammography is used as a complementary modality) [38] in those applications. Furthermore, the focus on accounting for large spatial heterogeneity (over a range of OPs and physical properties) has come at the cost of a large dataset and long network training times (as shown in Table 1). The dataset size and training time may be reduced by focusing on specific DOT scenarios (for instance, single tumor imaging in the breast) where only a small class of structures needs to be reconstructed. Also, both the traditional prior-guided reconstruction (Laplacian-based) and Z-AUTOMAP would perform poorly if an incorrect structural prior is given as input. Future studies could focus on designing networks robust to such inputs.
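For context, the Rytov normalization that produces the measurement vectors from the perturbed and unperturbed MCX outputs can be sketched as follows. This is a minimal sketch of the standard linearization; the function name and the numerical clipping guard are ours, not part of the paper's code:

```python
import numpy as np

def rytov_data(phi, phi0, eps=1e-12):
    """Rytov-normalized measurement vector: y = -ln(phi / phi0).

    phi  : perturbed measurements (absorber present),
    phi0 : unperturbed measurements (homogeneous background).
    Under the Rytov approximation, y is linear in the absorption
    perturbation, i.e. y ~= J @ delta_mu_a for a Jacobian J.
    """
    phi = np.asarray(phi, dtype=float)
    phi0 = np.asarray(phi0, dtype=float)
    # Clip to avoid log(0) on photon-starved source-detector pairs.
    return -np.log(np.clip(phi, eps, None) / np.clip(phi0, eps, None))
```

A uniform attenuation of the perturbed signal, e.g. `phi = phi0 * exp(-0.1)`, yields a constant Rytov vector of 0.1, which is the expected behavior of the log-ratio normalization.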

Moreover, as photon-counting micro-CT matures, incorporating priors with soft-tissue differentiation is expected to further improve performance. Secondly, although we have used a time-gated hyperspectral system, we have yet to utilize its full capabilities, which allow for estimating the OPs at different wavelengths and at different time points. Such a spectro-temporal approach would allow us to estimate oxy- and deoxy-hemoglobin concentrations, which are more significant biomarkers for cancer. This workflow will be the focus of future developments. We also plan to leverage Mesh-based Monte Carlo (MMC) to allow more accurate boundary simulation than the voxel-based MCX. Thirdly, a shortcoming of the micro-CT prior generation lies in the manual processing needed for segmentation. Manual segmentation is a limiting factor for real-time reconstructions with Z-AUTOMAP. In the future, this DL-based workflow can be extended by having the segmented micro-CT volumes produced as the output of one DNN and fed as input to Z-AUTOMAP. Such a methodology would make the DNN-based multimodal fusion more efficient and allow translation to real-time, preclinical, and clinical scenarios.
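As one possible stepping stone towards automating this step, even a classical histogram threshold could provide an initial binary prior before a dedicated segmentation DNN is trained. The sketch below uses Otsu's method and is purely illustrative; it is not the segmentation procedure used in this work:

```python
import numpy as np

def otsu_threshold(volume, nbins=256):
    """Classical Otsu threshold: choose the intensity cut that
    maximizes the between-class variance of the voxel histogram."""
    hist, edges = np.histogram(np.asarray(volume).ravel(), bins=nbins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w = hist.astype(float)
    total = w.sum()
    cum_w = np.cumsum(w)            # cumulative class-0 weight
    cum_m = np.cumsum(w * centers)  # cumulative class-0 first moment
    best_t, best_var = edges[1], -1.0
    for i in range(1, nbins):
        w0, w1 = cum_w[i - 1], total - cum_w[i - 1]
        if w0 == 0 or w1 == 0:
            continue
        m0 = cum_m[i - 1] / w0
        m1 = (cum_m[-1] - cum_m[i - 1]) / w1
        var_b = w0 * w1 * (m0 - m1) ** 2
        if var_b > best_var:
            best_var, best_t = var_b, edges[i]
    return best_t

def segment_prior(volume, nbins=256):
    """Binary structural prior: 1 inside high-intensity structures."""
    return (np.asarray(volume) > otsu_threshold(volume, nbins)).astype(np.uint8)
```

Such a coarse binary mask could serve as a pre-segmentation that a downstream DNN refines before the volume is passed to Z-AUTOMAP.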

5. Conclusion

This work proposed a novel DNN, Z-AUTOMAP, for the fusion of widefield DOT and micro-CT for 3D $\delta \mu _a$ reconstructions. The results show a progressive improvement as we move from a traditional regularized LSQ-based method to a previously proposed unimodal (DOT-based) ModAM network to the designed multimodal network. Furthermore, the networks trained in silico are almost equally successful in reconstructing experimental phantoms. Moreover, with suitable in silico adjustments, the proposed workflow can be translated to other optical tomography paradigms. Hence, this novel pipeline promises to be a valuable tool for pre-clinical applications for visualizing tumor structure and morphology.

Funding

National Institutes of Health (R01-CA207725, R01-CA237267, R01-CA250636).

Acknowledgments

We would like to thank Mr. Mengzhou Li and Mr. Xiaodong Guo (of AXIS Lab, RPI) for providing the raw micro-CT data.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request. They are planned to be made publicly available in due course.

References

1. A. Yodh and B. Chance, “Spectroscopy and imaging with diffusing light,” Phys. Today 48(3), 34–40 (1995). [CrossRef]  

2. X. Intes and B. Chance, “Non-PET functional imaging techniques: optical,” Radiol. Clin. North Am. 43(1), 221–234 (2005). [CrossRef]  

3. D. Grosenick, H. Rinneberg, R. Cubeddu, and P. Taroni, “Review of optical breast imaging and spectroscopy,” J. Biomed. Opt. 21(9), 091311 (2016). [CrossRef]  

4. S. Bélanger, M. Abran, X. Intes, C. Casanova, and F. Lesage, “Real-time diffuse optical tomography based on structured illumination,” J. Biomed. Opt. 15(1), 016006 (2010). [CrossRef]  

5. J. Chen, V. Venugopal, F. Lesage, and X. Intes, “Time-resolved diffuse optical tomography with patterned-light illumination and detection,” Opt. Lett. 35(13), 2121–2123 (2010). [CrossRef]  

6. J. Chen and X. Intes, “Comparison of Monte Carlo methods for fluorescence molecular tomography-computational efficiency,” Med. Phys. 38(10), 5788–5798 (2011). [CrossRef]  

7. A. Muldoon, A. Kabeer, J. Cormier, M. A. Saksena, Q. Fang, S. A. Carp, and B. Deng, “Method to improve the localization accuracy and contrast recovery of lesions in separately acquired x-ray and diffuse optical tomographic breast imaging,” Biomed. Opt. Express 13(10), 5295–5310 (2022). [CrossRef]  

8. T. Li, Z. Qin, X. Hou, M. Dan, J. Li, L. Zhang, Z. Zhou, and F. Gao, “Multi-wavelength spatial frequency domain diffuse optical tomography using single-pixel imaging based on lock-in photon counting,” Opt. Express 27(16), 23138–23156 (2019). [CrossRef]  

9. S. R. Arridge and J. C. Schotland, “Optical tomography: forward and inverse problems,” Inverse Problems 25(12), 123010 (2009). [CrossRef]  

10. J. P. Angelo, S.-J. K. Chen, M. Ochoa, U. Sunar, S. Gioux, and X. Intes, “Review of structured light in diffuse optical imaging,” J. Biomed. Opt. 24(7), 071602 (2018). [CrossRef]  

11. H. Ben Yedder, A. BenTaieb, M. Shokoufi, A. Zahiremami, F. Golnaraghi, and G. Hamarneh, “Deep learning based image reconstruction for diffuse optical tomography,” in International Workshop on Machine Learning for Medical Image Reconstruction, (Springer, 2018), pp. 112–119.

12. J. Yoo, S. Sabir, D. Heo, K. H. Kim, A. Wahab, Y. Choi, S.-I. Lee, E. Y. Chae, H. H. Kim, Y. M. Bae, Y.-W. Choi, S. Cho, and J. C. Ye, “Deep learning diffuse optical tomography,” IEEE Trans. Med. Imaging 39(4), 877–887 (2020). [CrossRef]  

13. N. Nizam, M. Ochoa, J. T. Smith, and X. Intes, “3D k-space reflectance fluorescence tomography via deep learning,” Opt. Lett. 47(6), 1533–1536 (2022). [CrossRef]  

14. N. I. Nizam, M. Ochoa, J. T. Smith, S. Gao, and X. Intes, “Monte Carlo-based data generation for efficient deep learning reconstruction of macroscopic diffuse optical tomography and topography applications,” J. Biomed. Opt. 27(8), 083016 (2022). [CrossRef]  

15. V. Ntziachristos, A. Yodh, M. D. Schnall, and B. Chance, “MRI-guided diffuse optical spectroscopy of malignant and benign breast lesions,” Neoplasia 4(4), 347–354 (2002). [CrossRef]  

16. C. Xu, H. Vavadi, A. Merkulov, H. Li, M. Erfanzadeh, A. Mostafa, Y. Gong, H. Salehi, S. Tannenbaum, and Q. Zhu, “Ultrasound-guided diffuse optical tomography for predicting and monitoring neoadjuvant chemotherapy of breast cancers: recent progress,” Ultrasonic Imaging 38(1), 5–18 (2016). [CrossRef]  

17. A. T. Luk, S. Ha, F. Nouizi, D. Thayer, Y. Lin, and G. Gulsen, “A true multi-modality approach for high resolution optical imaging: photo-magnetic imaging,” Proc. SPIE 8937, 89370G (2014). [CrossRef]  

18. R. Baikejiang, W. Zhang, and C. Li, “Diffuse optical tomography for breast cancer imaging guided by computed tomography: A feasibility study,” J. X-Ray Sci. Technol. 25(3), 341–355 (2017). [CrossRef]  

19. Q. Zhu and S. Poplack, “A review of optical breast imaging: Multi-modality systems for breast cancer diagnosis,” Eur. J. Radiol. 129, 109067 (2020). [CrossRef]  

20. E. Y. Chae, H. H. Kim, S. Sabir, Y. Kim, H. Kim, S. Yoon, J. C. Ye, S. Cho, D. Heo, K. H. Kim, Y. M. Bae, and Y.-W. Choi, “Development of digital breast tomosynthesis and diffuse optical tomography fusion imaging for breast cancer detection,” Sci. Rep. 10(1), 13127 (2020). [CrossRef]  

21. S. Yun, Y. Kim, H. Kim, S. Lee, U. Jeong, H. Lee, Y.-W. Choi, and S. Cho, “Three-compartment-breast (3CB) prior-guided diffuse optical tomography based on dual-energy digital breast tomosynthesis (DBT),” Biomed. Opt. Express 12(8), 4837–4851 (2021). [CrossRef]  

22. S. C. Davis, H. Dehghani, J. Wang, S. Jiang, B. W. Pogue, and K. D. Paulsen, “Image-guided diffuse optical fluorescence tomography implemented with Laplacian-type regularization,” Opt. Express 15(7), 4066–4082 (2007). [CrossRef]  

23. L. Zhang, F. Gao, H. He, and H. Zhao, “Three-dimensional scheme for time-domain fluorescence molecular tomography based on Laplace transforms with noise-robust factors,” Opt. Express 16(10), 7214–7223 (2008). [CrossRef]  

24. A. Ale, R. B. Schulz, A. Sarantopoulos, and V. Ntziachristos, “Imaging performance of a hybrid X-ray computed tomography-fluorescence molecular tomography system using priors,” Med. Phys. 37(5), 1976–1986 (2010). [CrossRef]  

25. Y. Zou, Y. Zeng, S. Li, and Q. Zhu, “Machine learning model with physical constraints for diffuse optical tomography,” Biomed. Opt. Express 12(9), 5720–5735 (2021). [CrossRef]  

26. J. Feng, W. Zhang, Z. Li, K. Jia, S. Jiang, H. Dehghani, B. W. Pogue, and K. D. Paulsen, “Deep-learning based image reconstruction for MRI-guided near-infrared spectral tomography,” Optica 9(3), 264–267 (2022). [CrossRef]  

27. R. Yao, M. Ochoa, P. Yan, and X. Intes, “Net-FLICS: fast quantitative wide-field fluorescence lifetime imaging with compressed sensing–a deep learning approach,” Light: Sci. Appl. 8(1), 26–27 (2019). [CrossRef]  

28. J. T. Smith, E. Aguénounon, S. Gioux, and X. Intes, “Macroscopic fluorescence lifetime topography enhanced via spatial frequency domain imaging,” Opt. Lett. 45(15), 4232–4235 (2020). [CrossRef]  

29. S. Alam, T. Reasat, R. M. Doha, and A. I. Humayun, “NumtaDB - assembled Bengali handwritten digits,” arXiv, arXiv:1806.02452 (2018). [CrossRef]  

30. Q. Fang and D. A. Boas, “Monte Carlo simulation of photon migration in 3D turbid media accelerated by graphics processing units,” Opt. Express 17(22), 20178–20190 (2009). [CrossRef]  

31. R. Yao, X. Intes, and Q. Fang, “Direct approach to compute Jacobians for diffuse optical tomography using perturbation Monte Carlo-based photon replay,” Biomed. Opt. Express 9(10), 4588–4603 (2018). [CrossRef]  

32. Q. Pian, R. Yao, L. Zhao, and X. Intes, “Hyperspectral time-resolved wide-field fluorescence molecular tomography based on structured light and single-pixel detection,” Opt. Lett. 40(3), 431–434 (2015). [CrossRef]  

33. B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen, and M. S. Rosen, “Image reconstruction by domain-transform manifold learning,” Nature 555(7697), 487–492 (2018). [CrossRef]  

34. J. T. Smith, R. Yao, N. Sinsuebphon, A. Rudkouskaya, N. Un, J. Mazurkiewicz, M. Barroso, P. Yan, and X. Intes, “Fast fit-free analysis of fluorescence lifetime imaging via deep learning,” Proc. Natl. Acad. Sci. 116(48), 24019–24030 (2019). [CrossRef]  

35. H. Lan, D. Jiang, C. Yang, F. Gao, and F. Gao, “Y-Net: Hybrid deep learning image reconstruction for photoacoustic tomography in vivo,” Photoacoustics 20, 100197 (2020). [CrossRef]  

36. S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, “An interior-point method for large-scale ℓ1-regularized least squares,” IEEE J. Sel. Top. Signal Process. 1(4), 606–617 (2007). [CrossRef]  

37. Q. Pian, R. Yao, N. Sinsuebphon, and X. Intes, “Compressive hyperspectral time-resolved wide-field fluorescence lifetime imaging,” Nat. Photonics 11(7), 411–414 (2017). [CrossRef]  

38. B. Deng, D. H. Brooks, D. A. Boas, M. Lundqvist, and Q. Fang, “Characterization of structural-prior guided optical tomography using realistic breast models derived from dual-energy X-ray mammography,” Biomed. Opt. Express 6(7), 2366–2379 (2015). [CrossRef]  



Figures (7)

Fig. 1. (a) Snapshot of some of the EEMNIST characters used for training. (b) An example in silico phantom with a random embedding with a reduced scattering coefficient, $\mu ^{e}_{s^{'}}$, of $1$ $mm^{-1}$ and an absorption coefficient, $\mu _{a}^{e}$, of $0.008$ $mm^{-1}$. The background reduced scattering coefficient, $\mu ^{b}_{s^{'}}$, and background absorption coefficient, $\mu _{a}^{b}$, are $1$ $mm^{-1}$ and $0.004$ $mm^{-1}$, respectively. Hence, the $\delta \mu _{a}$ value is $0.004$ $mm^{-1}$. These values are relevant to soft tissues. The embedding is placed at a depth of $6$ mm from the illumination plane. (c) Illumination and detection bar patterns acquired experimentally and used in MCX to generate the measurement vectors. (d) Widefield transmission setup in MCX, showing the illumination and detection planes. Two cylinders (with different OPs) have been placed as embeddings for visualization. (e) Simulated perturbed ($\phi$, shown in red) and unperturbed ($\phi _{o}$, shown in blue) measurement vectors obtained from MCX for the sample embedding in (b).
Fig. 2. (a) The Modified AUTOMAP (ModAM) architecture used for 3D reconstructions in widefield DOT. For our work, the network takes as input a Rytov-normalized measurement vector and outputs a 3D $\delta \mu _{a}$ reconstruction map of dimension $30\times 40\times 20$ $mm^3$. ReLU activation follows each convolutional layer. (b) Typical training and validation loss curves obtained by training the ModAM network. (c) A set of $10$ validation curves obtained by training the same network $10$ times while shuffling the training dataset.
Fig. 3. (a) The proposed Z-AUTOMAP architecture. It consists of a bi-directional flow in which the convolutional layers are concatenated to combine features extracted from the normalized 3D micro-CT mask (for training purposes, these are the GT intensity masks derived from the same EEMNIST character from which the Rytov measurement is calculated) and the Rytov measurement vector. The output of the network is again a 3D $\delta \mu _{a}$ reconstruction map of dimension $30\times 40\times 20$ $mm^3$. ReLU activation is applied after each convolutional layer. (b) Typical training and validation loss curves obtained by training the Z-AUTOMAP network. (c) A set of $10$ validation curves obtained by training the same network $10$ times while shuffling the training dataset.
Fig. 4. (a) The obtained in silico results for a phantom with a single embedding placed at a depth of $2$ mm from the plane of illumination, in terms of the iso-volume, 2D cross-sections at a depth of $2$ mm, and a graph showing the average distribution of $\delta \mu _{a}$ values over the 2D cross-sections. The results are tabulated quantitatively in terms of the MSE, the VE, and the MS-SSIM. (b) The results for a phantom with 3 embeddings placed at the same depth of $2$ mm and having the same $\delta \mu _{a}$ value. The results for LSQ are produced at an iso-volume of $30\%$ of the maximum value. (c) Reconstruction results for 3 embeddings placed at 3 different depths from the plane of illumination. Each embedding has a different $\delta \mu _{a}$ value. The black lines represent the GT $\delta \mu _{a}$ values for each embedding. All the results are produced at an iso-volume of $30\%$ of the maximum value.
Fig. 5. A simplified schematic of the experimental setup. It consists of an illumination DMD and a detection DMD in transmission configuration (along with their associated optics). The illumination DMD is fed with a single wavelength (of $740$ nm) from the Acousto-Optic Tunable Filter (AOTF) connected to a Supercontinuum (SuperK-EXR20) laser. The detection DMD applies the detection patterns and feeds the transmitted light to a 16-channel spectrophotometer (PML-16C), which connects to a computer containing the associated data-acquisition cards and software (SPC-150, DCC-100, and PCIe) for detecting and recording the data.
Fig. 6. (a) Reconstruction results for the first experimental phantom having three thin capillaries embedded at a depth of $8.5$ mm from the plane of illumination. Each embedding has a different $\delta \mu _{a}$ value ($0.004$ $mm^{-1}$, $0.008$ $mm^{-1}$, and $0.012$ $mm^{-1}$). (b) The results for the second experimental phantom with two of the tubes placed at depths of $6$ mm and $16$ mm from the plane of illumination. The middle tube is slanted and extends from $6$ mm to $16$ mm. The two tubes at the sides have a $\delta \mu _{a}$ value of $0.004$ $mm^{-1}$, while the middle tube has a $\delta \mu _{a}$ value of $0.008$ $mm^{-1}$. The results are shown in terms of the reconstructed iso-volumes, averaged $\delta \mu _{a}$ distribution over the 2D planes, and quantitatively in terms of the MSE and the VE. The results for LSQ, ModAM, and Z-AUTOMAP are produced at an iso-volume of $30\%$ of the maximum value. The black lines represent the GT $\delta \mu _{a}$ values for each tube. The segmented micro-CT volumes are also shown in both cases.
Fig. 7. Reconstruction results for the first experimental phantom having three thin capillaries embedded at a depth of $8.5$ mm from the plane of illumination, obtained using a Laplacian-guided regularized LSQ-based technique. Each embedding has a different $\delta \mu _{a}$ value ($0.004$ $mm^{-1}$, $0.008$ $mm^{-1}$, and $0.012$ $mm^{-1}$). As before, the 3D reconstruction results are obtained using an iso-volume of $30\%$ of the maximum value.

Tables (1)


Table 1. Details of the Training Dataset and ModAM and Z-AUTOMAP Networks (GPU: NVIDIA GeForce RTX 2080Ti)
