Deep learning facilitates fully automated brain image registration of optoacoustic tomography and magnetic resonance imaging

Yexing Hu; Yexing Hu; Berkan Lafci; Berkan Lafci; Berkan Lafci; Artur Luzgin; Artur Luzgin; Hao Wang; Hao Wang; Jan Klohs; Xose Luis Dean-Ben; Xose Luis Dean-Ben; Ruiqing Ni; Ruiqing Ni; Ruiqing Ni; Daniel Razansky; Daniel Razansky; Daniel Razansky; Wuwei Ren

doi:10.1364/BOE.458182

1. Introduction

Multispectral optoacoustic tomography (MSOT) can provide new functional and molecular imaging capabilities by combining the rich contrast of optical imaging with the high resolution of ultrasound at depths up to several centimeters within biological tissues [1]. This has been exploited in preclinical studies involving small animals, such as mice, to visualize vascular anatomy and hemodynamic responses, cancer angiogenesis or neuronal activity [2–9]. MSOT was further shown to be a valuable tool in several clinical applications such as the diagnosis of breast cancer or melanomas [10–12]. The operational principle of MSOT is based on the detection of acoustic signals induced by excitation with short laser pulses at multiple optical wavelengths. This enables differentiating spectrally-distinctive endogenous and extrinsically-administered contrast agents via spectral unmixing, which represents a powerful approach for visualizing molecular activity [13–15]. However, despite its wide use in biomedical research applications [16], in vivo MSOT suffers from poor soft-tissue contrast, thus identification of organs or tissue types remains challenging. In comparison, magnetic resonance imaging (MRI) is a well-established high-resolution imaging modality that provides rich and versatile soft-tissue contrasts [17]. MRI is capable of delivering multiplex structural information exploiting different mechanisms of contrast generation [18], which can help localize the specific molecular information derived from MSOT signals. A multimodal strategy combining MSOT and MRI can then offer highly complementary information, thus an accurate and robust registration algorithm capable of matching these two modalities can be of great value in many biomedical fields.

Previously reported MSOT-MRI registration methods were based on manual registration to certain extent [19–21]. A straightforward piecewise linear mapping algorithm to co-register MSOT and MRI images was proposed to solve the registration problem [22]. The concentration maps of different fluorescence probes and hemoglobin provided by MSOT can be localized with the anatomical reference given by MRI. The transformation matrix was determined based on manually selected fiducial markers. We recently developed a semi-automated MSOT-MRI registration toolbox named RegOA based on the self-adaptive optimization of mutual information (MI) [23]. The accuracy and robustness of RegOA have been demonstrated using in vivo and ex vivo mouse brain data. Alternatively, an integrated framework for registration of MSOT and MRI data based on combining a novel MRI animal holder and a landmark-based software co-registration algorithm was also proposed [24]. The protocol was shown to significantly improve the registration between the two modalities for both the entire body and a localized tumor, thus enabling more precise multi-modal tumor characterization and staging. Regardless of either software- or hardware-based protocols used, the registration methods mentioned above involve human intervention at a certain degree. For example, the first step of the RegOA protocol requires preprocessing and segmentation of the MSOT and MRI images [25], for which an initial rough estimation of a brain contour should be defined by the users. Therefore, the registration accuracy and speed are highly dependent on the experience of the users.

Recently, deep learning (DL) has shown a powerful impact on optoacoustic imaging from several different aspects, including image understanding [26], image reconstruction [27] and quantitative analysis [28]. A detailed review of this topic can be found in [29–31]. Nevertheless, to the best of our knowledge, the application of DL in any registration involving optoacoustics has not been reported previously. Apart from that, DL has demonstrated its effectiveness in mono- and multi-modal image registration targeted on well-established imaging modalities such as CT, MRI, and ultrasound [32]. In general, the learning strategies can be classified into unsupervised and supervised strategies [32]. The former class can avoid manual annotation by exploiting the unlabeled images without any ground truth. For example, a spatial transformer network (STN) was proposed as a new method of unsupervised learning [33] and was used in the task of T1-weighted (T1W)-T2-weighted(T2W) MRI image registration by inserting into an existing convolutional architecture to facilitate the spatial manipulation of data [34]. Similarly, a Generative Adversarial Networks (GAN)-based deformable registration method was applied to T1W-T2W MRI mapping without the requirement of pre-aligned multi-modal image pairs for a training purpose [35]. An end-to-end unsupervised learning system with fully convolutional neural networks was applied to MRI-CT registration [36]. The major disadvantages of unsupervised learning lie in high complexity and long computational time. In contrast, the latter class of supervised learning-based registration requires the ground truth of the registration results. In return, these methods usually have better performance than unsupervised methods. A FLowNet-adapted architecture using optical flow estimation was proposed to improve the accuracy of MRI-ultrasound registration [37]. A weakly supervised method that learns a convolution neural network (CNN)-formed registration function was used to predict a spatial deformation mapping MRI images [38].

There are several obstacles to be considered before applying a DL-based method in the automated registration between MSOT and MRI. First, the intrinsic image contrast of optoacoustic images originating from initial pressure changes and light absorption is significantly different from the contrast in conventional modalities, such as CT and MRI. Therefore, any translation of DL-based registration methods from the existing multimodal pipeline (e.g., CT-MRI [36], or T1W-T2W MRI [39]) may not effectively work. Second, despite the existing various system configurations, MSOT images feature some unique noise and artifact patterns [40] that can potentially hinder the direct application of any DL-based method. Third, the choice between various supervised and unsupervised networks should be determined depending on the availability of the measurement or simulation datasets and the specific purpose of the registration task.

Herein, we introduce a supervised DL-based method for the registration of MSOT and MRI images for the first time. Experimental datasets acquired from commercially available MSOT and MRI scanners were labeled and used for training. A U-Net structure of image segmentation was designed to eliminate the influence of background noise of MSOT/MRI images. Afterwards, a transformation matrix was obtained via a customized CNN, which accurately maps the MSOT image onto the fixed MRI image space. We also compared an unsupervised method with our method. All results were quantitively analyzed. Our proposed data processing pipeline is shown to reduce manual intervention to a minimum and improve the registration accuracy; thus, this fully automated method can greatly enhance the high-throughput analysis of multi-modal imaging of MSOT and MRI.

2. Methods

The MSOT-MRI registration procedure consists of three steps (Fig. 1), i.e., 1) acquisition of MSOT and MRI images and preparation of training datasets, 2) segmentation of the brain region on the MSOT and MRI datasets, and 3) image transformation mapping MSOT data onto MRI data. Separate deep neural networks were developed for steps 2 and 3 as described below. The registration results were evaluated by comparing the positions of manually selected reference landmarks in both images. Three networks were involved in the DL-based MSOT-MRI registration pipeline. We used PyTorch 3.8.0 with CUDA 11.0 to build and train all networks.

Fig. 1. The workflow of the proposed DL-based MSOT-MRI registration method, containing three steps, i.e., data acquisition, image segmentation, and transformation. In the first stage of data acquisition, MSOT and MRI datasets were acquired with a MSOT system equipped with a ring-shaped ultrasound transducer and a high-field preclinical MRI scanner respectively (left panel). Secondly, the brain region of a mouse was segmented by using a U-Net-like architecture with a selected convolution kernel and padding size for different input and output on each dataset (middle panel). Finally, the segmented MSOT and MRI images were used for training a CNN network and yielding a transformation matrix that can map the MSOT image onto the MRI reference. An overlaid MSOT-MRI image rendered with the DL-based registration method is shown (right panel).

Download Full Size | PDF

2.1 Data acquisition and preparation of MSOT and MRI images

C57BL/6 mice (N=5, aged 16 month-old, female) were used to acquire MRI and MSOT images as described previously [20]. Animals were housed in ventilated cages inside a temperature-controlled room under a 12-h dark/light cycle. Pelleted food (3437PXL15, CARGILL) and water were provided ad-libitum. All experiments were performed in accordance with the Swiss Federal Act on Animal Protection and were approved by the Cantonal Veterinary Office Zurich (permit number: ZH090-16). All procedures fulfilled the ARRIVE 2.0 guidelines on reporting animal experiments [41].

All MSOT images were acquired by using a commercial MSOT system (inVision 128, iThera Medical, Germany) as described in detail in [19,42,43]. Briefly, a tunable pulsed laser was used to illuminate the sample with multiple wavelengths between 680 nm and 980 nm. The generated ultrasound waves were detected by 128 cylindrically distributed transducers with a central frequency of 5 MHz and bandwidth of 60 %. The raw data was reconstructed by using a filtered back-projection algorithm implemented on a graphics processing unit (GPU). Imaging was performed at five wavelengths (715, 730, 760, 800, and 850 nm) on coronal slices, with 10 times averaging per slice. Rostral-caudal brain coverage was extended over a length of approximately 12 mm with a step size of 0.3 mm resulting in 40 slices. The reconstructed MSOT images were analyzed using single wavelength without any spectral unmixing.

The MRI data were acquired by a 7T preclinical MRI scanner (Bruker Biospin GmbH, Ettlingen, Germany) equipped with an actively shielded gradient set of 760 mT/m with 80 ms rise time and operated with a Paravision 6.0 software platform (Bruker Biospin GmbH, Germany) [44–46]. A circularly polarized volume resonator for signal transmission and an actively decoupled mouse brain quadrature surface coil with an integrated combiner and preamplifier for signal receiving were used. T2W anatomical reference images were acquired in coronal and sagittal orientations. A spin-echo sequence was used with rapid acquisition relaxation enhancement, echo time = 33 ms; relaxation time = 2500 ms; rapid acquisition relaxation enhancement factor = 8; 15 sagittal slices of 1 mm thickness; interslice distance = 0, field-of-view (FOV) = 20 $\times$ 20 mm²; image matrix = 256 $\times$ 256; spatial resolution = 78 $\times$ 78 mm². For each mouse brain in the current study, a T2W anatomical image of 1 mm thickness was acquired approximately at Bregma −1.46 mm with the current setting of FOV.

The datasets used in this study consist of three parts: 1) Dataset-1: T2W anatomical MRI images of mouse brains and the corresponding segmented brain masks; 2) Dataset-2: Anatomical MSOT images of mouse brains and the corresponding segmented brain masks; 3) Dataset-3: Ground truth of registered MSOT-MRI image pairs. For Dataset-1, 75 slices of T2W MRI images and the corresponding brain masks were selected. The segmentation masks were generated by using active contour model [47], which operates in a semi-automatic way. To increase training dataset variability, the MRI dataset was augmented by applying rotating and scaling operations [48], yielding 600 pairs of MRI images and the corresponding masks. For Dataset-2, 415 anatomy-segmentation pairs were generated out of the raw MSOT images acquired from 5 mice. For Dataset-3, 415 pairs of MSOT-MRI images were acquired. Random affine transformations were used on the MSOT dataset to augment the dataset, finally to 21,051 pairs. The ground truth of the registration was generated by experts using a manual registration method. To ease the data processing, the image dimensions of all MSOT and MRI images, including the masks, were set to 256 $\times$ 256.

2.2 Image segmentation of the brain region

To segment the brain region in both MRI and MSOT images, we utilized a U-Net structure that was initially proposed to solve segmentation problems with a limited number of image inputs [49]. The MRI segmentation network is schematically depicted in Fig. 2. For small datasets, the U-Net structure is expected to have a better performance than traditional CNNs [49]. The MRI segmentation network consists of a down-sampling path (Fig. 2, upper left) and an up-sampling path (Fig. 2, upper right). The network takes a 256 $\times$ 256-sized MRI image as input and 4 levels of down sampling were used. In the first down-sampling level, a convolution on the input image using a 3 $\times$ 3 kernel size with stride 1, padding size of 1 and Rectified Linear Unit (ReLU) produces 64 feature channels. Another convolution using a 3 $\times$ 3 kernel size with stride 1, padding size of 1 and ReLU was applied next. After that, a 2 $\times$ 2 max pooling operation was used and the channel size was doubled. Each encoder level has the same convolution and max pooling parts. 1024 feature channels were involved in the last level. In the up-sampling path, a 2 $\times$ 2 up-sampling with nearest-neighbor interpolation was used and the channel size was halved. Subsequently, these channels were concatenated with the channels in the down-sampling part which have the same size as up-sampling channels. Two 3 $\times$ 3 convolution with stride 1, padding size of 1 and ReLU activation were used. In the end, binary masks were generated by the network, which represents the segmentation results of MRI. A similar network architecture was designed for MSOT image segmentation (Fig. 2, upper).

Fig. 2. Architecture of MRI and MSOT segmentation networks and image transformation network. Upper: A MSOT/MRI image sized 256 $\times$ 256 pixels ($D_{1}$, left side) is used as the input of the network. The binary segmentation masks of the brain region ($D_{2}$, right side) were used as the ground truth. The middle panel represents a U-Net-like structure with the boxes denoting the features to be extracted at each step (encoder in dark green; decoder in light green). The number of channels and the size of features are marked on the top and left of the box. Lower: Architecture of the image transformation network. A 256 $\times$ 256 masked MSOT mouse brain image ($D_{3}$) and a 256 $\times$ 256 masked mouse brain MRI image ($D_{4}$) are used as the input. Firstly, MSOT and MRI images were concatenated and taken as a new input into a convolution part. After the convolution part, the feature map was flattened and a linear part was used to predict the parameters in the transformation matrix.

Download Full Size | PDF

The difference between the MRI segmentation network and the MSOT segmentation network was mainly in the input images and network hyperparameters, including learning rate, epochs, and batch size. Both MRI dataset and MSOT dataset for segmentation were separated as 70 percent for training and 30 percent for validation. The hyperparameters are given below. 1) MRI segmentation: learning rate = 0.00001, batch size = 128, training epochs for convergence = 200; 2) MSOT segmentation: learning rate = 0.00005, batch size = 32, training epochs for convergence = 5000. Besides, the convolution part of each encoder and decoder level is different. The MSOT segmentation network has kernel size = 5 $\times$ 5 and padding = 2 for the first convolution and kernel size = 3 $\times$ 3 and padding = 1 for the second convolution.

A binary cross-entropy loss was chosen as the target function for both networks. Specifically, $D_{1}$ and $D_{2}$ denote the unsegmented and segmented input images, respectively. A transform $T_{1}$ is used to transform $D_{1}$ into a binary mask $T_{1}(D_{1};\theta _{1})$, where $\theta _{1}$ represents an additional network parameter. The parameter $\theta _{1}$ is learned from training, whilst the transformation $T_{1}(D_{1};\theta _{1})$ minimizes the following loss function $L_{1}$:

(1)$$L_{1}(D_{1},D_{2},T_{1};\theta_{1}) ={-}\frac{1}{N}\sum_{N} D_{2}\log T_{1}(D_{1};\theta_{1})+(1-D_{2})\log (1-T_{1}(D_{1};\theta_{1})),$$

$N$ is the total number of the pixels. $L_{1}$ is minimized using an Adam optimizer [50]. To keep the image size consistent, all convolution layers in the network were set with a padding size of 1.

2.3 MSOT-MRI image transformation

Once the brain region was segmented, the MSOT image was then mapped onto the MRI image using image transformation (Fig. 2, lower). To achieve this, an optimized transformation matrix was generated based on the segmentation results from the previous step. Considering the minimal elastic changes caused by sequential MSOT-MRI data acquisition of the mouse brain, an affine transformation was adopted based on the segmented brain regions. The affine transform is defined as

(2)$$J(x,y) = A \cdot I(x,y),$$

where the transformation matrix was expressed as the 3 $\times$ 3 matrix:

(3)$$A = \begin{bmatrix} a & b & sx\\ c & d & sy\\ 0 & 0 & 1 \end{bmatrix}.$$

More specifically, the parameters a, b, c, and d indicate the scaling, rotating, and shearing on the input image, while the parameters sx, sy indicate image translation. The positions in the moving space, or projected space, is denoted as $J(x,y) = [x',y',1]^{T}$, whereas the position of the fixed space is denoted as $I(x,y) = [x,y,1]^{T}$. The parameters in the transformation matrix were optimized through training a CNN network (Fig. 2, lower). For the CNN structure, the segmented MSOT and MRI images were first concatenated as the input of the CNN network. In the first layer, 16 convolution channels with 7 $\times$ 7 convolution kernel, stride 1 and padding = 3 were used with ReLU and another 16 convolution channels with 5 $\times$ 5 convolution kernel, stride 1 and padding = 2 setting with maxpool and ReLU were considered. In the second layer, 32 convolution channels with 7 $\times$ 7 convolution kernel, stride 1 and padding = 3 were used with ReLU and another 32 convolution channels with 5 $\times$ 5 convolution kernel, stride 1 and padding = 2 setting with maxpool and ReLU were considered. The same structure was used twice more with 16 more channels for each layer. Five fully connected layers with sizes of 1024, 512, 256, 128, and 64 were used with ReLU. Finally fully connected layers with a size of 6 lead to 6 parameters of affine transformation. Paired MSOT-MRI datasets from the same mouse brain were used for training the transformation network.

The target function of the transformation network was given by the root mean square error (RMSE) between the trained transformation and the ground truth, which were obtained with manual registration method. Specifically, $D_{3}$ and $D_{4}$ denote the MSOT and MRI images, respectively. An affine transform $T_{2}(D_{3};\theta _{2})$ is applied to $D_{3}$ for aligning with $D_{4}$, where the unknown parameter $\theta _{2}$ is learned from training. The transformation $T_{2}$ minimizes the loss function $L_{2}$ given by:

(4)$$L_{2}(D_{3},D_{4},T_{2};\theta_{2}) = \sqrt{\frac{1}{N}\sum_{N}T_{2}(D_{3};\theta_{2})^{2}-D_{4}^{2}},$$

$N$ denotes the size of the image. Similarly, $L_{2}$ is minimized using an Adam optimizer with a weighting decay factor of 0.001 [33]. The learning rate was set at 0.0003 and the batch size was 128. 80 percent of the dataset were set for training and 20 percent of the dataset were set for validation. After 1000 epochs, the optimized parameters with a minimum loss on the validation set were obtained. We applied these parameters on the whole training set.

2.4 Evaluation

2.4.1 Evaluation of brain segmentation

Firstly, we evaluated the accuracy of the segmentation algorithm on both MSOT and MRI images. The data used for validation purposes were separated from those used for training the MRI segmentation and MSOT segmentation networks. In total, 415 slices out of all original MSOT dataset were obtained from 5 mice. 81 MSOT images from one mouse were selected as the test set, while 334 MSOT images from another 4 mice were selected as the training and validation sets. 48 MRI images were selected as the test set and 552 MRI images were selected as the training set and the validation set. Images of the test set and the training set are from different mice.

For both MSOT and MRI segmentation, a Dice coefficient [51] was introduced to evaluate the obtained result, which is commonly used to assess the accuracy and robustness of image segmentation. The Dice coefficient can be calculated based on the true positive (TP), false positive (FP), and false negative (FN) ratios by comparing pixel by pixel. TP means that both the ground truth and the result are positive. FP means that the ground truth is negative but the result is positive. FN means that the ground truth is negative but the result is positive. The expression of the Dice coefficient is given below:

(5)$$Dice = \frac{2TP}{2TP+FP+FN}.$$

The value of Dice ranges from 0 to 1, with 0 representing a non-overlapped case and 1 representing the highest spatial correlation [51].

2.4.2 Evaluation of MSOT-MRI registration

Once image registration is done, it is important to assess the positions of several key anatomical reference landmarks. Herein, an index of Target Registration Error (TRE) [52] was introduced to quantify the accuracy of the MSOT-MRI registration method. To measure TRE, several representative reference landmarks were selected on both the original and transformed images by experts. Then, we calculated the distance between each point pair in the two images. The TRE value of a single landmark is formulated as Eq. (6):

(6)$$TRE = \sqrt{(x-x')^{2}+(y-y')^{2}},$$

Here $x$ and $y$ denote the coordinate values of the point on the transformed image, whereas $x'$ and $y'$ are the values for the point on the reference image. Regularly, multiple landmarks were selected in which case, the mean value and standard deviation of TRE can be measured.

2.4.3 Computational cost

We tested the computational performance of the proposed DL-based method and the previously reported MI-based method on three different datasets. Dataset-1 contains 81 pairs of raw MSOT input and raw MRI input. Dataset-2 contains 562 pairs of MSOT images after affine transform and raw MRI input. Dataset-3 contains 1000 pairs of MSOT images after affine transform and MRI images after affine transform. All transformed data were generated from raw images. The results were shown in Table 1. The MI-method test and the whole training of all networks and computation were performed on a desktop computer with the CPU i9 - 10980XE (3 GHz) and GPU RTX 3090 (10496 cores). It should be noted that the segmented images were used in MI-based algorithm as input. The most time-consuming part of MI-based algorithm namely segmentation is not included in this table. For MRI images, there exist several automated segmentation methods as atlas-based methods [53], Markov-Random-Field-based methods [54,55] and DL-based methods [51]. However, to the best of our knowledge, fully automated MSOT segmentation methods are less reported. In other words, the segmentation results need to be generated manually or semi-automated, which takes several minutes even for experts. The required time for manual segmentation is difficult to quantify as it highly depends on the experience of users. Therefore, we did not add this part to the total time. In this sense, the proposed DL-based method shows a large computational advantage over the MI-based method.

Table 1. Comparison of computational time between different methods

View Table | View all tables in this article

3. Results

3.1 Segmentation of MSOT and MRI images

The MSOT and MRI images were set as the input of their corresponding segmentation networks without any preprocessing (Fig. 3(a), (e)). The network successfully segmented a brain region from the whole head anatomy and background noise in both MSOT and MRI images. Such a region was represented by a binary mask (Fig. 3(c), (g)). For comparison, the ground truth of segmentation is illustrated in Figs. 3(b) and 3(f). The brain region in the MRI image was clearly defined and segmented due to the high soft-tissue contrast (Fig. 3(d)). Regardless of the appearance of strong background noise in MSOT images, the MSOT segmentation network was capable of outlining the boundary of the brain region (Fig. 3(h)). The Dice coefficient between MRI mask and its ground truth was 0.991. Notably, a high Dice value of 0.989 was achieved using the MSOT segmentation network, which further demonstrates the accuracy of the DL-based segmentation.

Fig. 3. Segmentation results for MSOT and MRI images. A raw T2W MRI image (a) or a MSOT image (e) were set as the input of the segmentation network structure (both in a coronal view). For comparison, the ground truth of segmentation was given for each modality (b, f). (c) and (g) show the segmentation results in a form of binary images. (d) and (h) are the overlaid MR and MSOT images with contours illustrating the brain region.

Download Full Size | PDF

In order to prevent over-fitting and get full use of the limited dataset for more robust networks, 10-folds cross-validation was applied for choosing the best parameters. The Dice coefficients of each fold test are given in Table 2. The means and standard deviations of the 10-fold validation were 0.990 $\pm$ 0.001 and 0.979 $\pm$ 0.002 for MRI and MSOT, respectively. Both of the mean values were close to 1 which demonstrated the robustness of both segmentation networks.

Table 2. Dice similarity coefficients of the DL-based MRI and MSOT segmentation for a cross-validation purpose

View Table | View all tables in this article

3.2 MSOT-MRI transformation

After MSOT and MRI images were segmented, the binary masks were used as the input of the follow-up transformation network, which can predict the transformation matrix for mapping the MSOT image onto the MRI image. The parameters of the transformation matrix were learned through the transformation network. Then, MSOT images were wrapped by using the generated transformation matrix. Here, we compared the final registration results by using the suggested DL-based method and MI-based method [23]. To validate that the segmentation part plays an essential role in our method, we also added a method that uses the same registration network with unsegmented image inputs. For the same coronal-view MSOT slice, the DL-based method with segmentation and the MI method achieved similar results, while the DL-based method without segmentation resulted in poor registration (Fig. 4). The boundaries of both the head and brain regions match well with those of MRI images as shown in the overlaid MSOT-MRI image pairs for our method.

Fig. 4. Three different datasets of MSOT-MRI image pairs were used to test different registration methods. Raw MSOT and MRI images in the first two rows were used as the input for each registration method. The overlaid MSOT-MRI images from the third to sixth rows represent the registration results using DL-based segmentation, DL-based without segmentation, MI-based, and manual registration methods, respectively. In all three datasets, the transformed MSOT images using the DL-based segmentation and MI-based methods reach a good agreement and are similar to the result achieved by manual registration. Only DL-based without segmentation method leads to bad results. For validation purposes, five anatomical landmarks were selected in both MSOT and MRI images to calculate the TRE in a later step. Among these five landmarks, landmarks 1, 2, and 3 are located in the upper layers of the cerebral cortex, whereas landmarks 4, and 5 are located in the piriform cortex of the brain. All five points are recognized in both modalities by experts.

Download Full Size | PDF

We applied the TRE index to evaluate the final registration results and to quantitatively assess the performance of the suggested DL-based method. Five reference landmarks were manually selected by an expert. As shown in Fig. 4, landmarks of points 1-3 were located at the upper layer of the cerebral cortex, whereas points 4, 5 correspond to the piriform cortex of the brain. These landmarks were selected because of the appearance of vessels at all five points, making them highly visible and recognized in both MSOT and MRI images. The TRE values of each landmark point indicate the differences between the registration results and manual results at the specific reference point. A smaller TRE value represents a better registration result. The TREs of these points were calculated for the performance evaluation and compared with the MI method (Table 2). We calculated the average TRE values of these five landmarks for all images in Dataset-1. For Dataset-2 and Dataset-3, we randomly selected 100 image pairs and calculated their average TRE values. Dataset-2 and Dataset-3 were transformed from Dataset-1, thus the value can also be the reference of these two datasets. The TRE results of Dataset-1, Dataset-2, and Dataset-3 were shown in Table 3.

Table 3. Target Registration Error values for the MI-based and DL-based registration

View Table | View all tables in this article

There exist some failure cases when we were using the MI-based method, where severely distorted MSOT-MRI inputs may cause the divergence of the MI-based optimization procedure. In this situation, the DL-based method can still achieve accurately mapping results that are comparable to the manual ones. One of the special cases is shown in Fig. 5.

Fig. 5. In a special case where the MSOT input (a) is distorted by a randomized affine transform, the DL-based method can still correctly map the MSOT image to the MRI image (b) compared with the manual result (c, e). However, the MI-based method leads to a completely incorrect result (d), because the MI-based optimization procedure diverges, generating a mismatched transformation matrix.

Download Full Size | PDF

3.3 Appearance of noise in input images

In order to show the robustness of our network, a noise test was made for our method. The noise that appears in MRI images can be described using a Rician distribution [56]. We denote the image pixel intensity in the absence of noise as A and the measured pixel intensity as M. In the presence of noise, the probability distribution $P_{M}(M)$ for M can be defined by:

(7)$$p_{M}(M) = \frac{M}{\sigma^{2}}e^{-(M^{2}+A^{2})/2\sigma^{2}}I_{0}(\frac{A\cdot M}{\sigma^{2}}),$$

where $I_{0}$ is a modified $0^{th}$-order Bessel function [57] of the first kind and $\sigma$ denotes the standard deviation of the Gaussian noise in the real and imaginary images. The additional noise level is thus defined as $\sigma$. We have tested the task of registering the raw MSOT image with MRI images containing different levels of noise (no noise, $\sigma$ = 5 and $\sigma$ = 10). Dataset-4 contains noise generated from Dataset-1 with $\sigma$ = 5, and Dataset-5 contains noise generated from Dataset-1 with $\sigma$ = 10. The average TREs of Dataset-4 and Dataset-5 were shown in Table 3. Both segmentation results of MRI and the final overlaid MSOT-MRI images are shown in Fig. 6.

Fig. 6. Different levels of Rician-type noise were added to the MRI image for testing the robustness of our segmentation method. For Dataset-4, medium-level noise appearance in MRI ($\sigma$ = 5) does not alter the segmentation result as shown in the first two rows. For Dataset-5, increasing the noise level ($\sigma$ = 10) slightly changes the segmented part of the brain in the lower left corner (second column, bottom). After the transformation using the network, the outputs are shown in the third column. The overlaid results (fourth column) are compared with the manual registration results (fifth column), indicating that our registration method has a high tolerance for different noise levels.

Download Full Size | PDF

4. Discussion

Similar to many other molecular imaging methods including PET and single photon emission computed tomography (SPECT), there is a rising trend of multi-modal imaging that combines the emerging MSOT technology with a high-resolution structural imaging modality such as MRI or computed tomography (CT) [58,59]. In an analogy to the clinically established PET-CT [60], the combination of MSOT and MRI can provide both molecular and anatomical information, which is better than PET-CT as CT has no soft tissue contrast. Multi-modal MSOT-MRI can be implemented in either a sequential mode with successive data acquisition [19] on each modality or by simultaneously using a truly hybrid imaging device [61]. In both cases, image registration is required. Although several MSOT-MRI registration methods have been previously reported [19–21], manual selection of reference points or manual segmentation was always needed. In contrast, the registration algorithm proposed in this work is based on a deep learning method involving no human intervention if the network is properly trained, thus enabling full automation of the whole process. This fully automated workflow requires only raw MSOT and MRI images without any pre-processing step like smoothing or de-noising. The speed of the suggested method was also shown to be significantly higher than the previously reported methods [19–21]. In addition, no technical background was needed to perform the registration task, thus users can obtain registered images more rapidly and more directly. The deep-learning-based MSOT-MRI method can thus be used in high throughput data analysis in preclinical research as well as early diagnosis for future clinical application.

Both MSOT and MRI segmentation networks have been shown to provide high accuracy and can be used individually. Deep learning techniques have extensively been used for segmentation [62]. Additionally, a few deep-learning-based methods have successfully been used for MSOT image reconstruction [25]. The suggested network then enriches MSOT processing choices. Combined with the MRI segmentation and transformation networks, it provides fast and expert-level MSOT-MRI registration results. As shown in Fig. 4, the DL-based transformation method achieved similar performance compared with a MI-based method on both raw MSOT and MRI input and transformed inputs. This was further validated by using TRE measures (Table 3). The DL-based method also shows more robustness compared with the MI-based method, as we show that the MI-based algorithm leads to poor registration in Fig. 5, while in all 3 datasets, no totally unregistered results appeared using our DL-based method. As shown in Table 1, the computational time of the proposed method was significantly lower than that of existing MI-based registration and manual registration methods. Besides, once the model was trained, no user experience in interpreting MSOT or MRI images is required. Therefore, the DL-based registration approach leads to a high-throughput data analysis in multi-modal MSOT imaging.

In our proposed registration framework, the step of image segmentation plays an important role in improving the final registration quality. MSOT images feature a ring-shaped background noise caused by the reconstruction process. The first U-Net-like network facilitates automated segmentation of the brain section and removal of a large portion of background noise. Subsequently, we have trained our registration network using both segmented and unsegmented raw MSOT/MRI as the input. The resulted registered images of these two methods are compared (Fig. 4) demonstrating the necessity of segmentation in our method. An intuitive reason for this is that the segmented part in both images in fact trains the network focusing on a small region of interest suppressing the strange influence of the background noise appeared in MSOT, making the registration results more accurate. We also tested other networks using both segmented and unsegmented raw MSOT/MRI as the input, and draw a similar conclusion (supplementary material).

Furthermore, the proposed method can potentially be employed for registering 3D datasets of the whole brain [21,63–65]. The first approach is by performing 2D registration slice-by-slice across the whole MSOT-MRI image stack. This approach assumes that the animal holder of the MSOT moves along the horizontal direction, which is parallel to the central axis of the MR scanner bore. The assumption is mostly true for brain registration tasks. Showing well aligned high-resolution coronal planes requires the same slice thickness and gap between adjacent slices of MRI and MSOT. We have showcased the registered 3D stack of MSOT and MRI datasets using our proposed pipeline (Fig. 7). The second approach is 3D registration between MSOT-MRI datasets directly, i.e., finding the 3D-3D transformation matrix. The 3D-3D registration is independent of the assumption of central axis parallelism of MSOT and MRI; however, it is computationally more expensive. We plan to design a new network using labeled 3D image pairs as the training input.

Fig. 7. The registered 3D stacks of MSOT and MRI.

Download Full Size | PDF

Notably, the registration workflow can be conveniently adapted to other types of MRI or optoacoustic imaging datasets. In this work, T$_{2}$-weighted MRI data were used to train the network. Other types of MRI images, such as diffusion MRI or MR angiography, can also be used as the training data [18]. Similarly, the raw MSOT image input used in this work can be replaced by unmixed MSOT images signaling the appearance of specific molecules, e.g., deoxygenated and oxygenated hemoglobin. In the current study, MSOT images at wavelengths of 715, 730, 760, 800, and 850 $nm$ were successfully segmented, but the network structure can be also extended to other types of optoacoustic images. Thus, an even wider range of wavelengths can be tried using this method without significant modifications of network structures, reducing the time cost for different settings in a traditional registration method. More generally, other modal imaging frameworks such as MSOT-CT, MSOT-ultrasound, or even PET-MRI [66] could potentially benefit from the suggested DL-based registration workflow. In addition, by segmenting each brain region, the registration can be applied to a standard brain atlas. In this way, MSOT images can be mapped with an atlas, helping researchers to better localize the optoacoustic signal.

Despite the success of the DL-based registration method, there are still some limitations in this work. Firstly, the validated data are still limited to the two-dimensional cross-sectional MSOT images acquired from the ring-shaped transducer array. The correctness of both segmentation and transformation networks has not yet been tested on other types of optoacoustic images, in particular, the reconstructed MSOT from a spherical transducer array [67] or optoacoustic microscopic images [68]. Increasing the training data size or image types could potentially facilitate a wide application of our proposed method. Secondly, the up-sampling procedure of both MRI segmentation and MSOT segmentation networks influences the final registration results because of the randomness of up-sampling interpolation part [69]. Network structure without sampling part can avoid this problem. The input of MSOT can also be improved by using fluence correction or non-negative reconstruction [23,64, 70, 71].

5. Conclusion

In this work, we proposed a novel DL-based method for registering MSOT and MRI images. The suggested method features full automation without the need for user experience and saves tremendous time, thus, it can be used for a high-throughput MSOT data analysis. The simultaneously rendered molecular and structural information of the MSOT-MRI multimodal imaging method can then be highly valuable in biomedical research.

Funding

Helmut Horten Stiftung; Universität Zürich (MEDEF-20021); Vontobel-Stiftung; Stiftung Synapsis - Alzheimer Forschung Schweiz AFS (2017CDA-03); Swiss Data Science Center (C19-04); ShanghaiTech University.

Acknowledgments

We thank Dr. Mark-Aurel Augath (ETH Zurich) for the guidance in MRI measurements. X.L.D.B and R.N received funding from the Helmut Horten Stiftung.

Disclosures

The authors declare no conflicts of interest.

Data Availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. A. Oraevsky and A. Karabutov, “Optoacoustic Tomography,” in Biomedical Photonics Handbook (CRC Press, 2003), Chapter 34.

2. X. L. Deán-Ben, T. F. Fehm, M. Gostic, and D. Razansky, “Volumetric hand-held optoacoustic angiography as a tool for real-time screening of dense breast,” J. Biophotonics 9(3), 253–259 (2016). [CrossRef]

3. B. McLarney, M. A. Hutter, O. Degtyaruk, X. L. Deán-Ben, and D. Razansky, “Monitoring of stimulus evoked murine somatosensory cortex hemodynamic activity with volumetric multi-spectral optoacoustic tomography,” Front. Neurosci. 14, 536 (2020). [CrossRef]

4. M. R. Tomaszewski, G. I. Quiros, J. P. O’Connor James, A. Oshaani, G. J. Parker, K. J. Williams, F. J. Gilbert, and S. E. Bohndiek, “Oxygen enhanced optoacoustic tomography (OE-OT) reveals vascular dynamics in murine models of prostate cancer,” Theranostics 7(11), 2900–2913 (2017). [CrossRef]

5. B. Lafci, E. Mercep, J. L. Herraiz, X. L. Deán-Ben, and D. Razansky, “Noninvasive multiparametric charac-terization of mammary tumors with transmission-reflection optoacoustic ultrasound,” Neoplasia 22(12), 770–777 (2020). [CrossRef]

6. S. Gottschalk, O. Degtyaruk, B. M. Larney, J. Rebling, M. A. Hutter, X. L. Deán-Ben, S. Shoham, and D. Razansky, “Rapid volumetric optoacoustic imaging of neural dynamics across the mouse brain,” Nat. Biomed. Eng. 3(5), 392–401 (2019). [CrossRef]

7. X. L. Deán-Ben, S. Gottschalk, B. McLarney, S. Shoham, and D. Razansky, “Advanced optoacoustic methods for multiscale imaging of in vivo dynamics,” Chem. Soc. Rev. 46(8), 2158–2198 (2017). [CrossRef]

8. M. Xu and L. V. Wang, “Photoacoustic imaging in biomedicine,” Rev. Sci. Instrum. 77(4), 041101 (2006). [CrossRef]

9. W. Huang, K. Wang, Y. An, H. Meng, and S. Zhang, “In vivo three-dimensional evaluation of tumour hypoxia in nasopharyngeal carcinomas using FMT-CT and MSOT,” Eur. J. Nucl. Med. Mol. Imaging 47(5), 1027–1038 (2020). [CrossRef]

10. S. A. Ermilov, T. Khamapirad, A. Conjusteau, M. H. Leonard, R. Lacewell, K. Mehta, T. Miller, and A. Oraevsky, “Laser optoacoustic imaging system for detection of breast cancer,” J. Biomed. Opt. 14(2), 024007 (2009). [CrossRef]

11. V. Neuschmelting, H. Lockau, V. Ntziachristos, J. Grimm, and M. F. Kircher, “Lymph node micrometastases and in-transit metastases from melanoma: in vivo detection with multispectral optoacoustic imaging in a mouse model,” Radiology 280(1), 137–150 (2016). [CrossRef]

12. A. A. Oraevsky, E. V. Savateeva, S. V. Solomatin, A. A. Karabutov, and T. Khamapirad, “Diagnostic imaging of breast cancer microvasculature with optoacoustic tomography,” in Proceedings of the Second Joint 24th Annual Conference and the Annual Fall Meeting of the Biomedical Engineering Society [Engineering in Medicine and Biology] (2002), Volume 3, 2329–2330.

13. D. Razansky, C. Vinegoni, and V. Ntziachristos, “Multispectral photoacoustic imaging of fluorochromes in small animals,” Opt. Lett. 32(19), 2891–2893 (2007). [CrossRef]

14. D. Razansky, M. Distel, C. Vinegoni, R. Ma, N. Perrimon, R. W. Köster, and V. Ntziachristos, “Multispectral opto-acoustic tomography of deep-seated fluorescent proteins in vivo,” Nat. Photonics 3(7), 412–417 (2009). [CrossRef]

15. A. C. Stiel, X. Deán-Ben, Y. Jiang, V. Ntziachristos, D. Razansky, and G. G. Westmeyer, “High-contrast imaging of reversibly switchable fluorescent proteins via temporally unmixed multispectral optoacoustic tomography,” Opt. Lett. 40(3), 367–370 (2015). [CrossRef]

16. A. A. Oraevsky, “Three-dimensional optoacoustic tomography: preclinical research and clinical applications,” Med. Phys. 38(6Part29), 3763 (2011). [CrossRef]

17. C. Rijswijk, M. Geirnaerdt, P. Hogendoorn, J. Peterse, F. Coevorden, A. Taminiau, R. Tollenaar, B. Kroon, and J. Bloem, “Dynamic contrast-enhanced MR imaging in monitoring response to isolated limb perfusion in high-grade soft tissue sarcoma: initial results,” Eur. Radiol. 13(8), 1849–1858 (2003). [CrossRef]

18. R. W. Brown, Y. C. N. Cheng, E. M. Haacke, M. R. Thompson, and R. Venkatesan, Magnetic Resonance Imaging: Physical Principles and Sequence Design (John Wiley and Sons, 1999).

19. R. Ni, M. Vaas, W. Ren, and J. Klohs, “Noninvasive detection of acute cerebral hypoxia and subsequent matrix-metalloproteinase activity in a mouse model of cerebral ischemia using multispectral-optoacoustic-tomography,” Neurophotonics 5(01), 1 (2018). [CrossRef]

20. R. Ni, M. Rudin, and J. Klohs, “Cortical hypoperfusion and reduced cerebral metabolic rate of oxygen in the arcAb mouse model of Alzheimer’s disease,” Med. Phys. 10, 38–47 (2018). [CrossRef]

21. R. Ni, X. L. Deán-Ben, D. Kirschenbaum, M. Rudin, and J. Klohs, “Whole brain optoacoustic tomography reveals strain-specific regional beta-amyloid densities in Alzheimer‘s disease amyloidosis models,” bioRxiv:2020.02.25.964064 (2020).

22. A. B. Attia, C. J. H. Ho, P. Chandrasekharan, G. Balasundaram, H. C. Tay, N. C. Burton, K. Chuang, V. Ntziachristos, and M. Olivo, “Multispectral optoacoustic and MRI coregistration for molecular imaging of orthotopic model of human glioblastoma,” J. Biophotonics 9(7), 701–708 (2016). [CrossRef]

23. W. Ren, H. Skulason, F. Schlegel, M. Rudin, J. Klohs, and R. Ni, “Automated registration of magnetic resonance imaging and optoacoustic tomography data for experimental studies,” Neurophotonics 6(02), 1 (2019). [CrossRef]

24. M. Gehrung, M. Tomaszewski, D. McIntyre, J. Disselhorst, and S. Bohndiek, “Co-registration of optoacoustic tomography and magnetic resonance imaging data from murine tumour models,” Photoacoustics 18, 100147 (2020). [CrossRef]

25. B. Lafci, E. Mercep, S. Morscher, X. L. Deán-Ben, and D. Razansky, “Deep learning for automatic segmentation of hybrid optoacoustic ultrasound (OPUS) images,” IEEE Trans. Ultrason., Ferroelect., Freq. Contr. 68(3), 688–696 (2021). [CrossRef]

26. U. S. Alqasemi, P. D. Kumavor, A. Aguirre, and Q. Zhu, “Recognition algorithm for assisting ovarian cancer diagnosis from coregistered ultrasound and photoacoustic images: ex vivo study,” J. Biomed. Opt. 17(12), 126003 (2012). [CrossRef]

27. A. DiSpirito, D. Li, T. Vu, M. Chen, D. Zhang, J. Luo, R. Horstmeyer, and J. Yao, “Reconstructing undersampled photoacoustic microscopy images using deep learning,” IEEE Trans. Med. Imaging 40(2), 562–570 (2021). [CrossRef]

28. C. Cai, K. Deng, C. Ma, and J. Luo, “End-to-end deep neural network for optical inversion in quantitative photoacoustic imaging,” Opt. Lett. 43(12), 2752–2755 (2018). [CrossRef]

29. H. Deng, H. Qiao, Q. Dai, and C. Ma, “Deep learning in photoacoustic imaging: a review,” J. Biomed. Opt. 26(04), 040901 (2021). [CrossRef]

30. C. Yang, H. Lan, and F. Gao, “Deep learning for photoacoustic imaging: a survey,” arXiv:2008.04221 (2020).

31. W. Ren, B. Ji, Y. Guan, L. Cao, and R. Ni, “Recent technical advances in accelerating the clinical translation of small animal brain imaging: hybrid imaging, deep learning, and transcriptomics,” Front. Med. 9, 1 (2022). [CrossRef]

32. Y. Fu, Y. Lei, T. Wang, W. J. Curran, T. Liu, and X. Yang, “Deep learning in medical image registration: a review,” Phys. Med. Biol. 65(20), 20TR01 (2020). [CrossRef]

33. M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, “Spatial Transformer Networks,” Advances in 517 neural information processing systems 28 (2015).

34. G. Zhang, W. Guo, L. Kong, Z. Gong, D. Zhao, C. He, and C. Guo, “Unsupervised learning-based registration for T1 and T2 breast MRI images,” The Fourth International Symposium on Image Computing and Digital Medicine (2020), pp. 225–228.

35. C. Qin, B. Shi, R. Liao, T. Mansi, D. Rueckert, and A. Kamen, “Unsupervised deformable registration for multi-modal images via disentangled representations,” International Conference on Information Processing in Medical Imaging (Springer, 2019), pp. 249–261.

36. S. Shan, W. Yan, X. Guo, E. Chang, Y. Fan, and Y. Xu, “Unsupervised end-to-end learning for deformable medical image registration,” arXiv:1711.08608 (2017).

37. Y. Hu, M. Marc, G. Eli, W. Li, G. Nooshin, B. Ester, G. Wang, B. Steven, M. Caroline, and E. Mark, “Weakly-supervised convolutional neural networks for multimodal image registration,” Med. Image Anal. 49, 1–13 (2018). [CrossRef]

38. J. Lv, M. Yang, J. Zhang, and X. Wang, “Respiratory motion correction for free-breathing 3D abdominal MRI using CNN-based image registration: a feasibility study,” BJR 91, 20170788 (2018). [CrossRef]

39. A. Sedghi, J. Luo, A. Mehrtash, S. Pieper, C. M. Tempany, T. Kapur, P. Mousavi, and W. M. Wells III, “Semi-supervised deep metrics for image registration,” arXiv:1804.01565 (2018).

40. M. Zhou, H. Xia, H. Zhong, J. Zhang, and F. Gao, “A noise reduction method for photoacoustic imaging in vivo based on EMD and conditional mutual information,” IEEE Photonics J. 11(1), 1–10 (2019). [CrossRef]

41. N. Sert, V. Hurst, A. Ahluwalia, S. Alam, M. Avey, M. T. Baker, W. J. Browne, A. Clark, I. C. Cuthill, and U. Dirnagl, “The ARRIVE guidelines 2.0: Updated guidelines for reporting animal research,” Exp. Physiol. 105, 1–2 (2020). [CrossRef]

42. M. Vaas, R. Ni, M. Rudin, A. Kipar, and J. Klohs, “Extracerebral Tissue Damage in the Intraluminal Filament Mouse Model of Middle Cerebral Artery Occlusion,” Front. Neurol. 8, 85 (2017). [CrossRef]

43. R. Ni, A. Villois, X. L. Deán-Ben, Z. Chen, and J. Klohs, “In-vitro and in-vivo characterization of CRANAD-2 for multi-spectral optoacoustic tomography and fluorescence imaging of amyloid-beta deposits in Alzheimer mice,” Photoacoustics 23, 100285 (2021). [CrossRef]

44. R. Ni, D. R. Kindler, R. Waag, M. Rouault, P. Ravikumar, R. Nitsch, M. Rudin, G. G. Camici, L. Liberale, L. Kulic, and J. Klohs, “fMRI Reveals Mitigation of Cerebrovascular Dysfunction by Bradykinin Receptors 1 and 2 Inhibitor Noscapine in a Mouse Model of Cerebral Amyloidosis,” Front. Aging Neurosci. 11, 27 (2019). [CrossRef]

45. R. Ni, Y. Zarb, G. A. Kuhn, R. Müller, Y. Yundung, R. M. Nitsch, L. Kulic, A. Keller, and J. Klohs, “SWI and phase imaging reveal intracranial calcifications in the P301L mouse model of human tauopathy,” MAGMA 33(6), 769–781 (2020). [CrossRef]

46. A. Massalimova, R. Ni, R. M. Nitsch, M. Reisert, D. von Elverfeldt, and J. Klohs, “Diffusion Tensor Imaging Reveals Whole-Brain Microstructural Changes in the P301L Mouse Model of Tauopathy,” Neurodegener Dis. 20(5-6), 173–184 (2020). [CrossRef]

47. M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active contour models,” Int. J. Comput. Vision 1(4), 321–331 (1988). [CrossRef]

48. L. Taylor and G. Nitschke, “Improving Deep Learning using Generic Data Augmentation,” IEEE Symposium Series on Computational Intelligence (SSCI) (2018), pp. 1542–1547 2017.

49. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (2015).

50. D. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980 (2014).

51. F. Milletari, N. Navab, and S. A. Ahmadi, “V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation,” in 2016 Fourth International Conference on 3D Vision (3DV) (2016), 565–571.

52. J. M. Fitzpatrick and J. B. West, “The Distribution of Target Registration Error in Rigid-Body, Point-Based Registration,” IEEE Trans. Med. Imaging 20(9), 917–927 (2001). [CrossRef]

53. J. Lee, J. Jomier, S. Aylward, M. Tyszka, and M. Styner, “Evaluation of atlas based mouse brain segmentation,” Proc. SPIE 7259, 725943–725949 (2009). [CrossRef]

54. A. A. Ali, A. M. Dale, A. Badea, and G. A. Johnson, “Automated segmentation of neuroanatomical structures in multispectral MR microscopy of the mouse brain,” NeuroImage 27(2), 425–435 (2005). [CrossRef]

55. H. Min, P. Rong, T. Wu, and A. Badea, “Automated Segmentation of Mouse Brain Images Using Extended MRF,” NeuroImage 46(3), 717–725 (2009). [CrossRef]

56. H. Gudbjartsson and S. Patz, “The Rician distribution of noisy MRI data,” Magn. Reson. Med. 34(6), 910–914 (1995). [CrossRef]

57. C. C. Furnas, “Evaluation of the modified Bessel function of the first kind and zeroth order,” Amer. Math. Monthly 37(6), 282–287 (1930). [CrossRef]

58. Q. You, K. Zhang, J. Liu, C. Liu, and Y. Yang, “Persistent Regulation of Tumor Hypoxia Microenvironment via a Bioinspired Pt-Based Oxygen Nanogenerator for Multimodal Imaging-Guided Synergistic Phototherapy,” Adv. Sci. 7(17), 1903341 (2020). [CrossRef]

59. G. Bell, G. Balasundaram, A. Attia, F. Mandino, M. Olivo, and I. P. Parkin, “Functionalised iron oxide nanoparticles for multimodal optoacoustic and magnetic resonance imaging,” J. Mater. Chem. B 7(13), 2212–2219 (2019). [CrossRef]

60. D. W. Townsend, J. P. Carney, J. T. Yap, and N. C. Hall, “PET/CT today and tomorrow,” J. Nucl. Med. 45(Suppl 1), 4S–14S (2004).

61. W. Ren, X. L. Dean-Ben, M. A. Augath, and D. Razansky, “Development of concurrent magnetic resonance imaging and volumetric optoacoustic tomography: A phantom feasibility study,” J. Biophotonics 14(2), e202000293 (2021). [CrossRef]

62. C. Belthangady and L. A. Royer, “Applications, promises, and pitfalls of deep learning for fluorescence image reconstruction,” Nat. Methods 16(12), 1215–1225 (2019). [CrossRef]

63. X. L. Dean-Ben and D. Razansky, “Adding fifth dimension to optoacoustic imaging: volumetric time-resolved spectrally enriched tomography,” Light: Sci. Appl. 3(1), e137 (2014). https://www.nature.com/articles/lsa201418#supplementary-information [CrossRef] .

64. X. L. Deán-Ben, G. Sela, A. Lauri, M. Kneipp, V. Ntziachristos, G. G. Westmeyer, S. Shoham, and D. Razansky, “Functional optoacoustic neuro-tomography for scalable whole-brain monitoring of calcium indicators,” Light: Sci. Appl. 5(12), e16201 (2016). [CrossRef]

65. X. L. Dean-Ben, J. Robin, R. Ni, and D. Razansky, “Noninvasive three-dimensional optoacoustic localization microangiography of deep tissues,” arXiv:2007.00372 (2020).

66. C. Hage, F. Gremse, C. M. Griessinger, A. Maurer, S. H. L. Hoffmann, F. Osl, B. J. Pichler, F. Kiessling, W. Scheuer, and T. Pöschinger, “Comparison of the Accuracy of FMT/CT and PET/MRI for the Assessment of Antibody Biodistribution in Squamous Cell Carcinoma Xenografts,” J. Nucl. Med. 59(1), 44–50 (2018). [CrossRef]

67. X. L. Dean-Ben and D. Razansky, “Portable spherical array probe for volumetric real-time optoacoustic imaging at centimeter-scale depths,” Opt. Express 21(23), 28062–28071 (2013) [CrossRef]

68. Z. Chen, A. Ozbek, J. Rebling, Q. Zhou, X. L. Dean-Ben, and D. Razansky, “Multifocal structured illumination optoacoustic microscopy,” Light: Sci. Appl. 9(1), 152 (2020). [CrossRef]

69. R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Mach. Learn. 8(3-4), 229–256 (1992). [CrossRef]

70. M. Schweiger and S. Arridge, “The Toast++ software suite for forward and inverse modeling in optical tomography,” J. Biomed. Opt. 19(4), 040801 (2014). [CrossRef]

71. W. Ren, H. Isler, M. Wolf, J. Ripoll, and M. Rudin, “Smart Toolkit for Fluorescence Tomography: Simulation, Reconstruction, and Validation,” IEEE Trans. Biomed. Eng. 67(1), 16–26 (2020). [CrossRef]

Datasets	Dataset-1	Dataset-2	Dataset-3	Speed Per Registration
DL-based	5.11s	15.01s	28.20s	0.029s
MI-based	91.19s	541.3s	1115s	1.064s
By experts		-		>60s

Registration method	Dataset-1	Dataset-2	Dataset-3	Dataset-4	Dataset-5
DL-based w/ seg	7.373	8.051	8.233	7.653	7.532
DL-based w/o seg	26.413	20.017	25.764	29.121	29.455
MI-based	6.812	7.859	8.244	9.418	9.358

Datasets	Dataset-1	Dataset-2	Dataset-3	Speed Per Registration
DL-based	5.11s	15.01s	28.20s	0.029s
MI-based	91.19s	541.3s	1115s	1.064s
By experts		-		>60s

Registration method	Dataset-1	Dataset-2	Dataset-3	Dataset-4	Dataset-5
DL-based w/ seg	7.373	8.051	8.233	7.653	7.532
DL-based w/o seg	26.413	20.017	25.764	29.121	29.455
MI-based	6.812	7.859	8.244	9.418	9.358

Deep learning facilitates fully automated brain image registration of optoacoustic tomography and magnetic resonance imaging

Abstract

1. Introduction

2. Methods

2.1 Data acquisition and preparation of MSOT and MRI images

2.2 Image segmentation of the brain region

2.3 MSOT-MRI image transformation

2.4 Evaluation

2.4.1 Evaluation of brain segmentation

2.4.2 Evaluation of MSOT-MRI registration

2.4.3 Computational cost

3. Results

3.1 Segmentation of MSOT and MRI images

3.2 MSOT-MRI transformation

3.3 Appearance of noise in input images

4. Discussion

5. Conclusion

Funding

Acknowledgments

Disclosures

Data Availability

Supplemental document

References

Supplementary Material (1)

Data Availability

Cited By

Figures (7)

Tables (3)

Equations (7)

Biomedical Optics Express