## Abstract

Bioluminescence imaging is a very sensitive imaging modality, used in preclinical molecular imaging. However, in its planar projection form, it is non-quantitative and has poor spatial resolution. In contrast, bioluminescence tomography (BLT) promises to provide three dimensional quantitative source information. Currently, nearly all BLT reconstruction algorithms in use employ the diffusion approximation theory to determine light propagation in tissues. In this process, several approximations and assumptions that are made severely affect the reconstruction quality of BLT. It is therefore necessary to develop novel reconstruction methods using high-order approximation models to the radiative transfer equation (RTE) as well as more complex geometries for the whole-body of small animals. However, these methodologies introduce significant challenges not only in terms of reconstruction speed but also for the overall reconstruction strategy. In this paper, a novel fully-parallel reconstruction framework is proposed which uses a simplified spherical harmonics approximation (SPN). Using this framework, a simple linear relationship between the unknown source distribution and the surface measured photon density can be established. The distributed storage and parallel operations of the finite element-based matrix make *SP _{N}*-based spectrally resolved reconstruction feasible at the small animal whole body level. Performance optimization of the major steps of the framework remarkably improves reconstruction speed. Experimental reconstructions with mouse-shaped phantoms and real mice show the effectiveness and potential of this framework. This work constitutes an important advance towards developing more precise BLT reconstruction algorithms that utilize high-order approximations, particularly second-order self-adjoint forms to the RTE for

*in vivo*small animal experiments.

©2009 Optical Society of America

## 1. Introduction

Bioluminescence Imaging (BLI) has become an increasingly important tool for *in vivo* preclinical research [1][2]. Currently, planar (two-dimensional) BLI is the conventional imaging modality used in optical imaging. Optical photons emitted from bioluminescence sources in the small animal body (usually mice) propagate in biological tissues and are detected once emitted from the subject surface. Since *in vivo* biological tissues have high absorption and scattering characteristics, planar BLI only indirectly reflects the activity of the targeted biological object via the surface photon distribution [3]. Optical signals are easily confounded by the tissue properties in *in vivo* mouse experiments, especially when monitoring the initiation and progression of tumors over time within the same subject. The aim of BLT is to quantitatively acquire 3*D* information of bioluminescence sources, significantly improving the information quality of bioluminescence imaging.

Monte Carlo (MC) methods used with statistical models can obtain very precise simulations by tracking photon propagation in biological tissues [4]. However, these methods are severely time-consuming. Solving the radiative transfer equation (RTE) (i.e. Boltzmann equation) can also obtain precise simulation results as there is no Poisson noise in the simulation data [5]. Currently, nearly all 3*D* BLT reconstruction algorithms are based on the diffusion equation, which is a simple approximation to the RTE [6][7] [8][9]. Simulation and experimental reconstructions have shown that the diffusion approximation (DA) introduces significant artifacts in BLT reconstruction [3][10]. Therefore, it is necessary to develop high-order approximation-based reconstruction methods to improve BLT reconstruction. The first- and second-order formulations of the RTE are usually used to directly solve the RTE [5]. Discrete ordinates (*S _{N}*) and spherical harmonics (

*P*) methods, as two usual numerical approximations, can yield simulation solutions based on two types of formulations. Compared with first-order formulations, the operators acting on the second-order forms such as the even-odd parity (EOP) equations are self-adjoint [5]. This distinct advantage provides a straightforward application of the finite element methods (FEM) easily executed on complex heterogeneous geometries [11]. Furthermore, the acquired FEM matrix in the second-order equations is sparse positive-definite (SPD), which yields better numerical stability and efficiency and benefits the development of reconstruction algorithms [12]. In order to generate an accurate simulation model, and regardless of the first- and second-order equations, one has to set

_{N}*N*as large as possible and then

*N*(

*N*+2) and (

*N*+1)

^{2}coupled equations corresponding to

*S*and

_{N}*P*methods need to be solved. This computational complexity, especially for the whole-body of small animals, creates a substantial challenge in the development of novel BLT reconstruction algorithms. Recently, a novel type of second-order approximation form, the simplified spherical harmonics (

_{N}*SP*) method, has been developed for optical imaging [13], improving computational efficiency. Furthermore, a fully parallel adaptive FEM method was proposed to improve the simulation speed [14]. However, to obtain more accurate BLT reconstructions, a novel BLT reconstruction framework needs to be developed with the radiative transfer-based approximations to the RTE.

_{N}BLT is an inverse source problem and in the general case, its solution is not unique [15]. *A priori* information plays an indispensable role in BLT reconstruction. Among the various types of *a priori* information, multispectral measurements are important for achieving BLT reconstructions [16][17][18][19]. However, spectrally-resolved data sets can significantly increase the computational burden, especially when non-contact measurements are made using highly sensitive CCD cameras which acquire detailed surface photon distribution. In addition, due to the curved surface topography of the mouse and the necessity of using the heterogeneous characteristics of mouse tissues, numerical reconstruction algorithms, such as those based on FEM, are more suitable compared to analytical and statistical modeling-based methods [4].

BLT reconstruction is, in principle, similar to that of single photon emission computed tomography (SPECT) and positron emission tomography (PET) imaging. Therefore, reconstruction algorithms appropriate for PET and SPECT can be introduced to realize the BLT reconstruction [17]. In this case, the system response *P*-matrix needs to be computed, which is a very time-consuming step, although it can be obtained prior to acquiring the measured data. The BLT reconstruction is sensitive to various factors. Precalculating the *P*-matrix can affect the reconstruction quality due to the use of different heterogeneous geometries between the calculation and the experiment. Diffuse optical tomography (DOT) has been investigated for several decades and its reconstruction algorithms are easily applied to BLT. In this case, the Jacobian matrix needs to be calculated for each iteration, which is time-consuming [7]. Since the BLT problem is linear, the least-square problem can be solved to realize the BLT reconstruction by establishing the linear relationship between the unknown source variable and the measured surface data [6]. Until now, this method has not been extended to the radiative-transfer-based model. In addition, matrix inversion needs to be performed when using this strategy. Additional investigations should be performed to improve the reconstruction speed.

In this paper, a radiative-transfer-based fully parallel BLT reconstruction framework is developed using simplified spherical harmonics (*SP _{N}*) equations. This framework uses finite element methods to process complex reconstruction domain geometries. The linear relationship between the unknown source and the spectrally-resolved measured data using the

*SP*approximation is established to achieve BLT reconstruction. To improve the reconstruction speed and enable BLT reconstruction for the whole body of the mouse, the finite element-based matrices are stored and operated in a parallel distribution mode. Furthermore, for the time-consuming problems of the key steps in the reconstruction, corresponding improvements are also performed, which significantly accelerate the reconstruction. Timing analysis demonstrates the improved performance of the proposed framework. Experimental reconstructions using mouse-shaped phantoms and real mice show the potential of this framework for practical BLT applications. The next section introduces the proposed fully parallel framework using the

_{N}*SP*approximation. In the third section, the performance tests and analysis are described and experimental BLT reconstructions also are demonstrated. Finally, we discuss relevant issues.

_{N}## 2. Methods

#### 2.1. Radiative transfer equation and SPN approximation

The radiative transfer equation (RTE) is an approximation to Maxwell’s equations. In bioluminescence imaging, the source intensity is generally assumed to be time invariant during the data acquisition. In addition, the photons at different wavelengths are considered to be independent, therefore we get

$$={\mu}_{s}(\mathbf{r},\lambda ){\int}_{4\pi}p(\hat{\mathbf{s}},{\hat{\mathbf{s}}}^{\prime})\psi (\mathbf{r},{\hat{\mathbf{s}}}^{\prime},\lambda )d{\hat{\mathbf{s}}}^{\prime}+S(\mathbf{r},\hat{\mathbf{s}},\lambda )$$

where *ψ*(**r**, ŝ,*λ*) denotes photons in the unit volume traveling from point r in direction ŝ. Based on the principle of energy conservation, the RTE suggests that the radiance *ψ*(**r**, ŝ,,*λ*) is equal to the sum of all factors affecting it (including absorption *µ _{a}*(

**r**,

*λ*), scattering

*µ*(

_{s}**r**,

*λ*), and source energy

*S*(

**r**, ŝ,

*λ*)) when light photons cross a unit volume [20].

*p*(ŝ, ŝ′) is the scattering phase function and gives the probability of a photon scattering anisotropically from direction ŝ′ to direction ŝ. Generally, the Henyey-Greenstein (HG) phase function is used to characterize this probability [21]:

where *g* is the anisotropy parameter; cos*θ* denotes the scattering angle and is equal to ŝ·ŝ′ when we assume that the scattering probability only depends on the angle between the incoming and scattering directions. When photons reach the body surface of a mouse, that is *r*∈*∂*Ω, some of them are reflected and cannot escape from the mouse body Ω because of the mismatch between the refractive indices *n _{b}* for Ω and

*n*for the external medium. When the incidence angle

_{m}*θ*from the mouse body is not larger than the critical angle

_{b}*θ*(

_{c}*θ*=arcsin(

_{c}*n*) based on Snell’s law), the reflectivity

_{m}/n_{b}*R*(cos

*θ*) is given by [22]:

_{b}where *θ _{m}* is the transmission angle. Furthermore, we can get the exiting partial current

*J*+(

**r**) at each boundary point

**r**[13]:

where v is the unit outer normal vector. After a series of deductions with the *P _{N}* method, the

*SP*approximation is obtained [13]

_{N}$$-\left(\genfrac{}{}{0.1ex}{}{1}{2n+1}\right)\nabla \xb7\genfrac{}{}{0.1ex}{}{1}{{\mu}_{a,n-1}\left(\lambda \right)}\nabla (\left(\genfrac{}{}{0.1ex}{}{n}{2n-1}\right){\varphi}_{n}\left(\lambda \right)+\left(\genfrac{}{}{0.1ex}{}{n-1}{2n-1}\right){\varphi}_{n-2})+{\mu}_{a,n}\left(\lambda \right){\varphi}_{n}\left(\lambda \right)=s\left(\lambda \right)$$

where *µ _{a,n}*(

*λ*)=

*µ*(

_{a}*λ*)+

*µ*(

_{s}*λ*)(1-

*g*); when

^{n}*ψ*(λ) is expanded by the

*P*approximation,

_{N}*ϕ*(λ) are the

_{n}*Legendre moments*of

*ψ*(

*λ*) (2≤

*n*≤

*N*,

*N*is an odd positive integer). Although the

*SP*solution is asymptotic and cannot converge to an exact radiative transfer solution with the increase of

_{N}*N*, the simulation results have shown good agreement between the

*SP*approximation and Monte Carlo methods [14]. Through some further deductions, (

_{7}*N*+1)/2 boundary conditions can be obtained corresponding to (

*N*+1)/2 Eqs. (5). These boundary conditions are mixed and consist of linear combinations of the even-order

*ϕ*and their first derivatives. With respect to the

_{n}*composite moments*

*φ*of

_{n}*ϕ*, which are

_{n}$${\phi}_{2}=3{\varphi}_{2}+4{\varphi}_{4},$$

$$\dots ,$$

$${\phi}_{n}=\left(2n-1\right){\varphi}_{2n-2}+\left(2n\right){\varphi}_{2n},$$

$$\dots ,$$

$${\phi}_{\left(N+1\right)\u20442}={\mathit{N\varphi}}_{N-1}.$$

We can get the general equations of the *SP _{N}* approximation and its boundary conditions when practical measurements are performed at the wavelength

*λ*using bandpass filter:

_{k}where ${\U0001d4d2}_{i,\nabla {\phi}_{i}}\left({\lambda}_{k}\right)$, ${\U0001d4d2}_{i,\nabla {\phi}_{j}}\left({\lambda}_{k}\right)$, ${\U0001d4d2}_{i,{\nabla \phi}_{j}}^{b}\left({\lambda}_{k}\right)$, and ${\U0001d4d2}_{i,\nabla {\phi}_{j}}^{b}\left({\lambda}_{k}\right)$ can be calculated. The details and the above coefficients of the *SP*
_{1} to *SP _{7}* approximations can be found in [13].

#### 2.2. Fully parallel reconstruction algorithm

Based on the finite element analysis, we can get a general weak formulation for the *SP _{N}* approximation [23]:

$$-{\int}_{\partial \mathbf{\Omega}}{\U0001d4d2}_{i,\nabla {\phi}_{i}}\sum _{j=1}^{(N+1)\u20442}{f}_{\mathbf{v}\xb7{\phi}_{i}}\left({\phi}_{j}\right)\xb7\upsilon d\partial \Omega ={\int}_{\mathbf{\Omega}}{\U0001d4d2}_{i,S}{S}_{i}\left({\lambda}_{k}\right)\xb7\upsilon d\Omega $$

To avoid the processing of v·*φ _{i}* in boundary integration, we assume v·

*φ*are unknown variables in the boundary equations (Eq. (7b)). We obtain ${f}_{\mathbf{v}\xb7{\phi}_{i}}(\xb7)$ by solving a set of first order equations. The boundary conditions can be easily processed using this method.

_{i}Figure 1 shows the flowchart of the proposed fully-parallel framework. Fully-parallel reconstruction means that almost all of the steps in the reconstruction framework should work in parallel mode. After the reconstruction domain Ω is discretized into the volumetric mesh T, the next step is to partition this mesh into *N _{c}* mesh subdomains

*𝓣*(1≤

_{c}*𝓣*≤

_{c}*N*), where

_{c}*N*is the number of the utilized CPUs. Regarding the finite element implementation, the space of linear finite elements

_{c}*𝓥*is introduced on

*𝓣.φ*(

_{i}*λ*) and

_{k}*S*(

_{i}*λ*) are approximated:

_{k}where *φ _{i,p}*(

*λ*) and

_{k}*s*(

_{i,p}*λ*) are the discretized values at a discretized point

_{k}*p*when using the basis function

*υ*(

_{p}**r**);

*N𝓟*is the total number of the discretized points over the entire domain. Considering Eqs. (8), (9a), and (9b), for a volumetric element

*τ*, we have

_{e}$$\left[\begin{array}{cccc}\multicolumn{1}{c}{{b}_{1{\phi}_{1}}\left({\lambda}_{k}\right)}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}\\ \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{{b}_{2{\phi}_{2}}\left({\lambda}_{k}\right)}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}\\ \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{\ddots}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}\\ \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{{b}_{\left(N+1\right)\u20442{\phi}_{\left(N+1\right)\u20442}}\left({\lambda}_{k}\right)}\end{array}\right]\left[\begin{array}{c}\multicolumn{1}{c}{{s}_{1,{\tau}_{e}}\left({\lambda}_{k}\right)}\\ \multicolumn{1}{c}{{S}_{2,{\tau}_{e}}\left({\lambda}_{k}\right)}\\ \multicolumn{1}{c}{\vdots}\\ \multicolumn{1}{c}{{s}_{\left(N+1\right)\u20442,{\tau}_{e}}\left({\lambda}_{k}\right)}\end{array}\right]$$

where

${m}_{{\mathrm{i\phi}}_{j}}\left({\lambda}_{k}\right)\{\begin{array}{c}\multicolumn{1}{c}{\begin{array}{cc}\multicolumn{1}{c}{\begin{array}{c}\multicolumn{1}{c}{{\int}_{\mathrm{\tau e}}\left\{{\U0001d4d2}_{i,\nabla {\phi}_{i}}\left({\lambda}_{k}\right)\nabla {\upsilon}_{p}\xb7\nabla {\upsilon}_{p}+{\U0001d4d2}_{i,{\phi}_{i}}\left({\lambda}_{k}\right){\upsilon}_{p}{\upsilon}_{q}\right\}d\mathbf{r}}\\ \multicolumn{1}{c}{-{\int}_{\partial {\tau}_{e}}{\U0001d4d2}_{i,\nabla {\phi}_{i}}\left({\lambda}_{k}\right){f}_{\mathbf{v}\xb7{\phi}_{i}}\left({\upsilon}_{p}\right){\upsilon}_{q}d\mathbf{r}}\end{array}}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.9em}{0ex}}\mathrm{if}i=j}\\ \multicolumn{1}{c}{{\int}_{{\tau}_{e}}{\U0001d4d2}_{i,{\phi}_{j}}\left({\lambda}_{k}\right){\upsilon}_{p}{\upsilon}_{q}d\mathbf{r}-{\int}_{\partial {\tau}_{e}}{\U0001d4d2}_{i,\nabla {\phi}_{i}}\left({\lambda}_{k}\right){f}_{\mathbf{v}\xb7{\phi}_{i}}\left({\upsilon}_{p}\right){\upsilon}_{q}d\mathbf{r}\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{\mathrm{if}\text{}\ne j}\end{array}}\end{array}$

and

${b}_{i,{\phi}_{i}}\left({\lambda}_{k}\right)={\int}_{{\tau}_{e}}{\upsilon}_{p}{\upsilon}_{q}d\mathbf{r}$

*∂τ _{e}* is the boundary element if

*τ*is on the boundary and belongs to the respective subdomain in the parallel implementation. After assembling the submatrices on all the elements, we get

_{e}$$\left[\begin{array}{cccc}\multicolumn{1}{c}{{B}^{{\mathcal{T}}_{c}}}& \multicolumn{1}{c}{}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}\\ \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{{B}^{{\mathcal{T}}_{c}}}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}\\ \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{\ddots}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}\\ \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{\phantom{\rule[-0ex]{.2em}{0ex}}}& \multicolumn{1}{c}{{B}^{{\mathcal{T}}_{c}}}\end{array}\right]\phantom{\rule[-0ex]{.2em}{0ex}}\left[\begin{array}{c}\multicolumn{1}{c}{{S}_{1}^{{\mathcal{T}}_{c}}\left({\lambda}_{k}\right)}\\ \multicolumn{1}{c}{{S}_{2}^{{\mathcal{T}}_{c}}\left({\lambda}^{k}\right)}\\ \multicolumn{1}{c}{\vdots}\\ \multicolumn{1}{c}{{S}_{\left(N+1\right)\u20442}^{{\mathcal{T}}_{c}}\left({\lambda}_{k}\right)}\end{array}\right]$$

After inverting the entire matrix at the left side of Eq. (11), we have

Where ${\mathrm{IM}}_{{\mathrm{i\phi}}_{j}}^{{\mathcal{T}}_{c}}\left({\lambda}_{k}\right)$ are the submatrices of the entire inverse matrix *IM*(*λ _{k}*) corresponding to ${M}_{{\mathrm{i\phi}}_{j}}^{{\mathcal{T}}_{c}}\left({\lambda}_{k}\right)$. Note that matrix inversion is calculated with respect to the entire matrix

*M*(

*λ*) although ${\mathrm{IM}}_{{\mathrm{i\phi}}_{j}}^{{\mathcal{T}}_{c}}\left({\lambda}_{k}\right)$ are used to describe the subdomain submatrices. Because matrix inversion is always time-consuming, to accelerate the reconstruction, direct and iterative matrix inversion methods are compared to optimize the execution time. Details can be found in Section 2.3. After removal of the rows in the matrices $\sum _{j=1}^{\left(N+1\right)\u20442}{\U0001d4d2}_{j,S}{\mathrm{IM}}_{{\mathrm{i\phi}}_{j}}^{{\mathcal{T}}_{c}}\left({\lambda}_{k}\right)\xb7{B}^{{\mathcal{T}}_{c}}\xb7{S}^{{\mathcal{T}}_{c}}\left({\lambda}_{k}\right)$ corresponding to the non-boundary measurable discretized points, we further manipulate Eq. (4) to get [13]

_{k}$$={G}^{{\mathcal{T}}_{c}}\left({\lambda}_{k}\right){S}^{{\mathcal{T}}_{c}}\left({\lambda}_{k}\right)$$

where *β _{j}*(

*λ*) can be calculated with respect to Eq. (4) when

_{k}*ψ*(

**r**, ŝ,

*λ*) is expanded; ${G}_{j}^{{\mathcal{T}}_{c}}\left({\lambda}_{k}\right)$ are the corresponding matrices on subdomain

*𝓣*after removing the rows in Eqs. (12). When the surface optical data are collected at

_{c}*K*wavelengths, we get

where

${\mathcal{A}}^{{\mathcal{T}}_{c}}$ is the relationship matrix between ${J}^{{\mathcal{T}}_{c},+,b}$ and ${S}^{{\mathcal{T}}_{c}}$ ; *γ _{k}* is the percentage at the wavelength λk of the total energy. It is usually considered as an ill-conditioned matrix because of the ill-posedness of BLT. The surface measured data ${J}^{{\mathcal{T}}_{c},+,m}$ corresponding to ${J}^{{\mathcal{T}}_{c},+,b}$ will likely lead to reconstruction failure when solving Eq. (14) directly due to the noise factor. Through solving the bound-constrained least-square problem

We can generate the BLT reconstruction, where *S ^{sup}* is the upper bound of the source density; δ the regularization parameter; and

*η*(·) the penalty function. Since all the data in Eq. (16) are distributed on

*N*CPUs, the optimization algorithms should work in parallel mode.

_{c}#### 2.3. Further demonstration

Overall, BLT reconstruction can be obtained by solving Eq. (16) using parallel optimization algorithms. Although a fully parallel spectrally-resolved reconstruction framework can be realized, performance optimization is necessary to accelerate the reconstruction. With respect to the steps of the proposed framework, three aspects are further demonstrated:

Load balancing is critical for high performance parallel reconstructions. If there are large differences between the sizes of the mesh subdomains on different CPUs, the performance is adversely affected. At the beginning of the proposed framework, mesh partitioning is an important step. In this framework, a multilevel k-way method is used to perform this [24].

The method results in improved performance by reducing the dimensions of the mesh, partitioning it into smaller sizes, and refining it to the original one. Another consideration is the distribution of the relationship matrix *𝓐* between the spectrally-resolved measured data and the unknown source variable. The distribution of the measurable boundary discretized points is not uniform on each CPU. In addition, ${\mathcal{A}}^{{\mathcal{T}}_{c}}$ is formed by combining K $K{G}^{{\mathcal{T}}_{c}}$. The redistribution of 𝓐 is necessary in order to optimize the performance of parallel optimization algorithms.

Matrix inversion is the major component in the *Relationship Forming* step. Although it is very time-consuming, it is unavoidable in the proposed framework. Furthermore, with the increase of the number *N* in the expansion to the radiance, the dimensions of the matrix *M*(*λ _{k}*) become very large. It is essential to find a very efficient inversion matrix algorithm [25]. Since

*M*(

*λ*) is sparse positive-definite,

_{k}*LU*factorization has good performance in various direct inversion algorithms. Another strategy is to use iterative methods. Preconditioning strategies in recent years have significantly improved the performance of iterative methods [26]. Since the performance of two strategies depends on the matrix properties, it is necessary to decide the type of strategy suitable for the specified problem.

Optimization methods Eq. (16) is a least squares problem, and it is easy to obtain the Hessian matrix in Newton-type optimization methods [27]. However, due to the large scale of the problem, a significant amount of memory is required during the optimization procedure. Even if the Hessian matrix can be calculated at each iteration, the process is extremely time-consuming. In addition, when computing the search direction, it is necessary to invert the Hessian matrix, which is also time-consuming. The speed of BLT reconstruction is therefore severely affected using Hessian matrix-based optimization algorithms. One solution to this problem is to use quasi-Newton methods. Generally, these methods build up an approximate Hessian matrix by using gradient and iteration algorithms. This approximate matrix is obtained by vector-vector multiplications in real time and is easy to inverse, saving memory and time costs. Here, the limited memory variable metric bound constrained quasi-Newton method (BLMVM) in parallel mode is used for BLT reconstruction [28].

## 3. Results

The bioluminescence imaging experiments were performed on a Maestro 2 *in vivo* imaging system (CRI, Woburn, Massachusetts). This system uses a cooled CCD camera with a custom lens as the detector. The distinct characteristic of this system is that a liquid crystal tunable filter (LCTF) is used to acquire multispectral data. Generally, the filter bandpass width was set to 20*nm* and the optical data was collected from a single view. The exposure time for each wavelength was adjusted to obtain high signal-to-noise ratio (SNR). After completing each optical signal acquisition, the phantom or mouse were scanned using an X-ray microCAT system (Siemens Preclinical Solutions, Knoxville, TN) to obtain CT images. The software Amira (Mercury Computer Systems, Inc. Chelmsford, MA) used the CT images to generate volumetric meshes for BLT reconstructions.

The framework was implemented in libMesh [29]. LibMesh is an open-source, high-quality software package and is developed to meet the needs of parallel FEM-based simulation. LibMesh provides almost all of the components used in parallel PDE-based simulation with unstructured discretization. Its design concept is to use existing software packages as far as possible. PETSc developed by Argonne National Laboratory (ANL) was used to solve the linear systems in parallel mode [30]. By default, an open-source serial graph partitioning package, METIS, realizing the multilevel k-way partitioning algorithm was used to partition the whole domain in libMesh [24]. In addition, in order to observe the effect of the model errors in the reconstruction quality, the regularization parameter *δ* was set to zero in the reconstructions. In order to test the proposed framework, we selected the *SP*
_{1}(*DA*), *SP*
_{3} and *SP _{7}* approximations to perform BLT reconstructions. All the simulations were performed on a cluster of 27 nodes (2 CPUs of 3.2GHz and 4 GB RAM at each node).

#### 3.1. Mouse-shaped phantom experiments

In the first case, a commercial mouse-shaped phantom (Caliper Life Sciences, Hopkinton, Massachusetts, USA) was used to acquire multispectral data. The phantom was fabricated from a polyester resin, TiO_{2} and Disperse Red. To imitate the bioluminescence source, an optical fiber coupled to a green LED was embedded within the phantom. The emission spectrum of the LED was similar to that of a bioluminescence source. Its wavelength range was from 500*nm* to 700*nm* with a peak at about 567*nm*. The photon distribution data at two wavelengths (580*nm* and 640*nm*) were used in the BLT reconstruction. Table 1 shows the optical properties (*µ _{a}* and

*µ*′

*) at two wavelengths measured with the inverse adding-doubling method [31]. For the high-order*

_{s}*SP*approximations, we set the anisotropic parameter g to 0.9. More detailed information about this phantom can be obtained elsewhere [16]. Figure 2(a) shows a photograph of the phantom in the Maestro 2 system. To avoid the curved surface effect in the measured data, the bottom flat surface was used as the detection surface. Figures 2(b) and 2(c) show the photon distribution at 580

_{N}*nm*and 640

*nm*respectively. They were acquired using an exposure time of 5

*min*. There are distinct differences between them because of the different optical properties at different wavelengths, which benefit the 3

*D*source localization.

With respect to the photon distribution, about two-thirds of the overall phantom was selected for mesh generation. The volumetric mesh, as shown in Fig. 3(a), contained 4,969 nodes and 21,348 tetrahedral elements. Figure 3(a) also shows that the photon distribution was mapped on the mesh surface using a manual co-registration method. Figure 3(b) shows the results after the mesh was partitioned when 10 CPUs were used for the BLT reconstruction. The number of discretized points in each subdomain is similar, avoiding a load imbalance. Figure 4 further shows the reconstructed results based on *SP*
_{1}, *SP*
_{3} and *SP _{7}* approximations. Due to the absence of a regularization parameter, the

*SP*

_{1}-based reconstruction was very sensitive to the measured noise and we could not obtain good source localization, as shown in Fig. 4(a). Figures 4(b) and 4(c) show the reconstructed results when

*SP*

_{3}and

*SP*approximations were used. From the figures it is clear that the source positions are reconstructed well when using high-order approximations. The localization errors are less than 1

_{7}*mm*in two directions, which can clearly be observed from Figs. 4(b) and 4(c). These reconstructed results show the importance and performance of high-order

*SP*approximations for BLT reconstruction.

_{N}#### 3.2. Reconstruction performance optimization

Although the high-order SPN-based BLT reconstructions yield good source localization, the reconstruction memory and time costs are significantly increased with respect to the first order approximation (*SP*
_{1}). In order to save a dense inverse matrix *IM*(*λ _{k}*), the

*SP*

_{1}-based reconstruction requires only about 94 MB of space compared to about 1.5 GB in the

*SP*-based reconstructions. Although the proposed fully parallel framework has the ability to process matrices with large dimensions by distributing the storage, reconstruction time becomes impractical with the increase of the approximation order

_{7}*N*and the number of used total wavelengths

*K*. Performance optimization is indispensable to improve the efficiency of the proposed framework. The quasi-Newton optimization method (BLMVM) has been selected to significantly reduce reconstruction time when compared to general Newton-type methods. Additionally, matrix inversion and the number of utilized CPUs further optimizes the reconstruction framework.

### 3.2.1. Direct vs. iterative inversions

When the reconstruction domain is discretized into *N _{𝓟}* points, the

*SP*-based BLT reconstruction must process a

_{N}*N**

*N*𝓟×

*N**

*N*

*matrix. The computational complexity of the matrix inversion is*

_{𝓟}*O*((

*N**

*N*

_{𝓟})

^{3}) if direct inversion methods are used. The computational burden is significantly increased with the increase of

*N*and

*N*

*. When 10 CPUs were used in the above BLT reconstructions, the*

_{𝓟}*SP*

_{1}-based reconstruction required only 1,163.1sec, as opposed to 3,086.1sec for

*SP*

_{3}and 10,754.5sec for

*SP7*(Table 2).

*LU*-factorization-based relationship forming (i.e. forming Eq. (14)) utilized most of the total reconstruction time. For the

*SP*

_{1}and

*SP*approximations, the percentage increased from 58.0% to 96.8%, making it critically important to improve the performance of the matrix inversion. For iterative matrix inversion methods, the parallel incomplete

_{7}*LU*(

*ILU*) conjugate gradient (CG) method was used to accelerate the inversion. This preconditioner was provided by the Hypre open source package [32], developed by Lawrence Livermore National Laboratory (LLNL). For the

*SP*

_{1}- and

*SP*-based reconstructions, the total reconstruction time sharply decreased from 644.5 to 3,367.5sec, as shown in Table 2. Although the percentage of the relationship forming part in the total time increased regardless of

_{7}*SP*

_{1}and

*SP*approximations, the reconstruction speed was improved by a factor of 1.80 and 3.19 corresponding to the SP1 and SP7 approximations.

_{7}### 3.2.2. Parallel reconstruction performance

Iterative matrix inversion methods show the improved performance in the Relationship Forming step. Ideally, parallel execution on an increased number of CPUs should provide improved reconstruction performance. In order to evaluate the proposed fully-parallel reconstruction framework, BLT reconstructions with *SP*
_{1}, *SP*
_{3} and *SP _{7}* approximations were performed using different number of CPUs. Iterative matrix inversion was used in these evaluations. Figure 5(a) shows the total reconstruction time depending on the CPU number. The

*SP*

_{1}-based reconstruction time increased with an increased number of CPUs. For

*SP*

_{3}- and

*SP*-based reconstructions, there were an optimal CPU number which will provide the shortest reconstruction time (3 and 23 CPUs corresponding to

_{7}*SP*

_{3}and

*SP*). In the proposed framework, the four main steps are 1)

_{7}*Mesh Partitioning*, 2)

*Matrix Assembly*, 3)

*Relationship Forming*, and 4)

*Optimization*. To further observe the above behavior, time analysis was performed while observing these steps, as shown in Figs. 5(b) and 5(c). Since

*Mesh Partitioning*required the least time among the four steps, it was negligible with respect to the entire reconstruction (data is not shown in Figs. 5(b) and 5(c)). With the increase in CPU number, the time cost of

*Matrix Assembly*was gradually reduced. Despite the fact that the

*matrix assembly*time increased with the application of high-order approximation, Matrix Assembly required a small percentage of the overall reconstruction.

*Relationship Forming*and

*Optimization*comprised nearly all the reconstruction time. Furthermore, both of these steps had an optimal number of CPUs to obtain the minimal time cost. Since BLT reconstructions are performed on a cluster, the time cost of the communication between the CPUs becomes significant compared with the performance improvement from parallel execution. Higher speed communication methods, such as shared memory mode, could significantly improve the reconstruction speed. With the current hardware architecture and software settings, the number of CPUs must be preselected to obtain optimal reconstruction time.

### 3.3. Real mouse experiments

To further validate the proposed framework, experiments with a living mouse were performed in the Maestro 2 system. To simulate the bioluminescence source, a luminescent bead (Mb-Microtec, Bern, Switzerland) whose emission spectrum is similar to the *in vivo* spectrum of a firefly luciferase-based source was used. This bead uses tritium (the half life is about 12 years) to excite phosphor which generates photons, making it a very stable source. The bead dimensions are 0.9*mm* in diameter and 2.5*mm* in length. Prior to performing the experiments, the mouse was anesthetized and the bead was injected into the thigh using a syringe. The photon distribution data at 580*nm* and 660*nm* were collected from the ventral view. The exposure time was set to 1.5*min*. The volumetric mesh used in the reconstruction was generated using CT images of the mouse and contained 5,932 points and 24,120 tetrahedral elements. Figure 6(a) shows the mapped photon distribution after co-registration between the photograph of the mouse and the volumetric mesh. From the CT images, the tritium source can be clearly identified, as shown in Fig. 7. Since the photon propagation path consists almost totally of muscle, the reconstruction domain was considered to be homogeneous muscle tissue. The corresponding mouse muscle optical properties were then used in the reconstruction (Table 1).

Figure 6(b) shows the partitioned mesh when 30 CPUs were used in the BLT reconstruction. The reconstructed results corresponding to *SP*
_{1}, *SP*
_{3}, and *SP _{7}* approximations are shown in Fig. 7. The actual center position of the tritium source was at (51.8,-0.1). The reconstructed center position obtained from

*SP*

_{1},

*SP*

_{3}, and

*SP*approximations was at about (51.1,0.2). The

_{7}*SP*-based reconstruction was similar with the

_{7}*SP*

_{3}-based one regarding the source center position. Although the

*SP*

_{1}-based reconstruction was somewhat different compared to the

*SP*

_{3}-and

*SP*-based results, the source localization errors were measured to be less than 1

_{7}*mm*in two directions. This result was similar to the

*SP*

_{3}- and

*SP*-based reconstruction in the phantom experiments. The difference between phantom- and real mouse-based BLT reconstructions was that the tritium source could be localized well in the

_{7}*SP*

_{1}-based reconstruction. This was likely because the tritium source was superficial compared to the LED source in the mouse-shaped phantom. In addition, the mouse surface was more irregular than the mouse-shaped phantom surface, it should have the effect in the reconstruction.

## 4. Discussions and Conclusion

In this paper, a radiative-transfer-based fully-parallel reconstruction framework was developed for spectrally-resolved bioluminescence tomography. Although the BLT reconstruction was performed based on the simplified spherical harmonics approximation, the proposed framework was also suitable for other high-order self-adjoint approximations to the RTE. The application of the finite element methods made the framework suitable for processing complex geometries. Fully-parallel execution made the BLT reconstruction for the whole-body of a small animal feasible. The reconstruction performance optimization significantly improved the reconstruction speed. The experimental reconstruction using the mouse-shaped phantom and real mouse further demonstrated the effectiveness of the proposed framework.

Since bioluminescence tomography can provide more accurate bioluminescent source information, the importance of developing mature BLT technologies is critical, given the successful and extensive application of planar bioluminescence imaging in biological research. Diffusion theory approximation leads to inaccurate reconstructions and more accurate approximation models are necessary for BLT reconstruction. However, the corresponding computational burden prevents the realization of such reconstruction algorithms. The proposed framework addresses this problem. Although good reconstruction performance can be obtained using high-order approximation models, the memory and time cost cannot be neglected. The balance between reconstruction quality and cost should be further explored. From the reconstruction cases presented in this paper, we find that the SP3 approximation is suitable for obtaining good BLT reconstructions after comparing its results with those based on *SP*
_{1} and *SP _{7}* approximations.

The performance optimization presented here is relevant not only to the improvement of the algorithms involved, but also to the development of the computational hardware. While the application of quasi-Newton optimization methods and iterative matrix inversion have effectively improved the reconstruction performance, further optimization strategies should be developed to obtain higher reconstruction speed. Since the matrix *M*(*λ _{k}*) is sparse, sparse approximate matrix inversion can be considered to accelerate the reconstruction. With respect to the hardware, it is necessary to improve the data communication speed between CPUs. Developing multi-core CPU technology and shared-memory high performance computer will significantly benefit the proposed algorithm.

In conclusion, we have developed a fully-parallel spectrally-resolved BLT reconstruction framework for radiative-transfer-based high-order approximations. A performance optimization was also performed and described. Preliminary experimental reconstruction verifications show the feasibility and effectiveness of the proposed framework. Further investigations will focus on real mouse experiments used as disease models.

## Acknowledgement

We would like to thank Dr. Laurent Bentolila from Department of Chemistry & Biochemistry, University of California Los Angeles for providing us with access to the Maestro 2 system. We are grateful to Judy Edwards and Waldemar Ladno at the small-animal imaging facility of the Crump Institute for Molecular Imaging for their assistance with mouse experiments. This work is supported by the NIBIB R01-EB001458, a NIH/NCI 2U24 CA092865 cooperative agreement, the Department of Energy DE-FC02-02ER63520, and the NCI grant 5-R01 CA08572.

## References and links

**1. **V. Ntziachristos, J. Ripoll, L. V. Wang, and R. Weisslder, “Looking and listening to light: the evolution of whole body photonic imaging,” Nat. Biotechnol. **23**, 313–320 (2005).
[CrossRef] [PubMed]

**2. **R. Weissleder, “Scaling down imaging: Molecular mapping of cancer in mice,” Nat. Rev. Cancer **2**, 11–18 (2002).
[CrossRef] [PubMed]

**3. **J. Virostko, A. C. Powers, and E. D. Jansen, “Validation of luminescent source reconstruction using single-view spectrally resolved bioluminescence images,” Appl. Opt. **46**, 2540–2547 (2007), http://www.opticsinfobase.org/abstract.cfm?URI=ao-46-13-2540.
[CrossRef] [PubMed]

**4. **A. P. Gibson, J. C. Hebden, and S. R. Arridge, “Recent advances in diffuse optical imaging,” Phys. Med. Biol. **50**, R1–R43 (2005).
[CrossRef] [PubMed]

**5. **E. E. Lewis and W. F. Miller Jr., , *Computational Methods of Neutron Transport*, (JohnWiley & Sons, New York, 1984).

**6. **W. Cong, G. Wang, D. Kumar, Y. Liu, M. Jiang, L. V. Wang, E. A. Hoffman, G. McLennan, P. B. McCray, J. Zabner, and A. Cong, “Practical reconstruction method for bioluminescence tomography,” Opt. Express **13**, 6756–6771 (2005), http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-13-18-6756.
[CrossRef] [PubMed]

**7. **X. Gu, Q. Zhang, L. Larcom, and H. Jiang, “Three-dimensional bioluminescence tomography with model-based reconstruction,” Opt. Express **12**, 3996–4000 (2004), http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-12-17-3996.
[CrossRef] [PubMed]

**8. **Y. Lv, J. Tian, W. Cong, G. Wang, J. Luo, W. Yang, and H. Li, “A multilevel adaptive finite element algorithm for bioluminescence tomography,” Opt. Express **14**, 8211–8223 (2006), http://www.opticsinfobase.org/abstract.cfm?URI=oe-14-18-8211.
[CrossRef] [PubMed]

**9. **H. Dehghani, S. C. Davis, S. Jiang, B. W. Pogue, K. D. Paulsen, and M. S. Patterson, “Spectrally resolved bioluminescence optical tomography,” Opt. Lett. **31**, 365–367 (2006), http://www.opticsinfobase.org/abstract.cfm?URI=ol-31-3-365.
[CrossRef] [PubMed]

**10. **A. D. Klose, “Transport-theory-based stochastic image reconstruction of bioluminescent sources,” J. Opt. Soc. Am. A **24**, 1601–1608 (2007), http://www.opticsinfobase.org/josaa/abstract.cfm?URI=josaa-24-6-1601.
[CrossRef]

**11. **C. R. E. de Oliveira, “An arbitrary geometry finite element method for multigroup neutron transport with anisotropic scattering,” Progr. Nucl. Energ. **18**, 227–236 (1986).
[CrossRef]

**12. **S. Wright, M. Schweiger, and S. R. Arridge, “Reconstruction in optical tomography using the PN approximations,” Meas. Sci. Technol. **18**, 79–86 (2007).
[CrossRef]

**13. **A. D. Klose and E. W. Larsen, “Light transport in biological tissue based on the simplified spherical harmonics equations,” J. Comput. Phys. **220**, 441–470 (2006).
[CrossRef]

**14. **Y. Lu and A. F. Chatziioannou, “A parallel adaptive finite element method for the simulation of photon migration with the radiative-transfer-based model,” Commun. Numer. Methods Eng. **25**, 751–770 (2009).
[CrossRef]

**15. **G. Wang, Y. Li, and M. Jiang, “Uniqueness theorems in bioluminescence tomography,” Med. Phys. **31**, 2289–2299 (2004).
[CrossRef] [PubMed]

**16. **C. Kuo, O. Coquoz, T. L. Troy, H. Xu, and B.W. Rice, “Three-dimensional reconstruction of in vivo bioluminescent sources based on multispectral imaging,” J. Biomed. Opt. **12**, 024007 (2007).
[CrossRef] [PubMed]

**17. **A. J. Chaudhari, F. Darvas, J. R. Bading, R. A. Moats, P. S. Conti, D. J. Smith, S. R. Cherry, and R. M. Leahy, “Hyperspectral and multispectral bioluminescence optical tomography for small animal imaging,” Phys. Med. Biol. **50**, 5421–5441 (2005).
[CrossRef] [PubMed]

**18. **Y. Lv, J. Tian, W. Cong, G. Wang, W. Yang, C. Qin, and M. Xu, “Spectrally resolved bioluminescence tomography with adaptive finite element analysis: methodology and simulation,” Phys. Med. Biol. **52**, 4497–4512 (2007).
[CrossRef] [PubMed]

**19. **G. Alexandrakis, F. R. Rannou, and A. F. Chatziioannou, “Tomographic bioluminescence imaging by use of a combined optical-PET (OPET) system: a computer simulation feasibility study,” Phys. Med. Biol. **50**, 4225–4241 (2005).
[CrossRef] [PubMed]

**20. **T. Vo-Dinh, *Biomedical Photonics Handbook*, (CRC Press, 2002).

**21. **A. Ishimaru, *Wave propagation and scattering in random media*, (IEEE Press, 1997).

**22. **R. C. Haskell, L. O. Svaasand, T. Tsay, T. Feng, M. S. McAdams, and B. J. Tromberg, “Boundary conditions for the diffusion equation in radiative transfer,” J. Opt. Soc. Am. A **11**, 2727–2741 (1994), http://www.opticsinfobase.org/abstract.cfm?URI=josaa-11-10-2727.
[CrossRef]

**23. **S. S. Rao, *The finite element method in engineering*, (Butterworth-Heinemann, Boston, 1999).

**24. **G. Karypis and V. Kumar, “Multilevel k-way partitioning scheme for irregular graphs,” J. Parallel Distrib. Comput. **48**, 96–129 (1998).
[CrossRef]

**25. **G. H. Golub and C. F. Van Loan, *Matrix computations* (3rd ed.), (Johns Hopkins University Press, 1996).

**26. **M. Benzi, “Preconditioning techniques for large linear systems: a survey,” J. Comput. Phys. **182**, 418–477 (2002).
[CrossRef]

**27. **J. Nocedal and S. J. Wright, *Numerical Optimization*, (Springer, New York, 1999).
[CrossRef]

**28. **S. J. Benson and J. Moré, “A limited-memory variable-metric algorithm for bound-constrained minimization,” Technical Report ANL/MCS-P909-0901, Mathematics and Computer Science Division, Argonne National Laboratory (2001).

**29. **B. Kirk, J. W. Peterson, R. H. Stogner, and G. F. Carey, “libMesh: A C++ Library for Parallel Adaptive Mesh Refinement/Coarsening Simulations,” Eng. Comput. **22**, 237–254 (2006).
[CrossRef]

**30. **S. Balay, K. Buschelman, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang, PETSc Web page, 2001, http://www.mcs.anl.gov/petsc.

**31. **S. A. Prahl, M. J. C. van Gemert, and A. J. Welch, “Determining the optical properties of turbid media by using the adding-doubling method,” Appl. Opt. **32**, 559–568 (1993), http://www.opticsinfobase.org/abstract.cfm?URI=ao-32-4-559.
[CrossRef] [PubMed]

**32. **R. D. Falgout and U. M. Yang, “hypre: A library of high performance preconditioners,” In *Proceedings of the International Conference on Computational Science-Part III*, p. 632–641 (2002).