## Abstract

High-density diffuse optical tomography (HD-DOT) is an emerging technique for visualizing the internal state of biological tissues. The large number of overlapping measurement channels due to the use of high-density probe arrays permits the reconstruction of the internal optical properties, even with a reflectance-only measurement. However, accurate three-dimensional reconstruction is still a challenging problem. First, the exponentially decaying sensitivity causes a systematic depth-localization error. Second, the nature of diffusive light makes the image blurred. In this paper, we propose a three-dimensional reconstruction method that overcomes these two problems by introducing sensitivity-normalized regularization and sparsity into the hierarchical Bayesian method. Phantom experiments were performed to validate the proposed method under three conditions of probe interval: 26 mm, 18.4 mm, and 13 mm. We found that two absorbers with distances shorter than the probe interval could be discriminated under the high-density conditions of 18.4-mm and 13-mm intervals. This discrimination ability was possible even if the depths of the two absorbers were different from each other. These results show the high spatial resolution of the proposed method in both depth and horizontal directions.

© 2012 OSA

## 1. Introduction

Diffuse optical tomography (DOT) is an advanced imaging method to increase spatial resolution of near-infrared (NIR) light measurement. It enables a non-invasive and safe monitoring of biological tissues, such as functional neuroimaging [1–8] and breast cancer detection [9–12]. In measurements, pairs of light sources and detectors are used. NIR light is emitted into the tissue from the source probes and then the light that passes through the tissue is measured by the detector probes. DOT reconstructs the three-dimensional distribution of the optical properties inside the tissue from the measured light intensities by using knowledge of the NIR photon migration process, which is highly diffusive in biological tissues.

NIR light measurement can be classified from two viewpoints: the arrangement of source/detector probes and the instrumentation used. From the first viewpoint, a distinction is made between transmission measurement and reflectance measurement. If NIR light is strong enough to transmit an object, transmission measurement can be made by setting the source probes on one side of the object and the detector probes on the opposite side. However, this is only applicable to thin biological tissues such as those of hands and arms. Otherwise, reflectance measurement, which detects reflected diffusive light by setting source and detector probes on the same side of the object, can be made. Transmission measurement provides richer depth information than does reflectance measurement. From the second viewpoint, a distinction is made between continuous-wave (CW) systems and time-resolved (TR) systems. While a CW system provides only the light intensity of continuous-wave light, a TR system provides a flight time distribution of the incident pulsed light. The TR system provides richer information than does the CW system [13] because the optical paths can be inferred from the flight time in the TR system.

In this paper, we focus on the development of DOT for reflectance and CW measurement because of its potentially wide-ranging applicability. Reflectance measurement is not restricted to the size of the target object. The CW system is low-cost and has a higher frame rate, which makes it possible to measure broad areas of the object almost simultaneously. However, DOT with reflectance and CW measurement is more difficult because it provides less information on inner optical properties, especially depth information, compared with transmission measurement and TR system measurement. Currently, this approach has been investigated in functional neuroimaging studies, and it might also be applicable to breast cancer detection.

Recently, high-density diffuse optical tomography (HD-DOT) has received much attention as a unique solution of DOT for reflectance and CW measurement. This permits three-dimensional reconstruction by using a large number of spatially-overlapping and multi-distance measurement channels by means of high-density probe arrays. Cortical maps of angle and eccentricity in the visual field were measured with a 13-mm probe-interval arrangement [7], and a somatosensory activation of two fingers was discriminated with a 7.5-mm probe-interval arrangement [8].

Although the application of HD-DOT is promising, there are still problems to solve. First, the difficulty in depth localization causes large quantitative errors in the reconstructed optical properties. Second, the reconstructed image becomes blurred due to the nature of diffusive light. Although the above problems can be mitigated by using a higher-density probe arrangement [8], it is also necessary to improve the reconstruction method. If these problems were effectively dealt with by improving this method, we could reduce, to a certain extent, the requirements on probe density. This would reduce the cost and preparation time for experiments. The use of spatial variant regularization is considered an effective approach for the first problem, and the introduction of sparsity is considered an effective approach for the second problem. In this study, we incorporate these approaches in the hierarchical Bayesian estimation method [14,15].

Let us explain the first problem in more detail. NIR light intensity exponentially decreases with increased distance from the source at the surface. Accordingly, the amplitude of sensitivity, which relates the inner optical properties to the measured light intensity, is highly non-uniform in the reflectance measurements. It is strong in the neighborhood of the surface where the light sources and detectors are placed but weak in the deep layers. This leads to the reconstructed images becoming biased toward the surface. This systematic depth-localization error causes large quantitative errors because the sensitivity varies exponentially with depth direction. Several approaches have been proposed to overcome this problem. Pogue et al. proposed spatially variant regularization, in which the regularization is weighted by exponential function according to the distance from the surface [16]. Culver et al. proposed similar spatially variant regularization in which the regularization weight is calculated from the sensitivity matrix [17]. Niu et al. proposed a depth-compensation algorithm in which the sensitivity matrix in each layer is normalized with the maximum singular value of the sensitivity matrix in that layer [18, 19]. These methods are mathematically similar to each other, and they improve the depth localization. However, they are still insufficient if the Tikhonov regularization is used for reconstruction because it cannot localize the solution, which leads to an unwanted compensation to irrelevant regions. The introduction of sparsity, which is explained in the next paragraph, is also required for accurate depth localization.

In regard to the second problem, sparse regularization has recently been applied in optical imaging [20–23]. Previously, the Tikhonov regularization, which is also called the *L*_{2} norm regularization, has been conventionally used in optical imaging because of its ease of implementation and the robustness of this solution. However, the *L*_{2} norm penalty makes the solution overly smoothed because it excessively suppresses strong values. In contrast, the localized solution is obtained by the *L*_{1} norm regularization or another sparse estimation method, which favors the solution with a small number of non-zero elements. Functional neuroimaging and breast cancer detection are considered potential applications of DOT. Task-related brain activities are localized. Tumors in the breast, especially in the early stage, are also localized. For these purposes, accurate localization of activations or tumors is the most crucial issue. However, the solution of the conventional *L*_{2} norm approaches results in localization bias and blurred reconstruction. The sparse estimation method can detect accurate positions with high resolution by using prior information that favors spatially localized patterns. This sparse estimation technique has been investigated for the source-localization problem of electroencephalography (EEG) and magnetoencephalography (MEG) [24, 25]. A hierarchical Bayesian model that incorporates an automatic relevance determination (ARD) prior is another way to introduce sparsity. It has been argued that the ARD approach is superior to separable regularization constraints such as *L*_{1} norm regularization [26]. Since it has advantages in expandability, it has recently been applied in multi-modality imaging, such as the combination of MEG and fMRI [27] and that of EEG and fNIRS [28]. In this study, we introduce sparsity by using the ARD prior in the hierarchical Bayesian model.

We developed a three-dimensional reconstruction method by introducing the above two approaches, that is, spatial variant regularization and sparsity. Then, we examined the proposed method by applying it to the data obtained by a phantom experiment measured with a commercially available near-infrared imaging system. We evaluated the depth accuracy and the spatial resolution of the proposed method by placing one or two absorbers in the phantom. The evaluation was done in the three interval conditions of the probe arrangement (i.e. 26 mm, 18.4 mm, and 13 mm) to investigate the relationship between probe density and estimation accuracy. As a result, we found that the depth of the absorber can be estimated accurately using the proposed method in the high-density arrangements of 18.4-mm and 13-mm probe intervals. It was also possible to discriminate two absorbers whose distance is shorter than the probe interval. These results demonstrate that the proposed method allows three-dimensional reconstruction with high resolution in both depth and horizontal directions.

The following notations are used throughout this paper. Scalars are represented in normal-type letters as *x*. Vectors are represented in lower-case boldface letters as ** x**. Matrices are represented in upper-case boldface letters as

**. The element in the**

*X**i*-th row and the

*j*-th column of matrix

**is denoted as (**

*X***)**

*X**=*

_{ij}*X*. The transpose, trace, and determinant of matrix

_{ij}**are denoted as**

*X*

*X**, tr(*

^{T}**), and |**

*X***|, respectively. A diagonal matrix with diagonal elements**

*X***is denoted as diag(**

*x***). A vector consisting of the diagonal elements of matrix**

*x***is denoted as diag(**

*X***). The**

*X**L*

_{2}norm of vector

**is denoted as ${\Vert \mathit{x}\Vert}_{2}^{2}$. The mean value $\frac{1}{T}{\sum}_{t=1}^{T}{\mathit{x}}_{t}$ is denoted as mean(**

*x***). The**

*x**N*-dimensional vector consisting of 1s and the identity matrix of size

*N*are denoted as

**1**

*and*

_{N}**I**

*, respectively.*

_{N}*N*(

**;**

*x***,**

*μ***Σ**) denotes that a vector random variable

**follows a multivariate Gaussian distribution with mean vector**

*x***and covariance matrix**

*μ***Σ**. Γ(

*x*;

*μ*,

*γ*) denotes that a scalar random variable

*x*follows a gamma distribution with mean

*μ*and shape parameter

*γ*(note that the scale parameter corresponds to

*μγ*

^{−1}).

## 2. Methods

In this DOT study, we reconstruct a three-dimensional distribution of the absorption change in a scattering medium from the observed change in light intensity. First, we describe the relationship between the absorption change and light intensity change in the forward problem. Then, we estimate the distribution of the absorption change given the observed light intensity change in the inverse problem using the relationship calculated in the forward problem.

#### 2.1. Forward problem

In this work, we used continuous-wave intensity data to reconstruct absorption changes. The photon fluence in the baseline condition Φ_{0} changes to Φ in the presence of absorption changes in the scattering medium. Under the condition that the absorption changes are small, the measured changes in photon fluence *ϕ _{pert}* (

*r*,

_{s}*r*) = − ln(Φ(

_{d}*r*,

_{s}*r*)/Φ

_{d}_{0}(

*r*,

_{s}*r*)) are linearly related to the spatial variations in the absorption coefficient

_{d}*δμ*(

_{a}*r*) by the Rytov approximation [29–32]:

*r*and

_{s}*r*denote the positions of source and detector, respectively. Throughout this study, we discretized the scattering medium into 2.5

_{d}*mm*× 2.5

*mm*× 2.5

*mm*voxels. Thus, Eq. (1) can be written in a matrix form as

**=**

*y***, where**

*Ax***= (**

*x**x*

_{1}, ⋯ ,

*x*)

_{N}*represents the absorption changes*

^{T}*x*=

_{i}*δμ*(

_{a}*r*),

_{i}**= (**

*y**y*

_{1}, ⋯ ,

*y*)

_{M}*represents the measured light intensity changes*

^{T}*y*=

_{j}*ϕ*(

_{pert}*r*

_{s}_{(}

_{j}_{)},

*r*

_{d}_{(}

_{j}_{)}), and

**represents the sensitivity matrix ${A}_{ji}=\frac{{\mathrm{\Phi}}_{0}\left({r}_{s(j)},{r}_{i}\right){\mathrm{\Phi}}_{0}\left({r}_{i},{r}_{d(j)}\right)}{{\mathrm{\Phi}}_{0}\left({r}_{s(j)},{r}_{d(j)})\right)}$. Here, the index**

*A**i*denotes the

*i*th voxel (

*i*= 1, ⋯ ,

*N*) and the index

*j*denotes the

*j*th measurement logical channel (

*j*= 1, ⋯ ,

*M*). The value of Φ

_{0}can be calculated by using the Monte Carlo simulation [33] or solving the diffusion equation. In this study, we obtained the value by solving the diffusion equation for a one-layer model [34].

In a measurement duration, a number of sample data were obtained. We summarized the observed light intensity changes in a matrix form as ** Y** = (

*y*_{1}, ⋯ ,

*y**, ⋯ ,*

_{t}

*y**), where*

_{T}*T*denotes the number of samples. Likewise, the absorption changes are summarized as

**= (**

*X*

*x*_{1}, ⋯ ,

*x**, ⋯ ,*

_{t}

*x**).*

_{T}#### 2.2. Inverse problem

The inverse problem in which the absorption change ** x** is estimated from the observed light intensity change

**is ill-posed, since the number of unknowns**

*y**N*is typically much larger than the number of data

*M*. Furthermore, three-dimensional reconstruction is very difficult due to the characteristics of the sensitivity matrix. Sensitivity is exponentially decaying with depth [16], and thus absorption changes tend to be estimated in superficial layers. In order to overcome this difficulty, we introduce sensitivity-normalized regularization and sparsity for better localization. The proposed estimation method consists of two steps. First, we obtain the solution with the sensitivity-normalized Tikhonov regularization. Second, we refine the solution with the hierarchical Bayesian estimation method.

### 2.2.1. Sensitivity-normalized Tikhonov regularization

The Tikhonov regularization is conventionally used to solve this inverse problem, in which the following cost function is minimized [35]:

**Σ**

*is the measurement noise covariance matrix. The first term denotes the sum of squared error, and the second term ${\Vert {\mathit{x}}_{t}\Vert}_{2}^{2}={\sum}_{i=1}^{N}{x}_{it}^{2}$ denotes*

_{y}*L*

_{2}norm regularization to avoid overfitting.

*λ*is the regularization parameter controlling the relative importance of the regularization term compared with the sum of squared error term. In this study, we determined the appropriate

*λ*value by minimizing Akaike’s Bayesian information criterion (or maximizing the marginal likelihood) [36].

Here, we propose sensitivity-normalized Tikhonov regularization that introduces the normalization weight matrix ** D** in the regularization term to balance the error term and the regularization term in each voxel as

**is a diagonal matrix defined as the sum of two terms: the sensitivity normalization term, which is the diagonal matrix of the product of the sensitivity matrix and the measurement noise covariance matrix, and the constant term where**

*D***= (**

*ρ**ρ*

_{1}, ⋯ ,

*ρ*)

_{N}*is the sensitivity normalization term.*

^{T}*β*is a positive control constant.

**has positive diagonal elements because the noise covariance matrix**

*D***Σ**

*is positive definite. The constant term is introduced in order to make the regularization penalty remain constant in deep layers.*

_{y}Sensitivity-normalized regularization works strongly in superficial layers and weakly in deep layers, since the sensitivity exponentially decays with depth [16]. This weight matrix is similar to that of Culver et al. [17, 37, 38] except for the measurement noise covariance matrix term. This is also similar to the weight matrix of Niu et al. [18], but they introduced normalization for each layer. The proposed sensitivity normalization compensates for the non-uniformity of the sensitivity matrix, not only for the depth direction but also for the horizontal direction.

In this study, the control constant *β* was determined with reference to the maximum value of sensitivity for the region deeper than 22.5 mm from the surface: *β* = max(*ρ _{i}*),

*i*∈

*I*

_{≥22.5}

*, where*

_{mm}*I*

_{≥22.5}

*denotes the set of voxel indices deeper than 22.5 mm, i.e. maximum is taken over the voxels deeper than 22.5 mm.*

_{mm}In the case of *L*_{2} norm regularization, the solution ** x**̂

*can be obtained in an explicit form:*

_{D}*β*, the weight

**is almost the same as**

*D**β*in layers deeper than 22.5 mm:

*D*∼

_{ii}*β*≫

*ρ*. Consequently, a solution for layers deeper than 22.5 mm is suppressed by the regularization term. In contrast, the weight

_{i}**is almost the same as the normalization term in layers shallower than 22.5 mm:**

*D**D*∼

_{ii}*ρ*≫

_{i}*β*. Therefore, the amplitudes of the first and second terms in the brackets of Eq. (6) are balanced. Accordingly, the solution is appropriately regularized regardless of the exponentially non-uniform sensitivity.

### 2.2.2. Hierarchical Bayesian estimation

In this section, we introduce the hierarchical Bayesian method in order to refine the solution ** x**̂

*, using it as an initial value. As already noted in the previous section, the Tikhonov regularization penalizes the solution uniformly at every position. Sensitivity-normalized regularization improves the solution by using a weighted penalty according to the sensitivity value at each position. However, it is still insufficient to localize the solution, which is biased as examined in section 4.1. The hierarchical Bayesian estimation method introduces the regularization parameters*

_{D}*λ*at each voxel position by an automatic relevance determination (ARD) prior [35, 39]. These parameters

_{i}*λ*, which control the degree of penalty, are automatically adjusted from the data. As a result, the solution becomes sparse and high-resolution. However, the estimation of

_{i}*λ*values becomes a nonlinear optimization, which requires an appropriate initial value. Therefore, we set the solution of the sensitivity-normalized regularization

_{i}**̂**

*x**as an initial value of*

_{D}*λ*. Note that the result of the hierarchical Bayesian method becomes biased toward the surface if the solution of the conventional Tikhonov regularization is set as an initial value.

_{i}In the hierarchical Bayesian estimation method, observation and regularization are described as hierarchical probabilistic models. The following models and algorithm were originally developed in the context of the electro-magnetic source localization problem [27]. In this paper, we, for the first time, introduce the ARD prior for CW-DOT and propose combining its sparse-promoting nature with the initial value obtained by the sensitivity-normalized regularization method. This combination makes it possible to obtain tomography with high spatial resolution and high depth accuracy.

First, the observation model is constructed assuming a Gaussian noise in measuring light intensity change. This assumption is validated by the phantom experimental data. We also assume that the noise covariance matrix in the absorption change measurement is proportional to the noise covariance matrix measured under the baseline condition **Σ*** _{y}*. Then, the conditional probability for the observed light-intensity change

**given the absorption change**

*Y***is given by**

*X**σ*is the scale parameter inversely proportional to the measurement noise variance. The scale parameter

*σ*is estimated by introducing a hierarchical prior, as

Second, the regularization model is constructed by adopting the ARD prior. In order to prevent the solution from becoming too sparse, a spatially localized smoothing filter ** W** is incorporated. We assume that the solution

**is represented as**

*x***=**

*x***, where**

*Wz***denotes the sparse expression of the solution. The ARD prior is adopted for**

*z***. The probability distribution of**

*z***is given by a Gaussian distribution with a zero mean vector and a diagonal covariance matrix whose diagonal elements are adjustable hyperparameters:**

*z***Λ**= diag(

**), and**

*λ***= (**

*λ**λ*

_{1},

*λ*

_{2}, ⋯ ,

*λ*)

_{N}*represent the adjustable hyperparameters. In this study, a 5-mm FWHM Gaussian filter was used as*

^{T}**. Equation (9) can be equivalently rewritten as**

*W*The hyperparameter ** λ** is estimated by introducing a hierarchical prior:

*λ*̄

_{0}

*represents the mean value of*

_{i}*λ*in the hierarchical prior, and

_{i}*γ*

_{0}

*represents the reliability (or confidence) of the hierarchical prior [27].*

_{i}In the Bayesian estimation, the posterior probability of the unknown ** X**,

**,**

*λ**σ*given the observation data

**is written as**

*Y**Q*(

**,**

*X***,**

*λ**σ*), which is the approximation of the posterior

*P*(

**,**

*X***,**

*λ**σ*|

**), is defined by**

*Y**F*(

*Q*) is equivalent to the minimization of the Kullback-Leibler distance between the posterior and the trial distribution.

The maximization of the free energy can be solved using a factorization approximation restricting the solution space [40, 41]:

*Q*and

_{X}*Q*. The convergence of the free energy was guaranteed by the previous theoretical studies [26, 40, 41]. In the first step (

_{λ}*X*-step), the free energy

*F*(

*Q*) is maximized with respect to

*Q*while

_{X}*Q*is fixed. The solution

_{λ}*Q*(

_{X}**) becomes a normal distribution [27]:**

*X***̄**

*x**and*

_{t}**Σ**

*denote the expectation value and covariance matrix of*

_{x}

*x**, respectively. In the second step (*

_{t}*λ*-step), the free energy

*F*(

*Q*) is maximized with respect to

*Q*while

_{λ}*Q*is fixed. The solution

_{X}*Q*(

_{λ}**,**

*λ**σ*) becomes a gamma distribution:

*λ*̄

*and*

_{i}*γ*denote the expectation and shape parameter values of

_{i}**, respectively, and**

*λ**σ*̄ and

*γ*denote the expectation and shape parameter values of

_{σ}*σ*, respectively. The above

*X*- and

*λ*-steps are repeated until the free energy converges (see Appendix).

In this procedure, the free energy function can be expressed as

*f*(

**̄,**

*λ**σ*̄,

**Σ**

*) represents a sparse-promoting prior function. Its detailed form is explained in the Appendix.*

_{x}In the Tikhonov regularization of Eq. (2), the solution is uniformly suppressed by the single parameter *λ* shared by all voxel positions. In contrast, the regularization parameters ** λ** = (

*λ*

_{1},

*λ*

_{2}, ⋯ ,

*λ*)

_{N}*are appropriately adjusted at every voxel position in this hierarchical Bayesian estimation method. Therefore, a sparse solution is obtained [35, 39].*

^{T}In this study, the reliability value of the hierarchical prior was set as *γ*_{0} = (*γ*_{01}, ⋯ , *γ*_{0}* _{N}*)

*=*

^{T}**0**. Thus, the prior of Eq. (11) becomes non-informative, and the value

*λ*

_{0}

*does not affect the probabilistic model and the iterative algorithm. In this case,*

_{i}**becomes sparse [35, 39], which means that the elements of**

*z***converge to zeros in most of the voxels and non-zeros in the remaining small number of voxels. Therefore, the solution**

*z***becomes a mixture of a small number of local activities smoothed by**

*x***. In numerical calculation,**

*W***becomes very small values in most of the voxels if the algorithm is terminated by some convergence criteria. In the case where informative prior**

*z**γ*

_{0}

*> 0,*

_{i}**does not converge to zeros because of the prior information. However, if**

*z**γ*

_{0}

*is small, the values of*

_{i}**become numerically very small in most of the voxels. We incorporated the result of the sensitivity-normalized Tikhonov regularization as an initial value of the iterative algorithm as**

*z***̄**

*λ**= (10 · mean(*

_{init}**̂**

*x**))*

_{D}^{−2}(see Appendix). The ten-times value of mean(

**̂**

*x**) was used because the solution of the Tikhonov regularization is overly smoothed.*

_{D}## 3. Phantom experiment

#### 3.1. Experimental conditions

We evaluated the estimation accuracy of the proposed method using a liquid phantom experiment as shown in Fig. 1. The tank was filled with a solution of 1.4% intralipid and ink to make the optical properties of the liquid similar to that of gray and white matter [6, 7]. The estimated absorption and reduced scattering coefficients are *μ _{a}* = 0.019

*mm*

^{−1}and

*μ*′ = 1.1

_{s}*mm*

^{−1}, respectively, based on measurement with a spectral photometer and a near-infrared imaging system.

We used a continuous-wave near-infrared imaging system (FOIRE-3000, Shimadzu Corporation) to acquire the data for the reconstruction of the absorption change. We used only 805-nm wavelength data. The source and detector probes are placed on the bottom alternately in a rectangular arrangement as shown in Fig. 1(c). The sampling rate is 130 ms. We measured for 20 seconds each time, with and without absorbers. We tested the estimation results under three conditions of probe interval: *l* = 13 mm, 18.4 mm (
$=13\hspace{0.17em}\text{mm}\times \sqrt{2}$), 26 mm (=13 mm×2).

In experiments, one or two spherical absorbers of 5 mm in diameter were placed in the liquid phantom. These are black absorbers made of polyacetal (Fig. 1(b)). First, we evaluated the positional accuracy of the proposed method using one absorber placed in four *x*–*y* positions (see section 4.2): (i) center of the four probes, (ii) midpoint of the source and the detector, (iii) beneath the source, (iv) beneath the detector, as shown in Fig. 1(d). Second, we evaluated the spatial resolution of the proposed method using two absorbers. The first absorber is placed in (ii) and the second absorber is placed in (v), separated by *d* from (ii) (see section 4.3). Finally, we evaluated the depth accuracy in a more general setting where the two absorbers are placed in (ii) and (v) at different depths (see section 4.4). One example of this experiment is shown first in section 4.1 for understanding the main idea of the proposed method.

The number of voxels *N* was 7500 = 25 × 25 × 12 (we assumed that absorption changes occur inside the region of 62.5*mm* × 62.5*mm* × 30*mm*). The number of measurement logical channels *M* was 56 (*l* = 13 mm, up to 3rd nearest-neighbour(NN) channels), 48 (*l* = 18.4 mm, up to 2nd NN channels), or 24 (*l* = 26 mm, up to 1st NN channels).

## 4. Results

#### 4.1. Example of 3D reconstruction

In this section, we first explain the main idea of the proposed method by applying it to one example of a 3D reconstruction in the phantom experiment. The example data was obtained by the liquid phantom experiment with an arrangement of 18.4-mm probe intervals. In this case, two absorbers were used to generate the absorption change. The positions of the centers of the spherical absorbers were (ii) (−9.2, 0, 10) [mm] and (v) (3.3, 0, 15) [mm] as shown in Fig. 2(a), where the horizontal distance *d* is 12.5 mm and the vertical distance is 5 mm. Therefore, the distance between the two absorbers is shorter than the probe interval of 18.4 mm. Note that the coordinate system of Fig. 2 is flipped upside down to that of Fig. 1 in order to make it similar to that of functional neuroimaging, in which the probes are usually placed on top of the surface.

Figure 2(b) shows the estimation result of the Tikhonov regularization with spatially uniform regularization, where *D* is the identity matrix. The solution is biased toward superficial layers because the sensitivity is large in superficial layers while the effect of the regularization suppression is uniform and independent of the depth. This depth localization error makes the reconstructed absorption change too small. Moreover, the solution is overly smoothed and the two absorbers cannot be distinguished.

Figure 2(c) shows the estimation result of the sensitivity-normalized Tikhonov regularization. Although the solution looks biased toward deep layers in contrast to the previous result of Fig. 2(b), this seems to be a reasonable result. Typically, a weak absorption change in superficial layers and a strong absorption change in deep layers generate similar light intensity changes. Since the Tikhonov regularization method cannot localize the solution, the solution might become a mixture state of a superficial weak change and a deep strong change. In the figure, the deep strong values are more visible than the superficial weak values. Values deeper than 22.5 mm are suppressed by the introduction of the control constant *β* as explained in section 2.2.1.

Figure 2(d) shows the estimation result of the hierarchical Bayesian estimation method starting from the sensitivity-normalized Tikhonov regularization solution. By using the Bayesian algorithm incorporating the automatic relevance determination prior, the appropriate solution is selected from the mixture state solution. The two absorbers can be accurately distinguished and the positional error is smaller than the voxel size of 2.5 mm. This result is summarized again in section 4.4 (*l* = 18.4 mm, *d* = 12.5 mm, *z* = 10 & 15 mm).

#### 4.2. One-absorber experiment

In this section, we evaluate the horizontal and depth estimation accuracies of the proposed method by using a one-absorber setting. We first evaluated the estimation results at various depths of a spherical absorber’s center, *z* = 7.5 mm, 10 mm, 12.5 mm, 15 mm, 17.5 mm, 20 mm, 22.5 mm, while the horizontal position was fixed. The analysis was done under three conditions of probe interval: *l* = 13 mm, 18.4 mm, 26 mm.

First, the absorber was placed in center position (i) (Fig. 1(d)). Figure 3(a) shows the estimation results for various depth and probe-interval conditions. Here, the 3D estimation results are represented as the 2D *x*–*z* projection images to clearly visualize the estimation accuracy. In the figure, the maximum intensity of the three central voxel regions in the *y* direction (7.5-mm-thick layer in Fig. 3(b)), including the true position of the 5-mm-diameter spherical absorber, was projected onto the *x*–*z* plane in order to check the positional accuracy in the *y* direction. If a positional error occurs in the *y* direction, the estimation value in this region becomes nearly zero. The white circle in the projection images in Fig. 3(a) represents the true position of the absorption change. We judged the estimation results as successful when the positional error between the center position of the true absorption change and the voxel center of the maximum estimation value was within one voxel (2.5 mm) in each *x*, *y*, *z* direction and when the maximum estimation value was greater than 0.025*mm*^{−1}. The estimation image is surrounded by a bold frame if the estimation result was judged as successful. Figure 3(c) represents the depth of the maximum estimation value exceeding 0.025*mm*^{−1} as a function of the true depth of the absorption change.

The first line from the top of Fig. 3(a) shows the estimation results for the 26-mm probe-interval condition. The depths of the maximum estimation value are the same regardless of the variations in the true depth conditions from *z* = 7.5 mm to *z* = 15 mm (see also Fig. 3(c)). Furthermore, the amplitudes of the estimated absorption change strongly vary depending on the true depth conditions. These results indicate that the depth of the absorption change cannot be estimated at all in the 26-mm probe-interval condition. In the 26-mm interval condition, we could use only 1st nearest-neighbour (NN) channels because 2nd NN separation, which is 58.1 mm, is too distant to measure the light intensity changes [38]. Therefore, the depth information of the absorption change could not be obtained without overlapping measurements [38]. Moreover, no absorption change deeper than *z* = 17.5 mm could be detected.

The second line from the top of Fig. 3(a) shows the estimation results for the 18.4-mm probe-interval condition. The absorption changes are accurately estimated from *z* = 7.5 mm to *z* = 20 mm, where the positional errors are within one voxel (see also Fig. 3(c)). The amplitudes of the estimated absorption change are maintained at a roughly constant level. In the 18.4-mm interval condition, we could use the overlapping measurement channels of the 1st and 2nd NN channels, whose separations are 18.4 mm and 41.1 mm, respectively. No absorption change deeper than *z* = 22.5 mm could be detected.

The third line from the top of Fig. 3(a) shows the estimation results for the 13-mm probe-interval condition. The absorption changes are accurately estimated from *z* = 7.5 mm to *z* = 22.5 mm, where the positional errors are within one voxel (see also Fig. 3(c)). The amplitudes of the estimated absorption change are maintained roughly constant. In the 13-mm interval condition, we could use the 1st, 2nd, and 3rd NN channels, whose separations are 13 mm, 29.1 mm and 39 mm, respectively. No absorption change deeper than *z* = 25 mm could be detected. These results show that, with overlapping measurement channels, the depth of the absorption change is accurately estimated.

Next, the estimation accuracy with respect to horizontal position was evaluated by applying the same analyses in four conditions of the *x*–*y* absorber position: (i) center, (ii) midpoint, (iii) source, and (iv) detector (Fig. 1(d)). By deepening the spherical absorber by 2.5 mm from *z* = 7.5 mm, we investigated the depth limit where successful estimation is possible, which means that the positional error is within one voxel (2.5 mm) in each *x*, *y*, *z* direction and the maximum detected value is greater than 0.025*mm*^{−1}. The summarized data are shown in Table 1 under the various conditions of the *x*–*y* absorber position and the probe interval. Note that in the 26-mm probe-interval condition, the estimation failed in every case because the depth could not be estimated at all. The value in the parentheses represents the depth limit where positional accuracy is evaluated with respect to only the *x*, *y* direction for reference.

In the 26-mm interval condition, sensitivity is weakest in the center (i). In contrast, in the 18.4-mm and 13-mm interval conditions, in which overlapping measurement is possible, estimation accuracy is relatively weak at just below the source (iii) and the detector (iv).

As the probe interval becomes shorter, a deeper absorption change can be detected. This is because there are measurement channels in suitable positions owing to the high-density arrangement and the increasing number of simultaneous independent observations by the high-density overlapping measurements.

#### 4.3. Two-absorber experiment 1 (at the same depth)

In this section, we evaluate the spatial resolution of the proposed method by a two-absorber experiment. In this phantom experiment, two spherical absorbers were placed at the same depth in (ii) (
$-\frac{l}{2}$, 0, *z*) and (v) (
$-\frac{l}{2}+d$, 0, *z*) (Fig. 1(d)), where *d* denotes the horizontal distance, *z* denotes the depth of the two absorbers, and *l* denotes the probe interval. We evaluated the estimation results under two conditions of probe interval, 18.4 mm and 13 mm, because the 26-mm probe-interval condition cannot obtain good spatial resolution due to the lack of overlapping measurements.

Figure 4 shows the estimation results for the 18.4-mm probe-interval condition. The estimations are tested under four conditions of the horizontal distance, *d* = 17.5 mm, 15 mm, 12.5 mm, 10 mm, and under four conditions of the depth, *z* = 12.5 mm, 15 mm, 17.5 mm, 20 mm. The 3D estimation results are represented as the 2D *x*–*z* intensity projection images in the same way as Fig. 3(a). The white circles represent the true positions of the two absorbers. We judged the estimation results as successful if the positional errors of both absorbers were within one voxel (2.5 mm) in each *x*, *y*, *z* direction and both maximum estimation values were greater than 0.025*mm*^{−1}. The estimation image is surrounded by a bold frame if the estimation result was judged as successful.

In the case of the horizontal distance *d* = 17.5 mm, which is almost the same as the probe interval, the estimation was accurate until the depth condition of *z* = 17.5 mm as shown in Fig. 4 (*d* = 17.5 mm, *z* = 17.5 mm). In the depth condition of *z* = 20 mm, which corresponds to the detectable depth limit as summarized in Table 1, the two absorbers could not be distinguished.

Even in the case of the horizontal distance *d* = 10 mm, which is nearly half the probe interval, and where the clearance distance of the two spherical absorbers is 5 mm, the estimation was accurate until the depth condition of *z* = 15 mm as shown in Fig. 4 (*d* = 10 mm, *z* = 15 mm).

The same analysis was also done under the 13-mm probe-interval condition. Table 2 summarizes the depth limits where successful estimation is possible.

The estimation results of the 13-mm probe-interval condition were better than those of the 18.4-mm condition. However, the depth limit *z* = 17.5 mm in the horizontal distance *d* = 15 mm seems unsatisfactory compared to the detectable depth limit *z* = 22.5 mm as summarized in Table 1 (13-mm probe interval). In the 4 × 4 rectangular arrangement we used, the numbers of 1st NN, 2nd NN, and 3rd NN channels are 24, 24, and 8, respectively. The number of the 3rd NN channels, which are sensitive to deep absorption changes, would not be high enough to achieve a high spatial resolution in deep layers. Additionally, the estimation results may suffer from the nonlinear effect, especially under the 13-mm probe-interval condition due to the use of the strong absorber, which will be discussed in section 5.

#### 4.4. Two-absorber experiment 2 (at different depths)

Next, we conducted a two-absorber experiment at different depths to confirm the depth accuracy of the proposed method in more general settings. The two spherical absorbers were placed at depths differing by 5 mm in (ii) (
$-\frac{l}{2}$, 0, *z* −5.0) and (v) (
$-\frac{l}{2}+d$, 0, *z*) (Fig. 1(d)), where *d* denotes the horizontal distance and *l* denotes the probe interval. We evaluated the estimation results under two conditions of probe interval, 18.4 mm and 13 mm.

Figure 5 shows the estimation results under the 18.4-mm probe-interval condition. The estimations are tested under four conditions of horizontal distance, *d* = 17.5 mm, 15 mm, 12.5 mm, 10 mm, and under four conditions of depth, *z* = 7.5 & 12.5 mm, 10 & 15 mm, 12.5 & 17.5 mm, 15 & 20 mm. The 3D estimation results are represented as the 2D *x*–*z* intensity projection images in the same way as Fig. 3(a) and Fig. 4. The white circles represent the true positions of the two absorbers. We judged the estimation results as successful if the positional errors of both absorbers were within one voxel (2.5 mm) in each *x*, *y*, *z* direction and both maximum estimation values were greater than 0.025*mm*^{−1}. The estimation image is surrounded by a bold frame if the estimation result was judged as successful.

In the case of the horizontal distance *d* = 17.5 mm, which is almost the same as the probe interval, the estimation was successful until the depth condition of *z* = 15 mm & 20 mm as shown in Fig. 5 (*d* = 17.5 mm, *z* = 15 & 20 mm). The absorber at *z* = 20 mm, which corresponds to the detectable depth limit as summarized in Table 1, could be estimated accurately undisturbed by the other absorber.

In the case of the horizontal distance *d* = 12.5 mm, which is much shorter than the probe interval, the estimation was successful until the depth condition of *z* = 12.5 & 17.5 mm as shown in Fig. 5 (*d* = 12.5 mm, *z* = 12.5 & 17.5 mm). However, in the case of the horizontal distance *d* = 10 mm, the estimation was difficult but successful until the depth condition of *z* = 7.5 & 12.5 mm as shown in Fig. 5 (*d* = 10 mm, *z* = 7.5 & 12.5 mm).

The same analysis was also done for the 13-mm probe-interval condition. Table 3 summarizes the depth limits where successful estimation is possible.

The estimation results under the 13-mm probe-interval condition were the same as those under the 18.4-mm condition in spite of the higher density. The first reason would be due to the probe arrangement we use. The number of 3rd NN channels is small as mentioned in section 4.3, and the 1st NN channels of the 13-mm interval condition are sensitive in only superficial layers compared to those of the 18.4-mm interval condition. The second reason would be the nonlinear effect as mentioned in section 4.3.

## 5. Discussion

We proposed a three-dimensional reconstruction method for diffuse optical tomography. The main feature of the method is its high resolution in both depth and horizontal directions. The key concept that achieves this feature is the introduction of spatially variant regularization and sparsity. These are introduced in two steps. First, a provisional solution is obtained by the sensitivity-normalized Tikhonov regularization. Second, the solution is refined by the hierarchical Bayesian estimation method.

The proposed method was evaluated by applying it to the data obtained from the phantom experiment under three conditions of probe interval: 26 mm, 18.4 mm, 13 mm. This phantom experiment is consistent with our prior assumption for sparsity. Under the 26-mm interval condition, three-dimensional reconstruction failed because there was no overlapping channel due to the large distance of 2nd NN channels. In contrast, three-dimensional reconstruction was successful in the 18.4-mm and 13-mm probe-interval conditions. Under these conditions, the two absorbers whose distance is shorter than the probe interval were accurately estimated. Even when the horizontal distance was 10 mm, which is almost half the probe interval 18.4 mm, the two absorbers were accurately distinguished. Moreover, as the probe interval became shorter, deeper absorption change could be detected. The depth limit where accurate estimation was possible was around 20 mm under the 18.4-mm or 13-mm probe-interval condition as summarized in Table 1. This 20-mm depth corresponds to a depth of 24 mm with respect to the light intensity beneath the source when the medium is a head-like multilayer [4] consisting of a 3-mm scalp (*μ _{a}* = 0.018

*mm*

^{−1}and

*μ*′ = 0.7

_{s}*mm*

^{−1}), a 10-mm skull (

*μ*= 0.013

_{a}*mm*

^{−1}and

*μ*′ = 0.9

_{s}*mm*

^{−1}), 2-mm CSF (

*μ*= 0.004

_{a}*mm*

^{−1}and

*μ*′ = 0.3

_{s}*mm*

^{−1}), and 20-mm gray & white matter (

*μ*= 0.019

_{a}*mm*

^{−1}and

*μ*′ = 1.1

_{s}*mm*

^{−1}).

For the sensitivity-normalized regularization, an appropriate control constant value *β* is required in Eq. (4). We determined the *β* value with reference to the sensitivity of a 22.5-mm layer. The small change of the *β* value does not affect the results of the hierarchical Bayesian estimation method. We confirmed that the depth limits of accurate estimation summarized in Tables 1, 2, 3 do not change even if the *β* value is increased by half or decreased by half. However, if *β* is too large, deep activities are excessively suppressed in a regularized solution. Then, the deep activities cannot be estimated by the hierarchical Bayesian estimation. If *β* is too small, false solutions appear in deep layers in a regularized solution because the regularization penalty becomes too small. This also leads to false solutions in the hierarchical Bayesian estimation. The appropriate value of *β* should be determined by considering the target region of absorption changes and the SN ratio of the measurement devices.

Finally, it should be noted that the linear Rytov approximation Eq. (1) may be violated in this phantom experiment due to the use of a strong absorber. Hence, we checked the nonlinear effect by using the Monte Carlo simulations [33]. We compared the light intensity change calculated by linear Rytov approximation and that calculated by Monte Carlo simulation, which includes the nonlinear effect. We found the following two phenomena. First, the estimated values of the absorption change were smaller than the true value as investigated in [30,42]. Second, the nonlinearity effects of the observed light intensity change depend on the positions of the source, the detector, and the absorber. These effects tended to be stronger in measurement channels where the light intensity changes were large, and stronger under the 13-mm probe-interval condition than under the 18.4-mm and 26-mm conditions. In order to take account of these nonlinear effects in the linear model, we considered nonlinear effects as noise terms that are proportional to the squared observation values. This procedure can be considered a simplified form of the second-order perturbation expansion of the nonlinear model. This was done only in the 13-mm probe-interval condition (see Appendix). The estimation results became inaccurate without these noise terms. Of course, when the linear approximation is valid, accurate estimation results are obtained without these terms, which we confirmed through a simulation study. Although we treated the nonlinear effect as noise terms, the estimation results may improve if we can apply the nonlinear effect appropriately in the forward model.

## Appendix

## Noise model for nonlinear effects

Since the nonlinear effect is relatively large in the 13-mm probe-interval condition, the value proportional to the squared observation value, mean(*y _{j}*)

^{2}, is added in the diagonal element of the noise covariance matrix

**Σ**

*. This means that we considered the nonlinear effect as a noise in this study. These squared values are added only in the 13-mm probe-interval condition and not in the 18.4-mm or 26-mm interval condition.*

_{y}## Free energy function

The free energy function can be calculated under the factorization assumption,

*x̄**,*

_{t}**, and**

*λ̄**σ̄*represent the expectation values of

*x**,*

_{t}**, and**

*λ**σ*with respect to a trial distribution

*Q*, respectively.

**Σ**

*represents the covariance matrix of*

_{x}**with respect to a trial distribution**

*x**Q*.

## Estimation algorithm

- Hierarchical prior values and initial values are set.
- In this study, we set hierarchical prior values as
*γ*_{0}=**0**and initial values as=*λ̄**λ̄*= (10 · mean(_{init}̂*x*))_{D}^{−2},*σ*̄ = 1. - The free energy is maximized by repeating the
*X*-step and*λ*-step alternately. The leading order of the computational cost is*o*(*M*^{2}*N*) for each iteration step when*N*≫*M*. In the following equations, := denotes the assignment operator, namely, the value of the left-hand side variable is updated by the current value of the right-hand side equations.We repeated these steps until the following two criteria were met. The change ratio of the free energy is smaller than 10

^{−5}. The iteration count reaches 1000 times.[

*X*-step]In the

*X*-step, the following values are calculated:$$\mathbf{\Sigma}:={\overline{\sigma}}^{-1}+{\mathbf{\Sigma}}_{y}+\mathit{AW}{\overline{\mathbf{\Lambda}}}^{-1}{\mathit{W}}^{T}{\mathit{A}}^{T},$$$$\mathit{h}:=\text{diag}\left[{\overline{\mathbf{\Lambda}}}^{-1}{\mathit{W}}^{T}{\mathit{A}}^{T}{\mathbf{\Sigma}}^{-1}\mathit{Y}{\mathit{Y}}^{T}{\mathbf{\Sigma}}^{-1}\mathit{AW}+T\left({\mathbf{I}}_{N}-{\overline{\mathbf{\Lambda}}}^{-1}{\mathit{W}}^{T}{\mathit{A}}^{T}{\mathbf{\Sigma}}^{-1}\mathit{AW}\right)\right],$$$$S:=\text{tr}\left[{\overline{\sigma}}^{-1}{\mathbf{\Sigma}}_{y}{\mathbf{\Sigma}}^{-1}\mathit{Y}{\mathit{Y}}^{T}{\mathbf{\Sigma}}^{-1}+T{\overline{\mathbf{\Lambda}}}^{-1}{\mathit{W}}^{T}{\mathit{A}}^{T}{\mathbf{\Sigma}}^{-1}\mathit{AW}\right].$$The expectation and covariance ofcan be calculated using these values as*X*̄ =*X**W***Λ**̄^{−1}*W*^{T}*A*^{T}**Σ**^{−1}and ${\mathbf{\Sigma}}_{x}^{-1}=\overline{\sigma}{\mathit{A}}^{T}{\mathbf{\Sigma}}_{y}^{-1}\mathit{A}+{\mathit{W}}^{-1T}\overline{\mathbf{\Lambda}}{\mathit{W}}^{-1}$, respectively.*Y*[

*λ*-step] In the*λ*-step, the expectations and shape parameters of**Λ**and*σ*are calculated as$${\gamma}_{i}{\overline{\lambda}}_{i}^{-1}:={\gamma}_{0i}{\overline{\lambda}}_{0i}^{-1}+\frac{1}{2}{h}_{i}{\overline{\lambda}}_{i}^{-1},$$After the convergence criteria are met,

*X*-step is performed one more step. Then, the estimation result is obtained as

## Acknowledgments

The authors thank Makoto Fukushima for helpful comments. This research was supported by a contract with the National Institute of Information and Communications Technology entitled, ’Multimodal integration for brain imaging measurements’. A part of this research was supported by SRPBS, MEXT.

## References and links

**1. **A. Villringer, J. Planck, C. Hock, L. Schleinkofer, and U. Dirnagl, “Near infrared spectroscopy (NIRS): a new tool to study hemodynamic changes during activation of brain function in human adults,” Neurosci. Lett. **154**, 101–104 (1993). [CrossRef] [PubMed]

**2. **Y. Hoshi and M. Tamura, “Detection of dynamic changes in cerebral oxygenation coupled to neuronal function during mental work in man,” Neurosci. Lett. **150**, 5–8 (1993). [CrossRef] [PubMed]

**3. **A. Maki, Y. Yamashita, Y. Ito, E. Watanabe, Y. Mayanagi, and H. Koizumi, “Spatial and temporal analysis of human motor activity using noninvasive NIR topography,” Med. Phys. **22**, 1997–2005 (1995). [CrossRef] [PubMed]

**4. **E. Okada, M. Firbank, M. Schweiger, S. R. Arridge, M. Cope, and D. T. Delpy, “Theoretical and experimental investigation of near-infrared light propagation in a model of the adult head,” Appl. Opt. **36**, 21–31 (1997). [CrossRef] [PubMed]

**5. **A. P. Gibson, J. C. Hebden, and S. R. Arridge, “Recent advances in diffuse optical imaging,” Phys. Med. Biol. **50**, R1–R43 (2005). [CrossRef] [PubMed]

**6. **D. A. Boas and A. M. Dale, “Simulation study of magnetic resonance imaging-guided cortically constrained diffuse optical tomography of human brain function,” Appl. Opt. **44**, 1957–1968 (2005). [CrossRef] [PubMed]

**7. **B. W. Zeff, B. R. White, H. Dehghani, B. L. Schlaggar, and J. P. Culver, “Retinotopic mapping of adult human visual cortex with high-density diffuse optical tomography,” Proc. Natl. Acad. Sci. U.S.A. **104**, 12169–12174 (2007). [CrossRef] [PubMed]

**8. **C. Habermehl, S. Holtze, J. Steinbrink, S. P. Koch, H. Obrig, J. Mehnert, and C. H. Schmitz, “Somatosensory activation of two fingers can be discriminated with ultrahigh-density diffuse optical tomography,” NeuroImage **59**, 3201–3211 (2011). [CrossRef] [PubMed]

**9. **J. P. Culver, R. Choe, M. J. Holboke, L. Zubkov, T. Durduran, A. Slemp, V. Ntziachristos, B. Chance, and A. G. Yodh, “Three-dimensional diffuse optical tomography in the parallel plane transmission geometry: evaluation of a hybrid frequency domain/continuous wave clinical system for breast imaging,” Med. Phys. **30**, 235–247 (2003). [CrossRef] [PubMed]

**10. **A. Li, E. L. Miller, M. E. Kilmer, T. J. Brukilacchio, T. Chaves, J. Stott, Q. Zhang, T. Wu, M. Chorlton, R. H. Moore, D. B. Kopans, and D. A. Boas, “Tomographic optical breast imaging guided by three-dimensional mammography,” Appl. Opt. **42**, 5181–5190 (2003). [CrossRef] [PubMed]

**11. **H. Dehghani, M. E. Eames, P. K. Yalavarthy, S. C. Davis, S. Srinivasan, C. M. Carpenter, B. W. Pogue, and K. D. Paulsen, “Near infrared optical tomography using NIRFAST: Algorithm for numerical model and image reconstruction,” Commun. Numer. Methods Eng. **25**, 711–732 (2008). [CrossRef] [PubMed]

**12. **Q. Fang, J. Selb, S. A. Carp, G. Boverman, E. L. Miller, D. H. Brooks, R. H. Moore, D. B. Kopans, and D. A. Boas, “Combined optical and X-ray tomosynthesis breast imaging,” Radiology **258**, 89–97 (2011). [CrossRef]

**13. **F. Gao, H. Zhao, and Y Yamada, “Improvement of image quality in diffuse optical tomography by use of full time-resolved data,” Appl. Opt. **41**, 778–791 (2002). [CrossRef] [PubMed]

**14. **M. Guven, B. Yazici, X. Intes, and B. Chance, “Diffuse optical tomography with a priori anatomical information,” Phys. Med. Biol. **50**, 2837–2858 (2005). [CrossRef] [PubMed]

**15. **F. Abdelnour, C. Genovese, and T. Huppert, “Hierarchical Bayesian regularization of reconstructions for diffuse optical tomography using multiple priors,” Biomed. Opt. Express **1**, 1084–1103 (2010). [CrossRef]

**16. **B. W. Pogue, T. O. McBride, J. Prewitt, U. L. Österberg, and K. D. Paulsen, “Spatially variant regularization improves diffuse optical tomography,” Appl. Opt. **38**, 2950–2961 (1999). [CrossRef]

**17. **J. P. Culver, T. Durduran, D. Furuya, C. Cheung, J. H. Greenberg, and A. G. Yodh, “Diffuse optical tomography of cerebral blood flow, oxygenation, and metabolism in rat during focal ischemia,” J. Cereb. Blood Flow Metab. **23**, 911–924 (2003). [CrossRef] [PubMed]

**18. **H. Niu, F. Tian, Z. J. Lin, and H. Liu, “Development of a compensation algorithm for accurate depth localization in diffuse optical tomography,” Opt. Lett. **35**, 429–431 (2010). [CrossRef] [PubMed]

**19. **H. Niu, Z. J. Lin, F. Tian, S. Dhamne, and H. Liu, “Comprehensive investigation of three-dimensional diffuse optical tomography with depth compensation algorithm,” J. Biomed. Opt. **15**, 046005 (2010). [CrossRef] [PubMed]

**20. **N. Cao, A. Nehorai, and M. Jacobs, “Image reconstruction for diffuse optical tomography using sparsity regularization and expectation-maximization algorithm,” Opt. Express **15**, 13695–13708 (2007). [CrossRef] [PubMed]

**21. **P. Mohajerani, A. A. Eftekhar, J. Huang, and A. Adibi, “Optimal sparse solution for fluorescent diffuse optical tomography: theory and phantom experimental results,” Appl. Opt. **46**, 1679–1685 (2007). [CrossRef] [PubMed]

**22. **Y. Lu, X. Zhang, A. Douraghy, D. Stout, J. Tian, T. F. Chan, and A. F. Chatziioannou, “Source reconstruction for spectrally-resolved bioluminescence tomography with sparse a priori information,” Opt. Express **17**, 8062–8080 (2009). [CrossRef] [PubMed]

**23. **S. Okawa, Y. Hoshi, and Y. Yamada, “Improvement of image quality of time-domain diffuse optical tomography with lp sparsity regularization,” Biomed. Opt. Express **2**, 3334–3348 (2011). [CrossRef] [PubMed]

**24. **D. Wipf and S. Nagarajan, “A unified Bayesian framework for MEG/EEG source imaging,” Neuroimage **44**, 947–966 (2009). [CrossRef]

**25. **F. Lucka, S. Pursiainen, M. Burger, and C. H. Wolters, “Hierarchical Bayesian inference for the EEG inverse problem using realistic FE head models: depth localization and source separation for focal primary currents,” Neuroimage **61**, 1364–1382 (2012). [CrossRef] [PubMed]

**26. **D. Wipf and S. Nagarajan, “A new view of automatic relevance determination,” Adv. Neural Inf. Process. Syst. **20**, 1625–1632 (2008).

**27. **M. Sato, T. Yoshioka, S. Kajihara, K. Toyama, N. Goda, K. Doya, and M. Kawato, “Hierarchical Bayesian estimation for MEG inverse problem,” NeuroImage **23**, 806–826 (2004). [CrossRef] [PubMed]

**28. **T. Aihara, Y. Takeda, K. Takeda, W. Yasuda, T. Sato, Y. Otaka, T. Hanakawa, M. Honda, M. Liu, M. Kawato, M. Sato, and R. Osu, “Cortical current source estimation from electroencephalography in combination with near-infrared spectroscopy as a hierarchical prior,” NeuroImage **59**, 4006–4021 (2012). [CrossRef]

**29. **A. C. Kak and M. Slaney, *Principles of Computerized Tomographic Imaging* (IEEE Press, New York, 1988).

**30. **M. A. O’Leary, “Imaging with diffuse photon density waves,” Ph.D. Thesis, Unversity of Pennsylvania (1996).

**31. **S. R. Arridge, “Optical tomography in medical imaging,” Inverse Probl. **15**, R41–R93 (1999). [CrossRef]

**32. **S. R. Arridge and J. C. Schotland, “Optical tomography: forward and inverse problems,” Inverse Probl. **25**, 123010 (2009). [CrossRef]

**33. **Q. Fang and D. A. Boas, “Monte Carlo simulation of photon migration in 3D turbid media accelerated by graphics processing units,” Opt. Express **17**, 20178–20190 (2009). [CrossRef] [PubMed]

**34. **R. C. Haskell, L. O. Svaasand, T. T. Tsay, T. C. Feng, M. S. McAdams, and B. J. Tromberg, “Boundary conditions for the diffusion equation in radiative transfer,” J. Opt. Soc. Am. A **11**, 2727–2741 (1994). [CrossRef]

**35. **C. M. Bishop, *Pattern Recognition and Machine Learning* (Springer, New York, 2006).

**36. **H. Akaike, “Likelihood and the Bayes procedure,” in *Bayesian Statistics*, J. M. Bernardo, M. H. De Groot, D. V. Lindley, and A. F. M. Smith, eds. (Univ. Press, Valencia, 1980), 143–166.

**37. **J. Selb, A. M. Dale, and D. A. Boas, “Linear 3D reconstruction of time-domain diffuse optical imaging differential data: improved depth localization and lateral resolution,” Opt. Express **15**, 16400–16412 (2007). [CrossRef] [PubMed]

**38. **H. Dehghani, B. R. White, B. W. Zeff, A. Tizzard, and J. P. Culver, “Depth sensitivity and image reconstruction analysis of dense imaging arrays for mapping brain function with diffuse optical tomography,” Appl. Opt. **48**, D137–D143 (2009). [CrossRef] [PubMed]

**39. **A. C. Faul and M. E. Tipping, “Analysis of sparse Bayesian learning,” Adv. Neural Inf. Process. Syst. **14**, 383–389 (2002).

**40. **H. Attias, “Inferring parameters and structure of latent variable models by variational Bayes,” Proc. 15th Conf. on Uncertainty in Artificial Intelligence, Morgan Kaufmann, 21–30 (1999).

**41. **M. Sato, “Online model selection based on the variational Bayes,” Neural Comput. **13**, 1649–1681 (2001). [CrossRef]

**42. **M. A. O’Leary, D. A. Boas, B. Chance, and A. G. Yodh, “Experimental images of heterogeneous turbid media by frequency-domain diffusing-photon tomography,” Opt. Lett. **20**, 426–428 (1995). [CrossRef]