## Abstract

Numerical performance of two gradient-based methods, a truncated-Newton method with trust region (TN) and a nonlinear conjugate gradient (NCG), is studied and compared for a given data set and conditions specific for the contrast enhanced optical tomography problem. Our results suggest that the relative performance of the two methods depends upon the error functions, specific to the problem to be solved. The TN outperforms the NCG when maps of fluorescence lifetime are reconstructed while both methods performed well when the absorption coefficient constitutes the parameter set that is to be recovered.

© Optical Society of America

## 1. Introduction

In this contribution, the relative performance of the truncated-Newton method with trust region (TN) and the nonlinear conjugate gradient (NCG) method is studied on inverse problems specific to contrast enhanced optical tomography. The objective of this exercise is to study the efficiency and the accuracy of these two methods for proper selection of an approach that will be suitable for optical image reconstruction. We refer to optical tomography problems as those problems that involve measurement of diffusely propagated light through turbid media and the use of the measurements on the boundary to recover an interior optical property map. The optical image reconstruction problem(or the inverse problem) with multiple source points on the boundary is formulated as an unconstrained optimization problem. Different numerical techniques such as the Born iterative method [1], the modified Newton-Raphson method with Levenberg-Marquardt regularization technique [2–6], the nonlinear conjugate gradient method [7–12] and the truncated Newton methods [13–15] have been applied for optical tomography problems as reviewed by Arridge [16]. Herein, TN and NCG are investigated for image reconstruction because they are known to be efficient and accurate for large dimensional problems; they use a low and predictable amount of storage, and require only the function and gradient values at each iteration without requiring any further information about the problem. The main objective of this paper is to compare the efficiency of two gradient-based methods, TN and NCG, with a given set of data and conditions specific for contrast-enhanced imaging.

The contrast enhanced optical imaging problem involves the determination of a
distribution of absorption coefficients and fluorescence lifetimes within the region,
from time-dependent light measurements at the surface of the object. Therefore the
variables in the inverse problems are the absorption coefficient,
*µ*_{a}
(*r̄*),
and lifetime, *τ*(*r̄*), owing
to a fluorescent agent within a given region.

In 1983, Dembo and Steihaug [17] proposed a truncated-Newton algorithm for large-scale problems. They noted that when far from a minimum, it may not be appropriate to solve the Newton equations accurately. They suggested solving the linear Newton equations by the linear conjugate gradient method and stopping the solution process when the updated solution satisfies an improvement test based on the norm of the residual vector. Within this scheme, an iteration of the Newton method is defined as a major (outer) iteration and an iteration of the linear conjugate gradient method is defined as a minor (inner) iteration. Dixon and Price [18] developed a truncated-Newton method that introduced a trust region based on a paper by Steihaugh [19] and a cone of acceptable direction.

Hestenes and Stiefel [20] proposed the linear conjugate gradient method for solving systems of linear equations while Fletcher and Reeves [21] first proposed the extension of the conjugate gradient method to solving general unconstrained optimization problems. Polak and Ribiere [22, 23] proposed yet another nonlinear conjugate gradient method, which Powell [24] argues to be preferable for nonquadratic functions.

The gradients for both TN and NCG are calculated by a technique that is developed from the ideas behind Reverse Automatic Differentiation [25] in order to calculate the gradients efficiently without excessive storage requirement. Readers interested in background literature and an analytical implementation method are referred to the references [13–15, 25–27].

In this contribution we compare the performances of TN and NCG in a simple system in order to comment on the relative merits of both approaches for optical tomography. The excitation and emission diffusion equations used for the forward problem and solved with the Galerkin finite element method are discussed in Section 2. Formulation of the inverse problem is given in Section 3. Descriptions of TN and NCG are provided in Section 4 and the method and approach are detailed in Section 5. Section 6 gives comparative results of these two methods for the case of contrast enhanced optical tomography. Finally Section 7 provides the mathematical arguments for why TN outperforms NCG when reconstructing maps of fluorescent lifetimes. We also describe why both methods perform comparably when reconstructingmaps of absorption owing to fluorophore. Our results suggest that the variations of the error functions may dictate the proper optimization scheme.

## 2. Governing equations for intensity-modulated excitation and emission light propagation

In tissues, the propagation of intensity-modulated excitation light and the generation and subsequent propagation of intensity-modulated fluorescent light can be described by the following coupled diffusion equations [13]:

Here, Φ
_{x}
and Φ
_{m}
are the AC components of the excitation and emission photon fluence
(photon/cm^{2} s) at position **r**, respectively;
*ω* is the angular modulation frequency
(*rad*/*s*); *c* is the speed of
light within the medium (cm/s); ${\mu}_{{a}_{\mathit{xi}}}$ is the absorption due to non-fluorescing chromophores
(*cm*
^{-1}); and ${\mu}_{{a}_{\mathit{xf}}}$ is the absorption due to fluorophores
(*cm*
^{-1}).
*D*
_{x,m} is the
optical diffusion coefficient equivalent to $1/3({\mu}_{{a}_{x,m}}+{\mu}_{{s}_{x,m}}^{\prime})$, where ${\mu}_{{s}_{x,m}}^{\prime}$ is isotropic scattering coefficient (cm^{-1}) and ${\mu}_{{a}_{x,m}}$ is the total absorption coefficient, at the respective
wavelengths. It is important to note that the diffusion equation applies when ${\mu}_{{a}_{x,m}}\ll {\mu}_{{s}_{x,m}}^{\prime}$. Consequently, in our formulation, we assume that ${D}_{x,m}\approx 1/3{\mu}_{{s}_{x,m}}^{\prime}$.

The right hand term of equation (2) describes the generation of fluorescence within
the medium. The term *ϕ* represents the quantum efficiency
of the fluorescence process, which is defined as the probability that an excited
fluorophore will decay radiatively, and *τ* is the
fluorophore lifetime (ns). Note that the source term requires coupling with the
solution of excitation fluence described by equation (1). For the purposes of this study, we assume that
the absorption coefficient at the excitation wavelength is only due to fluorophores,
i.e., ${\mu}_{{a}_{\mathit{xi}}}=0$.

The numerical solution for the excitation fluence distribution given in equation (1) is obtained using the Robin boundary condition given by Ishimaru [28]

where *S* is the complex flux density associated with a point source
of excitation light located at position **r**
_{s}
on *d*Ω and **n** denotes the unit
outward vector normal to the surface. The boundary condition for emission fluence is
identical to equation (3) with the exception that *S*=0 on
the surface.

## 2.1 Galerkin finite element formulation of excitation and emission equations

To apply the Galerkin finite element method for two-dimensional forward problems, the
solution domain Ω is divided into number of triangular elements in which
the variables Φ_{x,m}, ${\mu}_{{a}_{x,m}}$ and *τ* are assumed linear. The
formulation of the Galerkin finite element method of the diffusion equations is
described in the references [13–15, 29]. Local stiffness matrices are formed for each element and
then these elements are combined into a global stiffness matrix **K**, and
the solution is obtained by solving the complex equations:

## 2.2 Synthetic data sets for reconstruction

In order to test our inversion algorithms, we generated a data set of
Φ_{x,m} in a two
dimensional, square domain of 4×4 cm^{2}. For the forward
simulation, the mesh contained 2209 internal and 192 boundary nodes with 4608
triangular elements. Four point sources of excitation light intensity-modulated at
frequency *ω* were simulated midpoint on each side of the
domain with a total of 60 point detectors simulated equidistant along the domain
periphery, excluding positions occupied by one of the four sources and at each
corner of the domain. Predictions of
Φ_{x,m} were made at every
detector for each of the four sources, each providing intensitymodulation with unit
depth ofmodulation and zero phase. For this simulated measurement configuration
there were 4×59=236 simulated observations. The main objective of this
paper is to investigate the performance of TN and NCG methods using identical
conditions. Hence, the same number of sources and detectors were used for both these
methods. (For this focused study we choose not to optimize the number of
sources/detectors required for the inverse problem). The simulator was coded in
Fortran 77 and took 6 seconds to run on a SUN Ultrasparc 10 Workstation.

To mimic measurement error, zero-mean Gaussian noise corresponding to 0.1-degree standard deviation in phase and 1.0% standard error in amplitude was added to the synthetic data sets [13].

## 3. Inverse problemof recovering optical properties

For the purposes of comparing these methods, the absorption coefficient ${\mu}_{{a}_{\mathit{xf}}}$ and lifetime *τ* are to be determined
from synthetic fluorescence measurements. Let *l*=1,…,
*N*_{q}
denote the number of sources q on the
boundary and *j*=1,…, *N*_{B}
denote the number of detectors *B* on the boundary. The error
function for optimization is taken as:

The subscript *c* denotes the values calculated by the forward problem
and the subscript *me* denotes the experimentally measured values, or
in this case, the synthetic data sets. The superscript ∗ denotes the
complex conjugate of the complex function Φ
_{m}
. For simplicity, we will use *µ*_{a}
to
denote ${\mu}_{{a}_{\mathit{xf}}}$ in the remainder of the text.

## 4 Gradient based methods

#### 4.1 The truncated Newton method (TN)

Newton’s method is based on approximating the function
*E*((*µ*_{a}
)
_{k}
+**d**) by the quadratic model [13–15, 17–19]

where **g**
_{k}
and **G**
_{k}
are the gradient and Hessian of the error function *E*
at the *k*^{th}
iteration, respectively. The Newton
direction is obtained from an exact minimum of equation (6); i.e. the search direction **d** at
iteration *k* is defined by the equation

Dembo and Steihaug [17] proposed to solve equation (7) by the linear conjugate gradient method as it
generates a finite sequence of approximations, **d**
_{i}
1≤*i*≤*n*, where
*n* is the number of unknowns to the solution **d**.
A sequence of approximations of the solution is generated until a vector **d**
_{i}
is obtained for which the following condition on the relative residual
is satisfied.

where **r**
_{i}
=**G**
_{k}
**d**
_{i}
+**g**
_{k}
.

The linear conjugate gradient iteration is then truncated and **d**
_{i}
is used as the search direction for the minimisation. The relative
residual is used as a measure of accuracy of **d**
_{i}
and the required level of accuracy improves as the minimisation
proceeds and the minimum is approached.

Using the linear conjugate gradient method to solve equation (7), Dembo and Steihaug [17] noted that it was not necessary to compute the Hessian
**G**, as only the product **Gd** was required. The
product is calculated using the finite difference formula given below:

where

This technique avoids the calculation and storage of the Hessian but requires an
additional gradient evaluation for each minor iteration. As more inner
iterations are usually required with an increase in the outer iteration counter
*k*, the cost of each outer iteration increases.

## 4.2 Nonlinear conjugate gradient method (NCG)

Nonlinear conjugate gradient methods are well known in unconstrained optimization [22, 23]. A brief description is given below:

The basic scheme of NCG methods for minimising a differentiable function
*E*: ℜ
^{n}
→ℜ is to generate a sequence of iterates
(*µ*_{a}
)
_{k}
according to

where **d**
_{k}
is the search direction and *α*_{k}
is
the step length, which minimises *E* along **d**
_{k}
from the point (*µ*_{a}
)
_{k}
. For *k*=1, the search direction
**d**
_{1}=-∇̄*E*((*µ*_{a}
)_{1})=-**g**
_{1}
can be used, and for subsequent iterations, given **x**
_{k}
with **g**
_{k}
=∇̄*E*((*µ*_{a}
)
_{k}
)≠0 *k*≥1, we use

Many NCG methods differ in their choice for the multiplier
*β*_{k}
used to construct the search direction
given by equation (11). The Polak and Ribiere choice (PR) for
*β*_{k}
is given by

The Polak and Ribiere method is used in this analysis.

## 4.3 Line search for both TN and NCG

After computing the direction **d**
_{k}
, the Armijo line search [22, 30] is used in both TN and NCG methods to find an appropriate
step size, *α*_{k}
, which minimises the function
*E* along the line (*µ*_{a}
)
_{k}
+*α*_{k}
**d**
_{k}
. Gilbert and Nocedal [31] and Dembo and Steihaug [17] have shown that these methods are globally convergent if
the Wolfe conditions [32] that are given below are satisfied.

Condition 1

A search direction **d**
_{k}
is acceptable if

where *ε*
_{1} is a small positive constant, typical
*ε*
_{1}=0.001. This ensures that the angle
between **d** and the gradient vector is negative and bounded away from
zero.

Condition 2

where *ε*
_{2} is a constant, such that
0.5>*ε*
_{2}>0 (typically
(*ε*
_{2}=0.1). This condition ensures that the
step taken produces a non-trivial reduction of the objective function.

Condition 3

where *ε*
_{3} is a constant such that
0.5>*ε*
_{3}>0. This
condition requires that the rate of decrease of *E* in the direction
**d** at
(*µ*_{a}
)_{k+1}
must be larger than some prescribed fraction of the rate of decrease in the
direction **d** at (*µ*_{a}
)
_{k}
. This condition prevents the steps from being too small, relative to the
initial rate of decrease of *E*.

Condition 1 is satisfied in the TN and the NCG methods, whilst conditions 2 and 3 are
satisfied within the line search. Conditions 2 and 3 ensure that the step size,
α, is bounded away from zero. It does so by requiring the actual decrease
in *E* to be bounded away from the linear predicted reduction.

## 5. Approach and Methods

To investigate the accuracy and the efficiency (computing time) of TN and NCG, three
problems were considered as detailed in Tables 1 and 2. In these problems there are three heterogeneities (or
target regions, as illustrated in Figure 1) that have different optical properties than that of
their surrounding background region. Maps of absorption due to fluorophore,
*µ*_{a}
, and fluorescence lifetime,
*τ*, were reconstructed from synthetic fluorescence
measurements. We assume that all parameters other than the property to be mapped are
known and constant for both absorption and fluorescence lifetime reconstructions. In
problem 1, absorption imaging with up to tenfold increase in absorption within
heterogeneities was investigated. In problem 2, fluorescence lifetime imaging with
up to tenfold decrease in lifetimes within heterogeneities (fluorescence quenching)
was investigated. Problem 3 investigated fluorescence lifetime imaging with up to
tenfold increase in lifetime within heterogeneities. Thus we considered lifetime
imaging cases in which (i) the background possesses smaller fluorescent lifetime
than the heterogeneities and, (ii) the background possesses larger lifetime than the
heterogeneities.

To quantitatively evaluate the accuracy of TN and NCG methods, the following parameter for each reconstruction has calculated:

Sum of square
errors=$\sum _{i=1}^{n=289}$[(*µ*_{a}
)
_{actual}
-(*µ*_{a}
)
_{c}
]^{2} where c is the calculated value and there are 289 unknowns.

Similar expressions were used for lifetime. A coarse, 2-dimensional mesh with 225
internal nodes and 64 boundary nodes (total 289 nodes) was used for the inverse
problems while a total of 2401 nodes were used for the forward problems. The dual
mesh scheme was utilised for this analysis. A coarse mesh was used for the inverse
problem to reduce the computing time while a finer mesh was used for the forward
problem to calculate the fluence accurately. The same meshes were used for both TN
and NCG methods to compare their efficiency. (An optimal mesh size required for the
inverse and the forward problem using different methods was not conducted in this
study). For comparison of these two gradient-based methods, the synthetic data sets
described in Section 2.2 were used for reconstruction of absorption coefficients as
well as of lifetimes from fluorescence measurements. The final images reported
herein are convergence the results obtained when either the length of the gradient
‖**g**‖ was less than 10^{-8} (chosen by
trial and error) or a maximum of 2000 iterations was reached.

## 6. Results and Discussion

#### 6.1 Absorption coefficient imaging from synthetic fluorescence measurements

Figure 2 (a) depicts the actual spatial distribution of absorption used for generation of the synthetic data set described in Problem 1 of Table 1. The three heterogeneities or targets

possessed different absorption coefficients that were greater than the background. Figure 2(b) shows the reconstructed map of absorption from synthetic fluence values at 150MHz modulation frequency using the TN method while Figure 2(c) was obtained by the NCG method. These two methods reconstructed the three heterogeneities very well as shown in Figures 2b and 2c. The sum of square errors, resulting from the NCG method is slightly less than that of TN method as shown in Table 3. The time taken by TN method is 14031 seconds and the NCG took 14467 seconds. The quantitative error of the NCG method is slightly lower than that of the TN method when the initial guess was set to the background region as shown in Table 3. Both TN and NCG methods successfully recovered reconstructed maps similar to those depicted in Figures 2 (b)–(c) at various modulation frequencies.

## 6.2 luorescence imaging from synthetic fluorescence measurements with increased decay kinetics

Figure 3(a) depicts the actual lifetime map of Problem 2
listed in Table 1, in which the three heterogeneities with ten-fold
uptake of fluorescent dye exhibited a longer lifetime (τ_{h}=10,
8, and 5 ns) within a background that possessed a shorter lifetime
(τ_{b}=1 ns). Figure 3(b) shows the reconstructed map of lifetime from
synthetic fluence values at 150 MHz modulation frequency using the TN method while
the NCG method was unable to reconstruct the heterogeneities. Indeed, while the TN
method was capable of recovering lifetime maps at all modulation frequencies, the
NCG method could recover lifetimes when the modulation frequency,
*ω*, was greater than 250 MHz (data not shown for
brevity). When the optimisation problem was redefined to map the parameter
*ωτ* instead of
*τ*, recovery of lifetime maps was still not possible with
the NCG method at *ω*<250 MHz.

It is noteworthy that using the Levenberg-Marquardt approaches, Paithankar [2] and Jiang [33] report reconstruction involving fluorophore quenching only and were unable to successfully reconstruct heterogeneities with longer lifetimes within a background of short lifetime. The sum of square errors given by the TN method is given in Table 4, and the reconstruction took 10859 seconds.

As will be discussed below in Section 7, it appears that the NCG method is unable to
reconstruct lifetime maps when the variation of the error function
*E* is small. As discussed in Section 7, the TN method gives the good
results due to the fact that this method uses the gradients as well as the Hessian
while the NCG method uses only the gradients.

## 6.3 Fluorescence lifetime imaging from synthetic fluorescence measurements with quenching

Figure 4(a) depicts the actual lifetime map of Problem 3
listed in Table 1 in which three heterogeneities with ten-fold uptake
of fluorescent dye exhibited a shortening of lifetime (τ_{h}=8,
5, and 1 ns) within a background which possessed a longer lifetime
(τ_{b}=10 ns). Figure 4(b) shows the reconstructed map of lifetime from
synthetic fluence values at modulation frequency of 150 MHz using the TN method
while Figure 4(c) shows the lifetime map obtained from the NCG
method. Reconstruction using the NCG method resulted in only one of three
heterogeneities distinguished from the background while reconstruction using the TN
method resulted in all three heterogeneities distinguished from the background. The
sum of square errors given by the TN method is shown in Table 5. Time taken by the TN method was 9417 seconds. As in
problem 2, the TN method was capable of recovering lifetime maps at all modulation
frequencies while the NCG method could recover lifetimes when the modulation
frequency, *ω*, was greater than 250MHz (data not shown
for brevity). When the optimization was redefined tomap parameter
*ωτ* instead of
*τ*, the frequency dependence on TN and NCG performance
remain unchanged.

## 7 Relative performance of TN and NCG

In order to explore the performance of TN and NCG for reconstruction of lifetime and absorption coefficient maps due to fluorophore, we examined the error function and the adherence of Wolfe’s three criteria for both TN and NCG methods for each of the three problems. We have analysed TN and NCG using (i) sensitivity analysis to determine error function dependence at different modulation frequencies, and (ii) Wolfe’s criteria specific to the formulation of TN and NCG for the three problems investigated.

## 7.1 Sensitivity Analysis

A sensitivity analysis was performed by varying the values of one parameter
(absorption coefficient or lifetime) in the target or heterogeneities while all the
other parameters were kept constant. The objective of this exercise is to observe
the variation of error function *E* for a small change in the
parameter under study. Two cases were considered: (i) influence of the absorption
coefficients on the target regions, and (ii) influence of the lifetimes on the
target regions.

## (a) Sensitivity of error function to absorption coefficient

The synthetic data at modulation frequency of 150 MHz, Φ
_{m}
, were calculated using a absorption coefficient of 0.02
*cm*
^{-1} throughout the square except on the three
target regions, as shown in Figure 1, where the absorption coefficient was 0.2
*cm*
^{-1}. Each heterogeneity consisted of four nodal
points. The calculated values, Φ
_{c}
, was obtained using different absorption coefficients on the target
regions that varied from 0.02 to 0.2 cm^{-1} while all the remaining
parameters were kept constant. The variation of the error function at modulation
frequency of 150 MHz due to variation of the absorption coefficients is shown in Figure 5. The error function dependence of absorption varied
little with modulation frequency.

## (b) Sensitivity of error function to lifetime

The synthetic Φ_{m} data at modulation frequencies of 15, 150, and 500 MHz, were obtained using a
lifetime of 10 ns throughout the square except on the three target regions, as shown
in Figure 1, where the lifetime was 1 ns. The calculated values,
Φ_{c}, were obtained using different lifetimes on the target
region which varied between 10 to 1 ns while all other parameters remained constant.
The influence of modulation frequency upon the sensitivity of the error function to
lifetime is illustrated in Figure 6. Similar sensitivities were obtained when the
lifetime of the background was 1 ns and of the target regions was 10 ns. While the
TN method successfully recovers absorption and lifetime maps at all modulation
frequencies, the NCG method fails to recover lifetime maps at modulation frequencies
less than 250MHz, where the variation of the error function with target lifetime is
small. Regardless of whether the reconstruction involved lifetime or the product of
lifetime and modulation frequency, the variation of the error function with lifetime
is small when *ω*<250 MHz. Since the NCG method
uses only the gradient of the error function while the TN method employs both the
gradient and the Hessian of the error function, it may not be surprising that TN
method performance is independent of modulation frequency.

## 7.2 Mathematical formulation

In addition to our evaluation of the gradients of the error function, we investigate the criteria of Wolfe’s condition for both TN and NCG methods. We start our evaluation by stating that the nonlinear conjugate gradient method is based upon the following assumption [34]:

Assumption 1: If *E*: ℜ
^{n}
→ℜ is twice continuously differentiable and if there
exists 0<m<M<∞ such that, for all x, d∊ℜ
^{n}
,

It should be noted the error function *E*(·) is either a
function of the absorption coefficients,
*µ*_{a}
(*r̄*), or
the lifetimes, *τ*(*r̄*).
However x represents spatial distribution of either the absorption coefficients or
the lifetimes.

Polak [34] proved the following theorem for the conjugate gradient method:

Theorem1 Suppose that Assumption above is satisfied and that ${\left\{{\mathbf{x}}_{k}\right\}}_{k=0}^{\infty}$ is a sequence constructed by the Polak-Ribiere Algorithm in Section 4.2 solving equation (5). Then,

(a) for all *k*∊N

and

(b) the sequence ${\left\{{\mathbf{x}}_{k}\right\}}_{k=0}^{\infty}$ converges to x̂, which is the unique minimizer of the
error function *E*(·).

This theorem assumes the strict convexity of the error function
*E*(·).

To investigate the performance of the NCG and the TN methods the following parameters were calculated for the problems 1, 2, and 3:

(a) the curvature direction **d**
^{T}
**G**(**x**)**d**

(b) the length of the gradient vector
‖∇*E*(**x**)‖

(c) the Newton direction calculated by the truncated Newton method
‖**d**‖.

Table 6 gives typical values of the curvature directions for the problems 1, 2, and 3 in the first three iterations.

In problem 1, the curvature direction **d**
^{T}
**Gd** is decreasing and positive, indicatry that a solution
should exist. Since the function decreases along the direction **d** and
the curvature direction is positive (and hence the Hessian is positive definite),
the error function in problem 1 must also be convex. Thus the assumption of the NCG
method applies as given in Equation (16). It is not surprising that, the NCG method as
well as the TN method enable recovery of absorption.

As in problem 1, the curvature directions of problems 2 and 3 are decreasing and positive, but these values are very large. The lengths of the gradient vectors of each problem are given in Table 7. These gradient vectors are used for both TN and NCG optimization methods.

The gradient vectors of problem 1 are decreasing and small compared to problems 2 and
3. As described in Section 6.1, the NCG method successfully recovers a map of
absorption for solution to problem 1. The gradient vectors {‖g
_{k}
‖*k*=1,2,3,…} of problems 2 and 3 are
decreasing, bounded away from zero, but are large. The values of
*α* calculated by the Armijo line search for both
methods are given in Tables 8 and 9, respectively. The values of *α*
used in the NCG method are zeroes in problem 2 at 150 MHz.

We have from equation (10)

If dk is a descent direction, then
*E*(*τ*_{k}
+*α*
**d**
_{k}
)<*E*(*τ*_{k}
),
for small positive values of *α*. For this property we
assume that the step length satisfies *α*>0 (as
shown in Table 9). If the lifetimes
(*τ*_{k}
) are greater than zero, then some of the
Newton directions **d**
_{k} should be negative. Since the
directions of all the gradient vectors of problem 2 are negative at modulation
frequencies less than 250 MHz (data at 150 and 500 MHz are listed in Table 10), problem 2 does not satisfy equation (18) or (10). It should be noted that the Newton direction is accepted
as a search direction if it satisfies the Wolfe condition 1 to ensure that it has a
significant component in the negative gradient direction.

The assumptions on Newton search direction **d**
_{k}
are that (i) sufficient descent must be produced, and (ii) the search
direction is gradient related. Two assumptions for the step length are that (i) it
must result in sufficient reduction in the function *E*, and (ii) it
must not be too small. *Wolfe conditions 2 and 3* (section 4.3)
satisfy these assumptions associated with convergence. Let us investigate whether
problem 2 interrogated at modulation frequencies of 200 and 500MHz satisfies the
three *Wolfe conditions* (section 4.3) using different values of step
length *α*. The step length is restricted to values
between zero and one.

First we test if *α*=1 satisfies the *Wolfe
conditions*. If it does not, then we try *α*
with a very small number taken for instance from Table 9 as
*α*=0.39×10^{-18}. We take the
typical parameters, *ε*
_{1}=0.001 and
*ε*
_{2}=*ε*
_{3}=0.1
for this analysis. Table 11 presents values of
*E*(*τ*),
*E*(*τ*+*α*
**d**),
and
*E*(*τ*+2*α*
**d**)
for the minimum and maximum values of step size and *τ*
equivalent to 1 ns. These error values are computed to assess the validity of Wolfe
criteria 1, 2, and 3. Evaluation was performed at 200 MHz where the NCG failed to
provide *τ* recovery, and at 500 MHz, where NCG recovery
was successful. When the step size is maximum (i.e.
*α*=1), problem 2 at 200 and 500 MHz modulation
frequencies does not satisfyWolfe condition 2 because the reduction of error
function is much higher than **d**
^{T}
**g** (even when the parameter
*ε*
_{2}=0.5.). When the step size is minimum,
taken to be *α*=0.39×10^{-18}, problem
2 at 200 MHz does not satisfy Wolfe condition 3, i.e. that the rate of decrease of
error function *E* in the direction **d** at
(*τ*)_{k+1} is not
larger than some prescribed fraction of the rate of decrease in the direction
**d** at (*τ*)
_{k}
. Whether the reconstruction parameter is
*ωτ* or *τ*, Wolfe
criteria 2 and 3 for the NCG method are not met.

Another reason for NCG failure pointed out in Section 7.1 may be that the variation
of error function *E* is smaller at 200 MHz than that at 500 MHz
(c.f. Table 11). Problem 2 does not converge at 200 MHz due to
small variations of the error function, but does converge when the modulation
frequency is greater than 200 MHz as discussed in Section 6.3. Powell and others
have shown NCG may not converge for non-convex optimization problems and will fail
due to round off for ill-conditioned problems [35].

The TN algorithm is a trust region method, and for each iteration, it computes the
Newton direction vectors {‖**d**
_{k}
‖*k*=1, 2, 3,…} using the linear
conjugate gradient method to approximately minimize the quadratic model of the error
function given by equation (6) within a trust region. That is, the linear
conjugate method calculates the Newton direction by solving the equation (7) with the gradient on the right hand side. Whenever
the linear conjugate gradient method detects a nonascent negative curvature
direction (a direction **d**
_{k}
such that g(**x**
_{k})^{T}
**d**
_{k}<0 and ${{\mathbf{d}}_{k}}^{T}$
**G**(**x**
_{k}
)**d**
_{k}
<0), the minimization of the model is stopped and the new point
is obtained by performing the trust region method along this direction (for detailed
discussion see reference [18]). Once the Newton direction vectors are calculated, the
unknown vectors are updated using the equation (10). The Newton directions calculated by TN for the
specific problems described herein are given in Table 12.

These values (Table 12) are the same order of magnitude as the lifetimes in problems 2 and 3. It is also important to note that when the Newton directions are calculated by the TN method as contained in Table 12, all Wolfe criteria are satisfied whereas Table 11 shows that this is not the case for NCG method. Finally, TN has the added advantage that it performs well when the error function is nonconvex [36].

## 8 Conclusions

It is seen that the NCG does not perform well for certain optical tomography scenarios. The NCG method gives slightly better images than those of the TN method when the absorption coefficients are the unknown quantities. The NCG method fails when fluorescent lifetimes are the unknown quantities and the modulation frequency is less than 250 MHz while the TN method reconstructed good images at these modulation frequencies. Since the variations of the error function are small at 200 MHz, the Wolfe conditions were not satisfied by the NCG method. When the modulation frequency is higher than 250 MHz the NCG method gives images similar to TN method. We suggest that similar careful consideration of the error function be considered when choosing an optimisation method for optical tomography. Since the TN method uses the gradients as well as the Hessian, this method out performs the NCG method that uses only the gradients when maps of fluorescence lifetime are being reconstructed.

## Acknowledgements

This work was supported by the National Institutes of Health (R01 CA67167). We thank the members of the Photon Migration Laboratory as well as the constructive comments by the anonymous reviewers who helped to enhance our contribution.

## References

**1. **Y. Q. Yao, Y. Wang, Y. L. Pei, W. W. Zhu, and R. L. Barbour, “Frequency-domain optical imaging of absorption and scattering distributions by Born iterative method,” J. Opt. Soc. Am A , **14**, .325–342, (1997). [CrossRef]

**2. **D. Y. Paithankar, A. U. Chen, B. W. Pogue, M. S. Patterson, and E. M. Sevick-Muraca, “Imaging of fluorescent yield and lifetime from multiply scattered light re-emitted from tissues and other random media,” Appl. Opt , **36**, 2260–2272, (1997). [CrossRef] [PubMed]

**3. **K. D. Paulsen and H. Jiang, “Enhanced frequency domain optical image reconstruction in tissues through total variation minimization,” Appl. Opt. **35**, 3447–3458, (1996). [CrossRef] [PubMed]

**4. **H. B. Jiang, K. D. Paulsen, U. L. Osterberg, B. W. Pogue, and M. S. Patterson, “Optical image reconstruction using frequency domain data: Simulations and experiments,” J. Opt. Soc. Am. A **13**, 253–266, (1996). [CrossRef]

**5. **M. A. O’Leary, D. A. Boas, B. Chance, and A. G. Yodh, “Experimental images of heterogeneous turbid media by frequency-domain diffusion-photon tomography,” Opt. Lett. **20**, 426–428, (1995). [CrossRef]

**6. **S. R. Arridge, M. Schweiger, and D. T. Delpy, “Iterative reconstruction of near-infrared absorption images,” in *Inverse Problems in Scattering and Imaging*, M. A. Fiddy, ed., Proc. SPIE1867, 372–383, (1992).

**7. **R. Roy, “Image reconstruction from light measurement on biological tissue,’ Ph. D thesis, Hatfield, England, (1996).

**8. **S. R. Arridge and M. Schweiger, “A gradient-based optimization scheme for optical tomography”, Opt. Express ; **2**, 213–226, (1998) http://www.opticsexpress.org/oearchive/source/4014.htm [CrossRef] [PubMed]

**9. **A. H. Hielscher, A. D Klose, and K. M. Hanson, “Gradient-based iterative image reconstruction scheme for time-resolved optical tomography,” IEEE Trans Med. Imag , **18**, 262–271, (1999). [CrossRef]

**10. **A. D. Klose and A.H. Hielscher, “Iterative reconstruction scheme for optical tomography based on the equation of radiative transfer,” Med. Phy. **26**, 1698–1707, (1999). [CrossRef]

**11. **M. V. Klibanov, T. R. Lucas, and R. M. Frank, “A fast and accurate imaging algorithm in optical diffusion tomography,” Inverse Problems , **13**, 1341–1361, (1997). [CrossRef]

**12. **S. S. Saquib, K. M. Hanson, and G. S. Cunningham, “Model-based image reconstruction from time-resolved diffusion data,” in Medical Imaging: Image Processing, Proc. of the SPIE-The International Society for Optical Engineering , **3034**, 369–380, (1997).

**13. **R. Roy and E. M. Sevick-Muraca, “Truncated Newton’s optimization scheme for absorption and fluorescence optical tomography: Part I theory and formulation,” Opt. Express , **4**, 353–371, (1999). http://www.opticsexpress.org/oearchive/source/9268.htm [CrossRef] [PubMed]

**14. **R. Roy and E. M. Sevick-Muraca, “Truncated Newton’s optimization scheme for absorption and fluorescence optical tomography: Part II Reconstruction from synthetic measurements,” Opt. Express , **4**, 372–382, (1999). http://www.opticsexpress.org/oearchive/source/10447.htm [CrossRef] [PubMed]

**15. **R. Roy and E. M. Sevick-Muraca, “Active constrained truncated Newton method for simple-bound optical tomography,” J. Opt. Soc. Am. A **17**, 1627–1641, (2000). [CrossRef]

**16. **S. R. Arridge, “Optical tomography in medical imaging,” Inverse Problems , **15**, R41–R93, (1999). [CrossRef]

**17. **R. S. Dembo and T. Steihaug, “Truncated Newton algorithms for large-scale unconstrained optimisation,” Math. Programming , **26**, 190–212, (1983). [CrossRef]

**18. **L. C. W. Dixon and R. C. Price, “Numerical experience with the truncated Newton method for unconstrained optimization,” J. optimization theory and applications. **56**, 245–255, (1988). [CrossRef]

**19. **T. Steihaug, “The conjugate gradient method and trust region in large scale optimisation,” SIAM J. Numerical analysis , **20**, 626–637, (1983). [CrossRef]

**20. **M. R. Hestenes and E. Stiefel, ‘Methods of conjugate gradients for solving linear systems,” J. Res. Nat. Bur. Stand. **49**, 409–436 (1952). [CrossRef]

**21. **R. Fletcher and C. M. Reeves, “Function minimization by conjugate gradients,” Computer J. , **7**, 149–154 (1964). [CrossRef]

**22. **S. Bazaraa, H. D. Sherali, and C. M. Shetty, “Nonlinear programming theory and algorithms,” John Wiley & Sons Inc, New York, (1993).

**23. **R. Fletcher, “Practical methods of optimisation,” Second edition, John Wiley & Sons, Chichester, (1996).

**24. **M. J. D. Powell, “Restart procedures for the conjugate gradient method,” Math. Programming , **12**, 241–254, (1977). [CrossRef]

**25. **A. Griewank, “On automatic differentiation,” edited M. Iri and K. Tanaka, *Mathematical programming: Recent developments and application*, Kluwer Academic Publishers, 83–108, (1989).

**26. **B. Christianson, A. J. Davies, L. C. W. Dixon, R. Roy, and P. van der Zee, “Giving reverse differentiation a helping hand,” Opt. Meth. And Software , **8**, 53–67, (1997). [CrossRef]

**27. **A. J. Davies, B. Christianson, L. C. W. Dixon, R. Roy, and P. van der Zee, “Reverse differentiation and the inverse diffusion problem,” Adv. In Eng. Software , **28**, 217–221, (1997). [CrossRef]

**28. **A. Ishimaru, *Wave propagation and scattering in random media*, Academic Press, New York, (1978).

**29. **O. C. Zienkiewcz and R. L. Taylor, *The finite element methods in engineering science*, McGraw-Hill, New York, (1989).

**30. **L. Armijo, “Minimization of functions having Lipschitz continuous first partial derivatives,” Pacific J. Mathematics , **16**, 1–3, (1966).

**31. **J. C. Gilbert and J Nocedal, “Global convergence properties of conjugate gradient methods for optimization,” Report no. 1268, INRIA, (1990)

**32. **P. Wolfe, “Convergence condition for ascent methods,” SIAM Rev , **11**, 226–253, (1969). [CrossRef]

**33. **H. Jiang, “Frequency-domain fluorescent diffusion tomography: a finite-element-based algorithm and simulations,” Appl. Opt. **37**, 5337–5343, (1998). [CrossRef]

**34. **E. Polak, *Optimization, algorithms and consistent approximation*, Springer-Verlag, New York, (1997)

**35. **M. J. D. Powell, “Nonconvex minimization calculations and the conjugate gradient methods,” Lecture Notes in *Math. 1066*,Spriger-Verlag, New York, 122–141, (1984).

**36. **S. G. Nash, “A survey of truncated-Newton methods,” J. of Computational and Applied Math. **124**, 45–59, (2000). [CrossRef]