A numerical study of gradient-based nonlinear optimization methods for contrast enhanced optical tomography

Ranadhir Roy; Eva M. Sevick-Muraca

doi:10.1364/OE.9.000049

1. Introduction

In this contribution, the relative performance of the truncated-Newton method with trust region (TN) and the nonlinear conjugate gradient (NCG) method is studied on inverse problems specific to contrast enhanced optical tomography. The objective of this exercise is to study the efficiency and the accuracy of these two methods for proper selection of an approach that will be suitable for optical image reconstruction. We refer to optical tomography problems as those problems that involve measurement of diffusely propagated light through turbid media and the use of the measurements on the boundary to recover an interior optical property map. The optical image reconstruction problem(or the inverse problem) with multiple source points on the boundary is formulated as an unconstrained optimization problem. Different numerical techniques such as the Born iterative method [1], the modified Newton-Raphson method with Levenberg-Marquardt regularization technique [2–6], the nonlinear conjugate gradient method [7–12] and the truncated Newton methods [13–15] have been applied for optical tomography problems as reviewed by Arridge [16]. Herein, TN and NCG are investigated for image reconstruction because they are known to be efficient and accurate for large dimensional problems; they use a low and predictable amount of storage, and require only the function and gradient values at each iteration without requiring any further information about the problem. The main objective of this paper is to compare the efficiency of two gradient-based methods, TN and NCG, with a given set of data and conditions specific for contrast-enhanced imaging.

The contrast enhanced optical imaging problem involves the determination of a distribution of absorption coefficients and fluorescence lifetimes within the region, from time-dependent light measurements at the surface of the object. Therefore the variables in the inverse problems are the absorption coefficient, µ_a (r̄), and lifetime, τ(r̄), owing to a fluorescent agent within a given region.

In 1983, Dembo and Steihaug [17] proposed a truncated-Newton algorithm for large-scale problems. They noted that when far from a minimum, it may not be appropriate to solve the Newton equations accurately. They suggested solving the linear Newton equations by the linear conjugate gradient method and stopping the solution process when the updated solution satisfies an improvement test based on the norm of the residual vector. Within this scheme, an iteration of the Newton method is defined as a major (outer) iteration and an iteration of the linear conjugate gradient method is defined as a minor (inner) iteration. Dixon and Price [18] developed a truncated-Newton method that introduced a trust region based on a paper by Steihaugh [19] and a cone of acceptable direction.

Hestenes and Stiefel [20] proposed the linear conjugate gradient method for solving systems of linear equations while Fletcher and Reeves [21] first proposed the extension of the conjugate gradient method to solving general unconstrained optimization problems. Polak and Ribiere [22, 23] proposed yet another nonlinear conjugate gradient method, which Powell [24] argues to be preferable for nonquadratic functions.

The gradients for both TN and NCG are calculated by a technique that is developed from the ideas behind Reverse Automatic Differentiation [25] in order to calculate the gradients efficiently without excessive storage requirement. Readers interested in background literature and an analytical implementation method are referred to the references [13–15, 25–27].

In this contribution we compare the performances of TN and NCG in a simple system in order to comment on the relative merits of both approaches for optical tomography. The excitation and emission diffusion equations used for the forward problem and solved with the Galerkin finite element method are discussed in Section 2. Formulation of the inverse problem is given in Section 3. Descriptions of TN and NCG are provided in Section 4 and the method and approach are detailed in Section 5. Section 6 gives comparative results of these two methods for the case of contrast enhanced optical tomography. Finally Section 7 provides the mathematical arguments for why TN outperforms NCG when reconstructing maps of fluorescent lifetimes. We also describe why both methods perform comparably when reconstructingmaps of absorption owing to fluorophore. Our results suggest that the variations of the error functions may dictate the proper optimization scheme.

2. Governing equations for intensity-modulated excitation and emission light propagation

In tissues, the propagation of intensity-modulated excitation light and the generation and subsequent propagation of intensity-modulated fluorescent light can be described by the following coupled diffusion equations [13]:

- \nabla \cdot [D_{x} (\vec{r}) \nabla Φ_{x} (\vec{r}, ω)] + [\frac{iω}{c} + μ_{a_{xi}} (\vec{r}) + μ_{a_{xf}} (\vec{r})] Φ_{x} (\vec{r}, ω) = 0 on Ω

- \nabla \cdot [D_{m} (\vec{r}) \nabla Φ_{m} (\vec{r}, ω)] + [\frac{iω}{c} + μ_{a_{m}} (\vec{r})] Φ_{m} (\vec{r}, ω) = {ϕμ}_{a_{xf}} \frac{1}{1 - iωτ} Φ_{x} (\vec{r}, ω) on Ω

Here, Φ _x and Φ _m are the AC components of the excitation and emission photon fluence (photon/cm² s) at position r, respectively; ω is the angular modulation frequency (rad/s); c is the speed of light within the medium (cm/s); $μ_{a_{xi}}$ is the absorption due to non-fluorescing chromophores (cm ^-1); and $μ_{a_{xf}}$ is the absorption due to fluorophores (cm ^-1). D _x,m is the optical diffusion coefficient equivalent to $1 / 3 (μ_{a_{x, m}} + μ_{s_{x, m}}^{'})$ , where $μ_{s_{x, m}}^{'}$ is isotropic scattering coefficient (cm^-1) and $μ_{a_{x, m}}$ is the total absorption coefficient, at the respective wavelengths. It is important to note that the diffusion equation applies when $μ_{a_{x, m}} ≪ μ_{s_{x, m}}^{'}$ . Consequently, in our formulation, we assume that $D_{x, m} \approx 1 / 3 μ_{s_{x, m}}^{'}$ .

The right hand term of equation (2) describes the generation of fluorescence within the medium. The term ϕ represents the quantum efficiency of the fluorescence process, which is defined as the probability that an excited fluorophore will decay radiatively, and τ is the fluorophore lifetime (ns). Note that the source term requires coupling with the solution of excitation fluence described by equation (1). For the purposes of this study, we assume that the absorption coefficient at the excitation wavelength is only due to fluorophores, i.e., $μ_{a_{xi}} = 0$ .

The numerical solution for the excitation fluence distribution given in equation (1) is obtained using the Robin boundary condition given by Ishimaru [28]

2 D_{x} \frac{{\partial Φ}_{x}}{\partial n} + Φ_{x} + Sδ (\bar{r}, {\bar{r}}_{s}) = 0 on d Ω

where S is the complex flux density associated with a point source of excitation light located at position r _s on dΩ and n denotes the unit outward vector normal to the surface. The boundary condition for emission fluence is identical to equation (3) with the exception that S=0 on the surface.

2.1 Galerkin finite element formulation of excitation and emission equations

To apply the Galerkin finite element method for two-dimensional forward problems, the solution domain Ω is divided into number of triangular elements in which the variables Φ_x,m, $μ_{a_{x, m}}$ and τ are assumed linear. The formulation of the Galerkin finite element method of the diffusion equations is described in the references [13–15, 29]. Local stiffness matrices are formed for each element and then these elements are combined into a global stiffness matrix K, and the solution is obtained by solving the complex equations:

K {\bar{Φ}}_{x, m} = b

2.2 Synthetic data sets for reconstruction

In order to test our inversion algorithms, we generated a data set of Φ_x,m in a two dimensional, square domain of 4×4 cm². For the forward simulation, the mesh contained 2209 internal and 192 boundary nodes with 4608 triangular elements. Four point sources of excitation light intensity-modulated at frequency ω were simulated midpoint on each side of the domain with a total of 60 point detectors simulated equidistant along the domain periphery, excluding positions occupied by one of the four sources and at each corner of the domain. Predictions of Φ_x,m were made at every detector for each of the four sources, each providing intensitymodulation with unit depth ofmodulation and zero phase. For this simulated measurement configuration there were 4×59=236 simulated observations. The main objective of this paper is to investigate the performance of TN and NCG methods using identical conditions. Hence, the same number of sources and detectors were used for both these methods. (For this focused study we choose not to optimize the number of sources/detectors required for the inverse problem). The simulator was coded in Fortran 77 and took 6 seconds to run on a SUN Ultrasparc 10 Workstation.

To mimic measurement error, zero-mean Gaussian noise corresponding to 0.1-degree standard deviation in phase and 1.0% standard error in amplitude was added to the synthetic data sets [13].

3. Inverse problemof recovering optical properties

For the purposes of comparing these methods, the absorption coefficient $μ_{a_{xf}}$ and lifetime τ are to be determined from synthetic fluorescence measurements. Let l=1,…, N_q denote the number of sources q on the boundary and j=1,…, N_B denote the number of detectors B on the boundary. The error function for optimization is taken as:

E ({\bar{μ}}_{a_{xf}}, \bar{τ}) = \frac{1}{2} \sum_{1 = 1}^{N_{q}} \sum_{\underset{j \neq 1}{j = 1}}^{N_{B}} {(\frac{{(Φ_{m})}_{c} - {(Φ_{m})}_{me}}{{(Φ_{m})}_{me}})}_{i, j} {(\frac{{(Φ_{m}^{*})}_{c} - {(Φ_{m}^{*})}_{me}}{{(Φ_{m}^{*})}_{me}})}_{i, j}

The subscript c denotes the values calculated by the forward problem and the subscript me denotes the experimentally measured values, or in this case, the synthetic data sets. The superscript ∗ denotes the complex conjugate of the complex function Φ _m . For simplicity, we will use µ_a to denote $μ_{a_{xf}}$ in the remainder of the text.

4 Gradient based methods

4.1 The truncated Newton method (TN)

Newton’s method is based on approximating the function E((µ_a ) _k +d) by the quadratic model [13–15, 17–19]

E ({(μ_{a})}_{k} + d) = E ({(μ_{a})}_{k}) + g_{k}^{T} d + \frac{1}{2} d^{T} G_{k} d

where g _k and G _k are the gradient and Hessian of the error function E at the k^th iteration, respectively. The Newton direction is obtained from an exact minimum of equation (6); i.e. the search direction d at iteration k is defined by the equation

G_{k} d = - g_{k}

Dembo and Steihaug [17] proposed to solve equation (7) by the linear conjugate gradient method as it generates a finite sequence of approximations, d _i 1≤i≤n, where n is the number of unknowns to the solution d. A sequence of approximations of the solution is generated until a vector d _i is obtained for which the following condition on the relative residual is satisfied.

\frac{∥ r_{i} ∥}{∥ g_{k} ∥} \leq min (\frac{1}{k}, ∥ g_{k} ∥)

where r _i =G _k d _i +g _k .

The linear conjugate gradient iteration is then truncated and d _i is used as the search direction for the minimisation. The relative residual is used as a measure of accuracy of d _i and the required level of accuracy improves as the minimisation proceeds and the minimum is approached.

Using the linear conjugate gradient method to solve equation (7), Dembo and Steihaug [17] noted that it was not necessary to compute the Hessian G, as only the product Gd was required. The product is calculated using the finite difference formula given below:

G (μ_{a}) d = \frac{1}{σ} [g (μ_{a} + σ d) - g (μ_{a})]

where

σ = \frac{\sqrt{machineprecision}}{‖ d ‖}

This technique avoids the calculation and storage of the Hessian but requires an additional gradient evaluation for each minor iteration. As more inner iterations are usually required with an increase in the outer iteration counter k, the cost of each outer iteration increases.

4.2 Nonlinear conjugate gradient method (NCG)

Nonlinear conjugate gradient methods are well known in unconstrained optimization [22, 23]. A brief description is given below:

The basic scheme of NCG methods for minimising a differentiable function E: ℜ ⁿ →ℜ is to generate a sequence of iterates (µ_a ) _k according to

{(μ_{a})}_{k + 1} = {(μ_{a})}_{k} + α_{k} d_{k}, {(μ_{a})}_{k + 1} > 0

where d _k is the search direction and α_k is the step length, which minimises E along d _k from the point (µ_a ) _k . For k=1, the search direction d ₁=-∇̄E((µ_a )₁)=-g ₁ can be used, and for subsequent iterations, given x _k with g _k =∇̄E((µ_a ) _k )≠0 k≥1, we use

d_{k} = - g_{k} + β_{k} d_{k - 1}

Many NCG methods differ in their choice for the multiplier β_k used to construct the search direction given by equation (11). The Polak and Ribiere choice (PR) for β_k is given by

β_{k} = \frac{g_{k}^{T} (g_{k} - g_{k - 1})}{{∥ g_{k - 1} ∥}^{2}}

The Polak and Ribiere method is used in this analysis.

4.3 Line search for both TN and NCG

After computing the direction d _k , the Armijo line search [22, 30] is used in both TN and NCG methods to find an appropriate step size, α_k , which minimises the function E along the line (µ_a ) _k +α_k d _k . Gilbert and Nocedal [31] and Dembo and Steihaug [17] have shown that these methods are globally convergent if the Wolfe conditions [32] that are given below are satisfied.

Condition 1

A search direction d _k is acceptable if

- d_{k}^{T} g_{k} \geq ε_{1} ∥ d_{k} ∥ ∥ g_{k} ∥ > 0

where ε ₁ is a small positive constant, typical ε ₁=0.001. This ensures that the angle between d and the gradient vector is negative and bounded away from zero.

Condition 2

E (μ_{a} + α d) - E (μ_{a}) \leq ε_{2} α d^{T} g

where ε ₂ is a constant, such that 0.5>ε ₂>0 (typically (ε ₂=0.1). This condition ensures that the step taken produces a non-trivial reduction of the objective function.

Condition 3

E (μ_{a} + 2 α d) - E (μ_{a}) > 2 ε_{3} α d^{T} g

where ε ₃ is a constant such that 0.5>ε ₃>0. This condition requires that the rate of decrease of E in the direction d at (µ_a )_k+1 must be larger than some prescribed fraction of the rate of decrease in the direction d at (µ_a ) _k . This condition prevents the steps from being too small, relative to the initial rate of decrease of E.

Condition 1 is satisfied in the TN and the NCG methods, whilst conditions 2 and 3 are satisfied within the line search. Conditions 2 and 3 ensure that the step size, α, is bounded away from zero. It does so by requiring the actual decrease in E to be bounded away from the linear predicted reduction.

5. Approach and Methods

To investigate the accuracy and the efficiency (computing time) of TN and NCG, three problems were considered as detailed in Tables 1 and 2. In these problems there are three heterogeneities (or target regions, as illustrated in Figure 1) that have different optical properties than that of their surrounding background region. Maps of absorption due to fluorophore, µ_a , and fluorescence lifetime, τ, were reconstructed from synthetic fluorescence measurements. We assume that all parameters other than the property to be mapped are known and constant for both absorption and fluorescence lifetime reconstructions. In problem 1, absorption imaging with up to tenfold increase in absorption within heterogeneities was investigated. In problem 2, fluorescence lifetime imaging with up to tenfold decrease in lifetimes within heterogeneities (fluorescence quenching) was investigated. Problem 3 investigated fluorescence lifetime imaging with up to tenfold increase in lifetime within heterogeneities. Thus we considered lifetime imaging cases in which (i) the background possesses smaller fluorescent lifetime than the heterogeneities and, (ii) the background possesses larger lifetime than the heterogeneities.

Table 1. Background optical parameters used for the optimization problems (Eqns 1 & 2)

View Table | View all tables in this article

Table 2. Target optical parameters used for the optimization problems (Eqns 1 & 2)

View Table | View all tables in this article

To quantitatively evaluate the accuracy of TN and NCG methods, the following parameter for each reconstruction has calculated:

Sum of square errors= $\sum_{i = 1}^{n = 289}$ [(µ_a ) _actual -(µ_a ) _c ]² where c is the calculated value and there are 289 unknowns.

Figure 1. A two-dimensional square with three target (heterogeneities) regions with different optical properties. Size of each heterogeneity is 0.0625cm².

Download Full Size | PDF

Similar expressions were used for lifetime. A coarse, 2-dimensional mesh with 225 internal nodes and 64 boundary nodes (total 289 nodes) was used for the inverse problems while a total of 2401 nodes were used for the forward problems. The dual mesh scheme was utilised for this analysis. A coarse mesh was used for the inverse problem to reduce the computing time while a finer mesh was used for the forward problem to calculate the fluence accurately. The same meshes were used for both TN and NCG methods to compare their efficiency. (An optimal mesh size required for the inverse and the forward problem using different methods was not conducted in this study). For comparison of these two gradient-based methods, the synthetic data sets described in Section 2.2 were used for reconstruction of absorption coefficients as well as of lifetimes from fluorescence measurements. The final images reported herein are convergence the results obtained when either the length of the gradient ‖g‖ was less than 10^-8 (chosen by trial and error) or a maximum of 2000 iterations was reached.

Figure 2. a) Actual distribution of absorption coefficients, (b) reconstruction of absorption coefficients by the TN method (c) reconstruction of absorption coefficients by the NCG method.

Download Full Size | PDF

6. Results and Discussion

6.1 Absorption coefficient imaging from synthetic fluorescence measurements

Figure 2 (a) depicts the actual spatial distribution of absorption used for generation of the synthetic data set described in Problem 1 of Table 1. The three heterogeneities or targets

Table 3. Recovery of absorption coefficient (Problem 1)

View Table | View all tables in this article

possessed different absorption coefficients that were greater than the background. Figure 2(b) shows the reconstructed map of absorption from synthetic fluence values at 150MHz modulation frequency using the TN method while Figure 2(c) was obtained by the NCG method. These two methods reconstructed the three heterogeneities very well as shown in Figures 2b and 2c. The sum of square errors, resulting from the NCG method is slightly less than that of TN method as shown in Table 3. The time taken by TN method is 14031 seconds and the NCG took 14467 seconds. The quantitative error of the NCG method is slightly lower than that of the TN method when the initial guess was set to the background region as shown in Table 3. Both TN and NCG methods successfully recovered reconstructed maps similar to those depicted in Figures 2 (b)–(c) at various modulation frequencies.

6.2 luorescence imaging from synthetic fluorescence measurements with increased decay kinetics

Figure 3(a) depicts the actual lifetime map of Problem 2 listed in Table 1, in which the three heterogeneities with ten-fold uptake of fluorescent dye exhibited a longer lifetime (τ_h=10, 8, and 5 ns) within a background that possessed a shorter lifetime (τ_b=1 ns). Figure 3(b) shows the reconstructed map of lifetime from synthetic fluence values at 150 MHz modulation frequency using the TN method while the NCG method was unable to reconstruct the heterogeneities. Indeed, while the TN method was capable of recovering lifetime maps at all modulation frequencies, the NCG method could recover lifetimes when the modulation frequency, ω, was greater than 250 MHz (data not shown for brevity). When the optimisation problem was redefined to map the parameter ωτ instead of τ, recovery of lifetime maps was still not possible with the NCG method at ω<250 MHz.

It is noteworthy that using the Levenberg-Marquardt approaches, Paithankar [2] and Jiang [33] report reconstruction involving fluorophore quenching only and were unable to successfully reconstruct heterogeneities with longer lifetimes within a background of short lifetime. The sum of square errors given by the TN method is given in Table 4, and the reconstruction took 10859 seconds.

Table 4. Recovery of fluorescent lifetime (background 1 ns, Problem 2)

View Table | View all tables in this article

Figure 3. (a) Actual spatial distribution of lifetimes (background 1 ns), (b) reconstruction of lifetimes by the TN method.

Download Full Size | PDF

As will be discussed below in Section 7, it appears that the NCG method is unable to reconstruct lifetime maps when the variation of the error function E is small. As discussed in Section 7, the TN method gives the good results due to the fact that this method uses the gradients as well as the Hessian while the NCG method uses only the gradients.

6.3 Fluorescence lifetime imaging from synthetic fluorescence measurements with quenching

Figure 4(a) depicts the actual lifetime map of Problem 3 listed in Table 1 in which three heterogeneities with ten-fold uptake of fluorescent dye exhibited a shortening of lifetime (τ_h=8, 5, and 1 ns) within a background which possessed a longer lifetime (τ_b=10 ns). Figure 4(b) shows the reconstructed map of lifetime from synthetic fluence values at modulation frequency of 150 MHz using the TN method while Figure 4(c) shows the lifetime map obtained from the NCG method. Reconstruction using the NCG method resulted in only one of three heterogeneities distinguished from the background while reconstruction using the TN method resulted in all three heterogeneities distinguished from the background. The sum of square errors given by the TN method is shown in Table 5. Time taken by the TN method was 9417 seconds. As in problem 2, the TN method was capable of recovering lifetime maps at all modulation frequencies while the NCG method could recover lifetimes when the modulation frequency, ω, was greater than 250MHz (data not shown for brevity). When the optimization was redefined tomap parameter ωτ instead of τ, the frequency dependence on TN and NCG performance remain unchanged.

Table 5. Recovery of fluorescent lifetime (background 10 ns, Problem 3)

View Table | View all tables in this article

Figure 4. a) Actual spatial distribution of lifetimes (background 10 ns) (b) reconstruction of lifetimes by the TN method (c) reconstruction of lifetimes by the NCG method

Download Full Size | PDF

7 Relative performance of TN and NCG

In order to explore the performance of TN and NCG for reconstruction of lifetime and absorption coefficient maps due to fluorophore, we examined the error function and the adherence of Wolfe’s three criteria for both TN and NCG methods for each of the three problems. We have analysed TN and NCG using (i) sensitivity analysis to determine error function dependence at different modulation frequencies, and (ii) Wolfe’s criteria specific to the formulation of TN and NCG for the three problems investigated.

7.1 Sensitivity Analysis

A sensitivity analysis was performed by varying the values of one parameter (absorption coefficient or lifetime) in the target or heterogeneities while all the other parameters were kept constant. The objective of this exercise is to observe the variation of error function E for a small change in the parameter under study. Two cases were considered: (i) influence of the absorption coefficients on the target regions, and (ii) influence of the lifetimes on the target regions.

(a) Sensitivity of error function to absorption coefficient

The synthetic data at modulation frequency of 150 MHz, Φ _m , were calculated using a absorption coefficient of 0.02 cm ^-1 throughout the square except on the three target regions, as shown in Figure 1, where the absorption coefficient was 0.2 cm ^-1. Each heterogeneity consisted of four nodal points. The calculated values, Φ _c , was obtained using different absorption coefficients on the target regions that varied from 0.02 to 0.2 cm^-1 while all the remaining parameters were kept constant. The variation of the error function at modulation frequency of 150 MHz due to variation of the absorption coefficients is shown in Figure 5. The error function dependence of absorption varied little with modulation frequency.

Figure 5. Error function E as a function of the absorption coefficient of three target regions at 150 MHz.

Download Full Size | PDF

(b) Sensitivity of error function to lifetime

The synthetic Φ_m data at modulation frequencies of 15, 150, and 500 MHz, were obtained using a lifetime of 10 ns throughout the square except on the three target regions, as shown in Figure 1, where the lifetime was 1 ns. The calculated values, Φ_c, were obtained using different lifetimes on the target region which varied between 10 to 1 ns while all other parameters remained constant. The influence of modulation frequency upon the sensitivity of the error function to lifetime is illustrated in Figure 6. Similar sensitivities were obtained when the lifetime of the background was 1 ns and of the target regions was 10 ns. While the TN method successfully recovers absorption and lifetime maps at all modulation frequencies, the NCG method fails to recover lifetime maps at modulation frequencies less than 250MHz, where the variation of the error function with target lifetime is small. Regardless of whether the reconstruction involved lifetime or the product of lifetime and modulation frequency, the variation of the error function with lifetime is small when ω<250 MHz. Since the NCG method uses only the gradient of the error function while the TN method employs both the gradient and the Hessian of the error function, it may not be surprising that TN method performance is independent of modulation frequency.

Figure 6. Error equation E as a function of the lifetimes on three target regions

Download Full Size | PDF

7.2 Mathematical formulation

In addition to our evaluation of the gradients of the error function, we investigate the criteria of Wolfe’s condition for both TN and NCG methods. We start our evaluation by stating that the nonlinear conjugate gradient method is based upon the following assumption [34]:

Assumption 1: If E: ℜ ⁿ →ℜ is twice continuously differentiable and if there exists 0<m<M<∞ such that, for all x, d∊ℜ ⁿ ,

m {∥ d ∥}^{2} \leq 〈 d, G (x) d 〉 \leq M {∥ d ∥}^{2}

It should be noted the error function E(·) is either a function of the absorption coefficients, µ_a (r̄), or the lifetimes, τ(r̄). However x represents spatial distribution of either the absorption coefficients or the lifetimes.

Polak [34] proved the following theorem for the conjugate gradient method:

Theorem1 Suppose that Assumption above is satisfied and that ${x_{k}}_{k = 0}^{\infty}$ is a sequence constructed by the Polak-Ribiere Algorithm in Section 4.2 solving equation (5). Then,

(a) for all k∊N

〈 \nabla E (x_{k}), d_{k} 〉 \leq - \frac{m}{M} ∥ \nabla E (x_{k}) ∥ ∥ d_{k} ∥

and

(b) the sequence ${x_{k}}_{k = 0}^{\infty}$ converges to x̂, which is the unique minimizer of the error function E(·).

This theorem assumes the strict convexity of the error function E(·).

To investigate the performance of the NCG and the TN methods the following parameters were calculated for the problems 1, 2, and 3:

(a) the curvature direction d ^T G(x)d

(b) the length of the gradient vector ‖∇E(x)‖

(c) the Newton direction calculated by the truncated Newton method ‖d‖.

Table 6 gives typical values of the curvature directions for the problems 1, 2, and 3 in the first three iterations.

Table 6. Curvature directions of problems 1, 2, and 3 at modulation frequencies of 150 and 500 M

View Table | View all tables in this article

In problem 1, the curvature direction d ^T Gd is decreasing and positive, indicatry that a solution should exist. Since the function decreases along the direction d and the curvature direction is positive (and hence the Hessian is positive definite), the error function in problem 1 must also be convex. Thus the assumption of the NCG method applies as given in Equation (16). It is not surprising that, the NCG method as well as the TN method enable recovery of absorption.

As in problem 1, the curvature directions of problems 2 and 3 are decreasing and positive, but these values are very large. The lengths of the gradient vectors of each problem are given in Table 7. These gradient vectors are used for both TN and NCG optimization methods.

Table 7. Gradient vectors for problems 1, 2, and 3 at modulation frequency of 150 and 500 MHz.

View Table | View all tables in this article

The gradient vectors of problem 1 are decreasing and small compared to problems 2 and 3. As described in Section 6.1, the NCG method successfully recovers a map of absorption for solution to problem 1. The gradient vectors {‖g _k ‖k=1,2,3,…} of problems 2 and 3 are decreasing, bounded away from zero, but are large. The values of α calculated by the Armijo line search for both methods are given in Tables 8 and 9, respectively. The values of α used in the NCG method are zeroes in problem 2 at 150 MHz.

Table 8. α calculated in the TN optimization method

View Table | View all tables in this article

Table 9. α calculated in the NCG optimization method

View Table | View all tables in this article

We have from equation (10)

α_{k} = - \frac{{(τ)}_{k}}{d_{k}} α_{k} > 0

If dk is a descent direction, then E(τ_k +α d _k )<E(τ_k ), for small positive values of α. For this property we assume that the step length satisfies α>0 (as shown in Table 9). If the lifetimes (τ_k ) are greater than zero, then some of the Newton directions d _k should be negative. Since the directions of all the gradient vectors of problem 2 are negative at modulation frequencies less than 250 MHz (data at 150 and 500 MHz are listed in Table 10), problem 2 does not satisfy equation (18) or (10). It should be noted that the Newton direction is accepted as a search direction if it satisfies the Wolfe condition 1 to ensure that it has a significant component in the negative gradient direction.

Table 10. Direction of the gradient vectors at initial guess

View Table | View all tables in this article

The assumptions on Newton search direction d _k are that (i) sufficient descent must be produced, and (ii) the search direction is gradient related. Two assumptions for the step length are that (i) it must result in sufficient reduction in the function E, and (ii) it must not be too small. Wolfe conditions 2 and 3 (section 4.3) satisfy these assumptions associated with convergence. Let us investigate whether problem 2 interrogated at modulation frequencies of 200 and 500MHz satisfies the three Wolfe conditions (section 4.3) using different values of step length α. The step length is restricted to values between zero and one.

First we test if α=1 satisfies the Wolfe conditions. If it does not, then we try α with a very small number taken for instance from Table 9 as α=0.39×10^-18. We take the typical parameters, ε ₁=0.001 and ε ₂=ε ₃=0.1 for this analysis. Table 11 presents values of E(τ), E(τ+α d), and E(τ+2α d) for the minimum and maximum values of step size and τ equivalent to 1 ns. These error values are computed to assess the validity of Wolfe criteria 1, 2, and 3. Evaluation was performed at 200 MHz where the NCG failed to provide τ recovery, and at 500 MHz, where NCG recovery was successful. When the step size is maximum (i.e. α=1), problem 2 at 200 and 500 MHz modulation frequencies does not satisfyWolfe condition 2 because the reduction of error function is much higher than d ^T g (even when the parameter ε ₂=0.5.). When the step size is minimum, taken to be α=0.39×10^-18, problem 2 at 200 MHz does not satisfy Wolfe condition 3, i.e. that the rate of decrease of error function E in the direction d at (τ)_k+1 is not larger than some prescribed fraction of the rate of decrease in the direction d at (τ) _k . Whether the reconstruction parameter is ωτ or τ, Wolfe criteria 2 and 3 for the NCG method are not met.

Another reason for NCG failure pointed out in Section 7.1 may be that the variation of error function E is smaller at 200 MHz than that at 500 MHz (c.f. Table 11). Problem 2 does not converge at 200 MHz due to small variations of the error function, but does converge when the modulation frequency is greater than 200 MHz as discussed in Section 6.3. Powell and others have shown NCG may not converge for non-convex optimization problems and will fail due to round off for ill-conditioned problems [35].

Table 11. Values of error function E at different step lengths for Problem 2 for NCG

View Table | View all tables in this article

The TN algorithm is a trust region method, and for each iteration, it computes the Newton direction vectors {‖d _k ‖k=1, 2, 3,…} using the linear conjugate gradient method to approximately minimize the quadratic model of the error function given by equation (6) within a trust region. That is, the linear conjugate method calculates the Newton direction by solving the equation (7) with the gradient on the right hand side. Whenever the linear conjugate gradient method detects a nonascent negative curvature direction (a direction d _k such that g(x _k)^T d _k<0 and ${d_{k}}^{T}$ G(x _k )d _k <0), the minimization of the model is stopped and the new point is obtained by performing the trust region method along this direction (for detailed discussion see reference [18]). Once the Newton direction vectors are calculated, the unknown vectors are updated using the equation (10). The Newton directions calculated by TN for the specific problems described herein are given in Table 12.

Table 12. Newton directions calculated by the truncated Newton method of problems 1, 2, and 3

View Table | View all tables in this article

These values (Table 12) are the same order of magnitude as the lifetimes in problems 2 and 3. It is also important to note that when the Newton directions are calculated by the TN method as contained in Table 12, all Wolfe criteria are satisfied whereas Table 11 shows that this is not the case for NCG method. Finally, TN has the added advantage that it performs well when the error function is nonconvex [36].

8 Conclusions

It is seen that the NCG does not perform well for certain optical tomography scenarios. The NCG method gives slightly better images than those of the TN method when the absorption coefficients are the unknown quantities. The NCG method fails when fluorescent lifetimes are the unknown quantities and the modulation frequency is less than 250 MHz while the TN method reconstructed good images at these modulation frequencies. Since the variations of the error function are small at 200 MHz, the Wolfe conditions were not satisfied by the NCG method. When the modulation frequency is higher than 250 MHz the NCG method gives images similar to TN method. We suggest that similar careful consideration of the error function be considered when choosing an optimisation method for optical tomography. Since the TN method uses the gradients as well as the Hessian, this method out performs the NCG method that uses only the gradients when maps of fluorescence lifetime are being reconstructed.

Acknowledgements

This work was supported by the National Institutes of Health (R01 CA67167). We thank the members of the Photon Migration Laboratory as well as the constructive comments by the anonymous reviewers who helped to enhance our contribution.

References

1. Y. Q. Yao, Y. Wang, Y. L. Pei, W. W. Zhu, and R. L. Barbour, “Frequency-domain optical imaging of absorption and scattering distributions by Born iterative method,” J. Opt. Soc. Am A , 14, .325–342, (1997). [CrossRef]

2. D. Y. Paithankar, A. U. Chen, B. W. Pogue, M. S. Patterson, and E. M. Sevick-Muraca, “Imaging of fluorescent yield and lifetime from multiply scattered light re-emitted from tissues and other random media,” Appl. Opt , 36, 2260–2272, (1997). [CrossRef] [PubMed]

3. K. D. Paulsen and H. Jiang, “Enhanced frequency domain optical image reconstruction in tissues through total variation minimization,” Appl. Opt. 35, 3447–3458, (1996). [CrossRef] [PubMed]

4. H. B. Jiang, K. D. Paulsen, U. L. Osterberg, B. W. Pogue, and M. S. Patterson, “Optical image reconstruction using frequency domain data: Simulations and experiments,” J. Opt. Soc. Am. A 13, 253–266, (1996). [CrossRef]

5. M. A. O’Leary, D. A. Boas, B. Chance, and A. G. Yodh, “Experimental images of heterogeneous turbid media by frequency-domain diffusion-photon tomography,” Opt. Lett. 20, 426–428, (1995). [CrossRef]

6. S. R. Arridge, M. Schweiger, and D. T. Delpy, “Iterative reconstruction of near-infrared absorption images,” in Inverse Problems in Scattering and Imaging, M. A. Fiddy, ed., Proc. SPIE1867, 372–383, (1992).

7. R. Roy, “Image reconstruction from light measurement on biological tissue,’ Ph. D thesis, Hatfield, England, (1996).

8. S. R. Arridge and M. Schweiger, “A gradient-based optimization scheme for optical tomography”, Opt. Express ; 2, 213–226, (1998) http://www.opticsexpress.org/oearchive/source/4014.htm [CrossRef] [PubMed]

9. A. H. Hielscher, A. D Klose, and K. M. Hanson, “Gradient-based iterative image reconstruction scheme for time-resolved optical tomography,” IEEE Trans Med. Imag , 18, 262–271, (1999). [CrossRef]

10. A. D. Klose and A.H. Hielscher, “Iterative reconstruction scheme for optical tomography based on the equation of radiative transfer,” Med. Phy. 26, 1698–1707, (1999). [CrossRef]

11. M. V. Klibanov, T. R. Lucas, and R. M. Frank, “A fast and accurate imaging algorithm in optical diffusion tomography,” Inverse Problems , 13, 1341–1361, (1997). [CrossRef]

12. S. S. Saquib, K. M. Hanson, and G. S. Cunningham, “Model-based image reconstruction from time-resolved diffusion data,” in Medical Imaging: Image Processing, Proc. of the SPIE-The International Society for Optical Engineering , 3034, 369–380, (1997).

13. R. Roy and E. M. Sevick-Muraca, “Truncated Newton’s optimization scheme for absorption and fluorescence optical tomography: Part I theory and formulation,” Opt. Express , 4, 353–371, (1999). http://www.opticsexpress.org/oearchive/source/9268.htm [CrossRef] [PubMed]

14. R. Roy and E. M. Sevick-Muraca, “Truncated Newton’s optimization scheme for absorption and fluorescence optical tomography: Part II Reconstruction from synthetic measurements,” Opt. Express , 4, 372–382, (1999). http://www.opticsexpress.org/oearchive/source/10447.htm [CrossRef] [PubMed]

15. R. Roy and E. M. Sevick-Muraca, “Active constrained truncated Newton method for simple-bound optical tomography,” J. Opt. Soc. Am. A 17, 1627–1641, (2000). [CrossRef]

16. S. R. Arridge, “Optical tomography in medical imaging,” Inverse Problems , 15, R41–R93, (1999). [CrossRef]

17. R. S. Dembo and T. Steihaug, “Truncated Newton algorithms for large-scale unconstrained optimisation,” Math. Programming , 26, 190–212, (1983). [CrossRef]

18. L. C. W. Dixon and R. C. Price, “Numerical experience with the truncated Newton method for unconstrained optimization,” J. optimization theory and applications. 56, 245–255, (1988). [CrossRef]

19. T. Steihaug, “The conjugate gradient method and trust region in large scale optimisation,” SIAM J. Numerical analysis , 20, 626–637, (1983). [CrossRef]

20. M. R. Hestenes and E. Stiefel, ‘Methods of conjugate gradients for solving linear systems,” J. Res. Nat. Bur. Stand. 49, 409–436 (1952). [CrossRef]

21. R. Fletcher and C. M. Reeves, “Function minimization by conjugate gradients,” Computer J. , 7, 149–154 (1964). [CrossRef]

22. S. Bazaraa, H. D. Sherali, and C. M. Shetty, “Nonlinear programming theory and algorithms,” John Wiley & Sons Inc, New York, (1993).

23. R. Fletcher, “Practical methods of optimisation,” Second edition, John Wiley & Sons, Chichester, (1996).

24. M. J. D. Powell, “Restart procedures for the conjugate gradient method,” Math. Programming , 12, 241–254, (1977). [CrossRef]

25. A. Griewank, “On automatic differentiation,” edited M. Iri and K. Tanaka, Mathematical programming: Recent developments and application, Kluwer Academic Publishers, 83–108, (1989).

26. B. Christianson, A. J. Davies, L. C. W. Dixon, R. Roy, and P. van der Zee, “Giving reverse differentiation a helping hand,” Opt. Meth. And Software , 8, 53–67, (1997). [CrossRef]

27. A. J. Davies, B. Christianson, L. C. W. Dixon, R. Roy, and P. van der Zee, “Reverse differentiation and the inverse diffusion problem,” Adv. In Eng. Software , 28, 217–221, (1997). [CrossRef]

28. A. Ishimaru, Wave propagation and scattering in random media, Academic Press, New York, (1978).

29. O. C. Zienkiewcz and R. L. Taylor, The finite element methods in engineering science, McGraw-Hill, New York, (1989).

30. L. Armijo, “Minimization of functions having Lipschitz continuous first partial derivatives,” Pacific J. Mathematics , 16, 1–3, (1966).

31. J. C. Gilbert and J Nocedal, “Global convergence properties of conjugate gradient methods for optimization,” Report no. 1268, INRIA, (1990)

32. P. Wolfe, “Convergence condition for ascent methods,” SIAM Rev , 11, 226–253, (1969). [CrossRef]

33. H. Jiang, “Frequency-domain fluorescent diffusion tomography: a finite-element-based algorithm and simulations,” Appl. Opt. 37, 5337–5343, (1998). [CrossRef]

34. E. Polak, Optimization, algorithms and consistent approximation, Springer-Verlag, New York, (1997)

35. M. J. D. Powell, “Nonconvex minimization calculations and the conjugate gradient methods,” Lecture Notes in Math. 1066,Spriger-Verlag, New York, 122–141, (1984).

36. S. G. Nash, “A survey of truncated-Newton methods,” J. of Computational and Applied Math. 124, 45–59, (2000). [CrossRef]

	background
Unknown variables	$μ_{a_{xi}}$ cm⁻¹	$μ_{a_{xf}}$ cm⁻¹	$μ_{s_{x}}^{'}$ cm⁻¹	$μ_{s_{m}}^{'}$ cm⁻¹	τ ns	ϕ
Problem 1, $μ_{a_{xf}}$	0.0	0.02	10.0	0.02	10	0.034
Problem 2 τ	0.0	0.02	10.0	0.02	1	0.034
Problem 3 τ	0.0	0.02	10.0	0.02	10	0.034

	Target1	Target2	Target3
Unknown variables	$μ_{a_{xf}}$ cm⁻¹	τ ns	$μ_{a_{xf}}$ cm⁻¹	τ ns	$μ_{a_{xf}}$ cm⁻¹	τ ns
Problem 1 $μ_{a_{xf}}$	0.2	10	0.1	8	0.05	5
Problem 2 τ	0.2	10	0.1	8	0.05	5
Problem 3 τ	0.2	1	0.1	8	0.05	5

Iteration Number	Problem 1 150 MHz	Problem 2 150 MHz	Problem 3 150 MHz	Problem 2 500 MHz	Problem 3 500 MHz
1	38271	0.3143×10³²	0.2040×10³²	0.2406×10³⁴	0.6244×10³⁰
2	1048	0.4732×10³⁰	0.4390×10³⁰	0.1001×10³⁴	0.1598×10²⁹
3	810	0.7994×10²⁹	0.5054×10²⁹	0.3414×10³⁰	0.2551×10²⁸

Iteration Number	Problem 1 150 MHz	Problem 2 150 MHz	Problem 3 150 MHz	Problem 2 500 MHz	Problem 3 500 MHz
1	14.04	0.1642×10¹⁵	0.1339×10¹⁵	0.9457×10¹⁷	0.1339×10¹⁵
2	3.530	0.3103×10¹⁵	0.2875×10¹⁴	0.8770×10¹⁷	0.2875×10¹⁴
3	2.630	0.1164×10¹³	0.9534×10¹³	0.8141×10¹⁷	0.9534×10¹³

Iteration Number	Problem 1 150 MHz	Problem 3 150 MHz	Problem 2 500MHz	Problem 3 500 MHz
1	0.44×10^-3	0.25×10^-18	0.39×10^-18	0.35×10^-18
2	0.14×10^-3	0.25×10^-18	0.19×10^-18	0.29×10^-18
3	0.72×10^-4	0.13×10^-18	0.33×10^-18	0.15×10^-18

A numerical study of gradient-based nonlinear optimization methods for contrast enhanced optical tomography

Abstract

1. Introduction

2. Governing equations for intensity-modulated excitation and emission light propagation

2.1 Galerkin finite element formulation of excitation and emission equations

2.2 Synthetic data sets for reconstruction

3. Inverse problemof recovering optical properties

4 Gradient based methods

4.1 The truncated Newton method (TN)

4.2 Nonlinear conjugate gradient method (NCG)

4.3 Line search for both TN and NCG

5. Approach and Methods

6. Results and Discussion

6.1 Absorption coefficient imaging from synthetic fluorescence measurements

6.2 luorescence imaging from synthetic fluorescence measurements with increased decay kinetics

6.3 Fluorescence lifetime imaging from synthetic fluorescence measurements with quenching

7 Relative performance of TN and NCG

7.1 Sensitivity Analysis

(a) Sensitivity of error function to absorption coefficient

(b) Sensitivity of error function to lifetime

7.2 Mathematical formulation

8 Conclusions

Acknowledgements

References

Cited By

Figures (6)

Tables (12)

Equations (19)

Optics Express

Modulation frequency	Problem 2 150 MHz		Problem 3 150 MHz
	Positive	Negative	Positive	Negative
150 MHz	0	289	289	0
500 MHz	43	246	289	0

MHz	Alpha α	E(τ)	E(τ+α d)	E(τ+2α d)	Wolfe conditions
200	1.0	0.02260	2.52711	2.54711	1 satisfied
					2 not satisfied
					3 satisfied
200	0.39×10^-18	0.02260	0.02039	0.02018	1 satisfied
					2 satisfied
					3 not satisfied
500	1.0	0.2222	35.35	35.35	1 satisfied
					2 not satisfied
					3 satisfied
500	0.39×10^-18	0.2222	0.0601	0.2388	1 satisfied
					2 satisfied
					3 satisfied

Iteration Number	Problem 1 150 MHz	Problem 2 150 MHz	Problem 3 150 MHz	Problem 2 500 MHz	Problem 3 500 MHz
1	0.7225×10^-1	0.1409×10^-9	0.1177×10^-9	0.8530×10^-5	0.1294×10^-7
2	0.7828×10^-1	0.7824×10^-10	0.6972×10^-10	0.8051×10^-5	0.6687×10^-7
3	0.2639×10^-1	0.1328×10^-9	0.8905×10^-10	0.7805×10^-5	0.2357×10^-7