## Abstract

Optical diffraction tomography relies on solving an inverse scattering problem governed by the wave equation. Classical reconstruction algorithms are based on linear approximations of the forward model (Born or Rytov), which limits their applicability to thin samples with low refractive-index contrasts. More recent works have shown the benefit of adopting nonlinear models. They account for multiple scattering and reflections, improving the quality of reconstruction. To reduce the complexity and memory requirements of these methods, we derive an explicit formula for the Jacobian matrix of the nonlinear Lippmann-Schwinger model which lends itself to an efficient evaluation of the gradient of the data-fidelity term. This allows us to deploy efficient methods to solve the corresponding inverse problem subject to sparsity constraints.

© 2017 Optical Society of America

## 1. Introduction

Optical diffraction tomography (ODT) was introduced in [1] by E. Wolf in the late ’60s. It is a microscopic technique that retrieves the distribution of refractive indices in biological samples out of holographic measurements of the scattered complex field produced when the sample is illuminated by an incident wave. This method is of particular interest in biology because, contrarily to fluorescence imaging, it does not require any staining of the sample [2]. It proceeds by solving an inverse scattering problem, where the scattering phenomenon is governed by the wave equation. There is a vast literature on inversion methods going from linearized models (Born, Rytov) [1, 3] to nonlinear ones [4–7]. It is worth noting that the scattering model, along with its associated inverse problem, is generic and not limited to optical diffraction tomography. In particular, it is encountered in many other fields such as acoustics, microwave imaging, or radar applications [8].

#### 1.1. From the wave equation to the Lippmann-Schwinger integral equation

Let us consider an unknown object of refractive index *n*(**x**) lying in the region Ω ⊆ ℝ* ^{D}* (

*D*∈ {2, 3}) and being immersed in a medium of refractive index

*n*

_{b}, as depicted in Fig. 1. This sample is illuminated by the incident plane wave

**k**∈ ℝ

*specifies the direction of the wave propagation,*

^{D}*ω*∈ ℝ denotes its angular frequency, and

*u*

_{0}∈ ℂ defines its complex envelope (amplitude). The resulting total electric field

*u*(

**x**,

*t*) satisfies the wave equation

^{8}m/s is the speed of light in free space. Denoting by

*u*(

**x**) the complex amplitude of $u(\mathbf{x},t)=\mathrm{Re}\left(u(\mathbf{x}){e}^{-i\omega t}\right)$ and substituting it into Eq. (2), we obtain the inhomogeneous Helmholtz equation with the propagating constant in free space

*k*

_{0}=

*ω*/c. The total field

*u*(

**x**) is the sum of the scattered field

*u*

^{sc}(

**x**) and of the incident field

*u*

^{in}(

**x**), which is itself a solution of the homogeneous Helmholtz equation $\mathbf{\nabla}{u}^{\text{in}}(\mathbf{x})+{k}_{0}^{2}{n}_{\mathrm{b}}^{2}{u}^{\text{in}}(\mathbf{x})=0$. Accordingly, Eq. (3) can be rewritten as (see [1])

*g*(

**x**) is the Green’s function of the shift-invariant differential operator $({\mathbf{\nabla}}^{2}+{k}_{0}^{2}{n}_{\mathrm{b}}^{2}\mathbf{I})$. Specifically,

*g*verifies ${\mathbf{\nabla}}^{2}g(\mathbf{x})+{k}_{0}^{2}{n}_{\mathrm{b}}^{2}g(\mathbf{x})=-\delta (\mathbf{x})$, where

*δ*is the Dirac distribution and the minus sign is a convention used in physics. Under Sommerfeld’s radiation condition,

*g*(

**x**) is given by [9, and references therein]

*u*(

**x**) is governed by the Lippmann-Schwinger equation

#### 1.2. Inverse ODT problem: prior work

Let the object be illuminated by a series of incident fields ${u}_{p}^{\text{in}}(\mathbf{x})$, *p* ∈ [1 … *P*]. Records of the resulting total fields *u _{p}* (

**x**) at positions

**x**

*(*

_{m}*m*∈ [1 …

*M*]) in the detector plane Γ are denoted

**y**

*∈ ℂ*

_{p}*(see Fig. 1). The objective is then to retrieve the scattering potential function*

^{M}*f*(

**x**) (

*i.e.*, the refractive index

*n*(

**x**)) from the data

**y**

*. Pioneering methods were using linear approximations of the model. For instance, assuming that the scattering field is weak compared to the incident one (*

_{p}*i.e., u*

^{sc}≪

*u*

^{in}), one can interpret the phase of the transmitted wave as the Radon transform of the refractive index and then reconstruct

*f*using the filtered-back-projection algorithm [10, 11]. This method ignores the effect of diffraction. The first Born approximation [1] has then been proposed as a refined model. Its validity is however limited to thin samples with weak variations of their refractive index (RI) [12]. A more accurate linearization, less sensitive to the thickness of the sample but still limited to weak RI contrasts, is given by the Rytov approximation [3, 13]. It is derived by assuming that the total field has the form

*u*(

**x**) =

*u*

^{in}(

**x**)e

^{ϕ}^{(}

^{x}^{)}, where

*ϕ*(

**x**) is a complex phase function. Both Born and Rytov approximations have been originally used to derive direct inversion methods. They were later used within regularized variational approaches to improve the quality of reconstructed images [14,15].

Inversion methods that use a nonlinear model have been shown to significantly improve the accuracy of reconstruction. These include the conjugate-gradient method (CGM) [16, 17], the contrast source-inversion method (CSI) [18], the beam-propagation method (BPM) [19], the recursive Born approximation [5], or the hybrid method proposed in [4]. Although still approximate (for instance, they do not properly take reflections into account), they more closely adhere to the model of the physical phenomenon than the linear models, at the price of a higher computational cost. We refer the reader to [2] for additional details concerning existing approximations, regularizations, algorithms, and comparisons.

To address applications with thick samples and large RI contrasts, a better solution is to rely on the exact Lippmann-Schwinger model which accounts for mutiple scattering and reflections. Such an approach has been recently proposed in [6,7] (SEAGLE algorithm). There, the authors tackle the problem from a variational perspective. They minimize a nonconvex objective using the well known fast iterative shrinkage-thresholding algorithm (FISTA) [20]. Their main contribution is to compute the forward model (which itself requires the inversion of an operator) using Nesterov’s accelerated gradient-descent (NAGD) method [21] and, more interestingly, to explicitly compute the gradient of the quadratic data-fidelity term as an error-backpropagation of the forward algorithm. However, the bottleneck of their method is its high memory consumption. Indeed, the error-backpropagation strategy requires one to store all the iterates produced during the computation of the iterative forward model. This can be limiting for large 3D volumes.

#### 1.3. Contributions

To improve the computational efficiency of solvers such as SEAGLE, we provide an explicit expression for the Jacobian of the nonlinear Lippmann-Schwinger operator. This results in an efficient method to compute the gradient of the data-fidelity term and avoid recoursing to the memory-consuming error-backpropagation strategy. Another advantage is that the computation of the forward model and of the gradient are now decoupled. They can thus be solved using any numerical scheme. Then, considering simulated data, we show that the proposed method results in a significant reduction of both computational time and memory requirements with respect to SEAGLE, at no loss in quality.

In Section 2.1, we formulate the discrete forward model proposed in [7]. Then, the common approach used to solve the inverse problem subject to sparsity constraints is presented in Section 2.2. There, we highlight our main innovation with respect to SEAGLE, which is a new computation of the gradient of the data-fidelity term. It relies on the derivation of the Jacobian of the forward model, which is given by Proposition 3.1. Finally, Sections 4 and 5 are dedicated to numerical comparisons.

## 2. Solving the inverse problem

#### 2.1. Formulation of the forward model

In this section, we review the formulation of the forward model that was proposed by Liu *et al.* in [7]. Let the region of interest Ω be divided into *N* ∈ ℕ “pixels”. Then, over Ω, we define the discrete version of Eq. (7) as

**u**

*∈ ℂ*

_{p}*, ${\mathbf{u}}_{p}^{\text{in}}\in {\u2102}^{N}$,*

^{N}**f**∈ ℝ

*are the discrete representations of*

^{N}*u*, ${u}_{p}^{\text{in}}$, and

_{p}*f*, respectively. The diagonal matrix

**diag**(

**f**) ℝ

^{N}^{×}

*is formed out of the entries of*

^{N}**f**, while

**G**∈ ℂ

^{N}^{×}

*stands for the matrix of the convolution operator on Ω (convolution with*

^{N}*g*). One can notice that Eq. (8) is nonlinear with respect to

**f**. On the other hand, given ${\mathbf{u}}_{p}^{\text{in}}$ and

**f**, the computation of

**u**

*amounts to inverting the operator (*

_{p}**I**−

**G diag**(

**f**)). Instead of attempting to compute this inverse directly, the ODT forward model on Ω, for a given

**f**, is defined as

**u**

*(*

_{p}**f**) (inside Ω), we get measurements

**y**

*on Γ using a different discretization $\tilde{\mathbf{G}}\in {\u2102}^{M\times N}$ of the Green’s function (see [7])*

_{p}#### 2.2. Common optimization strategy

Following the classical variational approach, the estimation of **f** ∈ ℝ* ^{N}* from the measurements {

**y**

*∈ ℂ*

_{p}*}*

^{M}

_{p}_{∈[1 …}

_{P}_{]}is formulated as the optimization problem

*μ*> 0 balances between these two terms. It is customary to consider the data term where ∀

*p*∈ [1 …

*P*]

**u**

*(*

_{p}**f**) is given by Eq. (9). As regularizer $\mathcal{R}$, the combination

*i*

_{⩾0}(

**f**) = {0, if

**f**

*≥0 ∀*

_{n}*n*; +∞, otherwise} and ∂

*denotes the gradient operator along the*

_{d}*d*th direction. This choice is supported by the facts that we consider situations where

*n*

_{b}≤

*n*(

**x**) ⇒

*f*(

**x**) ≥ 0 and that

*n*and, thus,

*f*can be assumed to be piecewise-constant. It is worth noting that Eq. (11) is nonconvex due to the nonlinearity of the forward operator in Eq. (8). However, since $\mathcal{D}$ is smooth with respect to

**f**, Eq. (11) can be solved by deploying a forward-backward splitting (FBS) method [22] or some accelerated variants [20, 23], as presented in Algorithm 1. The gradient of the data-fidelity term $\mathcal{D}$ is given by

*α*)

^{k}

_{k}_{∈ℕ.}Its convergenceis guaranteed in the convex case when $\gamma <1/\text{Lip}(\mathbf{\nabla}\mathcal{D})$, where $\text{Lip}(\mathbf{\nabla}\mathcal{D})$ is the Lipschitz constant of $\mathbf{\nabla}\mathcal{D}$. In the nonconvex case, a local convergence of the classical FBS algorithm can be shown [24]. Although, to the best of our knowledge, there exists no theoretical proof of convergence of accelerated versions for nonconvex function, Algorithm 1 always converged in our experiments.

### 2.2.1. Computation of ${\mathbf{J}}_{{h}_{p}}^{H}(\mathbf{f})$

The computation of ${\mathbf{J}}_{{h}_{p}}^{H}(\mathbf{f})$, required at line 5 of Algorithm 1, is challenging. The existence of a closed-form solution is made unlikely by the fact that the forward model in Eq. (8) itself requires one to invert an operator. We distinguish two distinct strategies.

- SEAGLE: Build an error-backpropagation rule from the NAGD algorithm used to compute the forward model Eq. (9).
- Ours: Derive an explicit expression of ${\mathbf{J}}_{{h}_{p}}(\mathbf{f})$, as given in Section 3 (Proposition 3.1).

### 2.2.2. Computation of ${\text{prox}}_{\gamma \lambda \mathcal{R}}$

Numerous methods have been proposed to compute the proximity operator of $\mathcal{R}$, [25–27]. In SEAGLE, Liu *et al.* use the algorithm proposed by Beck and Teboulle [25]. Here, we compute it using the popular alternating-direction method of multipliers (ADMM) [28], which is well suited to the minimization of the sum of three convex functions. Moreover, it provides a high modularity for spatial regularization since one can easily change from one regularizer (*e.g.*, TV) to another (*e.g.*, Hessian Shatten-norm [29]). Details about the computation of $\mathcal{R}$ are provided in Appendix A.

### 2.2.3. Speedup strategies

The cost of evaluating the forward model with Eq. (9) and the gradient $\mathbf{\nabla}\mathcal{D}$ is proportional to the number *P* of illuminations ${\mathbf{u}}_{p}^{in}$. However, these computations can easily be parallelized by performing the computation for each illumination (or each element of the sum in Eq. (15)) on a separate thread. Moreover, in the spirit of the stochastic gradient-descent algorithm [30], we approximate $\mathbf{\nabla}\mathcal{D}$ as

*ω*is a subset of [1 …

*P*]. We change

*ω*at each iteration. Such a method is known to spare many computations when $\mathbf{\nabla}{\mathcal{D}}_{p}$ does not admit a simple-form expression.

## 3. Efficient computation of the gradient $\mathbf{\nabla}\mathcal{D}$

The error-backpropagation strategy used in SEAGLE to compute ${\mathbf{J}}_{{h}_{p}}^{H}(\mathbf{f})$ implies that one must store all the forward iterates. This consumes memory resources and compromises the deployment of the method for large 3D data. Instead, Proposition 3.1 reveals that its computation requires one to invert the operator (**I** − **diag**(**f**)**G*** ^{H}*). This operator has the same form (and size) that the operator we invert within the forward computation in Eq. (9) and both can be computed in a similar way, using an iterative algorithm. Moreover, it allows us to decouple the forward and gradient computation in Algorithm 1, which has the two following advantages:

- choice of any iterative algorithm for computing Eq. (9) at line 4 of Algorithm 1, and computing ${\mathbf{J}}_{{h}_{p}}^{H}(\mathbf{f})$ at line 5 of Algorithm 1 (see Section 4.2);
- reduction of the memory consumption (no needs for storing forward iterates).

**Proposition 3.1.** *The Jacobian matrix of the function h _{p} in Eq. (17) is given by*

*Proof*. We use the Gâteaux derivative in the direction

**v**∈ ℝ

*given by*

^{N}Then, from Eq. (8), we get that

## 4. Algorithm analysis

#### 4.1. Memory requirement

In this section, we elaborate on the memory consumption of the proposed method in comparison with SEAGLE. First, let us state that gradient based methods, such as NAGD or CG, have similar memory requirements. It corresponds roughly to three times the size of the optimization variable which is the part that is common to both algorithms. The additional memory requirement that is specific to SEAGLE relies only on the storage of the NAGD iterates during the forward computation. Suppose that *K*_{NAGD} ∈ ℕ iterations are necessary to compute the forward model with Eq. (9) and that the region Ω is sampled over *N* ∈ ℕ pixels (voxels, in 3D). Since the total field **u*** _{p}* (

**f**) computed by NAGD is complex-valued, each pixel is represented with 16 bytes (double precision for accurate computations). Hence, the difference of memory consumption between SEAGLE and our method is

*K*

_{NAGD}intermediate iterates of NAGD. Here, we assumed that $\mathbf{\nabla}\mathcal{D}$ was computed by sequentially adding the partial gradients $\mathbf{\nabla}{\mathcal{D}}_{p}$ associated to the

*P*incident fields. Hence, once the partial gradient associated to one incident angle is computed by successively applying the forward model (NAGD) and the error-backpropagation procedure, the memory used to store the intermediate iterates can be recycled to compute the partial gradient associated to the next incident angle. However, when the parallelization strategy detailled in Section 2.2.3 is used, the memory requirement is mutiplied by the number

*N*

_{Threads}∈ ℕ of threads, so that

*N*

_{Threads}partial gradients in parallel requires

*N*

_{Threads}times more memory.

For illustration, we give in Fig. 2 the evolution of Δ_{Mem} as a function of *N* for different values of *K*_{NAGD} and *N*_{Threads}. One can see with the vertical dashed lines that, for 3D volumes, the memory used by SEAGLE quickly reaches several tens of Megabytes, even for small volumes (*e.g.*, 128 × 128 × 128), to hundreds of Gigabytes for the larger volumes that are typical of microscopy (*e.g.*, 512 × 512 × 256). This shows the limitation of SEAGLE for 3D reconstruction in the presence of a shortage of memory resources and reinforces the interest of the proposed alternative.

#### 4.2. Conjugate gradient vs. Nesterov accelerated gradient descent for Eq. (9)

Due to Proposition 3.1, we can compute both Eq. (9) and ${J}_{{h}_{p}}^{H}(\mathbf{f})$ using any state-of-the-art quadratic optimization algorithm. This contrasts with SEAGLE, where one must derive the error-backpropagation rule from the forward algorithm, which may limit its choice. We now provide numerical evidence that GC is more efficient than NAGD for solving Eq. (9). To this end, we consider a circular object (bead) of radius *r*_{bead} and refractive index *n*_{bead} immersed into water (*n*_{b} = 1.333), as presented in Fig. 3 (top-left). In such a situation, an analytic expression of the total field is provided by the Mie theory [31,32]. Hence, at each iteration *k*, we compute the relative error *ε _{k}* of the current estimate

**u**

*to the Mie solution*

^{k}**u**

_{Mie}as

*λ*= 406 nm. The region of interest is square with a side length of 16

*λ*(see top-left panel of Fig. 3). It is sampled using 1,024 points along each side. We used a fine grid in order to limit the impact of numerical errors related to discretization. The wave source corresponds to the bottom border of this region. Then, as in [6, 7], we refer to the refractive index

*n*

_{bead}by its contrast with respect to the background medium, defined as $\mathrm{max}(\left|\mathbf{f}\right|)/\left({k}_{0}^{2}{n}_{\mathrm{b}}^{2}\right)$. We show in Fig. 4 the evolution of ${k}_{{\epsilon}_{0}}$, which is the number of iterations needed to let the relative error Eq. (28) fall below

*ε*

_{0}= 10

^{−2}. One can observe that CG is much more efficient than NAGD, in particular for large contrasts. This is not negligible since an evaluation of the forward model is required at each iteration of Algorithm 1 (line 4). Our comparison in terms of a number of iterations is fair because the computational cost of one iteration is the same for both algorithms. Note that the descent step of NAGD was adapted during the iterations following the same rule as in [6, 7].

Finally, the solution obtained with the two algorithms for *r*_{bead} = 3*λ* and a contrast of 1 are shown in Fig. 3. The analytic Mie solution is also provided for comparison. From these figures, one can appreciate the high accuracy obtained by solving Eq. (9), as first demonstrated in [6, 7].

## 5. Numerical experiments

This section is devoted to numerical experiments that illustrate the two main advantages of the proposed method over SEAGLE, which consist in a reduced computational cost and a reduced memory consumption. The algorithms have been implemented using an inverse-problem library developed in our group [33] (*GlobalBioIm*: http://bigwww.epfl.ch/algorithms/globalbioim/). Hence, they share the implementation of the overall FISTA algorithm as well as inner procedures such as the computation of the proximity operator of ℛ (see Appendix A). The only difference between the two methods resides in the computation of the forward model in Eq. (9) and ${J}_{{h}_{p}}^{H}(\mathbf{f})$. For SEAGLE, this is performed using the NAGD algorithm and an error-backpropagation strategy. For our method, Eq. (9) and ${J}_{{h}_{p}}^{H}(\mathbf{f})$ are computed using the CG algorithm, in accordance with Proposition 3.1. Note that no parallelization is used. Reconstructions are performed with MATLAB 9.1 (The MathWorks Inc., Natick, MA, 2000) on a PowerEdge T430 Dell computer (Intel Xeon E5-2620 v3).

#### 5.1. Simulated data

### 5.1.1. Simulation settings

The Shepp-Logan phantom of Fig. 5 has the contrast $\mathrm{max}(\left|\mathbf{f}\right|)/\left({k}_{0}^{2}{n}_{\mathrm{b}}^{2}\right)=0.2.$ It is immersed into water (*n*_{b} = 1.333). The wavelength of the incident plane waves is *λ* = 406 nm. We consider thirty-one incident angles, from −60° to +60^{°}. The sources are placed at the bottom side of the sample, at a distance of 16.5*λ* from its center. Moreover, we consider two detectors placed on both top and bottom sides of the object, also at a distance of 16.5*λ* from its center. Hence, the overall region is a square of length 33*λ* per side. Data are simulated using a fine discretization of this region, with a (1024 × 1024) grid that leads to square pixels of surface (3.223 10^{−2}*λ*)^{2}. We used a large number of CG iterations to get the accurate simulation mentioned in Section 4.2. Then, the measurements were extracted from the first and last rows of each total field associated to the incident fields. This lead to a total of (31 × 2 × 1024) measurements. Finally, we defined three ODT problems by downsizing (using averaging) the (31 × 2 × 1024) measurements to grids with size of (31 × 2 × 512), (31 × 2 × 384), and (31 × 2 × 256).

This setting corresponds to an ill-posed and highly scattering situation. Moreover, the detector length is only two times larger than the object, which results in a loss of information for large incident angles. This makes the resulting inverse problem challenging.

### 5.1.2. Algorithm parameters

For each simulated OTD problem, we considered a square region of interest Ω with sides half the sources–detector distance. That corresponds to images of size (256 × 256) with pixels of area (6.445 · 10^{−2}*λ*)^{2}, (192 × 192) with pixels of area (8.839 · 10^{−2}*λ*)^{2}, and (128 × 128) with pixels of area (1.289 · 10^{−1}*λ*)^{2}. The support of the phantom is·fully contained in Ω.

Then, to compute the gradient (stochastic-gradient strategy), we selected eight angles over the thirty-one that were available and changed this selection at each iteration (see Section 2.2.3).

The NAGD or CG forward algorithms are stopped either after hundred-twenty iterations or when the relative error between two iterates is below 10^{−4}. Finally, two-hundred iterations of FISTA are performed with a descent step fixed empirically to *γ* = 5 · 10^{−3}. We used the regularization parameter *μ* = 3.3 · 10^{−2}.

### 5.1.3. Metrics

We compared the two methods in terms of running time and memory consumption, as measured by the peak memory (maximum allocated memory) reached by each algorithm during execution. The outcome is reported in Table 1. Once again, due to the use of our inverse-problem library [33], the comparison of the two methods is fair because their implementations differ only by the forward algorithm and by the computation of ${J}_{{h}_{p}}^{H}(\mathbf{f})$. Moreover, CG and NAGD are implemented in the same fashion since they inherit the same optimization class of our inverse-problem library. Finally, we also provide the SNR of the reconstructed refractive index and observe that the computational gain comes at no cost in quality.

### 5.1.4. Discussion

Our proposed alternative to SEAGLE allows us to reduce both time and memory. Moreover, we have measured the peak memory difference Δ_{Mem} between the two methods and superimposed it on the predictions of Fig. 2 where the adequacy between the theoretical curves and the measured points is remarkable. Hence, although our experiments are restricted to 2D data, where the gap between the two algorithms is moderate, the evolution of Δ_{Mem} for 3D data can be extrapolated from Fig. 2. This shows the relevance of our method when size increases.

The SNR values given in Table 1 as well as the reconstructions presented in Fig. 6 suggest that the two methods perform similarly in terms of quality. This is not surprising since the overall algorithm is the same, the differences residing merely in the computation of the forward model in Eq. (9) and the Jacobian ${J}_{{h}_{p}}(\mathbf{f})$. Moreover, one can observe that the quality of reconstruction decreases when the discretization grid becomes coarser. Indeed, the model is insufficiently accurate when the discretization is too poor. For instance, in the case of the (128 × 128) grid, one wavelength unit is discretized using eight pixels, which is clearly detrimental to the accuracy of the forward model.

Reconstructions for the (256×256) problem are presented in Fig. 6 for completeness. Besides the difficulty of the considered scenario, the two methods are able to retrieve most details of the object in comparison with the Rytov approximation. Artifacts are mainly due to the missing-cone problem and to the limited length of the detector. This corroborates the findings of [7].

#### 5.2. Real data

We evaluated our method using the *FoamDielExt* target (TM polarisation) of the Institut Fresnel’s public database [34]. The data were collected for the two-dimensional inhomogeneous sample depicted in the left panel of Fig. 7. The permitivity of the ground truth was measured experimentally and is subject to uncertainties [34]. The object is fully contained in a square region of length 15 cm per side, which we discretize using a 256 × 256 grid. Sensors were distributed circularly around the object, at a distance of 1.67 m from its center, and with a step of 1°. Eight sources, uniformly distributed around the object, were sequentially activated. For each activated source, the sensors closer than 60° from the source were excluded. Thus, 241 detectors among the 360 available were used for each source. Frequencies from 2 to 10 GHz with a step of 1 GHz are available in the database but we used only the 3 GHz measurements (*i.e.*, *λ* = 10 cm).

The NAGD or CG forward algorithms are stopped either after two-hundred iterations or when the relative error between two iterates is below 10^{−6}. Hundred iterations of FISTA are performed with a descent step *γ* = 5 · 10^{−3}. We used the regularization parameter *μ* = 1.6 · 10^{−2}.

In Fig. 7, we see that both methods provide good reconstructions that are essentially indistinguishable (see also SNR values provided in the caption of the figure). This corroborates the simulated numerical experiments of Section 5.1. The main point here is that, for this setting, the proposed method was 15 times faster than SEAGLE.

## 6. Conclusion

We have presented a refinement of the SEAGLE algorithm that was recently proposed in [6, 7] and that has shown unprecedented reconstructions for difficult configurations. However, the current limitation of SEAGLE is that its memory requirements increase excessively with the size of the problem, particularly in 3D. As an alternative, we have derived the explicit expression of the Jacobian matrix ${J}_{{h}_{p}}(\mathbf{f})$ of the nonlinear Lippman-Schiwnger model and shown that it can be computed in a direct analogy with the computation of the forward model. This approach allows us to drastically reduce the memory consumption and opens the door to 3D reconstruction using desktop computers. Moreover, the proposed method is quite flexible in the sense that it can cope with any iterative algorithm employed to compute either the forward model or ${J}_{{h}_{p}}^{H}(\mathbf{f})$. For instance, the conjugate-gradient algorithm proved its efficiency for this task. It allows a significant decrease of the computational time with respect to SEAGLE. Finally, these improvements in terms of speed and memory come at no loss in quality.

## A. Proximity operator of ℛ

In this appendix, we describe how we compute the proximity operator of the regularization term ℛ in Eq. (14) using the ADMM algorithm [28]. The proximity operator is defined [35] as the solution of the optimization problem

*μ*> 0. Let us start by reformulating Eq. (29) as

*ρ*

_{1}and

*ρ*

_{2}are positive scalars, and where

**w**

_{1}∈ ℝ

^{N}^{×}

*and*

^{D}**w**

_{2}∈ ℝ

*are the Lagrangian multipliers. Then, one can minimize Eq. (31) using ADMM. The iterates are summarized in Algorithm 2.*

^{N}For the sake of completeness, we provide in Eq. (32) and Eq. (33) the expressions

## Funding

European Research Council (ERC) (692726)

## Acknowledgments

This research was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, Grant Agreement no 692726 “GlobalBioIm: Global integrative framework for computational bio-imaging.”

## References and links

**1. **E. Wolf, “Three-dimensional structure determination of semi-transparent objects from holographic data,” Opt. Commun. **1**, 153–156 (1969). [CrossRef]

**2. **D. Jin, R. Zhou, Z. Yaqoob, and P. T. So, “Tomographic phase microscopy: Principles and applications in bioimaging,” J. Opt. Soc. Am. B **34**, B64–B77 (2017). [CrossRef]

**3. **A. Devaney, “Inverse-scattering theory within the Rytov approximation,” Opt. Lett. **6**, 374–376 (1981). [CrossRef] [PubMed]

**4. **E. Mudry, P. C. Chaumet, K. Belkebir, and A. Sentenac, “Electromagnetic wave imaging of three-dimensional targets using a hybrid iterative inversion method,” Inverse Probl. **28**, 065007 (2012). [CrossRef]

**5. **U. S. Kamilov, D. Liu, H. Mansour, and P. T. Boufounos, “A recursive Born approach to nonlinear inverse scattering,” IEEE Signal Process. Lett. **23**, 1052–1056 (2016). [CrossRef]

**6. **H.-Y. Liu, U. S. Kamilov, D. Liu, H. Mansour, and P. T. Boufounos, “Compressive imaging with iterative forward models,” in “IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),” (IEEE, 2017), pp. 6025–6029.

**7. **H.-Y. Liu, D. Liu, H. Mansour, P. T. Boufounos, L. Waller, and U. S. Kamilov, “SEAGLE: Sparsity-driven image reconstruction under multiple scattering,” arXiv preprint arXiv:1705.04281 (2017).

**8. **D. Colton and R. Kress, *Inverse Acoustic and Electromagnetic Scattering Theory*, vol. 93 (Springer Science & Business Media, 2012).

**9. **J. A. Schmalz, G. Schmalz, T. E. Gureyev, and K. M. Pavlov, “On the derivation of the Green’s function for the Helmholtz equation using generalized functions,” Am. J. Phys. **78**, 181–186 (2010). [CrossRef]

**10. **A. C. Kak and M. Slaney, *Principles of Computerized Tomographic Imaging* (SIAM, 2001). [CrossRef]

**11. **W. Choi, C. Fang-Yen, K. Badizadegan, S. Oh, N. Lue, R. R. Dasari, and M. S. Feld, “Tomographic phase microscopy,” Nat. Methods **4**, 717 (2007). [CrossRef] [PubMed]

**12. **B. Chen and J. J. Stamnes, “Validity of diffraction tomography based on the first Born and the first Rytov approximations,” Appl. Opt. **37**, 2996–3006 (1998). [CrossRef]

**13. **Y. Sung, W. Choi, C. Fang-Yen, K. Badizadegan, R. R. Dasari, and M. S. Feld, “Optical diffraction tomography for high resolution live cell imaging,” Opt. Express **17**, 266–277 (2009). [CrossRef] [PubMed]

**14. **Y. Sung and R. R. Dasari, “Deterministic regularization of three-dimensional optical diffraction tomography,” J. Opt. Soc. Am. A **28**, 1554–1561 (2011). [CrossRef]

**15. **J. Lim, K. Lee, K. H. Jin, S. Shin, S. Lee, Y. Park, and J. C. Ye, “Comparative study of iterative reconstruction algorithms for missing cone problems in optical diffraction tomography,” Opt. Express **23**, 16933–16948 (2015). [CrossRef] [PubMed]

**16. **P. C. Chaumet and K. Belkebir, “Three-dimensional reconstruction from real data using a conjugate gradient-coupled dipole method,” Inverse Probl. **25**, 024003 (2009). [CrossRef]

**17. **K. Belkebir, P. C. Chaumet, and A. Sentenac, “Superresolution in total internal reflection tomography,” J. Opt. Soc. Am. A **22**, 1889–1897 (2005). [CrossRef]

**18. **A. Abubakar and P. M. van den Berg, “The contrast source inversion method for location and shape reconstructions,” Inverse Probl. **18**, 495 (2002). [CrossRef]

**19. **U. Kamilov, I. Papadopoulos, M. Shoreh, A. Goy, C. Vonesch, M. Unser, and D. Psaltis, “Learning approach to optical tomography,” Optica **2**, 517–522 (2015). [CrossRef]

**20. **A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imaging Sci. **2**, 183–202 (2009). [CrossRef]

**21. **Y. Nesterov, “A method of solving a convex programming problem with convergence rate O(1/k^{2}),” Soviet Math. Dokl. **27**, 372–376 (1983).

**22. **P. L. Combettes and V. R. Wajs, “Signal recovery by proximal forward-backward splitting,” Multiscale Model Simul. **4**, 1168–1200 (2006). [CrossRef]

**23. **Y. Nesterov, “Gradient methods for minimizing composite functions,” Math. Prog. **140**, 125–161 (2013). [CrossRef]

**24. **H. Attouch, J. Bolte, and B. F. Svaiter, “Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods,” Math. Prog. **137**, 91–129 (2013). [CrossRef]

**25. **A. Beck and M. Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems,” IEEE Trans. Image Process. **18**, 2419–2434 (2009). [CrossRef] [PubMed]

**26. **A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” J. Math. Imaging Vision **40**, 120–145 (2011). [CrossRef]

**27. **U. S. Kamilov, “A parallel proximal algorithm for anisotropic total variation minimization,” IEEE Trans. Image Process. **26**, 539–548 (2017). [CrossRef]

**28. **S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn. **3**, 1–122 (2011). [CrossRef]

**29. **S. Lefkimmiatis, J. Ward, and M. Unser, “Hessian Schatten-norm regularization for linear inverse problems,” IEEE Trans. Image Process. **22**, 1873–1888 (2013). [CrossRef] [PubMed]

**30. **L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in “Proceedings of COMPSTAT’2010: 19th International Conference on Computational StatisticsParis France, August 22–27, 2010 Keynote, Invited and Contributed Papers,” Y. Lechevallier and G. Saporta, eds. (Physica-Verlag HD, Heidelberg, 2010), pp. 177–186.

**31. **A. J. Devaney, *Mathematical Foundations of Imaging, Tomography and Wavefield Inversion* (Cambridge University Press, 2012). [CrossRef]

**32. **J. A. Stratton, *Electromagnetic Theory* (John Wiley & Sons, 2007).

**33. **M. Unser, E. Soubies, F. Soulez, M. McCann, and L. Donati, “GlobalBioIm: A unifying computational framework for solving inverse problems,” in “Proceedings of the OSA Imaging and Applied Optics Congress on Computational Optical Sensing and Imaging (COSI’17),” (San Francisco CA, USA, 2017). Paper no. CTu1B.

**34. **Jean-Michel Geffrin, Pierre Sabouroux, and Christelle Eyraud, “Free space experimental scattering database continuation: experimental set-up and measurement precision,” Inverse Probl. **21**, 6(2005). [CrossRef]

**35. **J.-J. Moreau, “Fonctions convexes duales et points proximaux dans un espace hilbertien,” C. R. Acad. Sci. Ser. A Math. **255**, 2897–2899 (1962).