Optica Publishing Group

Simultaneous measurement and reconstruction tailoring for quantitative phase imaging

Open Access

Abstract

We propose simultaneous measurement and reconstruction tailoring (SMaRT) for quantitative phase imaging; it is a joint optimization approach to inverse problems wherein minimizing the expected end-to-end error yields optimal design parameters for both the measurement and reconstruction processes. Using simulated and experimentally-collected data for a specific scenario, we demonstrate that optimizing the design of the two processes together reduces phase reconstruction error over past techniques that consider these two design problems separately. Our results suggest at times surprising design principles, and our approach can potentially inspire improved solution methods for other inverse problems in optics as well as the natural sciences.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Imaging the spatially-varying optical path length differences encountered by light as it passes through a specimen of interest, i.e., quantitative phase imaging, has wide applicability in many scientific domains. For example, the spatial distribution of absorption in many biological specimens does not exhibit high contrast, unlike quantitative phase images of the same specimen. While interferometry and holography are classical solutions to phase imaging, there also exists a large family of self-interferometric techniques wherein the axial intensity evolution of a collimated beam after passing through a specimen of interest is used to numerically estimate the specimen’s optical path length profile. Members of this family include nonlinear phase retrieval techniques like Gerchberg–Saxton–Fienup approaches [1,2] or PhaseLift [3], techniques based on the transport of intensity equation (TIE) [4–8], and techniques based on contrast transfer functions (CTFs) [9, 10]. While TIE and CTF techniques usually assume coherent or nearly coherent illumination, they have been extended to the partially coherent regime as well. Partial coherence results in loss of high frequency information away from the focal plane [8,11]; loss of spatial coherence caused by area sources can be inverted via illumination diversity [12].

Like any other inverse problem, reconstruction quality depends on the design of both the measurement and reconstruction steps of the imaging method—what measurements do we take, and how do we then numerically estimate the phase? In standard TIE imaging, transverse intensity profiles at two closely spaced planes are used to estimate the axial derivative, from which a phase image is computed via regularized inversion of the Laplace operator. Recent papers [7, 8, 13–15] suggest measurements at multiple planes for better derivative estimates, whereas other threads [16,17] discuss alternatives to Tikhonov regularization for reconstruction.

The optimal reconstruction design naturally depends on the choice of measurements taken, but conversely the optimal measurement design also depends on the choice of reconstruction method—we want to avoid wasting resources, such as imaging time or light budget, to collect information that the reconstruction method will simply ignore. This principle suggests that it is suboptimal to design measurements to capture every last bit of information about the specimen and then design a reconstruction approach based on these measurements. Rather, it would be better to jointly optimize the design of the two. The seminal wavefront coding papers [18,19] as well as the ensuing computational photography field followed this general approach, wherein the captured images themselves may not be the “best” images of the object being photographed, but they yield higher quality images than standard photography after appropriate numerical postprocessing. Joint optimization was also leveraged explicitly via what is known as multi-domain optimization (MDO) in the design of many computational optical systems such as the origami lens [20,21]. Inspired by these approaches, we propose phase imaging via a joint optimization framework we call simultaneous measurement and reconstruction tailoring (SMaRT), which we now outline.

Our framework builds upon the same motivation as Wiener filtering and related minimum mean square error (MMSE) estimators, but we extend the classical approach by allowing the noise statistics (e.g., per-image noise level) to vary as a consequence of measurement design. This leads to seeking an optimal measurement–reconstruction design pair that minimizes the expected reconstruction error. More specifically, we parameterize specimens by vectors s in some Hilbert space 𝒮, and we are interested in imaging a class of specimens 𝒮0 ⊂ 𝒮 with prior probability density function f𝒮0(s). Given design spaces ℳ0 and ℛ0 for the measurement M and reconstruction R, respectively, we wish to solve a problem of the form

\min_{M \in \mathcal{M}_0,\, R \in \mathcal{R}_0} \int f_{\mathcal{S}_0}(s)\, \mathbb{E}\big[\| s - r_R(p_M(s)) \|_2^2\big]\, ds,
where pM(·) is a stochastic function that describes the noisy forward model for a specific measurement method M, and rR(·) is a deterministic function that reconstructs an estimate of s from data, based on reconstruction design R. While the general case is difficult to solve, the specific case of weak-phase objects illuminated by partially coherent light and imaged through a standard optical microscope under a fixed time budget enables rewriting Eq. (1) as a regularized least squares problem of the form
\min_{x_1,\ldots,x_L} \Big\| b - \sum_{l=1}^{L} A_l x_l \Big\|_2^2 + \gamma \Big( \sum_{l=1}^{L} \| x_l \|_2 \Big)^2.
The least squares term describes the goodness of fit in the noiseless case, and the squared sum of norms regularizer models the impact of noise on the reconstruction under a fixed time budget. In this imaging scenario, measurement design M is a set of possibly-zero exposure times for L candidate imaging planes, and the reconstruction design R is a set of spectral coefficients so that rR(·) spectrally mixes the captured images accordingly. Regularization parameter γ is not hand-tuned, but rather computed from calibration data; the Al matrices and vector b are also a function of calibration data as well as expected feature size and feature phase height, the only two user-adjustable parameters in this approach. Optimal M and R are both computed from the solution x1, . . ., xL to the problem above. Furthermore, it is possible to predict the performance of the entire imaging process by examining the composite function rR (pM(·)).

The key to our SMaRT approach is the formulation of an optimization problem to jointly design the measurement and reconstruction processes for solving an inverse problem, rather than manually designing these two processes separately. This ameliorates the effect of human bias in the design process and allows a numerical algorithm to explore novel recipes that directly reduce the average reconstruction error. Furthermore, automated design facilitates leveraging additional information such as specimen statistics and the illumination spectrum, leading to more effective extraction of phase information compared to previous methods, which usually opt for simple measurement designs based on few assumptions. For example, knowledge of the spectrum of a polychromatic source allows us to mitigate the effects of blurring caused by the source not being quasimonochromatic. Simple, intuitive parameters and predictable performance make our approach easy to use in practice, and the concepts introduced in this paper can potentially be applied to other imaging scenarios as well.

The remainder of the paper is organized as follows. In Section 2, we derive an expression for the forward model pM(·) of an optical microscope and list the assumptions needed for the model to be valid; in Section 3, we use the forward model to derive a convex problem of the form in Eq. (2), discuss how to apply it to phase imaging, and show how to make its solution computationally tractable; in Section 4, we apply our method to both simulated and experimental data in the context of imaging a standard phase target; finally, in Section 5, we discuss the implications of our technique.

2. Forward propagation model

We start with a standard optical microscope layout under Köhler illumination, as shown in Fig. 1. Light from a single-color light-emitting diode (LED) is collimated by a condenser lens and illuminates a thin (phase) object of interest placed at the specimen plane. An objective lens and tube lens form a 4f system that images the specimen onto the intermediate image plane, where a camera in a standard microscope would capture images. To accommodate phase imaging, we allow the camera to be translated axially to capture images at varying amounts of defocus. Our imaging system is typical of TIE imaging and similar to systems for low-coherence inline holography.

 figure: Fig. 1

Fig. 1 A standard optical microscope, wherein an area light source at the front focal plane of a condenser lens illuminates a thin specimen (blue) coincident with the back focal plane. An objective–tube lens 4f system forms a magnified image (blue dotted) of the specimen at the intermediate image plane. Rays passing through the specimen’s center and edge are shown in gray and dark red, respectively. For clarity, relative sizes are not to scale.


To simplify our model, we assume that the field at the intermediate image plane is a magnified and diffraction-limited representation of the field immediately after the specimen, since objective lenses are often highly optimized multi-element lens systems designed to minimize aberration. This allows us to mathematically model the system as a virtual partially coherent source illuminating a magnified specimen, whose image is captured at varying amounts of defocus, both negative and positive. The magnification reduces the effective range of illumination angles emanating from the virtual source, and it, together with the numerical aperture (NA) of the objective lens, also imposes a low spatial bandlimit on the field observed at the intermediate image plane. These factors enable simplifying assumptions in the mathematical model that we derive in the rest of the section. Our derivation starts with a parameterization s of the specimen, continues with expressions for the propagated intensity profiles under both coherent and partially coherent illumination, and concludes with a model for noisy pixel measurements, yielding the stochastic forward operator pM(·).

2.1. Thin specimen parametrization

We consider thin specimens located on some reference plane (which we will denote as z = 0) with coordinates (x, y) and amplitude transmission profiles of the form

Θ(x,y)=Θ¯+s(x,y)=Θ¯+α(x,y)+iβ(x,y),
where Θ̄ is the average transmissivity, s(x, y) is a zero-mean complex-valued function with α(x, y) and β(x, y) as its real and imaginary components, respectively, and i is the imaginary unit. In practice, we assume Θ̄ = 1 since we are not interested in illumination intensity for phase imaging. We assume that the specimen is not dispersive—i.e., an incident coherent field Uin(x, y) at z = 0 for any wavelength of interest λ induces an output field at z = 0 given by Uout(x, y) = Θ(x, y)Uin(x, y). We also assume that s(x, y) is spatially bandlimited, and that we are only interested in the value of s(x, y) within a rectangular area of extent Wx × Wy. Let positive integers M and N be defined so that Wx/M and Wy/N are sufficient Nyquist sampling intervals for the bandlimit of s(x, y). Given these assumptions, we choose to model s(x, y) via a finite weighted sum of complex phasors, resulting in a periodic function with period Wx and Wy in the x and y directions, respectively:
s(x,y) \approx \sum_{m=1}^{M} \sum_{n=1}^{N} \tilde{s}_{mn} \exp(i 2\pi \xi_m x) \exp(i 2\pi \eta_n y) = \sum_{m,n} \tilde{s}_{mn}\, \psi_{mn}(x,y),
where ξm = (m − m̄)/Wx and ηn = (n − n̄)/Wy give the discretized frequencies, and m̄ = ⌊M/2⌋ + 1 and n̄ = ⌊N/2⌋ + 1 give the indexes of the zero-frequency component. We define α̃mn and β̃mn similarly for α(x, y) and β(x, y), respectively. Since α(x, y) and β(x, y) are real-valued functions, their corresponding Fourier coefficients α̃mn and β̃mn possess symmetry, i.e., α̃m′n′ = α̃*mn and β̃m′n′ = β̃*mn, where m′ = 2m̄ − m, n′ = 2n̄ − n, and * denotes complex conjugation. Note that by definition, s̃m̄n̄ = α̃m̄n̄ = β̃m̄n̄ = 0. Thus, we can represent each specimen by a vector s of length 2MN − 2, which consists of MN − 1 entries corresponding to the α̃mns followed by MN − 1 entries corresponding to the β̃mns. For the remainder of the paper, we will use ∑m,n to denote summing over all pairs of (m, n) except for (m̄, n̄).

We note that we must restrict the region of interest to a finite Wx × Wy region to make any numerical approach tractable, and we assume a periodic extrapolation of the specimen to simplify notation and achieve a model compatible with the discrete Fourier transform. In practice, no matter what assumptions are made about areas outside the region of interest, there will always be boundary artifacts that signify a mismatch between these assumptions and reality.
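As a concrete illustration, the sketch below builds the zero-mean perturbation s(x, y) for a hypothetical weak phase step and verifies the conjugate symmetry of its Fourier coefficients via the FFT (the toy specimen, grid size, and normalization are our own choices, not from the paper):

```python
import numpy as np

def spectral_coefficients(s_xy):
    """Centered Fourier coefficients of a sampled perturbation s(x, y);
    fftshift places the zero-frequency coefficient at the 0-based index
    (M // 2, N // 2), i.e. at (m_bar, n_bar) in 1-based terms."""
    M, N = s_xy.shape
    return np.fft.fftshift(np.fft.fft2(s_xy)) / (M * N)

# Hypothetical toy specimen: a weak piecewise-constant phase step.
M, N = 8, 8
phi = np.zeros((M, N))
phi[:, : N // 2] = 0.1
theta = np.exp(1j * phi)
s_xy = theta - theta.mean()          # zero-mean perturbation s(x, y)
alpha, beta = s_xy.real, s_xy.imag   # real and imaginary parts

a_t = spectral_coefficients(alpha)
m_bar, n_bar = M // 2, N // 2
# Conjugate symmetry of the real-valued alpha(x, y):
# alpha~_{m'n'} = conj(alpha~_{mn}) with m' = 2 m_bar - m, n' = 2 n_bar - n.
m, n = m_bar + 1, n_bar + 2
assert np.allclose(a_t[2 * m_bar - m, 2 * n_bar - n], np.conj(a_t[m, n]))
```

Because s(x, y) is zero-mean, the coefficient at the zero-frequency index (m̄, n̄) vanishes, matching the convention above.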

2.2. Noiseless propagated intensity for coherent illumination

We first derive the transverse intensity profile at various propagation distances z when the specimen is illuminated by a near-normal-incidence unit-amplitude coherent plane wave with wavelength λ much smaller than the features in the specimen; the magnification and numerical aperture present in the microscope’s 4f system satisfy this assumption. The incident field is

Uin(x,y)=exp[i2π(ξx+ηy)],
where frequencies ξ, η describe the angle of the incoming plane wave. The small incidence angle and large feature size assumptions imply M/(2Wx) + |ξ| ≪ 1/λ, and N/(2Wy) + |η| ≪ 1/λ. These assumptions allow us to model propagation of the outgoing field Uout(x, y) using the Fresnel diffraction integral, resulting in a propagated field of the form
U_{\text{prop}}(x,y,z) = \phi_z\, U_{\text{in}}(x,y)\, \big[ 1 + s_{\text{prop}}(x - \lambda z \xi,\; y - \lambda z \eta,\; z; \lambda) \big],
where ϕz = exp(i2πz/λ) exp[−iπλz(ξ² + η²)] and
s_{\text{prop}}(x,y,z;\lambda) = \sum_{m,n} \tilde{s}_{mn}\, \psi_{mn}(x,y)\, \exp\big[ -i \pi \lambda z (\xi_m^2 + \eta_n^2) \big].
The intensity can thus be written as
I_{\text{coherent}}(x,y,z;\xi,\eta,\lambda) = 1 + 2\,\mathrm{Re}\{ s_{\text{prop}}(x - \lambda z \xi,\, y - \lambda z \eta,\, z;\lambda) \} + | s_{\text{prop}}(x - \lambda z \xi,\, y - \lambda z \eta,\, z;\lambda) |^2.
For weakly scattering specimens, the quadratic term in Eq. (8) is small, so we can approximate
I_{\text{coherent}}(x,y,z;\xi,\eta,\lambda) \approx 1 + 2\,\mathrm{Re}[ s_{\text{prop}}(x - \lambda z \xi,\, y - \lambda z \eta,\, z;\lambda) ].
This is the weak phase object approximation, a standard approximation used in the CTF literature [22, 23]. For future reference, we note that the Fourier transform along x and y of Re[sprop(x, y, z; λ)] can be written as the following series:
\sum_{m,n} \big\{ \tilde{\alpha}_{mn} \cos[\pi \lambda z (\xi_m^2 + \eta_n^2)] + \tilde{\beta}_{mn} \sin[\pi \lambda z (\xi_m^2 + \eta_n^2)] \big\}\, \delta(\xi - \xi_m,\, \eta - \eta_n).
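The cosine and sine weights in this series are simple to evaluate; the sketch below uses hypothetical wavelength, defocus, and frequency values to show how defocus converts phase (β̃) information into measurable intensity:

```python
import numpy as np

def weak_object_ctf(xi, eta, z, lam):
    """Weights of alpha~_mn and beta~_mn in the intensity spectrum:
    cos and sin of the defocus chirp pi * lam * z * (xi^2 + eta^2)."""
    chirp = np.pi * lam * z * (xi**2 + eta**2)
    return np.cos(chirp), np.sin(chirp)

# In focus (z = 0): only absorption (alpha) contrast, no phase contrast.
c0, s0 = weak_object_ctf(xi=1e5, eta=0.0, z=0.0, lam=590e-9)
# Defocus transfers phase (beta) information into the intensity.
cz, sz = weak_object_ctf(xi=1e5, eta=0.0, z=50e-6, lam=590e-9)
```

At z = 0 the sine weight vanishes at every frequency, which is why a focused image of a pure-phase specimen shows no contrast under this approximation.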

2.3. Noiseless propagated intensity for partially coherent illumination

Using results from the fully coherent case, we now derive an expression for the propagated intensity I(x, y, z) when we illuminate the specimen with a partially coherent field whose cross spectral density [24] at the z = 0 plane has the form

W(x1, y1, x2, y2, ω) = I0 μ(x1 − x2, y1 − y2, ω),
where x1, y1 and x2, y2 are coordinates on this plane, ω = 2πc/λ is the optical frequency, c is the speed of light, k = 2π/λ is the wave number, and μ is proportional to the complex degree of spectral coherence [25] and related to the two-dimensional Fourier transform of a nonnegative function ρ(a, b, ω) by
\mu(\Delta x, \Delta y, \omega) = \iint \rho(a,b,\omega)\, \exp[ i k (a\, \Delta x + b\, \Delta y) ]\, da\, db.
The functions ρ and μ are normalized such that ∫ μ(0, 0, ω) dω = 1. Any field of the form given by Eq. (11) and Eq. (12) can be interpreted as a stochastic incoherent mixture of plane waves parameterized by ξ = −aω/c and η = −bω/c, with probability density ρ(a, b, ω). Thus, the resulting intensity can be written as
I(x,y,z) = I_0 \iiint \rho(a,b,\omega)\, I_{\text{coherent}}(x,y,z;\, -ka,\, -kb,\, 2\pi c/\omega)\, da\, db\, d\omega.
Substituting in Eq. (9), we obtain
I(x,y,z) \approx I_0 + 2 I_0\, \mathrm{Re}\Big\{ \iiint \rho(a,b,\omega)\, s_{\text{prop}}(x + 2\pi z a,\, y + 2\pi z b,\, z;\, 2\pi c/\omega)\, da\, db\, d\omega \Big\}
= I_0 + \int \Big[ \frac{I_0}{4\pi^2 z^2}\, \rho\Big( \frac{x}{2\pi z}, \frac{y}{2\pi z}, \omega \Big) \Big] \otimes 2\, \mathrm{Re}\{ s_{\text{prop}}(x,y,z;\, 2\pi c/\omega) \}\, d\omega,
whose Fourier transform along x and y is
\tilde{I}(\xi,\eta,z) \approx I_0\, \delta(\xi,\eta) + \tilde{I}_{\text{real}}(\xi,\eta,z) + \tilde{I}_{\text{imag}}(\xi,\eta,z),
where the contributions to the intensity by the real and imaginary parts of s(x, y) are given by
\tilde{I}_{\text{real}}(\xi,\eta,z) = I_0 \sum_{m,n} \tilde{\alpha}_{mn}\, \delta(\xi - \xi_m,\, \eta - \eta_n)\, \tilde{P}_{\text{real}}(\xi_m,\eta_n,z),
\tilde{I}_{\text{imag}}(\xi,\eta,z) = I_0 \sum_{m,n} \tilde{\beta}_{mn}\, \delta(\xi - \xi_m,\, \eta - \eta_n)\, \tilde{P}_{\text{imag}}(\xi_m,\eta_n,z),
and the propagation transfer functions are given by
\tilde{P}_{\text{real}}(\xi_m,\eta_n,z) = 2 \int \cos[\lambda \pi z (\xi_m^2 + \eta_n^2)]\, \mu(-\lambda z \xi_m, -\lambda z \eta_n, \omega)\, d\omega,
\tilde{P}_{\text{imag}}(\xi_m,\eta_n,z) = 2 \int \sin[\lambda \pi z (\xi_m^2 + \eta_n^2)]\, \mu(-\lambda z \xi_m, -\lambda z \eta_n, \omega)\, d\omega.
The partially coherent intensity patterns can be interpreted as the convolution of the coherent intensity patterns with the shape of the area source, scaled in proportion to the propagation distance [11]. For a circular area source with uniform spectrum, i.e., ρ(a, b, ω) = ρ̂(ω) when a² + b² < r²source and 0 otherwise, we can write μ(−λzξm, −λzηn, ω) = πr²source ρ̂(ω) J1(2r̂)/r̂, where r̂ = πzrsource(ξm² + ηn²)^{1/2} and J1 is the Bessel function of the first kind and order one.
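Under a quasimonochromatic assumption (so the ω integrals above collapse to a single wavelength), the two transfer functions for a uniform circular source can be sketched as follows; all numerical parameters are hypothetical:

```python
import numpy as np
from scipy.special import j1

def partially_coherent_tf(xi, eta, z, lam, r_source):
    """P~_real and P~_imag for a uniform circular source: the coherent
    cos/sin chirps damped by the source factor J1(2 r^)/r^, which tends
    to 1 as r^ -> 0 (point source / in focus)."""
    chirp = np.pi * lam * z * (xi**2 + eta**2)
    r_hat = np.pi * z * r_source * np.hypot(xi, eta)
    safe = np.where(r_hat > 1e-12, r_hat, 1.0)
    mu = np.where(r_hat > 1e-12, j1(2.0 * safe) / safe, 1.0)
    return 2.0 * np.cos(chirp) * mu, 2.0 * np.sin(chirp) * mu

# Point source recovers the coherent CTF; a finite source damps contrast.
pr0, pi0 = partially_coherent_tf(2e5, 0.0, 100e-6, 590e-9, 0.0)
pr1, pi1 = partially_coherent_tf(2e5, 0.0, 100e-6, 590e-9, 0.05)
```

Comparing `pi0` and `pi1` illustrates the loss of high-frequency information away from focus caused by partial coherence, as discussed in the introduction.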

2.4. Noisy imaging model

We now model the relationship between the propagated intensity and measured pixel values. We assume that images are captured via an M × N array of sensor pixels covering the same Wx × Wy extent used in the definition of the specimen, with one image for each of the propagation distances z1, . . ., zL. For the (m, n) pixel in the lth image, we model the noiseless measured value as

\text{pixel}_{lmn}^{(\text{noiseless})} = \mathcal{C}_1 \tau_l \iint \chi(x - x_m,\, y - y_n)\, I(x,y,z_l)\, dx\, dy,
where 𝒞1 is a constant scaling factor, τl is the exposure time for the lth image, χ(x, y) is the pixel’s spatial sensitivity profile normalized to ∬χ(x, y) dx dy = 1, I(x, y, z) is the propagated intensity derived earlier, and (xm, yn) is the pixel center for the (m, n) pixel. This model relies on the assumption that the illumination spectrum is narrowband enough so that wavelength-dependent pixel sensitivity effects do not apply.

For noisy measurements, we assume that: (1) we record ≫ 10 photons per pixel, (2) Poisson shot noise dominates all other noise sources, and (3) the weak-perturbation assumption of I ≈ I0 applies. While more powerful techniques such as deep neural networks can be applied to low photon count scenarios (see for example [26]), in this paper we focus on typical imaging scenarios, where all three assumptions are usually valid. These assumptions allow us to approximate the noise at all the pixels as independent and identically distributed (i.i.d.) zero-mean Gaussian noise with variance proportional to exposure time τl and average intensity I0:

pixellmn(noisy)~𝒩(pixellmn(noiseless),𝒞2τlI0),
where 𝒞2 is another constant scaling factor and I0 is the average intensity. For the special case of a uniform intensity field, this reduces to 𝒩(𝒞1τlI0, 𝒞2τlI0). A centered unitary discrete Fourier transform (DFT) of the pixel values yields
\widetilde{\text{pixel}}_{lmn}^{(\text{noisy})} = \frac{1}{\sqrt{MN}} \sum_{\hat{m}=1}^{M} \sum_{\hat{n}=1}^{N} \text{pixel}_{l\hat{m}\hat{n}}^{(\text{noisy})} \exp\{ -i 2\pi [ (x_{\hat{m}} - x_{\bar{m}}) \xi_m + (y_{\hat{n}} - y_{\bar{n}}) \eta_n ] \}
\sim \mathcal{N}\big( \mathcal{C}_1 \tau_l \hat{I}(\xi_m,\eta_n,z_l),\ \mathcal{C}_2 \tau_l I_0 \big),
where Î(ξ, η, z) = χ̃(ξ, η) [Ĩ(ξ, η, z) ⊗ WxWy sinc(Wxξ) sinc(Wyη)] describes the Fourier transform of the intensity after cropping by the sensor area and applying the pixel sensitivity function χ(x, y), whose Fourier transform is χ̃(ξ, η); we define sinc(a) = sin(πa)/(πa). Given the derivations in the previous section, Î(ξm, ηn, zl) = I0 when ξm = ηn = 0; for other frequency coordinates,
I^(ξm,ηn,zl)=I0χ˜(ξm,ηn)[P˜real(ξm,ηn,zl)α˜mn+P˜imag(ξm,ηn,zl)β˜mn].
Let us define the normalized measurements yl as a vector of length MN − 1 whose entries correspond to pixel˜lmn(noisy)/(𝒞1τlI0) for all (m, n) ≠ (m̄, n̄). Let us also define nl as a random vector of length MN − 1 whose entries are i.i.d. Gaussian random variables with zero mean and variance equal to σl² = 𝒞2/(𝒞1²τlI0). For notational convenience, we define y and n to be the vertical concatenation of y1, . . ., yL and n1, . . ., nL, respectively, and we define measurement specification M as the collection of exposure times τ1, . . ., τL. We can then express the forward operator pM as the sum of a deterministic linear function acting on specimen vector s and a stochastic noise component n, i.e.,
y=pM(s)=Ps+n,
with the linear relationship defined by the L(MN − 1) × 2(MN − 1) sparse matrix P. Let us index the entries of y and thus the rows of P by the triplet lmn, where 1 ≤ l ≤ L, and (m, n) takes on values between (1, 1) and (M, N) with the exclusion of (m̄, n̄). Similarly, let us index the entries of s and thus the columns of P by the triplet pqr, where p and q are spectral coordinates corresponding to m and n, and r ∈ {1, 2} determines whether we are considering the real or imaginary parts of s̃pq, i.e., spqr = α̃pq if r = 1 and spqr = β̃pq if r = 2. Given these index definitions, the (lmn, pqr) entry of matrix P is given by
P_{(lmn,\,pqr)} = \begin{cases} p_{lmn1} = \tilde{\chi}(\xi_m,\eta_n)\, \tilde{P}_{\text{real}}(\xi_m,\eta_n,z_l) & \text{if } m=p,\ n=q,\ r=1 \\ p_{lmn2} = \tilde{\chi}(\xi_m,\eta_n)\, \tilde{P}_{\text{imag}}(\xi_m,\eta_n,z_l) & \text{if } m=p,\ n=q,\ r=2 \\ 0 & \text{otherwise.} \end{cases}
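To make the structure of pM concrete, the sketch below simulates the normalized measurement y = Ps + n for a single (m, n) frequency across L = 3 planes; the block entries and calibration constants are illustrative stand-ins, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical (m, n) block of P for L = 3 planes: row l holds
# (chi~ P~_real, chi~ P~_imag) evaluated at defocus z_l.
Pmn = np.array([[2.0, 0.0],    # in focus: no phase contrast
                [1.6, 1.1],    # moderate defocus
                [0.9, 1.7]])   # larger defocus
s_mn = np.array([0.01, 0.05])  # (alpha~_mn, beta~_mn) of a weak object

taus = np.array([0.1, 0.2, 0.2])             # exposure times tau_l
C1, C2, I0 = 1.0, 0.5, 1e4                   # calibration constants
sigmas = np.sqrt(C2 / (C1**2 * taus * I0))   # sigma_l of the noise n_l

y = Pmn @ s_mn + sigmas * rng.normal(size=3)  # y = P s + n
```

Note how halving an exposure time raises that plane's noise level by √2, which is the coupling between measurement design and reconstruction quality that the next section optimizes.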

3. Formulation as a convex optimization problem

The sparsity structure of P suggests a linear reconstruction operator of the form rR(y) = Ry, where R is a 2(MN − 1) × L(MN − 1) matrix with similar sparsity structure parameterized by 2L(MN − 1) coefficients:

R_{(pqr,\,lmn)} = \begin{cases} x_{mnrl} & \text{if } m=p,\ n=q \\ 0 & \text{otherwise.} \end{cases}
Let xl be a complex-valued vector of length 2MN − 2 whose entries consist of the xmn1ls, which parameterize the reconstruction coefficients for recovering the α̃mn components of s from the lth image, followed by the xmn2ls for likewise recovering the β̃mn components; let x be the vertical concatenation of the x1, . . ., xL. Therefore, we can say that x is a parameterization for the reconstruction design R.

As stated previously, the measurement design M is a list of exposure times τ1, . . ., τL ≥ 0 for each candidate imaging plane, with imaging skipped when the exposure time is zero. To make Eq. (1) meaningful, we must impose some cost for the exposure time, or else we can unrealistically aim for infinite exposure time in order to reduce noise. We propose a simple light budget scheme constraining total exposure time to be less than some user-defined constraint τ0.

With these definitions of M and R as well as the stochastic forward model function pM defined in the previous subsection, we can rewrite Eq. (1) as a constrained convex problem that attempts to minimize the expectation of reconstruction error across all possible realizations of specimen s in specimen class 𝒮0 and noise n (which we assume to be uncorrelated):

\min_{x,\, \tau_1,\ldots,\tau_L}\ \mathbb{E}_{s \in \mathcal{S}_0,\, n}\big[ \| s - R(Ps + n) \|_2^2 \big] \quad \text{subject to} \quad \textstyle\sum_l \tau_l \le \tau_0.
The objective function can be expanded into
\mathbb{E}_{s \in \mathcal{S}_0,\, n}\big[ \| s - R(Ps + n) \|_2^2 \big] = \mathbb{E}_{s \in \mathcal{S}_0,\, n}\Big\{ \sum_{m,n} \Big\| R_{mn} \Big[ P_{mn} \begin{pmatrix} \tilde{\alpha}_{mn} \\ \tilde{\beta}_{mn} \end{pmatrix} + n_{mn} \Big] - \begin{pmatrix} \tilde{\alpha}_{mn} \\ \tilde{\beta}_{mn} \end{pmatrix} \Big\|_2^2 \Big\},
where nmn is an L-vector containing the relevant components of n, and Rmn and Pmn are nonoverlapping blocks of R and P:
R_{mn} = \begin{pmatrix} x_{mn11} & x_{mn12} & \cdots & x_{mn1L} \\ x_{mn21} & x_{mn22} & \cdots & x_{mn2L} \end{pmatrix}, \qquad P_{mn} = \begin{pmatrix} p_{1mn1} & p_{1mn2} \\ \vdots & \vdots \\ p_{Lmn1} & p_{Lmn2} \end{pmatrix}.
Defining matrix Bmn via factoring the specimen’s spectral (raw) covariance matrix Cmn,
B_{mn} B_{mn}^H = C_{mn} = \begin{bmatrix} \mathbb{E}_{s \in \mathcal{S}_0}(|\tilde{\alpha}_{mn}|^2) & \mathbb{E}_{s \in \mathcal{S}_0}(\tilde{\alpha}_{mn} \tilde{\beta}_{mn}^*) \\ \mathbb{E}_{s \in \mathcal{S}_0}(\tilde{\beta}_{mn} \tilde{\alpha}_{mn}^*) & \mathbb{E}_{s \in \mathcal{S}_0}(|\tilde{\beta}_{mn}|^2) \end{bmatrix}, \qquad B_{mn} = \begin{pmatrix} b_{mn1} & b_{mn3} \\ b_{mn2} & b_{mn4} \end{pmatrix},
we can remove the explicit expectation from the objective function to obtain
\sum_{m,n} \big\| (R_{mn} P_{mn} - I)\, B_{mn} \big\|_2^2 + \sum_{m,n} \mathrm{tr}\big[ R_{mn}^H R_{mn}\, \mathbb{E}_n( n_{mn} n_{mn}^H ) \big].
We can further reshuffle Eq. (33) into a more familiar form, i.e.,
\min_{x,\, \tau_1,\ldots,\tau_L}\ \Big\| b - \sum_{l=1}^{L} A_l x_l \Big\|_2^2 + \sum_{l=1}^{L} \sigma_l^2 \| x_l \|_2^2 \quad \text{subject to} \quad \textstyle\sum_l \tau_l \le \tau_0,
where b is a vector of length 4(MN − 1) consisting of the bmn1s followed by the bmn2s, bmn3s and bmn4s, and Al is a block diagonal matrix whose mnth block Almn is given by
A_{lmn} = \begin{bmatrix} (p_{lmn1} b_{mn1} + p_{lmn2} b_{mn2})\, I & (p_{lmn1} b_{mn3} + p_{lmn2} b_{mn4})\, I \end{bmatrix}.
For completeness, we define σl²‖xl‖₂² = 0 if both xl = 0 and τl = 0 to avoid ambiguity. If the exposure times in Eq. (34) were fixed, then the σls would become constants, making it the standard regularized least squares problem obtained in the derivation of the Wiener filter; such an optimization problem could easily be solved using standard methods such as conjugate gradients. However, since we are jointly optimizing both reconstruction and measurement design, our solution is no longer standard Wiener filtering, and we need to rewrite Eq. (34) to solve it.

Note that only the second term in the objective function of Eq. (34) depends on the τls; therefore, we can reduce Eq. (34) to an unconstrained optimization over only the xls if we can derive a simplified expression for the minimal value of l=1Lσl2xl22 given the xls, i.e., solving

\min_{\tau_1,\ldots,\tau_L}\ \sum_{l=1}^{L} \sigma_l^2 \| x_l \|_2^2 \quad \text{subject to} \quad \textstyle\sum_l \tau_l \le \tau_0.
The above subproblem has a closed-form solution (see Appendix A) in which τl is proportional to ‖xl‖₂ for 1 ≤ l ≤ L. We note that this exposure allocation recipe applies to any spectral mixing reconstruction method where noise energy is equally distributed in the frequency domain and images are dominated by Poisson noise with more than ∼10 photons per pixel. The closed-form solution for Eq. (36) allows us to rewrite Eq. (34) as an unconstrained convex problem:
\min_{x_1,\ldots,x_L}\ \Big\| b - \sum_{l=1}^{L} A_l x_l \Big\|_2^2 + \gamma \Big( \sum_{l=1}^{L} \| x_l \|_2 \Big)^2,
where γ = 𝒞2/(𝒞1²I0τ0) is a constant that represents the relative noise level in the system; it is obtained through calibration and is not a user-selectable regularization parameter. This objective function has a smooth least-squares term as well as a nonsmooth regularization term that weakly promotes sparsity in the xls; it can be solved using proximal gradient methods [27–30], which were designed for minimizing sums of smooth and nonsmooth convex functions. The appropriate proximal operator for the regularization term is derived in Appendix C.
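As a toy illustration of this problem, the sketch below minimizes a two-plane instance of the objective with a generic derivative-free solver standing in for the proximal gradient methods cited above; the matrices, vector b, and γ are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

def smart_objective(x_flat, A_list, b, gamma):
    """||b - sum_l A_l x_l||_2^2 + gamma * (sum_l ||x_l||_2)^2."""
    xs = np.split(x_flat, len(A_list))
    resid = b - sum(A @ x for A, x in zip(A_list, xs))
    return resid @ resid + gamma * sum(np.linalg.norm(x) for x in xs) ** 2

# Toy instance: two candidate planes, each sensing one coordinate well.
A_list = [np.array([[2.0, 0.0], [0.0, 0.1]]),
          np.array([[0.1, 0.0], [0.0, 2.0]])]
b = np.array([1.0, 1.0])
res = minimize(smart_objective, np.zeros(4), args=(A_list, b, 0.05),
               method="Powell")
x1, x2 = np.split(res.x, 2)
```

In this contrived instance, each plane ends up carrying weight mainly for the coordinate it senses strongly, which is the kind of division of labor the joint design discovers automatically.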

Once the optimal xls have been found, creating the reconstruction function rR(·) is straightforward. The optimal τls can be obtained by setting τl = τ0‖xl‖₂/(∑k‖xk‖₂). The end-to-end spectral imaging performance of the measurement–reconstruction pair can be analyzed by observing the entries of the 2 × 2 matrix Hmn = RmnPmn. The diagonal terms give essentially the discretized transfer function for α(x, y) and β(x, y), whereas the cross-terms denote whether information about α bleeds into β and vice versa.
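The resulting exposure-allocation recipe is straightforward to implement; the reconstruction-weight blocks below are hypothetical:

```python
import numpy as np

def allocate_exposures(x_blocks, tau0):
    """Split the light budget tau0 as tau_l = tau0 ||x_l||_2 / sum_k ||x_k||_2.
    Planes whose reconstruction weights are all zero get no exposure and
    are skipped entirely (assumes at least one nonzero block)."""
    norms = np.array([np.linalg.norm(x) for x in x_blocks])
    return tau0 * norms / norms.sum()

x_blocks = [np.array([3.0, 4.0]),    # ||x_1||_2 = 5
            np.array([0.0, 0.0]),    # plane never used by r_R
            np.array([5.0, 12.0])]   # ||x_3||_2 = 13
tau = allocate_exposures(x_blocks, tau0=1.0)
```

For these weights the noise term ∑l ‖xl‖²/τl (dropping constant factors) equals (∑l ‖xl‖₂)² = 324 under this split, versus 582 for a uniform split of the same budget, illustrating why the allocation is worth doing.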

Solving Eq. (37) technically yields a measurement design M and reconstruction design R for estimating the functions α(x, y) and β(x, y) in the general case. For the remainder of this section, we will discuss how SMaRT can be applied specifically to quantitative phase imaging as well as how Eq. (37) can be solved efficiently in practice.

3.1. Estimating the covariance for phase imaging

Solutions to Eq. (37) depend on prior knowledge regarding the set 𝒮0 of specimens of interest; the Al matrices and the vector b in Eq. (37) are functions of the Cmns, the (prior) spectral covariance matrices defined in Eq. (32). We propose a generic two-parameter prior for phase imaging that works well in our experiments, although more apt Cmns could be derived for specific applications.

We start with the assumption that our specimen is a piecewise-constant phase specimen ϕ(x, y) where each piece has phase uniformly distributed between −ϕmax/2 and ϕmax/2. The corresponding α(x, y) and β(x, y) functions are given by

α(x,y)=ζcos[ϕ(x,y)]1,andβ(x,y)=ζsin[ϕ(x,y)],
where 1/ζ = 2 sin(ϕmax/2)/ϕmax is a normalization term to ensure that α(x, y) is zero-mean.

Furthermore, we assume that the probability of any two points belonging to the same “piece” is equal to exp(−Δxy/Δ), where Δxy is the distance between the two points and Δ is a parameter, which we will define as the average feature size. A convex tessellation of the plane in which the number of discontinuities along any line segment is Poisson-distributed would induce this type of exponential falloff, but here we make no additional assumptions about the exact form of ϕ(x, y). Using results from Appendix B, we find that the spectral covariance can be approximated by

C_{mn} = 2\pi \Delta^2\, \big[ 1 + 4\pi^2 \Delta^2 (\xi_m^2 + \eta_n^2) \big]^{-3/2} \begin{pmatrix} \zeta^2/2 + \zeta/2 - 1 & 0 \\ 0 & \zeta^2/2 - \zeta/2 \end{pmatrix}.

Although most specimens are not piecewise-constant, naturally occurring specimens often have large, nearly constant regions. The Lorentzian-like falloff of the cross spectral density is also reminiscent of natural image priors with power law falloffs [31–33], which can be stochastically approximated using a piecewise-constant “dead leaves” model [34].

We note that in practice, Cmn mismatch results in a noisier phase image, but the reconstructed images are still predictable, as we can examine rR (pM(·)) for the transfer function of the imaging system to interpret the results in context. This is in contrast to a regularized reconstruction approach using total variation (TV) [17] or nonlinear diffusion [16], which forces reconstructed phase images to approximate piecewise-constant functions in order to minimize the regularization term. With our approach, we simply use prior statistics as a way to estimate the approximate signal-to-noise ratio at each spectral band in order to allocate measurement resources; our approach does not force compatibility with the prior, and thus our results are also less sensitive to the design of the prior.
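The two-parameter prior is easy to evaluate directly; in the sketch below, the feature size Δ and phase range ϕmax are hypothetical:

```python
import numpy as np

def prior_covariance(xi, eta, delta, phi_max):
    """Spectral covariance C_mn of the piecewise-constant phase prior:
    a Lorentzian-like radial falloff with average feature size delta,
    times a diagonal split between the alpha and beta variances."""
    zeta = phi_max / (2.0 * np.sin(phi_max / 2.0))
    falloff = 2.0 * np.pi * delta**2 * (
        1.0 + 4.0 * np.pi**2 * delta**2 * (xi**2 + eta**2)) ** (-1.5)
    return falloff * np.diag([zeta**2 / 2 + zeta / 2 - 1.0,
                              zeta**2 / 2 - zeta / 2])

C_low = prior_covariance(0.0, 0.0, delta=2e-6, phi_max=1.0)
C_high = prior_covariance(5e5, 0.0, delta=2e-6, phi_max=1.0)
```

The decay from `C_low` to `C_high` is what tells the optimizer that high spatial frequencies carry less expected signal energy, and hence deserve fewer measurement resources.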

3.2. Accommodating tilt in the illumination

For two optical imaging systems whose illumination is identical except for a constant tilt, the optimal measurement–reconstruction pair for one can be adapted for the other. Suppose the two imaging systems’ illumination density functions are related by

ρ2(a, b, ω) = ρ1(a − a0, b − b0, ω).
Then, the reconstruction operator for the second system is the same as multiplying the input images’ Fourier transform by exp[i2πzl(ξma0 + ηnb0)] and then applying the reconstruction operator for the first system. This is because the tilt induces a linear phase term in P, which can be neutralized by an equal and opposite linear phase term in R. Therefore, in practice, SMaRT allows for a slight tilt in the system, which can be removed using calibration.
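A minimal sketch of this tilt compensation, assuming precomputed frequency grids (the grids, defocus, and tilt values are hypothetical):

```python
import numpy as np

def compensate_tilt(img_ft, z, xi, eta, a0, b0):
    """Undo the linear phase a constant tilt (a0, b0) imprints on a
    defocused image's spectrum, so the untilted reconstruction operator
    can be applied unchanged."""
    return img_ft * np.exp(1j * 2.0 * np.pi * z * (xi * a0 + eta * b0))

xi, eta = np.meshgrid(np.arange(-4, 4), np.arange(-4, 4), indexing="ij")
spec = np.ones((8, 8), dtype=complex)
# Opposite tilts cancel: compensating twice with (a0, 0) then (-a0, 0)
# returns the original spectrum.
out = compensate_tilt(compensate_tilt(spec, 1e-4, xi, eta, 2.0, 0.0),
                      1e-4, xi, eta, -2.0, 0.0)
```

Because the correction is a pure phase ramp, it costs one elementwise multiply per captured image and can absorb small calibration-measured tilts.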

3.3. Exploiting symmetry

Directly solving Eq. (37) can present numerical challenges, since the number of unknowns is on the order of 𝒪(LMN), leading to slow convergence and high storage costs. One saving grace is that the solution x1, . . ., xL is usually highly redundant. The optical layout of a microscope is often circularly symmetric about the imaging axis, and the spectral covariance of the specimen is circularly symmetric as well, leading to nearly circularly symmetric solutions. Therefore, we propose a modification to Eq. (37) to exploit these symmetries.

Let T ∈ ℝ(MN−1)×K be an interpolation operator that takes K coefficients and linearly interpolates them into an M × N image, while disregarding the center pixel. In other words, we have a real-valued interpolation map vector t of length MN − 1 whose mnth entry tmn satisfies 1 ≤ tmn ≤ K. We then define the (mn, k)th entry of T to be

T_{(mn,\,k)} = \begin{cases} 1 + \lfloor t_{mn} \rfloor - t_{mn} & \text{if } k = \lfloor t_{mn} \rfloor \\ t_{mn} - \lfloor t_{mn} \rfloor & \text{if } k = \lfloor t_{mn} \rfloor + 1 \\ 0 & \text{otherwise.} \end{cases}
Let matrices I2 and I4 be 2 × 2 and 4 × 4 identity matrices, respectively, and let Q be an equalization operator for T so that TQ has unit singular values, i.e., QᵀTᵀTQ = I; we can easily construct Q from the singular value decomposition of the tridiagonal matrix TᵀT. With these definitions, we propose an efficient approximation to Eq. (37) by replacing b and the Als with b̂ and the Âls given by
\hat{b} = \big[ I_4 \otimes (TQ)^{\mathsf{T}} \big]\, b, \qquad \hat{A}_l = (I_4 \otimes Q^{\mathsf{T}})\, \big[ (I_4 \otimes T^{\mathsf{T}})\, A_l\, (I_2 \otimes T) \big]\, (I_2 \otimes Q),
where ⊗ denotes the Kronecker (tensor) product. We can then solve the resulting problem for an optimal x̂ and compute xapprox = [I2 ⊗ (TQ)]x̂; the regularization term in the convex problem remains unchanged because QᵀTᵀTQ = I.

We note that the bracketed expression in Âl’s definition is sparse and thus can be compactly stored in memory, and only one copy of Q needs to be stored, regardless of the size of L. If T is an operator that interpolates K coefficients into a radially symmetric image, then as long as the illumination pattern ρ(a, b, ω) is radially symmetric about some point (a0, b0), the only approximation error incurred is with the pixel function χ(x, y), which is usually not circularly symmetric. However, in practice this error can be ignored, as the pixel function is usually a rough approximation as well.
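The operators T and Q can be constructed directly; the radius-map values below are hypothetical, and Q is built from an eigendecomposition of TᵀT (equivalent to the SVD route mentioned above):

```python
import numpy as np

def build_T(t, K):
    """Row mn of T linearly interpolates between coefficient indexes
    floor(t_mn) and floor(t_mn) + 1 (1-based), per the definition above."""
    T = np.zeros((len(t), K))
    for row, val in enumerate(t):
        k = int(np.floor(val))
        w = val - k                    # fractional part of t_mn
        T[row, k - 1] = 1.0 - w
        if w > 0.0 and k < K:
            T[row, k] = w
    return T

def build_Q(T):
    """Equalizer Q with Q^T (T^T T) Q = I, built from the
    eigendecomposition of the tridiagonal Gram matrix T^T T."""
    w, V = np.linalg.eigh(T.T @ T)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

t = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 1.25])  # hypothetical radius map
T = build_T(t, K=3)
Q = build_Q(T)
```

Each row of T sums to one (the two interpolation weights), and TQ has orthonormal columns, which is exactly the property that leaves the regularization term of the convex problem unchanged.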

3.4. Pruning

While the regularizer in Eq. (37) is a mild sparsity promoter, solutions for large L usually still yield many nonzero τls, some of them very small. This can create two problems in practical applications. First, having too many positions introduces a lot of mechanical motion into the experimental apparatus, which may be undesirable as it can disturb the specimen, gradually cause the apparatus to go out of alignment, and increase the amount of time needed to capture the images, since the image sensor has to wait for any residual motion to subside before exposing. Second, very short exposure times easily violate the assumption of Gaussian noise with variance proportional to average intensity, because quantization noise and readout noise start dominating the desired signal, neither of which scale with average intensity.

We propose a pruning method that, while not yielding a globally optimal solution, still results in a practical measurement–reconstruction design pair. We start by solving Eq. (37) for the full set of possible positions (or terminating after a fixed number of iterations) and then iteratively prune the positions. In each iteration, we remove a small set of positions, namely those with the lowest (or zero) exposure times, and then recompute Eq. (37) with the reduced set of positions. This is repeated until either: (1) the number of positions falls below a certain threshold, (2) the shortest exposure time rises above a certain threshold, or (3) the computed expected error exceeds some relative threshold compared to the full position set solution. Since the initial number of unknowns can be large, pruning improves results in practice by speeding up convergence, allowing better results within a fixed number of iterations. A similar approach has been applied successfully in the deep learning literature [35, 36]; we can also view pruning as a variant of orthogonal matching pursuit [37], wherein we remove basis vectors before recomputing coefficients instead of adding them.
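The pruning loop above can be sketched as follows; here `solve`, the thresholds, and the drop fraction are hypothetical stand-ins for re-solving Eq. (37) and for the three stopping criteria:

```python
import numpy as np

def prune_positions(solve, positions, min_count=7, min_exposure=1e-3,
                    err_tol=1.05, drop_frac=0.1, max_rounds=50):
    """Iterative pruning of measurement positions (a sketch of Sec. 3.4).

    `solve` stands in for re-solving Eq. (37) on a subset of positions and
    must return (exposure_times, expected_error) for that subset.
    """
    exposures, err_full = solve(positions)
    for _ in range(max_rounds):
        # Stopping criteria (1) and (2): few enough positions, or every
        # remaining exposure is long enough.
        if len(positions) <= min_count or exposures.min() >= min_exposure:
            break
        # Remove a small set of positions with the lowest exposure times.
        n_drop = max(1, int(drop_frac * len(positions)))
        keep = sorted(np.argsort(exposures)[n_drop:])
        candidate = [positions[i] for i in keep]
        cand_exposures, cand_err = solve(candidate)
        if cand_err > err_tol * err_full:
            break  # criterion (3): expected error degraded too much
        positions, exposures = candidate, cand_exposures
    return positions, exposures
```

A toy `solve` that returns normalized exposures and a constant error is enough to exercise the loop; with a real solver the candidate error would grow as informative positions disappear.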

4. Imaging a phase target

We now apply our SMaRT approach to image a standard phase resolution target. For illumination, we use a Newport M-20X objective to focus light from a Thorlabs LED4C1 installed with the amber LED module (nominal wavelength 590 nm) onto a 25 μm pinhole placed at the front focal plane of a 50 mm focal-length plano-convex lens. The collimated light illuminates the specimen, which is then imaged using a Mitsutoyo 10X Plan Apo objective (NA = 0.28) and a 100 mm plano-convex tube lens for a total magnification of 5 × at the intermediate image plane. Both plano-convex lenses have diameters of 25 mm. Images of the propagated intensity were captured using a UI-2240-M camera with pixel pitch 4.65 μm placed on a Newport XMS50 linear stage, with tilt correction followed by cropping to produce 1001 × 1001 pixel images.

To calibrate the camera’s noise level, we took uniform field images at various exposure levels; we assumed 𝒞1 = 1 and estimated 𝒞2 ≈ 1.33 × 10−4. Using a one-second exposure of the target, we obtained an estimate of I0 ≈ 0.113. We set τ0 = 3 s and L = 385 for uniformly spaced z positions 250 μm apart between ±24 mm. For our phase imaging prior, we set the average feature size Δ = 59.7 μm and ϕmax = 0.3 rad. We also pruned down to 7 images on either side of z = 0. Solving Eq. (37) using the proximal gradient algorithm given in [30], with the symmetry modifications as well as pruning, yielded the measurement design shown in Fig. 2; computation was via MATLAB, taking ∼23 min on a quad-core Intel Xeon W3520 CPU. The corresponding transfer functions for the real and imaginary parts of the transmission profile, shown in Fig. 3, were computed from the forward model and the optimal reconstruction design.

 figure: Fig. 2

Fig. 2 Positions and exposure times computed using SMaRT are shown in dark green, with those for GP-TIE shown in tan. The total exposure time for the two methods is the same.

Download Full Size | PDF

 figure: Fig. 3

Fig. 3 Estimated transfer functions for the real (α, left) and imaginary (β, right) components of the specimen s(x, y) are shown in green. The dashed tan line indicates the effective transfer function for comparison images generated using the GP-TIE and TIE methods. A normalized frequency of 1.0 corresponds to the maximum frequency allowed by the numerical aperture and magnification of the system.

Download Full Size | PDF

Although the LED emits light in a narrow range of wavelengths, its polychromaticity is still an important factor; to demonstrate this, we also compute a measurement–reconstruction pair (which we will call SMaRT-qm) under the assumption that the illumination is quasimonochromatic and well approximated by a single wavelength. Furthermore, to facilitate comparison with Gaussian process regression transport of intensity equation phase imaging (GP-TIE) [8], a comparable state-of-the-art method, we capture an additional set of measurements: one image at z = 0 plus 14 images at positions exponentially spaced between z = ±250 μm and z = ±24 mm, with exposures of 0.2 s each; this recipe is also shown in Fig. 2. For comparison with standard TIE, we took 1 s exposures at z = 0, ±300 μm (TIE short) as well as z = 0, ±10 mm (TIE long). We used code provided by [8] and adjusted the Tikhonov parameter to match the low frequency cutoff of our method, as shown in Fig. 3.

According to CTF theory, the closest propagation distance offering the highest contrast for a particular spatial frequency is inversely proportional to the square of that frequency, and both the SMaRT and GP-TIE approaches use different propagation distances to capture different sets of spatial frequencies. However, as Fig. 2 shows, SMaRT’s distribution of positions is clearly trimodal, in contrast to the first-principles exponential placement of positions suggested by the GP-TIE approach. It appears that SMaRT chose not to incorporate intermediate positions because they are not as cost efficient as splitting exposure time between a set of positions close to z = 0 and a set of positions far from z = 0 on either side. This seems surprising at first but can be explained as follows. Since contrast transfer functions are nearly periodic in z, falling off only as a function of partial coherence, the intensity pattern at a position far from z = 0 contains not only low frequency components but also partial information about intermediate frequency components. Therefore, we can forgo positions designed specifically for intermediate frequency components and allocate more time to the other positions, reducing noise levels for the remaining images.
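This explanation is easy to probe numerically. A minimal sketch, using the idealized coherent phase transfer ∝ sin(πλz f²) with the article's λ = 590 nm and defocus values, and ignoring the partial-coherence envelope and source spectrum, shows that a far plane carries intermediate frequency bands that a near plane barely transfers:

```python
import numpy as np

lam = 590e-9                          # nominal wavelength (m)
f = np.linspace(0.0, 1.2e5, 4801)     # spatial frequency (cycles/m) at the image plane

def phase_ctf(z):
    # Idealized coherent phase contrast transfer: sin(pi * lambda * z * f^2).
    return np.sin(np.pi * lam * z * f**2)

near = np.abs(phase_ctf(300e-6))      # TIE-short-like defocus
far = np.abs(phase_ctf(24e-3))        # outermost SMaRT / GP-TIE defocus

# At large z the CTF oscillates rapidly in f, so the far plane reaches
# near-unit contrast somewhere inside an intermediate band that the near
# plane transfers only weakly.
mid_band = (f > 1e4) & (f < 2.5e4)
assert far[mid_band].max() > 0.99
assert near[mid_band].max() < 0.5
```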

To verify our approach, we applied the computed SMaRT measurement–reconstruction pair as well as the GP-TIE and TIE methods to both simulated and experimental data of a phase-only Siemens star spoke pattern.

4.1. Simulation

For ground truth, we computed a 40-spoke Siemens star phase object with a diameter of 2092.5 μm and a height of 0.3 rad at the intermediate image plane, using an effective sampling interval of 4.65/9 μm. We filtered the result to remove all frequency components that cannot be captured given the NA and magnification of the system, then downsampled to a sampling interval equal to the pixel pitch (4.65 μm). Simulated noisy images of size 1001 × 1001 were used to reconstruct phase images using the SMaRT, SMaRT-qm, GP-TIE and TIE approaches. The center 501 × 501 pixel region containing the star is shown in Fig. 4, and two smaller regions of the object are shown magnified in Fig. 5 and Fig. 6 for the SMaRT, SMaRT-qm and GP-TIE approaches. Cross sections of the reconstruction are shown in Fig. 7 (we omit SMaRT-qm for clarity), with the resulting error for the SMaRT and GP-TIE approaches shown in Fig. 8.

 figure: Fig. 4

Fig. 4 Ground truth and reconstructed phase images of a simulated Siemens star target; root-mean-square error in radians is given in brackets for each reconstruction. Crops “A” and “B” are shown magnified in Fig. 5 and Fig. 6, respectively. Cross sections of the reconstructed phase and reconstruction error along the dashed line “C” are visualized in Fig. 7 and Fig. 8, respectively. The scale bar indicates 500 μm on the intermediate image plane and 100 μm in the specimen.

Download Full Size | PDF

 figure: Fig. 5

Fig. 5 Magnified crops of the region indicated by “A” in Fig. 4.

Download Full Size | PDF

 figure: Fig. 6

Fig. 6 Magnified crops of the region indicated by “B” in Fig. 4. The center in the ground truth is a flat circle because of the diffraction limit.

Download Full Size | PDF

 figure: Fig. 7

Fig. 7 Reconstructed phase of simulated Siemens star along the dashed line “C” in Fig. 4. The ground truth is shown by the gray gradient.

Download Full Size | PDF

 figure: Fig. 8

Fig. 8 Reconstruction error for simulated Siemens star along the dashed line “C” in Fig. 4.

Download Full Size | PDF

Reconstructed images in Fig. 4 as well as cross sections in Fig. 7 show that standard TIE results in either sharp yet cloudy images (TIE short) or blurry images with good low frequency performance (TIE long). Both SMaRT and GP-TIE leverage diversity to keep both high frequency and low frequency components, yielding significantly better reconstructions. However, the magnified crops in Fig. 5 and Fig. 6 show that SMaRT is more efficient at combating noise, resulting in cleaner spokes in the center of the target as well as flatter phase in uniform regions. Similarly, the rms error values in Fig. 4 and the cross section of the error in Fig. 8 support this observation. Lastly, we see in Fig. 5 and Fig. 6 the importance of considering the polychromaticity of the source, as the SMaRT-qm results—which assume monochromatic illumination—are markedly blurrier than the SMaRT results.

4.2. Experiment

Using the computed measurement design, we captured images of a Benchmark Technologies quantitative phase resolution target, which has a refractive index of 1.52 and various binary phase patterns. We imaged the 50 nm height Siemens star, corresponding to a phase height of ∼0.277 rad. Tilt in the illumination was estimated by applying the computed reconstruction function rR(·) to the ±24 mm images individually, finding the centroids of the same small point feature in both images, and then estimating the slope of the tilt. Images of size 1280 × 1024 from the camera were then adjusted for tilt and cropped to 1001 × 1001 before reconstruction using the SMaRT, GP-TIE or TIE methods. Reconstructed images, close-ups and cross sections of the phase are shown in Fig. 9, Fig. 10 and Fig. 11, respectively.

 figure: Fig. 9

Fig. 9 Reconstructed phase images of actual Siemens star. Crops “A” and “B” are shown magnified in Fig. 10. Cross sections of the reconstructed phase along the dashed line “C” are visualized in Fig. 11. The scale bar indicates 500 μm on the intermediate image plane and 100 μm in the specimen.

Download Full Size | PDF

 figure: Fig. 10

Fig. 10 Crops of reconstructed Siemens star for regions indicated by “A” and “B” in Fig. 9.

Download Full Size | PDF

 figure: Fig. 11

Fig. 11 Reconstructed phase for the actual Siemens star along the dashed line “C” in Fig. 9.

Download Full Size | PDF

The computed phase from experimental data exhibits behavior similar to the results from the simulated data—both SMaRT and GP-TIE easily outperform single distance TIE reconstructions, but SMaRT has a slight edge in noise performance compared to GP-TIE. The GP-TIE reconstruction appears to be more sensitive to low frequency noise, as seen in Fig. 9. Furthermore, Fig. 10 shows SMaRT reconstructing the flat areas of the phase target with less noise than GP-TIE; the high frequency center area of the target is also cleaner in the SMaRT result.

5. Discussion

We have demonstrated that a SMaRT joint optimization approach can be used to reduce reconstruction error in self-interferometric phase imaging by designing the measurement process jointly with the reconstruction process. Our approach leverages additional information such as the estimated spatial spectral distribution of the specimen class being imaged as well as the spectrum of the polychromatic illumination source to compute an informed decision on how to allocate imaging exposure times, resulting in better noise performance under a fixed photon budget compared to previous techniques.

While we employed a piecewise-constant phase model as a prior, this prior serves only as a guide for estimating the spectral signal-to-noise ratio and does not force the reconstructed phase images to appear piecewise-constant, unlike many other nonlinear methods. Since we do retrieve the real and imaginary parts of the specimen’s transmission profile, our method can be readily adapted to image other weakly-scattering specimens with a different prior, for example replacing standard CTF methods in electron microscopy or X-ray phase imaging, where there may be spatially-varying absorption in addition to optical thickness variations. One can also potentially build a collection of spectral covariance matrices suitable for various imaging scenarios, with more accurate priors when more information is known about the specimen to be imaged.

Although we used a translation stage to image different axial positions, movement can be minimized via judicious use of liquid lenses. Furthermore, while we assumed for simplicity that the objective–tube lens system is diffraction limited, it is trivial to incorporate information about shift-invariant aberrations such as spherical aberration into the forward operator. In general, the resolution of phase images generated by our method is primarily limited by the numerical aperture of the objective lens and any unaccounted optical aberrations in the system.

One limitation is that our method cannot be directly applied to arbitrary phase specimens, as large phase differences violate the small-perturbation assumption that we use in estimating noise and modeling the propagated intensity. Automatic differentiation machinery commonly used for deep neural networks, as well as suitable application of transmission cross coefficients from the nonlinear electron microscopy literature, can potentially extend our approach to strong phase objects in the future. Furthermore, we could also directly leverage deep neural networks, converting our model-driven framework into a data-driven framework and solving Eq. (1) using stochastic gradient descent methods.

Appendix A. Optimal exposure allocation subproblem

We give a solution for problems of the form

$$\underset{v_1, \ldots, v_K}{\text{minimize}} \quad \sum_{k=1}^{K} \gamma_k / v_k \qquad \text{subject to} \quad \sum_{k=1}^{K} v_k \le v_0,$$
where the γk's are all positive. First, we note that the objective function and constraints are all convex. Furthermore, any solution to the problem must satisfy the Karush–Kuhn–Tucker conditions [38, 39]. The stationarity condition requires that the optimal vk = √(γk/u) for some constant u ≥ 0. Here, u cannot be zero, because that would make the vk's infinite and violate primal feasibility. Thus, complementary slackness requires ∑k vk = v0. Therefore, the optimal vk must be equal to
$$v_k = v_0 \sqrt{\gamma_k / \gamma_{\text{total}}}, \quad \text{where} \quad \sqrt{\gamma_{\text{total}}} = \sum\nolimits_k \sqrt{\gamma_k}.$$
This implies that Eq. (43) attains an optimal value of γtotal/v0.

For our specific problem in Eq. (36), we can write the subproblem in the form of Eq. (43) by setting γk = 𝒞2‖xk‖2²/(𝒞1²I0) and vk = τk. This yields an optimal value of γ(∑l=1..L ‖xl‖2)², where γ = 𝒞2/(𝒞1²I0τ0). The lth optimal exposure time can then be obtained via τl = τ0‖xl‖2 / ∑l′ ‖xl′‖2.
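The closed form above is easy to sanity-check numerically; the sketch below draws random positive γk (arbitrary test values), builds the square-root allocation, and confirms that no random feasible allocation on the same budget does better:

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = rng.uniform(0.1, 2.0, size=10)      # positive gamma_k
v0 = 3.0                                     # total budget (e.g. tau_0 seconds)

# Closed-form optimum: v_k proportional to sqrt(gamma_k).
v_opt = v0 * np.sqrt(gamma) / np.sqrt(gamma).sum()
obj_opt = np.sum(gamma / v_opt)

# Optimal value equals (sum_k sqrt(gamma_k))^2 / v0 = gamma_total / v0.
assert np.isclose(obj_opt, np.sqrt(gamma).sum() ** 2 / v0)

# By convexity, no random feasible allocation should beat the closed form.
for _ in range(1000):
    v = rng.uniform(0.01, 1.0, size=10)
    v *= v0 / v.sum()                        # put it on the budget boundary
    assert np.sum(gamma / v) >= obj_opt - 1e-9
```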

Appendix B. Spectral covariance of a piecewise-constant vector function

We consider an ensemble of piecewise-constant two-dimensional real vector-valued functions p(x, y) where the values v for individual pieces of this function are independent and identically distributed (i.i.d.) according to some continuous statistical distribution with density function f𝒱(v). We define lines through this function using parameters s and θ to specify the line and parameter r to specify position on this line, i.e.,

$$l(r; s, \theta) = p(r\cos\theta - s\sin\theta,\; r\sin\theta + s\cos\theta).$$
The cross-correlation of the ensemble p(x, y) in polar coordinates is given by
$$C_{ij}(\rho, \theta) = E_{xy}\!\left[ p_i(x, y)\, p_j(x + \rho\cos\theta, y + \rho\sin\theta) \right] = E_{rs}\!\left[ l_i(r; s, \theta)\, l_j(r + \rho; s, \theta) \right].$$
Given some r for l(r; s, θ), we assume the probability that l(r + ρ; s, θ) is still in the same “piece,” and hence has the same value, is equal to exp(−ρ/Δ). Since the pieces of this function are i.i.d.,
$$E_{rs}\!\left[ l_i(r; s, \theta)\, l_j(r + \rho; s, \theta) \right] = E(v_i v_j)\exp(-\rho/\Delta) + E(v_i)E(v_j)\left[ 1 - \exp(-\rho/\Delta) \right]$$
$$= E(v_i)E(v_j) + \left[ E(v_i v_j) - E(v_i)E(v_j) \right]\exp(-\rho/\Delta).$$
The (raw) spectral covariance, i.e., the Fourier transform of the cross-correlation, can be computed via the Hankel transform of the exponential, yielding
$$E(v_i)E(v_j)\,\delta(f_x, f_y) + \left[ E(v_i v_j) - E(v_i)E(v_j) \right] \frac{2\pi\Delta^2}{\left[ 1 + 4\pi^2\Delta^2\!\left( f_x^2 + f_y^2 \right) \right]^{3/2}}.$$
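The Hankel-transform identity used in the last step can be verified numerically. The sketch below evaluates J0 through its integral representation (to stay dependency-free) and compares the radial transform of exp(−ρ/Δ) against the closed form; Δ = 0.7 is an arbitrary test value:

```python
import numpy as np

def trap(y, x, axis=-1):
    # Simple trapezoidal rule (kept local to avoid NumPy version differences).
    ys = np.moveaxis(y, axis, -1)
    return np.sum(0.5 * (ys[..., 1:] + ys[..., :-1]) * np.diff(x), axis=-1)

def j0(x):
    # Bessel J0 via its integral representation: (1/pi) Int_0^pi cos(x sin t) dt.
    theta = np.linspace(0.0, np.pi, 1501)
    return trap(np.cos(np.outer(x, np.sin(theta))), theta, axis=1) / np.pi

Delta = 0.7                                  # arbitrary feature size for the check
rho = np.linspace(0.0, 40.0 * Delta, 4001)   # radial grid; exp(-40) ~ 0 at the tail

def hankel_of_exp(freq):
    # 2D Fourier transform of the radial function exp(-rho/Delta):
    # H(f) = 2 pi Int_0^inf exp(-rho/Delta) J0(2 pi f rho) rho d rho.
    return 2.0 * np.pi * trap(np.exp(-rho / Delta) * j0(2.0 * np.pi * freq * rho) * rho, rho)

for fr in (0.0, 0.3, 1.0):
    closed = 2 * np.pi * Delta**2 / (1 + 4 * np.pi**2 * Delta**2 * fr**2) ** 1.5
    assert abs(hankel_of_exp(fr) - closed) < 1e-2 * closed + 1e-9
```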

Appendix C. On the squared sum of norms penalty term

We consider a problem of the form

$$\underset{x}{\text{minimize}} \quad \frac{1}{2}\left\| Ax - b \right\|_2^2 + \frac{1}{2}\left( \sum_{i=1}^{n} w_i \left\| x_i \right\|_2 \right)^2,$$
where wi > 0 are weights, and x1, . . ., xn are m-length non-overlapping partitions of x ∈ ℂnm. Without loss of generality, we can assume that x consists of the x1, . . ., xn vertically concatenated. Let us denote the least squares term f(x) and the penalty term g(x).

We note that g(x) is a convex function of x, since it is the composition of a nondecreasing convex function s(ν) = max{0, ν}² and a convex function, the weighted sum of ℓ2 norms. Since f(x) is also convex, the above problem can be solved by convex optimization. While f(x) is smooth, g(x) is not, e.g., when one or more of the xi's are zero. This suggests the use of a proximal gradient method [27–29] to solve Eq. (50), and thus we require the proximal operator for g(x), defined as:

$$\mathrm{Prox}_g^{\alpha}(x) = \underset{y}{\arg\min}\left\{ \frac{1}{2}\left\| y - x \right\|_2^2 + \frac{\alpha}{2}\left( \sum_{i=1}^{n} w_i \left\| y_i \right\|_2 \right)^2 \right\},$$
where α > 0, and y1, . . ., yn form the same partition of y as the x1, . . ., xn do of x. We note the following property of the optimal y and hence yis:

Theorem 1. The optimal yi for Eq. (51) are of the form yi = βi xi, where βi ∈ [0, 1].

Proof. Since the expression in Eq. (51) is also the dual of a convex function, strong duality implies that there exist ε1, . . ., εn ≥ 0 such that the optimal y is also a solution to the following constrained optimization problem:

$$\underset{y}{\text{minimize}} \quad \left( \sum_{i=1}^{n} w_i \left\| y_i \right\|_2 \right)^2 \qquad \text{subject to} \quad \left\| y_i - x_i \right\|_2^2 \le \varepsilon_i^2, \quad \forall i \in \{1, \ldots, n\}.$$
Since s(ν) = max{0, ν}² is nondecreasing, the above problem is equivalent to:
$$\underset{y}{\text{minimize}} \quad \sum_{i=1}^{n} w_i \left\| y_i \right\|_2 \qquad \text{subject to} \quad \left\| y_i - x_i \right\|_2^2 \le \varepsilon_i^2, \quad \forall i \in \{1, \ldots, n\}.$$
As the function to be minimized is now separable, and the constraints are separable as well, the optimal value of each yi can be found separately via:
$$\underset{y_i}{\text{minimize}} \quad \left\| y_i \right\|_2 \qquad \text{subject to} \quad \left\| y_i - x_i \right\|_2 \le \varepsilon_i,$$
which has the closed-form solution yi = max{0, (‖xi‖2 − εi)/‖xi‖2} xi.

Therefore, in order to solve Eq. (51), we can instead solve the following:

$$\underset{\beta_1, \ldots, \beta_n}{\text{minimize}} \quad \sum_{i=1}^{n} \frac{1}{2}\left( \beta_i - 1 \right)^2 \left\| x_i \right\|_2^2 + \frac{\alpha}{2}\left( \sum_{i=1}^{n} w_i \beta_i \left\| x_i \right\|_2 \right)^2 \qquad \text{subject to} \quad \beta_i \ge 0, \quad \forall i \in \{1, \ldots, n\}.$$
Define γ1, . . ., γn to be the elements of vector γ, where γi = βi‖xi‖2, and likewise ξ1, . . ., ξn to be the elements of vector ξ, where ξi = ‖xi‖2. Hence, the above problem is equivalent to the following nonnegative least squares formulation:
$$\underset{\gamma}{\text{minimize}} \quad \gamma^\top \gamma - 2\xi^\top \gamma + \alpha\, \gamma^\top w w^\top \gamma \qquad \text{subject to} \quad \gamma \ge 0.$$
Once the optimal γ are obtained, we can then compute the optimal yis. While any nonnegative least squares algorithm can be used to solve Eq. (56), we note the following property about our specific problem:

Lemma 1. Let γ̂ denote the nonzero components of the optimal γ for Eq. (56), and let ξ̂ and ŵ denote the corresponding elements of ξ and w, respectively. Then, the γ̂ must satisfy:

$$\hat{\gamma} = \hat{\xi} - s\hat{w}, \quad \text{where} \quad s = \left( \hat{w}^\top \hat{\xi} \right) \Big/ \left( \alpha^{-1} + \hat{w}^\top \hat{w} \right).$$
Proof. Since the zero-valued entries of γ annihilate the corresponding components of the expression to be minimized in Eq. (56), we can reduce that expression to
$$l(\hat{\gamma}) = \hat{\gamma}^\top \hat{\gamma} - 2\hat{\xi}^\top \hat{\gamma} + \alpha\, \hat{\gamma}^\top \hat{w} \hat{w}^\top \hat{\gamma} = \hat{\gamma}^\top W \hat{\gamma} - 2\hat{\xi}^\top \hat{\gamma} = \left\| W^{1/2} \hat{\gamma} - W^{-1/2} \hat{\xi} \right\|_2^2 + \mathcal{O}(1),$$
where W = I + αŵŵᵀ. Since α > 0, W is invertible, and thus l(γ̂) has minimizer
$$\hat{\gamma} = \left( I - \frac{\alpha \hat{w} \hat{w}^\top}{1 + \alpha \hat{w}^\top \hat{w}} \right) \hat{\xi},$$
via the Sherman–Morrison formula, and minimal value equal to −ξ̂ᵀγ̂.
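A quick numeric check of Lemma 1: the closed form must solve the unconstrained normal equations (I + αŵŵᵀ)γ̂ = ξ̂. The random ŵ, ξ̂, and α below are arbitrary test values:

```python
import numpy as np

rng = np.random.default_rng(2)
n, alpha = 6, 0.8
w = rng.uniform(0.5, 2.0, size=n)     # stand-in for w-hat
xi = rng.uniform(0.1, 3.0, size=n)    # stand-in for xi-hat

# Lemma 1 / Sherman-Morrison closed form.
s = (w @ xi) / (1.0 / alpha + w @ w)
gamma = xi - s * w

# gamma must satisfy (I + alpha * w w^T) gamma = xi ...
assert np.allclose((np.eye(n) + alpha * np.outer(w, w)) @ gamma, xi)
# ... and the minimal value of l(gamma) is -xi^T gamma.
l = gamma @ gamma - 2 * xi @ gamma + alpha * (w @ gamma) ** 2
assert np.isclose(l, -xi @ gamma)
```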

We now consider the following theorem regarding the solution to Eq. (56):

Theorem 2. Consider a vector γ ≥ 0; let 𝒫 ⊆ {1, . . ., n} denote the indexes of its entries that are positive. Then, γ is a solution to Eq. (56) if and only if the following statements are true:

$$\gamma_i = \xi_i - s w_i, \quad \forall i \in \mathcal{P},$$
$$\xi_i \le s w_i, \quad \forall i \notin \mathcal{P},$$
where s is as it is defined in Lemma 1.

Proof. By inspection, we note that αŵᵀγ̂ = αwᵀγ = s, based on the definition of s and the structure of γ. Thus, the first and second statements are equivalent to

$$\hat{\gamma} + \alpha \hat{w}\, w^\top \gamma = \hat{\xi}, \qquad \alpha \bar{w}\, w^\top \gamma \ge \bar{\xi},$$
where w̄ and ξ̄ are the entries of w and ξ, respectively, that correspond to the zero-valued entries of γ. Thus, both statements being true is equivalent to:
$$0 = \left( I + \alpha w w^\top \right) \gamma - \xi - u,$$
where the entries ui of u are zero for i ∈ 𝒫 and nonnegative for i ∉ 𝒫. This equation is equivalent to the stationarity condition of the Karush–Kuhn–Tucker (KKT) conditions, and the structure of u is equivalent to the combination of complementary slackness and dual feasibility. Primal feasibility is satisfied by the nonnegativity of γ assumed by the theorem. Thus, satisfying the two statements of the theorem under the assumed conditions is equivalent to satisfying the KKT conditions. Since the primal problem exhibits strong duality, γ is a solution to Eq. (56) if and only if it satisfies the two statements.

Without loss of generality, let’s assume that

$$\xi_1 / w_1 \ge \xi_2 / w_2 \ge \cdots \ge \xi_n / w_n,$$
since we can sort and unsort the entries as needed if this assumption doesn’t hold. Under this assumption, we have the following corollary:

Corollary 1. If Eq. (65) holds and γ is a solution to Eq. (56), then 𝒫 = {1, . . ., n′} for some 0 ≤ n′ ≤ n, with n′ = 0 if and only if ξ = 0.

Proof. 𝒫 is a contiguous range of indexes by a combination of Eq. (62) and Eq. (65). If n′ is zero, then 𝒫 = ∅ and s = 0, and hence ξ ≤ 0 elementwise. Since the elements of ξ are norms, the only possible conclusion is that ξ = 0. Conversely, suppose ξ = 0; then, since s wi ≥ 0 by construction, no γi > 0 can exist via Eq. (61), and thus 𝒫 = ∅.

With these observations, we propose Algorithm 1 for computing Eq. (51). This algorithm has asymptotic complexity 𝒪(mn + n²), where the 𝒪(mn) part is due to computing the ℓ2 norms in step 3 and rescaling in step 18, whereas the 𝒪(n²) part is due to the while loop of at most n iterations, each of which requires inner products of vectors of length at most n. Sorting the elements takes only 𝒪(n log n) time and is dominated by the other operations.


Algorithm 1. Proximal operator for squared (weighted) sum of norms.
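A minimal Python sketch of Algorithm 1, following Lemma 1 and Theorem 2 (the function name and the pruning loop are our own; the paper's step numbering is not reproduced):

```python
import numpy as np

def prox_squared_sum_of_norms(x_parts, w, alpha):
    """Prox of g(x) = (1/2) (sum_i w_i ||x_i||_2)^2 with step size alpha.

    x_parts: list of n partition vectors, w: positive weights.
    Active entries satisfy gamma_i = xi_i - s * w_i (Theorem 2).
    """
    xi = np.array([np.linalg.norm(p) for p in x_parts])
    w = np.asarray(w, dtype=float)
    active = xi > 0.0                # zero partitions can never be active
    s = 0.0
    while active.any():
        wa, xa = w[active], xi[active]
        s = (wa @ xa) / (1.0 / alpha + wa @ wa)   # Lemma 1 on the active set
        violated = active & (xi - s * w <= 0.0)
        if not violated.any():
            break                    # Theorem 2's conditions hold; done
        active &= ~violated          # prune violators; s only grows, so
                                     # pruned entries stay feasible
    gamma = np.maximum(xi - s * w, 0.0)
    # Rescale each partition: y_i = (gamma_i / xi_i) x_i (Theorem 1).
    scale = np.divide(gamma, xi, out=np.zeros_like(gamma), where=xi > 0.0)
    return [c * p for c, p in zip(scale, x_parts)]

# Example with norms xi = [5, 1, 0] and unit weights: s settles at 2.5,
# so only the first partition survives, scaled by 2.5 / 5 = 0.5.
parts = [np.array([3.0, 4.0]), np.array([0.6, 0.8]), np.zeros(2)]
y = prox_squared_sum_of_norms(parts, w=[1.0, 1.0, 1.0], alpha=1.0)
assert np.allclose(y[0], [1.5, 2.0])
assert np.allclose(y[1], 0.0) and np.allclose(y[2], 0.0)
```

Because the objective of Eq. (51) is convex, the output can also be checked against random perturbations: no nearby point should achieve a lower objective value.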

Funding

National Research Foundation Prime Minister’s Office Singapore (CREATE Programme Singapore-MIT Alliance for Research and Technology BioSystems and Micromechanics IRG).

Acknowledgments

The authors would like to thank Colin Sheppard, Jon Petruccelli and Chenglong Bao for their helpful discussions.

References

1. R. W. Gerchberg and W. O. Saxton, “A practical algorithm for the determination of the phase from image and diffraction plane pictures,” Optik 35, 237–246 (1972).

2. J. R. Fienup, “Phase retrieval algorithms: A comparison,” Appl. Opt. 21, 2758–2769 (1982).

3. E. J. Candès, Y. C. Eldar, T. Strohmer, and V. Voroninski, “Phase retrieval via matrix completion,” SIAM J. Imaging Sci. 6, 199–225 (2013).

4. M. R. Teague, “Deterministic phase retrieval: a Green’s function solution,” J. Opt. Soc. Am. 73, 1434–1441 (1983).

5. N. Streibl, “Phase imaging by the transport equation of intensity,” Opt. Commun. 49, 6–10 (1984).

6. A. Barty, K. A. Nugent, D. Paganin, and A. Roberts, “Quantitative optical phase microscopy,” Opt. Lett. 23, 817–819 (1998).

7. L. Waller, L. Tian, and G. Barbastathis, “Transport of intensity phase-amplitude imaging with higher order intensity derivatives,” Opt. Express 18, 12552–12561 (2010).

8. J. Zhong, R. A. Claus, J. Dauwels, L. Tian, and L. Waller, “Transport of intensity phase imaging by intensity spectrum fitting of exponentially spaced defocus planes,” Opt. Express 22, 10661–10674 (2014).

9. J.-P. Guigay, “Fourier transform analysis of Fresnel diffraction patterns and in-line holograms,” Optik 49, 121–125 (1977).

10. Y. I. Nesterets and T. E. Gureyev, “Partially coherent contrast-transfer-function approximation,” J. Opt. Soc. Am. A 33, 464–474 (2016).

11. J. C. Petruccelli, L. Tian, and G. Barbastathis, “The transport of intensity equation for optical path length recovery using partially coherent illumination,” Opt. Express 21, 14430–14441 (2013).

12. T. Chakraborty and J. C. Petruccelli, “Source diversity for transport of intensity phase imaging,” Opt. Express 25, 9122–9137 (2017).

13. B. Xue, S. Zheng, L. Cui, X. Bai, and F. Zhou, “Transport of intensity phase imaging from multiple intensities measured in unequally-spaced planes,” Opt. Express 19, 20244–20250 (2011).

14. S. Zheng, B. Xue, W. Xue, X. Bai, and F. Zhou, “Transport of intensity phase imaging from multiple noisy intensities measured in unequally-spaced planes,” Opt. Express 20, 972–985 (2012).

15. J. Sun, C. Zuo, and Q. Chen, “Iterative optimum frequency combination method for high efficiency phase imaging of absorptive objects based on phase transfer function,” Opt. Express 23, 28031–28049 (2015).

16. L. Tian, J. C. Petruccelli, and G. Barbastathis, “Nonlinear diffusion regularization for transport of intensity phase imaging,” Opt. Lett. 37, 4131–4133 (2012).

17. L. Tian, J. C. Petruccelli, Q. Miao, H. Kudrolli, V. Nagarkar, and G. Barbastathis, “Compressive x-ray phase tomography based on the transport of intensity equation,” Opt. Lett. 38, 3418–3421 (2013).

18. E. R. Dowski Jr. and W. T. Cathey, “Extended depth of field through wave-front coding,” Appl. Opt. 34, 1859–1866 (1995).

19. W. T. Cathey and E. R. Dowski, “New paradigm for imaging systems,” Appl. Opt. 41, 6080–6092 (2002).

20. M. D. Stenner, A. Ashok, and M. A. Neifeld, “Multi-domain optimization for ultra-thin cameras,” in Frontiers in Optics, (Optical Society of America, 2006), p. FWH5.

21. E. J. Tremblay, J. Rutkowski, I. Tamayo, P. E. X. Silveira, R. A. Stack, R. L. Morrison, M. A. Neifeld, Y. Fainman, and J. E. Ford, “Relaxing the alignment and fabrication tolerances of thin annular folded imaging systems using wavefront coding,” Appl. Opt. 46, 6751–6758 (2007).

22. J. Frank, Three-Dimensional Electron Microscopy of Macromolecular Assemblies: Visualization of Biological Molecules in Their Native State (Oxford University, 2006).

23. M. Vulović, L. M. Voortman, L. J. van Vliet, and B. Rieger, “When to use the projection assumption and the weak-phase object approximation in phase contrast cryo-EM,” Ultramicroscopy 136, 61–66 (2014).

24. E. Wolf, “New theory of partial coherence in the space-frequency domain. Part I: spectra and cross spectra of steady-state sources,” J. Opt. Soc. Am. 72, 343–351 (1982).

25. L. Mandel and E. Wolf, “Spectral coherence and the concept of cross-spectral purity,” J. Opt. Soc. Am. 66, 529–535 (1976).

26. A. Goy, K. Arthur, S. Li, and G. Barbastathis, “Low photon count phase retrieval using deep learning,” arXiv preprint arXiv:1806.10029 (2018).

27. Y. Nesterov, “Gradient methods for minimizing composite objective function,” CORE Discussion Papers 2007076, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE) (2007).

28. A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imaging Sci. 2, 183–202 (2009).

29. B. O’Donoghue and E. Candès, “Adaptive restart for accelerated gradient schemes,” Found. Comp. Math. 15, 715–732 (2015).

30. C. Bao, G. Barbastathis, H. Ji, Z. Shen, and Z. Zhang, “Coherence retrieval using trace regularization,” SIAM J. Imaging Sci. 11, 679–706 (2018).

31. G. J. Burton and I. R. Moorhead, “Color and spatial structure in natural scenes,” Appl. Opt. 26, 157–170 (1987).

32. D. J. Field, “Relations between the statistics of natural images and the response properties of cortical cells,” J. Opt. Soc. Am. A 4, 2379–2394 (1987).

33. D. J. Tolhurst, Y. Tadmor, and T. Chao, “Amplitude spectra of natural images,” Ophthalmic Physiol. Opt. 12, 229–232 (1992).

34. A. B. Lee, D. Mumford, and J. Huang, “Occlusion models for natural images: A statistical study of a scale-invariant dead leaves model,” Int. J. Comput. Vis. 41, 35–59 (2001).

35. Y. LeCun, J. S. Denker, and S. A. Solla, “Optimal brain damage,” in Advances in Neural Information Processing Systems 2, D. S. Touretzky, ed. (Morgan-Kaufmann, 1990), pp. 598–605.

36. P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz, “Pruning convolutional neural networks for resource efficient inference,” in Proceedings of the International Conference on Learning Representations (ICLR) (2017).

37. Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, “Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition,” in Proceedings of 27th Asilomar Conference on Signals, Systems and Computers (1993), vol. 1, pp. 40–44.

38. W. Karush, “Minima of functions of several variables with inequalities as side constraints,” Ph.D. thesis, University of Chicago (1939).

39. H. W. Kuhn and A. W. Tucker, “Nonlinear programming,” in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, (University of California, 1951), pp. 481–492.



Figures (11)

Fig. 1
Fig. 1 A standard optical microscope, wherein an area light source at the front focal plane of a condenser lens illuminates a thin specimen (blue) coincident with the back focal plane. An objective–tube lens 4f system forms a magnified image (blue dotted) of the specimen at the intermediate image plane. Rays passing through the specimen’s center and edge are shown in gray and dark red, respectively. For clarity, relative sizes are not to scale.
Fig. 2
Fig. 2 Positions and exposure times computed using SMaRT are shown in dark green, with those for GP-TIE shown in tan. Total exposure time for the two methods are the same.
Fig. 3
Fig. 3 Estimated transfer function for the real (α, left) and imaginary (β, right) components of the specimen s(x, y) are shown in green. The dashed tan line indicates the effective transfer function for comparison images generated using the GP-TIE and TIE methods. A normalized frequency of 1.0 corresponds to the maximum frequency allowable by the numerical aperture and magnification of the system.
Fig. 4
Fig. 4 Ground truth and reconstructed phase images of a simulated Siemens star target; root-mean-square error in radians is given in brackets for each reconstruction. Crops “A” and “B” are shown magnified in Fig. 5 and Fig. 6, respectively. Cross sections of the reconstructed phase and reconstruction error along the dashed line “C” are visualized in Fig. 7 and Fig. 8, respectively. The scale bar indicates 500 μm on the intermediate image plane and 100 μm in the specimen.
Fig. 5
Fig. 5 Magnified crops of the region indicated by “A” in Fig. 4.
Fig. 6
Fig. 6 Magnified crops of the region indicated by “B” in Fig. 4. The center in the ground truth is a flat circle because of the diffraction limit.
Fig. 7
Fig. 7 Reconstructed phase of simulated Siemens star along the dashed line “C” in Fig. 4. The ground truth is shown by the gray gradient.
Fig. 8
Fig. 8 Reconstruction error for simulated Siemens star along the dashed line “C” in Fig. 4.
Fig. 9
Fig. 9 Reconstructed phase images of actual Siemens star. Crops “A” and “B” are shown magnified in Fig. 10. Cross sections of the reconstructed phase along the dashed line “C” are visualized in Fig. 11. The scale bar indicates 500 μm on the intermediate image plane and 100 μm in the specimen.
Fig. 10
Fig. 10 Crops of reconstructed Siemens star for regions indicated by “A” and “B” in Fig. 9.
Fig. 11
Fig. 11 Reconstructed phase for the actual Siemens star along the dashed line “C” in Fig. 9.

Tables (1)

Tables Icon

Algorithm 1 Proximal operator for squared (weighted) sum of norms.

Equations (65)

Equations on this page are rendered with MathJax. Learn more.

minimize M 0 , R 0 f 𝒮 0 ( s ) s r R ( p M ( s ) ) 2 2 d s ,
minimize x 1 , , x L b l = 1 L A l x l 2 2 + γ ( l = 1 L x l ) 2 .
$$\Theta(x, y) = \bar{\Theta} + s(x, y) = \bar{\Theta} + \alpha(x, y) + i \beta(x, y),$$

$$s(x, y) \approx \sum_{m=1}^{M} \sum_{n=1}^{N} \tilde{s}_{mn} \exp(i 2\pi \xi_m x) \exp(i 2\pi \eta_n y) = \sum_{m,n} \tilde{s}_{mn} \psi_{mn}(x, y),$$

$$U_{\text{in}}(x, y) = \exp[\, i 2\pi (\xi x + \eta y) \,],$$

$$U_{\text{prop}}(x, y, z) = \phi_z \, U_{\text{in}}(x, y) \bigl[ 1 + s_{\text{prop}}(x - \lambda z \xi,\, y - \lambda z \eta,\, z; \lambda) \bigr],$$

$$s_{\text{prop}}(x, y, z; \lambda) = \sum_{m,n} \tilde{s}_{mn} \psi_{mn}(x, y) \exp\bigl[ -i \pi \lambda z (\xi_m^2 + \eta_n^2) \bigr].$$

$$I_{\text{coherent}}(x, y, z; \xi, \eta, \lambda) = 1 + 2 \operatorname{Re}\{ s_{\text{prop}}(x - \lambda z \xi, y - \lambda z \eta, z; \lambda) \} + \bigl| s_{\text{prop}}(x - \lambda z \xi, y - \lambda z \eta, z; \lambda) \bigr|^2.$$

$$I_{\text{coherent}}(x, y, z; \xi, \eta, \lambda) \approx 1 + 2 \operatorname{Re}[\, s_{\text{prop}}(x - \lambda z \xi, y - \lambda z \eta, z; \lambda) \,].$$

$$\sum_{m,n} \bigl\{ \tilde{\alpha}_{mn} \cos[ \pi \lambda z (\xi_m^2 + \eta_n^2) ] + \tilde{\beta}_{mn} \sin[ \pi \lambda z (\xi_m^2 + \eta_n^2) ] \bigr\} \, \delta(\xi - \xi_m,\, \eta - \eta_n).$$
$$W(x_1, y_1, x_2, y_2, \omega) = I_0 \, \mu(x_1 - x_2,\, y_1 - y_2,\, \omega),$$

$$\mu(\Delta x, \Delta y, \omega) = \iint \rho(a, b, \omega) \exp[\, i k (a \Delta x + b \Delta y) \,] \, da \, db.$$

$$I(x, y, z) = I_0 \iiint \rho(a, b, \omega) \, I_{\text{coherent}}(x, y, z;\, k a,\, k b,\, 2\pi c / \omega) \, da \, db \, d\omega.$$

$$I(x, y, z) \approx I_0 + 2 I_0 \operatorname{Re}\Bigl\{ \iiint \rho(a, b, \omega) \, s_{\text{prop}}(x + 2\pi z a,\, y + 2\pi z b,\, z;\, 2\pi c / \omega) \, da \, db \, d\omega \Bigr\}$$

$$= I_0 + \int \Bigl[ \frac{I_0}{4\pi^2 z^2} \, \rho\Bigl( \frac{x}{2\pi z}, \frac{y}{2\pi z}, \omega \Bigr) \Bigr] \circledast 2 \operatorname{Re}\{ s_{\text{prop}}(x, y, z;\, 2\pi c / \omega) \} \, d\omega,$$

$$\tilde{I}(\xi, \eta, z) \approx I_0 \, \delta(\xi, \eta) + \tilde{I}_{\text{real}}(\xi, \eta, z) + \tilde{I}_{\text{imag}}(\xi, \eta, z),$$

$$\tilde{I}_{\text{real}}(\xi, \eta, z) = I_0 \sum_{m,n} \tilde{\alpha}_{mn} \, \delta(\xi - \xi_m,\, \eta - \eta_n) \, \tilde{P}_{\text{real}}(\xi_m, \eta_n, z),$$

$$\tilde{I}_{\text{imag}}(\xi, \eta, z) = I_0 \sum_{m,n} \tilde{\beta}_{mn} \, \delta(\xi - \xi_m,\, \eta - \eta_n) \, \tilde{P}_{\text{imag}}(\xi_m, \eta_n, z),$$

$$\tilde{P}_{\text{real}}(\xi_m, \eta_n, z) = 2 \int \cos[\, \lambda \pi z (\xi_m^2 + \eta_n^2) \,] \, \mu(\lambda z \xi_m,\, \lambda z \eta_n,\, \omega) \, d\omega,$$

$$\tilde{P}_{\text{imag}}(\xi_m, \eta_n, z) = 2 \int \sin[\, \lambda \pi z (\xi_m^2 + \eta_n^2) \,] \, \mu(\lambda z \xi_m,\, \lambda z \eta_n,\, \omega) \, d\omega.$$
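As an illustration of the transfer functions P̃_real and P̃_imag above, the sketch below evaluates them on one frequency slice for a single wavelength, with an assumed Gaussian form for the coherence function μ. The Gaussian model, the parameter values, and the collapse of the ω integral to a single wavelength are all illustrative assumptions, not the paper's source model.

```python
import numpy as np

# Illustrative parameters (not from the paper)
lam = 0.5e-6       # wavelength (m)
z = 1e-3           # defocus distance (m)
sigma_mu = 50e-6   # width of the assumed Gaussian coherence function mu (m)

xi = np.linspace(0.0, 2e5, 1024)   # spatial frequency (cycles/m), eta = 0 slice
rho2 = xi**2

# Monochromatic forms of the transfer functions:
#   P_real = 2 cos(pi lam z rho^2) mu(lam z xi, lam z eta)
#   P_imag = 2 sin(pi lam z rho^2) mu(lam z xi, lam z eta)
mu = np.exp(-((lam * z * xi) ** 2) / (2 * sigma_mu**2))
P_real = 2.0 * np.cos(np.pi * lam * z * rho2) * mu
P_imag = 2.0 * np.sin(np.pi * lam * z * rho2) * mu
```

Note that at z = 0 the sine factor makes P̃_imag vanish identically, which is the usual motivation for defocused measurements of phase.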
$$\mathrm{pixel}_{lmn}^{(\mathrm{noiseless})} = \mathcal{C}_1 \tau_l \iint \chi(x - x_m,\, y - y_n) \, I(x, y, z_l) \, dx \, dy,$$

$$\mathrm{pixel}_{lmn}^{(\mathrm{noisy})} \sim \mathcal{N}\bigl( \mathrm{pixel}_{lmn}^{(\mathrm{noiseless})},\; \mathcal{C}_2 \tau_l I_0 \bigr),$$

$$\widetilde{\mathrm{pixel}}_{lmn}^{(\mathrm{noisy})} = \frac{1}{MN} \sum_{\hat{m}=1}^{M} \sum_{\hat{n}=1}^{N} \mathrm{pixel}_{l \hat{m} \hat{n}}^{(\mathrm{noisy})} \exp\bigl\{ -i 2\pi \bigl[ (x_{\hat{m}} - x_{\bar{m}}) \xi_m + (y_{\hat{n}} - y_{\bar{n}}) \eta_n \bigr] \bigr\}$$

$$\sim \mathcal{N}\bigl( \mathcal{C}_1 \tau_l \hat{I}(\xi_m, \eta_n, z_l),\; \mathcal{C}_2 \tau_l I_0 \bigr),$$

$$\hat{I}(\xi_m, \eta_n, z_l) = I_0 \, \tilde{\chi}(\xi_m, \eta_n) \bigl[ \tilde{P}_{\text{real}}(\xi_m, \eta_n, z_l) \, \tilde{\alpha}_{mn} + \tilde{P}_{\text{imag}}(\xi_m, \eta_n, z_l) \, \tilde{\beta}_{mn} \bigr].$$
$$y = p_M(s) = P s + n,$$

$$P_{(lmn,\, pqr)} = \begin{cases} p_{lmn1} = \tilde{\chi}(\xi_m, \eta_n) \tilde{P}_{\text{real}}(\xi_m, \eta_n, z_l) & \text{if } m = p,\; n = q,\; r = 1 \\ p_{lmn2} = \tilde{\chi}(\xi_m, \eta_n) \tilde{P}_{\text{imag}}(\xi_m, \eta_n, z_l) & \text{if } m = p,\; n = q,\; r = 2 \\ 0 & \text{otherwise}. \end{cases}$$

$$R_{(pqr,\, lmn)} = \begin{cases} x_{mnrl} & \text{if } m = p,\; n = q \\ 0 & \text{otherwise}. \end{cases}$$

$$\underset{x,\, \tau_1, \ldots, \tau_L}{\text{minimize}} \;\; \mathbb{E}_{s \sim \mathcal{S}_0,\, n}\bigl[ \| s - R(P s + n) \|_2^2 \bigr] \quad \text{subject to} \quad \tau_0 \geq \textstyle\sum_l \tau_l.$$

$$\mathbb{E}_{s \sim \mathcal{S}_0,\, n}\bigl[ \| s - R(P s + n) \|_2^2 \bigr] = \mathbb{E}_{s \sim \mathcal{S}_0,\, n}\Bigl\{ \sum_{m,n} \Bigl\| R_{mn} \Bigl[ P_{mn} \binom{\tilde{\alpha}_{mn}}{\tilde{\beta}_{mn}} + n_{mn} \Bigr] - \binom{\tilde{\alpha}_{mn}}{\tilde{\beta}_{mn}} \Bigr\|_2^2 \Bigr\},$$

$$R_{mn} = \begin{pmatrix} x_{mn11} & x_{mn12} & \cdots & x_{mn1L} \\ x_{mn21} & x_{mn22} & \cdots & x_{mn2L} \end{pmatrix}, \qquad P_{mn} = \begin{pmatrix} p_{1mn1} & p_{1mn2} \\ \vdots & \vdots \\ p_{Lmn1} & p_{Lmn2} \end{pmatrix}.$$

$$B_{mn} B_{mn}^H = C_{mn} = \begin{bmatrix} \mathbb{E}_{s \sim \mathcal{S}_0}\bigl( |\tilde{\alpha}_{mn}|^2 \bigr) & \mathbb{E}_{s \sim \mathcal{S}_0}\bigl( \tilde{\alpha}_{mn} \tilde{\beta}_{mn}^* \bigr) \\ \mathbb{E}_{s \sim \mathcal{S}_0}\bigl( \tilde{\beta}_{mn} \tilde{\alpha}_{mn}^* \bigr) & \mathbb{E}_{s \sim \mathcal{S}_0}\bigl( |\tilde{\beta}_{mn}|^2 \bigr) \end{bmatrix}, \qquad B_{mn} = \begin{pmatrix} b_{mn1} & b_{mn3} \\ b_{mn2} & b_{mn4} \end{pmatrix},$$

$$\sum_{m,n} \bigl\| ( R_{mn} P_{mn} - I ) B_{mn} \bigr\|_2^2 + \sum_{m,n} \operatorname{tr}\bigl[ R_{mn}^H R_{mn} \, \mathbb{E}_n\bigl( n_{mn} n_{mn}^H \bigr) \bigr].$$
$$\underset{x,\, \tau_1, \ldots, \tau_L}{\text{minimize}} \;\; \Bigl\| b - \sum_{l=1}^{L} A_l x_l \Bigr\|_2^2 + \sum_{l=1}^{L} \sigma_l^2 \| x_l \|_2^2 \quad \text{subject to} \quad \tau_0 \geq \textstyle\sum_l \tau_l,$$

$$A_{lmn} = \bigl[ ( p_{lmn1} b_{mn1} + p_{lmn2} b_{mn2} ) I \;\;\; ( p_{lmn1} b_{mn3} + p_{lmn2} b_{mn4} ) I \bigr].$$

$$\underset{\tau_1, \ldots, \tau_L}{\text{minimize}} \;\; \sum_{l=1}^{L} \sigma_l^2 \| x_l \|_2^2 \quad \text{subject to} \quad \tau_0 \geq \textstyle\sum_l \tau_l.$$

$$\underset{x_1, \ldots, x_L}{\text{minimize}} \;\; \Bigl\| b - \sum_{l=1}^{L} A_l x_l \Bigr\|_2^2 + \gamma \Bigl( \sum_{l=1}^{L} \| x_l \|_2 \Bigr)^2,$$
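For the ridge-regularized variant above (the one with the Σ σ_l² ‖x_l‖₂² penalty), the minimizer over x is a block normal-equations solve. A tiny sketch on illustrative random data (the dimensions and values are made up for demonstration):

```python
import numpy as np

# minimize ||b - sum_l A_l x_l||^2 + sum_l sigma_l^2 ||x_l||^2.
# Stacking A = [A_1 ... A_L] and D = blockdiag(sigma_l^2 I) gives the
# normal equations (A^T A + D) x = A^T b.
rng = np.random.default_rng(1)
L, m, p = 3, 8, 2                     # number of blocks, rows, block width
A_blocks = [rng.standard_normal((m, p)) for _ in range(L)]
sigma2 = np.array([0.1, 0.4, 0.9])    # per-block noise variances
b = rng.standard_normal(m)

A = np.hstack(A_blocks)
D = np.diag(np.repeat(sigma2, p))
x = np.linalg.solve(A.T @ A + D, A.T @ b)
```

The solve is well posed even when A is rank deficient, since D makes the system matrix positive definite.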
$$\alpha(x, y) = \zeta \cos[\phi(x, y)] - 1, \quad \text{and} \quad \beta(x, y) = \zeta \sin[\phi(x, y)],$$

$$C_{mn} = \frac{2\pi \Delta^2}{\bigl[ 1 + 4\pi^2 \Delta^2 (\xi_m^2 + \eta_n^2) \bigr]^{3/2}} \begin{pmatrix} \zeta^2/2 + \zeta/2 - 1 & 0 \\ 0 & \zeta^2/2 - \zeta/2 \end{pmatrix}.$$
$$\rho_2(a, b, \omega) = \rho_1(a - a_0,\, b - b_0,\, \omega).$$

$$T_{(mn,\, k)} = \begin{cases} 1 + \lfloor t_{mn} \rfloor - t_{mn} & \text{if } k = \lfloor t_{mn} \rfloor \\ t_{mn} - \lfloor t_{mn} \rfloor & \text{if } k = \lfloor t_{mn} \rfloor + 1 \\ 0 & \text{otherwise}. \end{cases}$$
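The piecewise definition of T above is a linear-interpolation matrix: each row places weight 1 + ⌊t⌋ − t on the grid point ⌊t⌋ and weight t − ⌊t⌋ on ⌊t⌋ + 1. A small sketch (the clipping at the last grid point is an added convenience, not part of the definition):

```python
import numpy as np

def interp_matrix(t, K):
    """Row j holds the linear-interpolation weights of sample location t[j]
    over a grid of K points; assumes 0 <= t <= K - 1."""
    T = np.zeros((len(t), K))
    k0 = np.floor(t).astype(int)
    frac = t - k0
    rows = np.arange(len(t))
    T[rows, k0] = 1.0 - frac                      # weight 1 + floor(t) - t
    T[rows, np.minimum(k0 + 1, K - 1)] += frac    # weight t - floor(t)
    return T

t = np.array([0.25, 1.5, 2.0])
T = interp_matrix(t, K=4)
grid_vals = np.array([0.0, 1.0, 4.0, 9.0])
print(T @ grid_vals)   # linear interpolation of grid_vals at t: [0.25, 2.5, 4.0]
```

Because each row sums to one, T exactly reproduces constant functions, and applying it to grid samples performs piecewise-linear interpolation.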
$$\hat{b} = \bigl[ I_4 \otimes (Q T) \bigr] \, b, \qquad \hat{A}_l = ( I_4 \otimes Q ) \bigl[ ( I_4 \otimes T ) \, A_l \, ( I_2 \otimes T ) \bigr] ( I_2 \otimes Q ),$$
$$\underset{v_1, \ldots, v_K}{\text{minimize}} \;\; \sum_{k=1}^{K} \gamma_k / v_k \quad \text{subject to} \quad \sum_{k=1}^{K} v_k \leq v_0,$$

$$v_k = v_0 \sqrt{\gamma_k} / \gamma_{\text{total}}, \quad \text{where} \quad \gamma_{\text{total}} = \sum_k \sqrt{\gamma_k}.$$
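The closed form above follows from the KKT conditions of the budget-allocation problem: each v_k is proportional to √γ_k. A minimal numerical sketch with illustrative values:

```python
import numpy as np

def allocate(gammas, v0):
    """Closed-form minimizer of sum_k gamma_k / v_k subject to
    sum_k v_k = v0 (each v_k proportional to sqrt(gamma_k))."""
    r = np.sqrt(gammas)
    return v0 * r / r.sum()

gammas = np.array([1.0, 4.0, 9.0])
v = allocate(gammas, v0=6.0)
print(v)  # [1. 2. 3.]
```

Any other feasible split, for example the uniform one, yields a strictly larger objective whenever the γ_k differ.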
$$l(r; s, \theta) = p( r \cos\theta - s \sin\theta,\; r \sin\theta + s \cos\theta ).$$

$$C_{ij}(\rho, \theta) = \mathbb{E}_{xy}\bigl[ p_i(x, y) \, p_j(x + \rho \cos\theta,\, y + \rho \sin\theta) \bigr] = \mathbb{E}_{rs}\bigl[ l_i(r; s, \theta) \, l_j(r + \rho; s, \theta) \bigr].$$

$$\mathbb{E}_{rs}\bigl[ l_i(r; s, \theta) \, l_j(r + \rho; s, \theta) \bigr] = \mathbb{E}(v_i v_j) \exp(-\rho/\Delta) + \mathbb{E}(v_i) \mathbb{E}(v_j) \bigl[ 1 - \exp(-\rho/\Delta) \bigr]$$

$$= \mathbb{E}(v_i) \mathbb{E}(v_j) + \bigl[ \mathbb{E}(v_i v_j) - \mathbb{E}(v_i) \mathbb{E}(v_j) \bigr] \exp(-\rho/\Delta).$$

$$\mathbb{E}(v_i) \mathbb{E}(v_j) \, \delta(f_x, f_y) + \bigl[ \mathbb{E}(v_i v_j) - \mathbb{E}(v_i) \mathbb{E}(v_j) \bigr] \, \frac{2\pi \Delta^2}{\bigl[ 1 + 4\pi^2 \Delta^2 (f_x^2 + f_y^2) \bigr]^{3/2}}.$$
$$\underset{x}{\text{minimize}} \;\; \frac{1}{2} \| A x - b \|_2^2 + \frac{1}{2} \Bigl( \sum_{i=1}^{n} w_i \| x_i \|_2 \Bigr)^2,$$

$$\operatorname{Prox}_{g}^{\alpha}(x) = \arg\min_y \Bigl\{ \frac{1}{2} \| y - x \|_2^2 + \frac{\alpha}{2} \Bigl( \sum_{i=1}^{n} w_i \| y_i \|_2 \Bigr)^2 \Bigr\},$$

$$\underset{y}{\text{minimize}} \;\; \Bigl( \sum_{i=1}^{n} w_i \| y_i \|_2 \Bigr)^2 \quad \text{subject to} \quad \| y_i - x_i \|_2^2 \leq \epsilon_i^2, \;\; i \in \{1, \ldots, n\}.$$

$$\underset{y}{\text{minimize}} \;\; \sum_{i=1}^{n} w_i \| y_i \|_2 \quad \text{subject to} \quad \| y_i - x_i \|_2^2 \leq \epsilon_i^2, \;\; i \in \{1, \ldots, n\}.$$

$$\underset{y_i}{\text{minimize}} \;\; \| y_i \|_2 \quad \text{subject to} \quad \| y_i - x_i \|_2 \leq \epsilon_i,$$

$$\underset{\beta_1, \ldots, \beta_n}{\text{minimize}} \;\; \sum_{i=1}^{n} \frac{1}{2} (\beta_i - 1)^2 \| x_i \|_2^2 + \frac{\alpha}{2} \Bigl( \sum_{i=1}^{n} w_i \beta_i \| x_i \|_2 \Bigr)^2 \quad \text{subject to} \quad \beta_i \geq 0, \;\; i \in \{1, \ldots, n\}.$$
$$\underset{\gamma}{\text{minimize}} \;\; \gamma^\top \gamma - 2 \xi^\top \gamma + \alpha \, \gamma^\top w w^\top \gamma \quad \text{subject to} \quad \gamma \geq 0.$$

$$\hat{\gamma} = \hat{\xi} - s \hat{w}, \quad \text{where} \quad s = ( \hat{w}^\top \hat{\xi} ) / ( \alpha^{-1} + \hat{w}^\top \hat{w} ).$$

$$l(\hat{\gamma}) = \hat{\gamma}^\top \hat{\gamma} - 2 \hat{\xi}^\top \hat{\gamma} + \alpha \, \hat{\gamma}^\top \hat{w} \hat{w}^\top \hat{\gamma} = \hat{\gamma}^\top W \hat{\gamma} - 2 \hat{\xi}^\top \hat{\gamma}$$

$$= \bigl\| W^{1/2} \hat{\gamma} - W^{-1/2} \hat{\xi} \bigr\|_2^2 + \mathcal{O}(1),$$

$$\hat{\gamma} = \Bigl( I - \frac{\alpha \hat{w} \hat{w}^\top}{1 + \alpha \hat{w}^\top \hat{w}} \Bigr) \hat{\xi},$$
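The unconstrained minimizer above is a rank-one (Sherman–Morrison) update, which is cheap to sanity-check numerically. The variable names below mirror the equations; the data are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 5, 0.7
w = rng.random(n)               # weight vector (illustrative)
xi = rng.standard_normal(n)     # unconstrained target (illustrative)

# Rank-one closed form for gamma = (I + alpha w w^T)^{-1} xi:
#   gamma = xi - s w, with s = (w @ xi) / (1/alpha + w @ w)
s = (w @ xi) / (1.0 / alpha + w @ w)
gamma = xi - s * w
```

Multiplying gamma back through (I + α w wᵀ) recovers ξ, confirming the two forms of the solution agree.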
$$\gamma_i = \xi_i - s w_i, \quad i \in \mathcal{P}.$$

$$\xi_i - s w_i \geq 0, \quad i \in \mathcal{P}.$$

$$\hat{\gamma} + \alpha \hat{w} ( w^\top \gamma ) = \hat{\xi}, \qquad \alpha \bar{w} ( w^\top \gamma ) \geq \bar{\xi},$$

$$0 = ( I + \alpha w w^\top ) \gamma - \xi - u,$$

$$\xi_1 / w_1 \geq \xi_2 / w_2 \geq \cdots \geq \xi_n / w_n,$$
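With the ratios ξ_i / w_i sorted as above, the support of the nonnegativity-constrained minimizer is a prefix of the sorted index set, so it can be found by searching prefixes and applying the rank-one closed form on each candidate support. The sketch below is one such active-set implementation (assuming strictly positive weights w), not necessarily Algorithm 1 verbatim:

```python
import numpy as np

def nonneg_prox(xi, w, alpha):
    """Minimize gamma^T gamma - 2 xi^T gamma + alpha (w^T gamma)^2
    over gamma >= 0, assuming w > 0. The support is a prefix of the
    indices sorted by xi_i / w_i in decreasing order."""
    order = np.argsort(-(xi / w))
    gamma = np.zeros_like(xi, dtype=float)
    for k in range(len(xi), 0, -1):
        P = order[:k]                                     # candidate support
        s = (w[P] @ xi[P]) / (1.0 / alpha + w[P] @ w[P])  # rank-one closed form
        g = xi[P] - s * w[P]
        if np.all(g >= 0):                                # primal feasible prefix
            gamma[P] = g
            break
    return gamma

g = nonneg_prox(np.array([3.0, 1.0, -0.5]), np.ones(3), alpha=1.0)
print(g)  # [1.5 0.  0. ]
```

Optimality can be checked through the KKT conditions: the residual u = (I + α w wᵀ)γ − ξ must be nonnegative and complementary to γ.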