## Abstract

In order to efficiently image a non-absorbing sample (a phase object), dedicated phase contrast optics are required. Typically, these optics are designed with the assumption that the sample is weakly scattering, implying a linear relation between a sample’s phase and its transmission function. In the strongly scattering, nonlinear case, the standard optics are ineffective, and the transfer functions used to characterize them are uninformative. We use the Fisher information (FI) to assess the efficiency of various phase imaging schemes and to calculate an information transfer function (ITF). We show that a generalized version of Zernike phase contrast is efficient given sufficient prior knowledge of the sample. We show that with no prior knowledge, a random sensing measurement yields a significant fraction of the available information. Finally, we introduce a generalized approach to common path interferometry that can be optimized to prioritize sensitivity to particular sample features. Each of these measurements can be performed using Fourier lenses and phase masks.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. INTRODUCTION

In a phase contrast microscope, transparent objects are imaged using optics that convert phase variations to amplitude variations. This modality is important for visible light (notably for biological samples), x rays [1–3], and electrons [4]. While some phase contrast is intrinsic in systems with a limited numerical aperture (NA) [5] and more can be generated by adding defocus [6], much of the information about the sample phase shift can only be accessed with dedicated optics. Zernike developed the first method for optically generated phase contrast using a phase-shifting filter in the back focal plane of the objective lens (or some conjugate plane) [7,8]. Zernike phase contrast (ZPC) is particularly effective for imaging weak phase objects (WPOs), which have transmission functions close to unity. A probe passed through a WPO will retain a strong undiffracted component, which can be used as an interferometric reference. Many phase contrast applications involve phase objects for which the WPO approximation (WPOA) is dubious [9,10]. For these applications, ZPC is only partially effective. When the undiffracted component of the beam is entirely depleted, for example, due to a strongly scattering sample matrix, ZPC produces no contrast at all.

Some common phase contrast methods are compatible with strongly scattering samples, for example, the class of schemes sensitive to phase gradients, which include differential interference contrast [11], Hoffmann modulation contrast [12], and spiral phase contrast [13]. However, these techniques are insensitive to low spatial frequency features, making them sub-optimal for some measurements. When it is possible to establish a reference channel that circumvents the sample, then quantitative phase contrast [14,15] and various versions of holography [16–18] are possible. These generate some contrast for phase objects of any strength, and their limitations are not as obvious.

Comparing the effectiveness of these methods is especially difficult outside of the WPOA. In the strong scattering regime, the imaging process remains linear with respect to the sample transmission function but becomes nonlinear with respect to the sample phase. As a result of this nonlinearity, the performance of the imaging system will depend on the joint properties of the optics and the particular sample. One example of an attempt to move beyond the WPOA is generalized phase contrast (GPC) [19]. GPC, like ZPC, uses the undiffracted probe component as a reference wave. The relative phase and extinction applied to the reference wave can be optimized based on prior knowledge about the sample to maximize the visibility (contrast) or peak irradiance or to establish an unambiguous phase-to-intensity mapping. While GPC avoids invoking the WPOA, it still relies on a strong undiffracted component in the exit wave function. To form an even more general theory of phase imaging, we must consider a wider class of measurements.

To this end, we recast the imaging process as a many-parameter estimation problem and employ the Fisher information (FI) to optimize it. While the FI is a prominent tool for experimental design, especially in the field of optics, it is usually applied to optimize measurements of one or a few image parameters. To apply it in a more general imaging scenario with $n$ unknown parameters, we must compute the ${n^2}$ elements of the FI matrix (FIM) describing all of the parameters and their correlations. For even modestly sized images ($n \gtrsim 100$), the FIM is expensive to calculate, let alone to optimize over all possible measurements. As the optimum may depend on the particular sample, it will be critical to develop efficient heuristics rather than to attempt an explicit optimization for each measurement. Implementing the measurements will require programmable optics. Such technology is available for optical microscopy (i.e., spatial light modulators) and is newly emerging for electron microscopy [20].

In the field of quantum metrology, the FI maximized over all measurements permitted by quantum mechanics is called the quantum Fisher information (QFI [21–23]). The measurement that achieves this maximum generally depends on the values of the parameters being measured. This is no obstacle to many QFI applications, where the goal is to make increasingly precise measurements of an already well-characterized parameter. In the limit of high measurement resources, which we will call the asymptotic regime, we can efficiently measure an unknown parameter by allocating a negligible fraction of the resources for pre-estimation. In situations where measurement resources are limited, which we will call the Bayesian regime, the optimal measurement may depend strongly on the prior knowledge of the parameters [24]. A measurement sequence in the Bayesian regime will ideally be adaptive so that each measurement is refined using information gathered from previous measurements. Rather than considering the properties of such a measurement sequence, we will focus on optimizing an individual measurement.

In order to constrain the scope of this project, we make several simplifying assumptions. We assume the sample is a pure phase object with a negligible depth of field, that the measurement is performed with a deterministic source of unentangled scalar particles (i.e., polarization/spin degrees of freedom are not considered), and that the dominant source of noise is projection noise.

In the next section, we motivate the transition from contrast to information and introduce the relevant FI formalism. In Section 3, we apply this formalism in the asymptotic regime to the idealized scenario of multi-phase estimation (MPE) where arbitrary, lossless transformations can be applied to the exit wave function. This perspective helps to clarify the value of a reference channel when projection noise alone limits the measurement efficiency. In Section 4, we explore MPE in the Bayesian regime, where prior knowledge of the sample becomes a second limitation. We will develop a suite of methods for phase measurements of strongly scattering samples that are effective for various optimization priorities and levels of prior knowledge. Finally, in Section 5, we restrict the optimization to a set of measurements that can be implemented with only a few optical components and take into account the limited NA of the objective lens.

## 2. FROM CONTRAST TO INFORMATION

The properties of a linear optical system can be described by the point spread function or by its Fourier transform, the optical transfer function. The complex modulus of the optical transfer function is called the modulation transfer function or the contrast transfer function (CTF, especially in electron microscopy [4]). The CTF characterizes the frequency-dependent efficiency with which the system transports information from the sample to the detector. For absorbing imaging targets, spatial resolution (limited by lens aberrations and the NA) is often the primary concern. For phase-shifting imaging targets, the CTF expresses another important limitation: how efficiently the optics convert phase variations to intensity variations.
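The relationship between these transfer functions can be illustrated with a minimal numerical sketch. The Gaussian point spread function, its width `sigma`, and the grid size `n` are assumptions chosen only for illustration; they are not taken from this work.

```python
import cmath
import math

def dft(values):
    """Plain O(n^2) discrete Fourier transform (no normalization)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * q * k / n) for k, v in enumerate(values))
        for q in range(n)
    ]

n = 32
sigma = 1.5  # hypothetical PSF width in pixels

# Periodic Gaussian PSF centred on pixel 0, normalized to unit total intensity.
raw = [math.exp(-0.5 * (min(k, n - k) / sigma) ** 2) for k in range(n)]
total = sum(raw)
psf = [r / total for r in raw]

otf = dft(psf)               # optical transfer function
mtf = [abs(c) for c in otf]  # modulus of the OTF (MTF / CTF)

for q in range(0, n // 2 + 1, 4):
    print(f"q = {q:2d}  MTF = {mtf[q]:.4f}")
```

Because the PSF is non-negative and normalized, the modulus equals 1 at zero frequency and cannot exceed 1 elsewhere, which is why it is read as a frequency-dependent efficiency.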

The CTF is insufficient for describing the properties of the transfer optics when there is not a one-to-one correspondence between spatial frequencies in the sample phase shift and spatial frequencies in the detected intensity. For example, a strong sinusoidal phase grating, unlike an amplitude grating, diffracts to many orders. We might account for this by replacing the conventional CTF with a scattering matrix (or vector-valued function), which, for each spatial frequency ${\vec q_k}$ in the sample, gives the resulting intensity contrast at each spatial frequency ${\vec q_j}$ at the detector. However, it is not obvious how to condense this data into a figure of merit for optimizing the optics. Instead, we will optimize using a cost function, which can be constrained by the FI. The FI formalism also provides a quantum limit for the minimal cost, which can be used as an optimization benchmark.
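The multi-order diffraction from a sinusoidal phase grating follows from the Jacobi–Anger expansion. As a short worked illustration (the symbols $\theta$ for the grating amplitude, $q$ for its spatial frequency, and $x$ for position are notation introduced here):

$$
{e^{i\theta \cos (qx)}} = \sum\limits_{m = - \infty}^\infty {i^m}{J_m}(\theta){e^{imqx}},
$$

where ${J_m}$ is the Bessel function of the first kind, so diffraction order $m$ carries a fraction ${J_m}{(\theta)^2}$ of the intensity. For $\theta \ll 1$, ${J_0} \approx 1$ and ${J_{\pm 1}} \approx \pm \theta /2$, which recovers the weak-phase form ${e^{i\phi}} \approx 1 + i\theta \cos (qx)$ with only the orders $0, \pm 1$. For $\theta = 2$, by contrast, ${J_0} \approx 0.22$, ${J_1} \approx 0.58$, ${J_2} \approx 0.35$, and ${J_3} \approx 0.13$, so most of the intensity sits in orders that a single-frequency transfer function does not describe.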

We assume the sample can be discretized into $n$ regions with unknown phase shifts $\{{\phi _k}\} _{k = 0}^{n - 1}$ at positions ${\vec r_k}$, which form a regular, square array. Let $\Theta = [{\theta _0},{\theta _1},\ldots,{\theta _{n - 1}}]$ be a vector of linear parameters so ${\phi _k} = \sum\nolimits_a {\theta _a}v_k^{(a)}$ with orthonormal basis vectors $\{{v^{(a)}}\} _{a = 0}^{n - 1}$. We will often use a “phase grating basis,” where $v_k^{(a = 0)} = 1/\sqrt n$, and for $a \gt 0$,

The set $A$, the normalization constant ${c_a}$, and the indexing system for the spatial frequencies ${\vec q_a}$ are defined in Supplement 1 (Section 1).

Figure 1 shows the general scheme for estimating $\Theta$. It begins with the preparation of a single-particle probe $| P \rangle = \sum\nolimits_{k = 0}^{n - 1} {\alpha _k}| {{{\vec r}_k}} \rangle$ (with $\sum\nolimits_k |{\alpha _k}{|^2} = 1$). The action of the sample on the probe is described by the operator $\hat \Phi (\Theta) = \sum\nolimits_k {\Phi _k}(\Theta)| {{{\vec r}_k}} \rangle \langle {{{\vec r}_k}} |$, where ${\Phi _k}(\Theta) = \exp (i{\phi _k}(\Theta))$. Note that $\Theta$ is a *nonlinear* parameterization of $\hat \Phi (\Theta)$. The transfer optics that relay the exit wave function $| {\psi (\Theta)} \rangle = \hat \Phi (\Theta)| P \rangle$ to the detector are represented by the operator $\hat T = \sum\nolimits_{j,k} {T_{j,k}}| {{{\vec s}_j}} \rangle \langle {{{\vec r}_k}} |$, where the index $j$ labels pixels on the detector at
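locations ${\vec s_j}$. The intensity at the detector is

$${I_j}(\Theta) = {\left| {\langle {{{\vec s}_j}} |\hat T| {\psi (\Theta)} \rangle} \right|^2} = {\Big| {\sum\nolimits_k {T_{j,k}}{\Phi _k}(\Theta){\alpha _k}} \Big|^2}.$$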

Minimizing the expected cost involves choosing an optimal probe wave function $P$, an optimal transfer function $T$, and, simultaneously, an optimal estimator $\bar \Theta$. However, with the choice of the quadratic cost function, the Cramér–Rao bound (CRB) provides a simple way to calculate the lowest achievable variance of any (unbiased) estimator [25]. This makes it possible to optimize the measurement without finding the optimal estimator. The CRB can be expressed as a matrix or scalar bound:

If we make the dependence of FI on the transfer function $T$ explicit by writing ${\cal I}(\Theta ,T)$, then the QFI can be expressed as

The CRB applied to the QFI is called the quantum CRB (QCRB) [22]. Since the QFI is independent of $T$, it can be considered a measure of the “information” about ${\theta _a}$ available in the exit wave function. To be precise, the QFI describes the variance-reducing power of a measurement and has units of $1/\theta _a^2$ rather than entropy (bits), which is generally considered a more elementary measure of information. Nevertheless, QFI is seen as a fundamental quantity in quantum metrology.

Multi-parameter measurements are limited by a quantum FIM (QFIM), which is larger than the FIM (in the positive semidefinite sense) for any particular measurement. Whereas the quantum information limit is always attainable in the single parameter case, the matrix bound for multiple parameters may not be attainable when the parameters are associated with incompatible observables [26]. For example, the limited NA of an objective lens restricts the transverse momentum of the exit wave function, thereby performing a counterfactual measurement incompatible with spatial measurements of the sample.

Another factor that may prevent the saturation of the QCRB is the level of prior knowledge. We can express our prior knowledge of $\Theta$ using a probability distribution $\lambda (\Theta)$. We will mainly consider two types of prior distributions. The first kind can be explicitly written as a joint distribution over independent distributions for each parameter. For example, the WPOA could be incrementally relaxed by setting $\lambda (\Theta) = \prod\nolimits_a {\cal N}({\theta _a};{\sigma ^2})$, where each ${\theta _a}$ is drawn from an independent normal distribution with zero mean and variance ${\sigma ^2} \ll 1$. Outside of the WPOA, a plausible source of prior knowledge is a known diffraction pattern $\Lambda$. To define $\lambda$, we apply the principle of indifference and assume a uniform distribution over all phase objects with the same diffraction pattern. We generally cannot express $\lambda$ in closed form, and we say it is induced by $\Lambda$. To sample $\lambda$, we can apply the Gerchberg–Saxton algorithm [27] using $\Lambda$ and the known probe intensity.
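A minimal one-dimensional sketch of drawing one phase object consistent with a known diffraction pattern is given below. It assumes a uniform probe and a pattern generated from a hypothetical "true" sample; the function names, grid size, and iteration count are illustrative choices. Random starting phases give a different draw on each run, but this sketch makes no claim that the draws are exactly uniform over the induced distribution.

```python
import cmath
import math
import random

def dft(x, sign=-1):
    """Unitary O(n^2) DFT; sign=+1 gives the inverse transform."""
    n = len(x)
    s = 1 / math.sqrt(n)
    return [
        s * sum(v * cmath.exp(sign * 2j * math.pi * g * k / n) for k, v in enumerate(x))
        for g in range(n)
    ]

def gerchberg_saxton(probe_amp, fourier_amp, iterations, rng):
    """Return (phases, errors) for one phase object matching both moduli."""
    psi = [a * cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for a in probe_amp]
    errors = []
    for _ in range(iterations):
        spectrum = dft(psi)
        # Fourier-domain mismatch before the modulus is replaced.
        errors.append(sum((abs(c) - f) ** 2 for c, f in zip(spectrum, fourier_amp)))
        # Impose the known diffraction amplitudes, keep the current phases.
        spectrum = [f * cmath.exp(1j * cmath.phase(c)) for c, f in zip(spectrum, fourier_amp)]
        back = dft(spectrum, sign=+1)
        # Impose the known probe amplitudes (pure phase object).
        psi = [a * cmath.exp(1j * cmath.phase(b)) for a, b in zip(probe_amp, back)]
    return [cmath.phase(p) for p in psi], errors

rng = random.Random(0)
n = 16
probe_amp = [1 / math.sqrt(n)] * n

# Hypothetical "true" sample, used only to generate a consistent pattern.
true_phi = [rng.gauss(0.0, 1.0) for _ in range(n)]
exit_wave = [a * cmath.exp(1j * p) for a, p in zip(probe_amp, true_phi)]
diffraction_pattern = [abs(c) ** 2 for c in dft(exit_wave)]
fourier_amp = [math.sqrt(v) for v in diffraction_pattern]

sample_phi, errors = gerchberg_saxton(probe_amp, fourier_amp, 50, rng)
print(f"Fourier-domain error: first {errors[0]:.3e}, last {errors[-1]:.3e}")
```

The recorded Fourier-domain error cannot increase from one iteration to the next, which is the standard error-reduction property of this alternating-projection scheme; it does not guarantee convergence to the true sample.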

Given $\lambda$, the goal is to minimize the weighted average of the expected variance. The expected measurement cost is

Various Bayesian versions of the QCRB exist that hold for arbitrary amounts of data but cannot necessarily be saturated [30–32]. They have been used, for example, to design quantum schemes for estimating waveforms where prior information was critical for developing the initial parameterization [33]. To promote the van Trees bound to a quantum bound, we should maximize ${\langle {{\cal I}(\Theta ,T)} \rangle _\lambda}$ over all possible measurements. To form a tight upper bound, the maximization should be done after taking the expectation value, which produces the quantum van Trees information [31]. In general, the bound can only be computed by finding the specific measurement that achieves it. Instead, we will use the generalized QFI (GQFI) ${\cal Z}$ [31,34], which is obtained by simply replacing ${\langle {{\cal I}(\Theta ,T)} \rangle _\lambda}$ with ${\langle {\cal J} \rangle _\lambda}$:

While this bound is typically unattainable, it is generally easier to calculate and thus more suitable as an optimization benchmark. Caution is warranted in interpreting these bounds, as it may not be simple or even possible to devise an efficient estimator (one which saturates the bound) with limited information. We may regard the lower bound on ${\langle C \rangle _\lambda}$ as the value we would assign a measurement in retrospect, after collecting enough information to accurately estimate $\Theta$.

When using a uniform weighting $W = {\mathbb I}$, the cost is minimized by prioritizing sensitivity to parameters with large prior variance. This is sometimes undesirable. Suppose the sample consists of a WPO $\Phi ({\Theta _f})$ embedded in a strongly scattering matrix $\Phi ({\Theta _b})$. The combined transmission function is $\Phi (\Theta) = \Phi ({\Theta _f})\Phi ({\Theta _b}) = \Phi ({\Theta _f} + {\Theta _b})$. We will call ${\Theta _f}$ the foreground and ${\Theta _b}$ the background and assume the corresponding prior distributions, ${\lambda _f}({\Theta _f})$ and ${\lambda _b}({\Theta _b})$, are independent, so $\lambda (\Theta) = {\lambda _f}({\Theta _f}){\lambda _b}({\Theta _b})$. Since ${\lambda _b}$ contains larger variances, a measurement optimized using the cost function defined in Eq. (8) will be tailored for measuring the background. We could attempt to find a non-uniform weighting to increase the cost of foreground error, but it is not obvious how to choose the weights. In Supplement 1 (Section 3), we derive a van Trees-like bound on the cost function for variance reduction in the foreground:

A similar bound can be used to account for nuisance parameters. In that context, ${\cal I}(\Theta ,T) - \Delta {\cal I}$ is called the partial FIM [35]. Suppose ${\cal I}({\lambda _b}) = \eta {\mathbb I}$. In the limit where $\eta \to 0$ (complete ignorance of the background), $\Delta {\cal I} \to {\langle {{\cal I}(\Theta ,T)} \rangle _\lambda}$, so the measurement cost is constant (no information can be gained about the foreground). In the limit where $\eta \to \infty$ (complete knowledge of the background), $\Delta {\cal I} \to 0$ and $\lambda \to {\lambda _f}$, so the bound on ${{\cal C}_f}$ becomes identical to the standard van Trees bound. This cost function tends to prioritize sensitivity to parameters that have a small prior background variance. A lower bound on this cost function for all possible measurements is obtained by replacing ${\langle {{\cal I}(\Theta ,T)} \rangle _\lambda}$ with ${\cal J}$.

While the cost functions are useful for optimization, they provide little insight into the properties of a particular measurement. For this purpose, it will be useful to define an information transfer function (ITF) that describes the information gained about each parameter. The diagonal of the FIM is not sufficient for this purpose, as it does not account for correlations between parameters, both from the prior distribution and the measurement. Instead, we define the ITF as the decrease in variance achieved for each parameter (determined using the van Trees bound) relative to the decrease in variance allowed by the GQFI:

We will use a similar formulation to evaluate optimization outcomes in terms of the decrease in cost indicated by the van Trees bound relative to the maximum decrease allowed by the GQFI. For the cost function in Eq. (8),

## 3. MULTI-PHASE ESTIMATION IN THE ASYMPTOTIC REGIME

Before applying the above to phase imaging with optics with limited NA, we will consider the more idealized scenario of MPE to clarify some of the fundamental limitations of phase imaging with various amounts of prior knowledge. MPE is a well-studied problem in the field of quantum metrology [36,37]. Instead of free space modes, the probe states of MPE occupy $n$ discrete channels upon which we can apply arbitrary (lossless) transformations. In order to keep a close analogy with phase imaging, we will imagine the channels are arranged in a grid so that we may parameterize $\phi$ in terms of its 2D spatial frequency components. The QFIM is often used to optimize the probe state for MPE. Our main purpose in calculating the QFIM will be to determine the maximum FIM for a uniform-intensity probe and to assess the value of including a reference channel with known phase shift.

We will assume the probe is a pure, single-particle state with amplitude ${\alpha _j}$ in channel $j$ and amplitude $\beta$ in the reference channel ($\sum\nolimits_j \alpha _j^2 + {\beta ^2} = 1$). The QFIM for a pure state $\psi$ can be written explicitly [21]:

For the case where $n = 1$, it is simple to verify that ${\cal J} = 1$ and a measurement that achieves this limit can be performed with a Mach–Zehnder interferometer (MZI) with ${\alpha ^2} = {\beta ^2} = 1/2$. A natural guess for an efficient measurement for $n \gt 1$ is to divide the probe evenly among $n$ parallel MZIs (so half of the total probe intensity still passes through the reference arm). The FIM for this measurement is ${\cal I} = {\mathbb I}/n$, so the total information is ${\rm{Tr}}({\cal I}) = 1$, and the bound on the total variance from the CRB is ${\cal C} = {\rm{Tr}}({{\Sigma _{\bar \Theta}}}) \ge {\rm{Tr}}({{{\cal I}^{- 1}}}) = {n^2}$. However, the quantum limit is superior: ${\rm{Tr}}({\cal J}) = 4n/(1 + \sqrt n {)^2}$ and ${\cal C} \ge n{(1 + \sqrt n)^2}/4$. This implies there is some advantage to simultaneous parameter estimation (if we allow $\psi$ to be a multi-particle entangled state, then this relative advantage is even more pronounced [36]).
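As a quick arithmetic check of the two variance bounds quoted above (the values of $n$ are chosen only for illustration):

$$
\begin{aligned}
n = 4:&\quad {n^2} = 16, \qquad \frac{{n{{(1 + \sqrt n)}^2}}}{4} = \frac{{4 \cdot 9}}{4} = 9,\\
n = 100:&\quad {n^2} = {10^4}, \qquad \frac{{n{{(1 + \sqrt n)}^2}}}{4} = \frac{{100 \cdot 121}}{4} = 3025.
\end{aligned}
$$

The ratio of the parallel-MZI bound to the quantum bound is $4n/{(1 + \sqrt n)^2}$, which is $16/9 \approx 1.8$ for $n = 4$ and $400/121 \approx 3.3$ for $n = 100$, and approaches 4 for large $n$.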

In order to explain the advantage of simultaneous estimation, it is helpful to use a parameterization that diagonalizes ${\cal J}$. If we use a uniform probe ${\alpha _j} = \alpha$, the QFIM has two distinct parameter eigenspaces. One corresponds to the average phase shift ${\theta _0} = \frac{1}{n}\sum\nolimits_k {\phi _k}$ and has eigenvalue ${{\cal J}_0} = 4{\alpha ^2}{\beta ^2} \le 1/n$, which is maximized using probe amplitudes ${\beta ^2} = n{\alpha ^2} = 1/2$. The other eigenspace has rank $n - 1$ and contains information about all parameters independent of ${\theta _0}$. Its eigenvalue is ${{\cal J}_ \bot} = 4{\alpha ^2} \le 4/n$, which achieves its largest value when $\beta = 0$. Thus, all but one of the degrees of freedom can be measured optimally without a reference channel (see [38] for an analysis of quantum MPE without a reference channel), and the total variance is minimized by setting ${\beta ^2} \sim 0$ for large $n$ (explicitly, ${\beta ^2} = \sqrt n /(n + \sqrt n) \sim 1/\sqrt n$). Simultaneous estimation schemes have an advantage, then, because they are able to invest more in the ${{\cal J}_ \bot}$ eigenspace, where fewer measurement resources are required to achieve the same variance reduction.
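The optimal reference intensity can be checked with a short numerical sketch. It uses only the two eigenvalues and the normalization $n{\alpha ^2} + {\beta ^2} = 1$ stated above; the function names and the brute-force grid scan are illustrative choices.

```python
import math

def total_variance_bound(n, beta_sq):
    """Tr(J^-1) for a uniform probe with reference intensity beta_sq."""
    alpha_sq = (1.0 - beta_sq) / n   # from n*alpha^2 + beta^2 = 1
    j0 = 4.0 * alpha_sq * beta_sq    # eigenvalue for the mean phase
    j_perp = 4.0 * alpha_sq          # (n - 1)-fold degenerate eigenvalue
    return 1.0 / j0 + (n - 1) / j_perp

def best_beta_sq(n, steps=100000):
    """Brute-force scan over 0 < beta^2 < 1."""
    grid = (i / steps for i in range(1, steps))
    return min(grid, key=lambda b: total_variance_bound(n, b))

for n in (4, 16, 100):
    closed_form = math.sqrt(n) / (n + math.sqrt(n))
    scanned = best_beta_sq(n)
    cost = total_variance_bound(n, closed_form)
    quantum_bound = n * (1 + math.sqrt(n)) ** 2 / 4
    print(f"n={n:3d}  beta^2: scan {scanned:.4f}, closed form {closed_form:.4f}  "
          f"cost {cost:.3f} vs n(1+sqrt(n))^2/4 = {quantum_bound:.3f}")
```

The scan lands on the closed-form value of ${\beta ^2}$, and the resulting total variance bound coincides with $n{(1 + \sqrt n)^2}/4$.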

In some microscopy applications, the relevant measurement resource is the total dose, $d = n{\alpha ^2}$. In this case, we should maximize ${\cal J}/d$. The matrix eigenvalues become ${{\cal J}_0} = 4{\beta ^2}/n$ and ${{\cal J}_ \bot} = 4/n$. This consideration does not change the conclusion that the reference channel is not helpful in the ${{\cal J}_ \bot}$ eigenspace. Indeed, there are often practical advantages in dispensing with the reference channel. In many imaging applications, the value of $\langle \phi \rangle$ is irrelevant (e.g., the thickness of the sample matrix) and may even be considered a nuisance parameter. For example, when an imaging system has multiple optical axes, their relative phase stability becomes an added engineering challenge [39]. We will proceed under the assumption that $\langle \phi \rangle$ is an extraneous parameter and specialize to measurements that are in-line (lacking a reference channel) and, therefore, only sensitive to the ${{\cal J}_ \bot}$ eigenspace (we will suppress the $\bot$ subscript in the future). We will also continue to assume that the probe amplitude is uniform across the channels. With these assumptions, ${\cal J} = (4/n){\mathbb I}$. Since ${\cal J}$ is a scalar matrix, it is invariant to reparameterization: a measurement that achieves ${\cal J}$ is optimal for estimating any individual parameter or set of parameters.

## 4. MULTI-PHASE ESTIMATION WITH LIMITED PRIOR KNOWLEDGE

We will now discuss several types of in-line measurements that are useful with various levels of prior knowledge. We will assume the measurements are projective so that the transfer function can be represented by a unitary matrix $T$ (this precludes measurement schemes that use multiple detectors to make non-commuting measurements). We can factorize $T$ as

where $M$ is a diagonal matrix of unit-norm eigenvalues ${M_{g,g}} = \exp (i{\mu _g})$ and $U$ (a unitary matrix) will be called the interferometric basis (see Fig. 2). A sufficient condition for $T$ to attain the QFIM is that $U$ concentrates all of the intensity in the exit wave function into a single basis vector. This is possible in the asymptotic regime [where $\Phi (\Theta)$ is known] by setting $U = {\cal F}{\Phi ^{- 1}}$, where ${\cal F}$ is the Fourier transform matrix and ${\Phi ^{- 1}}$ is the inverse of the sample transmission function. Then it is simple to show ${\cal I}(\Theta ,T) = {\cal J}$ if ${\mu _{g = 0}} = \pi /2$ and ${\mu _{g \gt 0}} = 0$. For WPOs (${\Phi ^{- 1}} \sim 1$), this measurement is equivalent to ZPC. The reparameterization-invariance of ${\cal J}$ implies that ZPC performs optimally for measuring any feature of a WPO. It is interesting to note that while a general projective measurement of a state with $n$ degrees of freedom is described by a unitary transform with ${n^2}$ real parameters, an optimal measurement can be performed using only ZPC optics (with no degrees of freedom) and a phase mask with $n$ degrees of freedom. This makes it practical to design efficient phase imaging optics for any sample using relatively few optical elements.

In the Bayesian regime, we do not have precise knowledge of $\Phi$ and, therefore, cannot choose an interferometric basis that concentrates $\psi$ into a single component. As shown in Supplement 1 (Section 4), this precludes finding a projective measurement that achieves the quantum information limit. To find an efficient measurement, we must optimize based on the prior distribution $\lambda$. As a starting point, we make use of the available translation-specific prior knowledge to flatten the expected exit wavefront by applying a phase mask ${\bar \Phi _k}^{- 1} \equiv {e^{- i{{\langle {{\phi _k}} \rangle}_\lambda}}}$. This approach has been proposed in [40] as part of an adaptive procedure to increase the sensitivity of phase imaging for thick phase objects. Since $U = {\cal F}{\bar \Phi ^{- 1}}$ will not focus the probe to a single point for non-trivial $\lambda$, the information gained using ZPC will be less than the QFI. When a significant fraction of the probe intensity cannot be refocused, it may be beneficial to adjust the Zernike phase $\mu\equiv {\mu _{g = 0}}$. We will call this class of measurement generalized ZPC (GZPC). The maximum efficiency of GZPC depends on how much intensity ${\Lambda _g} = {\langle {{{| {{{(U\psi)}_g}} |}^2}} \rangle _\lambda}$ is focused into component $g = 0$ in the interferometric basis. In Supplement 1 (Section 5), we calculate ${\langle {\cal I} \rangle _\lambda}$ for GZPC in the particular case that each of the parameters is independently and normally distributed with variance ${\sigma ^2}$. For ${\Lambda _0} = {e^{- {\sigma ^2}}}\gtrsim 0.8$, the result using $\mu= \pi /2$ is ${\langle {\cal I} \rangle _\lambda} \sim {\Lambda _0}{\cal J}$ (accurate to 1.5%). This linear approximation underestimates ${\langle {\cal I} \rangle _\lambda}$ in the region $0.8 \gt {\Lambda _0} \gt 0.5$, where it plateaus to a value of ${\langle {\cal I} \rangle _\lambda} \sim \frac{3}{4}{\cal J}$. For ${\Lambda _0} \lt 0.5$, ${\langle {\cal I} \rangle _\lambda}$ drops precipitously. The effectiveness of GZPC can be extended to lower values of ${\Lambda _0}$ by increasing the value of $\mu$, in which case ${\langle {\cal I} \rangle _\lambda} \sim \frac{3}{4}{\cal J}$ is maintained until ${\Lambda _0} \lt \frac{1}{4}$. The optimal choice for $\mu$ is given by

If no prior knowledge is available, we can assign a uniform distribution to each phase ${\phi _k}$. In this case, ${\Lambda _0} \sim 1/n$ and GZPC (for any $\mu$) is uninformative. However, if we randomly set each ${\mu _g}$ to either 0 or $\pi$, the expected FI is ${\langle {\cal I} \rangle _\lambda} \sim \frac{1}{2}{\cal J}$. This measurement, which we will call random sensing, is similar to the technique described by Oe and Namura [41], which uses a diffuser to generate in-line phase contrast. Oe and Namura rely on the WPOA to reconstruct the phase object. For strongly scattering samples, we must resort to a general phase retrieval algorithm such as Gerchberg–Saxton or Fienup [42]. Despite the factor of 2 discrepancy between the FI for random sensing and the QFI, this measurement outperforms the parallel MZI scheme (provided that the reconstruction algorithm produces the full variance reduction allowed by the CRB).
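The two limiting cases discussed in this section, a Zernike mask matched to a fully known sample and random sensing with no prior knowledge, can be compared in a small one-dimensional simulation. The sketch below is not the calculation used in this work: it assumes a uniform single-particle probe, a 1D discrete Fourier transform for ${\cal F}$, the factorization $T = {U^\dagger}MU$, and the pixel-basis QFIM ${{\cal J}_{a,b}} = 4({\delta _{a,b}}/n - 1/{n^2})$ implied by Section 3. All function names and the values of `n` and `trials` are illustrative.

```python
import cmath
import math
import random

def transfer_matrix(mask_phases, flatten_phases):
    """T = U^dagger M U with U = F * diag(exp(-i * flatten_phases))."""
    n = len(mask_phases)
    m = [cmath.exp(1j * mu) for mu in mask_phases]
    # F^dagger M F is circulant; c[d] is its entry at (row - column) = d mod n.
    c = [sum(m[g] * cmath.exp(2j * math.pi * g * d / n) for g in range(n)) / n
         for d in range(n)]
    flat = [cmath.exp(-1j * p) for p in flatten_phases]
    return [[flat[j].conjugate() * c[(j - k) % n] * flat[k] for k in range(n)]
            for j in range(n)]

def fisher_trace(phi, T):
    """Trace of the classical FIM with respect to the pixel phases."""
    n = len(phi)
    psi = [cmath.exp(1j * p) / math.sqrt(n) for p in phi]
    trace = 0.0
    for j in range(n):
        chi = sum(T[j][k] * psi[k] for k in range(n))  # detector amplitude
        intensity = abs(chi) ** 2
        if intensity < 1e-15:
            continue  # skip numerically dark pixels
        for a in range(n):
            # dI_j / dphi_a for a pure phase object
            d_int = 2 * (chi.conjugate() * 1j * psi[a] * T[j][a]).real
            trace += d_int ** 2 / intensity
    return trace

rng = random.Random(1)
n = 16
qfi_trace = 4 * (n - 1) / n  # Tr(J) for the in-line, uniform-probe case

# (i) Matched Zernike mask: sample known exactly, mu_0 = pi/2, others 0.
phi = [rng.uniform(-math.pi, math.pi) for _ in range(n)]
zernike = [math.pi / 2] + [0.0] * (n - 1)
matched = fisher_trace(phi, transfer_matrix(zernike, phi)) / qfi_trace

# (ii) Random sensing: no flattening mask, each mu_g drawn from {0, pi}.
trials = 40
ratios = []
for _ in range(trials):
    phi = [rng.uniform(-math.pi, math.pi) for _ in range(n)]
    mask = [rng.choice((0.0, math.pi)) for _ in range(n)]
    ratios.append(fisher_trace(phi, transfer_matrix(mask, [0.0] * n)) / qfi_trace)
random_sensing = sum(ratios) / trials

print(f"matched Zernike: Tr(I)/Tr(J) = {matched:.6f}")
print(f"random sensing:  Tr(I)/Tr(J) = {random_sensing:.3f} (mean of {trials} trials)")
```

In this toy setting the matched mask reproduces the full QFI for an arbitrarily strong phase object, while the random mask recovers roughly half of it on average. Only the trace is compared here, so the sketch says nothing about how the information is distributed among the parameters.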

In some circumstances, it is possible to find a measurement more efficient than both GZPC and random sensing. For example, if there exists $U$ with a set $G$ of basis vectors with $|G| \ll n$ such that $R = \sum\nolimits_{g \in G} {\Lambda _g} \sim 1$, then it is simple to show that setting ${\mu _{g \in G}} = \pi /2$ and ${\mu _{g \notin G}} = 0$ defines a measurement that gives ${\langle {\cal I} \rangle _\lambda} \sim {\cal J}$. Such a set exists, for example, if $\Phi$ is a crystal with a known, sharp diffraction pattern (but perhaps unknown translation). In general, if it is possible to choose a $U$ that concentrates $\psi$ into a small subspace, then this strategy produces an efficient measurement: if $|G| \ll n$ and $R \sim 1$, ${\rm{Tr}}({{{\langle {\cal I} \rangle}_\lambda}}) \sim {\rm{Tr}}({\cal J})R(1 - |G|/n)$. However, rather than trying to explicitly optimize $U$, we will focus on applications where the nature of the prior knowledge about the sample leads to a natural choice of interferometric basis. For example, $U = {\cal F}$ is the natural choice when $\lambda$ is induced by an expected diffraction envelope. For the rest of this section, we will assume $\Lambda$ is given for a particular $U$ and proceed to optimize $M$.
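To give a sense of scale for this estimate, consider hypothetical values $n = 1024$, $|G| = 16$, and $R = 0.95$ (chosen only for illustration):

$$
\frac{{{\rm{Tr}}({{\langle {\cal I} \rangle}_\lambda})}}{{{\rm{Tr}}({\cal J})}} \sim R\left({1 - \frac{{|G|}}{n}}\right) = 0.95 \times \left({1 - \frac{{16}}{{1024}}}\right) \approx 0.935.
$$

In this example the loss is dominated by the intensity that falls outside $G$ (the factor $R$), while the cost of phase shifting $|G|$ components rather than one is under 2%.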

Having chosen the interferometric basis, we can define a family of measurements called generalized common path interferometry (GCPI), which is parameterized by the set $G$ and the phase shift $\mu$ applied to basis vectors $g \in G$. A procedure for optimizing over $\mu$ and $G$ is described in Supplement 1 (Section 6). The optimization is especially likely to identify a measurement more efficient than GZPC or random sensing when specializing to foreground variance reduction [using the cost function in Eq. (12)] or to high spatial frequency measurements. For example, in dose-limited electron microscopy, the high spatial frequency features degrade quickly, and the achievable resolution scales with the fourth power of the dose [43]. This motivates a parameter weighting ${W_{a,a}} = |{\vec q_a}{|^4}$, where ${\vec q_a}$ is the spatial frequency associated with ${v^{(a)}}$ in the phase grating basis.

In Fig. 3, we compare the efficiency of GZPC, random sensing (RS), GCPI, and dark-field microscopy (DF). DF, like ZPC, produces contrast by operating on the unscattered beam, but replaces the phase-shifting element with an absorbing element. On the left, $\Delta C$ is calculated for samples that are known to have a Gaussian intensity distribution $\Lambda$ in basis $U$ for various peak intensities ${\Lambda _0}$. When the weighting on the parameters is uniform and the optimization is done using the full cost in Eq. (8), GCPI offers no advantage over the best choice among the other methods. However, when the weighting is ${W_{a,a}} = |{\vec q_a}{|^4}$, GCPI achieves a lower measurement cost by sacrificing sensitivity at low spatial frequencies in exchange for increased sensitivity at high spatial frequencies. GCPI also exploits this trade-off to outperform other methods when specializing to foreground variance reduction using the cost function in Eq. (12). The change in weighting and cost function has a negligible effect on $\Delta C$ for the ZPC, DF, and RS curves. On the right, $\Lambda$ is a 2D Lorentz distribution ${\Lambda _g} = \frac{1}{{2\pi}}\frac{w}{{{{({|{{\vec w}_g}{|^2} + {w^2}})}^{3/2}}}}$, where ${\vec w_g}$ is the coordinate vector corresponding to channel $g$ and $w = (2\pi {\Lambda _0}{)^{- 1/2}}$. In accordance with the rule of thumb described above, GCPI is especially effective for the Gaussian prior, where $\Lambda$ is more concentrated.
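The choice of width parameter in the Lorentz distribution can be checked directly. Evaluating the distribution at the central channel (${\vec w_g} = 0$) gives

$$
{\Lambda _{g = 0}} = \frac{1}{{2\pi}}\frac{w}{{{w^3}}} = \frac{1}{{2\pi {w^2}}} = {\Lambda _0}\quad {\rm{for}}\quad w = {(2\pi {\Lambda _0})^{- 1/2}},
$$

so ${\Lambda _0}$ is indeed the peak intensity. In the continuum limit (an approximation that assumes $w$ is large compared with the channel spacing and small compared with the grid size), the distribution is also normalized:

$$
\int_0^\infty {\frac{1}{{2\pi}}\frac{w}{{{{({r^2} + {w^2})}^{3/2}}}}\,2\pi r\,dr} = w\left[{- {{({r^2} + {w^2})}^{- 1/2}}}\right]_0^\infty = 1.
$$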

## 5. PHASE IMAGING WITH A LIMITED NUMERICAL APERTURE

Given some prior knowledge of the diffraction pattern of a sample, the measurements described in the previous sections can be performed using a spatial phase modulator to implement $M$ and two Fourier lenses to implement $U$ and ${U^*}$. Given some translation-specific prior knowledge, it may also be beneficial to add a second spatial phase modulator to a conjugate-image plane before the first lens to implement $U = {\cal F}{\bar \Phi ^{- 1}}$. In this section, we will account for loss due to the limited NA of real lenses. This adds some intrinsic phase contrast, which becomes significant for strongly scattering samples but also reduces the total amount of information that can reach the detector. The achievable efficiency will depend on how well the probe, which is focused in the condenser aperture to provide plane wave illumination, can be refocused in a conjugate plane after passing through the sample.

Let $A(\vec q)$ be a hard aperture function with $A(|\vec q| \lt {q_{{\max}}}) = 1$ and $A(|\vec q| \gt {q_{{\max}}}) = 0$. For a WPO (or an amplitude object), $A$ blocks all information about spatial frequencies with a magnitude larger than ${q_{{\max}}}$. However, the intensity pattern at the detector depends on all spatial frequencies present in a strong phase object, regardless of ${q_{{\max}}}$. For example, the diffraction pattern of the superposition of two phase gratings at spatial frequencies ${\vec q_a}$ and ${\vec q_b}$ contains the beat frequencies ${\vec q_a} \pm {\vec q_b}$. Even if both $|{\vec q_a}| \gt {q_{{\max}}}$ and $|{\vec q_b}| \gt {q_{{\max}}}$, it is possible that $|{\vec q_a} - {\vec q_b}| \lt {q_{{\max}}}$. This principle makes it possible to achieve super-resolution using structured illumination [44]. Since the illumination is also limited by the NA, structured illumination can only improve resolution over the standard limit by a factor of 2. But with a sufficiently informative prior distribution $\lambda$ providing known structure in the sample itself, diffraction no longer imposes a fundamental resolution limit [44,45]. There remains, however, an information limit.
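The beating effect can be illustrated with a short one-dimensional sketch. The grating frequencies, grid size, aperture cutoff, and phase depths below are hypothetical values chosen for illustration. Both gratings lie outside the aperture, yet the exit wave $\exp (i\phi)$ contains a component at ${q_a} - {q_b}$ that lies inside it. Its amplitude is second order in the phase depth, so it is negligible for a weak phase object and substantial for a strong one.

```python
import numpy as np

def beat_amplitude(phase_depth, q_a=40, q_b=35, n=256):
    """Magnitude of the Fourier component of exp(i*phi) at the beat frequency q_a - q_b.

    phi is the sum of two cosine phase gratings with equal depth `phase_depth` (radians).
    Frequencies are in cycles per field of view on an n-sample periodic grid.
    """
    x = np.arange(n) / n
    phase = phase_depth * (np.cos(2 * np.pi * q_a * x) + np.cos(2 * np.pi * q_b * x))
    exit_wave = np.exp(1j * phase)
    spectrum = np.fft.fft(exit_wave) / n   # normalized so a unit plane wave has amplitude 1
    return float(np.abs(spectrum[q_a - q_b]))

q_max = 20                      # hypothetical aperture cutoff; both gratings exceed it
weak = beat_amplitude(0.05)     # weak phase object
strong = beat_amplitude(1.5)    # strong phase object

print(f"beat amplitude: weak = {weak:.2e}, strong = {strong:.3f}")
```

For small phase depth $\epsilon$ the beat amplitude scales as $\epsilon^2/4$, consistent with the statement that a WPO carries no accessible information about frequencies beyond ${q_{{\max}}}$ to first order.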

The measurements that can be applied to exit wave function $\psi$ using diffraction-limited optics are non-projective and cannot be described using a unitary transfer function of rank $n$. However, we will assume that the measurement applied to the wave function exiting the Fourier aperture $\Psi (\vec q) = A(\vec q){\cal F}(\psi)(\vec q)$ is unrestricted. Then the diffraction-limited QFIM, $\tilde {\cal J}$, can be calculated by applying Eq. (15) to $\Psi$. A small amount of additional information is available when using a deterministic source by measuring the total intensity missing at the detector. In Supplement 1 (Section 7), we argue that this information is negligible. Unlike the QFIM for MPE, $\tilde {\cal J}$ depends on $\Theta$ and is not diagonal, making the calculation of the GQFI less trivial: $\tilde {\cal Z}(\lambda) = {\cal I}(\lambda) + {\langle {\tilde {\cal J}(\Theta)} \rangle _\lambda}$. The off-diagonal elements are generally small, and a good approximation is

While more strongly scattering samples send a larger portion of the probe intensity outside the NA, they are also more sensitive to spatial frequencies higher than ${q_{{\max}}}$ through the beating effect described above. These effects act in equal measure, and the fractional QFI lost to the aperture ${\rm{Tr}}({{\cal J} - \tilde {\cal J}})/{\rm{Tr}}({\cal J}) \sim \sum\nolimits_{\vec q} A(\vec q)/n$ is roughly constant regardless of $\lambda$.
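The apertured wave function $\Psi (\vec q) = A(\vec q){\cal F}(\psi)(\vec q)$ and the ratio $\sum\nolimits_{\vec q} A(\vec q)/n$ can be evaluated directly on a discrete grid. The sketch below is illustrative only: the grid size, the cutoff in pixel units, and the fully random strong phase object are assumptions, and it computes transmitted intensity rather than the QFI itself.

```python
import numpy as np

def hard_aperture(shape, q_max):
    """Hard aperture A(q): 1 for |q| < q_max, 0 otherwise (q in FFT pixel units)."""
    qy = np.fft.fftfreq(shape[0]) * shape[0]
    qx = np.fft.fftfreq(shape[1]) * shape[1]
    q_mag = np.hypot(*np.meshgrid(qy, qx, indexing="ij"))
    return (q_mag < q_max).astype(float)

def apertured_wave(exit_wave, aperture):
    """Psi(q) = A(q) F(psi)(q), using a unitary FFT so intensity is conserved by F."""
    return aperture * np.fft.fft2(exit_wave, norm="ortho")

rng = np.random.default_rng(0)
shape = (64, 64)
aperture = hard_aperture(shape, q_max=16)
channel_fraction = float(aperture.sum() / aperture.size)   # sum_q A(q) / n

# Hypothetical strongly scattering sample: phase uniformly random in [0, 2*pi).
psi = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, shape))
Psi = apertured_wave(psi, aperture)
transmitted_fraction = float(np.sum(np.abs(Psi) ** 2) / np.sum(np.abs(psi) ** 2))

print(f"channels inside aperture: {channel_fraction:.3f}")
print(f"intensity inside aperture: {transmitted_fraction:.3f}")
```

For this fully random sample the scattered intensity is spread almost evenly over all channels, so the transmitted intensity fraction is close to the fraction of channels inside the aperture.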

Using $\tilde {\cal Z}(\lambda)$, we can write an envelope function for the ITF that represents the maximum diffraction-limited variance reduction,

Figure 4 shows the ITF for various phase contrast schemes. Since natural images often have spectra with power ${\sim}{q^{- 2}}$ [46–48], we assume the diffraction pattern has a 2D Lorentz distribution with unscattered intensity ${\Lambda _0} = 0.6$ (left), ${\Lambda _0} = 0.2$ (middle), and ${\Lambda _0} = 0.1$ (right). The black curve is the envelope function defined in Eq. (19). Comparing the three plots, we see that, as more intensity scatters outside the NA, the decrease in ${\cal E}$ below ${q_{{\max}}}$ is accompanied by an approximately equal increase above ${q_{{\max}}}$. The cyan curve is the ITF for the intrinsic (bright-field) contrast due to scattering outside the NA. The blue curves are the ITFs for ZPC using $\mu= \pi /2$ (solid line) and $\mu= \pi$ (dotted line). The red curve is the ITF for random sensing. The remaining curves are ITFs for GCPI using $\mu= \pi /2$ with varying $|G|$. As $|G|$ increases, information about high spatial frequency parameters is gained at the cost of information about low spatial frequency parameters. While the decrease in low spatial frequency information is strictly a disadvantage from the perspective of any (positively weighted) cost function, it may be a positive feature in some circumstances. For example, filtering out low spatial frequencies may simplify data interpretation (that is, finding an efficient estimator).

In Fig. 5, we optimize a GCPI filter for measuring a WPO in a strongly scattering background using the cost function in Eq. (12). We again assume a Lorentzian diffraction pattern and set $\Lambda (\vec q = 0) = 0.2$. The foreground WPO is a 20 µm diameter pinwheel. The phase of the combined foreground and background is shown in Fig. 5A. The detected intensity distributions using ZPC with $\mu= \pi /2$ (Fig. 5B), $\mu= \pi$ (Fig. 5C), and using GCPI (Fig. 5E) are shown with identical color scales. The optimized Fourier filter for GCPI is shown in Fig. 5D. A phase shift of ${\sim}0.52\pi$ is applied in the central (white) region relative to the outer (gray) region. The black region is absorptive and establishes an NA of 0.8 using 500 nm light. Besides providing good contrast for high spatial frequency features, foreground-optimized GCPI filters out much of the background. A similar filtering effect can be achieved simply by blocking the prominent spatial frequencies in the background. In Fig. 5F, the GCPI filter is modified so that the central (white) region is completely absorbing. This high-pass filter produces significantly less contrast: the color scale in Fig. 5F is emphasized by a factor of 10 compared to the color scale in Fig. 5E.
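The three-zone Fourier filter described for Fig. 5D, and its high-pass variant from Fig. 5F, can be sketched as a complex transmission function. This is a rough sketch under stated assumptions: the zones are taken to be circular, the cutoff uses the convention ${q_{{\max}}} = {\rm NA}/\lambda$ in cycles per unit length, and the central-zone radius of 0.3 µm$^{-1}$ is a hypothetical placeholder, since the optimized zone boundary is not given in the text.

```python
import numpy as np

def three_zone_filter(q_mag, q_max, q_center, phase_shift, block_center=False):
    """Complex Fourier-plane transmission with three circular zones (assumed geometry).

    |q| < q_center         : phase-shifted by `phase_shift` (or fully absorbing if block_center)
    q_center <= |q| <= q_max: unit transmission
    |q| > q_max            : absorbing (sets the NA)
    """
    filt = np.ones_like(q_mag, dtype=complex)
    center = q_mag < q_center
    filt[center] = 0.0 if block_center else np.exp(1j * phase_shift)
    filt[q_mag > q_max] = 0.0
    return filt

wavelength_um = 0.5                         # 500 nm light
numerical_aperture = 0.8
q_max = numerical_aperture / wavelength_um  # 1.6 cycles/um under the assumed convention

q_axis = np.linspace(-2.5, 2.5, 101)        # spatial frequency axis in cycles/um
qx, qy = np.meshgrid(q_axis, q_axis)
q_mag = np.hypot(qx, qy)

q_center = 0.3                              # hypothetical central-zone radius
gcpi_filter = three_zone_filter(q_mag, q_max, q_center, 0.52 * np.pi)
high_pass = three_zone_filter(q_mag, q_max, q_center, 0.52 * np.pi, block_center=True)

print(f"q_max = {q_max:.2f} cycles/um, finest passed period = {1e3 / q_max:.0f} nm")
```

The two filters differ only in the central zone: the high-pass variant discards the low-frequency light, whereas the GCPI filter keeps it as a phase-shifted reference.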

## 6. CONCLUSION

Outside of the WPOA, each spatial frequency in the sample phase affects many spatial frequencies in the intensity at the detector. This nonlinearity makes it difficult to design efficient phase imaging transfer optics. We have approached this problem using FI as a rigorous optimization framework and developed an ITF to study the properties of various schemes. As a rule of thumb, the amount of information that can be extracted from a single measurement depends on how well the exit wave function can be concentrated into a small subspace using prior knowledge of the sample. We showed that in the asymptotic regime, where a significant amount of prior knowledge (or a large allocation of measurement resources) is available, an optimal measurement strategy involves flattening the exit wavefront and then applying ZPC. We also studied three measurements that are effective under different conditions in the Bayesian regime. When less than 75% of the probe intensity is scattered by the sample (${\Lambda _0} \gt 0.25$), GZPC provides at least 75% of the variance reduction relative to the quantum limit. When more than 80% of the probe intensity is scattered, RS is a more effective protocol, attaining half of the quantum-limited variance reduction. A third option, GCPI, has superior performance for the full range of ${\Lambda _0}$ when the cost function uses non-trivial weights $W$ or when specialized to imaging WPOs in a strongly scattering background.

It would be straightforward to extend these methods and the ITF to phase objects with finite depth of field and finite absorption, and also to include lens aberrations and limited coherence. The ITF could also be used to characterize an aggregate measurement including multiple modalities (e.g., phase contrast and fluorescence) by summing their individual FIMs. Future work in search of even more efficient protocols could involve jointly optimizing the probe intensity pattern along with the transfer optics.
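The remark about aggregate measurements relies on the additivity of Fisher information for statistically independent measurements. A toy sketch with two parameters is given below. The matrices are hypothetical and chosen so that each modality alone is insensitive to one combination of parameters, while the summed FIM is invertible and yields a finite Cramér-Rao bound.

```python
import numpy as np

# Hypothetical rank-1 FIMs for two independent modalities measuring the same two parameters.
fim_modality_a = np.array([[4.0, 2.0],
                           [2.0, 1.0]])
fim_modality_b = np.array([[1.0, -1.0],
                           [-1.0, 1.0]])

# Fisher information adds for independent measurements.
fim_total = fim_modality_a + fim_modality_b

# Cramer-Rao bound: covariance of any unbiased estimator is bounded below by the inverse FIM.
crb_covariance = np.linalg.inv(fim_total)
total_variance_bound = float(np.trace(crb_covariance))

print("rank A, rank B, rank total:",
      np.linalg.matrix_rank(fim_modality_a),
      np.linalg.matrix_rank(fim_modality_b),
      np.linalg.matrix_rank(fim_total))
print(f"lower bound on summed variance: {total_variance_bound:.4f}")
```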

## Funding

U.S. Department of Energy (DE-SC0019174-00); Gordon and Betty Moore Foundation (5723).

## Disclosures

The authors declare no conflicts of interest.

## Supplemental document

See Supplement 1 for supporting content.

## REFERENCES

**1. **R. Fitzgerald, “Phase-sensitive x-ray imaging,” Phys. Today **53**, 23–26 (2000).

**2. **U. Bonse and M. Hart, “An X-ray interferometer,” Appl. Phys. Lett. **6**, 155–156 (1965).

**3. **H. Wen, A. A. Gomella, A. Patel, S. K. Lynch, N. Y. Morgan, S. A. Anderson, E. E. Bennett, X. Xiao, C. Liu, and D. E. Wolfe, “Subnanoradian X-ray phase-contrast imaging using a far-field interferometer of nanometric phase gratings,” Nat. Commun. **4**, 2659 (2013).

**4. **H. K. L. Reimer, *Transmission Electron Microscopy*, 5th ed. (Springer-Verlag, 2008), Vol. 36.

**5. **P. W. Hawkes and E. Kasper, “The Theory of Bright-field Imaging,” in *Principles of Electron Optics*, P. W. Hawkes and E. B. T. Kasper, eds., 3rd ed. (Academic Press, 1994), pp. 1385–1440.

**6. **D. J. Johnson and D. Crawford, “Defocusing phase contrast effects in electron microscopy,” J. Microsc. **98**, 313–324 (1973).

**7. **F. Zernike, “Phase contrast, a new method for the microscopic observation of transparent objects,” Physica **9**, 686–698 (1942).

**8. **F. Zernike, “Phase contrast, a new method for the microscopic observation of transparent objects part II,” Physica **9**, 974–986 (1942).

**9. **M. Vulović, L. M. Voortman, L. J. van Vliet, and B. Rieger, “When to use the projection assumption and the weak-phase object approximation in phase contrast cryo-EM,” Ultramicroscopy **136**, 61–66 (2014).

**10. **D. L. Misell, “On the validity of the weak-phase and other approximations in the analysis of electron microscope images,” J. Phys. D **9**, 1849–1866 (1976).

**11. **R. D. Allen and G. B. David, “The Zeiss-Nomarski differential interference equipment for transmitted-light microscopy,” Z. Wiss Mikrosk. **69**, 193–221 (1969).

**12. **R. Hoffman, “The modulation contrast microscope: principles and performance,” J. Microsc. **110**, 205–222 (1977).

**13. **S. Fürhapter, A. Jesacher, S. Bernet, and M. Ritsch-Marte, “Spiral phase contrast imaging in microscopy,” Opt. Express **13**, 689–694 (2005).

**14. **C. Hu and G. Popescu, *Quantitative Phase Imaging: Principles and Applications* (Springer International Publishing, 2019), pp. 1–24.

**15. **Y. Park, C. Depeursinge, and G. Popescu, “Quantitative phase imaging in biomedicine,” Nat. Photonics **12**, 578–589 (2018).

**16. **D. Gabor and W. L. Bragg, “Microscopy by reconstructed wave-fronts,” Proc. R. Soc. London. Ser. A **197**, 454–487 (1949).

**17. **G. Möllenstedt and H. Düker, “Beobachtungen und messungen an biprisma-interferenzen mit elektronenwellen,” Z. Phys. **145**, 377–397 (1956).

**18. **J. Cowley, “Twenty forms of electron holography,” Ultramicroscopy **41**, 335–348 (1992).

**19. **J. Glückstad and D. Palima, *Generalized Phase Contrast* (Springer Netherlands, 2009), Vol. 146.

**20. **J. Verbeeck, A. Béché, K. Müller-Caspary, G. Guzzinati, M. Luong, and M. Hertog, “Demonstration of a 2 × 2 programmable phase plate for electrons,” Ultramicroscopy **190**, 58–65 (2017).

**21. **A. Holevo, *Probabilistic and Statistical Aspects of Quantum Theory* (Springer, 1982).

**22. **C. W. Helstrom, “Minimum mean-squared error of estimates in quantum statistics,” Phys. Lett. A **25**, 101–102 (1967).

**23. **S. L. Braunstein and C. M. Caves, “Statistical distance and the geometry of quantum states,” Phys. Rev. Lett. **72**, 3439–3443 (1994).

**24. **R. Demkowicz-Dobrzański, “Optimal phase estimation with arbitrary a priori knowledge,” Phys. Rev. A **83**, 061802 (2011).

**25. **H. Cramér, *Mathematical Methods of Statistics* (Princeton University Press, 1999), Vol. 43.

**26. **O. E. Barndorff-Nielsen and R. D. Gill, “Fisher information in quantum statistics,” J. Phys. A **33**, 4481–4490 (2000).

**27. **R. W. Gerchberg and W. O. Saxton, “Practical algorithm for determination of phase from image and diffraction plane pictures,” Optik **35**, 237–246 (1972).

**28. **H. L. Van Trees, *Detection, Estimation, and Modulation Theory, Part I: Detection, Estimation, and Linear Modulation Theory* (John Wiley & Sons, 2004).

**29. **R. D. Gill and B. Y. Levit, “Applications of the van Trees inequality: A Bayesian Cramér-Rao Bound,” Bernoulli **1**, 59–79 (1995).

**30. **J. Rubio and J. Dunningham, “Bayesian multiparameter quantum metrology with limited data,” Phys. Rev. A **101**, 032114 (2020).

**31. **E. Martínez-Vargas, C. Pineda, F. M. C. Leyvraz, and P. Barberis-Blostein, “Quantum estimation of unknown parameters,” Phys. Rev. A **95**, 012136 (2017).

**32. **R. Demkowicz-Dobrzański, W. Górecki, and M. Guţă, “Multi-parameter estimation beyond quantum Fisher information,” J. Phys. A **53**, 363001 (2020).

**33. **M. Tsang, H. M. Wiseman, and C. M. Caves, “Fundamental quantum limit to waveform estimation,” Phys. Rev. Lett. **106**, 090401 (2011).

**34. **M. G. A. Paris, “Quantum estimation for quantum technology,” Int. J. Quantum Inf. **07**, 125–137 (2009).

**35. **J. Suzuki, Y. Yang, and M. Hayashi, “Quantum state estimation with nuisance parameters,” J. Phys. A **53**, 453001 (2020).

**36. **P. C. Humphreys, M. Barbieri, A. Datta, and I. A. Walmsley, “Quantum enhanced multiple phase estimation,” Phys. Rev. Lett. **111**, 070403 (2013).

**37. **J. Liu, H. Yuan, X.-M. Lu, and X. Wang, “Quantum Fisher information matrix and multiparameter estimation,” J. Phys. A **53**, 023001 (2019).

**38. **A. Z. Goldberg, I. Gianani, M. Barbieri, F. Sciarrino, A. M. Steinberg, and N. Spagnolo, “Multiphase estimation without a reference mode,” Phys. Rev. A **102**, 022230 (2020).

**39. **M. Mir, B. Bhaduri, R. Wang, R. Zhu, and G. Popescu, “Quantitative phase imaging,” Prog. Opt. **57**, 133–217 (2012).

**40. **T. Juffmann, A. de los Ríos Sommer, and S. Gigan, “Local optimization of wave-fronts for optimal sensitivity phase imaging (lowphi),” Opt. Commun. **454**, 124484 (2020).

**41. **K. Oe and T. Nomura, “Twin-image reduction method using a diffuser for phase imaging in-line digital holography,” Appl. Opt. **57**, 5652–5656 (2018).

**42. **J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. **21**, 2758–2769 (1982).

**43. **N. de Jonge, “Theory of the spatial resolution of (scanning) transmission electron microscopy in liquid water or ice layers,” Ultramicroscopy **187**, 113–125 (2018).

**44. **W. Lukosz, “Optical systems with resolving powers exceeding the classical limit,” J. Opt. Soc. Am. **56**, 1463–1471 (1966).

**45. **J. M. Guerra, “Super-resolution through illumination by diffraction-born evanescent waves,” Appl. Phys. Lett. **66**, 3555–3557 (1995).

**46. **G. J. Burton and I. R. Moorhead, “Color and spatial structure in natural scenes,” Appl. Opt. **26**, 157–170 (1987).

**47. **D. Field, “Relations between the statistics of natural images and the response properties of cortical cells,” J. Opt. Soc. Am. A **4**, 2379–2394 (1987).

**48. **D. Tolhurst, Y. Tadmor, and T. Chao, “Amplitude spectra of natural images,” Ophthalmic Physiolog. Opt. **12**, 229–232 (1992).