Factored form descent: a practical algorithm for coherence retrieval

Zhengyun Zhang; Zhi Chen; Shakil Rehman; George Barbastathis

doi:10.1364/OE.21.005759

1. Coherence retrieval

The mutual intensity function [1] for a stationary quasi-monochromatic partially coherent field contains all of the information needed to predict the time-averaged intensity at any point in the field after it has passed through any known first-order optical system. Thus, its measurement enables many applications in modeling, simulation and imaging. While the mutual intensity function cannot be measured directly, its reconstruction can be posed as an inverse problem: compute the mutual intensity function from a suitable number of time-averaged intensity measurements of the field after it has passed through one or more known first-order optical systems. This process of retrieving the mutual intensity of a partially coherent field from intensity measurements is known as coherence retrieval [2].

In this process, partially coherent light is passed through one or more known first order optical systems, and the resulting light intensities provide information about, and thus constraints on, the state of coherence of the original partially coherent field. The best known coherence retrieval methods are based on phase-space tomography [2–6], although other methods do exist, such as the direct measurement of the far field intensities of two pinholes [7], the imaging of optically-produced Wigner distribution for one-dimensional fields [8, 9], spectrogram-based methods [10] and others [11–13].

Not only are there many measurement methods for retrieving the information needed to determine the state of coherence, there are also many different algorithms that reconstruct the state of coherence from these measurements. In this paper, we propose a simpler yet more versatile convex mathematical formulation and a principled solution method. Our formulation is agnostic to the measurement method: it is a constrained weighted least-squares problem based on physical first principles, and it exploits the inherent positivity of the mutual intensity. We also use this positivity in designing a practical solution method.

Unlike iterated projections algorithms [5], our formulation is convex and therefore does not suffer from potential local minima problems found when projecting onto consecutive non-convex sets. Unlike methods based on Fourier space gridding or inverse Radon transforms [3,4], we take advantage of the positivity of the mutual intensity matrix in a principled way, without the use of ad-hoc regularizers or additional projections [6]. Lastly, the flexibility of our formulation and solution method allows for every single intensity measurement to be weighted differently and removes the need for measurements to be incorporated an entire plane at a time.

We will now describe our formulation in more detail by first making some common assumptions about the partially coherent field to be measured: (a) it is quasi-monochromatic, (b) it has no evanescent components, and (c) it has negligible intensity outside a finite region in the plane.

Assumption (a) indicates that the mutual intensity function is sufficient to fully describe the partially coherent field; otherwise, the full mutual coherence function would be needed. Furthermore, this assumption places a lower bound on the wavelengths present in the field. This lower bound, along with assumption (b), imposes a spatial band-limit on the field. This band-limit and assumption (c) together enable accurate modeling of the continuous mutual intensity function using a finite number of samples. In other words, the partially coherent field in question can be accurately described using a mutual intensity matrix, a discretized form of the mutual intensity function where the two spatial variables are replaced by row and column indexes corresponding to spatial sample locations [14, 15]. Furthermore, let us adopt a Gaussian noise model for the intensity measurements, since it can be adapted to approximately model various sources of noise in intensity measurements, including photon shot noise, sensor read-out noise, thermal noise and quantization.

With these assumptions and models in mind, we can formulate the coherence retrieval problem as the following convex problem on mutual intensity matrices J:

Problem 1.

\begin{array}{ll} \text{minimize} & f (J) = \sum_{m = 1}^{M} σ_{m}^{- 2} {(y_{m} - {k_{m}}^{H} J k_{m})}^{2} \\ \text{subject to} & J ≽ 0 \end{array}

where:

J is a Hermitian N × N matrix.
N is the number of spatial samples in the mutual intensity.
M is the number of intensity measurements.
y_m is the m^th intensity measurement.
σ_m is the standard deviation of the additive Gaussian noise source for the m^th intensity measurement.
k_m is a vector describing propagation from the original plane where J is sought to the location where y_m is measured. Let K be the N × M matrix whose columns are k_m; this matrix is the discretized version of the transmission function K(P, Q) used for propagating the mutual intensity [1].

This formulation is a constrained least-squares problem with a quadratic merit function over the space of positive semi-definite matrices of size N × N. Being convex, the problem has a single globally optimal point at best and a contiguous globally optimal set at worst. In other words, no suboptimal local minima exist for this generalized formulation of coherence retrieval.
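As a minimal sketch of how the merit function and the constraint of Problem 1 could be evaluated numerically, the following NumPy snippet computes f(J) from K, y and σ and checks positive semi-definiteness. The function names `merit` and `is_feasible`, the tolerance `tol`, and the random test data are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def merit(J, K, y, sigma):
    """Weighted least-squares merit f(J) of Problem 1."""
    # Predicted intensities k_m^H J k_m for every column k_m of K.
    predicted = np.einsum("nm,np,pm->m", K.conj(), J, K).real
    return float(np.sum(((y - predicted) / sigma) ** 2))

def is_feasible(J, tol=1e-10):
    """Check the constraint: J Hermitian with no eigenvalue below -tol."""
    hermitian = np.allclose(J, J.conj().T)
    return bool(hermitian and np.linalg.eigvalsh(J).min() >= -tol)

# Illustrative data (assumed): N = 3 samples, M = 12 noiseless measurements.
rng = np.random.default_rng(0)
N, M = 3, 12
K = rng.normal(size=(N, M)) + 1j * rng.normal(size=(N, M))
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
J_true = A @ A.conj().T                     # positive semi-definite by construction
y = np.einsum("nm,np,pm->m", K.conj(), J_true, K).real
sigma = np.ones(M)

print(merit(J_true, K, y, sigma), is_feasible(J_true))
```

Because f is a sum of squares of functions that are affine in J, its value along any straight line between two matrices is a convex quadratic, which is the property the text relies on.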

The naive approach to this convex program is to use a generic interior point method with barrier functions [16]. However, the inner loop second-order solver requires the inversion of a Hessian, which takes on the order of O(N⁶) operations per iteration at worst and O(N⁴) operations per iteration at best. Furthermore, even with optimizations such as quasi-Newton methods, storage comparable to that of a Hessian is still required, which in this case would be of size O(N⁴), making such an approach not scalable for large mutual intensity matrices. Since coherence retrieval is not a generic quadratic programming problem, one would expect that its special structure can be used in designing a simpler and less memory-intensive optimization method. This is what we propose in this paper.
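To give a rough feel for the O(N⁴) storage figure, the sketch below counts the bytes of a dense Hessian over the N² real degrees of freedom of an N × N Hermitian matrix. The helper name `hessian_bytes`, the choice of 8-byte real entries, and the sample values of N are assumptions made only for illustration.

```python
def hessian_bytes(n, bytes_per_entry=8):
    """Bytes for a dense Hessian over the unknowns of an n x n Hermitian matrix."""
    unknowns = n * n                 # n real diagonal entries + n(n-1) real off-diagonal parts
    return unknowns * unknowns * bytes_per_entry

for n in (64, 256, 1024):
    print(f"N = {n:5d}: {hessian_bytes(n) / 2**30:.3g} GiB")
```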

In Section 2, we describe the algorithm and its theoretical justification. In Section 3, we apply the algorithm to a specific case of a Schell-model source and present numerical simulations and experimental verification.

2. Factored form descent algorithm

We start by rewriting the constrained convex problem into an unconstrained problem by exploiting the fact that all positive semi-definite matrices can be factored into the product of some complex matrix and its complex conjugate transpose, with no additional constraints:

J = X X^{H}

Physically, this is equivalent to saying that partially coherent fields can be represented as incoherent ensembles of coherent fields, with each coherent field represented by a single column vector in the matrix X. Many factorizations are possible for any particular J; if the columns of X happen to be orthogonal, then they also form a coherent-mode decomposition of the partially coherent field [15, 17]. However, to simplify discourse in the context of this paper, we will refer to the columns of X as modes even if they are not orthogonal, and we will call the space of matrices X modes space.
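The non-uniqueness of the factorization can be illustrated numerically. In the sketch below, which uses assumed random data and hypothetical variable names, right-multiplying X by any unitary matrix leaves J unchanged, and an eigendecomposition of J yields one particular factor whose columns are orthogonal, i.e. a coherent-mode style factor.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4

# Any complex X gives a Hermitian positive semi-definite J = X X^H.
X = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
J = X @ X.conj().T

# A random unitary Q (from a QR decomposition) gives a different factor of the same J.
Q, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
X_rotated = X @ Q

# Factor with orthogonal columns: eigenvectors scaled by the square roots of the eigenvalues.
eigvals, eigvecs = np.linalg.eigh(J)
X_modes = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))

print(np.allclose(X_rotated @ X_rotated.conj().T, J))
print(np.allclose(X_modes @ X_modes.conj().T, J))
```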

Since any J can be factored into a product of an unconstrained matrix X and its conjugate transpose, we can convert Problem 1 into the following unconstrained quartic problem over the space of complex N × N matrices X:

Problem 2.

minimize \hat{f} (X) = \sum_{m = 1}^{M} σ_{m}^{- 2} {(y_{m} - {k_{m}}^{H} X X^{H} k_{m})}^{2}

While there are no direct methods for solving multi-variate quartic minimization problems, the above problem can be solved using iterative methods. We propose an iterative algorithm using the nonlinear conjugate gradient method [18] to solve Problem 2. In each iteration, this algorithm aims to decrease the value of the merit function by updating the factored representation X, whose columns are the modes of the current estimate of the source mutual intensity J; we will call this algorithm the factored form descent algorithm:

Algorithm 1.

1. Set X(1) to a random N × N complex matrix, and S(0) to the zero N × N matrix.

2. For each iteration i:

Compute the intensity errors Δ_m(i) = y_m − k_m^H X(i) X^H(i) k_m.
Set the weighted error matrix E(i) to be the M × M diagonal matrix with entries $σ_{m}^{- 2} Δ_{m} (i)$.
Compute the mutual intensity space steepest descent direction D(i) = 2KE(i)K^H.
Compute the modes space steepest descent direction G(i) = 2D(i)X(i).
If i = 1 or G(i − 1) = 0, then set β(i) = 0; otherwise use the modified Polak-Ribière formula: $β (i) = \max [0, \mathrm{Re} {\frac{〈 G (i), G (i) - G (i - 1) 〉}{{‖ G (i - 1) ‖}_{F}^{2}}}]$.
Compute the conjugate gradient direction S(i) = G(i) + β(i)S(i − 1).
Find α(i) that minimizes the single variable quartic polynomial f̂(X(i) + αS(i)).
Update the iterate X(i + 1) = X(i) + α(i)S(i).
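The steps above can be sketched in NumPy as follows. This is an illustrative reading of Algorithm 1 under stated assumptions, not the authors' implementation: the function name, the stopping tolerance `tol`, the random test problem, and the line search are choices made here. For the line search, the intensity along X + αS is written as a + bα + cα² per measurement, which makes f̂ a quartic in α; its minimizer is taken from the real parts of the roots of the cubic derivative (with α = 0 kept as a fallback candidate so that the merit value cannot increase).

```python
import numpy as np

def factored_form_descent(K, y, sigma, iterations=200, seed=0, tol=1e-20):
    """Sketch of Algorithm 1; returns X and the merit value at the start of each iteration."""
    N, M = K.shape
    rng = np.random.default_rng(seed)
    KH = K.conj().T
    w = 1.0 / sigma**2
    X = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    S_prev = np.zeros((N, N), dtype=complex)
    G_prev = None
    history = []
    for _ in range(iterations):
        P = KH @ X                                    # propagated modes, M x N
        delta = y - np.sum(np.abs(P) ** 2, axis=1)    # intensity errors
        history.append(float(np.sum(w * delta**2)))
        D = 2.0 * (K * (w * delta)) @ KH              # D = 2 K E K^H
        G = 2.0 * D @ X                               # G = 2 D X
        if np.vdot(G, G).real <= tol:                 # converged: G is (numerically) zero
            break
        if G_prev is None:
            beta = 0.0
        else:                                         # modified Polak-Ribiere
            beta = max(0.0, np.vdot(G, G - G_prev).real / np.vdot(G_prev, G_prev).real)
        S = G + beta * S_prev
        # Intensity along the search line: |P|^2 + alpha * b + alpha^2 * c (row sums).
        Q = KH @ S
        b = 2.0 * np.sum((P.conj() * Q).real, axis=1)
        c = np.sum(np.abs(Q) ** 2, axis=1)
        coeffs = np.array([                           # quartic in alpha, highest power first
            np.sum(w * c**2),
            2.0 * np.sum(w * b * c),
            np.sum(w * (b**2 - 2.0 * delta * c)),
            -2.0 * np.sum(w * delta * b),
            np.sum(w * delta**2),
        ])
        candidates = np.append(np.roots(np.polyder(coeffs)).real, 0.0)
        alpha = candidates[np.argmin(np.polyval(coeffs, candidates))]
        X = X + alpha * S
        G_prev, S_prev = G, S
    return X, history

# Assumed noiseless test problem: N = 4, M = 40, rank-2 ground truth.
rng = np.random.default_rng(1)
N, M = 4, 40
K = rng.normal(size=(N, M)) + 1j * rng.normal(size=(N, M))
A = rng.normal(size=(N, 2)) + 1j * rng.normal(size=(N, 2))
y = np.sum(np.abs(K.conj().T @ A) ** 2, axis=1)
X, history = factored_form_descent(K, y, np.ones(M))
print(history[0], history[-1])
```

The recorded merit values can be inspected to confirm that the sequence does not increase, in line with the behavior described in the next subsection.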

2.1. Algorithm behavior

Except for some rare pathological cases, the above algorithm will produce a sequence of iterates X(i) such that f̂(X(i)) approaches the globally optimal value of f̂, thus yielding a sequence of mutual intensity matrices X(i)X^H(i) such that f(X(i)X^H(i)) approaches the globally minimal value in Problem 1. In all cases, the following theorem applies to the above algorithm:

Theorem 1. Algorithm 1 produces a sequence of iterates X(i) with corresponding monotonically non-increasing merit function values f̂(X(i)) and converges when G(i) becomes zero.

This theorem describes the overall behavior of the algorithm, including the unsurprising termination criterion, and its proof is given in Appendix A. In order to determine what value of X(i) the algorithm converges to, it would be useful to determine what G(i) being zero implies about X(i)X^H(i) with regards to the original convex problem. To do that, let us first define an orthonormal basis for the space of N × N Hermitian matrices:

B_nn = u_n(i)u_n^H (i) for n = 1,..., N, and
B_np = (1/2)^(1/2) (u_n(i)u_p^H (i) + u_p(i)u_n^H (i)) for n = 1,...,N and p = n + 1,...,N, and
B_pn = j (1/2)^(1/2) (u_n(i)u_p^H (i) − u_p(i)u_n^H (i)) for n = 1,...,N and p = n + 1,...,N.

where u_n(i) for n = 1,...,N are the left singular vectors of X(i), ordered so that the singular values are monotonically non-increasing. That is, if U(i)S(i)V^H(i) is the singular value decomposition of X(i) (here S(i) denotes the diagonal matrix of singular values, not the conjugate gradient direction), then u_n(i) are the columns of U(i) and the diagonal entries of S(i) are non-increasing. Let R be the rank of X(i), let ℬ̄ be the set of (N − R)² basis matrices consisting of B_nn with n > R together with B_np and B_pn with R < n < p, and let ℬ be the set of the remaining basis matrices. If we consider the geometry of the convex cone formed by the set of all positive semi-definite matrices, then the boundary of this cone is the set of matrices that are also rank-deficient. For a rank-deficient X(i), the basis ℬ̄ describes a space orthogonal to the boundary of the cone and protruding from X(i)X^H(i).

Using the above definitions, the following theorem specifies a relationship between the modes space steepest descent direction G(i) and the mutual intensity space steepest descent direction D(i):

Theorem 2. When G(i) is zero, the mutual-intensity space steepest descent direction D(i) at position X(i)X^H(i) is orthogonal to all the basis matrices in ℬ.

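Proof. Using the singular value decomposition of X(i), we can write G(i) = 2D(i)X(i) as:

G (i) = 2 \sum_{n = 1}^{N} σ_{n} (i) D (i) u_{n} (i) {v_{n}}^{H} (i)

where σ_n(i) are the singular values of X(i) and v_n(i) are the columns of V(i). Since we know G(i) to be zero and since the columns of V(i) are orthonormal, right-multiplying by v_p(i) isolates each term:

0 = σ_{p} (i) D (i) u_{p} (i), \quad p = 1, \dots, N

This means that D(i)u_n(i) is zero for n ≤ R, since σ_n(i) > 0 for those n. Note that any quadratic form of D(i) can be written as an inner product: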

{u_{n}}^{H} (i) D (i) u_{p} (i) = 〈 u_{n} (i) {u_{p}}^{H} (i), D (i) 〉

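Every basis matrix in ℬ is built from outer products u_n(i)u_p^H(i) in which at least one index is no greater than R, and D(i) is Hermitian, so each such inner product vanishes. Therefore, for any B ∈ ℬ, 〈D(i), B〉 = 0.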

In other words, once G(i) = 0, D(i) can only point in a direction perpendicular to the boundary of the convex cone. As a special case, if X(i) is full-rank, then G(i) being zero would imply D(i) being zero. Since any point at which the gradient of a convex function vanishes is a global minimum, this implies that X(i)X^H(i) is a global minimum of Problem 1. If X(i) is rank-deficient, then the following theorem applies:

Theorem 3. If G(i) = 0 and D(i) is a negative semi-definite matrix, then X(i)X^H(i) is the global optimum of the original convex problem.

In other words, if we are at a rank-deficient global minimum, then X(i)X^H(i) lies on the boundary of the convex cone and D(i) must point away from the cone. To prove this, let us consider the set of feasible (i.e. positive semi-definite) mutual intensity matrices in a local neighborhood around X(i)X^H(i). Any such Ĵ in this set can be written as:

\hat{J} = X (i) X^{H} (i) + ε S

where ε is a small but positive number and S is a Hermitian matrix. There is a constraint on S to ensure that Ĵ is positive semi-definite. To explore this constraint, let us first project S onto the null space of X(i)X^H(i):

\hat{S} = {\hat{B}}^{H} S \hat{B}

where B̂ is an N × (N − R) matrix whose columns are the vectors u_R₊₁,..., u_N. Given these definitions, the following lemma applies:

Lemma 1. Ĵ is positive semi-definite if and only if Ŝ is also positive semi-definite.

Proof. First, it is easy to see that if Ŝ is positive semi-definite, then Ĵ is also positive semi-definite. Now, consider the case that Ŝ is not positive semi-definite. Let v̂ be the eigenvector corresponding to a negative eigenvalue of Ŝ. Let v = B̂v̂:

v^{H} \hat{J} v = {\hat{v}}^{H} {\hat{B}}^{H} X (i) X^{H} (i) \hat{B} \hat{v} + ε {\hat{v}}^{H} \hat{S} \hat{v}

X^H(i)B̂ has to be equal to zero because we chose B̂ to contain only the left singular vectors corresponding to the zero-valued singular values of X(i). Therefore,

v^{H} \hat{J} v = ε {\hat{v}}^{H} \hat{S} \hat{v}

Since ε > 0 and v̂ is an eigenvector corresponding to a negative eigenvalue of Ŝ, v^HĴv < 0 and thus Ĵ is not positive semi-definite. Therefore, Ĵ is positive semi-definite if and only if Ŝ is also positive semi-definite.
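The key step of the proof, that a negative eigenvalue of Ŝ forces a negative eigenvalue of Ĵ, can be checked on a small example. The sketch below is illustrative only: the sizes N and R, the value of ε, the random rank-deficient X, and the two hand-picked perturbations `S_bad` and `S_good` are all assumptions.

```python
import numpy as np

N, R, eps = 4, 2, 1e-3
rng = np.random.default_rng(0)

# Rank-deficient X: only the first R columns are nonzero.
X = np.zeros((N, N), dtype=complex)
X[:, :R] = rng.normal(size=(N, R)) + 1j * rng.normal(size=(N, R))

U, s, Vh = np.linalg.svd(X)
B_hat = U[:, R:]          # left singular vectors with zero singular value, N x (N - R)

def min_eigs(S):
    """Smallest eigenvalues of S_hat = B_hat^H S B_hat and of J_hat = X X^H + eps S."""
    S_hat = B_hat.conj().T @ S @ B_hat
    J_hat = X @ X.conj().T + eps * S
    return np.linalg.eigvalsh(S_hat).min(), np.linalg.eigvalsh(J_hat).min()

S_bad = -B_hat @ B_hat.conj().T    # projects to S_hat = -I (not positive semi-definite)
S_good = B_hat @ B_hat.conj().T    # projects to S_hat = +I

print(min_eigs(S_bad))
print(min_eigs(S_good))
```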

With Lemma 1 proven, Theorem 3 can now be proven using proof by contradiction.

Proof. (of Theorem 3) In order for X(i)X^H(i) to not be a local/global minimum of Problem 1, there must exist some Ĵ such that f (Ĵ) < f (X(i)X^H(i)), and thus the corresponding step direction S must be aligned with the steepest descent direction:

〈 S, D (i) 〉 > 0

Since D(i) is negative semi-definite, it can be written as a sum of rank-one matrices:

D (i) = - \sum_{n} e_{n} {e_{n}}^{H}

Furthermore, since G(i) = 0, e_n = B̂B̂^He_n as a consequence of Theorem 2. Therefore, we can write:

\begin{array}{rcl} 〈 S, D (i) 〉 & = & - 〈 S, \sum_{n} e_{n} {e_{n}}^{H} 〉 \\ & = & - \sum_{n} {e_{n}}^{H} S e_{n} \\ & = & - \sum_{n} {({\hat{B}}^{H} e_{n})}^{H} \hat{S} ({\hat{B}}^{H} e_{n}) ≤ 0 \end{array}

This contradicts the alignment requirement 〈S, D(i)〉 > 0 specified by Eq. (9). Hence, if D(i) is negative semi-definite, then there exists no matrix Ĵ in the neighborhood of X(i)X^H(i) that exhibits a lower merit function value, and therefore X(i)X^H(i) is the global minimum.

If D(i) is not negative semi-definite, then X(i) is a saddle point of Problem 2. However, in practice, the algorithm will rarely converge to such a point because saddle points are inherently unstable. That is, if we approach such a saddle point, the iterate will “slide” off, away from the saddle point, unless it is approaching from a pathological direction, of which there is only a set of measure zero. It is also very easy to determine whether the algorithm has actually converged to a saddle point by examining the eigenvalues of the mutual intensity space steepest descent direction D(i). If D(i) is not negative semi-definite, then we can continue the algorithm after nudging the current iterate by a small fraction of a matrix composed of the eigenvectors of D(i) that have positive eigenvalues:

X (i + 1) = X (i) + ε (max (λ_{1}, 0) v_{1}, \dots, max (λ_{N}, 0) v_{N})

where v_n are the eigenvectors corresponding to eigenvalues λ_n of D(i) and ε is a small positive number.
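A minimal sketch of this check and nudge is given below. The function name `escape_saddle`, the default ε, and the tolerance used to decide whether an eigenvalue counts as positive are assumptions; the update itself follows the expression above, with column n of the nudge equal to max(λ_n, 0) v_n.

```python
import numpy as np

def escape_saddle(X, D, eps=1e-3, tol=1e-10):
    """Return (is_optimal, X_next) for an iterate X at which G = 2 D X is zero."""
    lam, V = np.linalg.eigh(D)               # D is Hermitian, so eigh applies
    if lam.max() <= tol:
        return True, X                       # D is negative semi-definite: global optimum
    nudge = V * np.maximum(lam, 0.0)         # scale column n of V by max(lambda_n, 0)
    return False, X + eps * nudge

# Illustrative case: X = 0 is a stationary point, and D has one positive eigenvalue.
D_example = np.diag([-1.0, 2.0])
optimal, X_next = escape_saddle(np.zeros((2, 2)), D_example)
print(optimal)
print(np.abs(X_next))
```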

2.2. Algorithm complexity

While the nonlinear conjugate gradient method makes no guarantees about the number of iterations needed, it is possible to at least determine the asymptotic computational complexity of each iteration:

Propagated intensity can be computed by performing a matrix-matrix multiplication K^HX(i) followed by finding the element-wise magnitude square of the result and then summing across columns. This results in O(MN²) operations, dominated by the matrix multiplication.
The weighted error can be computed in O(M) operations.
The mutual intensity space steepest descent direction D(i) is computed from a matrix-matrix multiplication resulting in O(N²M) operations.
The modes space steepest descent direction is another matrix-matrix multiplication, resulting in O(N³) operations.
Computation of β(i) takes O(N²) operations.
Computation of S(i) takes O(N²) operations.
Computing the terms of the quartic polynomial in α requires propagation of both X(i) and S(i), also resulting in O(MN²) operations.
Updating the iterate takes O(N²) operations.

Note that if we wish to solve for an N × N mutual intensity matrix with M measurements, then we need M ≥ N². Hence, the computational complexity per iteration is O(MN²), which is at least O(N⁴). With parallel computing, the expensive O(MN²) matrix-matrix multiplications can be parallelized up to MN ways, substantially reducing the run time.

Storage-wise, the largest matrices are the intermediate products K^HX(i) and K^HS(i). However, the resulting output is much smaller and these computations can be split across blocks of K, reducing the size of the intermediate product at any particular instant in time. Therefore, the absolute minimal amount of storage needed would include the storage of K itself (which is the bulk of the storage) as well as iterates X(i), past steepest descent directions and some constant amount of scratch space, resulting in an asymptotic storage complexity of O(MN), which is at least O(N³); this is much more efficient than the O(N⁴) asymptotic storage complexity needed for interior point methods.
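The blocked evaluation described above can be sketched as follows: the propagated intensity is computed one block of columns of K at a time, so that only a block-sized slice of K^HX(i) exists at any moment. The function name `propagated_intensity`, the default block size, and the random test data are assumptions for illustration.

```python
import numpy as np

def propagated_intensity(K, X, block=256):
    """Intensities k_m^H X X^H k_m for all m, computed block by block over the columns of K."""
    M = K.shape[1]
    out = np.empty(M)
    for start in range(0, M, block):
        Kb = K[:, start:start + block]               # at most `block` measurement vectors
        P = Kb.conj().T @ X                          # intermediate product, block x (columns of X)
        out[start:start + block] = np.sum(np.abs(P) ** 2, axis=1)
    return out

# Assumed test data: N = 5 samples, M = 23 measurements, 3 modes.
rng = np.random.default_rng(0)
K = rng.normal(size=(5, 23)) + 1j * rng.normal(size=(5, 23))
X = rng.normal(size=(5, 3)) + 1j * rng.normal(size=(5, 3))

blocked = propagated_intensity(K, X, block=8)
direct = np.diag(K.conj().T @ (X @ X.conj().T) @ K).real   # reference: forms the full M x M product
print(np.allclose(blocked, direct))
```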

3. Example application

In order to verify that the factored form descent algorithm also works in practice, we designed a verifiable one-dimensional phase-space tomography experiment to demonstrate retrieval of the known mutual intensity of a Schell-model source by the algorithm. In simulation, we used an ideal source and modeled the phase-space tomography optical arrangement to obtain a sequence of “captured” images, from which we reconstructed the mutual intensity of the ideal source with very little error. We also built the entire optical arrangement, including a 2-f system to generate the Schell-model source, and managed to reconstruct a mutual intensity that reasonably approximates the ideal source.

3.1. Design

The first aspect of the design was to specify a partially coherent source for the experiment. A one-dimensional Schell-model source was chosen for its prevalence in the literature and because it is easy to build one in practice.

In general, a Schell-model source [19] in one dimension has a mutual intensity function of the form:

J (x_{1}, x_{2}) = a (x_{1}) a^{*} (x_{2}) μ (x_{1} - x_{2})

and can be generated using an amplitude mask illuminated by a fully incoherent area source placed effectively at infinity. An optical system consisting of an amplitude mask at both the front and back focal planes of a thin convex lens, as shown in Fig. 1, generates a Schell-model source at the back-focal plane if the front-focal plane is illuminated with uniform, fully incoherent quasi-monochromatic light. In this case, slits are used for the two masks and the resulting mutual intensity function at the back-focal plane is given by:

J (x_{1}, x_{2}) = I_{0} rect (x_{1} / W_{2}) rect (x_{2} / W_{2}) sinc (W_{1} (x_{1} - x_{2}) / (λ F))

where I₀ is the maximum point-wise intensity of the output field, W₁ is the width of the front-focal plane slit, W₂ is the width of the back-focal plane slit, F is the focal length of the thin lens and λ is the wavelength of the incoming light. Based on the availability of specific optical components, the following parameters were chosen for both the simulation and the actual experiment:

λ = 532 nm, F = 100 mm, W_{1} = 100 μm, W_{2} = 500 μm
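With these parameters, the first zero of the sinc factor falls at |x₁ − x₂| = λF/W₁ = 532 μm, provided sinc denotes the normalized form sin(πx)/(πx); that convention is an assumption here, chosen because it matches `numpy.sinc`. The sketch below discretizes the mutual intensity on an assumed 64-point grid slightly wider than the slit, with I₀ set to 1; the grid and variable names are illustrative and are not taken from the experiment.

```python
import numpy as np

wavelength = 532e-9   # m
F = 100e-3            # m
W1 = 100e-6           # m, front-focal-plane slit width
W2 = 500e-6           # m, back-focal-plane slit width
I0 = 1.0              # assumed peak intensity (arbitrary units)

# Assumed sampling grid: 64 points spanning 1.2 slit widths.
N = 64
x = np.linspace(-0.6 * W2, 0.6 * W2, N)

a = (np.abs(x) < W2 / 2).astype(float)                    # rect(x / W2)
dx = x[:, None] - x[None, :]
mu = np.sinc(W1 * dx / (wavelength * F))                  # normalized sinc: sin(pi t) / (pi t)
J = I0 * np.outer(a, a) * mu                              # discretized mutual intensity matrix

first_zero = wavelength * F / W1
print(first_zero)
print(np.linalg.eigvalsh(J).min())
```

The smallest eigenvalue printed at the end gives a quick check that the sampled matrix is positive semi-definite up to rounding, as required of a mutual intensity matrix.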

As was discussed earlier, there are many capture methods to obtain the data needed to reconstruct the source mutual intensity. The most prevalent methods are phase-space tomography methods, where the transverse intensity of a partially coherent beam is captured at various axial positions along the beam, i.e. a focal stack. The idea is that Fourier transforms of the intensity form different slices through the origin of the ambiguity function [20, 21], which in turn can be mapped one-to-one to the mutual intensity.

Fig. 1 Optical arrangement that generates a Schell-model beam. Uniform spatially incoherent quasi-monochromatic light is used to illuminate an amplitude mask at the front focal plane (I) of a convex lens (II) with focal length f. The partially coherent field immediately after an amplitude mask at the back focal plane (III) is that of a Schell-model source.

Name	Input	Iterations	Weighting
`RUN_0`	y₀	500	uniform
`RUN_01`	y₀₁	500	uniform
`RUN_1`	y₁	500	uniform
`RUN_WP`	y_p	500	matching
`RUN_UP`	y_p	500	uniform

Name	Input	Iterations	Weighting
`RUN_EXP_U`	y_exp	500	uniform
`RUN_EXP_W`	y_exp	500	σ_exp
`RUN_EXP1_U`	y_exp1	500	uniform
`RUN_EXP1_W`	y_exp1	500	σ_exp

Optics Express