Optica Publishing Group

Denoising Poisson phaseless measurements via orthogonal dictionary learning

Open Access

Abstract

Phaseless diffraction measurements recorded by CCD detectors are often affected by Poisson noise. In this paper, we propose a dictionary learning model that employs patch-based sparsity in order to denoise such Poisson phaseless measurements. The model consists of three terms: (i) a representation term with an orthogonal dictionary, (ii) an L0 pseudo norm of the coefficient matrix, and (iii) a Kullback-Leibler divergence term to fit the phaseless Poisson data. A fast alternating minimization method (AMM) and a proximal alternating linearized minimization (PALM) method are adopted to solve the proposed model, and in particular a theoretical convergence guarantee is provided for PALM. The subproblems of both algorithms admit fast solvers; indeed, the solutions for sparse coding and dictionary updating both have closed forms owing to the orthogonality of the learned dictionaries. Numerical experiments for phase retrieval with coded diffraction and ptychographic patterns demonstrate the efficiency and robustness of the proposed methods, which, by preserving texture features, produce visually and quantitatively better restored images than phase retrieval algorithms without regularization and than local-sparsity-promoting algorithms.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

In lensless imaging, only the magnitudes of the Fourier transform are recorded, and the central computational task, known as “phase retrieval”, is to recover the underlying image from the phaseless measurements. It is a challenging quadratic inverse problem. In the noiseless case, the related multivariable quadratic systems must be solved, whose complexity is possibly NP-complete [1]. In practice, the measurements are usually contaminated by noise, and one seeks an approximate solution by solving a reformulated nonconvex optimization problem based on the maximum a posteriori (MAP) estimate of the noise.

Numerically, researchers have developed numerous iterative algorithms, such as the alternating projection algorithm [2, 3] for the classical phase retrieval problem (classical phase retrieval refers to retrieving the phase from Fourier measurements). Several other heuristic first-order algorithms popular in the optics community, including [4–7], were proposed to solve classical phase retrieval problems; one can refer to [8, 9] and the references therein. These methods are based on projections onto nonconvex sets, and it is therefore very difficult to describe and mathematically prove their convergence. Consequently, more recent work has focused on designing fast algorithms with convergence guarantees for such nonconvex minimization problems, e.g. [10–16], with random-masking or oversampling measurements. In the optics community, in order to increase imaging resolution and robustness, a popular approach known as ptychography [17] uses a sequence of phaseless diffraction measurements recorded while translating the specimen with respect to a constant illumination mask. In order to avoid getting trapped in a local minimum when solving the conventional nonconvex minimization problem by projection methods, another line of work models phase retrieval by convex methods, including PhaseLift [18] and PhaseCut [19] based on semi-definite programming (SDP), as well as PhaseMax [20, 21], which operates in the original signal space with much lower computational cost than the SDP-based convex methods.

An early work for phase retrieval employing a sparse prior on images was the shrink-wrap algorithm [22], which was established by iteratively shrinking the size of the support set of the unknown object. It has been successfully applied to several groundbreaking X-ray free-electron-laser based experiments [23], which also demonstrates that a sparse prior should be considered when modeling the phase retrieval problem in order to increase robustness and accuracy. An L1-norm based phase retrieval method with a compressibility constraint was proposed in [24, 25]. SDP-based convex methods combined with L1 regularization of the lifted matrix were proposed to solve sparse phase retrieval problems [26, 27]. With prior knowledge of the exact sparsity level, a sparse Fienup method [3] was proposed in [28]. A generic two-step iterative scheme was proposed in [29], which consists of updating the support set and using a damped Gauss-Newton algorithm to solve a nonconvex optimization problem with the updated support set. A two-stage solution technique was presented in [30], where the sampling measurements were first recovered by an arbitrary phase retrieval technique, and the unknown signals were then recovered by compressive sensing. If the transform-domain coefficients of the unknown objects are corrupted by additive noise, a probabilistic method based on the generalized approximate message passing algorithm was given in [31]. Hard-thresholding based truncated gradient iterations [32] were employed to refine a sparse orthogonality-promoting initialization. An L0-regularized variational model was established for the sparse phase retrieval problem, accompanied by an efficient algorithm based on the alternating direction method of multipliers [33] with adaptive stepsizes. Under the assumption that the objects possess a sparse representation in a transform domain, shearlet- and total variation-based regularization methods were considered in [34, 35].
Dictionary learning methods were proposed to reconstruct images [36, 37] from Gaussian-noise measurements with outliers; however, Gaussian distributions only provide a limited approximation in most real applications.

Roughly speaking, the intensities are recorded by detectors within a limited time and incident flux, which causes Poisson noise. For linear measurements, in order to remove the Poisson noise, one can adopt the Anscombe root transformation [38, 39], a pointwise nonlinear (variance-stabilizing) transformation introduced so that one need only deal with Gaussian data. A total variation based method was proposed in [40], where the Kullback-Leibler (KL) divergence was derived in the Bayesian framework for the case of Poisson noise. In a similar manner, for the phaseless nonlinear measurements of phase retrieval, variational methods based on Tikhonov and total variation regularization were proposed to recover images from Poisson measurements [41, 42]. The recovered images in [41] exhibited sharp edges and clean backgrounds reconstructed from very noisy and limited data. However, as such a method is built on a local-sparsity regularization term, visible staircase artifacts appear, and smaller repetitive features (textures) cannot be preserved, particularly for measurements generated with deterministic illuminations in the ptychographic phase retrieval problem. In order to further increase the accuracy and visual quality of the recovered results for practical applications, one possible direction is to exploit the redundancy between overlapping image patches to boost sparsity, for example by using dictionary learning based denoising methods [43–45], which pursue a sparse representation by a dictionary learned from image patches.

In this paper, we propose a novel dictionary learning model to reconstruct high-quality images from Poisson phaseless measurements, which can deal with both real- and complex-valued images. It consists of three terms: a representation term with an orthogonal dictionary, an L0 pseudo norm of the coefficient matrix, and a Kullback-Leibler divergence term to fit the phaseless Poisson data. The L0 pseudo norm is used as a sparse regularization of the coefficient matrix, in both isotropic and anisotropic forms. Meanwhile, in order to solve the proposed nonconvex, nondifferentiable optimization model, fast iterative algorithms are designed with convergence guarantees. Since an orthogonal dictionary is employed, the subproblems w.r.t. the dictionary and the coefficients both have closed-form solutions. Numerous experiments on coded diffraction and ptychographic patterns are performed, which demonstrate that our proposed methods can recover images with sharper edges and cleaner backgrounds than phase retrieval methods without any regularization. Moreover, these methods are capable of producing higher-quality results by preserving texture features compared with the total variation based method [41]. Additionally, by utilizing the introduced anisotropic L0 pseudo norm, the proposed methods further increase the reconstruction quality for complex-valued images.

The rest of this paper is organized in the following way. The proposed dictionary learning model is given in section 2. An alternating minimization method is given in section 3. A proximal alternating linearized minimization method with a theoretical convergence guarantee is then designed in section 4. Numerous experiments are performed in section 5 to demonstrate the efficiency of the proposed methods. We conclude the paper in section 6.

2. Proposed model

We consider phase retrieval for a two-dimensional (2D) image in a discrete setting, i.e., an underlying object u : Ω = {0, 1, · · · , n − 1} → ℂ of size n with n = n1 × n2, where the 2D object is represented as a vector of size n in lexicographical order. For the classical phase retrieval problem, the measured data are only the magnitudes of the discrete Fourier transform of u, i.e., |ℱu|2, where | · |2 denotes the pointwise squared absolute value of a vector and ℱ : ℂn → ℂn denotes the discrete 2D Fourier transform (DFT). In this paper, we consider a general phase retrieval problem [18], i.e.

$$\text{Find } u \in \mathbb{C}^n, \quad \text{s.t. } |\mathcal{A}u|^2 = b, \tag{1}$$
where 𝒜 : ℂn → ℂm is a linear operator on the complex Euclidean space and b : Ω̃ = {0, 1, · · · , m − 1} → ℝ+. We assume that the photon-counting procedure used to collect the measurements b is contaminated by Poisson noise, i.e., for noisy measurements f ∈ ℝ+m, the intensity values at each position are independent and identically distributed (i.i.d.) random variables with probability $\Pr(f|b) = \prod_{j\in\tilde\Omega} \frac{\exp(-b(j))\,(b(j))^{f(j)}}{f(j)!}$, where the ground truth b ∈ ℝ+m is both the mean and the variance.
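To make the measurement model concrete, the following NumPy sketch simulates Poisson-corrupted phaseless data f = Poisson(|𝒜u|²) for a masked-DFT operator. It is our own illustration (the paper's experiments use MATLAB), and the mask values and problem sizes here are arbitrary.

```python
import numpy as np

# Hypothetical sketch: simulate Poisson-corrupted phaseless measurements
# f = Poisson(|A u|^2), where A stacks DFTs of masked copies of u.
rng = np.random.default_rng(0)

n = 64                                   # signal length (illustrative)
K = 2                                    # number of masks (illustrative)
u = rng.standard_normal(n)               # unknown (here real-valued) object
masks = rng.choice(np.array([1, -1, 1j, -1j]), size=(K, n))  # toy masks

# A u: stack the unitary DFTs of the masked signals
Au = np.concatenate([np.fft.fft(m * u, norm="ortho") for m in masks])
b = np.abs(Au) ** 2                      # clean intensities
f = rng.poisson(b).astype(float)         # noisy photon counts
```

The counts f are nonnegative integers whose mean and variance both equal the clean intensity b, matching the Poisson model above.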

We will first give a brief review of dictionary learning based denoising for linear inverse problems, and of regularized methods for denoising noisy phaseless measurements contaminated by Poisson noise, in the following subsections.

2.1. Review of dictionary learning and Poisson denoising

2.1.1. Dictionary learning and denoising

Given a matrix Y = [Y(0), Y(1), · · · , Y(d − 1)] ∈ ℂl,d, whose columns represent 1-D signals, the goal of dictionary learning (or sparse coding) [46] is to find a “sparse” linear representation of the matrix Y by a learned dictionary D = [D(0), D(1), · · · , D(c − 1)] ∈ ℂl,c, i.e. Y ≈ Dα, such that the coefficient matrix α has mostly zero (“sparse”) entries. Obviously, the dictionary D and the coefficient matrix α ∈ ℂc,d are coupled, and both are needed in order to derive an optimal sparse representation. Roughly speaking, the problem can be formulated as

$$\min_{D,\alpha}\ \frac12\|Y - D\alpha\|^2 + \tau\|\alpha\|_0, \quad \text{s.t. } \|D(j)\| = 1,\ 0 \le j \le c-1, \tag{2}$$
where ‖ · ‖ denotes the Frobenius norm for a matrix and the L2 norm for a vector, ‖ · ‖0 denotes the L0 pseudo norm counting the number of nonzero elements, τ is a positive constant balancing the data fitting term (first term) and the regularization term (second term), and the constraints guarantee that the dictionary is normalized.

The well-known K-SVD denoising method [43] first produces redundant signals by sliding a small window over the entire image, and then solves a sparse coding problem [47]. Given the noisy image f, the K-SVD denoising model reads:

$$\min_{u,D,\alpha}\ \frac12\|R(u) - D\alpha\|^2 + \tau\|\alpha\|_0 + \frac{\eta}{2}\|u - f\|^2, \quad \text{s.t. } \|D(j)\| = 1,\ 0 \le j \le c-1, \tag{3}$$
where the image patches are selected redundantly by the linear selection operator R(u) = [R0u, R1u, · · · , Rd−1u] ∈ ℂl,d, which extracts in total d patches of the underlying image u so that it can be sparsely represented by D with the coefficient matrix α, and D ∈ ℂl,c is a normalized dictionary. Variational methods combining K-SVD have been widely applied to different image processing tasks, especially Poisson denoising [45] and deblurring [44].
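The patch-selection operator R(u) and the overlap-count operator W = Σ_t R_t^T R_t (which reappears in the u-update later) can be sketched as follows. This is a hypothetical NumPy illustration with our own helper names, using a square window and unit stride.

```python
import numpy as np

# A minimal sketch of R(u): slide a p x p window over the image with unit
# stride, vectorize each patch into a column, and accumulate
# W = sum_t R_t^T R_t, which simply counts how many patches cover each pixel.
def extract_patches(img, p=8):
    h, w = img.shape
    cols = [img[i:i + p, j:j + p].ravel()
            for i in range(h - p + 1) for j in range(w - p + 1)]
    return np.stack(cols, axis=1)        # shape (p*p, d)

def overlap_counts(shape, p=8):
    # W as a diagonal operator: per-pixel patch-coverage counts
    h, w = shape
    W = np.zeros(shape)
    for i in range(h - p + 1):
        for j in range(w - p + 1):
            W[i:i + p, j:j + p] += 1
    return W
```

For an h × w image and window size p, this yields d = (h − p + 1)(w − p + 1) patches; interior pixels are covered by up to p² patches, corner pixels by only one.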

2.1.2. Regularized Poisson denoising for phase retrieval

Thibault and Guizar-Sicairos [42] proposed the following Tikhonov-regularized model in coherent diffractive imaging: min_{u∈ℂn} λ‖∇u‖² + ℬ(|𝒜u|², f), with the Kullback-Leibler divergence ℬ defined as

$$\mathcal{B}(h,f) := \frac12\sum_{j\in\tilde\Omega}\big(h(j) - f(j)\log h(j)\big), \tag{4}$$
where ∇ denotes the gradient operator and 𝒜 represents the masked Fourier transform generated by a set of illumination masks. Consequently, significant gains in reconstruction accuracy and robustness to noise were achieved compared with the method without regularization. In order to further increase the quality, Chang et al. [35] established a total variation based model for general phase retrieval tasks for both real- and complex-valued images.
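As a quick sanity check of the data term, the KL divergence ℬ(h, f) = ½ Σ_j (h(j) − f(j) log h(j)) can be transcribed directly. This hypothetical helper assumes h > 0 wherever the logarithm is evaluated.

```python
import numpy as np

# Direct transcription of the Kullback-Leibler data term B(h, f) above;
# a hypothetical helper, assuming h > 0 wherever f requires log h.
def kl_divergence(h, f):
    h = np.asarray(h, dtype=float)
    f = np.asarray(f, dtype=float)
    return 0.5 * np.sum(h - f * np.log(h))
```

Up to an additive constant depending only on f, ℬ(h, f) is minimized componentwise at h = f, which is why it serves as a fidelity term for Poisson data.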

2.2. Proposed dictionary learning phase retrieval (DicPR)

Assuming that the measurements f = Poisson(|𝒜u|2) ∈ ℝm in (1) are corrupted by Poisson noise, we establish a minimization problem driven by the patch-based sparsity [43–45] of the underlying images, referred to as “DicPR”, which reads:

$$\min_{u,D,\alpha}\ \frac12\|D\alpha - R(u)\|^2 + \tau\|\alpha\|_0 + \eta\,\mathcal{B}(|\mathcal{A}u|^2, f), \quad \text{s.t. } D^*D = I, \tag{5}$$
where u is the underlying image that we want to reconstruct from the intensity f, D ∈ ℂl,c is a learned orthogonal dictionary as in [48], which is combined with α via the L0 pseudo norm ‖ · ‖0 to represent the underlying image sparsely, τ and η are two parameters balancing the sparsity and data fitting terms, I denotes the identity matrix, and ℬ(·, ·) is defined in (4). We rewrite (5) as
$$\min_{u,D,\alpha}\ \Upsilon(u,D,\alpha) := \mathcal{H}(u,D,\alpha) + \mathcal{F}(u) + \mathcal{I}_K(D) + \tau\|\alpha\|_0, \tag{6}$$
with $\mathcal{H}(u,D,\alpha) := \frac12\|D\alpha - R(u)\|^2$, ℱ(u) := ηℬ(|𝒜u|², f), and K := {D : D*D = I}. The related indicator function ℐ_K is defined as ℐ_K(D) = 0 if D ∈ K, and ℐ_K(D) = +∞ otherwise. We introduce two different definitions of ‖α‖0 for complex-valued α, with isotropic and anisotropic L0 pseudo norms as follows:
$$\begin{cases}\text{Isotropic form:} & \|\alpha\|_0^{iso} := \#\{(s,t) : |\alpha(s,t)| \neq 0,\ 0\le s\le c-1,\ 0\le t\le d-1\},\\ \text{Anisotropic form:} & \|\alpha\|_0^{aniso} := \|\Re(\alpha)\|_0^{iso} + \|\Im(\alpha)\|_0^{iso},\end{cases} \tag{7}$$
where ℜ and ℑ denote the real and imaginary parts of a complex-valued matrix. One readily has ‖α‖0iso = ‖α‖0aniso if α ∈ ℝc,d. For a complex-valued matrix they are quite different, and one can see the advantage of using the anisotropic form in the numerical experiments of this paper. Hereafter, for simplicity, ‖α‖0 denotes the isotropic pseudo norm if not specified otherwise.
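The two pseudo norms can be computed with a pair of hypothetical one-line helpers; for a real-valued matrix the two counts coincide, while for a complex-valued matrix the anisotropic count can be strictly larger.

```python
import numpy as np

# Hypothetical helpers for the two L0 pseudo norms defined above: the
# isotropic form counts nonzero complex entries, the anisotropic form
# counts nonzeros of the real and imaginary parts separately.
def l0_iso(alpha):
    return int(np.count_nonzero(np.abs(alpha)))

def l0_aniso(alpha):
    return int(np.count_nonzero(alpha.real) + np.count_nonzero(alpha.imag))
```

For example, an entry 2 + 3i contributes 1 to the isotropic count but 2 to the anisotropic count, so the anisotropic norm penalizes dense real and imaginary parts more heavily.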

Instead of directly using K-SVD [43], we propose an orthogonal dictionary learning model such that the subproblems involving D and α both have closed-form solutions. Moreover, the denoising results with an orthogonal dictionary are comparable to those with a non-orthogonal one [48]. Due to the existence of the data fitting term for phaseless measurements and the L0 regularization of the coefficient matrix, the proposed model is a nonconvex and nonsmooth optimization problem, for which it is difficult to design an algorithm with a convergence guarantee. In the following sections, we first give a simple alternating minimization method (AMM). Furthermore, a convergent algorithm called the proximal alternating linearized minimization method (PALM) will be provided.

3. Algorithms 1: Alternating minimization method (AMM)

Since the variables u, D, α in (5) are coupled, the overall AMM iteration for solving the problem consists of three steps:

$$\begin{cases} u^{k+1} = \arg\min_u\ \Upsilon(u, D^k, \alpha^k),\\ D^{k+1} = \arg\min_D\ \Upsilon(u^{k+1}, D, \alpha^k),\\ \alpha^{k+1} = \arg\min_\alpha\ \Upsilon(u^{k+1}, D^{k+1}, \alpha),\end{cases} \tag{8}$$
given the approximate solution (u^k, D^k, α^k) of the kth iteration. In the following subsections, we present how to solve these subproblems, omitting the superscripts in (8) for simplicity.

3.1. Subproblem w.r.t. the variable u

In order to avoid directly calculating the gradient of the KL divergence (which is possibly unbounded), the alternating direction method of multipliers (ADMM) is adopted to solve this subproblem. By introducing an auxiliary variable z, we have:

$$\min_{u,z}\ \frac12\|Y - R(u)\|^2 + \eta\,\mathcal{B}(|z|^2, f), \quad \text{s.t. } z = \mathcal{A}u, \tag{9}$$
with Y = [Y(0), Y(1), · · · , Y(d − 1)] := D^k α^k. An equivalent form of (9) is derived as
$$\min_{u,z}\ \frac12\sum_{t=0}^{d-1}\|Y(t) - R_t u\|^2 + \eta\,\mathcal{B}(|z|^2, f), \quad \text{s.t. } z = \mathcal{A}u. \tag{10}$$
The corresponding augmented Lagrangian reads
$$\mathcal{L}_r(u,z;\Lambda) := \frac12\sum_t\|Y(t) - R_t u\|^2 + \eta\,\mathcal{B}(|z|^2,f) + \langle z - \mathcal{A}u, \Lambda\rangle + \frac{r}{2}\|z - \mathcal{A}u\|^2, \tag{11}$$
where Λ ∈ ℂm is the complex-valued multiplier, r is a positive parameter, and 〈·, ·〉 denotes the Hermitian inner product of two complex-valued vectors. The ADMM is designed as follows,
$$\begin{cases} u^{j+1} = \arg\min_u\ \mathcal{L}_r(u, z^j; \Lambda^j),\\ z^{j+1} = \arg\min_z\ \mathcal{L}_r(u^{j+1}, z; \Lambda^j),\\ \Lambda^{j+1} = \Lambda^j + r\,(z^{j+1} - \mathcal{A}u^{j+1}),\end{cases} \tag{12}$$
starting from the previous iterate (u^j, z^j, Λ^j). Following the framework in [41], one readily obtains the closed-form solution of the first subproblem of (12) as
$$\begin{bmatrix} r\,\Re(\mathcal{A}^*\mathcal{A}) + W & -r\,\Im(\mathcal{A}^*\mathcal{A})\\ r\,\Im(\mathcal{A}^*\mathcal{A}) & r\,\Re(\mathcal{A}^*\mathcal{A}) + W \end{bmatrix}\begin{bmatrix} \Re(u^{j+1})\\ \Im(u^{j+1}) \end{bmatrix} = \begin{bmatrix} \Re\big(r\mathcal{A}^*v^j + \sum_t R_t^T Y(t)\big)\\ \Im\big(r\mathcal{A}^*v^j + \sum_t R_t^T Y(t)\big) \end{bmatrix}, \tag{13}$$
with $v^j = z^j + \Lambda^j/r$ and $W = \sum_{t=0}^{d-1} R_t^T R_t$. The solution of the above subproblem can be further simplified if the operator 𝒜 produces Fourier measurements with masks $\{I_k\}_{k=0}^{K-1}$, i.e.
$$(\mathcal{A}u)^T = \Big((\mathcal{F}(I_0 \circ u))^T, (\mathcal{F}(I_1 \circ u))^T, \cdots, (\mathcal{F}(I_{K-1} \circ u))^T\Big), \tag{14}$$
where ○ denotes pointwise multiplication and Ik is the mask indexed by k, each represented by a vector in ℂn in lexicographical order. Therefore, we have
$$\mathcal{A}^*\mathcal{A} = \mathrm{diag}\Big(\sum\nolimits_j I_j^* \circ I_j\Big) = \mathrm{diag}\Big(\sum\nolimits_j |I_j|^2\Big), \tag{15}$$
which is a real-valued diagonal matrix. Finally, we derive the solution
$$u^{j+1} = \big(r\mathcal{A}^*\mathcal{A} + W\big)^{-1}\Big(r\mathcal{A}^*v^j + \sum\nolimits_t R_t^T Y(t)\Big), \tag{16}$$
provided the diagonal matrix r𝒜*𝒜 + W is non-singular.
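The diagonal structure of 𝒜*𝒜 for masked Fourier measurements can be verified numerically. The sketch below is our own illustration and assumes a unitary DFT normalization (so that ℱ*ℱ = I); the masks and sizes are arbitrary.

```python
import numpy as np

# Numerical check: for masked measurements A u = (F(I_k o u))_k with a
# unitary DFT F, the normal operator A*A acts as diag(sum_k |I_k|^2).
rng = np.random.default_rng(1)
n, K = 32, 3
masks = rng.standard_normal((K, n)) + 1j * rng.standard_normal((K, n))
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def A(u):
    return np.concatenate([np.fft.fft(m * u, norm="ortho") for m in masks])

def A_adj(y):
    parts = y.reshape(K, n)
    return sum(np.conj(m) * np.fft.ifft(p, norm="ortho")
               for m, p in zip(masks, parts))

lhs = A_adj(A(u))
rhs = np.sum(np.abs(masks) ** 2, axis=0) * u   # diag(sum_k |I_k|^2) applied to u
```

Since A_adj(A(u)) = Σ_k conj(I_k) ∘ ℱ*ℱ(I_k ∘ u) and ℱ*ℱ = I with the unitary normalization, the two sides agree entrywise.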

For the second subproblem of (12), we have

$$\min_z\ \eta\,\mathcal{B}(|z|^2, f) + \frac{r}{2}\|z - \mathcal{A}u^{j+1} + \Lambda^j/r\|^2. \tag{17}$$
For Poisson-noised data f, the closed-form solution is readily obtained as
$$z^{j+1}(t) = \frac{r|w^j(t)| + \sqrt{r^2|w^j(t)|^2 + 4\eta(\eta + r)f(t)}}{2(\eta + r)}\,\mathrm{sign}\big(w^j(t)\big), \tag{18}$$
with $w^j = \mathcal{A}u^{j+1} - \Lambda^j/r$ and $\mathrm{sign}(w(t)) = \frac{w(t)}{|w(t)|}$. In summary, the overall algorithm for the u-subproblem of (8) is listed as follows:

We remark that one can alternatively consider a gradient-descent type algorithm with adaptive stepsizes such as Wirtinger flow [10], or the nonlinear conjugate gradient method or a Newton-type algorithm [49] if the number of measurements is sufficient. The ADMM works well even with very few measurements [41], and has a low computational cost compared with Newton-type algorithms. In the experimental tests, only a few inner iterations are used.
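The closed-form z-update above follows from a componentwise quadratic: writing ρ = |z(t)| and aligning the phase of z with w, the first-order condition of η/2 (ρ² − f log ρ²) + r/2 (ρ − |w|)² is (η + r)ρ² − r|w|ρ − ηf = 0, whose positive root gives the stated formula. A hypothetical NumPy transcription:

```python
import numpy as np

# Sketch of the closed-form z-update: componentwise, rho = |z| is the
# positive root of (eta + r) rho^2 - r |w| rho - eta f = 0, and z keeps
# the phase of w. Parameter values in the test are illustrative.
def z_update(w, f, eta, r):
    aw = np.abs(w)
    rho = (r * aw + np.sqrt(r**2 * aw**2 + 4 * eta * (eta + r) * f)) \
          / (2 * (eta + r))
    phase = np.where(aw > 0, w / np.maximum(aw, 1e-300), 1.0)
    return rho * phase
```

Because the objective couples z(t) only through its modulus, the update is fully parallel over components, which keeps each ADMM inner iteration cheap.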

3.2. Subproblems w.r.t. variables D and α

For the second subproblem of (8) w.r.t. D, we have

$$\hat{D} := \arg\min_D\ \frac12\|D\alpha - R(u)\|^2, \quad \text{s.t. } D^*D = I. \tag{19}$$
The minimizer has a closed form [48]:
$$\hat{D} = UV^*, \tag{20}$$
where R(u)α* = UΣV* is the singular value decomposition (SVD), with singular vectors U, V and singular values Σ.
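The dictionary update is an orthogonal Procrustes problem, and the SVD-based minimizer above can be sketched in a few lines of NumPy (our own illustration with arbitrary sizes; the paper's code is in MATLAB):

```python
import numpy as np

# Sketch of the orthogonal dictionary update: with alpha fixed, the
# constrained minimizer of ||D alpha - R(u)||_F^2 over D*D = I is the
# orthogonal-Procrustes solution D = U V* from the SVD of R(u) alpha*.
rng = np.random.default_rng(2)
l, d = 16, 40                            # illustrative dimensions (c = l)
Ru = rng.standard_normal((l, d))         # stand-in for the patch matrix R(u)
alpha = rng.standard_normal((l, d))      # stand-in for the coefficients

U, _, Vh = np.linalg.svd(Ru @ alpha.conj().T)
D = U @ Vh                               # optimal orthogonal dictionary
```

Since the minimizer is searched only over orthogonal matrices, D is guaranteed to fit R(u) at least as well as any other orthogonal candidate, e.g. the identity.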

For the third subproblem of (8) w.r.t. α, since D*D = I, we have

$$\hat{\alpha} := \arg\min_\alpha\ \frac12\|\alpha - D^*R(u)\|^2 + \tau\|\alpha\|_0. \tag{21}$$
For the isotropic L0 norm, it has a closed-form solution (known as “hard thresholding”):
$$\hat{\alpha} = \mathrm{Thresh}_{\sqrt{2\tau}}\big(D^*R(u)\big), \tag{22}$$
where the hard-thresholding operator Thresh_ρ(α) is defined as
$$\mathrm{Thresh}_\rho(\alpha)(s,t) = \begin{cases}\alpha(s,t), & \text{if } |\alpha(s,t)| \ge \rho,\\ 0, & \text{otherwise},\end{cases}\qquad 0\le s\le c-1,\ 0\le t\le d-1. \tag{23}$$
For the anisotropic case, one readily obtains the closed-form solution by separating the real and imaginary parts:
$$\hat{\alpha} = \mathrm{Thresh}_{\sqrt{2\tau}}\big(\Re(D^*R(u))\big) + i\,\mathrm{Thresh}_{\sqrt{2\tau}}\big(\Im(D^*R(u))\big). \tag{24}$$
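The hard-thresholding solutions above can be transcribed directly; these hypothetical helpers keep entries whose magnitude reaches the threshold √(2τ) and zero out the rest, with the anisotropic variant acting on the real and imaginary parts separately.

```python
import numpy as np

# Hypothetical implementation of the hard-thresholding sparse-coding
# solutions: keep entries with magnitude >= rho, zero out the rest.
def hard_thresh(beta, rho):
    return np.where(np.abs(beta) >= rho, beta, 0)

def sparse_code_iso(beta, tau):
    return hard_thresh(beta, np.sqrt(2 * tau))

def sparse_code_aniso(beta, tau):
    rho = np.sqrt(2 * tau)
    return hard_thresh(beta.real, rho) + 1j * hard_thresh(beta.imag, rho)
```

The threshold √(2τ) arises because zeroing an entry β saves the penalty τ at the cost of ½|β|² in the fidelity term, so keeping β pays off exactly when |β| ≥ √(2τ).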
We present the overall algorithm for (5) in Algorithm 1.


Algorithm 1: AMM for “DicPR” in (5)

In order to accelerate the computation, in the first few iterations (three in the numerical experiments) we use a truncated version of (20), in which only the columns of U and V corresponding to the few largest singular values are selected.

Assumption 1 The sequence $\{u^k\}_{k=0}^{\infty}$ is bounded if and only if $\{|\mathcal{A}u^k|\}_{k=0}^{\infty}$ is bounded.

In order to guarantee the uniqueness of the solution for phase retrieval, oversampling or multiple measurements are required. For example, for the coded diffraction pattern and the ptychographic pattern, 𝒜*𝒜 is a diagonal matrix as in (15). One readily sees that Assumption 1 holds if we assume the diagonal elements are all non-zero; in practice, 𝒜*𝒜 is usually invertible.

Theorem 1 Under Assumption 1, there exists an accumulation point Z̃ of {Z^k}, where Z^k := (u^k, D^k, α^k), such that lim_{k→∞} ϒ(Z^k) = ϒ(Z̃).

Proof: One readily sees that the objective values of the iterative sequence generated by Algorithm 1 are non-increasing, i.e., ϒ(u^{k+1}, D^{k+1}, α^{k+1}) ≤ ϒ(u^k, D^k, α^k). Hence, there exists a positive constant C independent of k such that ℋ(u^k, D^k, α^k) + ℱ(u^k) + ℐ_K(D^k) + τ‖α^k‖0 ≤ C. First, ℱ(u^k) ≤ C, so {|𝒜u^k|} is bounded, and by Assumption 1 the boundedness of {u^k} follows. Since (D^k)*D^k = I, {D^k} is bounded. Since ℋ(u^k, D^k, α^k) ≤ C, the orthogonality of the dictionary gives ‖α^k − (D^k)*R(u^k)‖ ≤ √(2C), which leads to the boundedness of {α^k}. Hence there exists a subsequence $\{Z^{k_n}\}_{n=0}^{\infty} \subset \{Z^k\}$ with a limit point Z̃ = (ũ, D̃, α̃), s.t. lim_{n→∞} Z^{k_n} = Z̃, which concludes the theorem.

For the AMM algorithm, we only prove convergence to an accumulation point of the iterative sequence. Generally speaking, for a nonconvex problem, the AMM algorithm may cycle indefinitely without converging [52]. Although AMM for the proposed model works well numerically, studying its convergence mathematically remains challenging. Inspired by the proximal linearized technique, a proximal-type splitting algorithm will be given next, in which sufficient decrease of the objective function is guaranteed by the additional proximal terms, so that convergence can be derived following [50].

4. Algorithm 2: Proximal alternating linearized minimization (PALM)

We adopt the proximal alternating linearized minimization method (PALM) [50] to solve (5). First, we give the partial derivatives of ℋ:

$$\begin{cases} \nabla_u \mathcal{H}(u,D,\alpha) = \sum_{t=0}^{d-1} R_t^T\big(R_t u - D\alpha(t)\big),\\ \nabla_D \mathcal{H}(u,D,\alpha) = \big(D\alpha - R(u)\big)\alpha^*,\\ \nabla_\alpha \mathcal{H}(u,D,\alpha) = D^*\big(D\alpha - R(u)\big),\end{cases} \tag{25}$$
with α = [α(0), α(1), · · · , α(d − 1)]. With three stepsizes c_k, d_k and e_k, the PALM iteration is given as
$$u^{k+1} = \arg\min_u\ \mathcal{F}(u) + \frac{c_k}{2}\|u - \hat{u}^k\|^2, \quad \text{with } \hat{u}^k = u^k - \tfrac{1}{c_k}\nabla_u\mathcal{H}(u^k, D^k, \alpha^k), \tag{26}$$
$$D^{k+1} = \arg\min_D\ \mathcal{I}_K(D) + \frac{d_k}{2}\|D - \hat{D}^k\|^2, \quad \text{with } \hat{D}^k = D^k - \tfrac{1}{d_k}\nabla_D\mathcal{H}(u^{k+1}, D^k, \alpha^k), \tag{27}$$
$$\alpha^{k+1} = \arg\min_\alpha\ \tau\|\alpha\|_0 + \frac{e_k}{2}\|\alpha - \hat{\alpha}^k\|^2, \quad \text{with } \hat{\alpha}^k = \alpha^k - \tfrac{1}{e_k}\nabla_\alpha\mathcal{H}(u^{k+1}, D^{k+1}, \alpha^k). \tag{28}$$

For the u-subproblem in (26), one can readily derive the ADMM following Algorithm 1-1, where only the solver for u in Step 2 is slightly different; we list it directly as Algorithm 2-1.


Algorithm 1-1: ADMM for the u-subproblem in (8)


Algorithm 2-1: ADMM for the u-subproblem in (26)

For the D–subproblem in (27), we need to solve the following problem:

$$D^{k+1} = \arg\min_D\ \|D - \hat{D}^k\|^2, \quad \text{s.t. } D^*D = I. \tag{29}$$
One readily derives the closed-form solution
$$D^{k+1} = UV^*, \tag{30}$$
where U, V are the singular vectors of the SVD of D̂^k, given by
$$\hat{D}^k = D^k - \frac{1}{d_k}\big(D^k\alpha^k - R(u^{k+1})\big)(\alpha^k)^* = D^k\Big(I - \frac{1}{d_k}\alpha^k(\alpha^k)^*\Big) + \frac{1}{d_k}R(u^{k+1})(\alpha^k)^*. \tag{31}$$

For the α-subproblem in (28), one directly obtains the closed form, expressed by hard thresholding, as

$$\alpha^{k+1} = \mathrm{Thresh}_{\sqrt{2\tau/e_k}}(\hat{\alpha}^k), \tag{32}$$
and
$$\alpha^{k+1} = \mathrm{Thresh}_{\sqrt{2\tau/e_k}}\big(\Re(\hat{\alpha}^k)\big) + i\,\mathrm{Thresh}_{\sqrt{2\tau/e_k}}\big(\Im(\hat{\alpha}^k)\big), \tag{33}$$
with $\hat{\alpha}^k = \big(1 - \tfrac{1}{e_k}\big)\alpha^k + \tfrac{1}{e_k}(D^{k+1})^*R(u^{k+1})$ for the isotropic and anisotropic L0 pseudo norms, respectively.

Therefore we can give an overall PALM for (5) as below.

In the following parts, we provide a convergence analysis for Algorithm 2. For simplicity, the analysis is conducted for real-valued images; it can easily be generalized to complex-valued images, where one only needs to reformulate the related norms and operators of the variables w.r.t. their corresponding real and imaginary parts [41].


Algorithm 2: PALM for “DicPR” in (5)

The existence of a critical point of the proposed model is given below.

Lemma 4.1 A critical point of (10) exists with a finite functional value.

Following Lemma 5 in [50] and the lower semi-continuity of ℱ, ℐ_K and ‖ · ‖0, one can readily prove it; the details are omitted here. We need the following assumption on the boundedness of u^k.

Assumption 2 (Boundedness) The sequence {u^k} generated by (26) is bounded.

Based on the above assumption, the boundedness of the iterative sequence (u^k, D^k, α^k) can be proved; see Lemma .1 in the appendix for details.

Lemma 4.2 (Lipschitz-Gradient) Under Assumption 2, the functional ℋ(u, D, α) satisfies:

  1. ℋ is differentiable, and inf ℋ > −∞.
  2. ∇_uℋ, ∇_Dℋ and ∇_αℋ are Lipschitz continuous with moduli L_u(D, α), L_D(u, α) and L_α(u, D) respectively, and the Lipschitz constants are bounded, i.e., there exist three positive constants λ_u^+, λ_D^+, λ_α^+ such that sup_k L_u(D^k, α^k) ≤ λ_u^+, sup_k L_D(u^k, α^k) ≤ λ_D^+ and sup_k L_α(u^k, D^k) ≤ λ_α^+.
  3. The gradient ∇ℋ(u, D, α) is Lipschitz continuous with Lipschitz constant M on the bounded domain {(u, D, α) : ‖(u, D, α)‖ ≤ Const}.
  4. ∇_Dℋ(u, D, α) is Lipschitz continuous with Lipschitz constant λ_D w.r.t. α for bounded sequences.

The proof of the first item is trivial. By (25) and Assumption 2, one can readily prove the remaining items based on the boundedness of the iterative sequences established in Lemma .1 in the appendix; we therefore omit the details. Furthermore, we assume that the stepsizes are large enough.

Assumption 3 (Step Sizes) $\lambda^+ := \min_k\{c_k - \lambda_u^+,\ d_k - \lambda_D^+,\ e_k - \lambda_\alpha^+\} > 0.$

Then we can consider the convergence of Algorithm 2. One readily sees that ϒ(u, D, α) is semi-algebraic [50, 51] with the data term ℬ(·, ·) in (4), and finally we obtain the convergence theorem as follows.

Theorem 2 Let Assumptions 2 and 3 hold and min_k e_k > 1/2. The sequence {(u^k, D^k, α^k)} generated by Algorithm 2 converges to a critical point of (6).

Based on Lemma .2 and Lemma .3 in the appendix, one can readily finish the proof following Theorem 1 in [50]; we therefore omit the details here.

Remark 4.1 Although we focus on the orthogonal dictionary in this paper, the proposed PALM can also be applied to the non-orthogonal case, and further speedup of PALM should be investigated as in [51]. We leave these as future work.

At the end of this section, we analyze the computational complexity of Algorithm 1 and Algorithm 2, which have similar cost. Assuming that the operator 𝒜 generates masked Fourier measurements, so that the fast Fourier transform can be adopted, the computational complexity of Algorithm 1-1 and Algorithm 2-1 is O(n log(n)T_in + ln) after T_in inner iterations. The complexity of Steps 3 and 4 is O(l³ + l²d). Hence the complexity of Algorithm 1 and Algorithm 2 is O((n log(n)T_in + ln + l³ + l²d)T_out) after T_out outer iterations. In our experiments we set d ≈ n and l ≪ n, and as a result the complexity is about O(n(log(n)T_in + l²)T_out).

5. Numerical experiments

All the tests are performed on a laptop with an Intel i7-5600U 2.6 GHz CPU and 16 GB RAM, and the codes are implemented in MATLAB. The dictionary is initialized by the discrete cosine transform. The model parameters η and τ for Algorithm 1 and Algorithm 2 are selected by hand. In the inner iterative algorithms, e.g. Algorithm 1-1 and Algorithm 2-1, we set the default parameter r = 1 × 10−3 and the default inner iteration number to five. For the remaining parameters of Algorithm 2, one can use dynamic schemes to update the parameters as in [50, 51]; in our experiments, for simplicity, these parameters are fixed. We stop Algorithm 1 and Algorithm 2 after a given maximum outer iteration number T to guarantee convergence, which will be specified in the following subsections. All other needed parameters will also be addressed in the following subsections if we do not give or use the default values.

We set the image patch size to 8 × 8 empirically, such that l = c = 64, and the number of patches is d = (√n − 7)² for square images with n pixels. In this paper we have given a framework for phase retrieval with an arbitrary linear operator 𝒜. However, it is more practical to consider Fourier-type transforms, and we will show the performance on masked Fourier measurements involving two types of patterns: the coded diffraction pattern (CDP) with random masks, and ptychographic phase retrieval with deterministic masks generated by a zone plate lens. In particular, ptychographic phase retrieval is a very promising technique to generate high-resolution images with a large field of view compared with traditional diffraction imaging, and meanwhile it requires less temporal and spatial coherence. However, the related phase retrieval problem is more challenging.

The ground truth images are provided in Fig. 1: four real-valued images of resolution 512 × 512 in Figs. 1(a)–1(d), and a complex-valued image of resolution 256 × 256 in Figs. 1(e)–1(g). Given a ground truth u, the noisy measurements are generated as f(j) = Poisson(|(𝒜u_δ)(j)|²) ∀j ∈ Ω̃, with u_δ = δu at peak level δ (the noise level increases as the peak level δ decreases). We measure the quality of a reconstructed image ũ by the signal-to-noise ratio (SNR), $\mathrm{SNR}(\tilde u, u) = -20\min_{\varsigma\in\{\varsigma\in\mathbb{C}:\,|\varsigma|=1\}}\log\big(\|\varsigma\tilde u - u\|/\|\tilde u\|\big)$, with the ground truth image u, where ς denotes the phase factor accounting for the trivial ambiguities. (The SNR of a noise-free image is +∞; a reconstructed image with larger SNR usually has higher visual quality.) In order to measure the sparsity of the coefficient matrix α ∈ ℂc,d or ℝc,d, we introduce the sparsity level $S(\alpha) = \frac{\|\alpha\|_0^{iso}}{c\times d}\times 100\%$ (a smaller sparsity value means the data is sparser).
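For completeness, the SNR with the trivial phase ambiguity optimized out admits a closed form: the optimal unit factor ς maximizes ℜ(ς⟨ũ, u⟩) and is obtained from the inner product of the two images. A hypothetical NumPy version (assuming a base-10 logarithm, as is standard for dB):

```python
import numpy as np

# SNR up to a global phase: minimize ||sigma * u_rec - u_true|| over
# |sigma| = 1; the minimizer is sigma = c / |c| with c = <u_rec, u_true>.
def snr(u_rec, u_true):
    c = np.vdot(u_rec, u_true)           # sum conj(u_rec) * u_true
    sigma = c / abs(c) if abs(c) > 0 else 1.0
    return -20 * np.log10(np.linalg.norm(sigma * u_rec - u_true)
                          / np.linalg.norm(u_rec))
```

By construction the metric is invariant to a global phase rotation of the reconstruction, and it decreases as the noise level in the reconstruction grows.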


Fig. 1 Ground truth images. Real-valued images of resolution 512 × 512 in (a): Peppers; (b): Fingerprint; (c): Barbara; (d): House; A complex-valued image “Goldballs” of resolution 256 × 256 with magnitude, real and imaginary parts in (e), (f) and (g), respectively.


We compare our proposed methods with the ADMM phase retrieval algorithm without any regularization [9] and the total variation (TV) based Poisson noise removal method [41]. We denote ADMM for phase retrieval [9], the TV based Poisson denoising method [41], and our proposed Algorithm 1 and Algorithm 2 with the isotropic L0 norm by “PR”, “TVPR”, “ALGI” and “ALGII”, respectively. We denote Algorithm 1 with the anisotropic L0 term for complex-valued images by “ALGIaniso”. Since our proposed algorithms solve a nonconvex optimization problem, they are initialized with the results of “PR” in order to increase robustness and convergence speed.

5.1. CDP for real-valued and complex-valued images

For the first pattern, the octanary CDP is explored; specifically, each element of I_j in (14) takes a value randomly among the eight candidates {±√2/2, ±√2i/2, ±√3, ±√3i}. We set the number of masks to K = 2, 4 as in [41] for real-valued and complex-valued images, respectively.

5.1.1. Real-valued Images

We show the performance of our proposed algorithms from noisy measurements on the four real-valued images in Figs. 1(a)–1(d), with peak levels δ ∈ {5.0 × 10−3, 1.0 × 10−2}. We set η = 8.0 × 10−6 and τ = 4.5 × 10−4 in Algorithm 1 and Algorithm 2, and stepsizes c_k = 10, d_k = 50, and e_k = 1.5, 1.2 for δ = 5.0 × 10−3, 1.0 × 10−2, respectively, in Algorithm 2. Both algorithms stop after T = 100 iterations. All four images share the same parameters.

Reconstructed results for different noise levels are shown in Fig. 2 and Fig. 3, and zoomed-in parts of the corresponding results in Fig. 4 and Fig. 5. One readily sees that “PR” generates very noisy results, by observing the first rows of Fig. 2 and Fig. 3 and the second rows of the zoomed-in results in Fig. 4 and Fig. 5, where the edges, background and repetitive structures are severely contaminated. With the TV regularization method and our proposed dictionary learning methods, all the recovered images have sharper edges and cleaner backgrounds. However, observing the zoomed-in results in the third rows of Fig. 4 and Fig. 5, “TVPR” generates images with visible staircase artifacts and loses some important texture information, while “ALGI” and “ALGII” both produce high-quality images without any visible staircase artifacts in the zoomed-in parts. By observing Figs. 4(m), 4(q) or Figs. 5(m), 5(q), “ALGI” and “ALGII” generate cleaner backgrounds than “TVPR”. Moreover, by Figs. 4(n)–4(p), 4(r)–4(t) or Figs. 5(n)–5(p), 5(r)–5(t), the textures are preserved quite well compared with the results of “TVPR”. In order to quantify the improvement of our proposed methods, the corresponding SNRs are given in Table 1 and Table 2, where the maximum in each row is marked in bold. As inferred from these two tables, the SNRs of “TVPR”, “ALGI” and “ALGII” are about double those of “PR”, and the SNRs of “ALGI” and “ALGII” increase by about 1.5 dB and 2 dB on average, respectively. The improvement of the proposed methods over “PR” and “TVPR” is more obvious on images with more texture information, e.g. “Fingerprint” or “Barbara”, than on those with piecewise smooth features, e.g. “Peppers”, which demonstrates their advantage in handling images with sophisticated texture features.

Fig. 2 CDP with δ = 5.0 × 10−3. First row: PR; Second row: TVPR; Third row: ALGI; Fourth row: ALGII.

Fig. 3 CDP with δ = 1.0 × 10−2. First row: PR; Second row: TVPR; Third row: ALGI; Fourth row: ALGII.

Fig. 4 Zoom-in images of Fig. 2 for CDP with δ = 5.0 × 10−3. First row: Ground truth; Second row: PR; Third row: TVPR; Fourth row: ALGI; Fifth row: ALGII.

Fig. 5 Zoom-in images of Fig. 3 for CDP with δ = 1.0 × 10−2. First row: Ground truth; Second row: PR; Third row: TVPR; Fourth row: ALGI; Fifth row: ALGII.

Table 1. SNRs for CDP with δ = 5.0 × 10−3.

Table 2. SNRs for CDP with δ = 1.0 × 10−2.

By observing the zoom-in parts, the results of “ALGI” in the fourth rows of Figs. 4 and 5 appear smoother than those of “ALGII” in the fifth rows. On the other hand, the results of “ALGII” contain a bit more fine features than those of “ALGI”. To investigate this difference in performance, the sparsity of the coefficient matrix α is reported in Table 3; one can readily see that the coefficient matrix is sparser with “ALGI” than with “ALGII”, which possibly explains the above behaviors. In any case, both algorithms produce comparable results with a cleaner background and well-preserved textures. Hereafter, we only report the results of “ALGI”, since it has fewer parameters and seems more suitable for practical use.
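The sparsity measure S(α) tabulated below can be computed as in this sketch, under the assumption that S(α) denotes the fraction of nonzero entries of the coefficient matrix (so a smaller value means a sparser representation); the paper may report it as a percentage instead.

```python
import numpy as np

def sparsity(alpha, tol=0.0):
    """Fraction of entries of the (possibly complex) coefficient matrix alpha
    whose magnitude exceeds tol; smaller means sparser."""
    return np.count_nonzero(np.abs(alpha) > tol) / alpha.size
```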

Table 3. Sparsity S(α) for CDP on real-valued images.

We also report the runtimes of all compared algorithms in Table 4, where one can readily see that the proposed dictionary learning algorithms are much slower than the other two, since inner loops are needed for the u-subproblem in Algorithm 1-1 and Algorithm 2-1. In the future, we will explore faster algorithms without inner loops.

Table 4. Runtimes (in seconds) for CDP with δ = 5.0 × 10−3.

5.1.2. Complex-valued image

More experiments are performed on the complex-valued image “Goldballs” in Fig. 1 for CDP with peak levels δ ∈ {0.08, 0.1}; the reconstructed results are shown in Fig. 6. Notice that for complex-valued images we introduced two definitions of the L0 pseudo norm in (7), and therefore we have two versions of the proposed algorithm, i.e. “ALGI” and “ALGIaniso”. We set η = 3.0 × 10−5, τ = 1.0 × 10−3 for “ALGI” and η = 2.5 × 10−5, τ = 8.0 × 10−4 for “ALGIaniso”, and the outer-loop iteration number T = 50. As Fig. 6 shows, “PR” can only derive noisy results from the noisy measurements. With “TVPR”, “ALGI” and “ALGIaniso”, the recovered results are almost noise free and have a very clean background. However, “TVPR” can only recover the large scale structures at the top left corner and cannot keep the smaller repetitive structures. Obviously the proposed methods produce better results, where both the large and small structures are preserved well. The corresponding SNRs are reported in Table 5: gains of about 1 dB and 2 dB over “TVPR” are obtained by “ALGI” and “ALGIaniso”, respectively.

Fig. 6 CDP for complex-valued image with δ ∈ {0.08, 0.1}. First row: δ = 0.08; Second row: δ = 0.1. From left to right: reconstructed images by PR in (a) and (e), TVPR in (b) and (f), ALGI in (c) and (g) and ALGIaniso in (d) and (h).

Table 5. SNRs of CDP for complex-valued image.

One also notices that “ALGIaniso” produces recovery results with higher accuracy than “ALGI”. To investigate the improvement brought by the anisotropic norm, we show the learned dictionaries in Fig. 7 and the sparsity of the coefficient matrix α in Table 6. In particular, the imaginary parts of the dictionaries learned by “ALGIaniso” in Figs. 7(d) and 7(h) seem to contain more features than those learned by “ALGI” in Figs. 7(b) and 7(f). Meanwhile, the real parts of the dictionaries learned by the anisotropic version are quite close to the imaginary parts, which is consistent with the structural similarity of the real and imaginary parts of the ground truth in Figs. 1(f) and 1(g). Therefore, the anisotropic form in (7) produces better dictionaries. Moreover, thanks to these better dictionaries, the representation by “ALGIaniso” is much sparser: the overall sparsity level S(ℜ(α)) + S(ℑ(α)) is much lower than that of “ALGI”, since S(ℑ(α)) is much smaller for “ALGIaniso” while the sparsity levels of the real parts are almost the same in Table 6. This demonstrates that the anisotropic version can help improve the image quality for complex-valued images by learning a better dictionary and a sparser coefficient matrix.
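The practical difference between the two L0 definitions in (7) shows up in the corresponding hard-thresholding updates: the isotropic rule keeps or kills a complex coefficient as a whole, while the anisotropic rule thresholds real and imaginary parts independently, which is what lets “ALGIaniso” zero out a small imaginary part while keeping the real part. A minimal sketch (`lam` stands for the threshold value, e.g. √(2τ/ek)):

```python
import numpy as np

def hard_thresh(x, lam):
    """Isotropic hard thresholding: keep x(s,t) iff |x(s,t)| >= lam."""
    return np.where(np.abs(x) >= lam, x, 0)

def hard_thresh_aniso(x, lam):
    """Anisotropic variant: threshold real and imaginary parts separately."""
    return hard_thresh(x.real, lam) + 1j * hard_thresh(x.imag, lam)
```

For example, with `lam = 0.5` the entry `0.9 + 0.1j` survives the isotropic rule unchanged (its modulus exceeds 0.5), while the anisotropic rule returns `0.9 + 0j`, discarding the small imaginary part.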

Fig. 7 Learned dictionaries for CDP corresponding to the results in Fig. 6. First row: δ = 0.08; Second row: δ = 0.1. From left to right: Real and imaginary parts of learned dictionaries by ALGI in the first two columns, and by ALGIaniso in the third and fourth columns.

Table 6. Sparsity for CDP on complex-valued images with different L0 pseudo norms.

5.2. Ptychographic phase retrieval (PtychoPR) for complex-valued image

5.2.1. Fixed sliding distance

For PtychoPR, a zone plate lens and the corresponding illumination mask are employed as in [13]. The sliding distance is set to SlidDist = 16; the number of frames is 16 × 16 and the frame size is 64 × 64, so that in this setting the total number of measurements is m = 16n. The peak levels are δ ∈ {0.2, 0.5}. Only two inner iterations are sufficient here. We set the outer iteration number T = 50, with τ = 5.0 × 10−3, η = 3.0 × 10−2 for δ = 0.2 and τ = 8.0 × 10−3, η = 3.0 × 10−2 for δ = 0.5. The reconstructed results are shown in Fig. 8, and the corresponding SNRs in Table 7. As Figs. 8(a) and 8(e) show, the results of “PR” from noisy measurements appear very blurry, since the phaseless data are generated by structured deterministic illumination and the low frequency parts are corrupted more severely than the high frequency parts. At the higher noise level δ = 0.2, in the first row of Fig. 8, “TVPR” produces results with sharp edges for large scale features and a clean background, but cannot preserve the smaller features at all. For the peak level δ = 0.5, “TVPR” produces fairly good recovery results for both smaller and larger scale features. The proposed “ALGI” and “ALGIaniso” recover the smaller features very well, especially at peak level δ = 0.2. As Table 7 shows, compared with “TVPR” the SNRs increase by about 3.5 dB and 4 dB for “ALGI” and “ALGIaniso”, respectively, at peak level δ = 0.2, and by about 1.5 dB and 1.7 dB at peak level δ = 0.5. Such gains in SNR show that the proposed algorithms produce more accurate results. As in the previous subsection for the complex-valued image, the anisotropic version performs better than the isotropic one.
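The measurement count m = 16n can be checked with a little arithmetic, assuming the 16 × 16 frame positions form a regular grid with spacing SlidDist over the 256 × 256 image (the grid layout is an assumption; the paper does not spell it out here):

```python
image_size = 256          # "Goldballs" is 256 x 256, so n = 256^2
frame_size = 64
slid_dist = 16

frames_per_dim = image_size // slid_dist   # 256/16 = 16 positions per dimension
num_frames = frames_per_dim ** 2           # 16 x 16 = 256 frames
m = num_frames * frame_size ** 2           # total number of phaseless measurements
n = image_size ** 2
print(m // n)                              # -> 16, i.e. m = 16 n
```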

Fig. 8 PtychoPR for complex-valued image with δ ∈ {0.2, 0.5}. First row: δ = 0.2; Second row: δ = 0.5. From left to right: reconstructed images by PR in (a) and (e), TVPR in (b) and (f), ALGI in (c) and (g) and ALGIaniso in (d) and (h).

Table 7. SNRs of PtychoPR for complex-valued image corresponding to Fig. 8.

5.2.2. Variable sliding distances

To further study the robustness of the proposed algorithms, more experiments with different sliding distances (collecting less data by increasing the sliding distance) are conducted. The peak level is δ = 0.2, and the same parameters as in the previous tests are used. Performances of PtychoPR with different numbers of measurements, obtained by increasing SlidDist, are shown in Fig. 9, where m/n = 12.25, 9, and 7.56 for the sliding distances SlidDist = 18, 20, 22, respectively. One can see in the first column of Fig. 9 that the results of “PR” are not only blurry but also contain visible structured artifacts. “TVPR” can remove such artifacts and recover some edges of the large scale features. The proposed algorithms can further recover most of the smaller features in Figs. 9(c) and 9(d). In the extreme case SlidDist = 22, shown in the third row of Fig. 9, the proposed algorithms can still recover some of the smaller features. The corresponding SNRs are reported in Table 8, where one sees obvious gains of about 1.3 dB and 1.6 dB for “ALGI” and “ALGIaniso”, respectively, compared with “TVPR”.
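The stated ratios m/n are consistent with taking ⌊256/SlidDist⌋ frame positions per dimension; a sketch under that assumption (the helper `measurement_ratio` is hypothetical, not from the paper):

```python
def measurement_ratio(image_size=256, frame_size=64, slid_dist=16):
    """Ratio m/n of phaseless measurements to unknowns for a regular scan grid."""
    frames_per_dim = image_size // slid_dist
    m = (frames_per_dim ** 2) * frame_size ** 2
    return m / image_size ** 2

for d in (16, 18, 20, 22):
    print(d, measurement_ratio(slid_dist=d))   # ratios 16.0, 12.25, 9.0, 7.5625
```

The last value, 7.5625, matches the rounded 7.56 quoted in the text.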

Fig. 9 PtychoPR for complex-valued image with different sliding distances (peak level δ = 0.2). From first row to the last row: SlidDist = 18, 20, 22 respectively. From left to right: reconstructed images by PR in (a), (e) and (i), TVPR in (b), (f) and (j), ALGI in (c), (g) and (k), and ALGIaniso in (d), (h) and (l).

Table 8. SNRs of PtychoPR with different sliding distances for complex-valued image corresponding to Fig. 9.

5.3. Convergence study

To check the convergence of the proposed algorithms, we monitor the histories of the SNRs and of the successive errors of the iterative solutions $u^k$ and $D^k$ w.r.t. the iteration number $k$, defined as $\|u^k - u^{k-1}\|/\|u^k\|$ and $\|D^k - D^{k-1}\|/\|D^k\|$. The histories of the errors and SNRs for CDP on “Peppers” are shown in Fig. 10, which implies that the proposed algorithms are stable and convergent, consistent with the provided theory. The dictionary seems to converge faster with “ALGII” than with “ALGI”. As Figs. 10(c) and 10(f) show, the SNRs first increase and then stabilize, which demonstrates that the proposed algorithms are quite robust. Moreover, when the noise level increases, more iterations are needed. It would be interesting and important to derive an optimal stopping condition, and we leave it as future work.
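The monitored quantity is straightforward to compute per iteration; a minimal sketch:

```python
import numpy as np

def successive_error(x_new, x_old):
    """Relative successive error ||x^k - x^{k-1}|| / ||x^k||."""
    return np.linalg.norm(x_new - x_old) / np.linalg.norm(x_new)
```

Recording this value for both `u` and `D` at every outer iteration yields the curves plotted in Fig. 10.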

Fig. 10 Convergence histories for successive errors and SNRs on “Peppers”. First row: δ = 5.0 × 10−3; Second row: δ = 1.0 × 10−2. From left to right: histories of the successive errors of the dictionaries, of the iterative solution u, and of the SNRs.

5.4. Performances w.r.t. parameters

Finally we provide some experiments to test the impact of the parameters, especially τ and η, which balance the sparsity term and the data fitting term. We show the impact for “ALGI” on “Peppers” with δ = 5.0 × 10−3. Specifically, we select $(\eta, \tau) \in \{\eta_0 \times 2^{-x}, \eta_0 \times 2^{-x+1}, \cdots, \eta_0 \times 2^{x-1}, \eta_0 \times 2^{x}\} \times \{\tau_0 \times 2^{-y}, \tau_0 \times 2^{-y+1}, \cdots, \tau_0 \times 2^{y-1}, \tau_0 \times 2^{y}\}$, with x = y = 4, η0 = 8.0 × 10−6, τ0 = 4.5 × 10−4, and plot the SNRs of the corresponding reconstructed images in Fig. 11. One can see that parameters that are too large or too small decrease the quality of the reconstructed results. In the future, an automatic scheme for optimal parameter selection should be investigated.
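The 9 × 9 parameter grid used for this sweep (x = y = 4, so exponents run from −4 to 4) can be generated as follows; a sketch:

```python
x, y = 4, 4
eta0, tau0 = 8.0e-6, 4.5e-4

etas = [eta0 * 2 ** k for k in range(-x, x + 1)]   # eta0*2^-4, ..., eta0*2^4
taus = [tau0 * 2 ** k for k in range(-y, y + 1)]   # tau0*2^-4, ..., tau0*2^4
grid = [(eta, tau) for eta in etas for tau in taus]
print(len(etas), len(taus), len(grid))             # 9 9 81
```

Each of the 81 pairs is run through “ALGI” and the resulting SNR is plotted as one point of the surface in Fig. 11.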

Fig. 11 SNRs for ALGI w.r.t. different parameters τ and η.

6. Conclusion

In this paper, we have proposed a novel orthogonal dictionary learning based phase retrieval model to denoise phaseless measurements contaminated by Poisson noise, and designed two efficient algorithms for it. The conducted experiments demonstrate that both algorithms produce higher quality results than the compared methods and, in particular, preserve texture features better than the TV regularization method. An incoherent dictionary based method and further speedup of the proposed algorithms will be investigated in the future.

APPENDIX

Lemma 1. Assume that there exist two positive constants $e_-$, $e_+$ such that

$\frac{1}{2} < e_- \le e_k \le e_+ \quad \forall k.$

Then the sequence $\{u^k, D^k, \alpha^k\}$ is bounded.

Proof: Since the dictionary $D^k$ is orthogonal, its boundedness follows readily. Therefore we can set $|(D^{k+1})^* R(u^{k+1})| \le C_0$, with a positive matrix $C_0$ independent of $k$, where “≤” denotes the pointwise relation between matrices. By (33), we have

$|\alpha^{k+1}| \le \big|1-\tfrac{1}{e_k}\big|\,|\alpha^k| + \tfrac{1}{e_k}\big|(D^{k+1})^* R(u^{k+1})\big| \le \big|1-\tfrac{1}{e_+}\big|\,|\alpha^k| + \tfrac{1}{e_-}C_0 \le \big|1-\tfrac{1}{e_+}\big|^2 |\alpha^{k-1}| + \big(\big|1-\tfrac{1}{e_+}\big| + 1\big)\tfrac{1}{e_-}C_0 \le \cdots \le \big|1-\tfrac{1}{e_+}\big|^{k+1} |\alpha^0| + \big(\big|1-\tfrac{1}{e_+}\big|^k + \big|1-\tfrac{1}{e_+}\big|^{k-1} + \cdots + 1\big)\tfrac{1}{e_-}C_0 = \big|1-\tfrac{1}{e_+}\big|^{k+1} |\alpha^0| + \dfrac{1-\big|1-\tfrac{1}{e_+}\big|^{k+1}}{1-\big|1-\tfrac{1}{e_+}\big|}\cdot\dfrac{C_0}{e_-}.$
By (35), $\big|1-\tfrac{1}{e_+}\big| < 1$, and one finally obtains the boundedness of $\{\alpha^k\}$.
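The geometric-series argument above can be checked numerically with a toy scalar recursion; a sketch with arbitrary values $e > 1/2$, $C_0$, $|\alpha^0|$ (not from the paper), showing that the iterates approach the limit value $\frac{C_0/e}{1 - |1 - 1/e|}$ as the initial term decays:

```python
e, C0, a = 0.8, 1.0, 100.0     # any e > 1/2 gives |1 - 1/e| < 1
q = abs(1.0 - 1.0 / e)         # contraction factor, here 0.25
for _ in range(200):
    a = q * a + C0 / e         # worst-case recursion from the proof
limit = (C0 / e) / (1.0 - q)   # geometric-series limit
print(round(a, 6), round(limit, 6))
```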

Denote $Z = (u, D, \alpha)$ and $Z^k = (u^k, D^k, \alpha^k)$, with the norm $\|Z\|^2 = \|u\|^2 + \|D\|^2 + \|\alpha\|^2$. The nonincrease of the objective functional can be derived as follows.

Lemma 2. $\Upsilon(Z^{k+1}) \le \Upsilon(Z^k) - \frac{\lambda_+}{2}\|Z^{k+1} - Z^k\|^2.$

Proof: By Lemma 4.2 and the sufficient decrease property of the proximal operator in Lemma 2 of [50], this follows readily for the three iterative steps (26)–(28).

One can also readily bound the subgradient of ϒ as follows.

Lemma 3. Denote the three quantities for the subgradient

$A_u^k = c_{k-1}(u^{k-1} - u^k) + \nabla_u \mathcal{F}(u^k, D^k, \alpha^k) - \nabla_u \mathcal{F}(u^{k-1}, D^{k-1}, \alpha^{k-1}),$
$A_D^k = d_{k-1}(D^{k-1} - D^k) + \nabla_D \mathcal{F}(u^k, D^k, \alpha^k) - \nabla_D \mathcal{F}(u^k, D^{k-1}, \alpha^{k-1}),$
$A_\alpha^k = e_{k-1}(\alpha^{k-1} - \alpha^k) + \nabla_\alpha \mathcal{F}(u^k, D^k, \alpha^k) - \nabla_\alpha \mathcal{F}(u^k, D^k, \alpha^{k-1}).$

For $A^k = (A_u^k, A_D^k, A_\alpha^k)$, we have $A^k \in \partial\Upsilon(u^k, D^k, \alpha^k)$, and $\|A^k\| \le (3\tilde\lambda + M)\|Z^k - Z^{k-1}\|$, with $\tilde\lambda := \max_k \max\{c_{k-1}, d_{k-1}, e_{k-1}, \lambda_D^+, \lambda_D^-, \lambda_\alpha^+\}$.

Proof: The first relation follows readily from the first order optimality conditions of (26)–(28). We only estimate the bound of $A_D^k$ as an example; the other two are similar.

$\|A_D^k\| \le d_{k-1}\|D^{k-1} - D^k\| + \|\nabla_D \mathcal{F}(u^k, D^k, \alpha^k) - \nabla_D \mathcal{F}(u^k, D^{k-1}, \alpha^{k-1})\| \le d_{k-1}\|D^{k-1} - D^k\| + \|\nabla_D \mathcal{F}(u^k, D^k, \alpha^k) - \nabla_D \mathcal{F}(u^k, D^{k-1}, \alpha^k)\| + \|\nabla_D \mathcal{F}(u^k, D^{k-1}, \alpha^k) - \nabla_D \mathcal{F}(u^k, D^{k-1}, \alpha^{k-1})\| \le d_{k-1}\|D^{k-1} - D^k\| + \lambda_D^+\|D^k - D^{k-1}\| + \lambda_D^-\|\alpha^k - \alpha^{k-1}\| = (d_{k-1} + \lambda_D^+)\|D^{k-1} - D^k\| + \lambda_D^-\|\alpha^k - \alpha^{k-1}\|.$
For $A_u^k$ and $A_\alpha^k$, we have $\|A_u^k\| \le c_{k-1}\|u^{k-1} - u^k\| + M\|Z^k - Z^{k-1}\|$ and $\|A_\alpha^k\| \le (e_{k-1} + \lambda_\alpha^+)\|\alpha^{k-1} - \alpha^k\|$. Summing up the three estimates concludes the lemma.

Funding

National Natural Science Foundation of China (Nos.11501413, 11426165); Natural Science Foundation of Tianjin (No. 18JCYBJC16600); The China Scholarship Council; 2017-Outstanding Young Innovation Team Cultivation Program (No. 043-135202TD1703) and Innovation Project (No. 043-135202XC1605) of Tianjin Normal University; Tianjin Young Backbone of Innovative Personnel Training Program; Program for Innovative Research Team in Universities of Tianjin (No. TD13-5078); Center for Applied Mathematics for Energy Research Applications, a joint ASCR-BES funded project within the Office of Science, US Department of Energy, under contract DOE-DE-AC03-76SF00098.

Acknowledgments

We thank the reviewers for their valuable comments, which helped improve our paper greatly.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

References and links

1. A. Ben-Tal and A. Nemirovski, Lectures on modern convex optimization: analysis, algorithms, and engineering applications (Society for Industrial and Applied Mathematics, 2001), vol. 2. [CrossRef]  

2. R. W. Gerchberg and W. O. Saxton, “A practical algorithm for the determination of the phase from image and diffraction plane pictures,” Optik 35(2), 237–246 (1972).

3. J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. 21(15), 2758–2769 (1982). [CrossRef]   [PubMed]  

4. H. H. Bauschke, P. L. Combettes, and D. R. Luke, “Phase retrieval, error reduction algorithm, and fienup variants: a view from convex optimization,” J. Opt. Soc. Amer. A 19(7), 1334–1345 (2002). [CrossRef]  

5. H. H. Bauschke, P. L. Combettes, and D. R. Luke, “Hybrid projection-reflection method for phase retrieval,” J. Opt. Soc. Amer. A 20(6), 1025–1034 (2003). [CrossRef]  

6. V. Elser, “Phase retrieval by iterated projections,” J. Opt. Soc. Am. A 20(1), 40–55 (2003). [CrossRef]  

7. D. R. Luke, “Relaxed averaged alternating reflections for diffraction imaging,” Inverse Probl. 21(1), 37–50 (2005). [CrossRef]  

8. S. Marchesini, “Invited article: A unified evaluation of iterative projection algorithms for phase retrieval,” Rev. Sci. Instrum. 78(1), 011301 (2007). [CrossRef]   [PubMed]  

9. Z. Wen, C. Yang, X. Liu, and S. Marchesini, “Alternating direction methods for classical and ptychographic phase retrieval,” Inverse Probl. 28(11), 115010 (2012). [CrossRef]  

10. E. J. Candes, X. Li, and M. Soltanolkotabi, “Phase retrieval via wirtinger flow: Theory and algorithms,” IEEE Trans. Inf. Theory 61(4), 1985–2007 (2015). [CrossRef]  

11. Y. Chen and E. Candes, “Solving random quadratic systems of equations is nearly as easy as solving linear systems,” in Adv. Neural. Inf. Process. Syst., 739–747 (2015).

12. P. Netrapalli, P. Jain, and S. Sanghavi, “Phase retrieval using alternating minimization,” IEEE Trans. Signal Process. 63(18), 4814–4826 (2015). [CrossRef]  

13. S. Marchesini, Y.-C. Tu, and H.-T. Wu, “Alternating projection, ptychographic imaging and phase synchronization,” Appl. Comput. Harmon. Anal. 41(3), 815–851 (2016). [CrossRef]  

14. C. Ma, X. Liu, and Z. Wen, “Globally convergent Levenberg-Marquardt method for phase retrieval,” preprint.

15. D. Noll and A. Rondepierre, “On local convergence of the method of alternating projections,” Found. Comut. Math. 16(2), 425–455 (2016). [CrossRef]  

16. P. Chen and A. Fannjiang, “Fourier phase retrieval with a single mask by Douglas-Rachford algorithm,” Appl. Comput. Harmon. Anal. 44(3), 665–699 (2018). [CrossRef]   [PubMed]  

17. J. M. Rodenburg and H. M. Faulkner, “A phase retrieval algorithm for shifting illumination,” Appl. Phys. Lett. 85(20), 4795–4797 (2004). [CrossRef]  

18. E. J. Candés, Y. C. Eldar, T. Strohmer, and V. Voroninski, “Phase retrieval via matrix completion,” SIAM J. Imaging Sci. 6(1), 199–225 (2013). [CrossRef]  

19. I. Waldspurger, A. d'Aspremont, and S. Mallat, “Phase recovery, maxcut and complex semidefinite programming,” Math. Program., Ser. A 149(1), 47–81 (2015). [CrossRef]  

20. T. Goldstein and C. Studer, “Phasemax: Convex phase retrieval via basis pursuit,” IEEE Trans. Inf. Theory 64(4), 2675–2689 (2018). [CrossRef]  

21. S. Bahmani and J. Romberg, “Phase retrieval meets statistical learning theory: A flexible convex relaxation,” PMLR 54, 252–260 (2017).

22. S. Marchesini, H. He, H. N. Chapman, S. P. Hau-Riege, A. Noy, M. R. Howells, U. Weierstall, and J. C. Spence, “X-ray image reconstruction from a diffraction pattern alone,” Phys. Rev. B 68(14), 140101 (2003). [CrossRef]  

23. H. Chapman, A. Barty, M. Bogan, S. Boutet, M. Frank, S. Hau-Riege, S. Marchesini, B. Woods, S. Bajt, W. Benner, R. London, E. Plonjes, M. Kuhlmann, R. Treusch, S. Dusterer, T. Tschentscher, J. Schneider, E. Spiller, T. Moller, C. Bostedt, M. Hoener, D. Shapiro, K. Hodgson, D. Spoel, F. Burmeister, M. Bergh, C. Caleman, G. Huldt, M. Seibert, F. Maia, R. Lee, A. Szoke, N. Timneanu, and J. Hajdu, “Femtosecond diffractive imaging with a soft-x-ray free-electron laser,” Nat. Phys. 2(12), 839–843 (2006). [CrossRef]  

24. M. L. Moravec, J. K. Romberg, and R. G. Baraniuk, “Compressive phase retrieval,” in Optical Engineering Applications, International Society for Optics and Photonics, 120–670 (2007).

25. Z. Yang, C. Zhang, and L. Xie, “Robust compressive phase retrieval via l1 minimization with application to image reconstruction,” arXiv preprint arXiv:1302.0081 (2013).

26. H. Ohlsson, A. Yang, R. Dong, and S. Sastry, “Cprl–an extension of compressive sensing to the phase retrieval problem,” in Adv. Neural. Inf. Process. Syst., 1367–1375 (2012).

27. X. Li and V. Voroninski, “Sparse signal recovery from quadratic measurements via convex programming,” SIAM J. Math. Anal. 45(5), 3019–3033 (2013). [CrossRef]  

28. S. Mukherjee and C. S. Seelamantula, “Fienup algorithm with sparsity constraints: application to frequency-domain optical-coherence tomography,” IEEE Trans. Signal Process. 62(18), 4659–4672 (2014). [CrossRef]  

29. Y. Shechtman, A. Beck, and Y. C. Eldar, “Gespar: Efficient phase retrieval of sparse signals,” IEEE Trans. Signal Process. 62(4), 928–938 (2014). [CrossRef]  

30. M. Iwen, A. Viswanathan, and Y. Wang, “Robust sparse phase retrieval made easy,” Appl. Comput. Harmon. Anal. 42(1), 135–142 (2017). [CrossRef]  

31. P. Schniter and S. Rangan, “Compressive phase retrieval via generalized approximate message passing,” IEEE Trans. Signal Process. 63(4),1043–1055 (2015). [CrossRef]  

32. G. Wang, L. Zhang, G. B. Giannakis, M. Akcakaya, and J. Chen, “Sparse phase retrieval via truncated amplitude flow,” IEEE Trans. Signal Process. 66(2), 479–491 (2018). [CrossRef]  

33. Y. Duan, C. Wu, Z.-F. Pang, and H. Chang, “L0-regularized variational methods for sparse phase retrieval,” arXiv preprint arXiv:1612.02538 (2016).

34. S. Loock and G. Plonka, “Phase retrieval for fresnel measurements using a shearlet sparsity constraint,” Inverse Probl. 30(5), 055005 (2014). [CrossRef]  

35. H. Chang, Y. Lou, M. K. Ng, and T. Zeng, “Phase retrieval from incomplete magnitude information via total variation regularization,” SIAM J. Sci. Comput. 38(6), A3672–A3695 (2016). [CrossRef]  

36. A. M. Tillmann, Y. C. Eldar, and J. Mairal, “Dolphin-dictionary learning for phase retrieval,” IEEE Trans. Signal Process. 64(24), 6485–6500 (2016). [CrossRef]  

37. T. Qiu and D. P. Palomar, “Undersampled Sparse Phase Retrieval via Majorization-Minimization,” IEEE Trans. Signal Process. 65(22), 5957–5969 (2017). [CrossRef]  

38. F. J. Anscombe, “The transformation of poisson, binomial and negative-binomial data,” Biometrika 35(3/4), 246–254 (1948). [CrossRef]  

39. M. Mäkitalo and A. Foi, “Optimal inversion of the anscombe transformation in low-count poisson image denoising,” IEEE Trans. Image Process. 20(1), 99–109 (2011). [CrossRef]  

40. T. Le, R. Chartrand, and T. Asaki, “A variational approach to reconstructing images corrupted by Poisson noise,” J. Math. Imaging Vis. 27(3), 257–263 (2007). [CrossRef]  

41. H. Chang, Y. Lou, Y. Duan, and S. Marchesini, “Total variation based phase retrieval for poisson noise removal,” SIAM J. Imag. Sci. 11(1), 24–55 (2018). [CrossRef]  

42. P. Thibault and M. Guizar-Sicairos, “Maximum-likelihood refinement for coherent diffractive imaging,” New J. Phys. 14(6), 063004 (2012). [CrossRef]  

43. M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Trans. Image Process. 15(12), 3736–3745 (2006). [CrossRef]   [PubMed]  

44. L. Ma, L. Moisan, J. Yu, and T. Zeng, “A dictionary learning approach for poisson image deblurring,” IEEE Trans. Med. Imaging 32(7), 1277–1289 (2013). [CrossRef]   [PubMed]  

45. R. Giryes and M. Elad, “Sparsity-based poisson denoising with dictionary learning,” IEEE Trans. Image Process. 23(12), 5057–5069 (2014). [CrossRef]   [PubMed]  

46. I. Tosic and P. Frossard, “Dictionary learning,” IEEE Signal Process. Mag. 28(2), 27–38 (2011). [CrossRef]  

47. M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process. 54(11), 4311–4322 (2006). [CrossRef]  

48. C. Bao, J.-F. Cai, and H. Ji, “Fast sparsity-based orthogonal dictionary learning for image restoration,” in ICCV, 3384–3391 (2013).

49. J. Qian, C. Yang, A. Schirotzek, F. Maia, and S. Marchesini, “Efficient algorithms for ptychographic phase retrieval,” Inverse Problems and Applications, Contemp. Math 615, 261–280 (2014).

50. J. Bolte, S. Sabach, and M. Teboulle, “Proximal alternating linearized minimization for nonconvex and nonsmooth problems,” Math. Program. 146(1–2), 459–494 (2014). [CrossRef]  

51. C. Bao, H. Ji, Y. Quan, and Z. Shen, “Dictionary learning for sparse coding: Algorithms and convergence analysis,” IEEE Trans. Pattern Anal. Mach. Intell. 38(7),1356–1369 (2016). [CrossRef]  

52. M. Powell, “On search directions for minimization algorithms,” Math. Program. 4(1), 193–201 (1973). [CrossRef]  

Fig. 1 Ground truth images. Real-valued images of resolution 512 × 512 in (a): Peppers; (b): Fingerprint; (c): Barbara; (d): House; a complex-valued image “Goldballs” of resolution 256 × 256 with magnitude, real and imaginary parts in (e), (f) and (g), respectively.

Algorithm 1: AMM for “DicPR” in (5)

Algorithm 1-1: ADMM for u-subproblem in (8)

Algorithm 2-1: ADMM for u-subproblem in (26)

Algorithm 2: PALM for “DicPR” in (5)

Equations (38)

(1) $\text{To find } u \in \mathbb{C}^{n}, \ \text{s.t. } |\mathcal{A}u|^{2} = b,$

(2) $\min_{D,\alpha}\ \tfrac{1}{2}\|Y - D\alpha\|^{2} + \tau\|\alpha\|_{0}, \quad \text{s.t. } \|D(j)\| = 1\ \forall\, 0 \le j \le c,$

(3) $\min_{u,D,\alpha}\ \tfrac{1}{2}\|R(u) - D\alpha\|^{2} + \tau\|\alpha\|_{0} + \tfrac{\eta}{2}\|u - f\|^{2}, \quad \text{s.t. } \|D(j)\| = 1\ \forall\, 0 \le j \le c,$

(4) $\mathcal{B}(h, f) := \tfrac{1}{2}\sum_{j \in \tilde{\Omega}}\big(h(j) - f(j)\log h(j)\big),$

(5) $\min_{u,D,\alpha}\ \tfrac{1}{2}\|D\alpha - R(u)\|^{2} + \tau\|\alpha\|_{0} + \eta\,\mathcal{B}(|\mathcal{A}u|^{2}, f), \quad \text{s.t. } D^{*}D = I,$

(6) $\min_{u,D,\alpha}\ \Upsilon(u, D, \alpha) := \mathcal{F}(u, D, \alpha) + \mathcal{G}(u) + K(D) + \tau\|\alpha\|_{0},$

(7) $\text{Isotropic form: } \|\alpha\|_{0}^{\mathrm{iso}} := \#\{(s,t) : |\alpha(s,t)| \neq 0,\ 0 \le s \le c-1,\ 0 \le t \le d-1\}; \quad \text{Anisotropic form: } \|\alpha\|_{0}^{\mathrm{aniso}} := \|\Re(\alpha)\|_{0}^{\mathrm{iso}} + \|\Im(\alpha)\|_{0}^{\mathrm{iso}},$

(8) $u^{k+1} = \arg\min_{u} \Upsilon(u, D^{k}, \alpha^{k}), \quad D^{k+1} = \arg\min_{D} \Upsilon(u^{k+1}, D, \alpha^{k}), \quad \alpha^{k+1} = \arg\min_{\alpha} \Upsilon(u^{k+1}, D^{k+1}, \alpha),$

(9) $\min_{u}\ \tfrac{1}{2}\|Y - R(u)\|^{2} + \eta\,\mathcal{B}(|z|^{2}, f), \quad \text{s.t. } z = \mathcal{A}u,$

(10) $\min_{u}\ \tfrac{1}{2}\sum_{t=0}^{d-1}\|Y(t) - R_{t}u\|^{2} + \eta\,\mathcal{B}(|z|^{2}, f), \quad \text{s.t. } z = \mathcal{A}u.$

(11) $\mathcal{L}_{r}(u, z; \Lambda) := \tfrac{1}{2}\sum_{t}\|Y(t) - R_{t}u\|^{2} + \eta\,\mathcal{B}(|z|^{2}, f) + \langle z - \mathcal{A}u, \Lambda\rangle + \tfrac{r}{2}\|z - \mathcal{A}u\|^{2},$

(12) $u^{j+1} = \arg\min_{u} \mathcal{L}_{r}(u, z^{j}; \Lambda^{j}), \quad z^{j+1} = \arg\min_{z} \mathcal{L}_{r}(u^{j+1}, z; \Lambda^{j}), \quad \Lambda^{j+1} = \Lambda^{j} + r(z^{j+1} - \mathcal{A}u^{j+1}),$

(13) $\begin{bmatrix} r\,\Re(\mathcal{A}^{*}\mathcal{A}) + W & -r\,\Im(\mathcal{A}^{*}\mathcal{A}) \\ r\,\Im(\mathcal{A}^{*}\mathcal{A}) & r\,\Re(\mathcal{A}^{*}\mathcal{A}) + W \end{bmatrix} \begin{bmatrix} \Re(u^{j+1}) \\ \Im(u^{j+1}) \end{bmatrix} = \begin{bmatrix} r\,\Re(\mathcal{A}^{*}v^{j}) + \sum_{t} R_{t}^{T}\,\Re(Y(t)) \\ r\,\Im(\mathcal{A}^{*}v^{j}) + \sum_{t} R_{t}^{T}\,\Im(Y(t)) \end{bmatrix},$

(14) $(\mathcal{A}u)^{T} = \big((\mathcal{I}_{0}u)^{T}, (\mathcal{I}_{1}u)^{T}, \cdots, (\mathcal{I}_{K-1}u)^{T}\big),$

(15) $\mathcal{A}^{*}\mathcal{A} = \operatorname{diag}\Big(\sum_{j} \mathcal{I}_{j}^{*}\mathcal{I}_{j}\Big) = \operatorname{diag}\Big(\sum_{j} |I_{j}|^{2}\Big),$

(16) $u^{j+1} = \big(r\mathcal{A}^{*}\mathcal{A} + W\big)^{-1}\Big(r\mathcal{A}^{*}v^{j} + \sum_{t} R_{t}^{T} Y(t)\Big),$

(17) $\min_{z}\ \eta\,\mathcal{B}(|z|^{2}, f) + \tfrac{r}{2}\|z - \mathcal{A}u^{j+1} + \Lambda^{j}/r\|^{2}.$

(18) $z^{j+1}(t) = \dfrac{r|w(t)| + \sqrt{r^{2}|w(t)|^{2} + 4\eta(\eta + r)f(t)}}{2(\eta + r)}\,\operatorname{sign}(w(t)),$

(19) $\hat{D} := \arg\min_{D}\ \tfrac{1}{2}\|D\alpha - R(u)\|^{2}, \quad \text{s.t. } D^{*}D = I.$

(20) $\hat{D} = UV^{*},$

(21) $\hat{\alpha} := \arg\min_{\alpha}\ \tfrac{1}{2}\|\alpha - D^{*}R(u)\|^{2} + \tau\|\alpha\|_{0}.$

(22) $\hat{\alpha} = \operatorname{Thresh}_{\sqrt{2\tau}}\big(D^{*}R(u)\big),$

(23) $\operatorname{Thresh}_{\lambda}(\alpha)(s, t) = \begin{cases} \alpha(s, t), & \text{if } |\alpha(s, t)| \ge \lambda, \\ 0, & \text{otherwise}, \end{cases} \quad 0 \le s \le c-1,\ 0 \le t \le d-1.$

(24) $\hat{\alpha} = \operatorname{Thresh}_{\sqrt{2\tau}}\big(\Re(D^{*}R(u))\big) + i\,\operatorname{Thresh}_{\sqrt{2\tau}}\big(\Im(D^{*}R(u))\big).$

(25) $\nabla_{u}\mathcal{F}(u, D, \alpha) = \sum_{t=0}^{d-1} R_{t}^{T}\big(R_{t}u - D\alpha(t)\big), \quad \nabla_{D}\mathcal{F}(u, D, \alpha) = \big(D\alpha - R(u)\big)\alpha^{*}, \quad \nabla_{\alpha}\mathcal{F}(u, D, \alpha) = D^{*}\big(D\alpha - R(u)\big),$

(26) $u^{k+1} = \arg\min_{u}\ \mathcal{G}(u) + \tfrac{c_{k}}{2}\|u - \hat{u}^{k}\|^{2}, \ \text{with } \hat{u}^{k} = u^{k} - \tfrac{1}{c_{k}}\nabla_{u}\mathcal{F}(u^{k}, D^{k}, \alpha^{k}),$

(27) $D^{k+1} = \arg\min_{D}\ K(D) + \tfrac{d_{k}}{2}\|D - \hat{D}^{k}\|^{2}, \ \text{with } \hat{D}^{k} = D^{k} - \tfrac{1}{d_{k}}\nabla_{D}\mathcal{F}(u^{k+1}, D^{k}, \alpha^{k}),$

(28) $\alpha^{k+1} = \arg\min_{\alpha}\ \tau\|\alpha\|_{0} + \tfrac{e_{k}}{2}\|\alpha - \hat{\alpha}^{k}\|^{2}, \ \text{with } \hat{\alpha}^{k} = \alpha^{k} - \tfrac{1}{e_{k}}\nabla_{\alpha}\mathcal{F}(u^{k+1}, D^{k+1}, \alpha^{k}).$

(29) $u^{j+1} = \big(r\mathcal{A}^{*}\mathcal{A} + c_{k}I\big)^{-1}\Big(r\mathcal{A}^{*}v^{j} + \sum_{t} R_{t}^{T} Y(t) + (c_{k}I - W)u^{k}\Big),$

(30) $D^{k+1} = \arg\min_{D}\ \|D - \hat{D}^{k}\|^{2}, \quad \text{s.t. } D^{*}D = I.$

(31) $D^{k+1} = UV^{*},$

(32) $\hat{D}^{k} = D^{k} - \tfrac{1}{d_{k}}\big(D^{k}\alpha^{k} - R(u^{k+1})\big)(\alpha^{k})^{*} = D^{k}\Big(I - \tfrac{1}{d_{k}}\alpha^{k}(\alpha^{k})^{*}\Big) + \tfrac{1}{d_{k}}R(u^{k+1})(\alpha^{k})^{*}.$

(33) $\alpha^{k+1} = \operatorname{Thresh}_{\sqrt{2\tau/e_{k}}}(\hat{\alpha}^{k}),$

(34) $\alpha^{k+1} = \operatorname{Thresh}_{\sqrt{2\tau/e_{k}}}\big(\Re(\hat{\alpha}^{k})\big) + i\,\operatorname{Thresh}_{\sqrt{2\tau/e_{k}}}\big(\Im(\hat{\alpha}^{k})\big),$

(35) $\tfrac{1}{2} < e_{-} \le e_{k} \le e_{+} \quad \forall k,$

(36) $|\alpha^{k+1}| \le \big|1 - \tfrac{1}{e_{k}}\big|\,|\alpha^{k}| + \tfrac{1}{e_{k}}\big|(D^{k+1})^{*}R(u^{k+1})\big| \le \big|1 - \tfrac{1}{e_{+}}\big|\,|\alpha^{k}| + \tfrac{1}{e_{-}}C_{0} \le \big|1 - \tfrac{1}{e_{+}}\big|^{2}|\alpha^{k-1}| + \big(\big|1 - \tfrac{1}{e_{+}}\big| + 1\big)\tfrac{1}{e_{-}}C_{0} \le \cdots \le \big|1 - \tfrac{1}{e_{+}}\big|^{k+1}|\alpha^{0}| + \big(\big|1 - \tfrac{1}{e_{+}}\big|^{k} + \cdots + 1\big)\tfrac{1}{e_{-}}C_{0} = \big|1 - \tfrac{1}{e_{+}}\big|^{k+1}|\alpha^{0}| + \dfrac{1 - \big|1 - \tfrac{1}{e_{+}}\big|^{k+1}}{1 - \big|1 - \tfrac{1}{e_{+}}\big|}\cdot\dfrac{C_{0}}{e_{-}}.$

(37) $A_{u}^{k} = c_{k-1}(u^{k-1} - u^{k}) + \nabla_{u}\mathcal{F}(u^{k}, D^{k}, \alpha^{k}) - \nabla_{u}\mathcal{F}(u^{k-1}, D^{k-1}, \alpha^{k-1}), \quad A_{D}^{k} = d_{k-1}(D^{k-1} - D^{k}) + \nabla_{D}\mathcal{F}(u^{k}, D^{k}, \alpha^{k}) - \nabla_{D}\mathcal{F}(u^{k}, D^{k-1}, \alpha^{k-1}), \quad A_{\alpha}^{k} = e_{k-1}(\alpha^{k-1} - \alpha^{k}) + \nabla_{\alpha}\mathcal{F}(u^{k}, D^{k}, \alpha^{k}) - \nabla_{\alpha}\mathcal{F}(u^{k}, D^{k}, \alpha^{k-1}).$

(38) $\|A_{D}^{k}\| \le d_{k-1}\|D^{k-1} - D^{k}\| + \|\nabla_{D}\mathcal{F}(u^{k}, D^{k}, \alpha^{k}) - \nabla_{D}\mathcal{F}(u^{k}, D^{k-1}, \alpha^{k-1})\| \le d_{k-1}\|D^{k-1} - D^{k}\| + \|\nabla_{D}\mathcal{F}(u^{k}, D^{k}, \alpha^{k}) - \nabla_{D}\mathcal{F}(u^{k}, D^{k-1}, \alpha^{k})\| + \|\nabla_{D}\mathcal{F}(u^{k}, D^{k-1}, \alpha^{k}) - \nabla_{D}\mathcal{F}(u^{k}, D^{k-1}, \alpha^{k-1})\| \le d_{k-1}\|D^{k-1} - D^{k}\| + \lambda_{D}^{+}\|D^{k} - D^{k-1}\| + \lambda_{D}^{-}\|\alpha^{k} - \alpha^{k-1}\| = (d_{k-1} + \lambda_{D}^{+})\|D^{k-1} - D^{k}\| + \lambda_{D}^{-}\|\alpha^{k} - \alpha^{k-1}\|.$
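The closed-form updates (30)–(34) are cheap: the D-update is an orthogonal Procrustes problem solved by one SVD, and the α-update is entrywise hard thresholding. A minimal sketch with random data in place of $\hat{D}^k$ and $\hat{\alpha}^k$ (this is an illustration, not the authors' implementation):

```python
import numpy as np

def update_D(D_hat):
    """(30)-(31): nearest orthogonal matrix to D_hat, D = U V* from its SVD."""
    U, _, Vh = np.linalg.svd(D_hat, full_matrices=False)
    return U @ Vh

def update_alpha(alpha_hat, tau, e_k):
    """(33): hard thresholding of alpha_hat at level sqrt(2*tau/e_k)."""
    lam = np.sqrt(2.0 * tau / e_k)
    return np.where(np.abs(alpha_hat) >= lam, alpha_hat, 0.0)

rng = np.random.default_rng(0)
D = update_D(rng.standard_normal((64, 64)))
assert np.allclose(D.conj().T @ D, np.eye(64))   # the constraint D*D = I holds
```

The anisotropic update (34) is obtained by applying `update_alpha` separately to the real and imaginary parts of `alpha_hat`.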