## Abstract

The incremental gradient approaches, such as PIE and ePIE, are widely used in the field of ptychographic imaging due to their great flexibility and computational efficiency. Nevertheless, their stability and reconstruction quality may be significantly degraded when non-negligible noise is present in the measurements. Though this problem is often attributed to the non-convex nature of phase retrieval, we find that the reason is more closely related to the choice of the step-size, which needs to diminish gradually for convergence even in the convex case. To this end, we introduce an adaptive step-size strategy that decreases the step-size whenever sufficient progress is not made. Synthetic and real experiments on Fourier ptychographic microscopy show that the adaptive step-size strategy significantly improves the stability and robustness of the reconstruction towards noise, yet retains the fast initial convergence speed of PIE and ePIE. More importantly, the proposed approach is simple, nonparametric, and does not require any prior knowledge of the noise statistics. Its strong performance and limited computational complexity make it a very attractive and promising technique for robust Fourier ptychographic microscopy under noisy conditions.

© 2016 Optical Society of America

## 1. Introduction

Ptychography is a computational imaging approach in which a limited-size illumination probe is moved sequentially across an extended object, while the resulting diffraction patterns are collected [1,2]. With sufficient overlap between these illumination spots, the acquired images can be processed into an estimate of the object’s complex function (i.e., its amplitude and phase) with high sensitivity and a larger field of view than a single recorded diffraction pattern. Recently, the counterpart of (conventional) ptychography in the Fourier space, termed Fourier ptychographic microscopy (FPM), has been introduced [3]. FPM shares its roots with non-interferometric phase retrieval [4,5] and interferometric synthetic aperture imaging [6–8]. Instead of spatially scanning the object with a narrow illumination beam, it uses variable-angle illuminations provided by a light-emitting diode (LED) array and does not involve any moving parts. The captured low-resolution images from different illumination angles can be synthesized in the Fourier space to surpass the diffraction limit of the objective lens and thus boost the space-bandwidth product of the microscope. Major advantages of the ptychographic approaches are linked to the redundancy of the acquired data: many inaccurately known parameters can be retrieved simultaneously with the object complex function, such as the probe (pupil) function [9–12], probe (LED) positions [13–16], and mixed coherent states of the incoming illuminations [17–21].

By adopting the concepts of phase retrieval [4,5], ptychography uses an iterative reconstruction process to recover the ‘lost’ phase information of the sample by updating functions back and forth between the real and Fourier spaces. The original solutions to ptychography used the Gerchberg-Saxton algorithm [1,4] or incremental gradient descent based techniques (termed PIE and ePIE) [2,9], which belong to the class of alternating projections (AP). Ideally, the successive iterations should eventually reach a solution at which the constraints resulting from the overlapping condition and the intensity measurements are satisfied simultaneously. However, in the presence of non-negligible noise, such a reasonable solution cannot be achieved, as the different intensity patterns are no longer consistent with one another, leading to degraded reconstruction quality. The situation is even worse for FPM because the experimental data contains mainly darkfield images from illumination angles larger than the objective numerical aperture allows. The high-angle darkfield illuminations usually do not produce images with sufficient intensity, leading to a low signal-to-noise ratio (SNR). The poor performance of AP strategies has often been attributed to the non-convex nature of the phase recovery problem, since few provable global convergence guarantees exist even under ideal (noise-free) conditions [5,22–24]. Multiple algorithms employing the difference map [11], nonlinear optimization [10], Gauss-Newton search [15,22], Wirtinger flow [24], or convex lifting [23] have been proposed to either achieve better convergence properties, improve the robustness towards noise, or both. However, those methods generally entail expensive processing requirements, making them less appealing from a computational point of view.

Generally, the above mentioned solutions for the ptychographic phase retrieval problem can be classified into two categories: the incremental (sequential) algorithms (like PIE and ePIE, which only update a small patch of the object function in each iteration) and the global algorithms (like the difference map, nonlinear optimization, Wirtinger flow, etc., which update the entire object function in one iteration). The global algorithms have been found to be connected to the gradient descent methods (also known as steepest descent), whose convergence behavior (especially for convex sets) has been well studied. Various more advanced strategies from the optimization community, like the conjugate gradient, higher-order derivatives (Gauss-Newton), and line search, have been introduced to improve their performance, as well summarized in [22] for ptychography and in [15] for FPM. On the other hand, to the best of our knowledge, the incremental algorithms (PIE and ePIE) are more favored and widely used for their fast convergence, computational efficiency, and low memory cost, especially when dealing with large datasets. Though their susceptibility to noise is well noticed, the incremental approaches offer great flexibility to be adapted to more complicated mathematical models of both the object and the imaging system for many advanced applications where great solution accuracy is not of paramount importance [20,21,25,26]. By contrast, very few general results are known for the convergence behavior of incremental approaches, and the gathered knowledge is problem-specific and often of an empirical nature.

In this paper, we focus on the convergence behavior of incremental ptychographic solutions and address precisely the problem of their performance degradation under noisy conditions. After reviewing the state-of-the-art methods for solving the ptychographic phase retrieval problem from a numerical analysis point of view, we find that the PIE and ePIE algorithms can be viewed as *incremental gradient descent approaches* in the sense of the numerical optimization community. The difference between the incremental gradient descent approaches and the better-known (global) gradient descent approaches is that the former use an “approximate gradient” computed from only a part of the data. Despite their faster initial convergence rate, incremental gradient approaches are typically more sensitive to noise and frequently cannot converge to a reasonable solution. We prove that this is not only because they are first-order gradient methods or because the phase retrieval problem is non-convex, but more importantly, because they require *a diminishing step-size* for convergence even in the convex case (which is not so for global gradient methods). This important property has remained unnoticed in the ptychography and FPM literature, where the step-size is often taken to be a large constant (e.g., 1) [2,9,12,13]. The necessity of a diminishing step-size for convergence motivates the adaptive step-size strategy presented here. By simply introducing a variable step-size instead of a fixed one, the convergence behavior of the algorithm can be improved and, meanwhile, its tolerance to noise can be significantly enhanced. Furthermore, as a direct extension of the original PIE and ePIE, the proposed scheme is computationally simple and extremely easy to use. Numerical simulations and real experiments in the context of FPM are carried out to confirm and validate the performance of the proposed approach.
It should also be emphasized that while we focus specifically on FPM in this work, our approach can be directly extended to conventional ptychography via variable substitution.

There are two important issues about the proposed approach that need to be clarified. First, we recognize that the idea of an adaptive step-size (sometimes referred to as a ‘feedback parameter’ or ‘relaxation parameter’) is well known in the optimization and signal processing communities. Even in the field of coherent diffractive imaging, many iterative phase retrieval algorithms have been greatly improved through the introduction of adaptive feedback. For example, the relaxed averaged alternating reflections (RAAR) algorithm uses a relaxation parameter for steering the iteration and improving the phase retrieval quality [27]. In the shrink-wrap algorithm, a feedback parameter is introduced in real space to dynamically tighten the support constraint [28]. And in ptychography, a feedback parameter is used to control the refinement rate when solving for position uncertainties [14]. However, in this work, we propose and analyze for the first time an adaptive step-size strategy for incremental solutions of ptychographic imaging. A convergence analysis of incremental gradient descent approaches for convex sets is also provided, which gives guidelines for the choice of step-size and makes our adaptive step-size rule mathematically tractable. Second, the convergence analysis of incremental gradient descent approaches described in this work is based on the framework of convex optimization, while the phase retrieval problem is inherently non-convex. For non-convex sets, the convergence behavior of incremental gradient descent approaches is much more difficult to predict. However, reformulating phase retrieval and ptychography in the context of convex optimization does provide a powerful theoretical framework to gain insight and, potentially, to improve existing algorithms based on a sound mathematical theory [22,27,29].
Besides, in both ptychography and FPM, the measurement-space modulus constraints from overlapping scan positions appear to be stringent enough to make the optimization problem *almost convex*. As verified through extensive numerical simulations and experiments, the proposed adaptive step-size rule, along with the convergence properties described in this paper, not only has theoretical significance but has also proved to be pragmatic.

The remainder of this paper is organized as follows: In Section 2, we formulate the ptychographic imaging as an unconstrained nonlinear minimization problem. Some of the state-of-the-art solutions for ptychographic imaging are reviewed in Section 3, and they can be classified into two categories: the global gradient approaches and incremental gradient solutions. By analyzing the convergence behaviors of incremental gradient solutions, an adaptive step-size strategy is introduced in Section 4 to improve the stability and robustness of the reconstruction under noisy conditions. Simulations and experimental results are presented in Section 5 and Section 6, respectively, and this paper ends with a discussion and the conclusions in Section 7.

## 2. Problem formulation

The experimental configuration and data acquisition process for FPM measurements can be found in the literature [3,19,23] and will not be detailed here. In FPM, the problem to be solved is to recover the high-resolution complex field (including both amplitude and phase) of the object from a number of low-resolution intensity measurements with different illumination angles. For a finite set of illumination vectors $\mathbf{u}_i = (u_x, u_y)$, $i = 1, 2, \ldots, N$, we denote each intensity image by

$$I_i(\mathbf{r}) = \left|\mathcal{F}^{-1}\left\{P(\mathbf{u})\,O(\mathbf{u}-\mathbf{u}_i)\right\}\right|^2,$$

where $\mathbf{r}$ is the real-space coordinate vector and $\mathcal{F}$ is the 2D Fourier transform. The Fourier transform of the object function, $O(\mathbf{u})$, is first shifted by the illumination vector $\mathbf{u}_i$, then confined by the pupil function $P(\mathbf{u})$, and finally inverse Fourier transformed back to the real space to form a low-resolution image $I_i(\mathbf{r})$ captured by the camera. To solve for $O(\mathbf{u})$, most current algorithms can be derived by formulating FPM as the following optimization problem [15,22]:

$$\min_{O(\mathbf{u})} \sum_{i=1}^{N}\sum_{\mathbf{r}}\left[\sqrt{I_i(\mathbf{r})} - \left|\mathcal{F}^{-1}\left\{P(\mathbf{u})\,O(\mathbf{u}-\mathbf{u}_i)\right\}\right|\right]^2.$$

In the discrete implementation, each intensity image is vectorized as $\mathbf{I}_i = \left\{I_{m,i}\right\}_{m=1}^{M}$ (with $M$ pixels) and the object spectrum as $\mathbf{O}=\left\{O_{m}\right\}_{m=1}^{L}$ (with $L$ pixels). $\mathbf{P}_i$ is an $M \times L$ matrix determined by the pupil function $P(\mathbf{u})$ that extracts a sub-spectrum containing $M$ pixels out of the whole $L$ pixels. We thus have the vectorized cost function

$$\epsilon\left(\mathbf{O}\right) = \sum_{i=1}^{N}\left\|\sqrt{\mathbf{I}_i} - \left|\mathbf{g}_i\right|\right\|^2, \tag{3}$$

where $\mathbf{F}$ is the matrix representation of the discrete Fourier transform, and $\mathbf{g}_i \equiv \mathbf{F}^{-1}\mathbf{P}_i\mathbf{O}$. The error metric given by Eq. (3) is also termed the real-space error, which quantifies how closely the current estimate fits the input data as a whole. Since the estimated high-resolution object complex field is of higher space-bandwidth product than the raw images, we have $L > M$.
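The forward model and the real-space error of Eq. (3) can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper’s code: the sub-spectrum is kept on the full $L$-pixel grid (a boolean mask stands in for the $M \times L$ matrix $\mathbf{P}_i$) instead of being cropped to $M$ pixels, and the function names are ours.

```python
import numpy as np

def forward(O_spec, masks):
    """Low-resolution complex fields g_i = F^{-1} P_i O for each sub-aperture.

    O_spec : (n, n) high-resolution object spectrum (the vector O, reshaped).
    masks  : list of boolean (n, n) arrays; each plays the role of the matrix
             P_i, selecting the sub-spectrum passed by the pupil.
    """
    return [np.fft.ifft2(np.where(m, O_spec, 0)) for m in masks]

def real_space_error(O_spec, masks, intensities):
    """Eq.-(3)-style cost: sum_i || sqrt(I_i) - |g_i| ||^2."""
    return sum(np.linalg.norm(np.sqrt(I) - np.abs(g)) ** 2
               for g, I in zip(forward(O_spec, masks), intensities))
```

At the true object spectrum the cost vanishes, since each $\sqrt{\mathbf{I}_i}$ then equals $|\mathbf{g}_i|$ exactly.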

## 3. Global and incremental gradient solutions of Fourier ptychographic imaging

Most of the state-of-the-art solutions for ptychographic imaging are closely related to standard gradient-based iterative optimization algorithms in the numerical optimization community. These iterative algorithms can be split into two families: the global gradient approaches and incremental gradient solutions. We explore these algorithms by applying them to some two dimensional optimization problems in order to gain some intuition on their characteristics and behaviors.

#### 3.1. Global gradient based solutions of Fourier ptychographic imaging

To minimize Eq. (3) using iterative optimization techniques, we often need to evaluate the gradient of the cost function. To facilitate the gradient calculation, we reformulate the error metric of Eq. (3) in Fourier space based on Parseval’s theorem [5]

$$\epsilon\left(\mathbf{O}\right) = \sum_{i=1}^{N}\left\|\mathbf{\Psi}_i - \mathbf{P}_i\mathbf{O}\right\|^2, \tag{4}$$

where $\mathbf{\Psi}_i = {\mathrm{\Pi}}_{m\to \sqrt{{\mathbf{I}}_{i}}}(\mathbf{P}_i\mathbf{O})$ is the updated sub-spectrum. The operator ${\mathrm{\Pi}}_{m\to \sqrt{{\mathbf{I}}_{i}}}$ is known as the real-space modulus projection ( ${\mathrm{\Pi}}_{m\to \sqrt{{\mathbf{I}}_{i}}}({\mathbf{P}}_{i}\mathbf{O})=\mathbf{F}\,\text{Diag}\left({\mathbf{g}}_{i}{\left|{\mathbf{g}}_{i}\right|}^{-1}\right)\sqrt{{\mathbf{I}}_{i}}$), which simply replaces the amplitude of $\mathbf{g}_i$ with the measured $\sqrt{{\mathbf{I}}_{i}}$ while conserving the phase. The gradient of $\epsilon$ with respect to $\mathbf{O}$ can be expressed as (the immaterial constant factor is absorbed into the step-size)

$$\nabla\epsilon\left(\mathbf{O}\right) = \sum_{i=1}^{N}\mathbf{P}_i^{H}\left(\mathbf{P}_i\mathbf{O} - \mathbf{\Psi}_i\right).$$

To minimize $\epsilon$, one can either directly seek a stationary point ($\nabla\epsilon = 0$, as in the difference map approach [11]), or alternatively use gradient-based search algorithms, like the gradient descent [22], conjugate gradient [10], and Gauss-Newton methods [15,22]. For the simplest gradient descent method, the object function is updated based on the following equation

$$\mathbf{O}^{k+1} = \mathbf{O}^{k} - \alpha\,\nabla\epsilon\left(\mathbf{O}^{k}\right),$$

where $k$ is the iteration number and $\alpha$ is the step-size. It should also be noticed that, other than Eq. (3), there exist other forms of the cost function that evaluate the difference of the estimated and measured intensity functions, or explicitly take the statistics of the noise into account [15,31,32].
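As a concrete toy illustration of the global update, the sketch below applies the modulus projection to every measurement and accumulates the full correction before touching $\mathbf{O}$. It assumes an idealized binary pupil (so $\mathbf{P}_i$ reduces to a boolean mask); the function names are ours, not from the paper.

```python
import numpy as np

def modulus_projection(g, I):
    """Pi_{m->sqrt(I)}: replace the amplitude of g with the measured sqrt(I)
    while conserving the phase (a small guard avoids division by zero)."""
    return np.sqrt(I) * g / (np.abs(g) + 1e-12)

def global_step(O_spec, masks, intensities, alpha=1.0):
    """One global gradient-descent iteration on the Fourier-space error:
    every measurement contributes Psi_i - P_i O before O is updated."""
    correction = np.zeros_like(O_spec)
    for m, I in zip(masks, intensities):
        sub = np.where(m, O_spec, 0)                       # P_i O
        psi = np.fft.fft2(modulus_projection(np.fft.ifft2(sub), I))
        correction += np.where(m, psi - sub, 0)            # Psi_i - P_i O
    return O_spec + alpha * correction                     # O - alpha * grad(eps)
```

Because the correction is accumulated over all measurements before the update, a spectrum that already satisfies every modulus constraint is a fixed point of `global_step`.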

#### 3.2. Incremental gradient solutions of Fourier ptychographic imaging

In the above mentioned algorithms, the object function is updated globally with use of the full set of images for each iteration. However, when the number of intensity images is very large, those global methods may be ineffective because the size of the dataset makes each iteration very costly. Therefore, the incremental approaches [2,9,12,19], which use only one intensity measurement for each update instead of the total data, may be more attractive. The incremental approach only operates on a single component of the cost function at each iteration

$$\epsilon_i\left(\mathbf{O}\right) = \left\|\mathbf{\Psi}_i - \mathbf{P}_i\mathbf{O}\right\|^2. \tag{9}$$

The corresponding gradient can be expressed as

$$\nabla\epsilon_i\left(\mathbf{O}\right) = \mathbf{P}_i^{H}\left(\mathbf{P}_i\mathbf{O} - \mathbf{\Psi}_i\right).$$

Starting from an initial estimate $\mathbf{O}$, the optimization cycles through each intensity measurement $\mathbf{I}_i$ in sequence, based on the following gradient-based search algorithm

$$\mathbf{O}_{i+1}^{k} = \mathbf{O}_{i}^{k} - \alpha^{k}\,\mathbf{W}_{i}\,\nabla\epsilon_{i}\left(\mathbf{O}_{i}^{k}\right), \tag{11}$$

where $\alpha^k$ is the step-size, which is often set to $\alpha^k = \alpha \equiv 1$ in most literature [2,9,12,13]. Note that one full iteration (one so-called cycle, $k \leftarrow k+1$) comprises $N$ sub-iterations running over the whole dataset. The diagonal weighting matrix $\mathbf{W}_i$ is given by

$$\mathbf{W}_{i} = \mathrm{Diag}\left(\frac{\beta_{m,i}}{\left|P_{m,i}\right|^{2} + \gamma}\right), \tag{12}$$

where a small constant $\gamma$ is included in Eq. (12) to avoid the divide-by-zero problem when $\left|P_{m,i}\right|^2$ is too small, and $\beta_{m,i}$ is a scaling factor that weights the well-transferred spectrum components while rejecting the unreliable data falling outside the pupil.
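For comparison with the global step, one full cycle of the incremental update Eq. (11) can be sketched as follows. Again this is a toy: the pupil is an idealized binary mask ($|P_{m,i}| = 1$), so the weighting $\mathbf{W}_i$ of Eq. (12) collapses to the mask itself with $\beta = 1$ and $\gamma = 0$, and the function names are ours.

```python
import numpy as np

def incremental_cycle(O_spec, masks, intensities, alpha=1.0):
    """One cycle (k -> k+1) of Eq. (11): N sub-iterations, each correcting
    O with a single measurement I_i before moving to the next one."""
    O = O_spec.copy()
    for m, I in zip(masks, intensities):        # i = 1, ..., N in a fixed order
        sub = np.where(m, O, 0)                 # P_i O
        g = np.fft.ifft2(sub)                   # current low-resolution field
        psi = np.fft.fft2(np.sqrt(I) * g / (np.abs(g) + 1e-12))
        O = O + alpha * np.where(m, psi - sub, 0)   # -alpha * W_i * grad(eps_i)
    return O
```

Unlike the global step, the $i$-th sub-iteration already sees the corrections made by measurements $1, \ldots, i-1$ of the same cycle, which is the source of both the fast initial progress and the limit-cycle behavior analyzed in Section 4.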

_{m,i}## 4. Adaptive step-size scheme for incremental gradient solutions

Though the incremental gradient method is closely related to the conventional gradient descent method [which minimizes Eq. (4) instead of sequentially minimizing Eq. (9)], their convergence behaviors are quite different. For the conventional gradient descent algorithm, it can be proven that the iteration finally converges to at least a local optimum of $\epsilon$ at a linear rate if the step-size $\alpha$ is less than 2 (in the convex setting, all local minima are also global minima, so in this case it converges to the global optimum) [30,34]. However, for incremental approaches, we usually observe that the algorithm finds a relatively correct solution in the early iterations, but then starts to exhibit limit-cycle-like behavior. The oscillatory behavior is even more severe for noisy data, since the intensity images are inconsistent with each other. To understand the convergence mechanism of incremental gradient methods, let us assume that the component functions $\epsilon_i$ are selected for iteration according to a cyclic order [i.e., $i \leftarrow (i \bmod N) + 1$] and that the step-size is constant within a cycle (i.e., for all $k=0,1,\dots$, ${\alpha}_{1}^{k}={\alpha}_{2}^{k}=\dots ={\alpha}_{N}^{k} \equiv \alpha^{k}$). In each sub-iteration, the incremental method uses an approximate gradient computed from only one component of the entire data. Thus, one incremental iteration ($i \leftarrow i + 1$) tends to make reasonable progress only in the “average” sense (viewed over one whole cycle instead of a single sub-iteration). Viewing the iteration Eq. (11) in terms of cycles, we have for every $k$ that marks the beginning of a cycle ( ${\mathbf{O}}^{k}={\mathbf{O}}_{1}^{k}={\mathbf{O}}_{N+1}^{k-1}$)

$$\mathbf{O}^{k+1} = \mathbf{O}^{k} - \alpha^{k}\left[\sum_{i=1}^{N}\mathbf{W}_{i}\nabla\epsilon_{i}\left(\mathbf{O}^{k}\right) + \mathbf{e}^{k}\right],$$

where $\mathbf{e}^{k} = \sum_{i=1}^{N}\mathbf{W}_{i}\left[\nabla\epsilon_{i}\left(\mathbf{O}_{i}^{k}\right)-\nabla\epsilon_{i}\left(\mathbf{O}^{k}\right)\right]$ is the gradient error accumulated by evaluating each component gradient at the intermediate estimate $\mathbf{O}_{i}^{k}$ instead of at $\mathbf{O}^{k}$, i.e., the deviation of one incremental cycle from one iteration of the global method on $\epsilon\left(\mathbf{O}^{k}\right)$. For Lipschitz continuous gradient functions $\nabla\epsilon_i$, the error $\left\|\mathbf{e}^{k}\right\|$ is proportional to the step-size $\alpha^{k}$ [34,35], and this shows two fundamental convergence properties of incremental gradient approaches:

^{k}A constant step-size (e.g.*Property A:**α*= 1 as in most literature) typically cannot guarantee convergence, since then the size of the gradient error ‖*e*‖ is typically bounded away from 0. Instead, a peculiar form of convergence takes place for constant but sufficiently small_{k}*α*(see Lemma 1 of Appendix A), whereby the iterates within cycles converge but to different points within a sequence of*N*points (i.e., the sequence of first points in the cycles converges to a different limit than the sequence of second points in the cycles, etc).A diminishing step-size leads to diminishing error*Property B:**e*, so it can result in convergence to a stationary point of^{k}*ε*. The basic idea is that with a constant step-size*α*we can get to within an*O*(*α*)-neighborhood of the optimum (see Lemma 1 and Lemma 2 of Appendix A), so with a diminishing step-size*α*, we should be able to reach an arbitrarily small neighborhood of the optimum. Appendix A presents a convergence proof of the incremental gradient solution Eq. (11) with a diminishing step-size in the convex case. The algorithm can finally converge to a minimizer if lim^{k}_{k→∞}*α*= 0, ${\sum}_{k=0}^{\infty}{\alpha}^{k}=\infty $, and ${\sum}_{k=0}^{\infty}{\left({\alpha}^{k}\right)}^{2}<\infty $.^{k}

It should be noted that these convergence properties are established under certain assumptions on the cost function $\epsilon$ (e.g., that $\epsilon$ and each of its components $\epsilon_i$ are convex, and that $\nabla\epsilon$ and $\nabla\epsilon_i$ are Lipschitz continuous). For non-convex problems, gradient-based approaches can only be guaranteed to converge to a local minimum. Although convex optimization theory cannot completely analyze the ptychographic phase retrieval problem, which is non-convex, it is in general much simpler and can indeed predict several important properties of the algorithm.

The two convergence properties reveal that the choice of the step-size *α* plays an important role in the success of incremental gradient methods, a key point that is often overlooked when using PIE and ePIE-type solutions for FPM reconstruction. According to Property A, though the routinely used unity step-size achieves a much faster convergence rate than its global counterpart at the early stage (when far from the solution, the evaluation of a single gradient component is sufficient for generating an approximate gradient direction [34]), it typically makes the algorithm strongly oscillatory in the neighborhood of a solution and thus may not converge to a stationary point (when getting close to the solution, a single component gradient can no longer well approximate the global gradient). In fact, the step-size *α* controls the amount of feedback from the previous estimate of the algorithm. When *α* = 1, the incremental gradient method is identical to the alternating projection algorithm with multiple constraint sets defined by each sub-aperture image. In that case, at one given pupil position, the estimate is updated fully by the measured intensity at that illumination angle. However, for FPM, the object is in most cases under darkfield illumination, and the low level of irradiance produces individual measurements that are heavily affected by noise. In such conditions, one single image should not fully constrain the estimate at a given pupil position. When a unity step-size is used, it can be observed that the algorithm quickly finds a relatively correct (but noisy) estimate, but then starts to loop around after some iterations because the intensity dataset is inconsistent, as a consequence of noise [31]. At one given pupil position, the algorithm tends to ‘undo’ the progress made at the previous update, fully reintroducing the noise within the associated intensity image.
In contrast, the global gradient descent methods consider the complete cost function Eq. (4) and evaluate the gradient using the full set of images at each iteration. Despite much slower convergence in early iterations and more involved computation, they finally converge to a more reasonably balanced and globally consistent solution.

Property A also suggests using a sufficiently small step-size to stabilize the incremental algorithm and ensure convergence, as the size of the oscillation is roughly proportional to *α*. The relaxation in the projections effectively prevents any single component from pulling the current estimate too far towards an incorrect result. Viewed from another angle, using a step-size smaller than one tends to make the incremental solution more ‘global’, resulting in a better “subset gradient balance”: the information from overlapping pupil positions can blend together to reinforce the true signal while reducing the influence of noise. It is also interesting to note that the parameter *γ* (sometimes called the regularization parameter) in Eq. (12) can be tuned to achieve a similar noise-reduction effect, as demonstrated in [15,19] (if *γ* is large and dominates the denominator, we have *γ* ≈ 1/*α*). Even so, the question of determining a proper *α* remains: a large *α* cannot suppress the resulting asymptotic oscillation, while a too small *α* may significantly slow down the convergence, which compares unfavorably even with the linear convergence rate of its non-incremental counterpart.

Instead of using a constant step-size, referring to Property B, an adaptive step-size sequence $\alpha^{k}$ is introduced in this work to improve the performance of incremental ptychographic solutions. Since the sufficient conditions on the step-size $\alpha^{k}$ for global convergence to a minimizer in the convex setting are (i) $\lim_{k\to\infty}\alpha^{k} = 0$, (ii) ${\sum}_{k=0}^{\infty}{\alpha}^{k}=\infty$, and (iii) ${\sum}_{k=0}^{\infty}{\left({\alpha}^{k}\right)}^{2}<\infty$, the critical issue in practice is how to determine a suitable sequence $\alpha^{k}$ that gets close to a solution within fewer iterations. Among the three conditions, (i) and (iii) require that $\alpha^{k}$ diminishes to zero, which is easy to satisfy. However, shrinking the step-size to zero only ensures that the algorithm converges to a point [as ${\mathbf{O}}_{i+1}^{k}={\mathbf{O}}_{i}^{k}-{\alpha}^{k}{\mathbf{W}}_{i}\nabla {\epsilon}_{i}\left({\mathbf{O}}_{i}^{k}\right)$ goes to ${\mathbf{O}}_{i+1}^{k}={\mathbf{O}}_{i}^{k}$ in the limit $\alpha^{k} \to 0$], which does not mean the algorithm has found the desired optimum. Obviously, if the step-size shrinks too fast, $\mathbf{O}$ may converge to a point that is not a minimum, especially when the initial point is far from it. Thus, condition (ii) ${\sum}_{k=0}^{\infty}{\alpha}^{k}=\infty$ must further be satisfied to ensure that $\alpha^{k}$ is not reduced too fast, so that the algorithm can travel infinitely far if necessary. One simple and straightforward choice satisfying those conditions is:

$$\alpha^{k+1} = \begin{cases} \alpha^{k}, & \epsilon^{k-1}-\epsilon^{k} > \eta\,\epsilon^{k-1},\\ \alpha^{k}/2, & \text{otherwise,} \end{cases}$$

where $\eta$ is a small constant which should be much less than 1. By comparing the (global) error function for $\mathbf{O}^{k}$ and $\mathbf{O}^{k-1}$ obtained in consecutive cycles, we keep the step-size fixed as long as non-trivial progress is made (determined by $\eta$), and halve its value otherwise. Starting from $\alpha^{0} = 1$ as in PIE and ePIE, the algorithm makes large progress toward a solution in the early iterations. Then, the inconsistent information from adjacent pupil regions resulting from noise and model mismatch is gradually reconciled as $\alpha^{k}$ decays. Finally, the algorithm converges to a stationary point quickly once the step-size reaches a pre-specified minimum (a constant small enough that the asymptotic oscillation is of no essential concern). The small parameter $\eta$ defines the threshold of a non-trivial ‘progress’, which guarantees that the step-size is reduced only when necessary [i.e., when the algorithm gets near a (local) minimum and starts to oscillate]. We further found that the performance of the adaptive step-size strategy is not very sensitive to the choice of $\eta$. A reasonably good result can always be obtained by fixing $\eta = 0.01$, which makes our approach nonparametric and very easy to use in practical applications.
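The adaptive rule itself amounts to a few lines. The sketch below is our reading of the scheme: the relative form of the progress test and the value of the step-size floor are implementation choices, not prescribed by the text.

```python
def adapt_step(alpha, err_prev, err_curr, eta=0.01, alpha_min=1e-3):
    """Keep the step-size while one cycle reduces the (global) error metric
    by more than a fraction eta; otherwise halve it, down to a floor."""
    if err_prev - err_curr > eta * err_prev:   # non-trivial progress was made
        return alpha
    return max(alpha / 2.0, alpha_min)         # halve, but never below the floor
```

Starting from α⁰ = 1 as in PIE and ePIE, the caller evaluates the error metric once per cycle and feeds consecutive values into `adapt_step` to obtain the step-size for the next cycle.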

It should be emphasized again that the convergence proof provided here is restricted to convex sets, while the phase retrieval problem is non-convex. For non-convex optimization problems, the exact convergence properties of the incremental gradient solutions remain elusive. However, in a practical setting of FPM, the convergence behaviors derived from convex optimization seem still applicable, and the adaptive step-size approach indeed provides significant improvements over the constant step-size in convergence rate and reconstruction accuracy. More importantly, those improvements *do not trade off against the computational efficiency* of the incremental gradient approaches. Note that the error metric $\epsilon^{k}$ in Eq. (16) is often used for monitoring the progress or the convergence of the standard PIE and ePIE algorithms, and needs to be calculated after each full cycle of sub-iterations anyway [2,9]. In our approach, the same $\epsilon^{k}$ is further used to determine the step-size for the next cycle. Therefore, our adaptive step-size scheme can be viewed as an elegant extension of the original PIE and ePIE *at no extra computational cost*. Moreover, the (global) error metric $\epsilon^{k}$ can be approximated by the sum of the incremental error metrics, ${\epsilon}^{k}\approx {\sum}_{i}{\epsilon}_{i}^{k}$, which can be accumulated during the incremental iterative process. In this way, the computational cost can be further reduced (at the cost of lower accuracy in the error estimates), making the scheme more suitable for time-critical applications where the execution time of the algorithm should be minimized.

## 5. Simulations

We first validate the adaptive step-size scheme using simulations. The simulation parameters were chosen to realistically model an FPM platform, with an incident illumination wavelength of 626 nm, an image sensor with a pixel size of 6.5 *μ*m, and a 4X objective with *NA _{obj}* of 0.1. We simulated the use of the central 15 × 15 LEDs in the array, placed 87.5 mm beneath the sample with a distance of 4 mm between adjacent LEDs. The raw low-resolution data is limited to a small region of only 64 × 64 pixels, and the final high-resolution complex field with 256 × 256 pixels is recovered by the different approaches. The true high-resolution complex function is created from the images shown in Figs. 1(a) and 1(b) for its intensity and phase, respectively. Besides the idealized situation, each low-resolution image was corrupted with Gaussian noise of different variances. Since the signal power of the darkfield images from high-angle illuminations is much weaker than that of the brightfield images, they suffer more from the noise, which is in accord with the actual experimental conditions. To illustrate the level of noise more intuitively, six typical noisy darkfield intensity images with a high illumination angle [labelled by the blue circle in Fig. 1(c)] are given in Figs. 1(d)–1(i). The noise level is quantified by the average mean absolute error (AMAE), defined as $\text{AMAE} = \langle\left|\mathbf{I}_n - \mathbf{I}\right|\rangle/\langle\mathbf{I}\rangle$, where $\langle\mathbf{I}\rangle$ is the mean value of all noise-free darkfield intensity images and $\langle\left|\mathbf{I}_n - \mathbf{I}\right|\rangle$ is the averaged mean absolute error of the corresponding noisy images. The metric MAE is also used to quantify the amplitude and phase reconstruction accuracy.

There are two issues worth mentioning about the practical implementation of the FPM reconstruction algorithms. The first is the initialization of the high-resolution complex field. Here, all of the algorithms use the same initial guess: the square root of the upsampled on-axis-illuminated intensity, which provides a good approximation to the high-resolution object function. We found that for incremental gradient approaches, the choice of initial value has little impact on the final reconstruction result, but a good initial value that is closer to a solution does accelerate the convergence. The second is the updating order of the sub-pupils in the Fourier domain. Three choices of updating sequence were suggested in [36]: (1) random order; (2) NA order (the sequence is ranked by the illumination NA); (3) energy order (the sequence is ranked by the total intensity value). In all simulations and experiments of this paper, the NA order was used, which starts at the center of the Fourier space and gradually expands outwards. The NA order and the energy order produce almost the same results, since the two sequences are quite similar: the energy of the sample in the Fourier domain typically decreases with spatial frequency. Furthermore, the choice of updating sequence does not fundamentally change the properties and convergence behavior of the incremental approach, since no specific requirements on the order of updating are imposed in the convergence analysis. But it is essential to include all components (images) in a fixed deterministic order. If the component images are selected in a randomized order, the incremental gradient method turns into the stochastic gradient method, and the convergence rate is found to be slower than with the deterministic (NA or energy) orders, as demonstrated in [36].
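The NA ordering used above is straightforward to generate. The following sketch (our own helper, using the geometry stated earlier: 4 mm LED pitch, 87.5 mm LED-to-sample distance) sorts the LED indices by illumination NA so that the update starts at the center of the Fourier space and expands outwards.

```python
import numpy as np

def na_order(led_rows, led_cols, pitch_mm=4.0, height_mm=87.5):
    """Return the measurement indices sorted by illumination NA.

    led_rows, led_cols : integer LED grid coordinates, centered so that
                         (0, 0) is the on-axis LED.
    """
    r = np.hypot(led_rows * pitch_mm, led_cols * pitch_mm)   # lateral offset
    na = np.sin(np.arctan2(r, height_mm))                    # illumination NA
    return np.argsort(na, kind="stable")                     # low NA first
```

The energy order could be generated the same way by replacing the NA key with the total intensity of each captured image.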

#### 5.1. Performance comparison of fixed step-size and adaptive step-size approaches

First we compare the performance of incremental gradient approaches with different fixed step-sizes as well as our adaptive step-size scheme (note that PIE and ePIE are exactly the same when there is no pupil estimation). Figure 2 plots the amplitude and phase reconstruction accuracy as a function of intensity noise. Both the adaptive step-size method and the standard PIE method were implemented with several choices of step-size (*α* = 1, 0.5, and 0.05; each algorithm was run for 100 iterations). It can be seen that when there is no noise, perfect reconstruction can be achieved regardless of the choice of step-size. In the presence of noise, the red and green curves rise rapidly, indicating that large *α* values (1, 0.5) are relatively sensitive to noise. Besides, the reconstructions suffer from a significant degree of cross-talk between the phase and amplitude parts of the complex field. A smaller *α* performs comparatively better at the expense of a much slower convergence rate. The adaptive step-size method always performs better than the methods using a fixed step-size, and it reaches good solutions with minimum cross-talk between phase and amplitude.

To better explain the small error achieved by the adaptive step-size method, we show one typical group of phase error trajectories under 40% Gaussian noise in Fig. 3. Similar behavior can be observed for the intensity error curves as well. It is seen that the adaptive step-size method decreases more rapidly than the fixed step-size methods and converges within the first 20 iterations. For the fixed step-size methods, the curves reach their respective minima in succession and then overshoot. They finally ‘converge’ to a limit cycle with an oscillation amplitude roughly proportional to *α*, as predicted by the theory. From these results, we can safely conclude that the adaptive step-size method outperforms the fixed step-size methods, simultaneously achieving a faster convergence rate and a lower misadjustment error.
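The adaptive rule itself is simple: whenever an iteration makes no apparent reduction in the error metric, the step-size is halved, and the iteration stops once it falls below a threshold (the *α*^{k} < 0.001 criterion of Section 6). The sketch below applies this rule to a toy 1-D quadratic; the 0.99 sufficient-decrease factor and the toy objective are illustrative assumptions, not the paper's exact error metric.

```python
def adaptive_step_descent(grad, err, x0, alpha0=1.0, tol=1e-3, max_iter=100):
    """Gradient descent with an adaptive step-size (sketch): halve alpha
    whenever an iteration fails to reduce the error metric appreciably,
    and stop once alpha < tol (mirroring the alpha^k < 0.001 criterion)."""
    x, alpha = x0, alpha0
    e_prev = err(x)
    for _ in range(max_iter):
        x_new = x - alpha * grad(x)
        e_new = err(x_new)
        if e_new < 0.99 * e_prev:   # sufficient progress: accept, keep alpha
            x, e_prev = x_new, e_new
        else:                       # insufficient progress: halve the step-size
            alpha *= 0.5
        if alpha < tol:
            break
    return x, alpha

# Toy quadratic err(x) = (x - 3)^2 with minimizer x* = 3:
x_opt, a_final = adaptive_step_descent(lambda x: 2 * (x - 3),
                                       lambda x: (x - 3) ** 2, x0=0.0)
```

With the initial *α* = 1 the first step overshoots and triggers a halving, after which the iterate lands on the minimizer; the remaining iterations only shrink *α* to the stopping threshold, which is the limit-cycle suppression behavior described above.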

Though the Gaussian noise model is assumed in the above simulations, the adaptive step-size approach can also be applied to the case of Poisson noise. Figure 4 shows the final reconstruction accuracy for intensity MAEs of 5% to 50%, which correspond to an average of 10,000 down to 100 photons per detector pixel. It can be seen that for the same noise level, the reconstruction errors under Poisson noise are smaller; however, the general trend of the error curves does not differ significantly from the case of Gaussian noise.
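The two noise models used in these simulations can be sketched as follows. Both the percentage-noise convention (standard deviation relative to the mean intensity) and the normal approximation to Poisson sampling are illustrative assumptions; the paper quantifies noise by the resulting intensity MAE, which differs in detail.

```python
import math, random

def add_gaussian_noise(intensity, level, rng=random.Random(0)):
    """Corrupt an intensity image (flat list of floats) with zero-mean Gaussian
    noise whose std is `level` times the mean intensity (illustrative
    convention), clipping negative values to zero."""
    mean_i = sum(intensity) / len(intensity)
    return [max(v + rng.gauss(0.0, level * mean_i), 0.0) for v in intensity]

def add_poisson_noise(intensity, photons_per_pixel, rng=random.Random(0)):
    """Scale the image so its mean equals `photons_per_pixel`, draw photon
    counts via a normal approximation to Poisson (adequate for the
    100-10,000 photon range discussed in the text), then scale back."""
    mean_i = sum(intensity) / len(intensity)
    scale = photons_per_pixel / mean_i
    noisy = []
    for v in intensity:
        lam = v * scale
        counts = max(round(rng.gauss(lam, math.sqrt(lam))), 0)  # ~Poisson(lam)
        noisy.append(counts / scale)
    return noisy
```

Lowering `photons_per_pixel` raises the relative Poisson fluctuation (roughly 1/√photons), which is why fewer photons correspond to a larger intensity MAE in Fig. 4.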

#### 5.2. Comparison of the adaptive step-size approach with the state of the art under noisy conditions

Next, we compare the proposed adaptive step-size approach with four state-of-the-art FPM reconstruction algorithms: (1) Gerchberg-Saxton [3,12], (2) (sequential) Gauss-Newton [19], (3) difference map [11], and (4) Wirtinger flow [24]. The Gerchberg-Saxton and sequential Gauss-Newton methods are both incremental gradient approaches, but the latter introduces an additional regularization parameter *γ*, for which we use *γ* = 5 as suggested in [15]. The difference map and Wirtinger flow belong to the global gradient approaches. The difference map approach was originally applied to conventional ptychography [11], and here we adapt it to the context of FPM and directly use alternating projection instead of the hybrid input-output algorithm for Fourier-space updating. The Wirtinger flow algorithm optimizes the intensity-based cost function rather than the amplitude-based one [Eq. (3)], with two adjustable parameters (*k*_{0} and *θ*_{max}) in the step-size to stabilize the convergence process. Here we use *k*_{0} = 330 and *θ*_{max} = 0.4 as recommended in [24].

The four state-of-the-art algorithms and the proposed adaptive step-size algorithm are compared under noisy conditions (50% Gaussian or Poisson noise). The convergence curves as well as the reconstructed amplitudes and phases are given in Fig. 5 for the case of Gaussian noise and in Fig. 6 for the case of Poisson noise. To study the convergence speed and computational complexity of each algorithm, the iteration numbers and total runtime required for convergence are compared in Table 1. These tests were made with an Intel i7 2.6 GHz processor and 16 GB RAM, in conjunction with MATLAB’s cputime function. For each algorithm, the iterative process continues until it converges or stagnates; this demonstrates the final reconstruction accuracy without adding unnecessary running time, for a fair comparison. Close inspection of the results in Figs. 5 and 6 reveals that the convergence behaviors of the Gerchberg-Saxton, Gauss-Newton, and difference map approaches are quite similar to the standard PIE algorithm with a large, medium, and small step-size, respectively. This is understandable because, when there is no pupil estimation, the Gerchberg-Saxton algorithm is exactly the same as the standard PIE with a fixed step-size *α* = 1. The Gauss-Newton algorithm improves the robustness towards noise through the regularization parameter *γ*. Strictly speaking, when *γ* is much larger than 1, as in [15, 19] and in our case, it should not be termed a ‘regularization parameter’ (since regularization parameters are typically much less than 1); actually, it is equivalent to using a constant but reduced step-size (*α* = 1/6) for the standard PIE updating process.
If a very small step-size is used (e.g., *α* = 0.05), the convergence behavior of the incremental gradient approach approximates that of a global one (difference map), which can achieve a lower reconstruction error at the expense of a larger iteration number and longer runtime. Unlike the other four approaches, the Wirtinger flow algorithm performs better for the case of Gaussian noise and achieves the lowest reconstruction error. This is because the intensity-based cost function used in the Wirtinger flow algorithm implicitly assumes a Gaussian noise model [15], while the other four approaches minimize the amplitude-based cost function, which works better for the case of Poisson noise. Besides, the Wirtinger flow algorithm requires more than 1000 iterations to converge, and thus its total runtime is much longer than that of the other four approaches. In the case of Gaussian noise, the proposed adaptive step-size method effectively prevents overshooting and achieves the lowest error among the four amplitude-based approaches; in the case of Poisson noise, it takes fewer than half the iterations to achieve a similar accuracy as the global approach (difference map). Though the Gerchberg-Saxton algorithm converges fastest in terms of iteration number and runtime, it is the most sensitive to noise and tends to produce a noisy result with strong artifacts. Therefore, the proposed adaptive step-size method can be regarded as the most efficient one when both reconstruction accuracy and convergence speed are considered.
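The distinction between the amplitude-based cost [Eq. (3)] and the intensity-based cost used by Wirtinger flow can be written compactly. The flat-list implementation below is an illustrative sketch (the actual costs operate on the full 2-D image stack), but the two functional forms are the ones contrasted in the text.

```python
import math

def amplitude_cost(measured_I, model_field_abs):
    """Amplitude-based cost: squared difference between the measured amplitude
    (sqrt of intensity) and the modeled field magnitude; this is the form
    minimized by the PIE/ePIE-type approaches, better matched to Poisson noise."""
    return sum((math.sqrt(I) - a) ** 2 for I, a in zip(measured_I, model_field_abs))

def intensity_cost(measured_I, model_field_abs):
    """Intensity-based cost (the form used by Wirtinger flow), which implicitly
    assumes a Gaussian noise model on the measured intensities [15]."""
    return sum((I - a ** 2) ** 2 for I, a in zip(measured_I, model_field_abs))
```

Both vanish at a perfect fit, but they weight residuals differently: the intensity cost penalizes bright-pixel mismatches quadratically harder, which is one intuition for its greater sensitivity to Poisson-dominated model mismatch.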

## 6. Experiments

We experimentally verify the adaptive step-size strategy on a real FPM platform. The experimental setup generally followed our simulation parameters, except that a denser LED array was used. The lateral distance between two adjacent LEDs is 2.5 mm, and the LED array is controlled by a self-developed field-programmable gate array (FPGA) circuit. For all the experiments reported in the following, the central 21×21 LEDs in the array are used to illuminate the sample sequentially, resulting in a final synthetic NA (*NA _{syn}*) of ∼0.45 in both the x and y directions. The synthetic NA equals the sum of the *NA _{obj}* of the objective and the *NA _{ill}* of the illumination at the largest angle [37]. To achieve robust image reconstructions, a multi-exposure acquisition scheme is usually required to capture high-dynamic-range (HDR) images for both bright-field and dark-field illuminations [3]. However, in our experiments, all images were taken under the same exposure, chosen to accommodate the high intensity values of the bright-field images, in order to prevent sensor saturation and highlight the effect of noise.

#### 6.1. Experimental results of a USAF resolution target

A USAF resolution target was first imaged to quantify the resolution improvement for each algorithm. A small portion of interest [128×128 pixels, shown in Fig. 7(b)], which contains the smallest group of features (group 9 element 3, 1.56 *μ*m line pair spacing), was extracted from the full FoV image [Fig. 7(a)]. It can be seen from Fig. 7(b) that when we turn on the central LED only, the minimum resolvable feature is group 7 element 3 (6.2 *μ*m line pair spacing), which coincides with the well-known Rayleigh resolution limit of a conventional optical microscope under coherent illumination (*λ/NA _{obj}*). The maximum expected resolving power after FPM reconstruction is *λ/NA _{syn}* ≈ 1.4 *μ*m, which is sufficient for resolving the smallest feature in the USAF target. When dealing with experimental data, it is not always possible to obtain the true object complex function, so the MAE cannot be calculated. However, the normalized real-space error metric of the captured intensity images can be used as a measure or indicator of the reconstruction accuracy. The index is calculated as follows:
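The exact form of Eq. (17) is not reproduced here; a common normalized real-space error metric of this kind, sketched below as an assumption rather than the paper's precise definition, sums the squared amplitude mismatch over all captured images and normalizes by the total measured intensity so that *E* is dimensionless.

```python
import math

def normalized_intensity_error(measured_sets, model_sets):
    """A common normalized real-space error metric (assumed form, not
    necessarily the paper's exact Eq. (17)): summed squared amplitude mismatch
    over all captured low-resolution images, normalized by the total measured
    intensity, so the result is dimensionless and comparable across datasets."""
    num = 0.0
    den = 0.0
    for I_meas, I_model in zip(measured_sets, model_sets):
        for Im, Ie in zip(I_meas, I_model):
            num += (math.sqrt(Im) - math.sqrt(Ie)) ** 2
            den += Im
    return num / den
```

A perfect forward model gives *E* = 0, and the metric decreases monotonically as the estimated low-resolution intensities approach the measurements, which is how it serves as a convergence indicator in Fig. 8.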

The low-resolution image set corresponding to the small tile was recovered with the same strategies as in our simulation. The recovered high-resolution images with fixed (*α*=1, 0.5, and 0.05) and adaptive step-sizes are shown in Figs. 7(d)–7(k), respectively. The corresponding error metric *E* [Eq. (17)] as a function of iteration number is plotted in Fig. 8. For a large fixed step-size *α*=1, strong artifacts are superimposed on the final reconstruction result, which not only smear the background but also distort several small-scale features [Figs. 7(d) and 7(h)]. The artifacts can be alleviated by reducing the step-size to 0.5, but the two smallest groups of features (group 9 elements 2 and 3) are still not well resolved [Figs. 7(e) and 7(i)]. A small fixed step-size *α*=0.05 produced a much cleaner result with very little background noise, at the expense of a slower convergence rate [Fig. 7(f)]. Besides, as shown in the magnified area [Fig. 7(j)], the high-spatial-frequency details appear over-smoothed by the algorithm, resulting in blurred numbers and line structures at group 9 elements 2 and 3. These results indicate that an improper choice of step-size can lead to blurring of high-frequency details if the step-size is too small, or to suboptimal removal of noise artifacts if it is too large. Figures 7(g) and 7(k) show the result reconstructed by the proposed adaptive step-size strategy. A video sequence (Visualization 1) is also presented to show the evolution of the iterative reconstruction process. It can be seen that a sharp but noisy reconstruction was quickly established during the first few iterations. Then the background noise gradually diminished and high-frequency inconsistencies were gradually refined as the step-size decreased. Finally, the iteration process converged smoothly after 17 iterations (*α ^{k}* < 0.001), resulting in a better reconstructed image with a uniform background and all groups of features clearly resolved.

Beyond its fixed step-size counterparts, the proposed adaptive step-size approach was also experimentally compared with the four state-of-the-art FPM reconstruction algorithms, using the same parameter settings as in the simulation. Figure 9 shows the reconstructed amplitudes of the different approaches, which generally mirror our simulation results. Note that for each algorithm, the runtime is longer than in the simulation because the raw dataset contains more images with four times the pixel resolution, which increases the runtime per iteration. As an equivalent of PIE with a unity step-size, the Gerchberg-Saxton algorithm induced severe artifacts in the background, and several small-scale features were distorted [Figs. 9(a) and 9(f)]. By simply reducing the step-size (introducing the regularization parameter *γ*), the background noise was greatly suppressed at the expense of more iterations required for convergence [Figs. 9(b) and 9(g)]. The difference map took 50 iterations to produce a clean image, but some fine line structures are not well reconstructed [Figs. 9(c) and 9(h)]. The result is quite similar to that of PIE with a small fixed step-size [Figs. 7(f) and 7(j)], because using a small *α* in an incremental gradient approach tends to make the algorithm more ‘global’. The Wirtinger flow algorithm was run for 1000 iterations but finally produced an image with some ringing artifacts [Figs. 9(d) and 9(i)]. This may be because the noise in the realistic data is mainly Poisson, and the intensity-based cost function is more sensitive to model mismatch errors [15]. Comparatively, the proposed adaptive step-size approach achieves the best reconstruction quality with little computational overhead [Figs. 9(e) and 9(j)].

#### 6.2. Experimental results of a pathological section

Finally, the proposed adaptive step-size strategy was tested on a biological sample (a pathological section of human kidney), where the phase part of the complex field often contains valuable information about the studied specimen. Figure 10(a) presents the full FOV of the specimen, and Figs. 10(b) and 10(c) show the corresponding magnified area of interest. Though the histological section is stained red with hematoxylin and eosin (H. & E.), it is poorly visible under monochromatic red illumination. Figures 10(c)–10(d) show the reconstructed amplitude and phase images of the four state-of-the-art algorithms and the adaptive step-size approach, respectively. Although the resolution improvements in the results of all five reconstruction algorithms are obvious, the adaptive step-size approach once again outperforms the other methods, with high reconstruction quality, low noise-induced artifacts, and a fast convergence rate simultaneously achieved (Visualization 2).

## 7. Conclusions and discussions

In this work, several state-of-the-art methods for solving the ptychographic phase retrieval problem have been studied from a numerical analysis point of view. It has been found that the PIE and ePIE algorithms can be classified as incremental gradient descent approaches in the numerical optimization community, and their convergence behaviors are quite different from those of the global gradient descent approaches. Though incremental gradient approaches often offer reduced computational cost and a faster convergence rate than global approaches, we prove that their convergence, even in the convex setting, requires the step-size to be gradually diminishing. An adaptive step-size strategy has therefore been introduced into incremental solutions of ptychographic imaging and shown to achieve significant improvement in the stability and robustness of the reconstruction towards noise. Furthermore, the algorithm offers a fast convergence rate with little computational overhead. As verified by simulations and experimental results in the context of FPM, it consistently produces good performance without manual parameter tuning or preknowledge about the noise statistics. In addition, we have compared our method with four state-of-the-art FPM reconstruction algorithms. It has been observed that in the first few iterations, it approaches a neighborhood of the solution quickly, just like the Gerchberg-Saxton algorithm, and the vanishing step-size then allows it to avoid stagnation at a poor local minimum and improve the quality of the reconstruction. The introduction of an adaptive step-size thus provides a simple tool which, in all of our numerical simulations and experiments, outperforms the current standards in terms of convergence speed and reconstruction quality without adding complexity to the reconstruction process. Our MATLAB source code for this algorithm has been made available at [38].

In this work, we only reconstruct the object function and assume that the pupil function and LED positions are accurately known a priori. It should be noted that several studies have demonstrated that, with the help of the inherent data redundancy of ptychography, it is possible to extract those inaccurately known system parameters along with the object complex function from the original FPM dataset [9, 12–14, 16]. As a result, the required setup stability and accuracy of system calibration can be relaxed. Recently, by incorporating the pixel sampling model and the multi-slice model into the original PIE framework, the data redundancy in FPM has further provided the potential of overcoming the pixel aliasing problem [21] and of depth-resolved imaging for thick specimens [25, 26]. Since the proposed adaptive step-size strategy is a simple extension of the original PIE and ePIE, we envisage that it can be straightforwardly extended to those more sophisticated algorithms for pupil recovery [9,12], misalignment correction [13,16], and multi-slice imaging [25, 26], and has the potential to be applied to label-free live cell imaging [39] and technical surface inspection [40]. In Appendix B, it is demonstrated that the same adaptive step-size rule can be well applied to the pupil updating (the adaptive *α* is used for both object and pupil updating). Nevertheless, based on the simulation results presented in Fig. 11, we recommend using a smaller initial value for the pupil step-size ( ${\beta}^{0}=1/\sqrt{N}$, where *N* is the total number of low-resolution images in the dataset) to get the best performance from the adaptive step-size algorithm.
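The recommended initialization can be sketched in a few lines; the function name is illustrative, but the values follow the Appendix B recommendation directly (*α*^{0} = 1 for the object, *β*^{0} = 1/√N for the pupil, with both halved together whenever an iteration makes no apparent error reduction).

```python
import math

def initial_step_sizes(num_images):
    """Initial object and pupil step-sizes per the Appendix B recommendation:
    alpha^0 = 1 for the object and beta^0 = 1/sqrt(N) for the pupil, where N is
    the number of low-resolution images.  Since both are halved together when
    progress stalls, beta stays sqrt(N) times smaller than alpha throughout."""
    alpha0 = 1.0
    beta0 = 1.0 / math.sqrt(num_images)
    return alpha0, beta0

a0, b0 = initial_step_sizes(225)   # N = 225 images gives beta^0 = 1/15
```

Keeping the *β*/*α* ratio fixed during halving reflects the fact that the pupil is updated once per sub-aperture while any given region of the object spectrum is updated far less often.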

## Appendix A: Convergence proof of incremental solution with diminishing step-sizes in the convex setting

In this Appendix, we prove the global convergence of the incremental gradient solution [Eq. (11)] with diminishing step-sizes in the convex setting. The main results of the proof are adapted from [41] and [42]. The required assumptions are the following: each *ε _{i}* is convex and ∇*ε _{i}* is bounded on ℂ; the **W** _{i} in Eq. (11) is some nonnegative definite diagonal matrix (function) that is independent of the subiteration index *i* (**W** _{i} ≡ **W**). Define a weighted norm ‖·‖ _{**W**^{−1}} by ${\Vert \mathbf{O}\Vert}_{{\mathbf{W}}^{-1}}=\sqrt{{\mathbf{O}}^{*}{\mathbf{W}}^{-1}\mathbf{O}}$, and suppose that *ε*^{★} = inf _{**O**∈ℂ} *ε*(**O**). The following Lemmas 1 and 2 are given first and will be used in the convergence proof.
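As a concrete example of a step-size sequence meeting the diminishing-step requirements used in the lemmas and theorem of this Appendix (a classical choice, not one prescribed by the paper):

```latex
% A classical diminishing step-size sequence satisfying all three conditions:
\alpha^{k} = \frac{a}{k+b}, \qquad a > 0,\; b > 0,
\]
\[
\lim_{k\to\infty}\alpha^{k} = 0, \qquad
\sum_{k=0}^{\infty}\alpha^{k} = \infty, \qquad
\sum_{k=0}^{\infty}\bigl(\alpha^{k}\bigr)^{2} < \infty .
```

Lemma 2 needs only the first two conditions; the Convergence Theorem additionally requires the square-summability.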

**Lemma 1**: Let {**O**^{k}} be a sequence generated by the incremental gradient solution Eq. (11). Then for all **O** ∈ ℂ, all *k*, and some *C* > 0, we have:

${\Vert {\mathbf{O}}^{k+1}-\mathbf{O}\Vert}_{{\mathbf{W}}^{-1}}^{2}\le {\Vert {\mathbf{O}}^{k}-\mathbf{O}\Vert}_{{\mathbf{W}}^{-1}}^{2}-2{\alpha}^{k}\left(\epsilon \left({\mathbf{O}}^{k}\right)-\epsilon \left(\mathbf{O}\right)\right)+{\left({\alpha}^{k}\right)}^{2}C$

*Proof*: One can verify that the algorithm Eq. (11) is equivalent to an incremental subgradient iteration measured in the weighted norm ‖·‖ _{**W**^{−1}}, so the bound follows from Lemma 2.1 of [41], with the constant *C* determined by the bound on ∇*ε _{i}*.

Lemma 1 guarantees that, with a cyclic order, the incremental gradient solution Eq. (11) yields a point **O**^{k+1} at the end of the cycle that is closer to **O** than **O**^{k}, provided the step-size *α*^{k} is less than 2(*ε*(**O**^{k}) − *ε*(**O**))*C*^{−1}. In particular, for any *e* > 0 and assuming that there exists an optimal solution **O**^{★}, either *ε*(**O**^{k}) is within $\frac{1}{2}{\alpha}^{k}C+e$ of the optimal value, or the squared distance to **O**^{★} will be strictly decreased by at least 2*α*^{k}*e*. Consequently, for a constant step-size (*α*^{k} ≡ *α*), convergence can only be established to a neighborhood of the optimum, which shrinks to 0 as *α* → 0; with a diminishing step-size, convergence in function value is obtained, as stated in Lemma 2.

**Lemma 2**: Let {**O**^{k}} be a sequence generated by the incremental gradient solution Eq. (11) with a step-size *α*^{k} such that lim _{k→∞} *α*^{k} = 0 and ${\sum}_{k=0}^{\infty}{\alpha}^{k}=\infty $. Then lim inf _{k→∞} *ε*(**O**^{k}) = *ε*^{★}.

*Proof*: The proof uses Proposition 1.2 of [42]. Assume for contradiction that there are *δ* > 0, *K* ∈ ℕ, and *υ* ∈ ℂ such that *ε*(*υ*) < *ε*(**O**^{k}) − *δ* for all *k* ≥ *K*. Since lim _{k→∞} *α*^{k} = 0, we can assume that *K* is so large that *α*^{k}*C* < *δ* for all *k* ≥ *K*, where *C* > 0 is the constant from Lemma 1. Using Lemma 1 with **O** = *υ*, we obtain:

${\Vert {\mathbf{O}}^{k+1}-\upsilon \Vert}_{{\mathbf{W}}^{-1}}^{2}\le {\Vert {\mathbf{O}}^{k}-\upsilon \Vert}_{{\mathbf{W}}^{-1}}^{2}-2{\alpha}^{k}\delta +{\left({\alpha}^{k}\right)}^{2}C\le {\Vert {\mathbf{O}}^{k}-\upsilon \Vert}_{{\mathbf{W}}^{-1}}^{2}-{\alpha}^{k}\delta $

for all *k* ≥ *K*. Summing up, this gives:

${\Vert {\mathbf{O}}^{k+1}-\upsilon \Vert}_{{\mathbf{W}}^{-1}}^{2}\le {\Vert {\mathbf{O}}^{K}-\upsilon \Vert}_{{\mathbf{W}}^{-1}}^{2}-\delta {\sum}_{n=K}^{k}{\alpha}^{n}$

for all *k* > *K*. Letting *k* → ∞, the left-hand side remains nonnegative while the right-hand side tends to −∞, so ${\sum}_{k=0}^{\infty}{\alpha}^{k}=\infty $ is contradicted.

**Convergence Theorem**: Let {**O**^{k}} be a sequence generated by the algorithm Eq. (11) with a step-size *α*^{k} such that lim _{k→∞} *α*^{k} = 0, ${\sum}_{k=0}^{\infty}{\alpha}^{k}=\infty $, and ${\sum}_{k=0}^{\infty}{\left({\alpha}^{k}\right)}^{2}<\infty $. Then {**O**^{k}} converges to some **O**^{★} ∈ Λ^{★} = {*υ* ∈ ℂ : *ε*(*υ*) ≤ *ε*(**O**), ∀**O** ∈ ℂ}.

*Proof*: Using Lemma 1 with some *υ* ∈ Λ^{★}, we have:

${\Vert {\mathbf{O}}^{k+1}-\upsilon \Vert}_{{\mathbf{W}}^{-1}}^{2}\le {\Vert {\mathbf{O}}^{k}-\upsilon \Vert}_{{\mathbf{W}}^{-1}}^{2}-2{\alpha}^{k}\left(\epsilon \left({\mathbf{O}}^{k}\right)-{\epsilon}^{\star}\right)+{\left({\alpha}^{k}\right)}^{2}C$

for all *k*. Summing this inequality from *n* to *n* + *q* − 1 gives:

${\Vert {\mathbf{O}}^{n+q}-\upsilon \Vert}_{{\mathbf{W}}^{-1}}^{2}\le {\Vert {\mathbf{O}}^{n}-\upsilon \Vert}_{{\mathbf{W}}^{-1}}^{2}-2{\sum}_{m=n}^{n+q-1}{\alpha}^{m}\left(\epsilon \left({\mathbf{O}}^{m}\right)-{\epsilon}^{\star}\right)+C{\sum}_{m=n}^{n+q-1}{\left({\alpha}^{m}\right)}^{2}$

for all *n* and *q*, where *C* is the constant from Lemma 1. Since ${\sum}_{k=0}^{\infty}{\left({\alpha}^{k}\right)}^{2}<\infty $ and *ε*(**O**^{n}) − *ε*^{★} ≥ 0 for all *n*, this implies that ${\sum}_{n=0}^{\infty}{\alpha}^{n}\left(\epsilon \left({\mathbf{O}}^{n}\right)-{\epsilon}^{\star}\right)<\infty $ and that the sequence {**O**^{k}} is bounded. Furthermore, by Lemma 2, there exists a subsequence {**O**^{k_n}} of {**O**^{k}} such that lim _{n→∞} *ε*(**O**^{k_n}) = *ε*^{★}. Since {**O**^{k_n}} is bounded, there exists a subsequence {**O**^{k_{n_l}}} of {**O**^{k_n}} that converges to some **O**^{★} ∈ ℂ. By the continuity of *ε*, we have *ε*(**O**^{★}) = *ε*^{★}, that is, **O**^{★} ∈ Λ^{★}. We have thus obtained a limit point **O**^{★} ∈ Λ^{★} of {**O**^{k}}. Now we follow the line of the proof of Proposition 1.3 of [42]. For any *δ* > 0, take *K* ∈ ℕ such that ${\Vert {\mathbf{O}}^{K}-{\mathbf{O}}^{\star}\Vert}_{{\mathbf{W}}^{-1}}^{2}\le \delta /2$ and ${\sum}_{n=K}^{\infty}\left(-2{\alpha}^{n}\left(\epsilon \left({\mathbf{O}}^{n}\right)-{\epsilon}^{\star}\right)+{\left({\alpha}^{n}\right)}^{2}C\right)\le \delta /2$. Using Lemma 1, one obtains ${\Vert {\mathbf{O}}^{k}-{\mathbf{O}}^{\star}\Vert}_{{\mathbf{W}}^{-1}}^{2}\le \delta $ for all *k* ≥ *K*. Hence {**O**^{k}} converges to the optimal solution **O**^{★} ∈ Λ^{★}. Note that, comparing Eq. (22) with Eq. (25), an additional mild restriction on the step-size sequence, ${\sum}_{k=0}^{\infty}{\left({\alpha}^{k}\right)}^{2}<\infty $, is included to guarantee convergence of the entire sequence {**O**^{k}}.

## Appendix B: Performance analysis of adaptive step-size strategy for pupil recovery

In this Appendix, we evaluate the performance of the adaptive step-size strategy for pupil recovery using the simulated data of Section 5, but with an additional pupil aberration error. The update of the pupil function is the same as that of the object function [Eq. (11)], with the roles of the object and pupil exchanged. An additional pupil step-size *β* is introduced, which is often set to *β*^{k} = *β* ≡ 1 (the same as the object step-size) as suggested in [9,12,13]. Here we wish to demonstrate that the same adaptive step-size rule can be well applied to the pupil updating (the adaptive *α* is used for both object and pupil updating). Figure 11 shows the convergence curves as well as the reconstructed amplitudes and phases with various fixed step-sizes and the adaptive step-size under aberrated and noisy data (the images were corrupted with 50% Poisson noise). As can be seen, using the large fixed step-sizes *α* = *β* = 1 leads to slow divergence and finally incurs relatively large errors after 100 iterations. Decreasing the step-size slows down but stabilizes the convergence process, finally reaching points with lower errors. Comparatively, the adaptive step-size approach gives a better complex-field reconstruction as well as a more accurate pupil estimation within much fewer iterations. We further found that additional improvements can be obtained by reducing the initial value of the pupil step-size to about 0.067 [ ${\beta}^{0}=1/\sqrt{N}=1/15$, where *N* is the total number of LEDs (225)]. In such a case, both *α*^{k} and *β*^{k} are halved when no apparent reduction in the error metric is made by the iteration, so *β*^{k} is always $\sqrt{N}$ times smaller than *α*^{k} during the entire iteration process. The rationale for this lies in the fact that the pupil function is more frequently updated than the object function (as all of the sub-spectrum images share the same pupil function, the pupil function can be updated at each sub-pupil region; but the object function at a particular location in Fourier space can only be updated when it is covered by the sub-pupil). This intuitive modification helps to establish synchronous convergence of both the object and pupil functions, and thus allows for a further improvement in algorithm performance. It is also found that a further reduction of the initial pupil step-size to *β*^{0} = 0.01 only slows down the convergence and offers no further improvement in reconstruction quality.

## Funding

National Natural Science Fund of China (NSFC) (11574152, 61505081, 61377003); ‘Six Talent Peaks’ project (2015-DZXX-009); ‘333 Engineering’ research project (BRA2015294); Fundamental Research Funds for the Central Universities (30915011318); Open Research Fund of Jiangsu Key Laboratory of Spectral Imaging & Intelligent Sense (3092014012200417).

## Acknowledgments

C. Zuo thanks the support of the ’Zijin Star’ program of Nanjing University of Science and Technology.

## References and links

**1. **H. M. L. Faulkner and J. M. Rodenburg, “Movable aperture lensless transmission microscopy: A novel phase retrieval algorithm,” Phys. Rev. Lett. **93**, 023903 (2004). [CrossRef] [PubMed]

**2. **J. M. Rodenburg and H. M. L. Faulkner, “A phase retrieval algorithm for shifting illumination,” Appl. Phys. Lett. **85**, 4795–4797 (2004). [CrossRef]

**3. **G. Zheng, R. Horstmeyer, and C. Yang, “Wide-field, high-resolution Fourier ptychographic microscopy,” Nat. Photon. **7**, 739–745 (2013). [CrossRef]

**4. **R. W. Gerchberg and W. O. Saxton, “A practical algorithm for the determination of the phase from image and diffraction plane pictures,” Optik **35**, 237–249 (1972).

**5. **J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. **21**, 2758–2769 (1982). [CrossRef] [PubMed]

**6. **V. Mico, Z. Zalevsky, P. García-Martínez, and J. García, “Synthetic aperture superresolution with multiple off-axis holograms,” J. Opt. Soc. Am. A **23**, 3162–3170 (2006). [CrossRef]

**7. **C. Yuan, H. Zhai, and H. Liu, “Angular multiplexing in pulsed digital holography for aperture synthesis,” Opt. Lett. **33**, 2356–2358 (2008). [CrossRef] [PubMed]

**8. **P. Gao, G. Pedrini, and W. Osten, “Structured illumination for resolution enhancement and autofocusing in digital holographic microscopy,” Opt. Lett. **38**, 1328–1330 (2013). [CrossRef] [PubMed]

**9. **A. M. Maiden and J. M. Rodenburg, “An improved ptychographical phase retrieval algorithm for diffractive imaging,” Ultramicroscopy **109**, 1256–1262 (2009). [CrossRef] [PubMed]

**10. **M. Guizar-Sicairos and J. R. Fienup, “Phase retrieval with transverse translation diversity: a nonlinear optimization approach,” Opt. Express **16**, 7264–7278 (2008). [CrossRef] [PubMed]

**11. **P. Thibault, M. Dierolf, O. Bunk, A. Menzel, and F. Pfeiffer, “Probe retrieval in ptychographic coherent diffractive imaging,” Ultramicroscopy **109**, 338–343 (2009). [CrossRef] [PubMed]

**12. **X. Ou, G. Zheng, and C. Yang, “Embedded pupil function recovery for Fourier ptychographic microscopy,” Opt. Express **22**, 4960–4972 (2014). [CrossRef] [PubMed]

**13. **A. Maiden, M. Humphry, M. Sarahan, B. Kraus, and J. Rodenburg, “An annealing algorithm to correct positioning errors in ptychography,” Ultramicroscopy **120**, 64–72 (2012). [CrossRef] [PubMed]

**14. **F. Zhang, I. Peterson, J. Vila-Comamala, A. Diaz, F. Berenguer, R. Bean, B. Chen, A. Menzel, I. K. Robinson, and J. M. Rodenburg, “Translation position determination in ptychographic coherent diffraction imaging,” Opt. Express **21**, 13592–13606 (2013). [CrossRef] [PubMed]

**15. **L.-H. Yeh, J. Dong, J. Zhong, L. Tian, M. Chen, G. Tang, M. Soltanolkotabi, and L. Waller, “Experimental robustness of Fourier ptychography phase retrieval algorithms,” Opt. Express **23**, 33214–33240 (2015). [CrossRef]

**16. **J. Sun, Q. Chen, Y. Zhang, and C. Zuo, “Efficient positional misalignment correction method for fourier ptychographic microscopy,” Biomed. Opt. Express **7**, 1336–1350 (2016). [CrossRef] [PubMed]

**17. **P. Thibault and A. Menzel, “Reconstructing state mixtures from diffraction measurements,” Nature **494**, 68–71 (2013). [CrossRef] [PubMed]

**18. **D. J. Batey, D. Claus, and J. M. Rodenburg, “Information multiplexing in ptychography,” Ultramicroscopy **138**, 13–21 (2014). [CrossRef] [PubMed]

**19. **L. Tian, X. Li, K. Ramchandran, and L. Waller, “Multiplexed coded illumination for Fourier ptychography with an LED array microscope,” Biomed. Opt. Express **5**, 2376–2389 (2014). [CrossRef] [PubMed]

**20. **S. Dong, R. Shiradkar, P. Nanda, and G. Zheng, “Spectral multiplexing and coherent-state decomposition in Fourier ptychographic imaging,” Biomed. Opt. Express **5**, 1757–1767 (2014). [CrossRef] [PubMed]

**21. **J. Sun, Q. Chen, Y. Zhang, and C. Zuo, “Sampling criteria for Fourier ptychographic microscopy in object space and frequency space,” Opt. Express **24**, 15765–15781 (2016) [CrossRef]

**22. **C. Yang, J. Qian, A. Schirotzek, F. Maia, and S. Marchesini, “Iterative algorithms for ptychographic phase retrieval,” arXiv preprint arXiv:1105.5628 (2011).

**23. **R. Horstmeyer, R. Y. Chen, X. Ou, B. Ames, J. A. Tropp, and C. Yang, “Solving ptychography with a convex relaxation,” New Journal of Physics **17**, 053044 (2015). [CrossRef] [PubMed]

**24. **L. Bian, J. Suo, G. Zheng, K. Guo, F. Chen, and Q. Dai, “Fourier ptychographic reconstruction using Wirtinger flow optimization,” Opt. Express **23**, 4856–4866 (2015). [CrossRef] [PubMed]

**25. **L. Tian and L. Waller, “3D intensity and phase imaging from light field measurements in an LED array microscope,” Optica **2**, 104–111 (2015). [CrossRef]

**26. **A. M. Maiden, M. J. Humphry, and J. M. Rodenburg, “Ptychographic transmission microscopy in three dimensions using a multi-slice approach,” J. Opt. Soc. Am. A **29**, 1606–1614 (2012). [CrossRef]

**27. **D. R. Luke, “Relaxed averaged alternating reflections for diffraction imaging,” Inverse Probl. **21**, 37–50 (2005). [CrossRef]

**28. **S. Marchesini, H. He, H. N. Chapman, S. P. Hau-Riege, A. Noy, M. R. Howells, U. Weierstall, and J. C. H. Spence, “X-ray image reconstruction from a diffraction pattern alone,” Phys. Rev. B **68**, 140101(R) (2003). [CrossRef]

**29. **H. H. Bauschke, P. L. Combettes, and D. R. Luke, “Phase retrieval, error reduction algorithm, and Fienup variants: a view from convex optimization,” J. Opt. Soc. Am. A **19**, 1334–1345 (2002) [CrossRef]

**30. **P. S. Diniz, *Adaptive Filtering* (Springer, 1997). [CrossRef]

**31. **P. Godard, M. Allain, V. Chamard, and J. Rodenburg, “Noise models for low counting rate coherent diffraction imaging,” Opt. Express **20**, 25914–25934 (2012). [CrossRef] [PubMed]

**32. **P. Thibault and M. Guizar-Sicairos, “Maximum-likelihood refinement for coherent diffractive imaging,” New Journal of Physics **14**, 063004 (2012). [CrossRef]

**33. **J. Rodenburg, A. Hurst, and A. Cullis, “Transmission microscopy without lenses for objects of unlimited size,” Ultramicroscopy **107**, 227–231 (2007). [CrossRef]

**34. **D. P. Bertsekas, *Nonlinear Programming* (Athena Scientific, 1999), 2nd ed.

**35. **D. P. Bertsekas, “Incremental gradient, subgradient, and proximal methods for convex optimization: A survey,” Optimization for Machine Learning **2010**, 1–38 (2011).

**36. **K. Guo, S. Dong, P. Nanda, and G. Zheng, “Optimization of sampling pattern and the design of Fourier ptychographic illuminator,” Opt. Express **23**, 6171–6180 (2015). [CrossRef] [PubMed]

**37. **X. Ou, R. Horstmeyer, G. Zheng, and C. Yang, “High numerical aperture Fourier ptychography: principle, implementation and characterization,” Opt. Express **23**, 3472–3491 (2015). [CrossRef] [PubMed]

**38. **http://www.scilaboratory.com/h-col-123.html

**39. **L. Tian, Z. Liu, L. Yeh, M. Chen, J. Zhong, and L. Waller, “Computational illumination for high-speed in vitro Fourier ptychographic microscopy,” Optica **2**, 904–911 (2016) [CrossRef]

**40. **T. M. Godden, A. Muñiz-Piniella, J. D. Claverley, A. Yacoot, and M. J. Humphry, “Phase calibration target for quantitative phase imaging with ptychography,” Opt. Express **24**, 7679–7692 (2016) [CrossRef] [PubMed]

**41. **A. Nedic and D. P. Bertsekas, “Incremental subgradient methods for nondifferentiable optimization,” SIAM J. Optim. **12**, 109–138 (2001). [CrossRef]

**42. **R. Correa and C. Lemaréchal, “Convergence of some algorithms for convex minimization,” Math. Program. **62**, 261–275 (1993). [CrossRef]