## Abstract

In microscopy, proper modeling of the image formation has a substantial effect on the precision and accuracy in localization experiments and facilitates the correction of aberrations in adaptive optics experiments. The observed images are subject to polarization effects, refractive index variations, and system specific constraints. Previously reported techniques have addressed these challenges by using complicated calibration samples, computationally heavy numerical algorithms, and various mathematical simplifications. In this work, we present a phase retrieval approach based on an analytical derivation of the vectorial diffraction model. Our method produces an accurate estimate of the system’s phase information, without any prior knowledge about the aberrations, in under a minute.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Attaining phase information in diffraction experiments is of great practical interest; however, intensity measurements, i.e. images, do not explicitly capture complex values. This hidden information can be estimated by phase retrieval (PR) [1] – namely, recovery of a complex signal from intensity measurements of its Fourier transform, which has therefore been employed in a wide variety of applications, e.g. X-ray crystallography [2], astronomy [3], optical ptychography [4,5], adaptive optics [6,7], and the subject of this work: localization microscopy [8,9].

In a typical 2D localization microscopy experiment, the localization step is performed by fitting images of well-separated emitters with a known point-spread function (PSF), e.g. an Airy disc. The reliance on precise knowledge of the PSF is highlighted in single-molecule localization microscopy (SMLM), where the signal-to-noise ratio (SNR) is limited. Acquiring additional information can be achieved by PSF engineering, where the PSF shape is intentionally perturbed to encode the desired information. This approach has been used for 3D measurements, e.g. astigmatism [10], Tetrapod [11] and double helix (DH) [12] PSFs; encoding emitter wavelength [13–15] and molecular orientation [16]. A closely related application is adaptive optics, which employs additional optical elements to counteract aberrations present in the imaging system that distort the PSF. In all mentioned cases, PR can be used to correct the theoretically simulated PSFs or to design the phase mask which creates them [17].

In PR, the complex signal is numerically estimated from the magnitude of its Fourier transform. Any employed algorithm requires the nonobvious selection of a penalty function, and must overcome degeneracies in the solution, unstable derivatives, non-convex optimization, and more [18]. So far, PR methods have made use of the iterative Gerchberg-Saxton (GS) algorithm [19] and its variations [20], estimation over the Zernike basis of aberrations [21–23], and pixel-wise numerical gradient calculations [24]. The first consideration of these algorithms is the tradeoff between computational efficiency and model accuracy; namely, a limited number of parameters must be chosen to describe the phase mask and various model approximations are employed. To reduce computational burden, the mentioned discrete approaches employ coarse graining in the physical space, while the methods that use a polynomial basis, e.g. Zernike polynomials, reduce the number of parameters by modeling only the most common optical aberrations.
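For reference, the classic GS scheme alternates between enforcing the measured Fourier magnitude and the object-domain constraints. A minimal sketch in Python/NumPy (assuming, for illustration, a known object-domain support mask and a non-negative object; this is the textbook iteration, not the algorithm developed in this work):

```python
import numpy as np

def gerchberg_saxton(fourier_mag, support, n_iter=300, seed=0):
    """Recover a real, support-limited signal from its Fourier magnitude.

    fourier_mag : measured |FFT| of the unknown object (2D array)
    support     : boolean mask where the object may be nonzero
    """
    rng = np.random.default_rng(seed)
    x = rng.random(fourier_mag.shape) * support  # random initial guess
    for _ in range(n_iter):
        G = np.fft.fft2(x)
        # Fourier-domain constraint: keep the phase, impose the measured magnitude
        G = fourier_mag * np.exp(1j * np.angle(G))
        x = np.real(np.fft.ifft2(G))
        # Object-domain constraint: non-negativity and support
        x = np.clip(x, 0, None) * support
    return x
```

The degeneracies mentioned above (e.g. translations and conjugate flips share the same Fourier magnitude) mean the recovered object may be a trivially transformed copy of the original.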

A second consideration is the selection of an optical model. Thus far, most methods employ the scalar diffraction model [21,25]. The scalar model is more computationally efficient to simulate than the vectorial model; however, this approximation deteriorates for emitters near the coverslip, precisely where most calibration measurements are performed. While in some cases, PR can be performed within a biological sample (away from the coverslip), which would make the scalar model appropriate [26], it is often impractical.

In this work, we present a vectorial implementation of phase retrieval (VIPR): a PR technique based on an analytical derivative of the vectorial diffraction model that converges to a robust and accurate solution in a pixel-wise manner. Our implementation is very fast thanks to GPU-optimized fast Fourier transforms (FFTs). We provide flexible software, suitable for both freely rotating emitters and fixed dipoles [27]. Furthermore, our method requires no priors on the pupil-plane phase pattern (making it especially suitable for adaptive optics) and, intriguingly, also corrects for experimentally observed effects, e.g. the axial-position-dependent lateral shift (also known as wobble), removing the need to calibrate and correct data in post-processing [28]. Our approach is designed to retrieve the phase information from a simple calibration measurement: images of a single bright emitter over a range of focal positions. The retrieved phase can then be applied to volumetric samples, e.g. 3D imaging of mitochondria and microtubule blinking data in cell imaging.

## 2. Methods

#### 2.1 Optical models

An illustration of the optical setup is shown in Fig. 1(a). Briefly, the imaging system consists of a Nikon Eclipse-Ti inverted fluorescence microscope with a Plan Apo 100X/1.45 NA Nikon objective. The emission path was extended with two achromatic doublet lenses (focal length 15 cm) forming a 4-f system. A liquid crystal spatial light modulator (LC-SLM, Meadowlark 1920×1152 liquid crystal on silicon) is placed in the conjugate back focal plane, with an EMCCD camera (iXon 897 Life, Andor) at the image plane. The calibration sample consists of a water-covered glass coverslip (Fisher Scientific) with 40 nm fluorescent beads adhered to the surface (FluoSpheres (580/605), ThermoFisher), which is illuminated with $50\;\mathrm{W/cm^2}$, 560 nm light (iChrome MLE, Toptica).

The 3D PSF of the microscope can be modeled in various ways. A frequently used model is known as the scalar approximation. This model presupposes that the emission pattern of each point source is spherical. This is a good approximation when the objective NA is low, the emitter is rotationally mobile [29], and particularly when imaging emitters far from optical interfaces, i.e. beyond near-field effects such as super-critical angle fluorescence (SAF). Another crucial difference between the vectorial and scalar models is that volumetric imaging far from the coverslip (beyond ∼1 wavelength) has an effective NA that is limited by the sample’s refractive index rather than by the objective NA (details in Appendix A). The main advantage of the scalar model is its reduced computational complexity in numerical simulations. Our method implements the more complex vectorial model (in addition to the scalar model) to account for the mentioned effects.

Designing arbitrary PSFs (Figs. 1(a)–1(b)) poses two additional challenges: (1) the 3D PSF is smooth, due to the diffraction limit; as seen in Fig. 1(b), the designed letters had to be spaced by $1\;\mu m$. (2) All the optical aberrations (system and sample specific) are unaccounted for, and adaptive optics should then be deployed to improve the result.

In this section, we discuss freely-rotating emitters for both the scalar and vectorial models. Here, the intensity pattern $I(x,y;\alpha ,\vec{q},P,{\sigma _B})$ of a quasi-monochromatic source at the image plane is given by [30–32]:

$$I({x,y;\alpha ,\vec{q},P,{\sigma _B}} )= {B_{{\sigma _B}}} \ast \sum\limits_w {{{|{{g_{img,w}}({x,y;\alpha ,\vec{q},P} )} |}^2}} \qquad (1)$$

$P$ is the phase aberration (phase mask) from the ideal (clear aperture) system, and $\alpha$ encompasses the optical parameters required to model the imaging system, including refractive indices, objective parameters, emission wavelength and more. $\vec{q}$ describes the coordinates relative to the coverslip: $\vec{q} = ({x_0},{y_0},{z_0}, -NFP)$; $({x_0},{y_0},{z_0})$ are the emitter’s position coordinates, and the nominal focal plane ($NFP$) is the objective displacement from focusing on the coverslip. $w$ indexes the electrical field components in the model (Appendix A): $w = 1$ for the scalar model, $w = 1,2,\ldots,6$ for the full vectorial description, and $w = 1,2,3$ or $w = 4,5,6$ for the vectorial model with a linear polarizer, relevant for polarization-dependent LC-SLMs. $B$ denotes the optical blurring (estimated by a Gaussian kernel with a spatial standard deviation of ${\sigma _B}$). In some models [30,33], an additional filter is added which describes the 3D distribution of emitters for the case of large fluorescent particles. For beads smaller than the wavelength, we approximate these filters by a Gaussian blur to reduce computational complexity.

The electrical field components ${g_{img,w}}(x,y;\alpha ,\vec{q},P)$ in Eq. (1) are generated by:

In the next sections, we assume that the optical system is fixed and that we measure a pixelated version of the intensity $I(x,y;\alpha ,\vec{q},P,{\sigma _B})$. Thus, we drop the dependence on the optical system parameters $\alpha$, and keep only the phase mask $P$ and the blur kernel std ${\sigma _B}$ as free parameters, e.g. $I(n;P,\vec{q},{\sigma _B})$, where $n = 1,2,\ldots,N$ is the pixel number.
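The structure of this forward model can be sketched directly: apply the phase mask to each pupil-plane field component, propagate with an FFT, sum the component intensities, and blur. In the toy Python/NumPy version below, the pupil fields `E_bfp` are arbitrary placeholders rather than the vectorial fields of Appendix A, so it illustrates the pipeline only:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Gaussian blur B with std sigma (pixels), applied via its FFT-domain
    transfer function exp(-2 pi^2 sigma^2 f^2); periodic boundaries."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    H = np.exp(-2.0 * np.pi ** 2 * sigma ** 2 * (fy ** 2 + fx ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * H))

def simulate_psf(E_bfp, phase_mask, sigma_b):
    """Toy version of Eq. (1): I = B * sum_w |FFT{ E_bfp,w exp(iP) }|^2.

    E_bfp      : (W, M, M) complex pupil-plane field components
    phase_mask : (M, M) phase mask P in radians
    sigma_b    : blur kernel std (pixels)
    """
    fields = E_bfp * np.exp(1j * phase_mask)             # apply the phase mask
    img_fields = np.fft.fft2(fields, axes=(-2, -1))      # propagate to the image plane
    intensity = np.sum(np.abs(img_fields) ** 2, axis=0)  # sum over components w
    return gaussian_blur(intensity, sigma_b)
```

Since the blur's transfer function equals 1 at zero frequency, the total photon count of the image is unchanged by the blur, which is a convenient sanity check.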

#### 2.2 Cost functions

The next step in the optimization is introducing the observed z-stack. To do so, a cost function which penalizes the difference between the simulation and the measurements is required. Such cost functions can serve in both the PR step and subsequent localization steps. All of the considered cost functions are of the form $C = \frac{1}{{NQ}}\sum\limits_{i = 1}^Q {\sum\limits_{n = 1}^N {f[I(n;P,{{\vec{q}}_i},{\sigma _B}),{Z_i}(n),{\sigma _i}]} }$, where ${\sigma _i}$ is a noise parameter and ${Z_i}(n)$ is the observed image associated with position ${\vec{q}_i}$, $i = 1,2,\ldots,Q$. In the typical case of PR from a z-stack of a bead on the coverslip surface, these correspond to different values of $NFP$.

Common examples of cost functions include the L1 [34] and L2 [24] norms. Other cost functions employ statistical estimation, e.g. are derived assuming Poisson distribution and thus define a negative log-likelihood cost [11,25,35,36]:
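In the notation above, with model intensity $I$ and data $Z$, the two statistical costs can be sketched per pixel as follows (a minimal illustration, dropping model-independent constants):

```python
import numpy as np

def poisson_nll(model, data, eps=1e-9):
    """Negative log-likelihood under Poisson noise (constants dropped):
    mean over pixels of mu - z*log(mu)."""
    mu = np.clip(model, eps, None)  # guard the logarithm
    return np.mean(mu - data * np.log(mu))

def gaussian_nll(model, data, sigma):
    """Negative log-likelihood under Gaussian noise of std sigma
    (constants dropped); proportional to the L2 cost."""
    return np.mean((model - data) ** 2 / (2.0 * sigma ** 2))
```

Both are minimized, pixel-wise, when the model equals the data; they differ in how strongly bright pixels are weighted, which is the behavior compared in Appendix B.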

#### 2.3 Analytical gradient for phase retrieval

Phase retrieval can be viewed as a nonconvex optimization problem [37]. In general, such optimization problems pose a challenge as they can have spurious local minimizers, saddle points, and strong sensitivity to initialization. However, it has been shown that PR can achieve convergence using stochastic gradient descent (SGD) [18].

To facilitate a fast gradient computation, we analytically derive the gradient of the cost function with respect to the pixelated phase mask $P(m)$, $m = 1,2,\ldots,M$, i.e. $\frac{{\partial C}}{{\partial P}} \in {{\mathbb R}^M}$. In this section we describe the general derivation for dipoles with full rotational mobility (the fixed dipole case is derived in Appendix E). For simplicity, we derive the gradient for a single position ${\vec{q}_i}$ ($Q = 1$) and denote $I_i(n) = I(n;P,{\vec{q}_i},{\sigma _B})$. The first derivative in the chain rule (for the form of cost functions defined in Section 2.2) gives:
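The flavor of this derivation can be shown in the simplest scalar, single-component case with an L2 cost: for $I = |\mathcal{F}\{A e^{iP}\}|^2$, the chain rule through the FFT yields a gradient that costs one extra inverse FFT. A Python/NumPy sketch (illustrative only; the paper's full gradient also handles the vectorial components, the blur, and the MLE costs):

```python
import numpy as np

def cost_and_grad(P, A, Z):
    """L2 cost between |FFT{A e^{iP}}|^2 and data Z, with the analytical
    gradient w.r.t. every phase pixel P(m) at the price of one inverse FFT."""
    u = A * np.exp(1j * P)            # pupil field with the phase mask applied
    g = np.fft.fft2(u)                # field at the image plane
    I = np.abs(g) ** 2                # modeled intensity
    resid = I - Z
    cost = np.mean(resid ** 2)
    fprime = 2.0 * resid / resid.size                # dC/dI, pixel-wise
    # Chain rule through the FFT: dC/dP = 2 Im( conj(u) * F^H{ (dC/dI) g } ),
    # where F^H = N * ifft2 for NumPy's unnormalized FFT convention.
    h = np.fft.ifft2(fprime * g) * resid.size
    return cost, 2.0 * np.imag(np.conj(u) * h)
```

A finite-difference check against a single perturbed phase pixel confirms the closed form, which is the property that makes pixel-wise optimization tractable.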

The final step is the matrix multiplication of Eqs. (8)–(9):

#### 2.4 Phase mask optimization workflow

Phase retrieval can be used either for *recovering* the pupil plane from measured PSFs, or for *designing* a new phase mask which will produce desired PSFs. Both of these applications amount to very similar procedures. In this section we describe the optimization flow which generates an estimated phase mask from the measured 3D PSF using the previously derived gradient. For mask *design*, the difference is that no denoising or experimental blur correction is required. The first step in the optimization procedure requires the input of a z-stack ${Z_i}(n)$ of length $Q$ (typically 40-50 images) with associated coordinates ${\vec{q}_i}$. Note that the PR result is the estimation of the phase in the back focal plane, assuming known coordinates ${\vec{q}_i}$. Any mismatch between the experimental and assumed coordinates will affect the result; for example, an inaccurate axial position input is compensated by an added defocus correction in the phase mask. Optimally, as many images as possible would be used within the axial range of interest, subject to practical limitations, e.g. stage resolution and photobleaching. Next, the background is estimated from the corners of the cropped images (close to the PSF) to subtract the mean and estimate the noise standard deviation ($\widehat {{\sigma _i}}$). This background subtraction method works well with wide-field illumination in a small FOV, and it can be replaced by experiment-dependent customized methods.
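The corner-based background estimate can be sketched as follows (a simple version assuming the PSF is centered in the crop and the corners contain only background; the patch size `corner` is an illustrative choice):

```python
import numpy as np

def estimate_background(img, corner=5):
    """Estimate the background mean and noise std from the four
    corner-by-corner patches of a cropped PSF image."""
    c = corner
    patches = np.concatenate([
        img[:c, :c].ravel(),  img[:c, -c:].ravel(),
        img[-c:, :c].ravel(), img[-c:, -c:].ravel(),
    ])
    return patches.mean(), patches.std()
```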

We initialize the phase mask with an analytical gradient step, similar to [18]. This improves the convergence rate because, at this point, the simulated images (standard PSF) and the measured images are not necessarily correlated. Subsequent ${K_1}$ iterations calculate the gradient for a batch of images (chosen by stochastic sampling) and update the mask *via* Adam [39] with a step size of $\eta = 3 \cdot {10^{ - 2}}$. The estimated blur kernel $\widehat {{\sigma _B}}$ and the noise $\widehat {{\sigma _i}}$ are drawn from a normal distribution in each iteration to reduce the dependence of the solution on the exact estimates. The full algorithm is described in Algorithm 1.
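For concreteness, one Adam update with the stated step size can be sketched as below (standard Adam [39]; the batching over z-slices and the resampling of $\widehat{\sigma_B}$, $\widehat{\sigma_i}$ are omitted):

```python
import numpy as np

def adam_step(grad, state, eta=3e-2, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for the phase-mask pixels.

    state holds [m, v, t]: first/second moment estimates and the step
    counter; it is updated in place. Returns the increment to add to P.
    """
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad           # first-moment running average
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment running average
    m_hat = m / (1 - b1 ** t)              # bias corrections
    v_hat = v / (1 - b2 ** t)
    state[:] = [m, v, t]
    return -eta * m_hat / (np.sqrt(v_hat) + eps)
```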

Two optional steps have been included in the software which may improve results in some cases at the cost of additional computation time. The first is fine-tuning the blur kernel (default $\widehat {{\sigma _B}} = 0.75$) after $\; {K_1}/3\; $ iterations. The blur kernel estimation is done using the MATLAB routine *fmincon*. The second option is performing ${K_2}$ iterations with a variant of directed sampling [40,41] where the side information is the correlation of the current model with the measured image stack, i.e. compute the gradient at positions where the correlation between the simulation and data is minimal. This can improve results when the used phase mask converges poorly at some focal positions; in this work, we found this step necessary only for the misaligned Tetrapod (Figs. 4 and 8). Future work can include optimization of this final step based on recent advances in adaptive sampling [42], which could further improve convergence.
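The directed-sampling idea above can be sketched as: score each focal slice by the normalized cross-correlation between the current simulation and the data, then form the next batch from the worst-matching slices (a minimal version; the schemes in [40,41] are more elaborate):

```python
import numpy as np

def lowest_correlation_batch(model_stack, data_stack, batch_size=3):
    """Return the indices of the focal positions where the current model
    matches the data worst, by normalized cross-correlation per slice."""
    scores = []
    for m, d in zip(model_stack, data_stack):
        mc, dc = m - m.mean(), d - d.mean()
        denom = np.linalg.norm(mc) * np.linalg.norm(dc)
        scores.append((mc * dc).sum() / denom if denom > 0 else 0.0)
    return np.argsort(scores)[:batch_size]  # lowest correlations first
```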

Importantly, this method is very fast, thanks to the FFT implementation (Eq. (12)): Table 1 shows the average time required to calculate a single gradient as a function of the number of pixels (binning) in the phase mask. The algorithm is orders of magnitude faster than numerical pixel-wise gradient calculation [24], and much faster than fitting Zernike polynomials implemented by the MATLAB function *fminunc* [34]. A single gradient calculation for a Zernike coefficient is on the same time scale as the presented VIPR gradient; thus, the theoretical acceleration factor is exactly the number of Zernike polynomials considered. The results were timed on a laptop with an INTEL i7-8750H CPU and an NVIDIA GeForce RTX 2070 GPU. Typical pupil-plane recovery (vectorial model with a linear polarizer) from a fluorescent bead z-stack requires 300 iterations over 44000 pixels, using a batch size of 3 z positions (per iteration), and takes 31.5 seconds on a GPU (Table 1), compared to ∼30 minutes for Zernike polynomial fitting with a global gradient computation. An equivalent numeric pixel-wise gradient calculation takes ∼9 days (∼25,000 times slower). Note that the significance of the run-time improvement over Zernike polynomial fitting grows quickly when the PR becomes a step in a larger iterative procedure, such as designing new optimal PSFs [17], live adaptive optics implementations, and spatial aberration corrections [43].

## 3. Results and discussion

#### 3.1 Comparison with existing methods

For benchmarking, we compare our vectorial PR (VIPR) to two alternatives: (1) analytically optimizing (with our method) the scalar diffraction model, and (2) adding Zernike polynomials (first 60 polynomials, without tip and tilt) to the initially designed phase mask using the vectorial model [21,34]. Note that using Zernike polynomials requires an additional step to address the observed wobble, by fitting the z-stack with the initial guessed mask to center it. Using the described methods, we retrieved phase masks from z-stacks measured with a $4\;\mu m$ Tetrapod phase mask [11,44], with one image per NFP at a step size of $100\;nm$ (measured stack in Fig. 2(b), second row, with an estimated image sum (signal) of $3.5\cdot {10^5}\;ADU$ and background ${\sim} {\cal N}(870,8100)$). To quantify the performance, we analyzed z-stacks with ten images per focal position (Figs. 2(b)–2(e)). Robustness of our approach to image noise was tested by computationally degrading the images with Gaussian-distributed noise (std equal to a scalar multiplier times the estimated noise std).

While VIPR deteriorates in precision with added noise, the other methods, interestingly, initially improve after the addition of some noise; this is likely because VIPR captures the subtle details of the measured intensity patterns and thus avoids poor localizations at specific axial positions (Appendix C). To test the axial localization bias, i.e. the z accuracy, and the wobble behavior, we evaluated the performance of the methods on a short z-stack (one image per focal position). The total error is a combination of method-related PR error, measurement error (stage and readout accuracy) and SNR, in the two sets of data (the calibration set and the testing set). We found that our method incorporates the wobble better and has minimal axial bias (Appendix C and Visualization 1).

VIPR is based on sensor-less phase retrieval, i.e. we do not measure the wavefront directly at the back focal plane with a wavefront sensor. Additionally, the main sources of aberrations in a PSF-engineering setup (especially with the introduction of large phase modulations) are the LC-SLM device itself [45] and the objective lens. These aberrations might introduce deviations between the designed phase mask (Fig. 2(b), first row) and the PR mask; however, the essential result is an accurate PSF-generation model (Visualization 1 and Visualization 2) for localization, together with an estimate of the systematic and sample-specific effective phase modulation experienced by the measured PSF.

#### 3.2 Generality to phase patterns

The benefit of pixel-wise methods is their broad applicability to model any aberration, e.g. non-smooth ones, which require many Zernike modes to describe. The Tetrapod phase mask presented in Section 3.1 is the best-case scenario for Zernike polynomials, as the designed phase mask is directly constructed from a sum of low-order Zernike polynomials. To test the generality of our method, we imaged z-stacks with 4 different phase masks using the LC-SLM. The first mask consists of a randomly generated combination of low-amplitude Zernike polynomials to test the applicability of the method for adaptive optics (Fig. 3(a)). The resulting PSF simulations from VIPR produce similar PSFs to the measured data, which is the crucial step in PR and the desired result. However, the deviations between the designed phase mask and the result (highlighted in Fig. 3(a) but present in all the obtained phase masks) show that the strongest optical aberrations occur at the edges of the back focal plane. We attribute these deviations to objective-specific aberrations, especially at the SAF frequencies, and to the limited validity of the aplanatic approximation at the edges of the NA.

The second case shows a machine-learned phase mask which was recently demonstrated to optimally measure dense fields of emitters in 3D (Fig. 3(b)) [46]. The third test (Fig. 8) deals with severe misalignments in the Tetrapod phase mask, which has a large area of unmodulated pixels. The results for the double-helix phase mask, which comes from optimization of Gauss-Laguerre basis functions and is commonly used in PSF engineering [12], are also shown in Fig. 9 and Visualization 2.

## 4. Experimental demonstrations of STORM data

Direct stochastic optical reconstruction microscopy (dSTORM) is a technique in which a super-resolution image is reconstructed from localizations of sparse fluorescent molecules that are switched on and off (blinking) using a high-power laser [47,48]. Both experiments (Section 4 and Appendix F) were performed using a dielectric $4{\; }\mu m$ Tetrapod phase mask, which exhibits better photon efficiency than the polarization-dependent LC-SLM implementation. PR was performed in a similar way to the LC-SLM data (by removing the polarizer). Appendix F shows a reconstruction of fluorescently labeled mitochondria in COS7 cells, using a deep-learning network which trains on simulations only, adapted from [46]. This demonstrates the viability of our *in vitro* method in predicting *in vivo* PSFs.

To show the robustness of our method in suboptimal situations, we used a misaligned $4{\; }\mu m$ dielectric Tetrapod phase mask to image microtubule blinking data. We show in Fig. 4 the applicability of our method in recovering unusual aberrations, which are otherwise challenging to model. The use of image-based interpolation methods can also address these issues; however, such methods need a good volumetric calibration [49] and are challenging to deploy on a GPU, as they require many more parameters to model the PSF [22]. The PR predictions (using the vectorial model) were used to localize 10K frames of cell data (Fig. 4(a)). Using the localizations within an axial range of ${\pm} 25{\; }nm$ around the desired positions, we created mean experimental PSFs (Fig. 4(c)) to show their agreement with our model.

## 5. Conclusion

In this work, we have developed and demonstrated an accurate and fast optimization method for PR and PSF design. Our method estimates the pupil-plane phase distribution in a microscope experiment based on Fourier optics. The optimization uses only basic knowledge of the optical system and a target image set (simulated or measured), with no priors on the phase mask. The derivation is general and can be adapted to many microscopy applications, sample variations (fixed or freely rotating dipoles) and optical processing methodologies (e.g. a polarized LC-SLM, a dielectric phase mask or a deformable mirror can be directly modeled with their real pixel size and optical constraints). The performance of VIPR is mainly limited by the SNR, the approximations used (e.g. numerical pixelation, aplanatic performance, axially invariant NA, etc.), the number of measurements, and the accuracy of the known optical parameters. Slight improvements could be made by retrieving an amplitude mask and by measuring the optical parameters, e.g. NA, magnification, etc., as part of the calibration process, at the cost of computational complexity. The pixel-wise derivation can be adapted to many applications by simply changing the cost function and optimization constraints. Furthermore, this approach can be applied to adaptive optics due to the versatility and speed of our implementation.

## Appendix A

In this section we describe the analytical (far-field) derivation of the electrical field at the back focal plane, which is required in Eq. (3) to calculate ${E_{bfp,w}}(\widehat x,\widehat y;\alpha ,\vec{q})$, $w = 1,2,\ldots,6$. For this section, we assume that there is no phase mask in the Fourier plane (a unit aperture). The source is an emitter located at ${r_0} = ({x_0},{y_0},{z_0})$ with a dipole vector $\overrightarrow \mu = ({\mu _1},{\mu _2},{\mu _3})$. Emitters are immersed in a sample with refractive index ${n_m}$, placed above an oil-immersion objective with refractive index ${n_{imm}}$. The emission from the source can be described by the far-field Green tensor ${G_{ff}}({r_0},r)$ [31,32,50], which links the three dipole components of the emitter to the electrical field generated at the far-field position $r = (\hat{x},\hat{y},z)$:

The support of the Fourier plane, $S(\widehat x,\widehat y)$ (Eq. (3)), in such systems is usually limited by the objective’s aperture given by Abbe’s sine rule. However, a subtle yet important phenomenon can be observed here, usually referred to as super-critical angle fluorescence (SAF) [31,32]. When the objective’s NA is higher than the refractive index of the sample medium, a range of propagation angles ${\theta _{imm}}$ exists between the critical angle and the NA where $\cos ({\theta _m})$ becomes imaginary (creating evanescent waves which effectively limit the NA of the system). In PR techniques such as this one, where the emitters are bound to the coverslip, SAF is a significant part of the measured intensity [52].
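The sub- and super-critical regions of the back focal plane follow directly from the sine rule: normalizing the pupil radius so that $\rho = 1$ at the NA, the transverse wavenumber is $NA\cdot\rho$ (in units of $k_0$), and $\cos(\theta_m)$ becomes imaginary where this exceeds $n_m$. A sketch (the refractive indices and NA are typical values for this kind of setup, not measured system parameters):

```python
import numpy as np

def pupil_zones(M, NA=1.45, n_m=1.33, n_imm=1.518):
    """Classify an M x M back-focal-plane grid into sub-critical pixels
    (propagating in the sample) and super-critical (SAF) pixels, where
    cos(theta_m) is imaginary and the field in the sample is evanescent."""
    x = np.linspace(-1, 1, M)
    xx, yy = np.meshgrid(x, x)
    rho = np.hypot(xx, yy)       # pupil radius, rho = 1 at the NA
    kt = NA * rho                # transverse wavenumber / k0 (Abbe sine rule)
    inside = rho <= 1.0          # objective aperture support S
    saf = inside & (kt > n_m)    # beyond the critical angle of the sample
    sub = inside & ~saf
    return sub, saf
```

With NA = 1.45 and a watery sample ($n_m$ = 1.33), the SAF annulus spans pupil radii from $1.33/1.45 \approx 0.92$ to 1, which is why emitters on the coverslip contribute substantial SAF signal.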

The last component needed to describe the electrical field at the back focal plane, is the ray rotation matrix which is induced by the objective, assuming a perfect objective lens (full transmission):

In the case of the scalar approximation, there is only a single component of the electrical field, and all polarization effects are ignored. To relate the two models, the scalar model can be described by ${g_{bfp,1}} = \sqrt {\frac{{{n_{imm}}}}{{\cos ({\theta _{imm}})}}}$:

## Appendix B

To compare the two MLE cost functions (Poisson and Gaussian), we used the retrieved Tetrapod phase mask (Fig. 2(a)) to localize the z-stack (Fig. 5(c)). Evaluation of the robustness to noise was achieved by adding artificial noise (similar to Section 3.1). We found that the Poisson MLE deteriorates faster than the Gaussian MLE under the addition of artificial noise (Fig. 5(a)) and even under photo-bleaching, as can be seen in Fig. 5(b), where the Poisson MLE exhibits a trend in precision over the course of a single measurement.

The empirical data suggests that the Gaussian MLE cost is more suitable for localization and PR at high SNR values than Poisson MLE. We attribute this to the apparent sensitivity of the Poisson MLE to exact knowledge of emitter intensity. The derivative of ${C_{G,NLL}}$ is more regularized than the derivative of ${C_{P,NLL}}$ (Eq. (20)), which is crucial because in practice the intensity is estimated from the acquired images, and is not precisely known.

## Appendix C

This appendix provides additional information for Sections 2.3, 2.4 and Fig. 2. Figure 6 describes the results of localizing a short z-stack (a second z-stack taken on the calibration bead) with the three phase masks retrieved by the methods described in Section 3.1. A short z-stack suffers less optical drift and can be used to estimate the accuracy of the PR method. Figure 6(a) shows the difference between the axial position estimate (from MLE fitting) and the stage readout, where the sole difference between the methods is the phase mask. Figures 6(b)–6(c) show that the wobble was incorporated into the phase mask using our method (with the vectorial and scalar models), compared to the Zernike PR (green), which exhibits wobble.

Wobble poses two separate issues for Zernike polynomial fitting: (1) A robust and accurate fit requires the wobble calibration before the optimization (to center the PSFs); unfortunately, a reliable wobble fit needs a matching PSF generator model. Thus, Zernike fitting requires a good initial guess to fit the wobble or an implementation of a slow bootstrap approach to optimize both. (2) The calibration is performed on the coverslip, so implementation in volumetric imaging will require some interpolation or scaling to match the different phase accumulations (immersion oil and sample medium).
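The wobble curve being centered here is simply the lateral PSF centroid as a function of focal position; a minimal sketch:

```python
import numpy as np

def wobble_curve(z_stack):
    """Lateral centroid of the PSF at each focal position. A z-dependent
    centroid shift ('wobble') shows up as a non-constant curve."""
    n, h, w = z_stack.shape
    ys, xs = np.mgrid[:h, :w]
    out = np.empty((n, 2))
    for i, img in enumerate(z_stack):
        s = img.sum()
        out[i] = (xs * img).sum() / s, (ys * img).sum() / s
    return out  # (n, 2) array of (x, y) centroids, one per slice
```

In practice the images should be background-subtracted first, since a constant offset biases the centroid toward the image center.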

Figure 7 describes the localization results of Fig. 2(b) (without additional noise). From these results, we observe that even a small mismatch in the PSF can lead to large deviations in localization precision at certain positions. These are the positions in which the SAF light produces a sharp point (as it is unmodulated by the phase mask). VIPR incorporates SAF into the derivation and thus suffers less precision penalty in this region.

## Appendix D

In this appendix we demonstrate further generality results, in continuation to Section 3.2. Figure 8 shows the retrieval results on a $4\; \mu m\; $ Tetrapod phase mask which is severely misaligned as described in Section 3.2.

Figure 9 shows the retrieval results on a DH phase mask, similar to Fig. 2(a). We note that while the DH is not based on Zernike polynomials like the Tetrapod mask, it can still be corrected using Zernike polynomials given prior knowledge of the theoretical phase mask.

## Appendix E

In this appendix we derive an analytical gradient for the case of a fixed dipole, represented by the vector $\overrightarrow \mu = ({\mu _1},{\mu _2},{\mu _3})$. In this case, Eq. (1) is replaced by:

## Appendix F

In this appendix, we describe the sample preparation (Section 4) and also provide a demonstration of VIPR on an aligned $4\;\mu m$ Tetrapod phase mask. Sample preparation included cleaning 22×22 mm, 170 $\mu m$ thick coverslips in an ultrasonic bath with 5% Decon90 at 60°C for 30 min. Next, the coverslips were washed with water, incubated in absolute ethanol for 30 min, and finally sterilized with 70% filtered ethanol for 30 min. COS7 cells were grown for 24 hours on the coverslips in a 6-well plate in phenol-red-free Dulbecco's modified eagle medium (DMEM) with 1 g/l D-glucose (low glucose), supplemented with 10% fetal bovine serum, 100 U/ml penicillin, 100 µg/ml streptomycin and 2 mM glutamine, at 37°C and 5% CO_{2}. The cells were fixed with 4% paraformaldehyde and 0.2% glutaraldehyde in PBS, pH 6.2, for 45 min, washed, and incubated in a 0.3 M glycine/PBS solution for 10 minutes. The coverslips were transferred into a clean 6-well plate and incubated in a blocking solution (10% goat serum, 3% BSA, 2.2% glycine, and 0.1% Triton-X in PBS, filtered with a 0.45 $\mu m$ PVDF filter unit, Millex) for 2 hours at 4°C. The cells were then immunostained overnight at 4°C with either anti-TOMM20-AF647 (Fig. 11) antibody (Abcam, ab209606) or anti-α-tubulin-AF647 (Section 4) antibody (Abcam, ab190573), diluted 1:230 in the blocking buffer, and washed 5× with PBS.

For STORM, a PDMS chamber was attached to the glass coverslip containing the fixed, immunostained COS7 cells and placed in the microscope setup described in Fig. 1(a). The added blinking buffer was freshly made from 100 mM β-mercaptoethylamine hydrochloride, 20% sodium lactate, and 3% OxyFluor (Sigma, SAE0059), modified from [53]. A glass coverslip was placed on top to prevent evaporation. The sample was illuminated by a high-intensity 638 nm, 2000 mW red dot laser module, which passed through a 25 µm pinhole (Thorlabs). Emission light was filtered through a 500 nm long-pass dichroic and a 650 nm long-pass filter (Chroma). 10K images were recorded on an sCMOS camera (Prime95B, Photometrics), instead of the EMCCD, to reduce pixel size and enable a large field of view.

## Funding

European Research Council (802567); Technion-Israel Institute of Technology; Zuckerman Foundation.

## Acknowledgments

We wish to thank Adam Backer for useful discussions. We thank Navitar Optical Solutions for the donation of a Pixelink camera which was used for system alignment.

## Disclosures

The authors declare no conflicts of interest.

## Code availability

The software will be available at Code 1 (Ref. [54]).

## References

**1. **Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, “Phase Retrieval with Application to Optical Imaging: A contemporary overview,” IEEE Signal Process. Mag. **32**(3), 87–109 (2015). [CrossRef]

**2. **R. P. Millane, “Phase retrieval in crystallography and optics,” J. Opt. Soc. Am. A **7**(3), 394–411 (1990). [CrossRef]

**3. **J. C. Dainty and J. R. Fienup, “Phase Retrieval and Image Reconstruction for Astronomy,” in *Image Recovery: Theory and Application*, pp. 231–275 (1987).

**4. **L.-H. Yeh, J. Dong, J. Zhong, L. Tian, M. Chen, G. Tang, M. Soltanolkotabi, and L. Waller, “Experimental robustness of Fourier ptychography phase retrieval algorithms,” Opt. Express **23**(26), 33214–33240 (2015). [CrossRef]

**5. **L. Bian, J. Suo, G. Zheng, K. Guo, F. Chen, and Q. Dai, “Fourier ptychographic reconstruction using Wirtinger flow optimization,” Opt. Express **23**(4), 4856–4866 (2015). [CrossRef]

**6. **N. Ji, “Adaptive optical fluorescence microscopy,” Nat. Methods **14**(4), 374–380 (2017). [CrossRef]

**7. **M. J. Booth, “Adaptive optics in microscopy,” Philos. Trans. R. Soc., A **365**(1861), 2829–2843 (2007). [CrossRef]

**8. **D. Sage, T. A. Pham, H. Babcock, T. Lukes, T. Pengo, J. Chao, R. Velmurugan, A. Herbert, A. Agrawal, S. Colabrese, A. Wheeler, A. Archetti, B. Rieger, R. Ober, G. M. Hagen, J. B. Sibarita, J. Ries, R. Henriques, M. Unser, and S. Holden, “Super-resolution fight club: assessment of 2D and 3D single-molecule localization microscopy software,” Nat. Methods **16**(5), 387–395 (2019). [CrossRef]

**9. **D. Baddeley and J. Bewersdorf, “Biological Insight from Super-Resolution Microscopy: What We Can Learn from Localization-Based Images,” Annu. Rev. Biochem. **87**(1), 965–989 (2018). [CrossRef]

**10. **B. Huang, W. Wang, M. Bates, and X. Zhuang, “Three-dimensional super-resolution imaging by stochastic optical reconstruction microscopy,” Science **319**(5864), 810–813 (2008). [CrossRef]

**11. **Y. Shechtman, S. J. Sahl, A. S. Backer, and W. E. Moerner, “Optimal point spread function design for 3D imaging,” Phys. Rev. Lett. **113**(13), 133902 (2014). [CrossRef]

**12. **S. R. P. Pavani, M. A. Thompson, J. S. Biteen, S. J. Lord, N. Liu, R. J. Twieg, R. Piestun, and W. E. Moerner, “Three-dimensional, single-molecule fluorescence imaging beyond the diffraction limit by using a double-helix point spread function,” Proc. Natl. Acad. Sci. **106**(9), 2995–2999 (2009). [CrossRef]

**13. **E. Hershko, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Multicolor localization microscopy and point-spread-function engineering by deep learning,” Opt. Express **27**(5), 6158–6183 (2019). [CrossRef]

**14. **Y. Shechtman, L. E. Weiss, A. S. Backer, M. Y. Lee, and W. E. Moerner, “Multicolour localization microscopy by point-spread-function engineering,” Nat. Photonics **10**(9), 590–594 (2016). [CrossRef]

**15. **A. Jesacher, S. Bernet, and M. Ritsch-Marte, “Colored point spread function engineering for parallel confocal microscopy,” Opt. Express **24**(24), 27395–27402 (2016). [CrossRef]

**16. **M. P. Backlund, M. D. Lew, A. S. Backer, S. J. Sahl, G. Grover, A. Agrawal, R. Piestun, and W. E. Moerner, “Simultaneous, accurate measurement of the 3D position and orientation of single molecules,” Proc. Natl. Acad. Sci. U. S. A. **109**(47), 19087–19092 (2012). [CrossRef]

**17. **A. S. Backer, “Computational inverse design for cascaded systems of metasurface optics,” Opt. Express **27**(21), 30308–30331 (2019). [CrossRef]

**18. **E. J. Candès, X. Li, and M. Soltanolkotabi, “Phase retrieval via wirtinger flow: Theory and algorithms,” IEEE Trans. Inf. Theory **61**(4), 1985–2007 (2015). [CrossRef]

**19. **R. W. Gerchberg and W. O. Saxton, “Practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik **35**, 237–246 (1972).

**20. **J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. **21**(15), 2758–2769 (1982). [CrossRef]

**21. **P. N. Petrov, Y. Shechtman, and W. E. Moerner, “Measurement-based estimation of global pupil functions in 3D localization microscopy,” Opt. Express **25**(7), 7945–7959 (2017). [CrossRef]

**22. **P. Zhang, S. Liu, A. Chaurasia, D. Ma, M. J. Mlodzianoski, E. Culurciello, and F. Huang, “Analyzing complex single-molecule emission patterns with deep learning,” Nat. Methods **15**(11), 913–916 (2018). [CrossRef]

**23. **L. Möckl, P. N. Petrov, and W. E. Moerner, “Accurate phase retrieval of complex point spread functions with deep residual neural networks,” Appl. Phys. Lett. **115**(25), 251106 (2019). [CrossRef]

**24. **W. Wang, F. Ye, H. Shen, N. A. Moringo, C. Dutta, J. T. Robinson, and C. F. Landes, “Generalized method to design phase masks for 3D super-resolution microscopy,” Opt. Express **27**(3), 3799–3816 (2019). [CrossRef]

**25. **A. Aristov, B. Lelandais, E. Rensen, and C. Zimmer, “ZOLA-3D allows flexible 3D localization microscopy over an adjustable axial range,” Nat. Commun. **9**(1), 2409 (2018). [CrossRef]

**26. **F. Xu, D. Ma, K. P. MacPherson, S. Liu, Y. Bu, Y. Wang, C. Bi, T. Kwok, P. Yin, S. Calve, G. E. Landreth, and F. Huang, “Three dimensional nanoscopy of whole cells and tissues with in situ point spread function retrieval,” bioRxiv, 727354 (2019).

**27. **O. Zhang and M. D. Lew, “Fundamental limits on measuring the rotational constraint of single molecules using fluorescence microscopy,” Phys. Rev. Lett. **122**(19), 198301 (2019). [CrossRef]

**28. **L. Carlini, S. J. Holden, K. M. Douglass, and S. Manley, “Correction of a depth-dependent lateral distortion in 3D super-resolution imaging,” PLoS One **10**(11), e0142949 (2015). [CrossRef]

**29. **M. D. Lew, M. P. Backlund, and W. E. Moerner, “Rotational mobility of single molecules affects localization accuracy in super-resolution fluorescence microscopy,” Nano Lett. **13**(9), 3967–3972 (2013). [CrossRef]

**30. **M. Siemons, C. N. Hulleman, R. Ø. Thorsen, C. S. Smith, and S. Stallinga, “High precision wavefront control in point spread function engineering for single emitter localization,” Opt. Express **26**(7), 8397–8416 (2018). [CrossRef]

**31. **L. Novotny and B. Hecht, *Principles of Nano-Optics* (Cambridge University Press, 2012). [CrossRef]

**32. **A. S. Backer and W. E. Moerner, “Extending single-molecule microscopy using optical fourier processing,” J. Phys. Chem. B **118**(28), 8313–8329 (2014). [CrossRef]

**33. **B. M. Hanser, M. G. L. Gustafsson, D. A. Agard, and J. W. Sedat, “Phase-retrieved pupil functions in wide-field fluorescence microscopy,” J. Microsc. **216**(1), 32–48 (2004). [CrossRef]

**34. **P. Zelger, K. Kaser, B. Rossboth, L. Velas, G. J. Schütz, and A. Jesacher, “Three-dimensional localization microscopy using deep learning,” Opt. Express **26**(25), 33166–33179 (2018). [CrossRef]

**35. **R. J. Ober, S. Ram, and E. S. Ward, “Localization accuracy in single-molecule microscopy,” Biophys. J. **86**(2), 1185–1200 (2004). [CrossRef]

**36. **M. P. Backlund, Y. Shechtman, and R. L. Walsworth, “Fundamental Precision Bounds for Three-Dimensional Optical Localization Microscopy with Poisson Statistics,” Phys. Rev. Lett. **121**(2), 023904 (2018). [CrossRef]

**37. **J. Sun, Q. Qu, and J. Wright, “A Geometric Analysis of Phase Retrieval,” Found Comput Math **18**(5), 1131–1198 (2018). [CrossRef]

**38. **K. Kreutz-Delgado, “The Complex Gradient Operator and the CR-Calculus,” arXiv:0906.4835 (2009).

**39. **D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980 (2014).

**40. **P. Zhao and T. Zhang, “Stochastic Optimization with Importance Sampling,” in International Conference on Machine Learning (2015), pp. 1–9.

**41. **D. Needell, N. Srebro, and R. Ward, “Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm,” Math. Program. **155**(1-2), 549–573 (2016). [CrossRef]

**42. **S. Gopal, “Adaptive sampling for SGD by exploiting side information,” in 33rd International Conference on Machine Learning (ICML, 2016), pp. 364–372.

**43. **G. Zheng, X. Ou, R. Horstmeyer, and C. Yang, “Characterization of spatially varying aberrations for wide field-of-view microscopy,” Opt. Express **21**(13), 15131–15143 (2013). [CrossRef]

**44. **Y. Shechtman, L. E. Weiss, A. S. Backer, S. J. Sahl, and W. E. Moerner, “Precise Three-Dimensional Scan-Free Multiple-Particle Tracking over Large Axial Ranges with Tetrapod Point Spread Functions,” Nano Lett. **15**(6), 4194–4199 (2015). [CrossRef]

**45. **S. Moser, M. Ritsch-Marte, and G. Thalhammer, “Model-based compensation of pixel crosstalk in liquid crystal spatial light modulators,” Opt. Express **27**(18), 25046–25063 (2019). [CrossRef]

**46. **E. Nehme, D. Freedman, R. Gordon, B. Ferdman, L. E. Weiss, O. Alalouf, R. Orange, T. Michaeli, and Y. Shechtman, “Dense three dimensional localization microscopy by deep learning,” arXiv:1906.09957 (2019).

**47. **M. J. Rust, M. Bates, and X. Zhuang, “Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM),” Nat. Methods **3**(10), 793–796 (2006). [CrossRef]

**48. **M. Heilemann, S. Van De Linde, M. Schüttpelz, R. Kasper, B. Seefeldt, A. Mukherjee, P. Tinnefeld, and M. Sauer, “Subdiffraction-resolution fluorescence imaging with conventional fluorescent probes,” Angew. Chem., Int. Ed. **47**(33), 6172–6176 (2008). [CrossRef]

**49. **Y. Li, Y. L. Wu, P. Hoess, M. Mund, and J. Ries, “Depth-dependent PSF calibration and aberration correction for 3D single-molecule localization,” Biomed. Opt. Express **10**(6), 2708–2718 (2019). [CrossRef]

**50. **J. Lin, O. G. Rodríguez-Herrera, F. Kenny, D. Lara, and J. C. Dainty, “Fast vectorial calculation of the volumetric focused field distribution by using a three-dimensional Fourier transform,” Opt. Express **20**(2), 1060–1069 (2012). [CrossRef]

**51. **J. W. Goodman, *Introduction to Fourier Optics*, 2nd ed. (Roberts and Company Publishers, 1996).

**52. **B. Ferdman, L. E. Weiss, O. Alalouf, Y. Haimovich, and Y. Shechtman, “Ultrasensitive Refractometry via Supercritical Angle Fluorescence,” ACS Nano **12**(12), 11892–11898 (2018). [CrossRef]

**53. **L. Nahidiazar, A. V. Agronskaia, J. Broertjes, B. van den Broek, and K. Jalink, “Optimizing imaging conditions for demanding multi-color super resolution localization microscopy,” PLoS One **11**(7), e0158884 (2016). [CrossRef]

**54. **B. Ferdman, “VIPR - Vectorial Phase Retrieval for microscopy,” GitHub (2020), https://github.com/Borisfer/VIPR-Vectorial-Phase-Retrieval-for-microscopy.