With the recent emergence of slow-servo diamond turning, optical designs with surfaces that are not intrinsically rotationally symmetric can be manufactured. In this paper, we demonstrate some important limitations to Zernike polynomial representation of optical surfaces in describing the evolving freeform surface descriptions that are effective for optical design and encountered during optical fabrication. Specifically, we show that the ray grids commonly used in sampling a freeform surface to form a database from which to perform a φ-polynomial fit is limiting the efficacy of computation. We show an edge-clustered fitting grid that effectively suppresses the edge ringing that arises as the polynomial adapts to the fully nonsymmetric features of the surface.
©2011 Optical Society of America
There is a revolution occurring in the field of optical design that is being driven by two concurrent, but unrelated major developments that have forced the community to retool the fundamental surface shapes that are and will be used in optical design. The first development resulted from discovering that the power series definition of rotationally symmetric aspheric surfaces, which was introduced by Abbe over 100 years ago , is failing in part because of the lack of orthonormalization. This development has become apparent as new polishing methods, including small tool polishing, ion beam polishing, and MRF polishing come to be adopted throughout the industry resulting in the economic factors favoring, for the first time, the introduction of rotationally symmetric aspheres in many optical designs. The mathematical proposition of Q-type surfaces in place of power series to describe rotationally symmetric aspheres was made in the recent work of Forbes . The fabrication community that requires minimization of the maximum slope on any specified asphere drives these surface types. Of key importance in the development of these new surface descriptors is the fact that one of the preferred Q-type mathematical surface descriptions, Qbfs, yields normalizing coefficients that, when squared and summed, provide a direct computation of the RMS slope error. As a result of this computationally effective metric, it becomes possible, for the first time, to design optical systems with aspheres that comply with the fabricators guidelines on maximum testable slope. This effective slope constraint metric that arises directly from the Q-type surface descriptions has now been fully integrated in optical system design software with dramatic outcomes. As an example, Ma recently showed that the design of a 28 element lithographic lens with Q-type surfaces and an optimization integrated RMS slope constraint resulted in an order of magnitude decrease in overall sensitivity to tilts and decenters .
The second development is the introduction in the diamond turning community of the slow-servo axis , which allows, again for the first time, the controlled manufacture of optical quality surfaces that are not rotationally symmetric. As a result of this development, the optical design community is working to broaden the definition of optical surface shapes to leverage these new capabilities. The initial impact of this development has been to broaden the shape of optical surfaces from a conic surface plus power series terms, and occasionally an anamorphic factor to model astigmatic surfaces, to a conic surface plus the terms of the FRINGE Zernike polynomial, a specific surface definition in the φ-polynomial family that is traceable to current interferometric optical testing. Indeed, since their introduction by F. Zernike while developing the theory of phase-contrast in the 1930s , Zernike polynomials have emerged as a pervasive means of describing as-fabricated optical surface deformations and more recently to illustrate the field dependence of the wavefront aberrations in optical systems without rotational symmetry . In optical design and manufacturing, Zernike polynomial representations of surface departure, placed as an added layer on top of a conic surface, form an enabling fundamental basis. These polynomials are complete and orthogonal over the unit circle, whether the image forming optical system is rotationally symmetric or not. In addition, the lower-order terms are readily identified with the Seidel aberrations. These properties suggest that any departure could be represented in terms of Zernike polynomials. This property has led to their early adoption as a surface representation for the new nonsymmetric diamond-turning process enabled by slow-servo technology.
φ-polynomials, such as Zernike polynomials, play an important role in optical modeling, simulation, and design given their listed above properties. As a consequence, the optical testing industry has chosen to present the departure of a wavefront under test from the reference sphere created by the interferometer in terms of Zernike polynomials. This new level of surface complexity is consistent with evolving innovative optical designs and diamond turning and computer controlled fabrication processes and this formulation in particular has been shown to dramatically extend the performance space of certain classes of optical design (e.g. head worn displays and missile seekers). Fuerschbach demonstrated, in an unobscured three-mirror design that presents the largest planar entrance aperture in the smallest spherical volume, dramatic advances in the diffraction-limited field of view (from 3 to 10) and the operational f/number (from ~f/10 to ~f/2) of an existing production form as a result of extending the surface shapes to the φ-polynomial family . Even though the optical design community is just beginning to leverage these new opportunities for optical design with φ-polynomial surface shapes, it is not too soon to be looking ahead to their potential limitations when further extending surface shape descriptions to fully freeform surfaces. In this paper, the term freeform describes a conic surface with additive (or subtractive) polynomial departure that may not be rotationally symmetric (e.g. φ-polynomial) and in addition, which is a key point, we anticipate additive localized multi-centric functions that are described as radial basis functions .
The primary purpose of this paper is to illustrate why φ-polynomial based surface descriptions are limited in representing this ultimate general description for a freeform surface and to provide a glimpse at what properties the ultimate surface might have. In the context of a next and potentially final generation of freeform surface description, a topic of investigation is to determine how realistic it is to use φ-polynomial surfaces (Zernike polynomials being a special case) to simulate a small bump on an optical surface, such as shown in Fig. 1 , or multiple bumps as illustrated later in this paper, by simply extending the order of the Zernike polynomial until it has a spatial frequency that is sufficient to encompass the bump or bumps. While this paper will show that it is mathematically possible to fit the bump(s), as a practical matter it is not a useful approach within the current context of optical design where the number of effective parameters cannot extend into the thousands. As a precursor to other work that does address the methods for representing local defects in as-fabricated optical surfaces, a second aspect of this paper is to study efficient surface departure fitting strategies as an important step towards maintaining a simultaneously effective and highly accurate ray trace methodology. Stringent optics applications such as precision optics and lithography demand accuracies in representing the optics surface on the order of a nanometer or even sub-nanometers. In order to achieve these levels of accuracy, higher-order φ-polynomial terms have started to appear in optical surface parameterization. Of interest here is the extension of all aspects of the optical design and analysis environment to enable the investigation of where and when more advanced freeform surfaces, such as Zernike polynomials  and Radial Basis Functions (RBF)  will effectively extend the performance of an optical system.
There are two major bottlenecks in representing an optical surface sag or wavefront aberration function over the unit circle with φ-polynomials to the accuracies demanded for most stringent optics applications. One is the numerical ill-conditioning associated with computation of the higher-order polynomials that might be required to achieve an accurate representation of an as-fabricated optical surface. Forbes addressed this bottleneck with the development of three-term recurrence relations for Zernike polynomials . This recurrence relation also reduces the computational cost significantly. Second is the substantial number of Zernike terms required, sometimes thousands. Prior to arriving at the proper number of terms in the representation of the optical surface, intermediate results with an insufficient number of coefficients exhibit high-departure errors at the edges. Importantly, as will be shown in this paper, the rate of convergence to an adequate number of terms in the representation is sensitive to the surface sampling.
The major contribution of this paper is the application of an edge-clustered fitting grid to create the database for input to the least squares fitting process resulting in a Zernike polynomial approximant to the freeform surface. The impact of this fitting grid on the reduction of edge ringing and the improvement of surface-fit quality by several orders of magnitude compared to current fitting grids will be demonstrated. We compare the effectiveness of fit with commonly used grid types such as 1) a set of uniform size hexagonal subgrids centered on a uniform rectangular grid, 2) a polar grid with a Chebyshev-based radial sample points, and 3) a uniformly random point fitting grid to show the significance of the edge-clustered random fitting grid on the accuracy of surface approximation.
The remainder of this paper is organized as follows; Section 2 briefly summarizes the three-term recurrence relation developed by Forbes for Zernike polynomials, which will be used as a representative from the family of more general φ-polynomials. A detailed explanation of the least-squares procedure to map the freeform surface based on the sampled database created on the fitting grids to a Zernike polynomial representation is presented in Section 3. In Section 4, the surfaces that were created to be the benchmark cases for freeform surfaces are described. In Section 5, four fitting grids that supply the database for the least-square fitting process are detailed. Finally, numerical results that show RMS errors as a function of the number of Zernike polynomial coefficients for each test surface with different fitting grids fitting are reported, before concluding the paper.
2. The three-term recurrence relation for Zernike polynomials
As recently brought forth to the optics literature, recurrence formula must be applied in computing any polynomial from mild to higher orders to ensure an accurate representation of the polynomial. In this section, the recurrence relation of Zernike polynomials developed by Forbes  is briefly revisited. The recurrence relation for Zernike polynomials removes the numerical round-off errors that might lead to a numerical catastrophe in the computation of Zernike polynomials if the explicit definitions are followed in the computation of very high-order Zernike terms. More and more high-order Zernike polynomials might become necessary as more complex analysis is performed that goes beyond simply the resolution of the optics and expands to include the impact of a midspatial-frequency fabrication residual, as an example. A Zernike polynomial definition is given in Born and Wolf  as followsEq. (2 and 3) are related to some Jacobi polynomials. Specifically, the relation of Zernike polynomials toJacobi polynomials is also given in  asEq. (4).1 of Forbes  asEq. (5) and 6 significantly reduces the numerical round-off errors and ill-conditioning associated with the explicit expressions. In Fig. 2 , we have compared a high-order Zernike, (ρ,φ) computed with the explicit expression (see Eq. (2) and computed with the recurrence relation for Zernike’s (see Eq. (5 and 6). For a similar example, see also Fig. 1 of .
As demonstrated by Fig. 2a, a computation based on an explicit expression gives peak values caused by round-off errors on the order of 100, where the value of the radial polynomial should be 1. The recursively computed result for (ρ,φ) shown in Fig. 2b is free of these debilitating computational artifacts.
3. Least squares data fitting to create a Zernike polynomial surface
A freeform surface over a circular aperture may be represented as a function that is a weighted sum of Zernike basis functions that form a complete and orthogonal set over the unit circle asEq. (8) is used to find the coefficients, ck, associated with each Zernike polynomial at each wavelength and field point. Here, the goal is to represent the sag of a surface to be used as an exact representation of a freeform surface, to the extent possible.
The linear system given in Eq. (8) can be solved with backward substitution. The matrix A is constructed as follows; the matrix A is M by N, where N is the number of Zernike polynomial coefficients to be fit, and M is the number of sample points throughout the aperture. Each row corresponds to a sample point, where each Zernike polynomial is computed. Each column corresponds to a Zernike polynomial coefficient evaluated over all the points in the circular aperture. The coefficients ck are the weights multiplying the columns of the matrix A to match the surface sag vector, f. Once the coefficients are determined, the approximant can be evaluated at any point across the aperture as would routinely occur in evaluating the impact of the surface in an overall optical system design. In Fig. 3 , as an example, we have shown the 136 Zernike polynomials coefficients that are computed as the result of least squares fitting of a conventional rotationally symmetric asphere, which is shown in Fig. 4a and whose description is given in Eq. (9)a. As can be seen in Fig. 3, most of the coefficients are almost zero up to the working double precision limit, as expected. The non-zero valued coefficients are aligned with increasing orders of the Zernike polynomial terms that are affiliated with spherical aberration. Table 1 provides a list of the nonzero Zernike polynomial coefficients of Fig. 3.
4. The test surfaces
In order to investigate the effectiveness of different sampling distributions over the unit circle, we chose three different test cases for analytic surface shapes as a benchmark suite whose analytical expressions in millimeter units are given in Eq. (9)a-c. The second two test surfaces are designed to exercise features of a next generation freeform surface shape that include not only conic and polynomial terms but also includes multi-centric additive or subtractive functions. The first test surface is a common rotationally symmetric five-term aspheric mirror surface shown in Fig. 4a. The second test surface is an F/1 parabolic surface with three Gaussian bumps added of different heights and standard deviation (as shown in Eq. (9)b), aiming to be a general example of a multiple bump surface that might be encountered in optical surface manufacturing and illustrated in Fig. 4b. The third test surface is a Franke  test function shown in Fig. 4c that comes from the scattered-data approximation literature that is widely used to assess the approximation capabilities of different mathematical bases.
5. Hexagonal Grid, Chebyshev-Polar Grid, Uniform-Random Grid, and Edge-Clustered Grid fitting
In this section, we present four different fitting grid patterns that form the database content for the least squares fit to yield the Zernike polynomial coefficients that then are used to describe the surface sag at any point on the optical surface. The resulting Zernike polynomial definition of the surface are re-sampled during either an analysis of the optical system performance or during optical system optimization. When optical surfaces are specified by a polynomial function in an optical system design and analysis simulation environment, the predicted performance of the optical system is computed by an often sparse sample of the wavefront obtained by ray tracing. Typically, in a commercial raytrace code, uniform rectangular grids are used to form a sampled database for the wavefront at an individual wavelength and at an individual field point where the grid density is often in a user specified range between 64 x 64 and 1024 x 1024.
In this paper four different sampling grids are considered to create the database that is applied to transform the specially selected set of analytic test surfaces defined by a small number of coefficients to an equivalent surface, but now computed from a Zernike polynomial description in all cases. The four types of grids applied to the transformation process are 1) a set of uniform size hexagonal subgrids centered on a uniform rectangular grid (hex grid), 2) a polar grid with Chebyshev-based radial weighting (Cheby-polar grid), 3) a uniformly random-point grid (uni-random grid), and 4) an edge-clustered, random-point grid (e_clust-random grid). Each is illustrated in Fig. 5 .
The purpose of compiling and illustrating the evolution of the fit accuracy is to highlight the relationship between the spatial-frequency content of the test surface and the number of Zernike terms required. Moving forward, one of the features of the broader class of freeform surfaces will be an ability to introduce surface features with higher spatial-frequency content. With the hex grid setting, the unit circle is divided into regular hexagonal cells. The center point for each hexagonal cell is also included as a sample point. We have chosen a hexagonal grid structure for the uniform setting since a circular aperture is more uniformly covered with hexagons rather than rectangular cells. In the Cheby-polar grid, the sample points are placed along an expanding set of circles at the roots of the Chebyshev polynomial of degree N in the ρ direction. The points along the ρ direction are called Chebyshev abscissas. Therefore we call these circles Chebyshev circles. The radii of the circles are given as
The third grid type, uni-random, consists of simply randomly placed sample points. A point sampling function generates uniformly distributed random points across the unit square (see , Halton points). The fourth sampling method is termed e_clust-random point sampling. Here the radial coordinate of the random Halton points is modified with a sine function weighting to move them towards the boundary of the unit circle. If a point has coordinates (ρ,φ) then the corresponding clustered point hasas its coordinates. This type of edge-clustered sampling was previously used in the context of an RBF-QR method by Fornberg et al. . As will be shown, it is the fourth fitting grid that has clearly demonstrated the best efficacy. Figure 5 displays each of these grid types using approximately 450 points over the unit circle.
6. Results of efficacy of fitting the test surfaces with four different fitting grids
To study the efficacy of each grid type for determining the coefficients of a Zernike polynomial fit to each of the test surfaces, over an increasing set of coefficients, we have sampled each surface with the four different fitting grids, carried out the least squares approximation to create the Zernike polynomial fit and then evaluated the approximant at around 7000 points that are on a uniform grid as would be the case in any commercial lens design program. Then, RMS errors in the resulting approximant to the test surface, which quantifies the difference between the height of the test surface computed using Eq. (9) and the Zernike polynomial fit determined for each fitting grid type, are recorded in meters in order for 10−9 on the vertical scale to correspond to nanometers. For each approximation, the size of the approximation matrix is M by N, where N is the number of Zernike polynomials coefficients, and M is the number of samples over the circular aperture, where M ≈1.5N. We have chosen this ratio such that the computational cost is not too high in the approximation; meanwhile the number of samples is sufficient to avoid working within an interpolation setting (i.e. M = N). In Chebyshev-based clustering methods, this ratio has been studied in 1D, and optimum results are obtained when this ratio is around 1.5 for different bases .
In Fig. 6c , we have plotted the RMS errors in meters versus the number of point samples on a uniform grid for the four different fitting grids. This analytic test surface is a conventional rotationally symmetric test surface described in Eq. (9)a and shown in Fig. 6a-b, which is well approximated with all fitting strategies by using around 200 points. We see that initially, all the fitting grids achieve good RMS errors, and the geometry of the fitting grid has little influence on the quality of the approximation. As shown in Fig. 6c, all fitting grids quickly achieved sub-nanometer accuracies with less than 100 points. However, as the number of Zernike coefficients increases, the RMS error grows with hex grid, uni-random, and Cheby-polar grids, whereas the edge clustered random point set produced the most stable results with errors kept on the order of machine precision. As this surface is a conventional rotationally symmetric surface rather than a freeform surface, the number of Zernike polynomials is minimal, i.e. 66 terms, to reach the high accuracy levels demanded by precision or lithography applications (also see Fig. 3).
In Fig. 7b , we illustrate the RMS errors in meters versus the total number of point samples sampled on a uniform grid for the four fitting grids applied to fitting a Zernike polynomial to the F/1 parabola with three bumps described in Eq. (9)b. Results show that the hex grid and the uni-random grid produce large errors in the order of 10−5 m to 1m, which is huge compared to the sizes of the bumps on the F/1 parabola spanning 30 to 600 µm. This is caused by the dominant effect of the oscillations on the circular boundary when there is no edge weighting. On the other hand, applying either of the edge clustered sampling grids, Cheby-polar or e_clust-random, produces exponentially decaying errors as the number of sampling points increases. However, it is only with the e_clust-random fitting grid that we can obtain an approximant described with machine precision accuracy, as illustrated in Fig. 7b. The Cheby-polar grid produces a similar RMS error trend as that of the e_clust-random grid until the number of point samples reaches around 3000, after which it yields less accuracy. Only the edge clustered sampling grids achieved sub-nanometer accuracies, while the unclustered fitting grids could not even describe the surface with micron accuracies. Hex grid and uni-random grid point sampling produced consistently poor surface approximants when fitting Zernike polynomials to intrinsically nonsymmetric surfaces.
The reason that all of the fitting grids initially produce RMS errors around 5x10−5 m for the F/1 parabola with bumps is best captured in Fig. 8 ., where we show the resulting approximants with the hex grid fitting (top row) and e_clust-random fitting (bottom row) grids while increasing the number of sample points and in proportion the number of Zernike polynomial coefficients for the F/1 parabola with bumps. Results shows that the number of sample points used in the evaluation of the approximation, in this case, is so small that the bumps on the F/1 parabola are under-sampled. However, as the number of Zernike polynomial coefficients increases to better fit the surface (and in conjunction the number of sampling points used in the evaluation), the least squares process tries to match the sag values at more points with a higher number of Zernike polynomial terms, causing severe oscillations at the edges when distributions that are not edge clustered are used as seen in the upper row of Fig. 8. As the lower row of displays in Fig. 8 shows, the e_clust-random fitting grid successfully describes the surface without significant edge ringing, therefore producing much better approximants than a sampling without edge clustering.
The reason the edge clustered fitting grids produce better surface approximants can be explained by its success in reducing the boundary errors over the unit circle. In the same way that adjusting an equispaced grid to Chebyshev points are the remedy for the Runge type oscillations that results in using equispaced points in 1D, the edge clustered fitting grids act as a remedy for φ-polynomials edge ringing for surface shapes with offset localized structured such as will be seen with multi-centric radial basis functions.
In the process of optical design, due to the number of parameters, fields, wavelengths, etc. the number of sample points within any one ray set used in evaluating the metric for optimization is minimal, often less than 100. However, once a solution is established, the analysis of the performance is typically conducted with hundreds of thousands if not millions of ray samples per surface. In the context of the process being pursued here, the Zernike coefficients that characterize the surface are computed with the relatively sparse optimization grid. The result is that there is a spatial frequency content on the wavefront that will not typically be sampled during the optimization. This gap will result in a poor match to the wavefront by the surface model. It is necessary in fact, for the surfaces shown here, especially the parabola with a series of three bumps and the Franke surface, to greatly exceed the number of samples that can be applied in a typical lens design software if the distinguishing features are to be picked up in the surface models as revealed by the high resolution analysis that is readily conducted in a ray trace code. Additionally, as compiled here using a standalone code, the number of Zernike terms that needs to be available in the optimization procedure exceeds by thousands the parameterization sets available during an optical design, which currently rarely exceeds 100.
In Fig. 9 , we have plotted the RMS errors in Zernike approximants compared to the analytic equation as the number of Zernike coefficients increases for the Franke surface (Eq. (9)c), which is selected to be a stressing example of the important characteristics of a next generation optical freeform surface. Similar to the results presented for the F/1 parabola with bumps, edge clustered fitting grids produced excellent approximants with errors decaying exponentially with respect to the number of sample points used in the least squares approximation. Two to nine orders of magnitude improvement over the unweighted fitted grids was achieved. The reason the unweighted fitting grids produce large RMS errors as the number Zernike polynomial coefficients are increased is the edge oscillations must be controlled with edge weighting or clustering. Only the edge clustered fitting grids achieve sub-nanometer accuracy levels (i.e. 10−10 m) demanded by lithography applications. Cheby-polar fitting grids produced similar results to e_clust-random fitting grids until around 2800 points. After this point, Cheby-polar fitting grids produce approximants with growing RMS errors. The stable and exponentially decaying errors produced by the edge clustered sampling make this method the method of choice for fitting Zernike polynomials to freeform optics surfaces and is the key result of this paper.
In this paper, we have investigated the effect of edge clustering points towards the boundary of the unit circle when fitting Zernike polynomials to a general surface shape that represents a family of freeform optical surfaces. We demonstrated that these grids are effective and that the edge-clustered random-fitting grid is particularly effective. We have also compared this fitting grid with a hexagonal sub-grid spaced on uniform centers and a simple random fitting grid for optical surface approximation with Zernike polynomials. We have observed that edge-clustered fitting grids produced very good approximants, and improved the approximation performance by several orders of magnitude compared to that of fitting grids without edge clustering. For highly varying freeform surfaces (e.g. surfaces with no specific symmetry) like Franke, only the edge-clustered sampling method achieved sub-nanometer accuracies. The RMS errors produced by edge clustered fitting grids stably and exponentially decreased as the number of point samples was gradually increased for freeform optical surfaces. In the test cases considered, edge clustered random sampling has achieved in 2-D what a Chebyshev-radial spaced sample achieves in 1-D in removing the impact of edge ringing.
The work was supported by the National Science Foundation GOALI grant ECCS-1002179, the II-VI Foundation, and the NYSTAR Foundation C050070.
References and links
1. E. Abbe, Lens system. U.S. Patent No. 697,959, (April, 1902).
4. Y. Tohme and R. Murray, “Principles and Applications of the Slow Slide Servo,” Moore Nanotechnology Systems White Paper (2005).
5. F. Zernike, “Beugungstheorie des schneidenver-fahrens und seiner verbesserten form, der phasenkontrastmethode,” Physica 1(7-12), 689–704 (1934). [CrossRef]
6. J. P. Rolland, C. Dunn, and K. P. Thompson, “An Analytic Expression for the Field Dependence of FRINGE Zernike Polynomial Coefficients in Rotationally Symmetric Optical Systems,” Proc. SPIE 7790, (2010).
8. O. Cakmakci, B. Moore, H. Foroosh, and J. P. Rolland, “Optimal local shape description for rotationally non-symmetric optical surface design and analysis,” Opt. Express 16(3), 1583–1589 (2008). [CrossRef] [PubMed]
10. M. Born and E. Wolf, Principles of Optics, (Cambridge, 1999).
11. A. B. Bhatia, E. Wolf, and M. Born, “On the circle polynomials of Zernike and related orthogonal sets,” Proc. Camb. Philos. Soc. 50(1), 40–48 (1954). [CrossRef]
12. M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, chap. 22, (Dover, 1972).
13. G. E. Fasshauer, Meshfree Approximation Methods with MATLAB (World Scientific Publishing, Singapore, 2007).
14. B. Fornberg, E. Larsson, and N. Flyer, “Stable computations with Gaussian radial basis functions,” SIAM J. Sci. Comput. 33(2), 869–892 (2011). [CrossRef]
15. R. Platte, Accuracy and Stability of Global Radial Basis Function Methods for the Numerical Solution of Partial Differential Equations, Ph.D. Thesis, (University of Delaware, 2005).