An iterative algorithm to analyze three-flat test data for absolute planarity measurements is presented. Using the properties of Zernike polynomial representations, results are achieved in a fast and effective manner. Details and demonstrative examples are provided.
© 2008 Optical Society of America
A task that interferometry is often required to accomplish in metrology laboratories is finding the absolute shape of a planarity standard. A most adopted method is by the three-flat test.  Flats are compared interferometrically two by two in the course of four or more measurements. Then appropriate data reduction procedures are implemented, ending up with the absolute shape of each flat. A useful approach to analysis is giving the source interferograms a mathematical representation by best fit to Zernike polynomials, thereafter computing the Zernike representation of the flats. [2, 3] The approach is straightforward in case the fourth measurement is obtained by azimuthal rotation of one flat by an assigned angle, yielding the mathematical result in closed form. Validity and accuracy of this method are generally acknowledged, although reservations remain as to the number of azimuthal rotations to be imparted (originally just one) and to their value. In some cases, increasing the number of polynomials (originally 37) or varying their choice would also be desirable. While the analytical approach could be modified to allow the above requests, the solving equations become more and more involved. A further problem that was pointed out in past works is that the interferometric data are to some extent uncorrelated, due to small variations of the flat shape from one measurement to another. [4,5] This required working out a least-squares procedure by constrained fitting, taking all the data into account at once and reducing the residuals to a minimum; the mathematical complications were anyway significant.
We have recently developed a numerical iterative procedure that works on pixel data instead of Zernike polynomials.  Starting from three trial surfaces, the absolute shape of the true surfaces is recovered by successively modifying the trial ones in a coded manner so that the given interferograms are accounted to a minimum residual error. The particulars of the technique are given in Ref. . In such a reference, examples are reported where interferograms are sampled on a circular pupil of 169 pixel diameter (22333 data points), and the solution is obtained in minutes of processing on a 1.5 GHz personal computer. Building upon our previous work, to which the reader is deferred for background information, as an addition to the same numerical approach based on iterations, here we present the implementation of a similar concept to Zernike polynomials, and compare the results with those of the pixel based approach. As a major outcome, dealing with a far smaller set of data – the Zernike coefficients – the analysis is significantly faster, producing the result in almost real time. For example, the computing task reported above required 90 s working on pixels; on the same computer, the same task with a 37-term Zernike representation takes less than 3 s. While the use of Zernike polynomials is limited to surface errors that are well represented by the terms selected, such a selection can be extended as appropriate with a minor overcharge to data processing. Also, the algorithm can easily be adapted to include more rotation angles and measuring combinations, as it occurs when one attempts to reduce the sources of uncertainty with multiple measurements. Here we analyze the basic case of four interferograms, briefly reviewing the numerical algorithm, and present in detail the peculiarities related to the use of Zernike polynomials. Examples of data processing are also provided.
2. Data processing with Zernike polynomials
The three flats are here named K, L, M. Their mathematical representation is written as
being Zj the Zernike polynomials of choice, N their number, and kj, lj, mj the coefficients to be determined. Although the procedure is at all general, to the purpose of illustration here we refer to a typical approach with four measurements, where the fourth is an azimuthal rotation of flat M. In formulas, we express such a rotation by a subscript R:
Similarly, a counter-rotation is expressed by a subscript -R. Interferometric measurements require that one flat is reversed front-to-back to face its counterpart, say, by rotation about the y-axis. We name this “flipping”, and use for it the subscript F; for example,
Naturally, double flip leads back to the unflipped pattern. Overall, the data set is written as
It is assumed that the coefficients (kF m)j, (lF m)j, (lF mR)j, (lF k)j are known after Zernike fitting to experimental data.
The flow chart of the data processing is outlined in Fig. 1. To start the numerical approach, three trial surfaces are selected; to a neutral choice, one may select perfectly plane surfaces, i.e., for all terms. Synthetic interferograms , , , are computed according to
(If at start the choice of plane surfaces was made, initially all the above will be zero). Differences between synthetic and real interferograms are then worked out; for example,
Thinking of the above difference as a vector in a N-dimensional space, the square modulus is
We define the error function EF as the sum of the square moduli of the differences so computed:
The above is the function we aim at minimizing numerically by recursive iteration. To this purpose, the initial set of trial surfaces is given incremental steps that tend to reduce the error function. The algorithm we use consists of scaling down the differences Δ(KFM), Δ(LFM), Δ(LFMR), Δ(LFK), and sharing between the flats that have produced them. However, while doing so, we undo flips and rotations, if any, so that each contribution is summed up consistently with the orientation of the trial flats. In formulas, the new trial surfaces are then written
where α, β are down-scaling factors (in computations presented in this paper we used α=1/20, β=1/30). New synthetic interferograms are computed according to Eq. (5), and the cycle is repeated until EF reaches a minimum. On the subject of the uniqueness of the minimum it is noted that, in terms of Zernike polynomials, the problem of solution finding is linear, so that no local (secondary) minima of the error function occur.  Incidentally, it is also noted that the expression of EF is just the one that is minimized in a least squares process when all the data are taken into account at once, so it is not surprising that the results, including the residuals if any, are found to be the same as with the constrained fitting approach mentioned above.
The convergence of the algorithm is discussed in appendix A. In particular it is shown that, by selecting appropriate values of α, β, the convergence rate of the radial Zernike terms can be made constant, and independent of the data. For all terms, directions to implement a steepest descent approach are provided. The latter leads to the solution with a reduced number of cycles by conveniently changing the values of α, β each cycle, but the computation task per cycle is more demanding, so that using instead a fixed pair of α, β values, and in case reducing them as EF approaches the minimum, can still be preferred as a numerical technique.
In addition to the detailed mathematical account given in appendix A, to gain further insight on how the algorithm works, one may consider a specific Zernike coefficient (ordering number j); for simplicity, here we refer to a pure radial term. The experimental data are
(rotation and flip do not affect pure radial terms), with kj, lj, mj the Zernike coefficients to be determined. After h iterations, let the trial surfaces be , , , and the trial Zernike terms then be
The application of the iterative algorithm as described, with α=1/20, β=1/30, leads to
In words, looking for instance at , after h+1 iterations one achieves a trial value that is made of nine tenths of the previous (trial) value , plus one tenth of the true value kj, so that there is more of the true value than in the previous iteration; the other addends are one twentieth of the difference between the true and the trial values of surface L, and one twentieth of the difference between the true and the trial values of surface M. However, since at increasing h also , contain more of the true values lj, mj, respectively, the above differences tend to zero, while the sum tends to kj. Similar considerations can be made for the other Zernike terms.
The iterative algorithm being presented uses two operations that are particularly simple with Zernike polynomials: flip and rotation. Flip only involves azimuthal terms, containing cos nθ or sin nθ, with n an integer; pure radial terms are unaffected. In particular, using surface K as an example, flip about the y-axis is obtained by the following rule:
Rotation is straightforward as well, due to the well known rotational properties of Zernike polynomials permitting to express the coefficients of a rotated map in exact closed form. Such properties let to overcome the difficulties previously encountered when processing square grids of pixel data, both as to software and theory. Software difficulties were due to somewhat involved computations that had to be performed, including a passage to polar coordinates, and a return to Cartesian. Theory difficulties were related to the fact that the rotated grid does not match exactly the unrotated one at the sampling points, so that in principle a solvable set of equations could not be written, nor a mathematically exact solution be achieved  (although with close sampling the remaining errors are negligible in practice). With Zernike polynomials, both difficulties relating to rotation are removed; also using the flip rule given above, an effective software program implementing the algorithm can then be readily worked out.
3. Numerical examples
We have tested the iterative approach both on simulated data, and with data from experiments. In the case of simulated data, using 49 Zernike terms, it is verified that reconstruction is achieved up to the digitization limit; the processing time is almost unappreciable (we use a personal computer with a 1500 MHz CPU). As to experimental data, we used four interferograms obtained in the laboratory from a set of three fused silica plates using a rotation angle of 54° for the fourth measurement. Data fitting to Zernike polynomials (37 terms) was first performed; subsequent analysis was carried out both with the iterative approach, and with the constrained fitting procedure. Iterations started with perfectly plane trial surfaces. In Fig. 2 the error function EF versus the iteration number is presented; normalization is made with respect to first iteration cycle. The curve shows regular and smooth convergence towards the minimum. The latter though is not zero, due to decorrelation residuals.
The topography of the surfaces reconstructed with the iterative algorithm and the constrained fitting procedure are compatible to sub-nanometer scale. Measured peak-to-valley (P-V) and root mean square (rms) errors for surfaces K, L, M are reported in Table 1. As to decorrelation residuals, a typical example obtained with the iterative algorithm [interferogram Δ(KFM)] is presented in Fig. 3(a). The corresponding residual from the constrained fitting procedure is shown in Fig. 3(b). It is noticed that the two patterns are reasonably similar; their difference map is given in Fig. 4, where P-V and rms values are 0.222 nm and 0.015 nm, respectively.
The example reported above, and others with different data sets, support that the effectiveness of the iterative algorithm achieves the level of constrained fitting. With respect to the pixel based approach, equivalence in terms of accuracy is achieved in case the source interferograms are well fitted by the Zernike terms in use. In other cases, high spatial frequency residuals are expected. Where the Zernike iterative approach can be of advantage over the pixel version is in terms of time required to obtain the result; in particular, significant reductions of the processing time can be obtained with large size data sets. This feature can be of value in case of quick measurements, or routine tests in a production laboratory.
The problem of solving for the Zernike coefficients of the surfaces K, L, M can be formulated as a general least squares fitting to a polynomial representation, and treated with classical linear regression methods. However such an approach, proposed in the past literature to deal with data redundancy,  does not take into account decorrelation phenomena that arise because of various sources of instability, such as mechanical strain, thermal behaviour of the flats in the course of the measurements,  and relocation errors. To proceed with linear regression, one should extract the best compatible data set first, purging the source data from decorrelation contributions. To do so, the general method is by the use of Lagrangian multipliers.  As an alternative technique, one may link the data equations with appropriate constraints, and carry out a constrained fitting procedure as described in Ref. . In any case, the mathematical complication is significant. The iterative method is instead far simpler and versatile.
As to validation, in addition to checks reported in the previous section, we compared our results with the absolute exact figure that can be computed along the line of flipping (the y-axis in our case) with standard methods based on three interferograms.  As an example, using the same experimental data mentioned above, Fig. 5 shows the section profiles that are obtained with the two approaches for the case of surface L. The peak-to-valley of the difference is 1.16 nm, and the rms of the difference is 0.23 nm. While such values are quite small, one may notice the filtering action of the Zernike representation with respect to the high spatial frequency content of the surface profile.
Although the example just presented refers to four measurements, the iterative approach can be extended to more measurements by adding contributions as appropriate to the error function [Eq. (8)] and to the incremental steps [Eq. (9)]. Great versatility over the constrained fitting procedure is thus gained. While though surfaces K, L, M are rapidly and precisely reconstructed, it is noted that the end accuracy of the measurement is a different matter, involving a physical assessment of the various sources of variability of the interferometric data, and the basic metrologic approach itself in terms of available information. The uncertainty balance should then be made considering the actual measuring process from which the data are originated, their repeatability and reproducibility, as well as any other cause of uncertainty that applies, such as the class of the interferometer, the properties of the relevant materials, and more (“type B” uncertainty contributions).10 While multiple rotation angles and different measuring combinations can be used with the purpose of reducing the above uncertainty contributions, the iterative algorithm offers a means to include all the interferograms available in a single computing process that works out the best compatible surfaces in least squares sense. To do so with pixel based iteration or a suitably modified constrained fitting the computing task or the mathematical complication may become excessive. On that respect the Zernike iterative algorithm is simple and fast enough to conveniently deal even with major sets of data, up to the spatial resolution established by the kind and number of Zernike terms selected.
As to data reduction errors produced by the algorithm itself, these are of the same order of magnitude as those from constrained fitting, remaining anyway negligible with respect to those produced by other uncertainty contributions.
Appendix A: Convergence of the iterations
Pure radial Zernike terms
We refer to a specific Zernike coefficient (ordering number j), and in particular to a pure radial term (azimuthal terms will be addressed afterwards). To a simpler notation, within this appendix we use the following symbols:
(rotation and flip do not affect pure radial terms), with kj, lj, mj the Zernike coefficients to be determined. After h iterations, still to a simpler notation, the coefficients of the trial surfaces are here written
The trial Zernike terms are indicated as
The error function EF is then
Using primed symbols to refer to the subsequent iteration, according to the algorithm it is
where α, β are numerical coefficients to be determined for convergence of the iterations. It is then written
The pertaining error function is
The expressions of A’, B’, D’ are substituted into the above EF’:
Carrying out the operations in detail, at the end one obtains
Convergence occurs for EF′<EF. Such inequality is certainly met in case the following conditions are fulfilled:
- the coefficients of the cross products (a-A)(b-B), (a-A)(d-D), (b-B)(d-D) are all zero,
- the coefficients of the quadratic terms (a-A)2, 2(b-B)2, (d-D)2 are all smaller than unity.
The condition 1) is expressed by the system of equations
This system provides two solutions for the pair α, β. The first solution is α=1, β=0, and is discarded for obvious reasons. The second solution is α=4/19, β=6/19, and is then tested for compliance with condition 2). In fact, substituting into the coefficients of the quadratic terms, one obtains
Clearly then, for α=4/19, β=6/19, the iteration process is converging, with the error function approximately halving each cycle. In addition, the convergence rate is independent of the data. Slightly departing from α=4/19, β=6/19 (in the paper we used α=1/20, β=1/30), the convergence rate depends to some extent on the data, and varies a while from one iteration to next.
In general, since the error functions EF, EF′ are expressed in their explicit analytical form, the inequality EF′<EF can be taken as the statement of the condition to be fulfilled for convergence. If desired, a technique of steepest descent can be implemented within the iteration cycle by computing the values of α, β that produce the maximum decrease of the error function. To this purpose the partial derivatives , are evaluated and set to zero. So doing on Eq. (A.9) one obtains the system of equations
then evaluating the values αsd, βsd of steepest descent:
In numerical simulations, the above values are quite effective to get to the result, and may require some de-emphasis to better looking at the convergence process (for example using α=0.9 αsd, β=0.9 βsd in place of αsd, βsd). It is also noted that, with the technique of steepest descent, while the number of iteration cycles decreases, the operations to be carried out each cycle increase, so that one might consider whether using instead fixed values of α, β would be more convenient.
Azimuthal Zernike terms
The same analysis carried out with pure radial Zernike terms can be repeated with azimuthal terms. In such a case, however, one has to consider Zernike orders in pairs, since the rotation formulae mix sine and cosine terms. Also, flip rules are to be applied according to parity. The error function is correspondingly made of more terms (32 terms, in place of the 6 terms above), also including cross factoring between the paired orders. Conditions for convergence can though be expressed as before, and verified in cases of interest. In detail:
We now refer to a pair of Zernike coefficients (ordering numbers p=j for the sine term, q=j+1 for the cosine term), relating to the same azimuthal frequency nθ. We first consider the case of n odd. As to rotation by an angle φ (we use φ=54°), referring for example to surface M, the properties of Zernike polynomials let to write
In the following, for simplicity we will use the symbols ρ=cos φ, σ=sin φ. The experimental data are
The trial Zernike terms are
where use has been made of the rotation formula, and of the flip rules for n odd. The error function is
Using again primed symbols to refer to the subsequent iteration, according to the algorithm it is written
The error function becomes
Carrying out the operations in detail, at the end one obtains
Again, convergence occurs for EF′<EF, and this can be taken as a condition statement. However, looking at Eq. (A.21), it is not possible to null all the coefficients of the cross terms altogether, and get a convergence rate constant and fully independent of the data. In analogy with computations carried out for pure radial terms based on partial derivatives, one may find though conditions for steepest descent. The quantities of concern now become
The values of αsd, βsd for steepest descent can then be computed with the same Eq. (A.14), and included in the iteration cycle. Naturally, due to the number of terms in Eq. (A.22), one might prefer using fixed values for α, β, as presented in the paper.
The case of n even is treated in a manner similar to that of n odd. In fact, looking at the flip rules and at the iteration algorithm, in the source Eq. (A.17) it is sufficient to exchange the p and q indices, and change σ to −σ, while all the following equations remain formally the same.
References and links
1. Lord Rayleigh, “Interference bands and their application,” Nature (London) 48, 212–214 (1893). [CrossRef]
2. R. E. Parks, “Removal of test optics errors,” Proc. SPIE 153, 56–63 (1978).
3. B. S. Fritz, “Absolute calibration of an optical flat,” Opt. Eng. 33, 379–383 (1984).
4. V. B. Gubin and V. N. Sharonov, “Absolute calibration of spherical surfaces,” Sov. J. Opt. Technol. 57, 554–555 (1990).
5. V. Greco, R. Tronconi, C. Del Vecchio, M. Trivi, and G. Molesini, “Absolute measurement of planarity with Fritz’s method: uncertainty evaluation,” Appl. Opt. 38, 2018–2027 (1999). [CrossRef]
7. J. Grzanna and G. Schulz, “Absolute testing of flatness standards at square-grid points,” Opt. Commun. 77, 107–112 (1990). [CrossRef]
8. V. Greco and G. Molesini, “Micro-temperature effects on absolute flatness test plates,” Pure Appl. Opt. 7, 1341–1346 (1998). [CrossRef]
9. S. Brandt, Statistical and Computational Methods in Data Analysis, 2nd ed. (North-Holland, Amsterdam, 1970), p. 216.
10. International Bureau of Weights and Measures, International Electrotechnical Commission, International Federation of Clinical Chemistry, International Organization for Standardization, International Union of Pure and Applied Chemistry, International Union of Pure and Applied Physics, and International Organization of Legal Metrology, Guide to the Expression of Uncertainty in Measurements (International Organization for Standardization, Geneva, 1993).