## Abstract

Mathematical methods that are poorly known in the field of optics are adapted and shown to have striking significance. Orthogonal polynomials are common tools in physics and optics, but problems are encountered when they are used to higher orders. Applications to arbitrarily high orders are shown to be enabled by remarkably simple and robust algorithms that are derived from well known recurrence relations. Such methods are demonstrated for a couple of familiar optical applications where, just as in other areas, there is a clear trend to higher orders.

© 2010 OSA

## 1. Introduction

Many of the special functions used in optics satisfy three-term recurrence relations. These include trigonometric functions, Bessel functions (including modified Bessel and Hankel functions), and various orthogonal polynomials. Although most of the ideas discussed in what follows apply to any of the functions in this class, I concentrate on orthogonal polynomials in particular. Among the better known orthogonal polynomials in optics are the Zernike polynomials [1] (for characterizing a function defined on a disc, or modified for cases with non-uniform weighting or for annular domains) and the Laguerre and Hermite polynomials [2] (for characterizing profiles of transversely localized beams). Traditionally, such polynomials were only ever needed to modest orders, say up to ten or so terms. (Note that, in the case of Zernikes, there are distinct polynomials for each azimuthal order, so the total number of terms is often more than thirty, but that involves a more modest number of terms for each azimuthal order.) This has meant that explicit expressions have been widely used for the polynomials, and they have generally been perfectly adequate.

The inclusion of higher-order terms in such decompositions is increasingly being used to boost capabilities. Two considerations limit progress in this direction, however. First, the explicit expressions for the polynomials lead to numerical round-off problems that become catastrophic when the number of terms reaches around 20 (a lesson sometimes learned the hard way). Second, computational efficiency (i.e. arithmetic operation count) remains a vital consideration in applications such as optical design: despite the staggering growth of computer power, multi-dimensional global optimization is so challenging that its results will always benefit from efficiency gains. The methods discussed below lead to simple code that is remarkably efficient while also avoiding round-off failures.

Any set of orthogonal polynomials, say $\left\{\,{P}_{m}\,\right\}$ where ${P}_{m}(x)$ is a polynomial of order *m*, is defined so that

$$<{P}_{n},\,{P}_{m}>\;:=\int w(x)\,{P}_{n}(x)\,{P}_{m}(x)\,\text{d}x={\delta}_{nm}, \qquad (1.1)$$

where $w(x)$ is a non-negative weight function and ${\delta}_{nm}$ is the Kronecker delta. A sufficiently well-behaved function can then be expanded as

$$S(x)=\sum_{m}{s}_{m}\,{P}_{m}(x), \qquad (1.2)$$

where the coefficients follow from the inner products

$${s}_{m}=\;<S,\,{P}_{m}>. \qquad (1.3)$$

Orthonormality also means that the mean-square value of $S$ over the domain is simply the sum of the squared coefficients:

$$<S,\,S>\;=\sum_{m}{s}_{m}^{2}. \qquad (1.4)$$

A result that is of central importance in this work follows upon considering the expansion of the function defined by $S(x):=x\,{P}_{n}(x)$. On account of Eq. (1.1), $<\varphi ,\,{P}_{m}>$ vanishes when *ϕ* is *any* polynomial of order less than *m*. It follows that ${s}_{m}=\;<x\,{P}_{n},\,{P}_{m}>\;=\;<{P}_{n},\,x\,{P}_{m}>$ vanishes unless $|m-n|\le 1$. This observation can be expressed as

$$x\,{P}_{n}(x)=q\,{P}_{n+1}(x)+r\,{P}_{n}(x)+s\,{P}_{n-1}(x), \qquad (1.5)$$

where *q*, *r*, and *s* are constants (that depend on *n*) and $q\ne 0$. Equation (1.5) can be re-arranged to give the standard form for the three-term recurrence relation that underpins the following sections:

$${P}_{n+1}(x)=({a}_{n}+{b}_{n}\,x)\,{P}_{n}(x)-{c}_{n}\,{P}_{n-1}(x). \qquad (1.6)$$

Since the well known orthogonal polynomials in optics are among the standard set, the existence of the associated recurrence relations, generating functions, Christoffel-Darboux identities, etc. has often been noted and occasionally rediscovered. In the case of Zernikes, see [4–7] as examples. Some of the associated recurrence relations have been adopted in the field of image processing, as reviewed in [8]. Even so, in most applications of Zernikes, significant effort (and space) is devoted to working with the explicit expressions for the polynomials of moderate orders. *It has not been generally appreciated that, in practice, this is a road to grief.* With *r* as the radial coordinate on a unit disc, each element of a Zernike decomposition involves $\mathrm{cos}\,m\theta$ or $\mathrm{sin}\,m\theta$ times ${r}^{m}\,{Z}_{n}^{m}({r}^{2})$. These polynomials of *m*’th azimuthal order are related to the Jacobi polynomials by ${Z}_{n}^{m}(x)={P}_{n}^{(0,m)}(2x-1)$, e.g. see Eq. (3.7) of [4]. The domain of interest is $0\le x\le 1$. The explicit expression that is the focus of many applications in such work is

$${Z}_{n}^{m}(x)=\sum_{k=0}^{n}\frac{{(-1)}^{k}\,(m+2n-k)!}{k!\,(n-k)!\,(m+n-k)!}\,{x}^{n-k}. \qquad (1.7)$$

The terms in Eq. (1.7) alternate in sign and grow so large that, in double precision, round-off destroys the result completely once *n* exceeds about 20. Accuracy can be seen to suffer long before that point of total catastrophe and, as demonstrated in Fig. 1, this problem is avoided by using Eq. (1.6). The details of this particular recurrence relation are given in Section 4, and the stability of such relations is discussed in, for example, [9].

Another benefit offered by Eq. (1.6) can be appreciated when evaluating a function expressed as in Eq. (1.2) with the sum truncated at, say, $m=M$. Even when nested according to the Horner scheme, the evaluation of ${P}_{m}(x)$ requires about $2m$ arithmetic operations. Assuming all the coefficients are pre-computed, the evaluation of $S(x)$ therefore requires about ${M}^{2}$ arithmetic operations. When Eq. (1.6) is employed, on the other hand, each of the basis elements is determined with just five arithmetic operations, so only $5M$ are required for the entire set. In either case, forming the linear combination that constitutes $S(x)$ takes another $2M$ operations, but the main point to be made is that an “$O({M}^{2})$ process” is converted to one of $O(M)$. This comes on top of the vital accuracy gain demonstrated in Fig. 1. More importantly, however, the focus of this paper is on three additional steps that were developed decades ago by Clenshaw [10], Smith [11], and Salzer [12], respectively, to extract additional benefits from Eq. (1.6). Their elegant, but little used, ideas are given unified derivations below and are adapted to the context of optics.
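To make the five-operation step concrete, the following Python sketch (the function and its argument names are mine, not from the paper) builds ${P}_{0}(x)\ldots {P}_{M}(x)$ for any family satisfying Eq. (1.6), exercised here with the Chebyshev polynomials, for which $a_n=0$, $c_n=1$, and $b_n=2$ apart from $b_0=1$:

```python
import math

def basis_values(x, M, a, b, c):
    """Evaluate P_0(x) ... P_M(x) for any family satisfying Eq. (1.6):
    P_{n+1}(x) = (a(n) + b(n)*x) * P_n(x) - c(n) * P_{n-1}(x),
    with the conventional normalization P_0 = 1 (and P_{-1} taken as 0)."""
    P = [1.0]
    for n in range(M):
        prev = P[n - 1] if n > 0 else 0.0
        P.append((a(n) + b(n) * x) * P[n] - c(n) * prev)
    return P

# Chebyshev polynomials: T_{n+1} = 2x T_n - T_{n-1}, except T_1 = x
x = 0.3
cheb = basis_values(x, 10,
                    a=lambda n: 0.0,
                    b=lambda n: 1.0 if n == 0 else 2.0,
                    c=lambda n: 1.0)
# Cross-check against the closed form T_n(x) = cos(n arccos x)
assert all(abs(T - math.cos(n * math.acos(x))) < 1e-12
           for n, T in enumerate(cheb))
```

Each pass through the loop costs one multiply and one add for $a_n+b_n x$, plus a multiply, a multiply, and a subtract: the five operations per basis element noted above.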

## 2. Linear combinations of orthogonal polynomials and their derivatives

An obvious application of Eq. (1.6) is for functions defined as a truncated sum of the type in Eq. (1.2), say

$$S(x)=\sum_{m=0}^{M}{s}_{m}\,{P}_{m}(x). \qquad (2.1)$$

Clenshaw [10] observed that $S(x)$ can be found without evaluating any of the ${P}_{m}(x)$ themselves: upon initializing ${\alpha}_{M+1}={\alpha}_{M+2}=0$, the downward recurrence

$${\alpha}_{n}={s}_{n}+({a}_{n}+{b}_{n}\,x)\,{\alpha}_{n+1}-{c}_{n+1}\,{\alpha}_{n+2}, \qquad n=M,\,M-1,\,\ldots ,\,0, \qquad (2.2)$$

delivers $S(x)={\alpha}_{0}$ when ${P}_{0}\equiv 1$ (as established in Appendix A). Each step of Eq. (2.2) takes about six arithmetic operations, so the total cost is about $6M$ operations; *this is an elegant short-circuit from the spectrum to the function value.* When nested as a Horner scheme, evaluating a regular polynomial of order *M* requires only $2M$ operations. A factor of three in operation count is evidently the minimal cost that must be paid to avoid catastrophic numerical instability. There are significant side-benefits, however, and a sample is commented on in one context in Section 5. Many of them follow from the fact that the rms value of a function is just the square root of the sum of the squared coefficients, as in Eq. (1.4).
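Clenshaw’s process (Eq. (2.2)) can be sketched as follows (Python; the names are mine). It runs downward through the spectrum and returns $S(x)$ without ever evaluating a basis element, assuming ${P}_{0}\equiv 1$ and ${P}_{1}=({a}_{0}+{b}_{0}x)\,{P}_{0}$:

```python
def clenshaw(x, s, a, b, c):
    """Evaluate S(x) = sum_{m=0}^{M} s[m] P_m(x) directly from the spectrum,
    for any family with P_{n+1} = (a(n) + b(n)*x) P_n - c(n) P_{n-1},
    P_0 = 1 and P_1 = (a(0) + b(0)*x) P_0.  No P_m is ever evaluated."""
    alpha1 = alpha2 = 0.0              # alpha_{n+1} and alpha_{n+2}
    for n in range(len(s) - 1, -1, -1):
        alpha = s[n] + (a(n) + b(n) * x) * alpha1 - c(n + 1) * alpha2
        alpha1, alpha2 = alpha, alpha1
    return alpha1                      # S(x) = alpha_0 when P_0 == 1

# Chebyshev example: S(x) = 1 + 2 T_1(x) + 3 T_2(x)
x = 0.3
S = clenshaw(x, [1.0, 2.0, 3.0],
             a=lambda n: 0.0,
             b=lambda n: 1.0 if n == 0 else 2.0,
             c=lambda n: 1.0)
assert abs(S - (1 + 2 * x + 3 * (2 * x * x - 1))) < 1e-12
```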

Clenshaw’s process turns out to open the way to an equally effective scheme for computing derivatives. Although Eq. (1.6) can be differentiated to find a recurrence relation for ${{P}^{\prime}}_{n}(x)={\scriptscriptstyle \frac{\text{d}}{\text{d}x}}{P}_{n}(x)$, a more direct determination of ${S}^{\prime}(x)$ can be performed with a simple variant of Eq. (2.2). In fact, the same thing works for higher derivatives. If the *j*’th derivative is written as ${S}^{(j)}(x)$, it is proven in Appendix B that if we start with ${\alpha}_{M-j+1}^{(j)}=0$ and ${\alpha}_{M-j}^{(j)}=j\,{b}_{M-j}\,{\alpha}_{M-j+1}^{(j-1)}$, then

$${\alpha}_{n}^{(j)}=j\,{b}_{n}\,{\alpha}_{n+1}^{(j-1)}+({a}_{n}+{b}_{n}\,x)\,{\alpha}_{n+1}^{(j)}-{c}_{n+1}\,{\alpha}_{n+2}^{(j)}, \qquad n=M-j-1,\,\ldots ,\,0, \qquad (2.3)$$

yields ${S}^{(j)}(x)={\alpha}_{0}^{(j)}$, once again without evaluating any of the basis elements.

Something equivalent to Eqs. (2.3) was presented by Smith [11], but with minimal derivation. The argument in his paper seems to proceed via differentiation of Eq. (1.6) instead of Eq. (2.2), as done in Appendix B. He presents two intermediate expressions that appear to complicate any interpretation of the process, and inconsistently applies a notational convention for higher derivatives. [For example, Smith presents something akin to Eq. (2.3), but without the factor of *j* that appears immediately after the equals sign.] The confusion this may have caused could explain the fact that his beautiful result seems to have been ignored in much of the related literature, even in numerical classics like [9]. Similarly, Clenshaw’s result has also been poorly appreciated in physics and optics and, again, it may be no coincidence that I have been unable to find a clear derivation of Eq. (2.2) in the literature. This is what motivated me to develop Appendices A and B. It is also interesting to note that Doha [13] and Barrio [14] both offer some interesting analyses and alternative results, although they are specific to Jacobi polynomials: Doha re-expresses ${S}^{(j)}$ as a linear combination (with constant coefficients) of the original basis elements, i.e. in terms of ${P}_{m}$ instead of ${P}_{m}^{(j)}$, while Barrio gives ${S}^{(j)}$ without requiring all lower derivatives. The fact that the derivatives of Jacobi polynomials are once again Jacobi polynomials opens the door to their specialized results. Thankfully, the simple process described above applies more generally to any polynomials that satisfy Eq. (1.6), and Eq. (2.3) is all that is required in our context to determine derivatives.
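Smith’s scheme amounts to differentiating each Clenshaw step alongside the original one via the product rule. A minimal Python sketch for the first derivative follows (the names are mine; higher derivatives repeat the same pattern, per Eq. (2.3)):

```python
def clenshaw_derivative(x, s, a, b, c):
    """Return (S(x), S'(x)) for S = sum s[m] P_m, where the P_m satisfy
    P_{n+1} = (a(n) + b(n)*x) P_n - c(n) P_{n-1}, with P_0 = 1 and
    P_1 = (a(0) + b(0)*x) P_0.  Each alpha step is differentiated in x."""
    a1 = a2 = 0.0     # alpha_{n+1}, alpha_{n+2}
    d1 = d2 = 0.0     # their x-derivatives
    for n in range(len(s) - 1, -1, -1):
        u = a(n) + b(n) * x
        alpha = s[n] + u * a1 - c(n + 1) * a2
        dalpha = b(n) * a1 + u * d1 - c(n + 1) * d2   # product rule
        a1, a2 = alpha, a1
        d1, d2 = dalpha, d1
    return a1, d1     # S = alpha_0 and S' = alpha'_0 when P_0 == 1

# S(x) = 1 + 2 T_1 + 3 T_2 has S'(x) = 2 + 12 x
x = 0.3
S, dS = clenshaw_derivative(x, [1.0, 2.0, 3.0],
                            a=lambda n: 0.0,
                            b=lambda n: 1.0 if n == 0 else 2.0,
                            c=lambda n: 1.0)
assert abs(dS - (2 + 12 * x)) < 1e-12
```

Both value and derivative emerge from a single downward pass, with roughly double the operation count of the value alone.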

## 3. Change of basis

When a function is expressed in terms of the basis $\left\{\,{P}_{m}^{\text{I}}\,\right\}$ as

$$S(x)=\sum_{m=0}^{M}{s}_{m}\,{P}_{m}^{\text{I}}(x), \qquad (3.1)$$

it is sometimes necessary to re-express it in terms of a second basis, say $\left\{\,{P}_{m}^{\text{II}}\,\right\}$, i.e. to find the coefficients ${t}_{n}$ for which

$$S(x)=\sum_{n=0}^{M}{t}_{n}\,{P}_{n}^{\text{II}}(x), \qquad (3.2)$$

where each basis satisfies a three-term recurrence of the form of Eq. (1.6):

$${P}_{n+1}^{\text{I}}(x)=({a}_{n}^{\text{I}}+{b}_{n}^{\text{I}}\,x)\,{P}_{n}^{\text{I}}(x)-{c}_{n}^{\text{I}}\,{P}_{n-1}^{\text{I}}(x), \qquad (3.3\text{a})$$

$${P}_{n+1}^{\text{II}}(x)=({a}_{n}^{\text{II}}+{b}_{n}^{\text{II}}\,x)\,{P}_{n}^{\text{II}}(x)-{c}_{n}^{\text{II}}\,{P}_{n-1}^{\text{II}}(x). \qquad (3.3\text{b})$$

Salzer’s process is derived in Appendix C in terms of what can be regarded as an $(M+1)\times (M+1)$ matrix with elements ${\alpha}_{k}^{n}$ [not to be confused with the quantities written as ${\alpha}_{n}^{(j)}$ for the derivatives in Section 2]. It is convenient to introduce some auxiliary matrices in terms of the constants from the recurrence relations of Eqs. (3.3), namely

In some applications, it may be appropriate for these to be pre-computed and stored. It turns out that the matrix with elements ${\alpha}_{k}^{n}$ is triangular, with $(M+1)(M+2)/2$ non-zero elements: ${\alpha}_{k}^{n}=0$ unless $0\le k\le M-n$. This property can be used to simplify the penultimate equation of Appendix C, namely Eq. (C.6). At $k=0$, one term drops out leaving

Equations (3.5)-(3.8) allow each ${\alpha}_{k}^{n}$ to be determined in no more than seven arithmetic operations. The roughly ${M}^{2}/2$ non-zero elements can therefore be determined with about $3.5{M}^{2}$ operations. Notice that, as for the processes in Section 2, this change of basis is achieved without evaluating a single member of either basis set. More significantly, there is no need to determine integrals involving the basis functions as you might expect from Eq. (1.3). It is a powerful and efficient shortcut. Its value is enhanced by the fact that the bases need not be orthogonal polynomials; all that is required is that they satisfy a three-term recurrence. For example, the regular monomial basis $\left\{\text{\hspace{0.17em}}{x}^{m}\text{\hspace{0.17em}}\right\}$ obviously satisfies Eq. (1.6) with ${b}_{n}=1$ and ${a}_{n}={c}_{n}=0$, and this is exploited in Section 5.
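As a small illustration of a recurrence-driven change of basis (using explicit coefficient bookkeeping rather than Salzer’s in-place recurrences, and with function names of my own choosing), the following Python sketch converts a monomial expansion into Chebyshev coefficients by running Horner’s scheme entirely in the target basis, exploiting the recurrence-derived identities $x\,{T}_{0}={T}_{1}$ and $x\,{T}_{n}=({T}_{n+1}+{T}_{n-1})/2$:

```python
def x_times(t):
    """Chebyshev coefficients of x * sum_n t[n] T_n(x), using
    x T_0 = T_1 and x T_n = (T_{n+1} + T_{n-1}) / 2."""
    out = [0.0] * (len(t) + 1)
    out[1] += t[0]
    for n in range(1, len(t)):
        out[n + 1] += 0.5 * t[n]
        out[n - 1] += 0.5 * t[n]
    return out

def monomial_to_chebyshev(m):
    """Horner's scheme run entirely in the target basis: starting from the
    highest monomial coefficient, repeatedly multiply by x and add the next."""
    t = [m[-1]]
    for coeff in reversed(m[:-1]):
        t = x_times(t)
        t[0] += coeff
    return t

# x^2 = (T_0 + T_2)/2, so monomial coefficients [0, 0, 1] map to [0.5, 0, 0.5]
assert monomial_to_chebyshev([0.0, 0.0, 1.0]) == [0.5, 0.0, 0.5]
```

As in the text, no basis member is ever evaluated and no inner-product integral is needed; the conversion works purely on the coefficients.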

## 4. Applications for Zernike polynomials

As stated in the Introduction, the Zernike polynomials of *m*’th azimuthal order are related to the Jacobi polynomials by ${Z}_{n}^{m}(x)={P}_{n}^{(0,m)}(2x-1)$. It follows that ${Z}_{n}^{m}(x)$ satisfies a recurrence relation like Eq. (1.6). With $s:=m+2n$, this relation has

$${a}_{n}=\frac{-2(s+1)\,[s(s+2)+{m}^{2}]}{s\,[{(s+2)}^{2}-{m}^{2}]}, \qquad {b}_{n}=\frac{4(s+1)(s+2)}{{(s+2)}^{2}-{m}^{2}}, \qquad {c}_{n}=\frac{(s+2)\,({s}^{2}-{m}^{2})}{s\,[{(s+2)}^{2}-{m}^{2}]}. \qquad (4.1)$$

A well-studied application is the rescaling of a pupil: for a relative aperture size $\epsilon$, the task is to convert an expansion of the form ${\sum}_{n}\,{s}_{n}\,{r}^{m}\,{Z}_{n}^{m}({r}^{2})$ (as a function of *r*) to ${\sum}_{n}\,{t}_{n}\,{(r/\epsilon )}^{m}\,{Z}_{n}^{m}({r}^{2}/{\epsilon}^{2})$.
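A quick numerical cross-check of the two routes to ${Z}_{n}^{m}$ can be sketched as follows (Python; the classical Jacobi three-term recurrence is written out directly here, as an equivalent rearrangement of the constants in Eqs. (4.1), and all names are mine). At modest orders the recurrence agrees with the explicit factorial form of Eq. (1.7) to near machine precision; at high orders only the recurrence remains trustworthy:

```python
from math import factorial

def zernike_recurrence(n_max, m, x):
    """Z_n^m(x) = P_n^{(0,m)}(2x - 1) for n = 0 ... n_max via the classical
    Jacobi three-term recurrence (stable to arbitrary order)."""
    t = 2.0 * x - 1.0
    Z = [1.0]
    if n_max >= 1:
        Z.append((m + 2) * x - (m + 1))          # Z_1^m directly
    for n in range(2, n_max + 1):
        s = 2 * n + m
        A = (s - 1) * (s * (s - 2) * t - m * m)  # multiplies Z_{n-1}
        B = 2 * (n - 1) * (n + m - 1) * s        # multiplies Z_{n-2}
        D = 2 * n * (n + m) * (s - 2)
        Z.append((A * Z[n - 1] - B * Z[n - 2]) / D)
    return Z

def zernike_explicit(n, m, x):
    """The explicit factorial form of Eq. (1.7): exact, but numerically fragile."""
    return sum((-1) ** k * factorial(m + 2 * n - k)
               / (factorial(k) * factorial(n - k) * factorial(m + n - k))
               * x ** (n - k) for k in range(n + 1))

# The two routes agree at modest orders
Z = zernike_recurrence(8, 3, 0.7)
assert all(abs(Z[n] - zernike_explicit(n, 3, 0.7)) < 1e-8 for n in range(9))
```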

The authors of [16–20] have started from Eq. (1.7) and tackled laborious algebra in order to derive explicit expressions for the elements of the change-of-basis matrix. The problem is that their results take an almost identical form to Eq. (1.7); see, e.g., Eq. (19) of [19] or Eq. (29) of [20]. These matrix elements are polynomials in ${\epsilon}^{2}$ and, while the expressions are exact to within round-off errors, they nevertheless fail as catastrophically as Eq. (1.7) when the number of terms grows to 20 or more. To make matters worse, this failure is most extreme for $\epsilon \approx 1$, which is precisely where the expressions are typically used. An elegant, specialized alternative is given in [21], where the matrix elements are themselves expressed as differences between two Zernikes. That impressive derivation relies on two integral relations involving Bessel functions. A similarly robust and efficient scheme follows simply from the more general results given in the previous section, however.

This direct application of the results of Section 3 involves finding ${t}_{n}$ so that

$$\sum_{n}{t}_{n}\,{(r/\epsilon )}^{m}\,{Z}_{n}^{m}({r}^{2}/{\epsilon}^{2})=\sum_{n}{s}_{n}\,{r}^{m}\,{Z}_{n}^{m}({r}^{2}).$$

Upon writing $x:={r}^{2}/{\epsilon}^{2}$ and cancelling the common factor of ${(r/\epsilon )}^{m}$, this becomes a change of basis from ${P}_{n}^{\text{I}}(x):={Z}_{n}^{m}({\epsilon}^{2}x)$ to the Zernikes themselves, which are the polynomials of *x* in the second basis. The end result is then evidently just ${t}_{n}={\epsilon}^{m}\,{\alpha}_{n}^{0}$. In this way, Salzer’s elegant general-purpose routine means there is no need to wrestle with special functions or explicit high-order expansions. More importantly, the failure of the explicit expressions is thereby avoided. It is also interesting to notice that, for this application, ${f}_{k}^{n}$, ${g}_{k}^{n}$, and ${h}_{k}^{n}$ of Eqs. (3.4) are linear functions of ${\epsilon}^{2}$. This means that ${\alpha}_{k}^{j}$ is a polynomial of order $M-j$ in ${\epsilon}^{2}$ and, just as in Eq. (B.3), Eqs. (3.5)-(3.8) can be differentiated with respect to ${\epsilon}^{2}$ to obtain a recurrence for determining the derivative of ${t}_{n}$ with respect to ${\epsilon}^{2}$. Such an entity is used for sensitivity analysis in [21].

## 5. Applications for asphere shape specification

Advances in both fabrication and metrology are enabling the introduction of more complex and precise aspheric optical surfaces. This trend demands larger numbers of terms in the polynomial used in the standard characterization of surface shape. The methods described above are ideally suited to one of the orthogonal bases that were recently introduced for efficient characterization in this context [22]. If *c* denotes the axial curvature and *κ* the conic constant, the surface’s sag is written as

$$z(\rho )=\frac{c\,{\rho}^{2}}{1+\sqrt{1-(1+\kappa )\,{c}^{2}{\rho}^{2}}}+{u}^{4}\sum_{m=0}^{M}{s}_{m}\,{Q}_{m}^{\text{con}}({u}^{2}), \qquad (5.1)$$

where *u* is just the normalized radial coordinate, i.e. $u:=\rho /{\rho}_{\mathrm{max}}$ with ${\rho}_{\mathrm{max}}=\text{CA}/2$. A sample of these basis members is plotted in Fig. 2. It so happens that ${Q}_{m}^{\text{con}}(x)\equiv {Z}_{m}^{4}(x)$, so the recurrence relation for this set follows from Eqs. (4.1) with $m=4$:

$${a}_{n}=-\frac{(2n+5)\,({n}^{2}+5n+10)}{(n+1)(n+2)(n+5)}, \qquad {b}_{n}=\frac{2(n+3)(2n+5)}{(n+1)(n+5)}, \qquad {c}_{n}=\frac{n(n+3)(n+4)}{(n+1)(n+2)(n+5)}. \qquad (5.2)$$

The methods of Sections 2 and 3 are ideally matched to working with aspheres in these terms. In this way, the potentially chronic round-off problems are localized to the basis members themselves, and tamed there by using recurrence relations. As discussed in [22], the fact that the coefficients now function like a spectrum has a variety of benefits.

Upon introducing $S(x):={\displaystyle {\sum}_{m}\text{\hspace{0.17em}}{s}_{m}\text{\hspace{0.17em}}{Q}_{m}^{\text{con}}(x)}$ and $\varphi ={[1-(1+\kappa )\text{\hspace{0.17em}}{c}^{2}\text{\hspace{0.17em}}{\rho}^{2}]}^{1/2}$, Eq. (5.1) and its derivatives become

Perhaps more interestingly, the conventional way to describe such surfaces is to express the additive polynomial of Eq. (5.1) as a sum of monomials, say

## 6. Concluding remarks

Four basic processes are described in this work, namely the evaluation of (i) a set of orthogonal polynomials, (ii) a linear combination of them, (iii) a linear combination of their derivatives, and (iv) a change from one such basis to another. These tasks arise regularly in optics, but much of the related work has focused on explicit expressions for the polynomials. It has not been widely appreciated that, although they are exact, these expressions fail numerically when the number of terms grows much beyond ten or so. Further, these decompositions also tend to become computationally intensive as the number of terms grows. Robust and efficient alternatives were presented here in each case. None of the alternatives for (ii)-(iv) requires the evaluation of a single orthogonal polynomial; they all operate directly upon the coefficients of the expansion and lead to simple code that avoids round-off problems.

Although these methods were introduced decades ago, it appears that they were ahead of their time. They now open the door to benefits delivered by larger numbers of terms in various contexts including modeling based on Gaussian-beam-mode decompositions, various applications of Zernike polynomials, or optical design based on the methods of Section 5. The two-step induction-based process that was introduced in the Appendices clarifies how these methods can be generalized to other contexts. In my view, the work of Clenshaw, Smith, and Salzer certainly deserves to be better appreciated in optics, and more widely known in general. Many of us can then benefit significantly by having faster, more versatile code that does not suffer from catastrophic round-off failure and can proceed to arbitrary orders.

## Appendix A: Directly evaluating linear combinations

The three-term recurrence relations that underpin the key results in this work can be expressed as

$${P}_{n+1}(x)={u}_{n}(x)\,{P}_{n}(x)-{v}_{n}(x)\,{P}_{n-1}(x), \qquad (\text{A}.1)$$

where, for Eq. (1.6), ${u}_{n}(x)={a}_{n}+{b}_{n}\,x$ and ${v}_{n}(x)={c}_{n}$. The sum to be evaluated is

$$S(x)=\sum_{m=0}^{M}{s}_{m}\,{P}_{m}(x). \qquad (\text{A}.2)$$

Upon replacing *n* in Eq. (A.3) with $n-1$ and comparing the result with Eq. (A.4), it is then evident that

*S* can then be determined simply with Eq. (A.7).

Things can be simplified a little further by using Eqs. (A.6) and (A.5), respectively, to eliminate the *β*’s from Eqs. (A.5) and (A.7):

*P*’s, this is not effective for the optical applications considered here. It is important to note, however, that [9] gives a useful treatment of applications involving trigonometric and Bessel functions as well as a sketch of stability analysis for recurrence-based processes. For many cases of interest, including those considered in Sections 4 and 5, it turns out that ${v}_{0}=0$ hence ${P}_{1}={u}_{0}\,{P}_{0}$, and the conventional normalization is ${P}_{0}\equiv 1$. As a result, Eq. (A.9) then reduces to an even more beautiful result:

$$S(x)={\alpha}_{0}. \qquad (\text{A}.10)$$

## Appendix B: Directly evaluating derivatives of linear combinations

When ${P}_{m}$ in Eq. (A.2) is a polynomial of order *m* in *x* and each ${s}_{m}$ is independent of *x*, Smith [11] presented a scheme to compute the derivative of that sum with respect to *x*. With primes denoting derivatives, we now seek

$${S}^{\prime}(x)=\sum_{m=0}^{M}{s}_{m}\,{{P}^{\prime}}_{m}(x). \qquad (\text{B}.1)$$

When ${u}_{n}$ and ${v}_{n}$ are, as in Eq. (1.6), no more than linear in *x*, differentiating Eq. (A.8) now leads directly to

Provided *S* was evaluated beforehand, ${S}^{\prime}$ can be evaluated just as robustly and with much the same number of arithmetic operations.

Of course, if ${v}_{0}\ne 0$, it is the derivative of Eq. (A.9) that yields the desired end result, but such cases are not needed in this work. What is more, even when ${u}_{n}$ and ${v}_{n}$ are more general functions of *x*, it is straightforward to use the derivative of Eq. (A.8) to get a slight generalization of Eq. (B.3). For the most common case laid out above, it is also clear that higher derivatives can be determined in much the same way: if a superscript in parentheses is used to denote higher derivatives, it follows trivially from Eq. (A.8) that

## Appendix C: Directly changing to a new basis

The inter-conversion considered in Sec. 3 can be carried out, much as in Appendix A, by progressively eliminating ${P}_{m}^{\text{I}}$ from the top end of the sum in Eq. (3.1). In this case, however, ${\alpha}_{n}$ is to be expressed as a linear combination of $\left\{\,{P}_{m}^{\text{II}}\,\right\}$ rather than as the nested polynomials of Appendix A. The intermediate result that replaces Eq. (A.3) is written as

where the *α*’s and *β*’s are now just constant coefficients. By using Eq. (3.3a), the first term on the right-hand side of Eq. (C.1) can be re-expressed as

Upon gathering the terms of Eq. (C.4) that involve ${P}_{n-1}^{\text{I}}$ and, separately, those that involve ${P}_{n}^{\text{I}}$, the result can be equated to Eq. (C.1) with *n* replaced by $n-1$. The first relation that emerges is simply

where ${\delta}_{jk}$ is the Kronecker delta, which vanishes here for all *k* except ${\delta}_{00}=1$. As done for the *β*’s in Appendix A, Eq. (C.5) makes it easy to eliminate ${\beta}_{k}^{n}$ from the second relation that emerges in order to determine an analogue of Eq. (A.8), namely a recurrence relation for ${\alpha}_{k}^{n}$:

This holds for all *k*. With this as a given, Eq. (C.6) can be used, for fixed *n*, by running over $0\le k\le M-n$. The process starts at $n=M$ and works down to $n=0$ in order to yield the desired end result.

## References and links

**1. **M. Born and E. Wolf, *Principles of Optics* (Cambridge, 1999), see Sec. 9.2 and Appendix VII.

**2. **A. E. Siegman, *Lasers* (University Science Books, 1986), Chaps. 16–17.

**3. **M. Abramowitz and I. Stegun, *Handbook of Mathematical Functions* (Dover, 1978), Chap. 22.

**4. **A. B. Bhatia, E. Wolf, and M. Born, “On the circle polynomials of Zernike and related orthogonal sets,” Proc. Camb. Philos. Soc. **50**(01), 40–48 (1954). [CrossRef]

**5. **D. R. Myrick, “A generalization of the radial polynomials of F. Zernike,” J. Soc. Ind. Appl. Math. **14**(3), 476–492 (1966). [CrossRef]

**6. **E. C. Kintner, “On the mathematical properties of the Zernike polynomials,” Opt. Acta (Lond.) **23**, 679–680 (1976). [CrossRef]

**7. **R. Barakat, “Optimum balanced wave-front aberrations for radially symmetric amplitude distributions: generalizations of Zernike polynomials,” J. Opt. Soc. Am. **70**(6), 739–742 (1980). [CrossRef]

**8. **C.-W. Chong, P. Raveendran, and R. Mukundan, “A comparative analysis of algorithms for fast computation of Zernike moments,” Pattern Recognit. **36**(3), 731–742 (2003). [CrossRef]

**9. **W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, *Numerical Recipes: The Art of Scientific Computing* (Cambridge University Press, 1992) Section 5.5.

**10. **C. W. Clenshaw, “A Note on the Summation of Chebyshev Series,” Math. Tables Other Aids Comput. **9**, 118–120 (1955), http://www.jstor.org/stable/2002068.

**11. **F. J. Smith, “An Algorithm for Summing Orthogonal Polynomial Series and their Derivatives with Applications to Curve-Fitting and Interpolation,” Math. Comput. **19**(89), 33–36 (1965). [CrossRef]

**12. **H. E. Salzer, “A Recurrence Scheme for Converting from One Orthogonal Expansion into Another,” Commun. ACM **16**(11), 705–707 (1973). [CrossRef]

**13. **E. H. Doha, “On the coefficients of differentiated expansions and derivatives of Jacobi polynomials,” J. Phys. Math. Gen. **35**(15), 3467–3478 (2002). [CrossRef]

**14. **R. Barrio and J. M. Peña, “Numerical evaluation of the p’th derivative of Jacobi series,” Appl. Numer. Math. **43**(4), 335–357 (2002). [CrossRef]

**15. **B. Y. Ting and Y. L. Luke, “Conversion of Polynomials between Different Polynomial Bases,” IMA J. Numer. Anal. **1**(2), 229–234 (1981). [CrossRef]

**16. **K. A. Goldberg and K. Geary, “Wave-front measurement errors from restricted concentric subdomains,” J. Opt. Soc. Am. A **18**(9), 2146–2152 (2001). [CrossRef]

**17. **J. Schwiegerling, “Scaling Zernike expansion coefficients to different pupil sizes,” J. Opt. Soc. Am. A **19**(10), 1937–1945 (2002). [CrossRef]

**18. **C. E. Campbell, “Matrix method to find a new set of Zernike coefficients from an original set when the aperture radius is changed,” J. Opt. Soc. Am. A **20**(2), 209–217 (2003). [CrossRef]

**19. **G. M. Dai, “Scaling Zernike expansion coefficients to smaller pupil sizes: a simpler formula,” J. Opt. Soc. Am. A **23**(3), 539–543 (2006). [CrossRef]

**20. **H. Shu, L. Luo, G. Han, and J.-L. Coatrieux, “General method to derive the relationship between two sets of Zernike coefficients corresponding to different aperture sizes,” J. Opt. Soc. Am. A **23**(8), 1960–1966 (2006). [CrossRef]

**21. **A. J. E. M. Janssen and P. Dirksen, “Concise formula for the Zernike coefficients of scaled pupils,” J. Microlith. Microfab Microsyst. **5**(3), 030501–3 (2006). [CrossRef]

**22. **G. W. Forbes, “Shape specification for axially symmetric optical surfaces,” Opt. Express **15**(8), 5218–5226 (2007), http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-15-8-5218. [CrossRef] [PubMed]