In optical scatterometry, the least squares (LSQ) function is usually used as the objective function to quantify the difference between the calculated and measured signatures, which is based on the belief that the actual measurement errors are normally distributed with zero mean. However, in practice the normal distribution assumption of measurement errors is oversimplified since these errors come from the superimposed effect of different error sources. Biased or inaccurate results may be induced when the traditional LSQ function based Gauss-Newton (GN) method is used in optical scatterometry. In this paper, we propose a robust method based on the principle of robust estimation to deal with the abnormal distributed errors. An additional robust regression procedure is used at the end of each iteration of the GN method to obtain the more accurate parameter departure vector. Simulations and experiments have demonstrated the feasibility of our proposed method.
© 2014 Optical Society of America
Optical scatterometry [1–3], also referred to as optical critical dimension (OCD) metrology, has been applied in many scopes with great success, such as the process control for back-end-of-the-line (BEOL) , the in-chip critical dimension (CD) and overlay metrology , and in situ measurement of pattern reflow in nanoimprinting . In general, there are two main procedures in optical scatterometry. The first one involves calculation of the theoretical signature from a diffractive structure using reliable forward-modeling techniques, such as the finite element method (FEM) , the boundary element method (BEM) , the finite-difference time-domain (FDTD) method , and the rigorous coupled-wave analysis (RCWA) [10–12]. Here the general term signature contains the scattered light information of the diffractive structure, which can be in the form of reflectance, ellisometric angles, Stokes elements, and Mueller matrix elements. The second procedure involves the extraction of profile parameters from the measured signature, which is a typical inverse problem with the objective of finding the profile parameters whose calculated signature can best match the measured one.
To solve the inverse problem in optical scatterometry, an objective function, usually the least squares (LSQ) function, is used to quantify the difference between the measured and calculated signatures. To increase the precision and accuracy of the profile parameter extraction, the signature is usually obtained as a vector consisting of several data points at multiple wavelengths or incident angles. Then by using either a library search [13–15] or an iterative optimization such as the Gauss-Newton (GN) method [16, 17] to minimize the objective function, a set of optimal profile parameters corresponding to the best matched calculated signature can be achieved. The general form of the LSQ function contains a set of weight factors representing the reciprocal of variances. The weight factors can be either fixed  or varying , both of which are built on top of the belief that the measurement errors are normally distributed with zero mean. Consequently, the weighted measurement errors, defined as the measurement errors at all the wavelengths or incident angles divided by their corresponding estimated standard deviations, are considered to follow standardized normal distribution. This belief indicates that the weighted measurement errors at all the wavelengths or incident angles fall into the range −3 ~3 with a probability of 99.7%. However, in an actual measurement system, normality is only a myth, as Geary pointed out . In optical scatterometry, the superimposed effect from different error sources such as the power fluctuation of the incident beam , the imperfect modeling , the depolarization effect , and even human operation errors will bias the actual statistical property of measurement error from normal distribution. In addition, considering the inevitably inaccurate estimation of variances of measurement errors, some of the weighted measurement errors might way off from the range −3 ~3, which are called outliers in this paper. Once the outliers exist, the traditional iteration method like GN is willing to distort the calculated signature curve  to reduce the outliers, which in turn enlarges the other weighted measurement errors, and thus leads to a biased or even wrong estimation of profile parameters. Therefore, it is highly desirable to develop a method that can effectively reduce the effect of outliers in the profile parameter estimation.
To reduce the outlier effect, one may consider simply rejecting those wavelengths or incident angles that correspond to the outliers. However, it is difficult or even impossible to sort out the outliers from a measured signature directly. In the present paper, based on the principle of robust statistics [24–27], we introduce an additional regression procedure in each iteration of the LSQ-based GN method. Without modifying the experimental measurement setups, the proposed method achieves more robust estimation of profile parameters.
The remainder of this paper is organized as follows. Section 2 introduces the formulation of the inverse problem in optical scatterometry followed by description of the traditional LSQ-based GN iteration method and our proposed robust method. Section 3 presents the measurement setup, sample description, and numerical and experimental results. Then, we draw some conclusions in Section 4.
2.1 Inverse problem in optical scatterometry
Usually, the inverse problem in optical scatterometry is described as an object to minimize an LSQ function, which can be generally expressed asEq. (1) relates to the commonly used chi-square statistic χ2. The use of χ2 is built on top of the belief that the measurement errors are normally distributed with zero mean, namely, the measured signature y can be expressed as
2.2 Solution to the inverse problem with GN iteration method
In consideration of the highly nonlinear relationship between the n-dimensional vector x and the calculated signature f(x), the inverse problem as shown in Eq. (3) is usually solved iteratively until an optimal resultis reached. If the result of the current ith iteration is represented by x(i), the necessary condition for x(i+1) to minimize F(x) as shown in Eq. (1) is that . The relationship between x(i) and x(i+1) can be expressed as , where Δx(i) is the parameter departure vector. By using the Taylor expansion, we can approximate the gradient in the vicinity of x(i) asEq. (4) is the n × n Hessian matrix at x(i). Since calculation of the Hessian matrix is very time-consuming, it is usually approximated asEq. (5) and Eq. (7) into Eq. (4) and letting Eq. (4) equal zero, we will haveEq. (4) ~Eq. (9) iteratively, we can obtain an optimal result, which is usually attributed to the GN iteration .
2.3 Robust solution to the inverse problem with robust statistics method
Since in Eq. (9) the Jocobian J(x(i)) is only calculated from the forward modeling operator at the current x(i), and w is the known matrix, the estimation of Δx(i) is dominated by Δy(i). Here Δy(i) is the difference between the measured signature and the ith calculated signature of the corresponding profile parameters. If the measurement errors follow the normal distribution, Eq. (9) usually can ensure a good estimation of Δx(i). However, as mentioned in the introduction section, the assumption of normal distribution with zero mean for the measurement error is oversimplified. Actually, the measured signature y should be formulated as21] and the depolarization effect . Because of the existence of μ, the direct use of Eq. (9) may lead to a biased or even inaccurate estimation of Δx(i). As such, it is of great importance to give a robust estimation for the ith parameter departure vector Δx(i).
To achieve a robust estimation of parameter departure vector Δx(i), we first use the Moore-Penrose pseudo-inverse of J(x(i))Tw to transform Eq. (8) intoEq. (11), the estimation of Δx(i) can be treated as a problem to estimate the slope of a hyperline, which can be described as estimating the slope of a hyperline Δx(i) in the n-dimensional space with m given input data pairs (J(x(i))j,). Here is the jth element of Δy(i), and J(x(i))j is the jth row of J(x(i)). Instead of simply using Eq. (9), we can achieve a robust estimation of Δx(i) by solving the following regression problemEq. (9). is the residual term equal to . To solve Eq. (12), the GN iteration could be used. For simplicity, we substitute , , , , and for , Δy(i), Δx*(i), J(x(i)), and respectively. Consequently, the problem described in Eq. (12) can be rewritten as29], we can get another iteration schemeEq. (16) and Eq. (17), this method only requires knowledge of the weight function ω(p). In the present paper, the weight function ω(p) is chosen as the Andrews’s hard redescender ,Eq. (1) in which the weight factors are set as constant during the whole regression. By repeatedly conducting Eq. (13) ~Eq. (17) we can get the estimate, which is more robust than the estimate Δx(i) obtained by using Eq. (9).
In summary, the proposed method introduces an additional “inner” GN iteration in the ith iteration of the “outer” GN iteration using Andrews’s hard redescender. Instead of using Eq. (9) directly, a robust parameter departure vectorcan be achieved. We shall point out that this robust statistics method can be applied not only in scatterometry, but also in the general problem of model fitting in ellipsometry and related thin film measurements, provided that the expression J(x(i))TwJ(x(i)) is nonsingular. This implies that the Jacobian J(x(i)) is of full column rank and thus leads to
3.1 Measurement setup and sample description
A self-developed dual-rotating compensator Mueller matrix ellipsometer (DRC-MME) prototype suitable from ultraviolet to infrared spectrum [31, 32] is used for demonstration. Data analysis is performed using the in-house developed optical modeling software based on rigorous coupled-wave analysis (RCWA) [10–12]. As schematically shown in Fig. 1(a), the system setting of the DRC-MME in order of light propagation is PCr1SCr2A, where P and A stand for the fixed polarizer and analyzer, respectively, Cr1 and Cr2 refer to the first and second frequency-coupled rotating compensators, respectively, and S stands for the sample. With the light source used in this ellipsometer, the available wavelengths are in 200 to 1000 nm range, covering the spectral range of 200 to 800 nm used in this paper. With the dual-rotating compensator setting, we can obtain the full Mueller matrix elements of the sample.
The investigated sample is a one-dimensional (1-D) etched Si grating, whose scanning electron microscope (SEM) cross-section image is shown in Fig. 1(b). The etched Si grating is chosen for this study due to its long-term dimensional stability, high refractive index contrast, and relevance to the semiconductor industry . Optical properties of Si are taken from . As depicted in Fig. 1(b), a cross-section of the Si grating is characterized by a symmetrical trapezoidal model with top critical dimension TCD, bottom critical dimension BCD, grating height Hgt, and period Pitch. Dimensions of the structural parameters that obtained by SEM are TCD = 350 nm, Hgt = 472 nm, and BCD = 383 nm. In the following experiments, profile parameters of the Si grating that need to be extracted include TCD, Hgt, and BCD, while the grating period is fixed at its nominal dimension, i.e., Pitch = 800 nm.
3.2 Numerical results
In this section, we simulate the measurement process of Si grating by firstly calculating its corresponding Mueller matrix. The values of profile parameters TCD, Hgt and BCD of the Si grating are set as 350 nm, 472 nm and 383 nm respectively. The measurement configuration is 45° incident angle and 10° azimuthal angle. Then the simulated normal distribution errors are added into the calculated Mueller matrix to form the “measurement” signature. The standard deviation or noise level of the simulated normal distribution error at a specific wavelength is set as a fraction of root-mean-square (rms) in the Mueller matrix over the full wavelength range of interest. The fractions of the wavelengths differ with each other, but are all within the range of 0 ~5%. To know more about the modeling of a specific normal distribution error, we refer the readers to . Finally, the GN method and our proposed method are used to extract the profile parameters from the “measurement” signature. The values of TCD, Hgt and BCD extracted by our proposed method are 350.4 nm, 471.9 nm and 382.6 nm, respectively, which are in coincidence with the extracted values 350.5 nm, 471.4 nm, and 383.9 nm obtained by the GN method. The above simulation has demonstrated that our proposed method is effective in dealing with the normal distributed errors.
As described in the Introduction section, there are many different types of error sources that make contribution to the measurement errors. Since simulating all these types of error sources is impossible, we add some known simulated error sources to the “measured” signature described in the above paragraph instead. The error sources simulated here include the finite numerical aperture of the DRC-MME, the spectral resolution of the monochromator, and the biases of the estimated standard deviations of measurement errors. The simulation method of the former two error sources were directly taken from , while for the last one the bias of each estimated standard deviation was randomly chosen within the range of −0.003 ~0.003, which ensures the absolute bias is less than 20% of the estimated standard deviation. After injection of the errors above, the LSQ based GN method and the proposed robust method are used to extract the profile parameters from the “measured” signature respectively. The iteration results of TCD, Hgt and BCD for both the two methods are presented in Fig. 2, in which we can obviously find that the proposed robust method converges to the more accurate solution. For more clarity, Fig. 3 displays the weighted “measurement” errors, in which we can find that some data are obviously out of the range −3 ~3.
The above simulation has demonstrated that our proposed method is more robust than the GN method. Moreover, the results of the extracted values of profile parameters under different incident angles (with fixed 10° azimuthal angle) and azimuthal angles (with fixed 45° incident angle) are presented in Fig. 4, in which we can find that nearly all the extracted values achieved by our proposed method are closer to the true values than those obtained by the traditional GN method.
3.3 Experimental results
In this section, we perform experiments to further compare our proposed method with the GN method for extracting profile parameters. The measured Mueller matrix is obtained under an incident angle of 45° and an azimuthal angle of 10°. The extracted profile parameters are presented in Table 1, in which we can observe that the values of TCD, Hgt, and BCD extracted by our proposed method are 3.1 nm, 1.6 nm and 5.3 nm closer to the SEM measurements than those obtained by the GN method.
To explain the differences of profile parameters extracted by our proposed and the GN methods, we shall demonstrate that the measurement errors are not normally distributed with zero mean. Note that in practice to accurately obtain the measurement errors is not possible; instead, we calculate the fitting differences between the measured Mueller matrix and the best calculated Mueller matrix by the GN method. Figure 5 depicts the fitting differences together with their corresponding standard deviations. Each signature data point corresponds to a standard deviation of a measurement error, thus the standard deviations at different wavelengths also differ from each other. We then calculate the weighted fitting differences of the Mueller matrix elements obtained by the GN method, as shown in Fig. 6, where the weighted fitting differences are the fitting differences divided by the standard deviations.
From Fig. 6, we indeed observe many large data points that are out of the range −3 to 3. These outliers have demonstrated that the normal distribution assumption of measurement errors is not valid. Actually, the measurement errors contain not only the random errors, but also the deterministic errors , as shown in Eq. (10). The deterministic error at a specific wavelength has no influence on the standard deviation of the measurement error. However, the deterministic error does have non-ignorable influence on the value of the weighted fitting differences, since the GN method cannot exclude the deterministic error effects. Consequently, the weighted fitting differences may be dragged out of the range −3 ~3 by the deterministic error effects. In consideration of the fact that the deterministic errors differ with each other at different wavelengths, the weighted fitting differences then present the diversity, as can be seen in Fig. 6.
Unfortunately, the true values of the profile parameters are unknown in practice, which increases the difficulty of accuracy estimation. As an alternative, we can indirectly estimate the accuracy by varying the incident and azimuthal angles . From this point of view, we measure the Mueller matrices of the 1-D etched Si grating by fixing the azimuthal angle at 10° while varying the incident angle from 45° ~65° with an increment of 5°. We also perform measurements by fixing the incident angle at 45° while varying the azimuthal angle from 10° ~30° with an increment of 5°. Figure 7 depicts the extracted values of TCD, Hgt and BCD from these two sets of measurements. As shown in Fig. 7(a), at different incident angles the extracted TCDs, Hgts and BCDs obtained by our proposed method are all closer to the SEM measured values compared with those obtained by the GN method. The same trend is also observed in Fig. 7(b) for different azimuthal angles. These experiments and discussions have demonstrated again that the normal distribution assumption of measurement errors is not always valid, and outliers are inevitable. They also have confirmed that our proposed method outperforms the traditional LSQ-based GN method when dealing with the outliers.
In summary, we have numerically and experimentally demonstrated that it is not always valid to assume that the measurement errors in optical scatterometry are normally distributed. In this case, if the objective function is set as the LSQ function, the traditional iterative methods like the GN method leads to a biased estimation of profile parameters. To reduce the effect due to the abnormal distribution, we have proposed a robust parameter estimation method based on the principle of robust statistics, which has been proved to achieve more accuracy than the traditional GN method.
This work was funded by the National Natural Science Foundation of China (Grant Nos. 51475191 and 51405172), the National Instrument Development Specific Project of China (Grant No. 2011YQ160002), and the Program for Changjiang Scholars and Innovative Research Team in University of China. The authors would like to thank the facility support of the Center for Nanoscale Characterization and Devices, Wuhan National Laboratory for Optoelectronics (WNLO) (Wuhan, China).
References and links
1. C. J. Raymond, M. R. Murnane, S. L. Prins, S. S. H. Naqvi, J. W. Hosch, and J. R. McNeil, “Multiparameter grating metrology using optical scatterometry,” J. Vac. Sci. Technol. 15(2), 361–368 (1997). [CrossRef]
2. X. Niu, N. Jakatdar, J. Bao, and C. J. Spanos, “Specular spectroscopic scatterometry,” IEEE Trans. Semicond. Manuf. 14(2), 97–111 (2001). [CrossRef]
3. H. Huang and F. Terry Jr., “Spectroscopic ellipsometry and reflectometry from gratings (scatterometry) for critical dimension measurement and in situ, real-time process monitoring,” Thin Solid Films 455-456, 828–836 (2004). [CrossRef]
4. M. G. Faruk, S. Zangooie, M. Angyal, D. K. Watts, M. Sendelbach, L. Economikos, P. Herrera, and R. Wilkins, “Enabling scatterometry as an in-line measurement technique for 32nm BEOL application,” IEEE Trans. Semicond. Manuf. 24(4), 499–512 (2011). [CrossRef]
5. Y. N. Kim, J. S. Paek, S. Rabello, S. Lee, J. T. Hu, Z. Liu, Y. D. Hao, and W. McGahan, “Device based in-chip critical dimension and overlay metrology,” Opt. Express 17(23), 21336–21343 (2009). [CrossRef] [PubMed]
6. H. J. Patrick, T. A. Gemer, Y. F. Ding, H. W. Ro, L. J. Richter, and C. L. Soles, “Scatterometry for in situ measurement of pattern flow in nanoimprinted polymers,” Appl. Phys. Lett. 93(23), 233105 (2008). [CrossRef]
7. Y. Ohkawa, Y. Tsuji, and M. Koshiba, “Analysis of anisotropic dielectric grating diffraction using the finite-element method,” J. Opt. Soc. Am. A 13(5), 1006–1012 (1996). [CrossRef]
8. Y. Nakata and M. Koshiba, “Boundary-element analysis of plane-wave diffraction from groove-type dielectric and metallic gratings,” J. Opt. Soc. Am. A 7(8), 1494–1502 (1990). [CrossRef]
9. H. Ichikawa, “Electromagnetic analysis of diffraction gratings by the finite-difference time-domain method,” J. Opt. Soc. Am. A 15(1), 152–157 (1998). [CrossRef]
10. M. G. Moharam, E. B. Grann, D. A. Pommet, and T. K. Gaylord, “Formulation for stable and efficient implementation of the rigorous coupled-wave analysis of binary gratings,” J. Opt. Soc. Am. A 12(5), 1068–1076 (1995). [CrossRef]
11. L. Li, “Use of Fourier series in the analysis of discontinuous periodic structures,” J. Opt. Soc. Am. A 13(9), 1870–1876 (1996). [CrossRef]
12. S. Y. Liu, Y. Ma, X. G. Chen, and C. W. Zhang, “Estimation of the convergence order of rigorous coupled-wave analysis for binary gratings in optical critical dimension metrology,” Opt. Eng. 51(8), 081504 (2012). [CrossRef]
13. J. L. Zhu, S. Y. Liu, C. W. Zhang, X. G. Chen, and Z. Q. Dong, “Identification and reconstruction of diffraction structures in optical scatterometry using support vector machine method,” J. Micro/Nanolith. 12(1), 013004 (2013). [CrossRef]
14. X. G. Chen, S. Y. Liu, C. W. Zhang, and H. Jiang, “Improved measurement accuracy in optical scatterometry using correction-based library search,” Appl. Opt. 52(27), 6726–6734 (2013). [CrossRef] [PubMed]
15. X. G. Chen, S. Y. Liu, C. W. Zhang, and J. L. Zhu, “Improved measurement accuracy in optical scatterometry using fitting error interpolation based library search,” Measurement 46(8), 2638–2646 (2013). [CrossRef]
17. C. W. Zhang, S. Y. Liu, T. L. Shi, and Z. R. Tang, “Improved model-based infrared reflectrometry for measuring deep trench structures,” J. Opt. Soc. Am. A 26(11), 2327–2335 (2009). [CrossRef] [PubMed]
18. H. Gross, A. Rathsfeld, F. Scholze, and M. Bar, “Profile reconstruction in extreme ultraviolet (EUV) scatterometry modeling and uncertainty estimates,” Meas. Sci. Technol. 20(10), 105102 (2009). [CrossRef]
19. M. A. Henn, H. Gross, F. Scholze, M. Wurm, C. Elster, and M. Bär, “A maximum likelihood approach to the inverse problem of scatterometry,” Opt. Express 20(12), 12771–12786 (2012). [CrossRef] [PubMed]
21. A. Kato and F. Scholze, “Effect of line roughness on the diffraction intensities in angular resolved scatterometry,” Appl. Opt. 49(31), 6102–6110 (2010). [CrossRef]
22. X. G. Chen, C. W. Zhang, and S. Y. Liu, “Depolarization effects from nanoimprinted grating structures as measured by Mueller matrix polarimetry,” Appl. Phys. Lett. 103(15), 151605 (2013). [CrossRef]
23. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical recipes – the art of science computing, 3rd ed. (Cambridge University, 2007).
24. J. Hald and K. Madsen, “Combined LP and Quasi-Newton methods for nonlinear l1 optimization,” J. Numer. Anal. 22(1), 68–80 (1985). [CrossRef]
25. E. J. Schlossmacher, “An iterative technique for absolute deviations curve fitting,” J. Am. Stat. Assoc. 68(344), 857–859 (1973). [CrossRef]
26. R. A. El-Attar, M. Vidyasagar, and S. R. K. Dutta, “An algorithm for l1-norm minimization with application to nonlinear l1-approximation,” J. Numer. Anal. 16(1), 70–86 (1979). [CrossRef]
27. H. Ekblom and K. Madsen, “Algorithms for non-linear Huber estimation,” BIT Numer. Math. 29(1), 60–76 (1989). [CrossRef]
28. R. C. Aster, B. Borchers, and C. H. Thurber, Parameter Estimation and Inverse Problems, 1st ed. (Academic, 2005).
29. A. E. Beaton and J. W. Tukey, “The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data,” Technometrics 16(2), 147–185 (1974). [CrossRef]
30. D. Andrews, F. Bickel, F. Hampel, P. Huber, W. Rogers, and J. Tukey, Robust Estimation of Location: Survey and Advances, 1st ed. (Princeton University, 1972).
31. S. Y. Liu, X. G. Chen, and C. W. Zhang, “Mueller matrix polarimetry: A powerful tool for nanostructure metrology,” ECS Trans. 60(1), 237–242 (2014). [CrossRef]
32. X. G. Chen, S. Y. Liu, C. W. Zhang, H. Jiang, Z. C. Ma, T. Y. Sun, and Z. M. Xu, “Accurate characterization of nanoimprinted resist patterns using Mueller matrix ellipsometry,” Opt. Express 22(12), 15165–15177 (2014). [CrossRef] [PubMed]
33. X. G. Chen, S. Y. Liu, C. W. Zhang, and H. Jiang, “Measurement configuration optimization for accurate grating reconstruction by Mueller matrix polarimetry,” J. Micro/Nanolith. 12(3), 033013 (2013). [CrossRef]
34. C. M. Herzinger, B. Johs, W. A. McGahan, J. A. Woollam, and W. Paulson, “Ellipsometric determination of optical constants for silicon and thermally grown silicon dioxide via a multi-sample, multi-wavelength, multi-angle investigation,” J. Appl. Phys. 83(6), 3323–3336 (1998). [CrossRef]
35. T. Novikova, A. De Martino, S. Ben Hatit, and B. Drévillon, “Application of Mueller polarimetry in conical diffraction for critical dimension measurements in microelectronics,” Appl. Opt. 45(16), 3688–3697 (2006). [CrossRef] [PubMed]