Interfacial surface roughness determination by coherence scanning interferometry using noise compensation

Hirokazu Yoshino; John Michael Walls; Roger Smith

doi:10.1364/AO.56.004757

1. INTRODUCTION

Surface metrology of form, flatness, roughness, and smoothness is important for quality assurance in many types of manufacturing. It is a particular issue for optical components and optical coatings, where features must be controlled in the nanometer or sub-nanometer range. Stylus-based surface profilometry is the conventional technique for two-dimensional surface metrology. However, the stylus can modify the surface under measurement. Coherence scanning interferometry (CSI), previously referred to as scanning white light interferometry, is a well-established, non-contact method that provides true three-dimensional measurements [1]. Optical microscopy provides lateral images without height information. Scatterometers measure the proportion of specular to diffuse reflection to calculate the root mean square roughness [2]. In comparison, CSI measures absolute heights at each pixel in the field of view with sub-nanometer vertical resolution [3]. This allows all the surface roughness parameters measured by stylus profilometry to be computed from CSI data [4].

CSI measures surface topography by locating and connecting peak positions in the interference pattern (the interferogram) at each pixel over the scanned area to reconstruct the measured surface. One of the prerequisites for the test surface is that it should have identical amplitude reflection coefficients over the field of view. Otherwise, a phase shift on reflection occurs, which results in an erroneous vertical profile. Even if the refractive index is unchanged over the measurement area, problems can occur in CSI measurements of transparent or semi-transparent thin films with thicknesses of $< \sim 1.5 μm$. This is because the interferogram has multiple peaks corresponding to the interfaces in the thin film assembly, and these peaks may be superimposed depending on the thin film structure. For films thicker than $\sim 1.5 μm$, it is possible to detect and separate the peaks to reproduce the interfacial topographies [5–7]. However, this is not the case for thin films a few hundred nanometers thick. In this thickness regime, the interferogram can instead be Fourier transformed, and subtle changes in its phase and amplitude are compared with those synthesized mathematically [8–11].

One of the methods defined in the frequency domain approach uses a theory based on the helical complex field (HCF) function [12,13] and its extensions [14,15]. Interfacial surface roughness (ISR) has been determined using this method by introducing a first-order approximation to the HCF function, which enables fast real-time computation [3,16]. However, the method can introduce spurious surface roughness, as shown in a previous study [15].

This paper presents a methodology to reduce the spurious roughness that can occur with the existing ISR method and to improve its computational stability for a wide variety of samples, with no additional hardware, no changes in the measurement procedure, and little extra computational effort.

2. THEORY

A. Helical Complex Field Function

The determined and synthesized HCF functions, which use the amplitude reflection coefficient $\bar{r}$ averaged over the angle of incidence of the light [15], are expressed as follows:

{HCF}^{d} (ν) = {\bar{r}}_{ref} (ν) \cdot \frac{F {[I]}_{S B +}}{F {[I_{ref}]}_{S B +}}, \quad {HCF}^{s} (ν; d) = \bar{r} (ν; d) \cdot \exp (j 4 π ν Δ z_{HCF} \cos \bar{θ}),

where the thin film assembly consists of a substrate, denoted by the subscript sub, and $L$ film layers, with the thicknesses collected in $d = {[d_{sub}, d_{1}, \dots, d_{L}]}^{⊺}$. Note that $d_{sub}$ is not used in Eq. (1), but $Δ d_{sub}$ is considered in the following discussion. The determined HCF function, ${HCF}^{d}$, is given by the positive sideband of the Fourier transform of the actual interference signal $I$ obtained from a test sample, divided by that of a known reference material, $I_{ref}$. The synthesized HCF function, ${HCF}^{s}$, on the other hand, is derived from a mathematical model of the test thin film structure [12]. For any signal $s$, the positive sideband of its Fourier transform is denoted $F {[s]}_{S B +}$. The averaged incident angle of the CSI instrument is denoted by $\bar{θ}$, and $ν$ represents the light frequency in wavenumber units ($ν = 1/λ$), as required by the phase term $4 π ν Δ z_{HCF} \cos \bar{θ}$. The unknown parameter $Δ z_{HCF}$ is associated with the surface height [15].
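The ratio that defines ${HCF}^{d}$ can be illustrated numerically. The sketch below is a minimal NumPy example under stated assumptions, not the authors' implementation. The reference coefficient ${\bar{r}}_{ref}$ is replaced by a hypothetical constant, and the interferograms are synthetic Gaussian-envelope fringes. The FFT frequency $f$ of the scan signal is also assumed to relate to $ν$ by $f = 2 ν \cos \bar{θ}$. The helper names `positive_sideband` and `determined_hcf` are hypothetical. Shifting the test signal by a few scan steps shows how a height change appears as a linear phase in ${HCF}^{d}$. The sign of that phase depends on the Fourier convention.

```python
import numpy as np


def positive_sideband(signal, dz):
    """FFT of a mean-subtracted interferogram, keeping only the positive frequencies.

    dz is the scan step; the returned frequencies f are in cycles per unit of dz.
    """
    spectrum = np.fft.fft(signal - signal.mean())
    f = np.fft.fftfreq(signal.size, d=dz)
    keep = f > 0  # positive sideband: drop the DC term and negative frequencies
    return f[keep], spectrum[keep]


def determined_hcf(i_test, i_ref, r_ref, dz, band):
    """r_ref * F[I]_SB+ / F[I_ref]_SB+, restricted to band[0] <= f <= band[1].

    r_ref is a known scalar here for simplicity (hypothetical); in practice it is
    the frequency-dependent reflection coefficient of the reference material.
    """
    f, s_test = positive_sideband(i_test, dz)
    _, s_ref = positive_sideband(i_ref, dz)
    sel = (f >= band[0]) & (f <= band[1])
    return f[sel], r_ref * s_test[sel] / s_ref[sel]


# Synthetic example: scan positions in micrometres with an assumed 20 nm step.
dz = 0.02
z = np.arange(1024) * dz
z0 = z.mean()
i_ref = 1 + np.exp(-((z - z0) / 0.4) ** 2) * np.cos(2 * np.pi * 3.64 * (z - z0))
i_test = np.roll(i_ref, 5)  # same signal displaced by 5 scan steps (0.1 um)

# Assuming f = 2 * nu * cos(theta) at normal incidence, 400-730 nm maps to ~2.74-5.0 cycles/um.
f, hcf = determined_hcf(i_test, i_ref, 0.9, dz, (2.74, 5.0))
# The displacement shows up as a linear phase ramp across the band.
expected = 0.9 * np.exp(-2j * np.pi * f * 5 * dz)
```

Because the test signal here is an exact circular shift of the reference, the ratio has constant magnitude and a phase proportional to $f$. With real data, the magnitude also carries the thin film reflectance.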

Assuming that the thin films are completely flat over the field of view, the set of film thicknesses $d = \hat{d}$ is determined numerically, together with $Δ z_{HCF}$, by minimizing the squared error between the two functions in Eq. (1). Normally, the interference signals over $M$ test sample pixels and $M_{ref}$ reference sample pixels (typically a few hundred) are averaged to reduce noise. That is, $I = 1 / M \sum_{i} I^{i}$ is effectively used in Eq. (1), and likewise $I_{ref} = 1 / M_{ref} \sum_{i} I_{ref}^{i}$.

B. First-Order Approximation to the Synthesized HCF Function

Let $ε_{px}$ be the noise in the interference signal $I_{px}$ at a given pixel whose interfacial surfaces are perturbed by $Δ d$. Then the determined and the synthesized HCF functions are, respectively,

{HCF}_{px}^{d} = {\bar{r}}_{ref} (ν) \cdot \frac{F {[I_{px} + ε_{px}]}_{S B +}}{F {[I_{ref}]}_{S B +}}, \quad {HCF}_{px}^{s} (ν; \hat{d} + Δ d) \approx \bar{{HCF}^{d}} + j 4 π ν \cos \bar{θ} \cdot \bar{{HCF}^{d}} \left( Δ d_{sub} + \sum_{l = 1}^{L} G_{l} (ν; \hat{d}) Δ d_{l} \right),

where

G_{l} (ν; \hat{d}) = 1 + \frac{1}{4 π ν \cos \bar{θ}} \frac{\partial χ (\hat{d})}{\partial d_{l}}, \arg (\bar{r}) = χ, \bar{{HCF}^{d}} = {\bar{r}}_{ref} (ν) \cdot \frac{F {[E [I_{px} + ε_{px}]]}_{S B +}}{F {[I_{ref}]}_{S B +}} .

Because the noise has zero mean, $E [I_{px} + ε_{px}] = E [I_{px}]$. It follows that $\bar{{HCF}^{d}}$ is smooth over the wavelength range of interest, as shown in Fig. 1(a).

Fig. 1. Determined HCF function of a 520 nm ${SiO}_{2}$ thin film on a Si substrate: (a) the global determined HCF function $\bar{{HCF}^{d}}$, obtained from the full $21 \times 21$ array of four-pixel-averaged signals; (b) the HCF function ${HCF}_{px}^{d}$ determined from four pixels at the edge of the measurement area; (c) the locally determined HCF function ${HCF}_{px}^{d}$ at the center of the measurement area.


For computations based on Eq. (2), the HCF functions must be rewritten in discrete form over the frequencies $ν = {[ν_{1}, ν_{2}, \dots, ν_{m}]}^{⊺}$ as follows:

{HCF}_{px}^{d} = {[{HCF}_{px}^{d} (ν_{1}), {HCF}_{px}^{d} (ν_{2}), \dots, {HCF}_{px}^{d} (ν_{m})]}^{⊺}, {HCF}_{px}^{s} \approx \bar{{HCF}^{d}} + Diag [\bar{{HCF}^{d}}] G Δ d .

Accordingly the expression is re-written as a linear inverse problem with noise $ϵ_{o}$ in the frequency domain:

{HCF}_{px}^{d} ≃ \bar{{HCF}^{d}} + Diag [\bar{{HCF}^{d}}] G Δ d + ϵ_{o},

where

G = j 4 π \cos \bar{θ} [\begin{matrix} ν_{1} & ν_{1} \cdot G_{1} (ν_{1}) & \dots & ν_{1} \cdot G_{L} (ν_{1}) \\ ν_{2} & ν_{2} \cdot G_{1} (ν_{2}) & ⋱ & ⋮ \\ ⋮ & ⋮ & ⋱ & ⋮ \\ ν_{m} & ν_{m} \cdot G_{1} (ν_{m}) & \dots & ν_{m} \cdot G_{L} (ν_{m}) \end{matrix}], Diag [\bar{{HCF}^{d}}] = [\begin{matrix} \bar{{HCF}^{d}} (ν_{1}) & 0 & \dots & 0 \\ 0 & \bar{{HCF}^{d}} (ν_{2}) & ⋱ & ⋮ \\ ⋮ & ⋱ & ⋱ & 0 \\ 0 & \dots & 0 & \bar{{HCF}^{d}} (ν_{m}) \end{matrix}], \bar{{HCF}^{d}} = {[\bar{{HCF}^{d}} (ν_{1}), \bar{{HCF}^{d}} (ν_{2}), \dots, \bar{{HCF}^{d}} (ν_{m})]}^{⊺}, ϵ_{o} = {[ϵ_{o 1}, ϵ_{o 2}, \dots, ϵ_{o m}]}^{⊺} \sim N (ϵ_{o} | 0, σ_{o}^{2} I) .

The noise $ϵ_{o i}$ in the frequency domain is assumed to follow a normal distribution $N (ϵ_{o i} | 0, σ_{o}^{2})$ with zero mean and variance $σ_{o}^{2}$ for $i = 1, \dots, m$. Given a random vector $a$, the operators $E [a]$ and $E [(a - E [a]) {(a - E [a])}^{⊺}]$ denote the ensemble average (expectation) and the variance-covariance matrix of the random vector, respectively. The variance-covariance matrix of the noise $ϵ_{o}$ is thus assumed to be $σ_{o}^{2} I$, where $I$ is the identity matrix.
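The discretized model can be assembled with a few array operations. This is a sketch under assumptions. The values of $G_{l} (ν_{i}; \hat{d})$ and $\bar{{HCF}^{d}}$ below are hypothetical placeholders; in practice they come from the thin film model and the global measurement. The function names `build_sensitivity_matrix` and `linearized_hcf` are illustrative only.

```python
import numpy as np


def build_sensitivity_matrix(nu, g_layers, cos_theta):
    """G = j*4*pi*cos(theta) * [nu, nu*G_1(nu), ..., nu*G_L(nu)] as an (m, L+1) array.

    nu        : (m,) light frequencies nu_1..nu_m (1/wavelength)
    g_layers  : (m, L) values of G_l(nu_i; d_hat); column 0 of G belongs to Delta d_sub
    cos_theta : cosine of the averaged incident angle
    """
    nu = np.asarray(nu, dtype=float)
    g_layers = np.asarray(g_layers, dtype=float).reshape(nu.size, -1)
    columns = np.column_stack([np.ones_like(nu), g_layers])
    return 1j * 4 * np.pi * cos_theta * nu[:, None] * columns


def linearized_hcf(hcf_bar, G, delta_d):
    """First-order model HCF^s ~= HCF_bar + Diag[HCF_bar] G delta_d.

    Multiplying by Diag[HCF_bar] is an element-wise product with hcf_bar.
    """
    return hcf_bar + hcf_bar * (G @ delta_d)


# Hypothetical single-layer example: 5 wavelengths, G_1 set to 1.2 everywhere.
nu = 1 / np.linspace(0.40, 0.73, 5)          # 1/um
G = build_sensitivity_matrix(nu, np.full((5, 1), 1.2), cos_theta=0.95)
hcf_bar = np.full(5, 0.8 + 0.1j)             # placeholder global HCF
hcf_s = linearized_hcf(hcf_bar, G, np.array([0.001, 0.0]))  # 1 nm substrate shift only
```

Because $Diag [\bar{{HCF}^{d}}]$ is diagonal, the sketch multiplies element-wise instead of forming the $m \times m$ matrix.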

C. Interfacial Surface Roughness Determination by the Present ISR Method

Using the discrete expressions introduced in the previous section, the merit function $J_{px} = {‖ {HCF}_{px}^{d} - {HCF}_{px}^{s} ‖}^{2}$ is minimized for every pixel with respect to $Δ d$, assuming $ϵ_{o} \sim N (ϵ_{o} | 0, σ_{o}^{2} I)$. The solution $\hat{Δ} d$ coincides with the maximum likelihood estimate under the assumption that the elements of the noise $ϵ_{o}$ are stochastically independent and have the same variance, i.e., that $E [(ϵ_{o} - E [ϵ_{o}]) {(ϵ_{o} - E [ϵ_{o}])}^{⊺}] = σ_{o}^{2} I$. The solution of the linear inverse problem in Eq. (4) is then given analytically by [15,17,18]

\hat{Δ} d = {(G^{⊺} G)}^{- 1} G^{⊺} u,

where

u = {Diag [\bar{{HCF}^{d}}]}^{- 1} [{HCF}_{px}^{d} - \bar{{HCF}^{d}}] .

As expressed in Eq. (5), the vector $u$ is an observed signal while $Δ d$ is an unknown original signal to be estimated. The problem can be re-expressed as

u = G Δ d + ϵ, \quad \text{where} \quad ϵ = {Diag [\bar{{HCF}^{d}}]}^{- 1} ϵ_{o},

where the noise $ϵ$ and its variance $σ^{2}$ have been redefined for simplicity. Accordingly, the problem and the merit function simplify to

\min J_{px}^{†} = {‖ u - G Δ d ‖}^{2} \quad \text{subject to} \quad ϵ \sim N (ϵ | 0, σ^{2} I) .
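A minimal sketch of the least-squares step in Eq. (5) follows. The text writes the solution with a plain transpose. Since $u$ and $G$ are complex but $Δ d$ is real, this sketch assumes the real and imaginary parts are stacked into one real system. It then solves that system with `numpy.linalg.lstsq`, which is numerically preferable to forming ${(G^{⊺} G)}^{- 1}$ explicitly. The placeholder $G_{1}$ values and the perturbations are hypothetical.

```python
import numpy as np


def stack_complex(a):
    """Concatenate real and imaginary parts along axis 0 (rows)."""
    a = np.asarray(a)
    return np.concatenate([a.real, a.imag], axis=0)


def isr_least_squares(G, u):
    """Ordinary least-squares estimate of a real perturbation vector delta_d.

    Minimizes ||u - G delta_d||^2 over real delta_d by treating the real and
    imaginary parts of the complex residual as separate, equally weighted observations.
    """
    delta_d, *_ = np.linalg.lstsq(stack_complex(G), stack_complex(u), rcond=None)
    return delta_d


# Hypothetical substrate + one-layer example on 50 frequencies (units: um).
nu = 1 / np.linspace(0.40, 0.73, 50)
g1 = 1 + 0.3 * np.sin(8 * nu)                 # placeholder for G_1(nu)
G = 1j * 4 * np.pi * np.column_stack([nu, nu * g1])
true_dd = np.array([0.002, -0.001])           # 2 nm and -1 nm
u = G @ true_dd                               # noise-free observation
est = isr_least_squares(G, u)
```

Every frequency gets the same weight here, which is exactly the equal-variance assumption behind $J_{px}^{†}$.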

D. ISR Methodology with Noise Compensation

Although the existing ISR method [15,16,19] has been used to determine the surface topography of a layer buried under a transparent thin oxide film, the method can induce spurious roughness caused by system and environmental noise in the signal. This is illustrated in Fig. 1 for data taken from a 520 nm ${SiO}_{2}$ thin film. If the HCF function is determined by averaging over the full scanned area, as in Fig. 1(a), the result is a smoothly varying function. However, if the HCF function is determined locally, as shown in Figs. 1(b) and 1(c), it deviates from this smooth global function because of the noise at individual pixels.

Suppose instead that the noise has a general variance-covariance matrix $Σ$. In that case, the least-squares solution in Eq. (5) is no longer optimal, because the observed signal $u$ now follows $N (u | G Δ d, Σ)$. The merit function must therefore be modified to account for this distribution. Otherwise, the least-squares method can give an erroneous or unstable solution. Without this modification, the ISR method yields a higher interfacial surface roughness.

1. ISR with Noise Compensation (ISR-NC) Determination

Let $p (u)$ be the probability density function (PDF) of the observed signal $u$. Under the assumption that the spectral noise $ϵ$ follows the normal distribution $N (ϵ | 0, Σ)$, the PDF is also normal. The PDF and its log-likelihood function $L (Δ d)$ are therefore expressed as follows [18]:

p (u) = \frac{1}{{(2 π)}^{\frac{m}{2}} {| Σ |}^{\frac{1}{2}}} \exp [- \frac{1}{2} {(u - G Δ d)}^{⊺} Σ^{- 1} (u - G Δ d)], L (Δ d) = \log p (u) = - \frac{1}{2} {(u - G Δ d)}^{⊺} Σ^{- 1} (u - G Δ d) + C,

where $C$ is a constant independent of $u$ and $Δ d$. Maximizing the log-likelihood function with respect to $Δ d$ is equivalent to minimizing the following merit function $J_{px}^{‡}$:

\underset{Δ d}{\text{minimize}} \; J_{px}^{‡} = {(u - G Δ d)}^{⊺} Σ^{- 1} (u - G Δ d) \quad \text{subject to} \quad ϵ \sim N (ϵ | 0, Σ) .

As with the existing ISR method, the optimal solution $\hat{Δ} d$ of this linear inverse problem can be obtained analytically, now incorporating the variance-covariance matrix [17,18,20]:

\hat{Δ} d = {(G^{⊺} Σ^{- 1} G)}^{- 1} G^{⊺} Σ^{- 1} u .
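The closed form above follows from setting the gradient of $J_{px}^{‡}$ to zero. The short derivation below assumes real-valued $u$ and $G$ and a symmetric positive-definite $Σ$. For complex data, the real and imaginary parts can be stacked first; that stacking is our assumption, not a statement from the text.

```latex
\begin{aligned}
J_{px}^{‡}(Δ d) &= {(u - G Δ d)}^{⊺} Σ^{-1} (u - G Δ d) \\
\frac{\partial J_{px}^{‡}}{\partial Δ d} &= -2 \, G^{⊺} Σ^{-1} (u - G Δ d) = 0 \\
\Rightarrow \quad G^{⊺} Σ^{-1} G \, Δ d &= G^{⊺} Σ^{-1} u \\
\Rightarrow \quad \hat{Δ} d &= {(G^{⊺} Σ^{-1} G)}^{-1} G^{⊺} Σ^{-1} u
\end{aligned}
```

This requires $G^{⊺} Σ^{-1} G$ to be invertible, i.e., $G$ must have full column rank. Setting $Σ = σ^{2} I$ cancels $σ^{2}$ and recovers the ordinary least-squares solution of Eq. (5).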

In statistical signal processing, this approach is often referred to as pre-whitening [18]. One of its benefits is that the correlation (covariance) of the noise between different frequencies is also taken into account when minimizing the merit function. As a result, the ISR with noise compensation (ISR-NC) method gives more weight to the wavelength regions with smaller noise variance when determining an optimal solution. The way in which the variance-covariance matrix is calculated is described in the following section.
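Pre-whitening can be checked numerically. Factoring $Σ = L L^{⊺}$ (Cholesky) and applying ordinary least squares to $L^{-1} G$ and $L^{-1} u$ should reproduce the closed-form solution. The sketch assumes real-valued data; stacked real and imaginary parts would be handled the same way. The covariance used here is hypothetical, with variance growing toward the band edges to loosely mimic Fig. 3. It is an illustration, not the ISR-NC implementation.

```python
import numpy as np


def gls_closed_form(G, u, sigma):
    """(G^T Sigma^-1 G)^-1 G^T Sigma^-1 u for real G, u and symmetric positive-definite Sigma."""
    si_g = np.linalg.solve(sigma, G)
    si_u = np.linalg.solve(sigma, u)
    return np.linalg.solve(G.T @ si_g, G.T @ si_u)


def gls_prewhitened(G, u, sigma):
    """Same estimate via pre-whitening: Sigma = L L^T, then OLS on L^-1 G and L^-1 u."""
    L = np.linalg.cholesky(sigma)
    g_w = np.linalg.solve(L, G)   # whitened model matrix
    u_w = np.linalg.solve(L, u)   # whitened data: noise covariance becomes the identity
    delta_d, *_ = np.linalg.lstsq(g_w, u_w, rcond=None)
    return delta_d


# Hypothetical example: noise variance grows toward both band edges, plus weak correlation.
rng = np.random.default_rng(1)
m = 40
x = np.linspace(-1.0, 1.0, m)
sigma = np.diag(0.01 * (1 + 20 * x**4)) + 0.002 * np.ones((m, m))
G = np.column_stack([np.ones(m), x])
u = G @ np.array([1.0, -2.0]) + np.linalg.cholesky(sigma) @ rng.standard_normal(m)

d_closed = gls_closed_form(G, u, sigma)
d_white = gls_prewhitened(G, u, sigma)
```

Both routes give the same estimate. The whitened form is convenient because any ordinary least-squares solver can then be reused unchanged.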

2. Determination of the Variance-Covariance Matrix of the Noise

The variance-covariance matrix of the noise $Σ$ is determined from the reference measurement with a known material. First, the global reference interferogram $I_{ref}$ is computed by averaging the interferograms $I_{ref}^{i}$ over all reference pixels $i$. This yields the global reference HCF function $\bar{{HCF}_{ref}^{d}}$. The HCF function ${HCF}_{ref}^{d, i}$ at each reference pixel $i$ is also obtained.

Let $ϵ_{i} = {HCF}_{ref}^{d, i} - \bar{{HCF}_{ref}^{d}}$ be the noise at the $i$-th pixel. Its variance-covariance matrix $Σ_{o}$ can then be computed, in a form matching the expression in Eq. (9), as shown in Eq. (10). Note that $Σ_{o}$ is determined by computing the sample variance and covariance over an area of $M_{ref}$ pixels (typically from about $20 \times 20$ to $50 \times 50$):

Σ_{o} = E [({HCF}_{ref}^{d} - \bar{{HCF}_{ref}^{d}}) {({HCF}_{ref}^{d} - \bar{{HCF}_{ref}^{d}})}^{⊺}] ≃ \frac{1}{M_{ref}} \sum_{i = 1}^{M_{ref}} ({HCF}_{ref}^{d, i} - \bar{{HCF}_{ref}^{d}}) {({HCF}_{ref}^{d, i} - \bar{{HCF}_{ref}^{d}})}^{⊺}, \quad ∴ \; Σ ≃ {Diag [\bar{{HCF}^{d}}]}^{- 1} Σ_{o} {({Diag [\bar{{HCF}^{d}}]}^{- 1})}^{⊺},

where

\bar{{HCF}_{ref}^{d}} (ν) = {\bar{r}}_{ref} (ν) \cdot \frac{F {[I_{ref}]}_{S B +}}{F {[I_{ref}]}_{S B +}} = {\bar{r}}_{ref} (ν), {HCF}_{ref}^{d, i} (ν) = {\bar{r}}_{ref} (ν) \cdot \frac{F {[I_{ref}^{i}]}_{S B +}}{F {[I_{ref}]}_{S B +}} + ϵ_{i}, I_{ref} ≃ \frac{1}{M_{ref}} \sum_{i = 1}^{M_{ref}} I_{ref}^{i} .

No additional measurement is required to obtain $Σ$, since measuring a reference sample is already a prerequisite for the existing HCF-based techniques [13–15].
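The estimate of $Σ_{o}$ (and of $Σ$) can be sketched from an array of per-pixel reference HCF functions. The Fourier transform is linear and $I_{ref}$ is the pixel average of $I_{ref}^{i}$. The pixel average of ${HCF}_{ref}^{d, i}$ therefore equals ${\bar{r}}_{ref}$, so subtracting the per-pixel mean is equivalent to subtracting $\bar{{HCF}_{ref}^{d}}$. The text writes the covariance with a plain transpose of complex vectors. This sketch instead assumes the real and imaginary parts are stacked into one real vector of length $2m$. The simulated reference data and the name `noise_covariance` are hypothetical.

```python
import numpy as np


def noise_covariance(hcf_ref_pixels, hcf_bar_test=None):
    """Sample covariance of the per-pixel deviations of the reference HCF functions.

    hcf_ref_pixels : complex (M_ref, m); row i holds HCF_ref^{d,i} at m frequencies
    hcf_bar_test   : optional complex (m,) global HCF of the test sample; if given, each
                     deviation is divided by it (Diag[HCF_bar]^-1 epsilon_i), giving Sigma
                     instead of Sigma_o
    Returns a real (2m, 2m) covariance of the stacked [real parts, imaginary parts]
    (the stacking is an assumption of this sketch).
    """
    x = np.asarray(hcf_ref_pixels, dtype=complex)
    eps = x - x.mean(axis=0)                  # epsilon_i; the pixel mean equals r_ref by linearity
    if hcf_bar_test is not None:
        eps = eps / np.asarray(hcf_bar_test)
    stacked = np.concatenate([eps.real, eps.imag], axis=1)  # (M_ref, 2m)
    return stacked.T @ stacked / x.shape[0]                 # 1/M_ref normalization, as in the text


# Hypothetical reference: 21 x 21 pixels, 30 frequencies, noise larger at the band edges.
rng = np.random.default_rng(2)
m, m_ref = 30, 21 * 21
edge = np.linspace(-1.0, 1.0, m) ** 4
std = 0.002 * (1 + 10 * edge)
pixels = 0.6 + std * (rng.standard_normal((m_ref, m)) + 1j * rng.standard_normal((m_ref, m)))
sigma_o = noise_covariance(pixels)
real_var = np.diag(sigma_o)[:m]   # variances of the real parts, one per frequency
```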

Figures 2 and 3 show the actual variance-covariance matrix of the noise and its variances (diagonal elements) obtained from a flat silicon surface used as a reference. For both the real and imaginary parts, the noise variance increases toward both ends of the wavelength range. As expected, the noise variance is not constant over the spectral range of interest.

Fig. 2. Noise variance-covariance matrix $Σ_{o}$ from a silicon reference sample with $M_{ref} = 21 \times 21$ pixels (actual CSI measurement): (a) real part, (b) imaginary part (color available online).


Fig. 3. Noise variance in the frequency domain (actual CSI measurement): diagonal elements of the (a) real and (b) imaginary parts of the noise variance-covariance matrix illustrated in Fig. 2.


3. COMPUTER SIMULATION

In this section, the existing ISR method is compared with the ISR-NC method. Because of the first-order approximation, the approximated HCF function can differ from the original even when the interference signals are free from noise. Three cases are therefore compared: the ISR method applied to noise-free signals (ISR-NF), the ISR method applied to noisy signals (ISR), and the ISR method with noise compensation applied to noisy signals (ISR-NC).

A. Simulation Setup

For the model, we assume that a nanometer-sized feature is buried under a thin film, as shown in Fig. 4. The number of pixels used for the reference measurement, $M_{ref}$, is fixed at $32 \times 32 = 1024$ throughout the simulations. White Gaussian noise is added to the interferogram (time domain), and the light intensity is tuned to produce profiles similar to those obtained from the experimental data (Figs. 2 and 3). This results in the noise variances shown in Fig. 5. A comparison of Figs. 3 and 5 shows that the Gaussian noise used in the simulations reasonably reproduces the actual noise in the frequency domain. The wavelength range for the model is set from 400 to 730 nm, similar to that used previously [15].
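The text does not state how the S/N ratio is defined. The sketch below assumes, hypothetically, that S/N is the fringe modulation amplitude (half the peak-to-peak range) divided by the noise standard deviation. It simply adds white Gaussian noise to a synthetic interferogram. The actual simulations tuned the light intensity instead, which this sketch does not model. The function name `add_white_noise` is illustrative.

```python
import numpy as np


def add_white_noise(signal, snr, rng):
    """Add white Gaussian noise to an interferogram at a given S/N.

    Assumed definition (hypothetical): S/N = fringe modulation amplitude / noise
    standard deviation, with the amplitude taken as half the peak-to-peak range.
    """
    amplitude = 0.5 * (signal.max() - signal.min())
    noise_std = amplitude / snr
    return signal + rng.normal(0.0, noise_std, size=signal.shape), noise_std


rng = np.random.default_rng(3)
z = np.arange(20000) * 0.002                    # scan positions (um), hypothetical grid
clean = 1 + np.exp(-((z - z.mean()) / 2.0) ** 2) * np.cos(2 * np.pi * 3.64 * z)
noisy, noise_std = add_white_noise(clean, snr=1e3, rng=rng)
```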

Fig. 4. Schematic drawing of the model. The measurement area contains $M$ pixels, of which $M_{f}$ pixels contain the feature. Note that the global film thickness $\hat{d} ≃ d$.


Fig. 5. Noise variance in the frequency domain: diagonal elements of the (a) real and (b) imaginary parts of the noise variance-covariance matrix used in the simulations. The signal-to-noise (S/N) ratio is 2000.


Table 1 lists the conditions for all the models tested. To simulate a high-performance instrument with a 4-megapixel camera, such as the CCI HD (Taylor Hobson), the signals from every four pixels are averaged to create one interference signal. The 1024 signals in each measurement thus become 256 averaged signals.
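The four-pixel averaging can be written as a reshape followed by a mean. This sketch assumes the four pixels form non-overlapping 2 × 2 blocks; the text only says "every four pixels". It also assumes the interferograms are stored as a (rows, columns, frames) array. The function name `bin_2x2` is hypothetical.

```python
import numpy as np


def bin_2x2(stack):
    """Average non-overlapping 2 x 2 pixel blocks of a (rows, cols, n_frames) stack."""
    rows, cols, n_frames = stack.shape
    if rows % 2 or cols % 2:
        raise ValueError("rows and cols must both be even")
    # Split each spatial axis into (block index, position within block), then average
    # over the two within-block axes.
    return stack.reshape(rows // 2, 2, cols // 2, 2, n_frames).mean(axis=(1, 3))


# 32 x 32 = 1024 pixel signals (3 frames each here) -> 16 x 16 = 256 averaged signals.
stack = np.arange(32 * 32 * 3, dtype=float).reshape(32, 32, 3)
binned = bin_2x2(stack)
```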

Table 1. Simulation Conditions Together with the Number of Pixels Allocated for the “Global” and “Featured” Areas


B. Results and Analysis

The noise-free ISR (ISR-NF), ISR, and noise-compensated ISR (ISR-NC) methods are compared by examining the height of the buried substrate and the surface roughness ($S q$) of the top surface and the buried layer interface. Figure 6 shows three-dimensional images of the surfaces computed using the three methods. The ISR-NF method yields the smoothest surfaces because it is free from noise, and this is the case for all the models tested.
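For reference, $S q$ is the root-mean-square deviation of the surface height over the evaluation area (ISO 25178-2). A minimal sketch follows. The text does not say whether or how form is removed before computing $S q$, so the optional least-squares plane removal here is an assumption. The synthetic surface is hypothetical.

```python
import numpy as np


def sq_roughness(height_map, remove_plane=False):
    """Areal RMS roughness Sq; optionally subtract a least-squares plane first."""
    h = np.asarray(height_map, dtype=float)
    if remove_plane:
        rows, cols = np.indices(h.shape)
        A = np.column_stack([np.ones(h.size), rows.ravel(), cols.ravel()])
        coeffs, *_ = np.linalg.lstsq(A, h.ravel(), rcond=None)
        residual = h.ravel() - A @ coeffs
    else:
        residual = h.ravel() - h.mean()   # deviation from the mean height only
    return float(np.sqrt(np.mean(residual ** 2)))


# Hypothetical surface: a small tilt plus sinusoidal texture (heights in nm).
rows, cols = np.indices((64, 64))
surface = 0.01 * rows + 0.02 * cols + 0.3 * np.sin(rows / 3.0) * np.cos(cols / 5.0)
sq_raw = sq_roughness(surface)
sq_level = sq_roughness(surface, remove_plane=True)
```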

Fig. 6. Comparisons between the three computational methods on the sample: ${SiO}_{2}$ (thickness 514 nm) on a Si substrate; the feature height is 5 nm: (a) the ISR method (with noise), (b) the ISR-NC method (with noise), (c) the ISR-NF method (noise free). The S/N ratio is set at $10^{2}$ to correspond to Sim 3-1 in Table 1 (color available online).


1. Simulation 1: Performance Sensitivity to Variation in Thin Film Thickness

${SiO}_{2}$ and ${ZrO}_{2}$ thin films of varying thickness were investigated while the other parameters remained unchanged, as shown in Table 1. The thicknesses in Table 1 correspond to odd integer multiples (3, 5, 7, 9) of the quarter-wavelength optical thickness (QWOT).
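The ${SiO}_{2}$ thicknesses listed for simulation 1-1 (309, 514, 720, 925 nm) can be checked against the QWOT definition $n d = λ / 4$. This worked example assumes $n ≈ 1.458$ for ${SiO}_{2}$ at 600 nm, the reference wavelength of Table 2, and normal incidence. The "effective" QWOT values in Table 2 may include corrections not reproduced here.

```python
def qwot_thickness(n, wavelength_nm=600.0, multiple=1):
    """Physical thickness (nm) equal to `multiple` quarter-wave optical thicknesses.

    One QWOT means n * d = wavelength / 4, so d = multiple * wavelength / (4 n).
    Normal incidence is assumed; no dispersion or angle correction is applied.
    """
    return multiple * wavelength_nm / (4.0 * n)


N_SIO2 = 1.458  # assumed refractive index of SiO2 near 600 nm
thicknesses = [round(qwot_thickness(N_SIO2, 600.0, k), 1) for k in (3, 5, 7, 9)]
# -> [308.6, 514.4, 720.2, 925.9], within about 1 nm of the tabulated values
```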

Although the feature heights determined by the ISR and ISR-NC methods were very similar, the height variance of the ISR-NC method was always less than that provided by the ISR method.

The top and substrate surfaces determined by the ISR-NC method are smoother (lower $S q$) than those determined by the ISR method, as shown in Fig. 7. In addition, one simulation (Sim 1-2 with a 610 nm ${ZrO}_{2}$ film) did not work properly with the ISR method because of an inaccurate approximation, as discussed in Section 4.

Fig. 7. Surface roughness ($S q$) as a function of film thickness: red circles, top ISR; red squares, sub ISR; blue circles, top ISR-NC; blue squares, sub ISR-NC.


2. Simulation 2: The Effect of Feature Height on Performance

The ISR methods, including ISR-NC, use a first-order Taylor expansion of the HCF function to make the problem linear [15,16,19]. This requires the perturbation of the interfacial surface topography to be "small." In this simulation, the noise is fixed at $S / N = 10^{3}$. The performance of the methods is evaluated as a function of feature height for fixed film thicknesses of 514 nm for ${SiO}_{2}$ and 339 nm for ${ZrO}_{2}$.

Figure 8 shows that the ISR methods with and without noise work well up to a feature height of $\sim 10 nm$ for the ${SiO}_{2}$ film and up to $\sim 5 nm$ for the ${ZrO}_{2}$ film. The ISR-NC method gives a reasonable approximation up to $\sim 10 nm$ for the ${ZrO}_{2}$ film. These results, however, do not necessarily prove the superiority of the ISR-NC method. The root cause of the deterioration in performance, which is roughly proportional to the feature height, is the quality of the first-order approximation to the HCF function. The HCF function is poorly approximated in some wavelength regions that have a high noise variance, as shown in Fig. 5.

Fig. 8. Feature height sensitivity as a function of feature height: black circles, ISR-NF; red triangles, ISR; blue squares, ISR-NC.


3. Simulation 3: Noise Compensation Performance

In this set of simulations, the performance of each method as a function of the signal-to-noise (S/N) ratio is investigated. The other parameters remain unchanged, as shown in Table 1.

The ISR-NF, ISR, and ISR-NC methods give similar mean feature heights regardless of the noise level for the ${SiO}_{2}$ film, but this is not the case for the ${ZrO}_{2}$ film. The ISR method determines the height as 4.3 nm compared with the actual value of 5 nm, as shown in Fig. 9(b). As in Section 3.B.2, this is due to a poor first-order approximation of the amplitude reflection coefficient in the smaller variance wavelength regions.

Fig. 9. Signal-to-noise ratio sensitivity to the determined feature height: black circles, ISR-NF; red triangles, ISR; blue squares, ISR-NC.


The thin film and substrate surface roughnesses determined by the ISR and ISR-NC methods decrease as the S/N ratio increases, as shown in Fig. 10. At every noise level, however, the surface roughness determined by the ISR-NC method is lower.

Fig. 10. Surface roughness ( $S q$ ) as a function of the S/N ratio: red circles, top ISR; red squares, sub ISR; blue circles, top ISR-NC; blue squares, sub ISR-NC.


4. Simulation 4: Effect of Substrate Materials

In simulations 4-1 and 4-2, all variables except the substrate material are unchanged, as shown in Table 1. The substrate materials used are Si, SiC, BK7 glass, and Ge.

Similar to previous results in Sections 3.B.1–3.B.3, the ISR-NC method resulted in a smaller variance in feature height in both simulations 4-1 and 4-2. The film and substrate surfaces determined by the ISR-NC method are about an order of magnitude smoother than those from the ISR method.

5. Simulation 5: Effect of the Type of Deposited Film

Simulations 5-1 and 5-2 shown in Table 1 investigate the effect of different film materials for 350 nm and 700 nm thickness films. The ISR-NC method again provided more accurate buried surface topographies together with smaller variances. The reconstructed surfaces were about an order of magnitude smoother irrespective of the thin film material using the ISR-NC method.

Table 2 lists the effective QWOT values of the films used in the simulations. Films thicker than $QWOT \times 3$ are usually considered to have enough features in the frequency domain for the HCF theory to work [15]. It follows that there should not be much difference in the simulated performance of the various films, provided that the first-order approximation to the HCF function is sufficiently accurate. However, the ISR method did not work for the ${Ta}_{2} O_{5}$ film in simulation 5-1, while the ISR-NC method did. This issue is discussed in Section 4.

Table 2. Corresponding Effective Quarter-Wavelength Optical Thickness Values at the Wavelength of 600 nm


4. DISCUSSION

As shown in Sections 3.B.1, 3.B.2, and 3.B.5, the ISR method does not work optimally and yields erroneous buried feature heights. Examples are simulation 2-2 with a feature height $> 10 nm$, simulation 1-2 with a 610 nm ${ZrO}_{2}$ film, and simulation 5-1 with a 350 nm ${Ta}_{2} O_{5}$ film, as shown in Figs. 8(b) and 11. In all these simulations, the surfaces determined by the ISR-NC method are more accurate than those obtained with the noise-free ISR method. The root cause of this problem is an inaccurate approximation to the HCF function. The approximation becomes inaccurate in two situations: when the perturbation (feature height) of the interfacial topography is too large (more than about 10 nm), and when the approximated spectral amplitude reflection coefficient locally deviates from the true value, producing a spike.

Fig. 11. Erroneous determined feature height (originally set as 5 nm): black circles, ISR-NF; red triangles, ISR; blue squares, ISR-NC.


Consider first simulation 2-2, which has a 20 nm feature height. Comparing the true HCF function with its first-order approximation in Fig. 12(a) shows that the approximation does not hold, especially in the wavelength region between 400 and 475 nm. Figure 12(b) shows the difference between the true HCF function ${HCF}_{px}^{d}$ (without noise) and the estimates ${HCF}_{px}^{s}$ produced by each method (in the presence of noise). Prior knowledge of the noise variance-covariance matrix $Σ$ allows the ISR-NC method to give less weight to the HCF function in the wavelength regions where the noise is large, i.e., from $\sim 400$ to $\sim 450 nm$ and from $\sim 700$ to $\sim 730 nm$, as shown in Fig. 5. The ISR method has no such weighting, which is why the ISR-NC method provides more accurate determinations.

Fig. 12. HCF functions generated at the feature pixel (simulation 2-2 with 20 nm feature height): (a) true HCF function (without noise) denoted by “Org” and its first-order approximation by “aprx”; (b) spectral difference between the true HCF function ${HCF}_{px}^{d}$ and the HCF functions produced by each method ${HCF}_{px}^{s}$ (noise occurs in both ISR and ISR-NC, and NF stands for ISR-NF) (color available online).


The second cause of inaccurate surface reconstruction, observed in simulations 1-2 and 5-1, is an inaccurate approximation to the amplitude reflection coefficient. Consider simulation 5-1 with the ${Ta}_{2} O_{5}$ thin film. The first-order approximation to the HCF function is successful, as shown in Fig. 13(a), except for a spike at a wavelength of $\sim 440 nm$ in the curve labeled "aprx". The solution provided by the ISR method defined in Eq. (5) is influenced by this spike, which degrades the fit, as shown in Fig. 13(b). At 435 nm, the residual ${‖ {HCF}_{px}^{s} - {HCF}_{px}^{d} ‖}^{2}$ is relatively small for the ISR method but large for the ISR-NC method. This indicates that, because of the weighting by the noise variance-covariance matrix $Σ$, the ISR-NC method does not attempt to fit the spike.

Fig. 13. HCF functions generated at the feature pixel (simulation 5-1 with ${Ta}_{2} O_{5}$ film): (a) the true HCF function (without noise) denoted by “Org” and its first-order approximation by “aprx”; (b) the spectral difference between the true HCF function ${HCF}_{px}^{d}$ and the HCF functions produced by each method ${HCF}_{px}^{s}$ (noise exists for ISR and ISR-NC, and NF stands for ISR-NF); (c) the spectral difference between the real and imaginary parts of the true amplitude reflection coefficient and its first-order approximation. Note that the dotted lines (black and pink) represent the maximum deviations of the real ( $Re [r - r_{aprx}]$ ), imaginary ( $Im [r - r_{aprx}]$ ), and the reflectivity $R$ , respectively (color available online).


As further confirmation, the performance of the ISR method improved when the wavelength region used for the numerical optimization was restricted to exclude the region in which the spikes occur.

To achieve a good fit between the determined and synthesized HCF functions in the frequency domain, there are two options: (1) using the ISR method restricted to wavelength regions with lower noise variance, such as 430 to 700 nm in the examples above, or (2) using the ISR-NC method. The latter option makes measurements of thinner films more stable, because a wider wavelength range remains available for curve fitting irrespective of the noise characteristics.
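The two options can be contrasted in a short sketch. Option (1) is shown as ordinary least squares on a 430 to 700 nm window. Option (2) is shown only in its diagonal special case (weights $1/σ_{i}^{2}$), which drops the off-diagonal covariance terms that the full ISR-NC method uses. The model is real-valued and the variance profile is invented for illustration, so the numbers say nothing about the paper's results. The function names are hypothetical.

```python
import numpy as np


def band_limited_ols(G, u, wavelength_nm, lo=430.0, hi=700.0):
    """Option (1): ordinary least squares using only wavelengths within [lo, hi] nm."""
    keep = (wavelength_nm >= lo) & (wavelength_nm <= hi)
    delta_d, *_ = np.linalg.lstsq(G[keep], u[keep], rcond=None)
    return delta_d


def inverse_variance_wls(G, u, variance):
    """Diagonal special case of option (2): weight each wavelength by 1/variance.

    This ignores the off-diagonal covariance terms that the full ISR-NC method uses.
    """
    w = 1.0 / np.sqrt(variance)
    delta_d, *_ = np.linalg.lstsq(G * w[:, None], u * w, rcond=None)
    return delta_d


# Hypothetical real-valued model on 400-730 nm with noisier band edges.
wl = np.linspace(400.0, 730.0, 67)
nu = 1.0 / wl
G = 4 * np.pi * np.column_stack([nu, nu * (1 + 0.3 * np.sin(0.05 * wl))])
variance = 1e-8 * (1 + 50 * ((wl - 565.0) / 165.0) ** 4)
true_dd = np.array([2.0, -1.0])   # nm
u_clean = G @ true_dd

rng = np.random.default_rng(4)
u_noisy = u_clean + np.sqrt(variance) * rng.standard_normal(wl.size)
est_band = band_limited_ols(G, u_noisy, wl)
est_wls = inverse_variance_wls(G, u_noisy, variance)
```

The band-limited fit discards the edge data entirely. The weighted fit keeps it but with reduced influence, which is the trade-off the paragraph above describes.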

5. CONCLUSIONS

Present methods for interfacial surface roughness measurement using CSI can be classified into two types: those that compute surface topographies in the time domain and those that determine surface topographies in the frequency domain. The methods in the first group are used for films thicker than $\sim 1.5 μm$, whereas those in the second group can deal with thin films less than $\sim 1.5 μm$ thick. The frequency domain methods usually use least-squares optimization to fit the mathematical model to the measured signal. However, least squares gives the maximum likelihood solution only when the noise is normally distributed with the same variance at every frequency and is uncorrelated between frequencies, so it is not always suitable. In the examples above, the noise variance of the HCF function was not constant over the wavelength domain from 430 to 730 nm. It was larger at the ends of the spectral region of interest, where the light intensity is low. Therefore, the noise variance-covariance matrix should be used in the numerical optimization of the ISR method. Such a matrix is obtained from the measurement of a known flat reference material and will vary depending on the environmental conditions and the particular light source used.

Although the ISR method using the HCF function successfully determined the roughness of thin film top surfaces and buried surfaces [15,16], spurious surface roughness could be observed in the determined substrate surfaces. This paper has presented an effective solution to that problem by introducing the noise variance-covariance matrix, which requires only a small additional computation when the reference surface is measured. Measurement of the reference surface is required anyway to counteract unknown changes in the phase and amplitude of the light caused by the optical system of the CSI instrument. Using the same signals for the noise analysis is therefore a beneficial side effect.

The reproducibility of the ISR-NC method was better than that of the existing ISR method in all the computer simulations with noise, for both interfacial topography and surface roughness ($S q$). The method was also effective over a wide wavelength range. This allows more features of the HCF function to be used for curve fitting and hence gives better reproducibility. The noise used in the computer simulations is realistic, since the noise variance-covariance matrix obtained from the flat silicon surface in Figs. 2 and 3 is similar to the noise in Fig. 5. Incorporating noise compensation into the ISR method therefore improves the measurement accuracy.

Funding

Engineering and Physical Sciences Research Council (EPSRC) (EP/J017361/1, EP/M014797/1).

Acknowledgment

Hirokazu Yoshino is grateful to Taylor Hobson Ltd. for funding a studentship at Loughborough University.

REFERENCES

1. P. de Groot, “Coherence scanning interferometry,” in Optical Measurement of Surface Topography (Springer, 2011), Chap. 9, pp. 187–208.

2. T. R. Thomas, “Other measurement topics,” in Rough Surfaces, 2nd ed. (Imperial College, 1999), Chap. 3, pp. 35–61.

3. H. Yoshino, R. Smith, J. M. Walls, and D. Mansfield, "The development of thin film metrology by coherence scanning interferometry," Proc. SPIE 9749, 97490P (2016).

4. F. Blateyron, “The areal field parameters,” in Characterisation of Areal Surface Texture (Springer, 2013), pp. 15–43.

5. B. S. Lee and T. C. Strand, "Profilometry with a coherence scanning microscope," Appl. Opt. 29, 3784–3788 (1990).

6. P. J. De Groot and X. C. de Lega, “Transparent film profiling and analysis by interference microscopy,” Proc. SPIE 7064, 70640I (2008).

7. A. Bosseboeuf and S. Petitgrand, “Application of microscopic interferometry techniques in the MEMS field,” Proc. SPIE 5145, 1–16 (2003).

8. S. Kim and G. Kim, “Method for measuring a thickness profile and a refractive index using white-light scanning interferometry and recording medium therefor,” U.S. patent 6,545,763 (8 April 2003).

9. P. J. de Groot, “Interferometry method for ellipsometry, reflectometry, and scatterometry measurements, including characterization of thin film structures,” U.S. patent 7,403,289 (22 July 2008).

10. Y. Ghim and S. Kim, “Spectrally resolved white-light interferometry for 3D inspection of a thin-film layer structure,” Appl. Opt. 48, 799–803 (2009).

11. I. Abdulhalim, “Spectroscopic interference microscopy technique for measurement of layer parameters,” Meas. Sci. Technol. 12, 1996–2001 (2001).

12. D. Mansfield, “Apparatus for and a method of determining characteristics of thin-layer structures using low-coherence interferometry,” WO patent PCT/GB2005/002,783 (19 January 2006).

13. D. Mansfield, “The distorted helix: thin film extraction from scanning white light interferometry,” Proc. SPIE 6186, 61860O (2006).

14. H. Yoshino, P. M. Kaminski, R. Smith, J. M. Walls, and D. Mansfield, “Refractive index determination by coherence scanning interferometry,” Appl. Opt. 55, 4253–4260 (2016).

15. H. Yoshino, A. Abbas, P. M. Kaminski, R. Smith, J. M. Walls, and D. Mansfield, “Measurement of thin film interfacial surface roughness by coherence scanning interferometry,” J. Appl. Phys. 121, 105303 (2017).

16. D. Mansfield, “Extraction of film interface surfaces from scanning white light interferometry,” Proc. SPIE 7101, 71010U (2008).

17. T. Lewis and P. Odell, “A generalization of the Gauss-Markov theorem,” J. Am. Stat. Assoc. 61, 1063–1066 (1966).

18. K. Sekihara, Introduction to Statistical Signal Processing (Kyouritsu Shuppan, 2011).

19. D. I. Mansfield, “Apparatus for and a method of determining surface characteristics,” U.S. patent 13/352,687 (12 July 2012).

20. A. Albert, “The Gauss-Markov theorem for regression models with possibly singular covariances,” SIAM J. Appl. Math. 24, 182–187 (1973).

Sim #	Substrate Type	Film Type	Film Thickness (nm)	Feature Height (nm)	Noise S/N^a	$M$^b	$M_{f}$^b
1-1	Si	${SiO}_{2}$	309, 514, 720, 925	5	$10^{3}$	$36^{2}$	$8^{2}$
1-2	Si	${ZrO}_{2}$	203, 339, 464, 610	5	$10^{3}$	$36^{2}$	$8^{2}$
2-1	Si	${SiO}_{2}$	514	2.5, 5, 10, 20	$10^{3}$	$36^{2}$	$8^{2}$
2-2	Si	${ZrO}_{2}$	339	2.5, 5, 10, 20	$10^{3}$	$36^{2}$	$8^{2}$
3-1	Si	${SiO}_{2}$	514	5	$10^{4} - 10^{2}$	$36^{2}$	$8^{2}$
3-2	Si	${ZrO}_{2}$	339	5	$10^{4} - 10^{2}$	$36^{2}$	$8^{2}$
4-1	Si, SiC, BK7, Ge	${SiO}_{2}$	514	5	$10^{3}$	$36^{2}$	$8^{2}$
4-2	Si, SiC, BK7, Ge	${ZrO}_{2}$	339	5	$10^{3}$	$36^{2}$	$8^{2}$
5-1	Si	${SiO}_{2}$, ${ZrO}_{2}$, ${Ta}_{2}{O}_{5}$, AZO	350	5	$10^{3}$	$36^{2}$	$8^{2}$
5-2	Si	${SiO}_{2}$, ${ZrO}_{2}$, ${Ta}_{2}{O}_{5}$, AZO	700	5	$10^{3}$	$36^{2}$	$8^{2}$

Film Material	Index of Refraction at 600 nm	Film Thickness 350 nm	Film Thickness 700 nm
${SiO}_{2}$	1.46	3.4	6.8
${ZrO}_{2}$	2.21	5.2	10.3
${Ta}_{2} O_{5}$	2.12	5.0	9.9
AZO	1.83	4.3	8.5

Applied Optics