We present a maximum-likelihood method for parameter estimation in
terahertz time-domain spectroscopy. We derive the likelihood function
for a parameterized frequency response function, given a pair of
time-domain waveforms with known time-dependent noise amplitudes. The
method provides parameter estimates that are superior to those of other
commonly used methods, together with a reliable measure of the goodness
of fit. We also develop a simple noise model that is parameterized by
three dominant sources and derive the likelihood function for their
amplitudes in terms of a set of repeated waveform measurements. We
demonstrate the method with applications to material
characterization.
1. Introduction
At the heart of most applications of terahertz time-domain spectroscopy
(THz-TDS) is a mathematical procedure that relates raw THz-TDS waveform
measurements to parameters of scientific and technological interest [1–3]. Typically this
analysis is formulated in the frequency domain, since it provides the most
natural description of any linear, time-invariant system of interest. But
since a THz-TDS waveform describes the electric field as a function of
time, it must be Fourier-transformed for use in any
frequency-domain analysis. The Fourier transform scrambles the THz-TDS
measurement noise, which is more naturally represented in the time domain,
and produces artifacts that can degrade the overall measurement quality
and yield misleading results [4–13]. Furthermore, the standard approaches
to parameter estimation in THz-TDS involve the ratio of two noisy
waveforms, which further distorts the noise spectrum and can yield noise
distributions that are far from Gaussian. Other approaches have emerged
that improve on the standard procedures [14–16], but so far all of the existing approaches to THz-TDS
analysis lack clear, model-independent statistical criteria for
establishing parameter confidence intervals or for deciding whether a
given physical model describes the data well or not. Here, we introduce a
general framework to remedy this [17]. We describe a maximum-likelihood approach to THz-TDS analysis
in which both the signal and the noise are treated explicitly in the time
domain, together with a frequency-domain constraint between the input and
the output signal. We show that this approach produces results superior to
those of existing analysis methods.
2. Signal and noise in THz-TDS
A Monte Carlo simulation of an elementary THz-TDS analysis illustrates the
basic problems that our method solves; it also allows us to develop
notational conventions that we will use to describe our main results.
Figure 1(a) shows an ideal,
noiseless THz-TDS waveform $\mu (t)$ normalized to its peak value ${\mu
_{\textrm{p}}}$ [18], together with a time-domain Gaussian noise amplitude $\sigma _\mu (t)$ given by
(1)$$\sigma_\mu^{2}(t) =
{\sigma_\alpha}^{2} + [{\sigma_\beta}\mu(t)]^{2} +
\left[{\sigma_\tau}\dot{\mu}(t)\right]^{2}$$
with amplitudes ${\sigma _\alpha } =
10^{-4}{\mu _{\textrm{p}}}$, ${\sigma _\beta } =
10^{-2}$, and ${\sigma _\tau } =
1~\textrm{fs}$. Physically, the additive noise term ${\sigma _\alpha
}$ is produced by the detection electronics
(in units of ${\mu
_{\textrm{p}}}$); the multiplicative noise term ${\sigma _\beta }|\mu
(t)|$ is produced by laser power fluctuations,
which modulate the signal strength; and the jitter term ${\sigma _\tau }|\dot {\mu
}(t)| \approx |\mu (t + {\sigma _\tau })-\mu (t)|$ is produced by fluctuations in the
optical path lengths that control the delay between the terahertz pulse
arrival time and the receiver sampling time. In this example the overall
noise is dominated by ${\sigma _\beta }|\mu
(t)|$, which produces the two peaks near $t=2.75~\textrm{ps}$ and $t=3.50~\textrm{ps}$. The contribution from ${\sigma _\tau }|\dot {\mu
}(t)|$ is less evident, except at $t=3.10~\textrm{ps}$, where $\mu (t)$ crosses zero with a nonzero slope. The
contribution from ${\sigma _\alpha
}$ dominates at times when both $\mu (t)$ and $\dot {\mu }(t)$ are small, although in the figure it is
barely discernible from zero. Such structured, signal-dependent,
time-domain noise is common in THz-TDS waveform measurements, and leads to
well-known ambiguities in defining the signal-to-noise ratio and dynamic
range for them [5,6,8].
We simulate a single THz-TDS waveform measurement by computing
(2)$$x(t_k) = \mu(t_k) +
\sigma_\mu(t_k)\epsilon(t_{k})$$
at equally spaced times $t_k = kT, k \in \{0,
1,\ldots , N-1\}$, with $N = 256$ and $T =
50~\textrm{fs}$, where each $\epsilon (t_k)$ is an independent random variable with a
standard normal distribution. This sequence has the discrete Fourier
transform (DFT) (3)$$X(\omega_l) =
\tilde{x}(\omega_l) = \tilde{\mu}(\omega_l) +
[\tilde{\sigma}_\mu{\circledast}\tilde{\epsilon}](\omega_l)$$
at the discrete frequencies $\omega _l = 2\pi l/(NT), l
\in \{0, 1, \ldots , N-1\}$, where $\tilde {x}(\omega
_l)$, $\tilde {\sigma }_\mu (\omega
_l)$, and $\tilde {\epsilon }(\omega
_l)$ denote the DFTs of $x(t_k)$, $\sigma _\mu
(t_k)$, and $\epsilon (t_k)$, respectively, and the ${\circledast }$ symbol denotes circular discrete
convolution. Figure 1(b)
shows the power spectral density ${S_{xx}}(\omega _l) =
(T/N)|X(\omega _l)|^{2}$ at the unique frequencies given by $l \leq \left \lfloor
N/2\right \rfloor$ for ten such simulations, where each
spectrum is normalized to its peak value. The signal decreases
exponentially with frequency, falling by slightly more than 40 dB from its
peak power before reaching the noise floor near 5 THz. The red-highlighted
spectrum in Fig. 1(b) shows
that while the noise is relatively flat between 5 THz and 10 THz, it
exhibits oscillatory fluctuations with a period of about 0.75 THz. These
also appear in all of the other spectra in Fig. 1(b), and arise because the convolution $[\tilde {\sigma }_\mu
{\circledast }\tilde {\epsilon }](\omega _l)$ smooths the noise over a frequency scale
given by the inverse width of $\sigma _\mu (t)$. The resulting correlations blur the
distinction between the signal and the noise, and their presence implies
that the true uncertainty in $X(\omega _l)$ is significantly smaller than the noise
floor suggested by ${S_{xx}}(\omega
_l)$, which neglects the information that
noise at one frequency provides about the noise at neighboring
frequencies.
3. Transfer function estimation in THz-TDS
Measuring the response of a system requires two THz-TDS
waveforms: an input, $\mu (t)$, and an output, $\psi (t) = [h{*}\mu
](t)$, where $h(t)$ is the impulse response of the system and
the ${*}$ symbol denotes continuous-time
convolution. Fourier transforming the ideal relationship $\psi (t) = [h{*}\mu
](t)$ converts the time-domain convolution to a
frequency-domain multiplication, $\tilde {\psi
}_{\textrm{c}}(\omega ) = H(\omega )\tilde {\mu }_{\textrm{c}}(\omega
)$, where $\tilde {\mu
}_{\textrm{c}}(\omega )$ and $\tilde {\psi
}_{\textrm{c}}(\omega )$ denote the continuous-time Fourier
transforms of $\mu (t)$ and $\psi (t)$, respectively, and $H(\omega ) = \tilde
{h}_{\textrm{c}}(\omega )$ is the continuous-time transfer function
of the system. Proceeding as we did for a single waveform, we simulate
input and output waveform measurements $x(t_k)$ and $y(t_k)$, respectively, by computing $\mu (t)$ and $\psi (t)$ at the discrete times $t_k$ and adding time-domain noise $\sigma _\mu (t_k)\epsilon
(t_{k})$ and $\sigma _\psi (t_k)\delta
(t_k)$, where each $\epsilon (t_k)$ and $\delta (t_k)$ is an independent random variable with a
standard normal distribution. After applying the discrete Fourier
transform to obtain $X(\omega _l) = \tilde
{x}(\omega _l)$ and $Y(\omega _l) = \tilde
{y}(\omega _l)$, we can determine the empirical
transfer function estimate (ETFE) [19],
(4)$$\hat{H}_{\textrm{ETFE}}(\omega_l) = \frac{Y(\omega_l)}{X(\omega_l)}
= \frac{\tilde{\psi}(\omega_l) +
[\tilde{\sigma}_\psi{\circledast}\tilde{\delta}](\omega_l)}{\tilde{\mu}(\omega_l)
+
[\tilde{\sigma}_\mu{\circledast}\tilde{\epsilon}](\omega_l)},$$
where $\tilde {\mu }(\omega
_l)$ and $\tilde {\psi }(\omega
_l)$ are the DFTs of the ideal input and
output signals, respectively, that we would obtain in the absence of
noise, and $[\tilde {\sigma }_\mu
{\circledast }\tilde {\epsilon }](\omega _l)$ and $[\tilde {\sigma }_\psi
{\circledast }\tilde {\delta }](\omega _l)$ are the corresponding frequency-domain
noise terms. The ETFE is a common starting point for THz-TDS
analysis—it is easy to compute from the raw data, and it provides
an estimate of $H(\omega )$ for any linear, time-invariant system.
Frequently, though, the focus of interest is not $H(\omega )$ itself but a relatively small number of
parameters that specify $H(\omega )$ through a physical model, such as the
thickness and refractive index of a material. In this case, fitting the
model directly to $\hat
{H}_{\textrm{ETFE}}(\omega _l)$ can yield ambiguous results, because $\hat
{H}_{\textrm{ETFE}}(\omega _l)$ is biased and has noise that is both
correlated and non-Gaussian.
To illustrate these deficiencies, Fig. 2 shows 250 simulations of $\hat
{H}_{\textrm{ETFE}}(\omega _l)$ with nominally identical input and output
pulses, corresponding to $H(\omega ) = 1$. The red-highlighted curves show
correlations similar to those shown in Fig. 1, but they extend to frequencies where the signal
is much stronger. The average over all simulations reveals the bias in $\hat
{H}_{\textrm{ETFE}}(\omega _l)$: in Fig. 2(a), $\textrm{Re} \{\hat
{H}_{\textrm{ETFE}}\}$ falls from the correct value of 1 to the
biased value of 0 as the frequency increases above 5 THz, where the signal
reaches the noise floor in Fig. 1(b). Also, the noise distribution for $\hat
{H}_{\textrm{ETFE}}(\omega _l)$ develops wide, non-Gaussian tails at
frequencies where the signal is small, because noise fluctuations that
cause $|X(\omega _l)|\rightarrow
0$ become more likely, and these in turn
cause $\hat
{H}_{\textrm{ETFE}}(\omega _l)$ to diverge [20]. If the noise is uncorrelated, such large
fluctuations are only expected when the signal is small. But in the more
typical case that the noise is correlated, they can influence frequencies
well within the primary signal bandwidth, as indicated by the
red-highlighted curves in Figs. 2(c) and 2(d).
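A minimal NumPy sketch ties together Eqs. (1), (2), and (4): simulate one noisy input and one noisy output waveform for an identity system ($H = 1$) and form the ETFE. The Gaussian-derivative pulse, its center and width, and the random seed are illustrative assumptions, not the waveform used in the figures.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 256, 50e-15                       # samples and sampling period (s)
t = np.arange(N) * T
t0, w = 3.1e-12, 0.2e-12                 # assumed pulse center and width
mu = -(t - t0) / w * np.exp(-((t - t0) / w) ** 2)
mu /= np.abs(mu).max()                   # normalize so mu_p = 1

# Eq. (1): time-dependent noise amplitude from the three sources
sigma_alpha, sigma_beta, sigma_tau = 1e-4, 1e-2, 1e-15
mu_dot = np.gradient(mu, T)
sigma_mu = np.sqrt(sigma_alpha**2 + (sigma_beta * mu)**2
                   + (sigma_tau * mu_dot)**2)

# Eq. (2): noisy waveforms; for H = 1 both share the ideal signal mu
x = mu + sigma_mu * rng.standard_normal(N)
y = mu + sigma_mu * rng.standard_normal(N)

# Eq. (4): empirical transfer function estimate
H_etfe = np.fft.fft(y) / np.fft.fft(x)
```

Near the spectral peak of this pulse (around 1 THz) the ETFE sits close to 1, while bins beyond the noise floor scatter wildly, reproducing the divergences discussed above.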
Figure 3 shows how these
problems distort standard measures of fit quality. It displays the results
of weighted least-squares fits to the ETFE simulations in
Fig. 2 with the model
(5)$$H_1(\omega;
{\boldsymbol{\theta}}) = \theta_1\exp(i\omega\theta_2),$$
which rescales the input by an amplitude $\theta _1$ and shifts it by a delay $\theta _2$ when we adopt the $\exp (-i\omega
t)$ convention for harmonic time dependence.
For each fit, the estimated parameter vector ${\boldsymbol {\hat {\theta
}}}$ is chosen to minimize (6)$$Q_{\textrm{ETFE}}({\boldsymbol{\theta}}) = \sum_{l =
0}^{N-1}\frac{\left|\hat{H}_{\textrm{ETFE}}(\omega_l) -
H_1(\omega_l;{\boldsymbol{\theta}})\right|^{2}}{\sigma_{\omega_l}^{2}},$$
where each $\sigma _{\omega _l}^{2} =
(\textrm{Var}\{\textrm{Re} [\hat {H}_{\textrm{ETFE}}(\omega _l)]\}
+\textrm{Var}\{\textrm{Im} [\hat {H}_{\textrm{ETFE}}(\omega
_l)]\})$ is determined from the Monte Carlo
samples. These fits return accurate estimates for the model
parameters—over all simulated samples, we obtain $\hat {\theta }_1= 1.000\pm
0.005$ and $\hat {\theta }_2 = (0.0\pm
0.8)~\textrm{fs}$, which are consistent with the true
values, $\theta _1=1$ and $\theta _2=0$, used in the simulation. But the
normalized fit residuals, given by (7)$$r_{\textrm{ETFE}}(\omega_l;
{\boldsymbol{\hat{\theta}}}) = \frac{\hat{H}_{\textrm{ETFE}}(\omega_l)
-
H_1(\omega_l;{\boldsymbol{\hat{\theta}}})}{\sigma_{\omega_l}},$$
show strong deviations from the standard
normal distribution, exhibit structure that could easily be mistaken for
real physical features, and yield a goodness of fit (GOF) statistic $S_{\textrm{ETFE}} =
Q_{\textrm{ETFE}}({\boldsymbol {\hat {\theta }}})$ that looks nothing like the $\chi ^{2}$ distribution that we would normally use
to evaluate the fit quality. Also, the uncertainty estimates, $\hat {\sigma }_{\theta _1} =
0.001$ and $\hat {\sigma }_{\theta _2} =
0.2~\textrm{fs}$, obtained by the usual method of
inverting the curvature matrix of $Q_{\textrm{ETFE}}({\boldsymbol {\theta }})$ around the minimum of each fit,
significantly underestimate the values that are actually observed over the
set of Monte Carlo samples, $\sigma _{\theta _1} =
0.005$ and $\sigma _{\theta _2} =
0.8~\textrm{fs}$. In short, while an ETFE fit with
Eq. (6) may provide
accurate parameter estimates when the underlying model is known in
advance, it offers poor discrimination between good models and bad ones,
and it yields parameter uncertainty estimates that are unrealistically
precise.
An alternative approach is to use a least-squares (LS) procedure that
minimizes the sum of the squared differences between the output signal and
the transformed input signal [14–16],
(8)$$Q_{\textrm{LS}}({\boldsymbol{\theta}}) =
\frac{1}{N}\sum_{l=0}^{N-1}\left|Y(\omega_l) -
H(\omega_l;{\boldsymbol{\theta}})X(\omega_l)\right|^{2} =
\sum_{k=0}^{N-1}\left\{y(t_k) - [h({\boldsymbol{\theta}}){\circledast}
x](t_k)\right\}^{2}.$$
Here, $h(t_k; {\boldsymbol {\theta
}})$ is the model impulse response, equal to
the inverse DFT of $H(\omega _l;{\boldsymbol
{\theta }})$, and $[h({\boldsymbol {\theta
}}){\circledast } x](t_k)$ represents the circular convolution of the model impulse
response with the input vector, evaluated at time $t_k$. The equivalence between the
frequency-domain sum and the time-domain sum is assured by
Parseval’s theorem. The LS procedure avoids the division by $X(\omega _l)$ that distorts the noise distribution of
the ETFE in the small-signal limit, and the time-domain residuals $r_{\textrm{LS}}(t_k;
{\boldsymbol {\hat {\theta }}}) = y(t_k) - [h({\boldsymbol {\theta
}}){\circledast } x](t_k)$ are a sensitive indicator of fit quality.
But the uniform weighting in Eq. (8) implicitly assumes that this noise is
constant in time, which as we have noted is frequently not the case.
In principle, we should be able to account for time-dependent noise by
assigning appropriate weights to the sum, but this is not as simple as it
might seem. The problem is that Eq. (8) treats the input and output noise
asymmetrically, which we can see more clearly if we express it explicitly
in terms of the time-domain signal and noise sequences:
(9)$$Q_{\textrm{LS}}({\boldsymbol{\theta}}) =
\sum_{k=0}^{N-1}\left[\psi(t_k) -
[h({\boldsymbol{\theta}}){\circledast}\mu](t_k) + \sigma_\psi(t_k) -
[h({\boldsymbol{\theta}}){\circledast}(\sigma_\mu\epsilon)](t_k)\right]^{2}.$$
This asymmetry results from the implicit
assumption that all noise is restricted to the output variable in an LS
fit, so that the term $[h({\boldsymbol {\theta
}}){\circledast }(\sigma _\mu \epsilon )](t_k)$ can be ignored. When input noise is
present—as it always is in THz-TDS—the optimal fit weights
will depend not just on the measurement noise sequences but also on $h(t_k; {\boldsymbol {\theta
}})$. Any modification of Eq. (8) with constant weights will
generically bias the parameter estimates toward values that minimize the
input noise contribution, and will cause the GOF statistic, $S_{\textrm{LS}} =
Q_{\textrm{LS}}({\boldsymbol {\hat {\theta }}})$, to have a distribution that also depends
on $h(t_k; {\boldsymbol {\theta
}})$. As we discuss below, these problems can
all be solved with a fit procedure derived from the maximum-likelihood
principle.
4. Maximum-likelihood estimation of a parameterized transfer function
model
The likelihood function is equivalent to the probability density for
obtaining the measured data under the assumptions of a given model, and
forms the theoretical basis for the standard least-squares fit procedure.
To define it in the THz-TDS context, we express the measured input and
output signals in vector notation as ${\boldsymbol {x}} = [x(t_0),
x(t_1), \ldots , x(t_{N-1})]^{\mathsf {T}}$ and ${\boldsymbol {y}} = [y(t_0),
y(t_1), \ldots , y(t_{N-1})]^{\mathsf {T}}$, respectively, and assume that they
represent noisy measurements of bandlimited but otherwise unknown ideal
signal vectors ${\boldsymbol {\mu }} = [\mu
(t_0), \mu (t_1), \ldots , \mu (t_{N-1})]^{\mathsf {T}}$ and ${\boldsymbol {\psi }} =
[\psi (t_0), \psi (t_1), \ldots , \psi (t_{N-1})]^{\mathsf
{T}}$ with noise covariance matrices ${\boldsymbol {\Sigma
_x}}$ and ${\boldsymbol {\Sigma
_y}}$, respectively. We further assume that ${\boldsymbol {\mu
}}$ and ${\boldsymbol {\psi
}}$ satisfy the linear constraint ${\boldsymbol {\psi }} =
{\boldsymbol {h}}({\boldsymbol {\theta }}){\boldsymbol {\mu
}}$, where ${\boldsymbol
{h}}$ is a known matrix function of an unknown $p$-dimensional parameter vector ${\boldsymbol {\theta
}}$. The likelihood function is then a
product of two multivariate Gaussian distributions,
(10)$$L({\boldsymbol{\theta}},
{\boldsymbol{\mu}}, {\boldsymbol{\psi}}; {\boldsymbol{x}},
{\boldsymbol{y}}) = \frac{\exp[-\frac{1}{2}({\boldsymbol{x}} -
{\boldsymbol{\mu}})^{\mathsf{T}}\,{\boldsymbol{\Sigma_x}}^{{-}1}({\boldsymbol{x}}-{\boldsymbol{\mu}})]}{\sqrt{(2\pi)^{N}\det({\boldsymbol{\Sigma_x}})}}\frac{\exp[-\frac{1}{2}({\boldsymbol{y}}
-
{\boldsymbol{\psi}})^{\mathsf{T}}\,{\boldsymbol{\Sigma_y}}^{{-}1}({\boldsymbol{y}}-{\boldsymbol{\psi}})]}{\sqrt{(2\pi)^{N}\det({\boldsymbol{\Sigma_y}})}},$$
where the dependence on ${\boldsymbol {\theta
}}$ on the right side of the expression is
implicit in the constraint between ${\boldsymbol {\mu
}}$ and ${\boldsymbol {\psi
}}$. The signal vectors ${\boldsymbol {\mu
}}$ and ${\boldsymbol {\psi
}}$ appear as arguments in the likelihood
function but are otherwise unimportant—the statistical literature
aptly describes these as nuisance parameters, which we
must eliminate to estimate ${\boldsymbol {\theta
}}$, the parameters of interest [21].
We can use discrete-time Fourier analysis to obtain explicit expressions
for ${\boldsymbol
{h}}({\boldsymbol {\theta }})$, ${\boldsymbol {\Sigma
_x}}$, and ${\boldsymbol {\Sigma
_y}}$. The time-domain transfer matrix ${\boldsymbol
{h}}({\boldsymbol {\theta }})$ is
(11)$$h_{jk}({\boldsymbol{\theta}}) = \frac{1}{N}\sum_{l ={-}N/2 +
1}^{N/2-1} H(\omega_l; {\boldsymbol{\theta}}) \exp[{-}2\pi i(j -
k)l/N] +
\frac{1}{N}\textrm{Re}[H(\omega_{N/2};{\boldsymbol{\theta}})]\cos[\pi(j-k)],$$
where the frequency index $l$ now extends to negative values, and $N$ is assumed even; for odd $N$, the sum runs from $l = -(N-1)/2$ to $(N-1)/2$ and the anomalous term at the Nyquist
frequency is absent. Similarly, we can define the discrete-time matrix
representation of the time-derivative operator, ${\boldsymbol
{D}}$, as (12)$$D_{jk} = \frac{1}{N}\sum_{l
={-}\lfloor (N-1)/2 \rfloor}^{\lfloor (N-1)/2 \rfloor} -i\omega_l
\exp[{-}2\pi i(j - k)l/N],$$
by recognizing that the transfer function
for the continuous time-derivative operator is $H(\omega ) =
-i\omega$. Note that Eq. (12) is valid for all values of $N$ because $\textrm{Re} [-i\omega
_{N/2}]=0$, so the anomalous term in
Eq. (11) is zero.
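As a concrete check of Eqs. (11) and (12), the circulant matrix can be assembled from its first column with the FFT. This sketch assumes NumPy's DFT routines, adapted to the paper's $\exp(-i\omega t)$ convention (so the derivative operator has $H = -i\omega$); the grid ($N = 8$, $T = 1$) and test signals are arbitrary illustrative choices.

```python
import numpy as np

def transfer_matrix(H_func, N, T):
    """h_jk of Eq. (11) for H_func: omega -> H(omega)."""
    omega = 2 * np.pi * np.fft.fftfreq(N, T)       # includes negative l
    H = np.asarray(H_func(omega), dtype=complex)
    if N % 2 == 0:
        H[N // 2] = H[N // 2].real                 # anomalous Nyquist term
    # First column h_m = (1/N) sum_l H_l exp(-2*pi*i*m*l/N); it is real
    # whenever H(-omega) = conj[H(omega)], so h maps real vectors to real
    h_col = (np.fft.fft(H) / N).real
    idx = (np.arange(N)[:, None] - np.arange(N)[None, :]) % N
    return h_col[idx]

N, T = 8, 1.0
t = np.arange(N) * T
w1 = 2 * np.pi / (N * T)

D = transfer_matrix(lambda w: -1j * w, N, T)       # Eq. (12)
deriv = D @ np.sin(w1 * t)                         # expect w1*cos(w1*t)

# With the exp(-i*omega*t) convention, a delay by T has H = exp(i*omega*T),
# which for an integer-sample delay reduces to a circular shift
shift = transfer_matrix(lambda w: np.exp(1j * w * T), N, T)
s = np.sin(w1 * t)
```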
Equation (12) allows
us to express the noise variance function $\sigma _\mu
^{2}(t)$ in Eq. (1) as a matrix function in discrete time,
(13)$$\left[{\boldsymbol{V}}({\boldsymbol{\mu}})\right]_{jk} =
\delta_{jk}\left[{\sigma_\alpha}^{2} + {\sigma_\beta}^{2}\mu_{k}^{2} +
{\sigma_\tau}^{2}({\boldsymbol{D}}{\boldsymbol{\mu}})_{k}^{2}\right].$$
The covariance matrices for ${\boldsymbol
{x}}$ and ${\boldsymbol
{y}}$ are then (14)$${\boldsymbol{\Sigma_x}} =
{\boldsymbol{V}}({\boldsymbol{\mu}}), \quad{\boldsymbol{\Sigma_y}} =
{\boldsymbol{V}}({\boldsymbol{\psi}}).$$
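The diagonal covariance of Eqs. (13) and (14) is then straightforward to tabulate. In this sketch the derivative $\boldsymbol{D}\boldsymbol{\mu}$ is applied spectrally, consistent with Eq. (12); the sinusoidal signal and the noise amplitudes are arbitrary illustrative choices.

```python
import numpy as np

def noise_cov(signal, T, sigma_alpha, sigma_beta, sigma_tau):
    """[V(signal)]_jk = delta_jk * sigma^2(t_k), per Eq. (13)."""
    N = len(signal)
    omega = 2 * np.pi * np.fft.fftfreq(N, T)
    # D @ signal without forming the matrix: multiply by -i*omega in the
    # frequency domain (exp(-i*omega*t) convention); .real discards the
    # imaginary part, which vanishes for real input
    d_signal = np.fft.fft(-1j * omega * np.fft.ifft(signal)).real
    var = (sigma_alpha**2 + (sigma_beta * signal)**2
           + (sigma_tau * d_signal)**2)
    return np.diag(var)

N, T = 16, 1.0
t = np.arange(N) * T
w1 = 2 * np.pi / (N * T)
mu = np.sin(w1 * t)
V = noise_cov(mu, T, 1e-4, 1e-2, 1e-3)
```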
The maximum-likelihood (ML) estimate for the model parameters is given by
the vectors ${\boldsymbol {\hat {\mu
}}}$, ${\boldsymbol {\hat {\psi
}}}$, and ${\boldsymbol {\hat {\theta
}}}$ that maximize $L$ subject to the constraint ${\boldsymbol {\psi }} =
{\boldsymbol {h}}({\boldsymbol {\theta }}){\boldsymbol {\mu
}}$. Alternatively, we can minimize the
negative log-likelihood,
(15)$$\begin{aligned} -\ln
L({\boldsymbol{\theta}}, {\boldsymbol{\mu}}, {\boldsymbol{\psi}};
{\boldsymbol{x}}, {\boldsymbol{y}}) &= N\ln(2\pi) +
\frac{1}{2}\ln\{\det[{\boldsymbol{V}}({\boldsymbol{\mu}})]\} +
\frac{1}{2}\ln\{\det[{\boldsymbol{V}}({\boldsymbol{\psi}})]\}\\
&\quad+ \frac{1}{2}\left\{({\boldsymbol{x}} -
{\boldsymbol{\mu}})^{\mathsf{T}}\,\left[{\boldsymbol{V}}({\boldsymbol{\mu}})\right]^{{-}1}({\boldsymbol{x}}
- {\boldsymbol{\mu}}) + ({\boldsymbol{y}} -
{\boldsymbol{\psi}})^{\mathsf{T}}\,\left[{\boldsymbol{V}}({\boldsymbol{\psi}})\right]^{{-}1}({\boldsymbol{y}}
- {\boldsymbol{\psi}})\right\}, \end{aligned}$$
also subject to the constraint, where the
last term now has the familiar least-squares form. The dependence of the
covariance matrices on the signal vectors prevents us from using a
standard least-squares optimization algorithm to minimize
Eq. (15), but to a
very good approximation we can substitute ${\boldsymbol
{V}}({\boldsymbol {\mu }}) \approx {\boldsymbol {V}}({\boldsymbol
{x}})$ and ${\boldsymbol
{V}}({\boldsymbol {\psi }}) \approx {\boldsymbol {V}}({\boldsymbol
{y}})$ to obtain (16)$$\begin{aligned} -\ln
L({\boldsymbol{\theta}}, {\boldsymbol{\mu}}, {\boldsymbol{\psi}};
{\boldsymbol{x}}, {\boldsymbol{y}}) &\approx N\ln(2\pi) +
\frac{1}{2}\ln\{\det[{\boldsymbol{V}}({\boldsymbol{x}})]\} +
\frac{1}{2}\ln\{\det[{\boldsymbol{V}}({\boldsymbol{y}})]\}\\
&\quad+ \frac{1}{2}\left\{({\boldsymbol{x}} -
{\boldsymbol{\mu}})^{\mathsf{T}}\,\left[{\boldsymbol{V}}({\boldsymbol{x}})\right]^{{-}1}({\boldsymbol{x}}
- {\boldsymbol{\mu}}) + ({\boldsymbol{y}} -
{\boldsymbol{\psi}})^{\mathsf{T}}\,\left[{\boldsymbol{V}}({\boldsymbol{y}})\right]^{{-}1}({\boldsymbol{y}}
- {\boldsymbol{\psi}})\right\}. \end{aligned}$$
The first three terms are now all constant,
so we can focus on minimizing the last term, which we multiply by two to
obtain a cost function in what is known as a total least-squares form
[22], (17)$$\tilde{Q}_{\textrm{TLS}}({\boldsymbol{\theta}}, {\boldsymbol{\mu}},
{\boldsymbol{\psi}}) = ({\boldsymbol{x}} -
{\boldsymbol{\mu}})^{\mathsf{T}}\,[{\boldsymbol{V}}({\boldsymbol{x}})]^{{-}1}({\boldsymbol{x}}
- {\boldsymbol{\mu}}) + ({\boldsymbol{y}} -
{\boldsymbol{\psi}})^{\mathsf{T}}\,[{\boldsymbol{V}}({\boldsymbol{y}})]^{{-}1}({\boldsymbol{y}}
- {\boldsymbol{\psi}}),$$
which is also subject to the constraint ${\boldsymbol {\psi }} =
{\boldsymbol {h}}({\boldsymbol {\theta }}){\boldsymbol {\mu
}}$. Note that the total least-squares cost
function treats the input and output variables symmetrically, unlike the
conventional least-squares cost function in Eq. (8). Geometrically, it can be interpreted as
the sum of the squared distances between each measurement point $(x_k, y_k)$ and its associated model point $(\mu _k, \psi
_k)$, using the metric tensors ${\boldsymbol
{V}}({\boldsymbol {x}})$ and ${\boldsymbol
{V}}({\boldsymbol {y}})$, respectively, for the input and output
variables. As discussed in Sec. 3,
the LS cost function in Eq. (8) focuses entirely on the deviations in the output variable.
Introducing an $N$-dimensional vector of Lagrange multipliers ${\boldsymbol {\lambda
}}$ to implement the constraint, we can
define the modified cost function,
(18)$$\tilde{\tilde{Q}}_{\textrm{TLS}}({\boldsymbol{\theta}},
{\boldsymbol{\mu}}, {\boldsymbol{\psi}}, {\boldsymbol{\lambda}}) =
{\boldsymbol{\lambda}}^{\mathsf{T}}\left[{\boldsymbol{\psi}} -
{\boldsymbol{h}}({\boldsymbol{\theta}}){\boldsymbol{\mu}}\right] +
\tilde{Q}_{\textrm{TLS}}({\boldsymbol{\theta}}, {\boldsymbol{\mu}},
{\boldsymbol{\psi}}),$$
which we may now minimize with respect to ${\boldsymbol {\mu
}}$, ${\boldsymbol {\psi
}}$, ${\boldsymbol {\theta
}}$, and ${\boldsymbol {\lambda
}}$. The parameters of interest are ${\boldsymbol {\theta
}}$; minimizing with respect to the remaining
parameters gives the equations (19)$$\left.\frac{\partial
\tilde{\tilde{Q}}_{\textrm{TLS}}}{\partial
\lambda_m}\right|_{{\boldsymbol{\hat{\mu}}},{\boldsymbol{\hat{\psi}}},{\boldsymbol{\hat{\lambda}}},{\boldsymbol{\theta}}}
= {\hat{\psi}}_m - \sum_{l=1}^{N}
h_{ml}({\boldsymbol{\theta}}){\hat{\mu}}_l = 0,$$
(20)$$\left.\frac{\partial
\tilde{\tilde{Q}}_{\textrm{TLS}}}{\partial
\psi_m}\right|_{{\boldsymbol{\hat{\mu}}},{\boldsymbol{\hat{\psi}}},{\boldsymbol{\hat{\lambda}}},{\boldsymbol{\theta}}}
= {\hat{\lambda}}_m -
2\sum_{l=1}^{N}\left\{[{\boldsymbol{V}}({\boldsymbol{y}})]^{{-}1}\right\}_{ml}(y_l-{\hat{\psi}}_l)
= 0,$$
(21)$$\left.\frac{\partial
\tilde{\tilde{Q}}_{\textrm{TLS}}}{\partial
\mu_m}\right|_{{\boldsymbol{\hat{\mu}}},{\boldsymbol{\hat{\psi}}},{\boldsymbol{\hat{\lambda}}},{\boldsymbol{\theta}}}
={-} \sum_{l=1}^{N}{\hat{\lambda}}_lh_{lm}({\boldsymbol{\theta}}) -
2\sum_{l=1}^{N}\left\{[{\boldsymbol{V}}({\boldsymbol{x}})]^{{-}1}\right\}_{ml}(x_l-{\hat{\mu}}_l)
= 0,$$
which have solutions (22)$${\boldsymbol{\hat{\psi}}}_{{\boldsymbol{\theta}}} =
{\boldsymbol{h}}({\boldsymbol{\theta}}){\boldsymbol{\hat{\mu}}}_{{\boldsymbol{\theta}}},$$
(23)$${\boldsymbol{\hat{\lambda}}}_{{\boldsymbol{\theta}}} =
2[{\boldsymbol{V}}({\boldsymbol{y}})]^{{-}1}({\boldsymbol{y}} -
{\boldsymbol{\hat{\psi}}}_{{\boldsymbol{\theta}}}),$$
(24)$${\boldsymbol{\hat{\mu}}}_{{\boldsymbol{\theta}}} =
\left\{\boldsymbol{1} +
{\boldsymbol{V}}({\boldsymbol{x}})\left[{\boldsymbol{h}}({\boldsymbol{\theta}})\right]^{\mathsf{T}}[{\boldsymbol{V}}({\boldsymbol{y}})]^{{-}1}{\boldsymbol{h}}({\boldsymbol{\theta}})\right\}^{{-}1}\left\{{\boldsymbol{x}}
+
{\boldsymbol{V}}({\boldsymbol{x}})\left[{\boldsymbol{h}}({\boldsymbol{\theta}})\right]^{\mathsf{T}}[{\boldsymbol{V}}({\boldsymbol{y}})]^{{-}1}{\boldsymbol{y}}\right\}.$$
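Equations (22)-(24) can be checked numerically: for arbitrary data, the $\boldsymbol{\hat{\mu}}$ of Eq. (24) should make the gradient conditions of Eqs. (19)-(21) vanish. In this small NumPy check, the matrix h, the covariances, and the data are hypothetical stand-ins, not quantities from the text.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6
h = rng.standard_normal((N, N)) * 0.3 + np.eye(N)   # stand-in h(theta)
Vx = np.diag(rng.uniform(0.5, 2.0, N))              # stand-in V(x)
Vy = np.diag(rng.uniform(0.5, 2.0, N))              # stand-in V(y)
x = rng.standard_normal(N)
y = h @ x + 0.1 * rng.standard_normal(N)

Vy_inv = np.linalg.inv(Vy)
mu_hat = np.linalg.solve(np.eye(N) + Vx @ h.T @ Vy_inv @ h,
                         x + Vx @ h.T @ Vy_inv @ y)       # Eq. (24)
psi_hat = h @ mu_hat                                      # Eq. (22)
lam_hat = 2 * Vy_inv @ (y - psi_hat)                      # Eq. (23)
# Gradient of Eq. (21); should vanish at the stationary point
grad_mu = -h.T @ lam_hat - 2 * np.linalg.solve(Vx, x - mu_hat)
```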
Evaluating $\tilde {\tilde
{Q}}_{\textrm{TLS}}$ at ${\boldsymbol {\mu }} =
{\boldsymbol {\hat {\mu }}}_{{\boldsymbol {\theta }}}$, ${\boldsymbol {\psi }} =
{\boldsymbol {\hat {\psi }}}_{{\boldsymbol {\theta }}}$, and ${\boldsymbol {\lambda }} =
{\boldsymbol {\hat {\lambda }}}_{{\boldsymbol {\theta }}}$ yields a simplified cost function that
involves only the parameter vector ${\boldsymbol {\theta
}}$ and the data vectors ${\boldsymbol
{x}}$ and ${\boldsymbol
{y}}$, (25)$$Q_{\textrm{TLS}}({\boldsymbol{\theta}}) = ({\boldsymbol{x}} -
{\boldsymbol{\hat{\mu}}}_{{\boldsymbol{\theta}}})^{\mathsf{T}}\,[{\boldsymbol{V}}({\boldsymbol{x}})]^{{-}1}({\boldsymbol{x}}
- {\boldsymbol{\hat{\mu}}}_{{\boldsymbol{\theta}}}) +
({\boldsymbol{y}} -
{\boldsymbol{\hat{\psi}}}_{{\boldsymbol{\theta}}})^{\mathsf{T}}\,[{\boldsymbol{V}}({\boldsymbol{y}})]^{{-}1}({\boldsymbol{y}}
- {\boldsymbol{\hat{\psi}}}_{{\boldsymbol{\theta}}}).$$
We can simplify these expressions further by defining
(26)$${\boldsymbol{U}}({\boldsymbol{x}}, {\boldsymbol{\theta}}) =
{\boldsymbol{h}}({\boldsymbol{\theta}}){\boldsymbol{V}}({\boldsymbol{x}})\left[{\boldsymbol{h}}({\boldsymbol{\theta}})\right]^{\mathsf{T}}.$$
Substituting Eq. (26) in Eqs. (24) and (22) reveals that ${\boldsymbol {\hat {\psi
}}}_{{\boldsymbol {\theta }}}$ is just the weighted average of ${\boldsymbol
{y}}$ and ${\boldsymbol
{h}}({\boldsymbol {\theta }}){\boldsymbol {x}}$, (27)$${\boldsymbol{\hat{\psi}}}_{{\boldsymbol{\theta}}} =
\left\{[{\boldsymbol{V}}({\boldsymbol{y}})]^{{-}1} +
[{\boldsymbol{U}}({\boldsymbol{x}},{\boldsymbol{\theta}})]^{{-}1}\right\}^{{-}1}\left\{[{\boldsymbol{V}}({\boldsymbol{y}})]^{{-}1}{\boldsymbol{y}}
+
[{\boldsymbol{U}}({\boldsymbol{x}},{\boldsymbol{\theta}})]^{{-}1}[{\boldsymbol{h}}({\boldsymbol{\theta}}){\boldsymbol{x}}]\right\},$$
where the matrices ${\boldsymbol
{U}}({\boldsymbol {x}},{\boldsymbol {\theta }})$ and ${\boldsymbol
{V}}({\boldsymbol {y}})$ are approximately equal to the true (but
unknown) covariance matrices $\operatorname
{Cov}[{\boldsymbol {h}}({\boldsymbol {\theta }}){\boldsymbol {x}}] =
{\boldsymbol {U}}({\boldsymbol {\mu }},{\boldsymbol {\theta
}})$ and $\operatorname
{Cov}({\boldsymbol {y}}) = {\boldsymbol {V}}({\boldsymbol {\psi
}})$, respectively. We emphasize here that
although ${\boldsymbol
{h}}({\boldsymbol {\theta }}){\boldsymbol {\mu }} = {\boldsymbol {\psi
}}$, the covariance matrices $\operatorname
{Cov}[{\boldsymbol {h}}({\boldsymbol {\theta }}){\boldsymbol
{x}}]$ and $\operatorname
{Cov}({\boldsymbol {y}})$ are generally different, since ${\boldsymbol
{h}}({\boldsymbol {\theta }})$ transforms the noise in ${\boldsymbol
{x}}$ as well as the signal. For example, if ${\boldsymbol {h}}(A) =
A{\boldsymbol {I}}, A\neq 1,$ and the noise is purely additive, we have $\operatorname
{Cov}({\boldsymbol {y}}) = {\sigma _\alpha }^{2}{\boldsymbol {I}}\neq
\operatorname {Cov}[{\boldsymbol {h}}({\boldsymbol {\theta
}}){\boldsymbol {x}}] = A^{2}{\sigma _\alpha }^{2}{\boldsymbol
{I}}$.
Substituting Eq. (26)
in Eq. (25) yields
(28)$$Q_{\textrm{TLS}}({\boldsymbol{\theta}}) = \left[{\boldsymbol{y}} -
{\boldsymbol{h}}({\boldsymbol{\theta}}){\boldsymbol{x}}\right]^{\mathsf{T}}\left[{\boldsymbol{V}}({\boldsymbol{y}})
+ {\boldsymbol{U}}({\boldsymbol{x}},
{\boldsymbol{\theta}})\right]^{{-}1}\left[{\boldsymbol{y}} -
{\boldsymbol{h}}({\boldsymbol{\theta}}){\boldsymbol{x}}\right].$$
Like the simpler LS cost function in
Eq. (8), the TLS cost
function in Eq. (28)
is expressed in terms of the deviations ${\boldsymbol {y}} -
{\boldsymbol {h}}({\boldsymbol {\theta }}){\boldsymbol
{x}}$, but is now properly normalized with
respect to the associated covariance matrix, ${\boldsymbol
{V}}({\boldsymbol {y}}) + {\boldsymbol {U}}({\boldsymbol {x}},
{\boldsymbol {\theta }})$. Introducing the matrix square root, $\boldsymbol {A}^{1/2} =
\boldsymbol {X}\leftrightarrow \boldsymbol {A} = \boldsymbol
{X}^{2}$, we may define the normalized residual
vector as (29)$${\boldsymbol{r}}_{\textrm{TLS}}({\boldsymbol{\theta}}) =
\left[{\boldsymbol{V}}({\boldsymbol{y}}) +
{\boldsymbol{U}}({\boldsymbol{x}},
{\boldsymbol{\theta}})\right]^{{-}1/2}\left[{\boldsymbol{y}} -
{\boldsymbol{h}}({\boldsymbol{\theta}}){\boldsymbol{x}}\right],$$
which allows us to express the TLS cost
function in the compact form, (30)$$Q_{\textrm{TLS}}({\boldsymbol{\theta}}) =
[{\boldsymbol{r}}_{\textrm{TLS}}({\boldsymbol{\theta}})]^{\mathsf{T}}{\boldsymbol{r}}_{\textrm{TLS}}({\boldsymbol{\theta}}).$$
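Equations (26) and (28)-(30) translate directly into code. The following sketch evaluates the normalized residual and cost for the delay-and-rescale model of Eq. (5); the test pulse, sampling grid, and noise amplitudes are illustrative assumptions.

```python
import numpy as np

def transfer_matrix(H_func, N, T):
    """Circulant h_jk of Eq. (11), exp(-i*omega*t) convention."""
    omega = 2 * np.pi * np.fft.fftfreq(N, T)
    H = np.asarray(H_func(omega), dtype=complex)
    if N % 2 == 0:
        H[N // 2] = H[N // 2].real
    h_col = (np.fft.fft(H) / N).real
    idx = (np.arange(N)[:, None] - np.arange(N)[None, :]) % N
    return h_col[idx]

def noise_var(signal, T, sa=1e-4, sb=1e-2, st=1e-2):
    """Diagonal V(signal) per Eq. (13), derivative applied spectrally."""
    omega = 2 * np.pi * np.fft.fftfreq(len(signal), T)
    d = np.fft.fft(-1j * omega * np.fft.ifft(signal)).real
    return np.diag(sa**2 + (sb * signal)**2 + (st * d)**2)

def inv_sqrt(A):
    """Inverse matrix square root of a symmetric positive matrix."""
    w, Q = np.linalg.eigh(A)
    return (Q / np.sqrt(w)) @ Q.T

def r_tls(theta, x, y, T):
    """Eq. (29): [V(y) + U(x,theta)]^(-1/2) (y - h(theta) x)."""
    N = len(x)
    h = transfer_matrix(lambda w: theta[0] * np.exp(1j * w * theta[1]),
                        N, T)                       # Eq. (5)
    U = h @ noise_var(x, T) @ h.T                   # Eq. (26)
    return inv_sqrt(noise_var(y, T) + U) @ (y - h @ x)

# Noiseless check: at the true theta the residual (and cost) vanish
N, T = 32, 1.0
t = np.arange(N) * T
mu = np.exp(-((t - 12.0) / 2.0) ** 2)               # stand-in pulse
theta_true = np.array([0.8, 1.5])
y = transfer_matrix(lambda w: 0.8 * np.exp(1j * w * 1.5), N, T) @ mu

r0 = r_tls(theta_true, mu, y, T)
r1 = r_tls(np.array([1.0, 0.0]), mu, y, T)
Q_true, Q_wrong = r0 @ r0, r1 @ r1                  # Eq. (30)
```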
The TLS estimate for the parameter vector, ${\boldsymbol {\hat {\theta
}}}$, may now be determined by minimizing $Q_{\textrm{TLS}}({\boldsymbol {\theta }})$ using a standard least-squares
optimization algorithm.
Figure 4 shows fit results
for the same model and data shown in Fig. 3, but obtained by minimizing $Q_{\textrm{TLS}}({\boldsymbol {\theta }})$ in Eq. (30) instead of $Q_{\textrm{ETFE}}({\boldsymbol {\theta }})$ in Eq. (6). The statistical properties obtained
with $Q_{\textrm{TLS}}({\boldsymbol {\theta }})$ are clearly superior to those with $Q_{\textrm{ETFE}}({\boldsymbol {\theta }})$. The GOF statistic, $S_{\textrm{TLS}} =
Q_{\textrm{TLS}}({\boldsymbol {\hat {\theta }}})$, approximates a $\chi ^{2}$ distribution with $\nu = N-p$ degrees of freedom, as expected. The
normalized residual vector ${\boldsymbol
{r}}_{\textrm{TLS}}({\boldsymbol {\hat {\theta }}})$, shown for one of the fits, exhibits a
standard normal distribution with no discernible correlations among
neighboring time points. The distribution for $S_{\textrm{TLS}}$ is more than 7 times narrower than the
distribution for $S_{\textrm{ETFE}}$, making it that much more sensitive to a
poor fit. Similarly, the lack of structure in ${\boldsymbol
{r}}_{\textrm{TLS}}({\boldsymbol {\hat {\theta }}})$ makes it more useful for residual
analysis than $r_{\textrm{ETFE}}(\omega _l;
{\boldsymbol {\hat {\theta }}})$, since structure associated with poor
fits may be discerned more readily. Finally, unlike the ETFE fits, the TLS
fits yield estimates for the parameter uncertainties that agree with the
values observed over the set of Monte Carlo samples, $\sigma _{\theta _1} =
0.002$ and $\sigma _{\theta _2} =
0.4~\textrm{fs}$, which are both about a factor of 2
smaller than the values observed in the ETFE parameter estimates. In
summary, when compared with the standard ETFE fit procedure, the TLS fit
procedure offers better discrimination between good models and bad ones,
more precise values for the parameters, and more accurate estimates of the
parameter uncertainties.
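Because $S_{\textrm{TLS}}$ approximates a $\chi^{2}$ distribution with $\nu = N - p$ degrees of freedom, standard $\chi^{2}$ statistics give an absolute measure of fit quality. The following Python sketch (the authors' reference implementation is in MATLAB [25]; the observed value here is illustrative, taken from the fit discussed in Sec. 6 with $N = 256$ and $p = 2$) shows how the expected GOF distribution and a p-value could be computed:

```python
from scipy.stats import chi2

# GOF statistic S_TLS ~ chi^2 with nu = N - p degrees of freedom.
# Illustrative values: N = 256 waveform points, p = 2 fit parameters.
nu = 256 - 2

# Expected center and width of S_TLS when the model is correct
mean_S = chi2.mean(nu)        # equals nu = 254
std_S = chi2.std(nu)          # sqrt(2*nu) ~ 22.5, matching the simulated width

# p-value for an observed GOF statistic (illustrative value from Sec. 6)
S_obs = 198.0
p_value = chi2.sf(S_obs, nu)  # probability of S >= S_obs under a correct model
print(mean_S, std_S, p_value)
```

A small p-value would signal a poor model; here $S_{\textrm{obs}}$ sits below the mean, so the fit is consistent with the model.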
5. Maximum-likelihood estimation of the noise model
We have assumed so far that the noise amplitudes $\sigma _\alpha , \sigma
_\beta ,$ and $\sigma _\tau$ in Eq. (13) are known, but in general they must
also be determined experimentally. One way to do this is to measure the
noise at three different points on the THz waveform: as we saw in
Fig. 1, the $\sigma _\beta$ term dominates near the peak, the $\sigma _\tau$ term dominates where the signal crosses
zero with a large slope, and the $\sigma _\alpha$ term dominates where both the signal and
its slope are small. Another approach is to determine the variance as a
function of time from a set of repeated waveform measurements [5], then obtain the noise parameters from
a fit with Eq. (1).
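Because the variance model is linear in $\sigma_\alpha^{2}$, $\sigma_\beta^{2}$, and $\sigma_\tau^{2}$, this second approach reduces to linear least squares on the per-point sample variance. A minimal Python sketch with a synthetic pulse (all waveform parameters and noise amplitudes here are illustrative, and the derivative matrix is approximated by a finite difference):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ideal waveform: derivative-of-Gaussian pulse, T = 50 fs sampling
N, M, T = 256, 2000, 0.05               # points, repeats, step (ps)
t = np.arange(N) * T
u = (t - 6.0) / 0.3
mu = -u * np.exp(-u**2)
dmu = np.gradient(mu, T)                # finite-difference time derivative

# True noise amplitudes (additive, multiplicative, jitter in ps)
sa, sb, st = 0.01, 0.05, 0.004
sigma2 = sa**2 + sb**2 * mu**2 + st**2 * dmu**2

# M repeated measurements with time-dependent Gaussian noise
X = mu[:, None] + rng.standard_normal((N, M)) * np.sqrt(sigma2)[:, None]

# Per-point sample variance, then linear least squares on the variance model
var_t = X.var(axis=1, ddof=1)
A = np.column_stack([np.ones(N), mu**2, dmu**2])
coef, *_ = np.linalg.lstsq(A, var_t, rcond=None)
sa_hat, sb_hat, st_hat = np.sqrt(np.abs(coef))
print(sa_hat, sb_hat, st_hat)
```

With many repeats and no drift, the recovered amplitudes track the true values closely; the drift problem described next is what breaks this simple estimator in practice.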
But both of these methods are susceptible to systematic errors from drift,
which produces excess signal variability over the timescale of multiple
measurements [8,9,23]. In this section, we describe a likelihood method for
estimating the noise parameters that accounts for this drift.
We consider repeated measurements of an ideal primary waveform, $\mu (t)$, which has an amplitude and a delay that
drift on a timescale longer than the waveform acquisition time. We can
then associate each measurement with an ideal secondary waveform,
(31)$$\zeta(t; A_l, \eta_l) =
A_l\mu(t - \eta_l),$$
where $A_l$ is the relative amplitude, $\eta _l$ is the delay, and $l \in \{0, 1, \ldots ,
M-1\}$. We also set $A_0 = 1$ and $\eta _0 = 0$, so that $\zeta (t; A_0, \eta _0) =
\mu (t)$. As before, we sample these waveforms at
the nominal times $t_k = kT, k \in \{0,
1,\ldots , N-1\}$ to obtain the ideal primary waveform
vector ${\boldsymbol {\mu }} = [\mu
_0, \mu _1, \ldots , \mu _{N-1}]^{\mathsf {T}}$ and $M$ measurement vectors ${\boldsymbol {x}}_l =
[x_l(t_0), x_l(t_1), \ldots , x_l(t_{N-1})]^{\mathsf {T}}$, which we can arrange in an $N\times M$ matrix ${\boldsymbol {X}} =
[{\boldsymbol {x}}_0, {\boldsymbol {x}}_1, \ldots , {\boldsymbol
{x}}_{M-1}]$. We also write the amplitudes and delays
in vector form, ${\boldsymbol {A}} = [A_0,
A_1, \ldots , A_{M-1}]^{\mathsf {T}}$ and ${\boldsymbol {\eta }} =
[\eta _0, \eta _1, \ldots , \eta _{M-1}]^{\mathsf {T}}$, respectively.
With these definitions, we can express the sampled secondary waveforms in
vector form,
(32)$${\boldsymbol{\zeta}}_l =
{\boldsymbol{\zeta}}({\boldsymbol{\mu}}, A_l, \eta_l) =
A_l{\boldsymbol{S}}(\eta_l){\boldsymbol{\mu}},$$
where the matrix ${\boldsymbol {S}}(\eta
_l)$ is defined to impart a delay by $\eta _l$. Using Eq. (11) with the transfer function $H(\omega ; \eta ) = \exp
(i\omega \eta )$, we can write this matrix explicitly as
(33)$$S_{jk}(\eta) =
\frac{1}{N}\sum_{l ={-}N/2+1}^{N/2-1} \exp(i\omega_l\eta) \exp[{-}2\pi
i(j - k)l/N] +
\frac{1}{N}\cos(\omega_{N/2}\eta)\cos[\pi(j-k)]$$
for even $N$, with changes for odd $N$ as described for Eq. (11). Following Eqs. (13) and (14), and arranging the secondary waveform
vectors in an $N\times M$ matrix ${\boldsymbol {Z}} =
[{\boldsymbol {\zeta }}_0, {\boldsymbol {\zeta }}_1,\ldots
,{\boldsymbol {\zeta }}_{M-1}]$, we also have (34)$$[\operatorname{Cov}({\boldsymbol{x}}_l)]_{jk} =
[{\boldsymbol{V}}({\boldsymbol{\zeta}}_l; {\sigma_\alpha},
{\sigma_\beta}, {\sigma_\tau})]_{jk} =
\delta_{jk}\left[{\sigma_\alpha}^{2} + {\sigma_\beta}^{2} Z_{kl}^{2} +
{\sigma_\tau}^{2}({\boldsymbol{D}}{\boldsymbol{Z}})_{kl}^{2}\right],$$
where ${\boldsymbol
{D}}$ is defined in Eq. (12), ${\boldsymbol {\zeta
}}_l$ and ${\boldsymbol
{Z}}$ depend implicitly on ${\boldsymbol
{A}}$ and ${\boldsymbol {\eta
}}$, and the dependence of ${\boldsymbol
{V}}$ on the noise amplitudes is now expressed
explicitly.
The likelihood function for observing ${\boldsymbol
{X}}$ under these assumptions is
(35)$$L(\sigma_\alpha,\sigma_\beta,\sigma_\tau,{\boldsymbol{\mu}},{\boldsymbol{A}},{\boldsymbol{\eta}};{\boldsymbol{X}})
= \prod_{l=0}^{M-1} \frac{\exp\left\{-\frac{1}{2}({\boldsymbol{x}}_l -
{\boldsymbol{\zeta}}_l)^{\mathsf{T}}[{\boldsymbol{V}}({\boldsymbol{\zeta}}_l;
{\sigma_\alpha}, {\sigma_\beta},
{\sigma_\tau})]^{{-}1}({\boldsymbol{x}}_l -
{\boldsymbol{\zeta}}_l)\right\}}{\sqrt{(2\pi)^{N}\det[{\boldsymbol{V}}({\boldsymbol{\zeta}}_l;
{\sigma_\alpha}, {\sigma_\beta}, {\sigma_\tau})]}},$$
in which the noise amplitudes ${\sigma _\alpha
}$, ${\sigma _\beta }$ and ${\sigma _\tau }$ are the parameters of interest and the
signal vectors ${\boldsymbol {\mu
}}$, ${\boldsymbol
{A}}$ and ${\boldsymbol {\eta
}}$ are nuisance parameters. As with
Eq. (10), it is more
convenient computationally to work with the negative-log likelihood,
(36)$$\begin{aligned}
-\ln[L(\sigma_\alpha,\sigma_\beta,\sigma_\tau,{\boldsymbol{\mu}},{\boldsymbol{A}},{\boldsymbol{\eta}};{\boldsymbol{X}})]
&= \frac{MN}{2}\ln(2\pi) + \frac{1}{2}\sum_{l=0}^{M-1}
\ln\left\{\det[{\boldsymbol{V}}({\boldsymbol{\zeta}}_l;
{\sigma_\alpha}, {\sigma_\beta}, {\sigma_\tau})]\right\}\\ &+
\frac{1}{2}\sum_{l=0}^{M-1} \left\{({\boldsymbol{x}}_l -
{\boldsymbol{\zeta}}_l)^{\mathsf{T}}[{\boldsymbol{V}}({\boldsymbol{\zeta}}_l;
{\sigma_\alpha}, {\sigma_\beta},
{\sigma_\tau})]^{{-}1}({\boldsymbol{x}}_l -
{\boldsymbol{\zeta}}_l)\right\}. \end{aligned}$$
Ignoring the constant first term,
multiplying the remaining terms by 2, and expressing the matrix elements
explicitly, we can define the ML cost function, (37)$$\begin{aligned}
Q_{\textrm{ML}}(\sigma_\alpha,\sigma_\beta,\sigma_\tau,{\boldsymbol{\mu}},{\boldsymbol{A}},{\boldsymbol{\eta}};{\boldsymbol{X}})
= \sum_{k=0}^{N-1}\sum_{l=0}^{M-1}&
\left\{\ln\left[{\sigma_\alpha}^{2} + {\sigma_\beta}^{2} Z_{kl}^{2} +
{\sigma_\tau}^{2}({\boldsymbol{D}}{\boldsymbol{Z}})_{kl}^{2}\right]\right.\\
&\quad + \left.\frac{(X_{kl} - Z_{kl})^{2}}{{\sigma_\alpha}^{2} +
{\sigma_\beta}^{2} Z_{kl}^{2} +
{\sigma_\tau}^{2}({\boldsymbol{D}}{\boldsymbol{Z}})_{kl}^{2}}\right\},
\end{aligned}$$
which we minimize to obtain ML estimates
for all of the free parameters in the model.
The resulting estimates for the noise parameters (${\sigma _\alpha
}$, ${\sigma _\beta }$, and ${\sigma _\tau }$) are biased below their true values,
which we can attribute to the presence of the nuisance parameters (${\boldsymbol {\mu
}}$, ${\boldsymbol
{A}}$, and ${\boldsymbol {\eta
}}$) [21]. For example, in 1000 Monte Carlo simulations of $M=10$ repeated measurements using the waveforms
described in Sec. 2, the ratios of
the ML estimates to their true values are ${\hat {\sigma }_\alpha
}/{\sigma _\alpha } = 0.95 \pm 0.02$, ${\hat {\sigma }_\beta
}/{\sigma _\beta } = 0.94 \pm 0.03$, and ${\hat {\sigma }_\tau
}/{\sigma _\tau } = 0.89 \pm 0.09$, all significantly below unity.
Increasing the number of observations to $M=50$ reduces the bias to ${\hat {\sigma }_\alpha
}/{\sigma _\alpha } = 0.990 \pm 0.007$, ${\hat {\sigma }_\beta
}/{\sigma _\beta } = 0.98 \pm 0.01$, and ${\hat {\sigma }_\tau
}/{\sigma _\tau } = 0.93 \pm 0.04$, but some bias remains in ${\hat {\sigma }_\beta
}$ and ${\hat {\sigma }_\tau
}$ even in the limit $M\rightarrow
\infty$. This residual bias arises because the
numbers of elements in ${\boldsymbol
{A}}$ and ${\boldsymbol {\eta
}}$ both grow with the number of
observations, which allows us to account for drift but dilutes some of the
information that the data provide about the noise [21,24]. In
principle, we can resolve this problem by integrating out all of the
nuisance parameters in Eq. (35) to obtain a marginal likelihood that depends only on ${\sigma _\alpha
}$, ${\sigma _\beta }$, and ${\sigma _\tau }$ [21], but this involves computationally expensive integrations for
a relatively small benefit. As we discuss below, we can remove much of the
bias by simply rescaling the noise parameters by a suitable correction
factor.
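The minimization of Eq. (37) can be sketched numerically. In the simplified Python example below (illustrative values throughout), the nuisance parameters are held fixed for brevity: $A_l = 1$, $\eta_l = 0$, and ${\boldsymbol{\mu}}$ is set to the sample mean, so only the three noise amplitudes are optimized; in the full method all nuisance parameters are fit jointly, and with $M$ this large the bias discussed above is small:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Synthetic data: M repeats of a pulse with the three-parameter noise model
N, M, T = 256, 100, 0.05
t = np.arange(N) * T
u = (t - 6.0) / 0.3
mu = -u * np.exp(-u**2)
dmu = np.gradient(mu, T)
sa, sb, st = 0.01, 0.05, 0.004
sigma2 = sa**2 + sb**2 * mu**2 + st**2 * dmu**2
X = mu[:, None] + rng.standard_normal((N, M)) * np.sqrt(sigma2)[:, None]

# Nuisance parameters held fixed for brevity: Z_kl = mu_hat_k for all l
mu_hat = X.mean(axis=1)
dmu_hat = np.gradient(mu_hat, T)

def q_ml(log_s):
    """ML cost of Eq. (37) with A_l = 1 and eta_l = 0 for all l."""
    sa2, sb2, st2 = np.exp(2 * log_s)            # log-params keep variances > 0
    v = sa2 + sb2 * mu_hat**2 + st2 * dmu_hat**2  # variance at each t_k
    r2 = (X - mu_hat[:, None])**2                 # squared residuals
    return np.sum(np.log(v)[:, None] + r2 / v[:, None])

res = minimize(q_ml, x0=np.log([0.02, 0.02, 0.002]),
               method="Nelder-Mead", options={"maxfev": 4000})
sa_hat, sb_hat, st_hat = np.exp(res.x)
print(sa_hat, sb_hat, st_hat)
```

The log-parameterization is a convenience that enforces positivity without constrained optimization.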
To determine the bias correction factor it is helpful to consider a
simplified example in which there is no drift and only additive noise, so
that $A_l = 1$ and $\eta _l = 0$ for all $l$ and ${\sigma _\beta } = {\sigma
_\tau } = 0$. The ML cost function then reduces to
(38)$$\tilde{Q}_{\textrm{ML}}(\sigma_\alpha,{\boldsymbol{\mu}};{\boldsymbol{X}})
= \sum_{k=0}^{N-1}\sum_{l=0}^{M-1} \left[\ln{\sigma_\alpha}^{2} +
\frac{(X_{kl} -
\mu_{k})^{2}}{{\sigma_\alpha}^{2}}\right],$$
which is minimized by (39)$${\hat{\mu}}_k =
\frac{1}{M}\sum_{l = 0}^{M-1} X_{kl} = \bar{x}_k, \qquad
{\hat{\sigma}_\alpha}^{2} = \frac{1}{N}\sum_{k = 0}^{N-1}s_M^{2}(t_k)
= \overline{s_M^{2}},$$
where (40)$$s_M^{2}(t_k) =
\frac{1}{M}\sum_{l=0}^{M-1}(X_{kl} - \bar{x}_k)^{2}.$$
This result for ${\hat {\sigma }_\alpha
}^{2}$ is biased because it is the waveform
average of $s_M^{2}(t_k)$, which in turn is just the biased sample
variance of the measurements at $t_k$. To remove this bias we can apply
Bessel’s correction, which replaces the factor of $1/M$ with $1/(M-1)$ in Eq. (40). Alternatively, we can multiply ${\hat {\sigma }_\alpha
}^{2}$ by $M/(M-1)$ in Eq. (39). Returning to Eq. (37) and following similar
reasoning, we can justify rescaling all of the ML noise parameter
estimates by the same factor, (41)$${\hat{\sigma}_\alpha}^{*} =
{\hat{\sigma}_\alpha}\sqrt{\frac{M}{M-1}}, \qquad
{\hat{\sigma}_\beta}^{*} = {\hat{\sigma}_\beta}\sqrt{\frac{M}{M-1}},
\qquad {\hat{\sigma}_\tau}^{*} =
{\hat{\sigma}_\tau}\sqrt{\frac{M}{M-1}}.$$
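A quick numerical check of this correction in the additive-noise-only case of Eqs. (38)–(40) (a Python sketch; the waveform and amplitude are illustrative, and $\mu$ drops out of the sample variance):

```python
import numpy as np

rng = np.random.default_rng(2)

# Additive noise only: X_kl = mu_k + Gaussian noise of amplitude sigma_alpha
N, M, sa = 20000, 10, 1.0
mu = np.zeros(N)                       # the ideal waveform cancels in s_M^2
X = mu[:, None] + sa * rng.standard_normal((N, M))

# ML estimate: waveform average of the *biased* sample variance, Eqs. (39)-(40)
s2 = np.mean((X - X.mean(axis=1, keepdims=True))**2, axis=1)   # 1/M factor
sa_ml = np.sqrt(s2.mean())

# Bessel-corrected estimate, Eq. (41)
sa_corr = sa_ml * np.sqrt(M / (M - 1))
print(sa_ml, sa_corr)   # sa_ml ~ sqrt((M-1)/M) ~ 0.949, sa_corr ~ 1.0
```
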
With these corrections, the Monte Carlo
simulations with $M=10$ yield ${\hat {\sigma }_\alpha
}^{*}/{\sigma _\alpha } = 1.00 \pm 0.02$, ${\hat {\sigma }_\beta
}^{*}/{\sigma _\beta } = 0.99 \pm 0.03$, and ${\hat {\sigma }_\tau
}^{*}/{\sigma _\tau } = 0.94 \pm 0.09$, which are all within the statistical
uncertainty of the true values. The simulations with $M=50$ yield lower statistical uncertainty, but
with the same average values: ${\hat {\sigma }_\alpha
}^{*}/{\sigma _\alpha } = 1.000 \pm 0.007$, ${\hat {\sigma }_\beta
}^{*}/{\sigma _\beta } = 0.99 \pm 0.01$, and ${\hat {\sigma }_\tau
}^{*}/{\sigma _\tau } = 0.94 \pm 0.04$. For all practical purposes, the
remaining bias in ${\hat {\sigma }_\beta
}^{*}$ and ${\hat {\sigma }_\tau
}^{*}$ is negligible.
6. Experimental verification
In this section we present experimental results that verify our analysis.
Figure 5(a) shows the raw
data that we use to specify the noise model, ${\boldsymbol
{X}}$, which comprises $M=50$ waveforms, each with $N=256$ points sampled every $T =
50~\textrm{fs}$. With this data as input, we minimize
Eq. (37) to obtain
the ML estimates, ${\hat {\sigma }_\alpha
}$, ${\hat {\sigma }_\beta
}$, ${\hat {\sigma }_\tau
}$, $\hat {{\boldsymbol {\mu
}}}$, $\hat {{\boldsymbol
{A}}}$, and $\hat {{\boldsymbol {\eta
}}}$ for all of the free parameters in the
model. The resulting time-dependent noise amplitude, corrected for bias
using Eq. (41), is
(42)$$\hat{\sigma}^{*}_\mu(t_k) =
\sqrt{V_{kk}({\boldsymbol{\hat{\mu}}}; {\hat{\sigma}_\alpha}^{*},
{\hat{\sigma}_\beta}^{*}, {\hat{\sigma}_\tau}^{*})}.$$
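Evaluating Eq. (42) amounts to taking the square root of the diagonal of ${\boldsymbol{V}}$. A Python sketch (the pulse and fitted amplitudes below are illustrative, and the derivative matrix ${\boldsymbol{D}}$ of Eq. (12) is approximated here by a finite difference):

```python
import numpy as np

def noise_amplitude(mu_hat, sa, sb, st, T):
    """Square root of the diagonal of V(mu_hat; sa, sb, st), cf. Eq. (42)."""
    dmu = np.gradient(mu_hat, T)       # finite-difference stand-in for D @ mu
    return np.sqrt(sa**2 + sb**2 * mu_hat**2 + st**2 * dmu**2)

# Illustrative pulse and bias-corrected noise amplitudes
T = 0.05                               # 50 fs sampling period (ps)
t = np.arange(256) * T
u = (t - 6.0) / 0.3
mu_hat = -u * np.exp(-u**2)
sigma_t = noise_amplitude(mu_hat, sa=0.01, sb=0.05, st=0.004, T=T)
print(sigma_t.min(), sigma_t.max())
```

Far from the pulse the amplitude reduces to $\sigma_\alpha$; near the pulse peak and steepest slopes the multiplicative and jitter terms dominate, as in Fig. 1.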
In Fig. 5(b) we compare this model to the observed
time-dependent noise, which we estimate by using $\hat {{\boldsymbol
{A}}}$ and $\hat {{\boldsymbol {\eta
}}}$ to correct for the drift, (43)$$\tilde{{\boldsymbol{x}}}_l
=
\frac{1}{\hat{A}_l}{\boldsymbol{S}}(-\hat{\eta}_l){\boldsymbol{x}}_l,$$
then compute the unbiased standard
deviation at each time point over the set $\{\tilde {{\boldsymbol
{x}}}_l\}$. The model quantitatively describes most
features of the measurements, with small deviations near some of the
peaks. As a further consistency check, Fig. 5(c) shows the normalized residuals for one of the
waveforms, (44)$${\boldsymbol{r}}_{\textrm{ML},l} =
\left\{{\boldsymbol{V}}\left[{\boldsymbol{\zeta}}(\hat{{\boldsymbol{\mu}}},
\hat{A}_l, \hat{\eta}_l); {\hat{\sigma}_\alpha}^{*},
{\hat{\sigma}_\beta}^{*},
{\hat{\sigma}_\tau}^{*}\right]\right\}^{{-}1/2}\left[{\boldsymbol{x}}_{l}
- {\boldsymbol{\zeta}}(\hat{{\boldsymbol{\mu}}}, \hat{A}_l,
\hat{\eta}_l)\right],$$
which demonstrates that they approximately
follow a standard normal distribution, with small but statistically
significant correlations among neighboring points.
Figure 6 shows fits to two
sequential waveforms from this set. A fit with Eq. (5) in the time domain, obtained by
minimizing $Q_{\textrm{TLS}}$ in Eq. (30), yields ${\hat {\theta
}}^{\textrm{TLS}}_1 = 1.019\pm 0.003$ for the amplitude and ${\hat {\theta
}}^{\textrm{TLS}}_2 = (-2.8\pm 0.5)~\textrm{fs}$ for the delay, with the normalized
residuals shown in Fig. 6(b). A fit with the same model in the frequency domain, obtained
by minimizing $Q_{\textrm{ETFE}}$ in Eq. (6), yields ${\hat {\theta
}}^{\textrm{ETFE}}_1 = 1.020\pm 0.004$ and ${\hat {\theta
}}^{\textrm{ETFE}}_2 = (-4.4\pm 0.6)~\textrm{fs}$, with the normalized residuals shown in
Fig. 6(d). Both fit methods
reveal significant differences from the ideal values of $\theta _1 = 1$ and $\theta _2 = 0$, reflecting the measurement drift
discussed in Sec. 5. As we found
for the residuals of the noise-model fit in Fig. 5(c), the residuals of the transfer-function fit in
Fig. 6(b) show small but
statistically significant correlations. But as we also found with the
idealized Monte Carlo simulations, the residuals of the ETFE fit in the
frequency domain are much more structured than the residuals of the TLS
fit to the same data in the time domain.
To illustrate the analysis problem that this raises, in Fig. 6(c) we compare an ETFE fit with
Eq. (5) to an ETFE
fit with a more flexible transfer-function model,
(45)$$H_2(\omega;
{\boldsymbol{\theta}}) = (\theta_1 +
\theta_3\omega^{2})\exp[i\omega(\theta_2 +
\theta_4\omega^{2})].$$
Since the two input waveforms are nominally
identical, we know that the downturns in $\textrm{Re} [\hat
{H}_{\textrm{ETFE}}(\omega )]$ and $\textrm{Im} [\hat
{H}_{\textrm{ETFE}}(\omega )]$ with increasing frequency near 2 THz are
spurious. But if we did not know this in advance, we might conclude from
Fig. 6(c) that
Eq. (45) describes
the measurements better than Eq. (5). We would also be able to support this
conclusion by comparing the GOF statistic for the fit with
Eq. (5), $S^{(1)}_{\textrm{ETFE}}
\approx 223$, with $\nu = 254$ degrees of freedom, to that obtained with
Eq. (45), $S^{(2)}_{\textrm{ETFE}}
\approx 191$, with $\nu = 252$ degrees of freedom. By adding only two
additional fit parameters, we reduce $S_{\textrm{ETFE}}$ by about 32, which erroneously suggests that
the added complexity of Eq. (45) captures a real physical effect. The
TLS method is more robust against such overfitting: the GOF statistic with
Eq. (5) is $S^{(1)}_{\textrm{TLS}}
\approx 198$, while with Eq. (45), $S^{(2)}_{\textrm{TLS}}
\approx 196$. In this case, adding two free parameters
reduces $S_{\textrm{TLS}}$ by only two, so Occam’s razor
favors the simpler model, Eq. (5). More formally, to select from a set of
transfer-function models $H_k(\omega ; {\boldsymbol
{\theta }}_k)$, $k = 1, 2, \ldots ,
N_{\textrm{model}}$, each with $p_k$ free parameters, we can minimize a
modified cost function based on the Akaike information criterion [21], (46)$$Q_{\operatorname{AIC}}(k) =
S_{\textrm{TLS}}^{(k)} + 2p_k,$$
where $S_{\textrm{TLS}}^{(k)}$ is the TLS GOF statistic for the model $H_k (\omega ; {\boldsymbol
{\theta }}_k)$. As we discussed in Sec. 4, this ability to discriminate among
competing models is one of the major advantages of the TLS method.
If we divide the 50 experimental waveforms into 25 sequential pairs and fit
each pair with Eq. (5), the results are qualitatively consistent with the Monte Carlo
simulations and support the conclusion that $S_{\textrm{TLS}}$ offers the better absolute measure of fit
quality. The distribution of $S_{\textrm{ETFE}}$ over the experiments has a mean $\bar {S}_{\textrm{ETFE}}
\approx 235$ and standard deviation $\sigma (S_{\textrm{ETFE}})
\approx 113$, while the distribution for $S_{\textrm{TLS}}$ has a mean $\bar {S}_{\textrm{TLS}}
\approx 246$ and standard deviation $\sigma (S_{\textrm{TLS}})
\approx 39$. For the simulated distributions shown in
Fig. 3(a) and
Fig. 4(a), we obtain $\bar {S}_{\textrm{ETFE}}
\approx 235$, $\sigma (S_{\textrm{ETFE}})
\approx 160$, $\bar {S}_{\textrm{TLS}}
\approx 250$, and $\sigma (S_{\textrm{TLS}})
\approx 22$. As we discussed at the end of Sec. 4, a narrower GOF distribution provides
better sensitivity to fit quality. And while the experimental distribution
for $S_{\textrm{TLS}}$ is nearly twice as broad as it is in the
simulations, it is still nearly a factor of 3 narrower than the
experimental distribution for $S_{\textrm{ETFE}}$. Despite the quantitative differences
between the simulations and the experiment, the TLS method consistently
shows better performance than the ETFE.
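The model-selection rule of Eq. (46) applied to the two TLS fits discussed above, using the GOF values quoted in the text, can be written out directly:

```python
# Akaike information criterion for the two transfer-function models, Eq. (46):
# Q_AIC(k) = S_TLS^(k) + 2*p_k; the model with the smallest Q_AIC is preferred.
models = {
    "Eq. (5), p = 2": {"S_TLS": 198.0, "p": 2},   # amplitude/delay model
    "Eq. (45), p = 4": {"S_TLS": 196.0, "p": 4},  # more flexible model
}
q_aic = {name: m["S_TLS"] + 2 * m["p"] for name, m in models.items()}
best = min(q_aic, key=q_aic.get)
print(q_aic)   # {'Eq. (5), p = 2': 202.0, 'Eq. (45), p = 4': 204.0}
print(best)    # the simpler model wins, consistent with Occam's razor
```
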
7. Conclusion
In summary, we have developed a maximum-likelihood approach to THz-TDS
analysis and demonstrated that it has numerous advantages over existing
methods. Starting from a few simple assumptions, we derived a method to
determine the transfer function relationship between two THz-TDS
measurements, together with a method to specify a parameterized noise
model from a set of repeated measurements. In each case, we also derived
expressions for fit residuals in the time domain that are properly
normalized by the expected noise. Through a combination of Monte Carlo
simulations and experimental measurements, we verified that these tools
yield results that are accurate, precise, responsive to fit quality, and
generally superior to the results of fits to the ETFE.
We focused on simple examples to emphasize the logical structure of the
method, but we can readily apply the same approach to more complex
problems. For example, we have successfully used the framework presented
here to measure a weak frequency dependence in the Drude scattering rate
of a metal, which is predicted by Fermi liquid theory; we have also used
it to measure small variations in the THz-frequency refractive index of
silicon with temperature [17]. In
both of these applications we found that maximum-likelihood analysis in
the time domain provided better performance than a least-squares analysis
based on the ETFE in the frequency domain. We expect similar improvements
in other applications, and provide MATLAB functions in Code Repository 1
(Ref. [25]) for others to try.
Funding
Natural Sciences and Engineering Research
Council of Canada.
Acknowledgments
JSD thanks John Bechhoefer for encouragement and stimulating discussions as
this work developed.
Disclosures
The authors declare no conflicts of interest.
References
1. L. Duvillaret, F. Garet, and J.-L. Coutaz, “Highly precise
determination of optical constants and sample thickness in terahertz
time-domain spectroscopy,” Appl.
Opt. 38(2),
409 (1999). [CrossRef]
2. T. D. Dorney, R. G. Baraniuk, and D. M. Mittleman, “Material parameter
estimation with terahertz time-domain
spectroscopy,” J. Opt. Soc. Am.
A 18(7),
1562–1571
(2001). [CrossRef]
3. M. Naftaly, ed., Terahertz
Metrology (Artech
House, Boston,
2015).
4. W. Withayachumnankul, B. M. Fischer, H. Lin, and D. Abbott, “Uncertainty in
terahertz time-domain spectroscopy
measurement,” J. Opt. Soc. Am.
B 25(6), 1059
(2008). [CrossRef]
5. M. Naftaly and R. Dudley, “Methodologies for
determining the dynamic ranges and signal-to-noise ratios of terahertz
time-domain spectrometers,” Opt.
Lett. 34(8),
1213–1215
(2009). [CrossRef]
6. M. Naftaly, “Metrology issues and
solutions in THz time-domain spectroscopy: Noise, errors,
calibration,” IEEE Sens. J. 13(1),
8–17 (2013). [CrossRef]
7. M. Naftaly, R. G. Clarke, D. A. Humphreys, and N. M. Ridler, “Metrology
state-of-the-art and challenges in broadband phase-sensitive terahertz
measurements,” Proc. IEEE 105(6),
1151–1165
(2017). [CrossRef]
8. M. Skorobogatiy, J. Sadasivan, and H. Guerboukha, “Statistical Models for
Averaging of the Pump–Probe Traces: Example of Denoising in
Terahertz Time-Domain Spectroscopy,”
IEEE Trans. Terahertz Sci. Technol. 8(3),
287–298
(2018). [CrossRef]
9. H. W. Hubers, M. F. Kimmitt, N. Hiromoto, and E. Brundermann, “Terahertz spectroscopy:
System and sensitivity considerations,”
IEEE Trans. Terahertz Sci. Technol. 1(1),
321–331
(2011). [CrossRef]
10. M. Krüger, S. Funkner, E. Bründermann, and M. Havenith, “Uncertainty and
ambiguity in terahertz parameter extraction and data
analysis,” J. Infrared, Millimeter,
Terahertz Waves 32(5),
699–715
(2011). [CrossRef]
11. D. Jahn, S. Lippert, M. Bisi, L. Oberto, J. C. Balzer, and M. Koch, “On the influence of
delay line uncertainty in THz time-domain
spectroscopy,” J. Infrared, Millimeter,
Terahertz Waves 37(6),
605–613
(2016). [CrossRef]
12. A. Rehn, D. Jahn, J. C. Balzer, and M. Koch, “Periodic sampling
errors in terahertz time-domain measurements,”
Opt. Express 25(6),
6712–6724
(2017). [CrossRef]
13. W. Withayachumnankul and M. Naftaly, “Fundamentals of
measurement in terahertz time-domain
spectroscopy,” J. Infrared, Millimeter,
Terahertz Waves 35(8),
610–637
(2014). [CrossRef]
14. J. L. M. van Mechelen, A. B. Kuzmenko, and H. Merbold, “Stratified dispersive
model for material characterization using terahertz time-domain
spectroscopy,” Opt. Lett. 39(13), 3853
(2014). [CrossRef]
15. S. Krimi, J. Klier, J. Jonuscheit, G. von Freymann, R. Urbansky, and R. Beigang, “Highly accurate
thickness measurement of multi-layered automotive paints using
terahertz technology,” Appl. Phys.
Lett. 109(2),
021105 (2016). [CrossRef]
16. R. Peretti, S. Mitryukovskiy, K. Froberger, M. A. Mebarki, S. Eliet, M. Vanwolleghem, and J. Lampin, “THz-TDS time-trace
analysis for the extraction of material and metamaterial
parameters,” IEEE Trans. Terahertz Sci.
Technol. 9(2),
136–149
(2019). [CrossRef]
17. L. Mohtashemi, “Test of Fermi liquid
theory with terahertz conductivity measurements of
MnSi,” Ph.D. thesis, Simon
Fraser University
(2020).
18. D. Grischkowsky and N. Katzenellenbogen, “Femtosecond pulses of
terahertz radiation: physics and applications,”
in OSA Proceedings on Picosecond Electronics and
Optoelectronics, vol. 9, T. C. L. G. Sollner and J. Shah, eds. (Optical Society of
America, Washington, DC, 1991), pp. 9–14.
19. L. Ljung, System Identification: Theory
for the User, Prentice-Hall Information and System
Sciences Series (Prentice Hall,
Upper Saddle River, N.J.,
1999), 2nd ed.
20. P. Guillaume, I. Kollar, and R. Pintelon, “Statistical analysis of
nonparametric transfer function estimates,”
IEEE Trans. Instrum. Meas. 45(2),
594–600
(1996). [CrossRef]
21. Y. Pawitan, In All Likelihood: Statistical
Modelling and Inference Using Likelihood
(Oxford University Press,
Oxford,
2001).
22. S. Van Huffel and J. Vandewalle, The Total Least Squares
Problem, Frontiers in Applied Mathematics
(Society for Industrial and Applied
Mathematics,
Philadelphia,
1991).
23. W. Withayachumnankul, B. M. Fischer, and D. Abbott, “Material thickness
optimization for transmission-mode terahertz time-domain
spectroscopy,” Opt. Express 16(10),
7382–7396
(2008). [CrossRef]
24. J. Neyman and E. L. Scott, “Consistent Estimates
Based on Partially Consistent Observations,”
Econometrica 16(1),
1–32 (1948). [CrossRef]
25. J. S. Dodge, L. Mohtashemi, P. Westlund, D. G. Sahota, G. B. Lea, I. Bushfeld, and P. Mousavi, “MATLAB functions for
maximum-likelihood parameter estimation in terahertz time-domain
spectroscopy,” Zenodo (2020).
https://zenodo.org/record/4326594.