We analyze the statistical properties of the maximum likelihood estimator, least squares estimator, and Pearson’s χ2-based and Neyman’s χ2-based estimators for the estimation of decay constants and amplitudes for fluorescence lifetime imaging. Our analysis is based on the linearization of the gradient of the objective functions around true parameters. The analysis shows that only the maximum likelihood estimator based on the Poisson likelihood function yields unbiased and efficient estimation. All other estimators yield either biased or inefficient estimations. We validate our analysis by using simulations.
© 2013 OSA
Fluorescence lifetime imaging microscopy (FLIM) provides useful information about the status of fluorophores and their molecular environments, which can be used to resolve physiological parameters in cells [1, 2]. Moreover, FLIM can be extended to monitor fluorescence resonance energy transfer that can be used to study the dynamics of proteins within the cells . Since FLIM is based on the decay properties of fluorophores, an accurate estimation of fluorescence lifetime becomes important. The fluorescence lifetime is usually estimated either in the time domain, using a time correlated single photon counting (TCSPC) technique or in the frequency domain, using an intensity-modulated light source and phase-shifted emission light . Compared with methods used for the frequency domain, the TCSPC-based method has advantages such as robustness to scattering, higher precision, and the ability to measure short lifetimes .
To estimate decay time constants in time domain, one usually fits an exponential decay model to the measured photon counts by maximizing an objective function that measures the goodness of the fit to the data. Since the detected photon count follows a Poisson distribution, one recommended measure for the goodness of fit is the Poisson likelihood function, which leads to the maximum likelihood estimator (MLE) that has many desirable properties such as asymptotic unbiasedness, asymptotic consistency, and asymptotic efficiency . Nevertheless, many previous investigations did not use the Poisson likelihood function. Instead, these investigations used the sum of square errors, Pearson’s χ2 and Neyman’s χ2, that originate from approximations of the Poisson likelihood function to the Gaussian likelihood function [1,6–9]. We suspect that this is because χ2-based measures (including the sum of square errors, which is χ2 for data having equal variances) have been frequently used to measure the goodness of fit in data analysis , and because it may look natural to measure the goodness of fit by a function that includes the sum of square errors between the model and data, as in the case with χ2-based measures. However, since χ2-based measures assume that each data value is a Gaussian random variable, estimation using such measures is based on an incorrect probability model. For the case of high photon counts, it is known that the Gaussian likelihood function might be an alternative to the Poisson likelihood function, because the Poisson probability distribution can be approximated by the Gaussian probability distribution . However, for the case of low photon counts, one should not use the estimators that are based on the Gaussian likelihood function. Moreover, even in case of high photon counts, one should not use χ2-based estimators because these estimators are either biased or inefficient, as demonstrated herein. Nevertheless, the least squares estimator (LSE) was used in software packages for FLIM , and χ2-based estimators were used in FLIM imaging [7–9] until recently, in addition to the many χ2-based investigations that were reported earlier .
We think that the plurality of incorrect estimators is due to the lack of an analysis of statistical properties of each estimator. Although there are some investigations that compare the performance of different estimators [11,12], the comparisons are based on only numerical simulations and experiments, and do not use analysis. In addition, the usefulness of these comparisons is limited, because most studies are based on estimating the parameters of a single exponential decay. Recently, an investigation on estimating parameters of a double exponential model for measuring protein interacting fractions was reported . Except for empirical comparisons, to our knowledge, there exists only one study that analyzed the performance of the estimators . However, this analysis was not very useful since it assumed an infinite number of photons and was also used for the estimation of parameters of a a single decay. Moreover, since the behaviors of MLE and Pearson’s χ2-based estimator for the asymptotic case of infinite photon counts are identical in their analysis , it may mislead readers, who may think that equal performance can be achieved either using the MLE or Pearson’s χ2-based estimator. In fact, the Pearson’s χ2-based estimator should not be used since, as demonstrated herein, this estimator is biased for a finite number of photon counts.
In this investigation, we approximate the mean and variance of MLE, χ2, Pearson’s χ2, and Neyman’s χ2 methods by linearizing the estimators. Such analysis was used effectively in some previous works to analyze the properties of nonlinear estimators [15, 16]. Our analysis shows that the MLE is both unbiased and efficient, while the variance of the LSE is larger than that of the MLE, although the LSE is unbiased. In addition, we show that the frequently used Pearson’s χ2 and Neyman’s χ2-based estimation yield a biased estimator. We use simulations to validate our analysis.
2.1. Problem formulation
We model a photon count y(ti), that is measured at time ti from a mixture of fluorophores, as a Poisson random variable :1]. To simplify the notations, we denote λ (ti; θ) by λi(θ) and y(ti) by yi. Because FLIM should be based on accurate decay properties, the parameter vector θ should be accurately estimated from the noisy measurement Y = [y0, y1,..., yN−1]. To achieve this goal, the estimation is usually carried out by determining the minimizer of an objective function, in the following way: Eq. (1) is determined by the minimizer of the negative Poisson log-likelihood function, defined as follows: 11]: 11]. We note that if all yi are non-zero, the modified Neyman’s χ2 is the same as the original .
2.2. Mean and variance approximation
If the mean and the variance of all estimators (that are based on the corresponding objective functions described in the previous section) are known, it would be easy to compare their performance. One would prefer an unbiased estimator that has small variance. However, since the estimator is defined implicitly by the minimizer of some objective function, it is not possible to derive exact analytical expressions for the mean and variance . To theoretically analyze the performance of the estimators, we approximate the mean and variance of each estimator by linearizing the gradient of the objective function around the true parameter as reported previously [15, 16].
We consider an estimator that is defined by the minimizer of an analytic objective function. The minimizer is determined by zeroing the gradient of the objective function, as follows :16]. To approximate its mean and variance, we linearize the column gradient ∇θΦ(θ̂, Y) by the first-order Taylor series expansion that is performed around the true parameter θ, as follows : 16]: 16]: 16]:
2.3. Cramer-Rao bound
For any unbiased estimator, the Cramer-Rao bound is the lowest achievable bound for the variance of the estimator [5, 17]. The variance of the j-th component of any unbiased estimator θ̂j is bounded by Jjj, as follows :Eq. (1)). By using the negative log-likelihood function defined in Eq. (5), it is straightforward to show that the (j, k)-th element of the Fisher information matrix F can be defined as follows:
2.4. Maximum likelihood estimation
The MLE that use the negative Poisson log-likelihood function, defined in Eq. (5), is unbiased in our approximation since17, p. 80] 5].
2.5. Least squares estimation
The least squares estimation is also unbiased in our approximation, because18]: Appendix A. Therefore, we conclude that the variance of the LSE is larger than the MLE, although the estimator is unbiased. This is because the LSE is based on an incorrect probability model.
2.6. Pearson’s χ2
The approximated mean of the Pearson’s χ2-based estimator shows that the estimator is biased because, from what follows, the negative expected gradient value at true parameter is not zero:19]. Although this statement is correct, readers should not be misled, because this case describes the case of infinite amplitude. For the estimation of a single decay constant, the bias in estimating the decay constant decreases in proportion to the reciprocal value of the amplitude (see Appendix B). Since the rate of bias decrease is only sub-linear, one should expect the presence of bias even for the case of high photon counts. For the estimation of a single decay constant, the Pearson’s χ2-based estimator will show positive bias for the case of high photon counts, because the expectation of the negative of the gradient, defined in Eq. (24), has a positive value, and the approximated second order derivative be a positive constant. This is related to the fact that Pearson’s χ2-based estimation of the mean of a Poisson random variable overestimates the mean , since an overestimated mean would imply slower decay than the true one. However, in estimating multiple parameters simultaneously, the bias can be either negative or positive, depending on the value of the Hessian matrix. Because unbiased estimation of decay parameters is important, we believe that one should avoid using Pearson’s χ2 for FLIM imaging. Regarding the variance approximation, although we were able to derive a formula, we have not included it here because the expression is lengthy and includes even fourth order statistics. Instead, we include the empirical variances and discuss the performance in comparison with other estimators in the following section.
2.7. Neyman’s χ2
The approximated mean of the modified Neyman’s χ2-based estimator shows that, similar to the Pearson’s χ2-based estimator, this estimator is biased as well. The expectation of the negative of gradient at true θ is calculated as follows:20]: Eq. (26) and Eq. (28), we approximated 1/(λi(θ) − 1) by 1/λi(θ) to avoid the numerical instability that arises when λi(θ) = 1. An interesting observation is that the expectation of the negative of gradient, defined in Eq. (26), attains a negative value, whereas that of Pearson’s χ2 attains a positive value. Again, this is related to the fact that Neyman’s χ2-based mean estimation underestimates the mean . Therefore, an estimation of a single time constant using Neymans’s χ2 and Pearson’s χ2-based estimation methods result in underestimation and overestimation, respectively. This is consistent with experimental results that were reported previously . Regarding the variance approximation, we do not include the approximation formula, as it also includes fourth order statistics and does not provide useful information. Instead, in the subsequent section, we include the numerical simulation results.
2.8. Numerical optimization
To determine the MLE and the χ2-based estimations, it is required to solve the minimization problem defined in Eq. (4) with the objective functions defined in Eq. (5) through Eq. (8). Note that the minimization problem is a constrained optimization problem with non-negativity constraints since both decay and amplitude parameters have positive values. We solve the constrained minimization problem using the trust region reflective method [21–23] implemented in the Optimization toolbox in MATLAB (Mathworks, USA), which is based on the quadratic approximation of an objective function. The Hessian matrix HM of the negative log-likelihood function, which is the objective function for the MLE, is defined as follows:Eq. (29) since inclusion of the second order derivative term can be destabilizing if the model fits badly [10, p. 688]. The Hessian matrices for the χ2-based estimators are also computed using only first order derivative terms for the same reason.
The approximations for the statistical properties of each estimator developed in the preceding sections are based on the assumption that each estimator is determined by zeroing the gradient of corresponding objective function around true parameter. In practice, since objective functions for the MLE and the χ2-based estimators are non-convex functions, numerical optimization may find only a local minimum depending on initial parameter values. Because it is very challenging to analyze the behaviors of local minima theoretically, we study the effects of local minima in decay and amplitude parameter estimation using numerical simulations in the following section.
3. Simulation results
To validate the analysis that was presented in the preceding section, we conducted simulation studies in order to estimate the two decay constants and their associated amplitudes in an exponential decay model that is defined as follows:Eq. (30), we used the IRF of a truncated Gaussian function with σ = 1 and 5 nonzero points. Note that the Gaussian function was often used to model an IRF [2, 6]. We generated 1024 Poisson random variables yi whose mean is λi(θ) with uniformly spaced ti from t0 = 0 to tN−1 = 12.788ns. We set true values as τ1 = 1ns, τ2 = 4ns. To simulate a different number of photon counts, we changed A1 from 23 to 125 while we set A2 = A1 − 10. By using the generated photon counts (i.e., the Poisson random variables), we estimated the parameters using the MLE, LSE, and Pearson’s χ2 and Neyman’s χ2-based estimators. We set initial parameter values by two times the true values. We repeated the estimation 5000 times, using different Poisson random variable realizations.
Figure 1 shows the sample mean of each estimator for [τ1, τ2, A1, A2] from the 5000 estimations with the approximated mean values that were obtained using the formula developed in the preceding section for the Pearson’s and Neyman’s χ2-based estimators. Notably, because there is no bias in our approximation, the approximated means of the MLE and the LSE are the same as true ones. As shown in Fig. 1, the sample means of the MLE and the LSE are very close to the true parameters for the entire range of photon counts. However, the Pearson’s χ2-based estimator shows positive biases in estimating τ1 and τ2. There also exists a small positive bias in estimating A1 and a small negative bias in estimating A2. For higher photon counts, the approximated mean formula is in better agreement with the sample mean. The approximated mean values are included only in the range for which the approximated Hessian is a positive definite matrix. Compared with the Pearson’s χ2-based estimator, the Neyman’s χ2-based estimator exhibits more complicated behaviors. The Neyman’s χ2-based estimator shows positive biases in estimating τ1 and τ2 when photon counts are very low. However, in the region wherein photon counts are relatively high, the biases are negative. The approximated mean formula agrees with the sample mean only in the regime of high photon counts. As a matter of fact, the approximated Hessian matrix of the Neyman’s χ2 is positive definite only for sufficiently high photon counts, indicating that the Neyman’s χ2-based estimator might be more nonlinear than the Pearson’s χ2-based estimator.
In Fig. 2, we present the sample variance of each estimator with the Cramer-Rao bound and approximated variances for the LSE. As shown in Fig. 2, the sample variances of the MLE and the LSE agree with the Cramer-Rao bound and the approximated variance formula very well for the entire range of photon counts. We note that, according to the proof presented in Appendix A, the approximated variance of the LSE is always greater than the Cramer-Rao bound. The variances of the Pearson’s χ2-based estimator are larger than those of the MLE for low photon counts, although the variances become close to the Cramer-Rao bound as the number of photons increases. Note that even for high photon counts, the Pearson’s χ2-based estimator is biased in estimating τ1 and τ2. For Neyman’s χ2-based estimator, the variances are greater than those of the MLE for relatively low photon counts. Although the variances for estimating τ1 and τ2 approach the Cramer-Rao bound as photon counts increase, variances in estimated amplitudes do not approach the Cramer-Rao bound even in the case of very high photon counts, as is shown in Fig. 2(c) and Fig. 2(d). We believe that the Neyman’s χ2-based estimator should not be used because it is more biased than the Pearson’s χ2-based method, in addition to showing larger variances. Note that the Neyman’s χ2-based method was a frequently used method in FLIM imaging [7, 12]. We believe that one should choose the MLE for FLIM imaging, not only for low photon counts but also for high photon counts. Our analysis and simulations show biases in Pearson’s and Neyman’s χ2-based methods even for the case of more than 45000 photon counts. In addition, since the LSE shows larger variances than the MLE, there is no reason to favor the LSE over to the MLE.
As mentioned in the preceding section, numerical optimization may determine only a local minimizer since the objective functions for the MLE and the χ2-based estimators are non-convex functions. To study the effects of local minimizers in estimating decay and amplitude parameters, we conducted another simulation study. We generated 1024 Poisson random variables yi whose mean is λi(θ) with true parameter values as τ1 = 1ns, τ2 = 4ns, A1 = 72, and A2 = 62 (number of average total photon counts is 24852). Then, we estimated the decay and the amplitude parameters using the MLE and the χ2-based methods while increasing initial parameter values from the true values to five times the true values. We repeated these estimations 5000 times with different Poisson random variable realizations. Figure 3 shows the sample mean of each estimator from 5000 realizations for different initial parameters. To our surprise, all the four estimation methods are very robust to the change of initial parameters. As shown in Fig. 3, the sample mean values of estimated parameters using the four methods are almost the same for the entire range of initial parameters. Similar results are found in the sample variance of each estimator which is shown in Fig. 4. One can see that the sample variances of the MLE and the χ2-based estimations do not change much for the entire range of initial parameters. In addition, the sample variance of the MLE and the LSE agree with the Cramer-Rao bound and the variance approximation formula for the LSE, respectively. These simulation results suggest that the objective functions of the four estimation methods might be locally convex functions for the cases of high photon counts.
It is expected that the correct estimation of decay and amplitude parameters becomes more difficult if two decay parameters are more similar and/or the number of photons is smaller. For such cases, the effect of local minimizers in estimating the parameters might be more severe than the cases of high photon counts. To study the statistical properties of each estimator when the number of photon counts is small and two decay parameters are similar, we changed the true parameters values in λi(θ) to τ1 = 1ns, τ2 = 2ns, A1 = 28 and A2 = 18 (number of average total photon counts is 5138). Then, we conducted the same simulation study as the previous one. Figure 5 shows the sample mean values of estimated decay and amplitude parameters for different initial parameters. One can see that the sample mean values of the MLE exhibit small biases. In addition, the sample mean values are different for different initial parameter values. We suspect that this dependency on initial parameter is due to the effects of local minimizers. Compared with the MLE, the Pearson’s χ2-based estimation was more robust to the change of initial parameters. However, the Pearson’s χ2-based method shows larger biases than the MLE as shown in Fig. 5(a) through Fig. 5(d). The Neyman’s χ2 and the LSE exhibit very large bias in estimating τ2 for some initial parameters as shown in Fig. 5(b). Although the amounts of the biases of the MLE are different depending on initial parameters, the sample mean values of the MLE are the closest ones to the true parameter values for the entire range of initial parameters.
Figure 6 shows the sample variances of each estimator from the 5000 realizations. We do not include the Cramer-Rao bound and the approximated variances for the LSE in Fig. 6 since neither the MLE is unbiased nor the approximations developed in the preceding section are valid if the number of photon counts is small. As shown in Fig. 6(a), Fig. 6(c) and Fig. 6(d), the sample variances of the Pearsons and Neymans χ2 were smaller than those of the MLE for τ1, A1 and A2. However, in estimating τ2, Pearson’s and Neyman’s χ2 showed larger variances than the MLE. In addition, the variances of the LSE were always larger than the MLE. Note that although the Pearson’s χ2-based estimator shows smaller variances than the MLE in estimating some parameters, it has larger biases than the MLE. Since unbiased estimation of decay and amplitude parameters is important, we believe that one should choose the MLE for the cases of low photon counts with similar decay parameters. Note that a previous investigation also reported that the MLE should be used for the cases of low photon counts based on empirical comparisons .
We used gradient linearization around true parameters to analyze the statistical properties of amplitude and decay constant estimators for FLIM. In the analysis, the MLE that was based on the Poisson likelihood function exhibited both unbiasedness and efficiency. The LSE was unbiased, but it showed a larger variance than the MLE. Since the Pearson’s and Neyman’s χ2-based estimators showed biases, it can be concluded that they are not suitable for even high photon counts. We believe that the analysis presented in this work can also be used to predict the performance of the estimators for FLIM.
Appendix A: Inequality for the variance approximations of MLE and LSE
Matrix Cauchy-Schwarz inequality is defined as follows :Eq. (21) is ATB, and the K matrix, defined in Eq. (22), is ATA, the covariance approximation of the LSE satisfies the following inequality:
Appendix B: Mean approximation for the estimation of a single time constant using the Pearson’s χ2
For the estimation of a single decay constant from a single fluorophore, the mean intensity is modeled as follows:Eq. (25) dominates when A is very large, since the first term is proportional to A but others are not. Therefore, we approximate the second derivative of the objective function as follows:
This work was supported by the Korean Research Foundation Grant funded by the Korean Government (KRF-2008-331-D00419) and Ewha Global Top 5 Grant 2011 of Ewha Womans University.
References and links
1. J. R. Lakowicz, Principles of Fluorescence Spectroscopy (Kluwer Academic/Plenum, 1999) [CrossRef] .
3. S. Kumar, C. Dunsby, P. A. A. D. Beule, D. M. Owen, U. Anand, P. M. P. Lanigan, R. K. P. Benninger, D. M. Davis, M. A. A. Neil, P. Anand, C. Benham, A. Naylor, and P. M. W. French, “Multifocal multiphoton excitation and time correlated single photon counting detection for 3-D fluorescence lifetime imaging,” Opt. Express 15, 12548–12561 (2007) [CrossRef] [PubMed] .
5. H. Cramer, Mathematical Methods of Statistics (Princeton University, 1999).
6. S. Laptenok, K. M. Mullen, J. W. Borst, I. H. M. van Stokkum, V. V. Apanasovich, and A. J. W. G. Visser, “Fluorescence lifetime imaging microscopy (FLIM) data analysis with TIMP,” J. Stat. Softw. 18, 1–20 (2007).
7. S. Pelet, M. J. R. Previte, L. H. Laiho, and P. T. C. So, “A fast global fitting algorithm for fluorescence lifetime imaging microscopy based on image segmentation.” Biophys. J. 87, 2807–17 (2004) [CrossRef] [PubMed] .
8. N. Boens, W. Qin, N. Basarić, J. Hofkens, M. Ameloot, J. Pouget, J.-P. Lefèvre, B. Valeur, E. Gratton, M. vandeVen, N. D. Silva, Y. Engelborghs, K. Willaert, A. Sillen, G. Rumbles, D. Phillips, A. J. W. G. Visser, A. van Hoek, J. R. Lakowicz, H. Malak, I. Gryczynski, A. G. Szabo, D. T. Krajcarski, N. Tamai, and A. Miura, “Fluorescence lifetime standards for time and frequency domain fluorescence spectroscopy,” Anal. Chem. 79, 2137–2149 (2007) [CrossRef] [PubMed] .
9. P. J. Steinbach, R. Ionescu, and C. R. Matthews, “Analysis of kinetics using a hybrid maximum-entropy/nonlinear-least-squares method: application to protein folding,” Biophys. J. 82, 2244–2255 (2002) [CrossRef] [PubMed] .
10. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C++ - The Art of Scientific Computing (Cambridge University, 2002).
11. T. Hauschild and M. Jentschel, “Comparison of maximum likelihood estimation and chi-square statistics applied to counting experiments,” Nucl. Instrum. Meth. A 457, 384–401 (2001) [CrossRef] .
12. M. Maus, M. Cotlet, J. Hofkens, T. Gensch, F. C. De Schryver, J. Schaffer, and C. A. M. Seidel, “An experimental comparison of the maximum likelihood estimation and nonlinear least-squares fluorescence lifetime analysis of single molecules,” Anal. Chem. 73, 2078–2086 (2001) [CrossRef] [PubMed] .
13. K. A. Walther, B. Papke, M. B. Sinn, K. Michel, and A. Kinkhabwala, “Precise measurement of protein interacting fractions with fluorescence lifetime imaging microscopy,” Mol. BioSyst. 7, 322–336 (2011) [CrossRef] [PubMed] .
14. J. Tellinghuisen and C. W. Wilkerson, “Bias and precision in the estimation of exponential decay parameters from sparse data,” Anal. Chem. 65, 1240–1246 (1993) [CrossRef] .
15. J. Fessler, “Mean and variance of implicitly defined biased estimators (such as penalized maximum likelihood): applications to tomography,” IEEE Trans. Image Process. 5, 493 –506 (1996) [CrossRef] [PubMed] .
16. J. Kim and J. Fessler, “Intensity-based image registration using robust correlation coefficients,” IEEE Trans. Med. Imag. 23, 1430 –1444 (2004) [CrossRef] .
17. H. Van Trees, Detection, Estimation, and Modulation Theory, Part 1 (John Wiley & Sons, 2001).
18. T. G., “A matrix extension of the cauchy-schwarz inequality,” Econ. Lett. 63, 1–3 (1999) [CrossRef] .
19. P. Hall and B. Selinger, “Better estimates of exponential decay parameters,” J. Phys. Chem. 85, 2941–2946 (1981) [CrossRef] .
20. K. J. Mighell, “Parameter estimation in astronomy with poisson-distributed data. I. the statistic,” The Astrophys. J. 518, 380 (1999) [CrossRef] .
21. J. Moré and D. Sorensen, “Computing a trust region step,” SIAM J. Sci. Comput. 4, 553–572 (1983) [CrossRef] .
22. M. A. Branch, T. F. Coleman, and Y. Li, “A subspace, interior, and conjugate gradient method for large-scale bound-constrained minimization problems,” SIAM J. Sci. Comput. 21, 1–23 (1999) [CrossRef] .
23. R. Byrd, R. Schnabel, and G. Shultz, “Approximate solution of the trust region problem by minimization over two-dimensional subspaces,” Math. Program. 40, 247–263 (1988) [CrossRef] .