We present a new multiple-tau correlation algorithm which is the fastest to date. The resulting curve is identical to that obtained with the conventional multiple-tau algorithm, but the calculation time is much shorter. It combines two approaches. For short values of the lag-time a very simple correlation histogram is used, while for higher lag-time values the traditional multiple-tau bin-and-multiply approach is used. The lag-time limit between these two stages depends on the count rate. The computation time scales linearly with the count rate and is as fast as 0.1µs/photon.
© 2012 OSA
Photon correlation spectroscopy is used in two standard techniques, which are applied in a large number of laboratories all over the world: Dynamic Light Scattering (DLS) and Fluorescence Correlation Spectroscopy (FCS). DLS analyzes light intensity fluctuations resulting from scattering by small particles in suspension or polymers in solution. It is useful in determining the size distribution of the particles.
FCS analyzes the fluorescence fluctuations resulting from diffusion, flow, or other processes affecting the brightness of individual fluorescent molecules. It is useful in determining concentration, mobility, and brightness of molecules as well as molecular interactions.
Photon correlation can be performed using a hardware correlator. This kind of correlator can measure the whole photon record correlation curve efficiently and in real time. It usually adopts the so-called multiple-tau architecture, which has the advantage of measuring correlation times spanning a lag-time range of many decades, while maintaining a reasonably limited number of channels. But it is rather expensive, and lacks flexibility, especially if one wants to analyze only a part of the record.
For instance in FCS, when working on living cells, the fluorescence fluctuation is not always stationary, and it would be very convenient to be able to visualize in real time the current autocorrelation curve over a time window of a few seconds. Similarly, for offline analysis one may want to compute a time-resolved autocorrelation by correlating the photon record on a sliding window, or simply to exclude from the calculation artifacts resulting from the passage of big aggregates in the focus region. In this respect software correlators are superior. An additional advantage of having access to the entire time sequence of photon arrival times is the possibility to analyze it using other complementary techniques such as Photon Counting Histogram / Fluorescence Intensity Distribution Analysis (PCH or FIDA) [5,6], Moment Analysis [7,8], or Photon Arrival-Time Interval Distribution (PAID).
Because photon records can contain a huge number of photons, correlating takes some time. For online calculation the requirement is that the calculation time be smaller than the record time, which can be achieved at low to moderate count rates. At higher count rates, software correlators fail to compute in real time. For offline analysis a reduced computation time is always welcome. In addition, in the case of multichannel acquisitions such as dual-focus FCS, dual-color FCS, or parallelized FCS, the number of FCS curves to be computed increases the computation burden, and a fast algorithm becomes even more important.
Several algorithms have been proposed:
In 2001, Magatti et al. proposed the first multiple-tau software correlator, based on a bin-and-multiply (B&M) approach operating in the same way as hardware correlators. The multiple-tau correlator is an improved version of Frenkel's correlator that allows independent control of the averaging time and the lag time. It has been reviewed and analyzed by Ramirez et al. This method induces a systematic error due to triangular averaging. Magatti et al. showed that the absolute systematic error depends on the lag-time to bin-width ratio, and that this error is lower than 10^-3 for a ratio of 7 or higher.
In 2003, they proposed the first algorithm analyzing photon arrival times, which they call Photon-Mode (PM), interpreting the correlation function in terms of a joint probability distribution. Briefly, choosing lag-times following an approximately geometric progression, their algorithm determines to which lag-time each photon pair contributes. They show that PM is most efficient for short values of the lag-time. For larger values of the lag-time, they used the traditional B&M multiple-tau scheme. The limit between the PM scheme and the B&M scheme was fixed at 100µs. Using a 1.5GHz Pentium 4 PC, they succeeded in processing data in real time up to 33kHz.
Also in 2003, Wahl et al. proposed the “time-lag-to-correlation algorithm”. Very briefly, for each value of the lag-time τ, two photon arrival time series shifted by τ are compared, and the number of coincident events is counted. At high count rate, the computation time is the same as for the B&M multiple-tau scheme, while it is faster at lower count rate.
In 2006, Laurence et al. proposed an algorithm with arbitrary lag-times and bin-widths. Their algorithm converts the correlation products in terms of numbers of photon pairs, similarly to Magatti's approach. Its advantage is its flexibility. Its speed is comparable to that of the conventional B&M multiple-tau algorithm.
In 2009, Yang et al. proposed a new algorithm working on photon arrival time data, resulting in an effective linear correlator with lag-times spaced by the time resolution of the acquisition card. It is very simple and makes use of a look-up table indicating which elements of the autocorrelation function (ACF) need to be updated after the arrival of each new photon. In this paper, we name this scheme the Simple Correlation Histogram (SCH) algorithm. Incidentally, its computation time is independent of the time resolution.
Our approach resembles that of Magatti et al. [13,16] in that, for higher values of the lag-time, we use the B&M scheme, while for lower values of the lag-time we use a slightly improved version of SCH. The combination of these two schemes, where the lag-time limit between them is optimized, results in what we call the Fast Two-stage Correlation algorithm (F2Cor). As we will see, its performance in terms of computation speed is unsurpassed at all count rates, while it provides curves similar to those of the traditional B&M multiple-tau algorithm, in terms of both smoothing and signal-to-noise ratio.
Usually, in order to save calculation time, the autocorrelation is not processed for all possible values of the lag-time. Indeed, for most FCS applications, the time resolution of the photon record only needs to be small compared to that of the ACF lag-times. In the case of multiple-tau hardware correlators the lag-time scale follows an approximately geometric progression, and our algorithm also uses this lag-time scale.
More precisely, the time scale is piecewise linear. A specific lag-time of this series is determined by two parameters, the level k and the level lag-time i, such that τ = i·2^k, with 8 ≤ i ≤ 15 and k ≥ 0.
We denote by K the B&M algorithm start level, that is, B&M is performed for the levels k ≥ K.
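As an illustration, the quasi-geometric lag-time grid described above can be generated as follows (a sketch in C++; the function name and the level lag-time range 8–15 are our assumptions, patterned on standard multiple-tau correlators):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Generate the quasi-geometric multiple-tau lag-time grid: at level k the
// bin-width is 2^k and the level lag-times i run over [imin, imax].
// Lag-times are expressed in time-resolution units.
std::vector<std::uint64_t> multiTauLags(int levels, int imin = 8, int imax = 15) {
    std::vector<std::uint64_t> taus;
    for (int k = 0; k < levels; ++k)
        for (int i = imin; i <= imax; ++i)
            taus.push_back(static_cast<std::uint64_t>(i) << k);  // i * 2^k
    return taus;
}
```

With two levels, this yields the lags 8..15 at unit spacing followed by 16, 18, ..., 30 at spacing 2, so the spacing doubles at each level while the number of channels per level stays fixed.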
In the B&M scheme, at level k and level lag-time l, we have

G_k(l) = Σ_j n_k(j) n_k(j + l),

where n_k(j) is the photon count in bin j of width 2^k, the sum runs over the ~L/2^k bins of the record, and L is the record duration in time resolution units.
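The B&M estimator at a single level can be sketched as follows (illustrative C++ with hypothetical names; binning and product accumulation only, without the progressive coarsening used in a full multiple-tau implementation):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One B&M level: sum the unit-resolution counts into bins of width 2^k,
// then accumulate the unnormalized correlation G_k(l) = sum_j n(j)*n(j+l)
// for level lag-times l in [lmin, lmax].
std::vector<std::uint64_t> bmLevel(const std::vector<std::uint32_t>& counts,
                                   int k, int lmin, int lmax) {
    const std::size_t w = std::size_t(1) << k;  // bin width 2^k
    std::vector<std::uint64_t> bins;
    for (std::size_t i = 0; i + w <= counts.size(); i += w) {
        std::uint64_t s = 0;
        for (std::size_t j = 0; j < w; ++j) s += counts[i + j];
        bins.push_back(s);
    }
    std::vector<std::uint64_t> G(lmax - lmin + 1, 0);
    for (int l = lmin; l <= lmax; ++l)
        for (std::size_t j = 0; j + l < bins.size(); ++j)
            G[l - lmin] += bins[j] * bins[j + l];  // one product per bin pair
    return G;
}
```

For a record of 16 unit counts at level k = 1, the eight bins each hold 2 photons, so G_1(1) accumulates seven products of 4.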
Each individual product n_k(j) n_k(j + l) is a Bernoulli random variable, the value of which can be 0 or 1. As the bin-width is small compared to the lag-time, and in order to estimate approximately the noise-to-signal ratio of the ACF, we assume all these variables to have the same mean value µ. Their variance is µ(1 − µ). For L large, the number of individual products of the form n_k(j) n_k(j + l) is approximately N = L/2^k. As a result, G_k(l)/N is an estimation of the true mean autocorrelation, affected by a triangular averaging, as was already shown in the continuous case. G_k(l) is thus the sum of N independent random variables: its expected value is Nµ and its variance is Nµ(1 − µ). Its coefficient of variation (noise-to-signal ratio) is then

CV = √(Nµ(1 − µ)) / (Nµ) = √((1 − µ)/(Nµ)).   (1)
For lower values of the lag-time, SCH is used. In this case, we have a linear correlator with lag-times comprised between 8 and the limit lag-time τ_max set by the B&M start level K. At each new photon, arriving at time t_n, all elements G(t_n − t_q) with t_n − t_q ≤ τ_max are increased by one, where G is the unnormalized autocorrelation array and t_q runs over the arrival times of the previous photons. Figure 1 represents the SCH algorithm: at each iteration, the index p of the oldest photon still within the lag-time window is updated, and G(t_n − t_q) is increased by one for each photon q from p to n − 1.
The idea is similar to that of Yang et al., except that in our case we don't need a look-up table, so that the number of iterations is reduced by half, which increases the execution speed.
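A minimal sketch of the SCH stage in C++ (the names are ours; a real implementation would work in place on the photon stream rather than on a stored vector): each new photon increments one histogram element per earlier photon lying within the lag-time window.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// SCH stage: for each new photon arrival time t, every earlier photon t_q
// with t - t_q <= tauMax increments G[t - t_q]. G is the unnormalized
// linear-scale autocorrelation array; no product is ever computed.
std::vector<std::uint64_t> schCorrelate(const std::vector<std::uint64_t>& arrivals,
                                        std::uint64_t tauMax) {
    std::vector<std::uint64_t> G(tauMax + 1, 0);
    std::deque<std::uint64_t> window;  // previous photons within the window
    for (std::uint64_t t : arrivals) {
        while (!window.empty() && t - window.front() > tauMax)
            window.pop_front();        // drop photons outside the lag window
        for (std::uint64_t tq : window)
            ++G[t - tq];               // one increment per photon pair
        window.push_back(t);
    }
    return G;
}
```

For arrivals {0, 2, 3, 5} and tauMax = 3, the pairs with lags 1, 2, 2, 3, 3 are counted while the pair (0, 5) falls outside the window.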
Assuming the photon stream to follow a Poisson distribution, the loop is iterated on average λτ_max times for each new photon, where λ is the average number of photons per time unit. There are λL photons in the record, so that the total number of iterations is λ²Lτ_max. The computation time of the SCH stage is then

T_SCH = A λ² L 2^K,   (2)

where the constant A absorbs both the proportionality τ_max ∝ 2^K and the execution time of one iteration.
The B&M stage manages the levels k ≥ K. The bin-width at level k is 2^k, so that level k is updated every 2^k time units. The average number of products realized by the B&M stage is then proportional to Σ_{k≥K} L/2^k ≈ 2L/2^K. The computation time of the B&M stage is then:

T_B&M = B L 2^(−K).   (3)
The total computation time is then

T = T_SCH + T_B&M = L (A λ² 2^K + B 2^(−K)).   (4)

It is minimum for K = K_opt, that is,

2^(K_opt) = (1/λ) √(B/A).   (5)
For the optimum value of K, we finally deduce the F2Cor computation time:

T_min = 2 L λ √(A B).   (6)
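Under this two-term cost model, the optimum start level can be found by a direct scan over the allowed levels (an illustrative sketch; A and B stand for the fitted constants of Section 3, λ is the mean count per time-resolution unit, and the cost expression is our reconstruction of Eqs. (4)–(5)):

```cpp
#include <cassert>
#include <cmath>

// Pick the B&M start level K in [0, Kmax] minimizing the normalized cost
// T(K)/L = A*mu^2*2^K + B*2^(-K). A brute-force scan over the few allowed
// levels is cheap and sidesteps rounding of the closed-form optimum.
int optimalStartLevel(double A, double B, double mu, int Kmax) {
    int best = 0;
    double bestCost = 1e300;
    for (int K = 0; K <= Kmax; ++K) {
        double cost = A * mu * mu * std::ldexp(1.0, K)   // SCH term, grows as 2^K
                    + B * std::ldexp(1.0, -K);           // B&M term, shrinks as 2^-K
        if (cost < bestCost) { bestCost = cost; best = K; }
    }
    return best;
}
```

With the fitted values A = 0.267 and B = 0.760, the optimum level decreases as the count rate increases, and saturates at the maximum allowed level for very low count rates, consistent with the behavior described in Section 3.1.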
In particular, we see that T_min is proportional to the count rate or, equivalently, to the total number of photons.
The SCH stage provides an ACF with linearly spaced lag-times, while the B&M stage provides a multiple-tau ACF. Incidentally, the limit between these two parts of the curve depends on the count rate, as shown in Eq. (5). What we finally want is a multiple-tau ACF, so the SCH result should be converted into a multiple-tau ACF. This way, the fact that the whole ACF is processed in two parts, and where the limit between these two parts lies, will be transparent to the user. In this respect, the linear tau-scale SCH result needs to be convoluted so as to compute the values corresponding to the B&M approach. We can estimate the level-k value G(k,l) from G0, the linear-scaled autocorrelation provided by SCH, using a triangular averaging:

G(k,l) = Σ_{j=−(2^k−1)}^{(2^k−1)} [(2^k − |j|)/2^(2k)] G0(l·2^k + j).
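The conversion can be sketched as follows (illustrative C++; G0 is the linear-scale SCH histogram and the triangular weights (2^k − |j|)/2^(2k), which sum to one, are our reconstruction of the averaging above):

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// Estimate the multiple-tau value G(k, l) from the linear-scale SCH
// histogram G0 by triangular averaging over lag-times around l * 2^k.
double triangularAverage(const std::vector<double>& G0, int k, long l) {
    const long m = 1L << k;     // bin-width 2^k at level k
    const long center = l * m;  // central linear lag-time
    double g = 0.0;
    for (long j = -(m - 1); j <= m - 1; ++j) {
        const long idx = center + j;
        if (idx < 0 || idx >= static_cast<long>(G0.size())) continue;
        g += static_cast<double>(m - std::labs(j)) / (double(m) * m) * G0[idx];
    }
    return g;
}
```

Since the weights are normalized, a flat histogram is mapped onto the same flat value, so the conversion introduces no distortion of the ACF baseline.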
We will now calculate the noise-to-signal ratio of G(k,l). For large N, each channel G0(j), with j close to l·2^k, approximately follows a normal distribution, and we take all these channels to have the same mean value and variance. Since the squared triangular weights sum to approximately 2/(3·2^k), the mean value of G(k,l) is that of a single linear channel, while its variance is reduced by a factor (3/2)·2^k. We deduce that the noise-to-signal ratio of G(k,l) is smaller than that of a single linear channel by a factor √((3/2)·2^k).
By comparison with Eq. (1), we see that SCH slightly improves the signal-to-noise ratio. While the improvement is modest, it is welcome, as SCH applies to the lower part of the FCS curve, where the points are the noisiest.
The purpose of the triangular averaging of the SCH raw data was to smooth the real autocorrelation curve the same way the B&M stage does. We will now see that a square filter might also be a good choice.
As long as the multiple-tau sampling is valid, the ACF curve is almost linear between two consecutive values. Then, if we estimate the autocorrelation at the multiple-tau sampling times by averaging the linear-scale autocorrelation values with a wide square filter, the resulting curve will not be distorted. At the same time, the square is the filter shape that maximizes the signal-to-noise ratio of the averaged value. The points at the border of the square filter are weighted by one half, so that they contribute half to the estimate of one multiple-tau channel and half to that of the adjacent channel. The convolution results in a noise-to-signal ratio improved by a factor √(2^k) with respect to a single linear channel. Thus the square filter average provides the same noise-to-signal ratio as the conventional B&M scheme, but the support of the square filter is narrower than that of the triangle filter. From a practical point of view, triangle and square filters provide nearly the same result, and we only used a triangle filter in this paper.
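The comparison between the two filters can be checked numerically (a small illustrative computation under the same independence assumption; the helper names are ours). For a weighted average of independent channels, the variance scales as the sum of squared weights, so 1/Σw² acts as an effective number of averaged channels: a square filter of width 2^k reaches 2^k, while a triangle filter of half-width 2^k reaches about (3/2)·2^k.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// Effective number of independent channels averaged by a normalized filter:
// the variance of sum(w_i * x_i) with unit-variance independent x_i is
// sum(w_i^2), hence an effective channel count of 1 / sum(w_i^2).
double effectiveChannels(const std::vector<double>& w) {
    double s2 = 0.0;
    for (double x : w) s2 += x * x;
    return 1.0 / s2;
}

// Normalized triangle filter of half-width m: weights (m - |j|) / m^2
// for j = -(m-1), ..., m-1.
std::vector<double> triangleFilter(int m) {
    std::vector<double> w;
    for (int j = -(m - 1); j <= m - 1; ++j)
        w.push_back(double(m - std::abs(j)) / (double(m) * m));
    return w;
}
```

For m = 64, the square filter averages 64 effective channels while the triangle filter averages close to 96, i.e. about 3/2 as many, which is the modest √(3/2) noise improvement discussed above.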
In brief, this analysis shows that our algorithm is expected to produce the same ACF as the traditional B&M algorithm, only much faster.
Our F2Cor software was written in C++ (Microsoft Visual Studio). It manipulates photon-mode data, that is, the individual photon arrival times recorded since the start of the experiment. We tested it both on simulated data and on real photon records.
3.1 Results on simulated data
In order to systematically test our algorithm, we simulated white Poisson noise data sets with average intensities from 10^-4 to 1 photon per time resolution (1kHz to 10MHz for a 100ns resolution system). For count rates lower than 10^-3 photon per resolution time, we approximated the Poisson distribution by a Bernoulli distribution. Computation was performed on a modern PC laptop equipped with an Intel i7 Q720 CPU running at 1.60GHz. Figure 2 represents the computation time normalized to the record time for a 100ns resolution system as a function of the B&M start level K. The points, which correspond to the experimental result, have been fitted globally (MATLAB, MathWorks) according to Eq. (4). The result of the fit is represented by the solid curves. All four curves have been fitted with the same pair of values A = 0.267 and B = 0.760. The fit is in good agreement with the experimental data, which validates the theoretical computation-time model. At a given average count rate, each curve presents a minimum which corresponds to the optimum B&M start level Kopt, allowing the fastest calculation time. According to Eq. (6), the F2Cor computation speed is then as fast as 0.09µs/photon. For K << Kopt and K >> Kopt the curve is rectilinear, with slopes of −1 and +1, respectively.
The minima are equidistant along the dashed line, which has a slope of −1, in line with Eq. (5). Then, knowing the count rate, the computer can easily choose the optimum B&M start level prior to calculating any correlation. For a 10MHz average count rate, which is much higher than usual count rates in FCS experiments, the computation-time to record-time ratio does not exceed 1, which means that our dual-stage algorithm can process correlation curves in real time, even at such high count rates. The conventional B&M algorithm is equivalent to setting K = 0. At 10MHz count rates, the optimum B&M start level is higher than zero, showing that F2Cor is still significantly faster than the traditional B&M algorithm. Because of the data format in our software, the maximum value of K is 13. At very low count rate, the optimum B&M start level would normally be higher than 13, as seen on the curve corresponding to 10^-4 photon/resolution time (1kHz for a 100ns resolution system), the minimum of which cannot be reached. In practice, this is not a limiting factor: first, usual count rates in FCS are higher than 1kHz; second, even if K is not optimal, the calculation time is very small compared to the acquisition time (e.g. the computation time is 6800 times shorter than the acquisition time for a 1kHz average count rate signal).
In Fig. 3, we have plotted the computation time normalized to the record time for a 100ns time resolution system in three cases. The conventional B&M case is represented by the open circles on the figure. As expected from Eq. (3), the computation time barely depends on the count rate. Still, at the very highest count rate we notice a slight increase of the computation time. In our software, products are computed on 64-bit variables, and we suspect that products are faster when the actual operands do not exceed 32-bit words. The open triangles represent the computation time for the maximum B&M start level, namely 13. For data-format reasons, this is the maximum value allowed by our software. In this case, except for the very lowest count rates, most of the computing time is spent in the SCH stage. As expected from Eq. (2), the computing time is proportional to the squared count rate, as the slope of two shows on the figure. In the 100kHz – 300kHz count rate range, conventional B&M and SCH have comparable performance. Open squares represent the performance we obtained with F2Cor. As expected, it is faster than both the conventional B&M and SCH algorithms. In the limit of extremely low count rates it tends to behave like SCH, and at very high count rates it tends to behave like the B&M algorithm. The slope of the curve is one, that is, the computation time is proportional to the count rate, as expected theoretically from Eq. (6). The bold solid curve with black diamonds represents the ratio between the faster of the B&M and SCH algorithms and F2Cor, that is, the speedup provided by our algorithm compared to both B&M and SCH. As expected, the speedup is highest in the count rate range where B&M and SCH are comparable in performance. The speedup is higher than 20 in the 100kHz – 300kHz range, and higher than two in the 10kHz – 3MHz range.
As said before, because of the maximum K value allowed by our software, the computation time measured in the low count rate regime is not fully optimized, so that the speedup we measure in the kHz range is underestimated. In any case, our F2Cor is faster at all frequencies, with a maximum speedup of 22 around 200kHz.
3.2 Results on real photon records
The experimental data were acquired on PIXEL (facility of GIS EUROPIA, University of Rennes 1, France). The measurements were performed on a Leica DMIRE2 SP2 confocal microscope coupled to an IR femtosecond laser source (Maitai, Spectra-Physics), the wavelength of which was tuned to 850nm. The laser power was controlled using a half-wave plate mounted in a motorized rotation stage (PR50, Newport Micro-Controle) and a Glan laser polarizing cube (10GL08, Newport Micro-Controle). Our microscope objective is a 1.2 NA water-immersion ×60 objective (UPLSAPO60XW/1.20/WD:0.28mm, Olympus). The fluorescence was detected in the descanned mode, using the so-called “X1 output” of the Leica SP2 scanning head. The residual IR light was blocked using a short-pass filter (FF01-680/SP, Semrock). As a photon counting module, we used a SPAD (SPCM-AQR-12-FC, Perkin Elmer). The acquisition card was a time-correlated single photon counting device (SPC150, Becker & Hickl). Data are acquired in the FIFO time-tag mode, that is, each photon is recorded individually with its time from the start of the experiment.
We used a 5-Carboxytetramethylrhodamine solution diluted at 100nM in water (reference dye sampler kit R-14782, Molecular Probes Invitrogen).
Figure 4 represents three normalized autocorrelation curves obtained for K = 0, 9, and 13, and for an average laser power of 13%. The record duration was 30s. The curves superimpose almost perfectly. We check this way that the result provided by the two-stage algorithm does not depend on K, except for a very slight noise difference, as expected from the theory.
The same measurement was repeated at three excitation powers: 3, 13 and 72%. The result is shown in Fig. 5. The increased power results in an improved signal-to-noise ratio. But at the same time photobleaching occurs, which can be seen from the fact that the correlation time decreases for increased values of the excitation power. For low values of the lag-time, the rapid decrease of the autocorrelation is an artifact due to the after-pulse effect of the photoreceiver. The lower the brightness of the molecules, the more visible the after-pulse effect, which is why it is more apparent at lower excitation power.
Table 1 shows the B&M start levels and the computation durations as a function of the excitation power. As expected, Kopt decreases with the count rate, while the computation time per photon does not depend on the count rate and is about 0.1µs/photon.
4. Discussion and prospects
By construction, our algorithm is faster than both SCH and B&M. Wahl et al. have proposed a different algorithm, which is as fast as the B&M algorithm at high count rate and faster at lower count rate. We will now briefly discuss why, in our view, F2Cor is much faster.
Figure 6 represents a comparison of the computation time between the Wahl et al. algorithm and F2Cor. For F2Cor, the computation time scales linearly with the count rate and is about 0.1 µs/photon. By comparison, in the time-tag-to-correlation algorithm of Wahl et al., the computing time dependency on the count rate is also linear and approximately equals 9.7 µs/photon. Of course the computation time depends on the computer and on technical programming details. Wahl et al. published their algorithm several years ago, and the computer they used (Athlon @ 1.5GHz) is much slower than ours. Still, there is a computation speed advantage of almost two orders of magnitude in favor of F2Cor. It demonstrates a clear improvement of our dual-stage algorithm in terms of computation time.
The speed difference between these two algorithms lies in the lower stage. Indeed, in contrast to the Wahl et al. approach, the SCH stage does not require any product at all, nor any comparison: each arriving photon simply leads to an increase by one of G(t_n − t_q), with t_n − t_q ≤ τ_max.
Several FCS systems based on software correlators are available on the market (Confocor3 from Zeiss, TCSPC cards from Becker & Hickl and PicoQuant, M9003 counting board and C9413 FCS Unit from Hamamatsu, Alba FCS system from ISS), which means that many labs are already equipped with such devices. All of these systems could potentially benefit from our very fast algorithm. Having such a fast algorithm would certainly facilitate FCS acquisition and analysis.
The performance of our algorithm is most impressive at lower count rates. Still, even at high count rates it could be advantageous: in FCS applications, as long as the number of particles in the excitation volume is much higher than one, the signal-to-noise ratio of the ACF does not depend on the concentration. What determines the upper limit of the concentration is the maximum count rate of the photoreceiver. While current single photon detectors rarely exceed a few MHz, our algorithm could potentially display FCS curves in real time for count rates as high as 10MHz, and one may imagine that detectors with higher maximum count rates could become available in the future. Then F2Cor would permit real-time measurements at higher concentration.
The possibility to process the data in real time at any practical count rate would be very interesting for biological applications. Living systems, such as living cells, are highly dynamic and evolving. The time the experimentalist can spend performing measurements on a sample is limited. The possibility to display a temporal plot, not only of the count rate, but also of very informative parameters such as the molecule number and the diffusion time, would be very helpful for deciding how to continue the measurements. It would also help detect anomalies such as complexes, dust, or mechanical perturbations, and help align the setup.
For offline analysis, multiple analyses often need to be performed on the raw data. For instance, for measuring the evolution of concentration, the experimentalist has to adjust the part of the record to analyze and the width of the temporal sliding window. Besides, not all of the raw data necessarily need to be re-analyzed, and the next record to be analyzed depends on the result of the previous one. In this sense, the rapid response provided by our F2Cor algorithm would be very useful. Of course, increasing the calculation speed of a conventional algorithm is always possible by increasing the power of the computer. But F2Cor has the advantage that it works fine on any standard PC, making very fast ACF calculation accessible to anybody. The software will simply have to be adapted to the various data formats provided by the existing acquisition hardware. The code can be obtained by contacting the author.
This work could continue in several directions:
- While our current algorithm is very fast, it could be even faster. In our current implementation, although our computer has multiple cores, only one of them is used. In the future, we plan to parallelize our algorithm. The fact that it is dual-stage makes it particularly suitable for parallelization: according to Eq. (4), at the optimum K the computation times of the SCH and B&M stages are exactly the same, so one core could process the SCH stage while a second one handles the B&M stage.
- At present, our software correlator can only process autocorrelations, but all the principles exposed in this paper would apply to cross-correlation.
- The tau sampling we have used follows a quasi-geometric progression. In some cases, this could result in aliasing, as in the case of circular correlation [21,22] or dual-focus FCS. A future work could be to make our current algorithm more flexible in terms of tau sampling, while maintaining its speed.
- Another development would be the adaptation of F2Cor to Fluorescence Lifetime Correlation Spectroscopy.
We have presented the fastest multiple-tau photon correlation algorithm to date. At 100kHz, the computation time is 100 times shorter than the acquisition time. F2Cor will be helpful both for real-time photon correlation calculation and for offline analysis. While this first version is limited to autocorrelation, our dual-stage fast correlation algorithm could be further improved and extended to other modalities such as cross-correlation or fluorescence lifetime correlation spectroscopy.
References and links
1. B. J. Berne and R. Pecora, Dynamic Light Scattering (Wiley, 1976).
2. P. Schwille and J. Ries, “Principles and applications of fluorescence correlation spectroscopy (FCS),” in Biophotonics: Spectroscopy, Imaging, Sensing, and Manipulation (Springer, 2011), pp. 63–85.
3. K. Schätzel, “Correlation techniques in dynamic light scattering,” Appl. Phys. B Photophys. Laser Chem. 42(4), 193–213 (1987). [CrossRef]
4. J. S. Eid, J. D. Muller, and E. Gratton, “Data acquisition card for fluctuation correlation spectroscopy allowing full access to the detected photon sequence,” Rev. Sci. Instrum. 71(2), 361–368 (2000). [CrossRef]
6. P. Kask, K. Palo, D. Ullmann, and K. Gall, “Fluorescence-intensity distribution analysis and its application in biomolecular detection technology,” Proc. Natl. Acad. Sci. U.S.A. 96(24), 13756–13761 (1999). [CrossRef]
9. T. A. Laurence, A. N. Kapanidis, X. X. Kong, D. S. Chemla, and S. Weiss, “Photon arrival-time interval distribution (PAID): A novel tool for analyzing molecular interactions,” J. Phys. Chem. B 108(9), 3051–3067 (2004). [CrossRef]
10. T. Dertinger, V. Pacheco, I. von der Hocht, R. Hartmann, I. Gregor, and J. Enderlein, “Two-focus fluorescence correlation spectroscopy: a new tool for accurate and absolute diffusion measurements,” ChemPhysChem 8(3), 433–443 (2007). [CrossRef]
11. P. Schwille, F. J. Meyer-Almes, and R. Rigler, “Dual-color fluorescence cross-correlation spectroscopy for multicomponent diffusional analysis in solution,” Biophys. J. 72(4), 1878–1886 (1997). [CrossRef]
12. R. A. Colyer, G. Scalia, I. Rech, A. Gulinatti, M. Ghioni, S. Cova, S. Weiss, and X. Michalet, “High-throughput FCS using an LCOS spatial light modulator and an 8 × 1 SPAD array,” Biomed. Opt. Express 1(5), 1408–1431 (2010). [CrossRef]
14. D. Frenkel and B. Smit, Understanding Molecular Simulation: From Algorithms to Applications (Academic, 2002), p. 90.
15. J. Ramírez, S. K. Sukumaran, B. Vorselaars, and A. E. Likhtman, “Efficient on the fly calculation of time correlation functions in computer simulations,” J. Chem. Phys. 133(15), 154103 (2010). [CrossRef]
16. D. Magatti and F. Ferri, “25 ns software correlator for photon and fluorescence correlation spectroscopy,” Rev. Sci. Instrum. 74(2), 1135–1144 (2003). [CrossRef]
17. M. Wahl, I. Gregor, M. Patting, and J. Enderlein, “Fast calculation of fluorescence correlation data with asynchronous time-correlated single-photon counting,” Opt. Express 11(26), 3583–3591 (2003). [CrossRef]
19. L. L. Yang, H. Y. Lee, M. K. Wang, X. Y. Lin, K. H. Hsu, Y. R. Chang, W. Fann, and J. D. White, “Real-time data acquisition incorporating high-speed software correlator for single-molecule spectroscopy,” J. Microsc. 234(3), 302–310 (2009). [CrossRef]
20. ALV-5000 Multiple Tau Digital Correlator Reference Manual (ALV GmbH, 1993).