Amplitude modulated continuous wave time-of-flight range cameras suffer from an inherent depth measurement error due to aliasing of the emitted signal vs reference signal correlation function. This is due to higher harmonics present in both signals which are not accounted for in the model or measurements. This “wiggling” error is generally corrected by employing a correction function based on frequency and depth dependent calibration data. This problem is shown to be equivalent to a multi-path interference problem. Casting the problem into the multi-path interference domain and utilizing multiple modulation frequencies provides tools for dealing with the depth error without calibration in a frequency independent way.
© 2015 Optical Society of America
1. Introduction
Amplitude modulated continuous wave (AMCW) time-of-flight (ToF) cameras measure optical travel time, or depth, in a very similar way to optical interferometry .
A reference sinusoidal signal (a square wave in practice) is used to modulate the amplitude of an LED or diffused laser diode at frequencies ranging from 1 MHz to above 100 MHz (wavelengths ranging from a few hundred meters to under 3 meters). The light is reflected off objects in the scene and arrives back at the camera. Due to the distance the light has traveled, the received signal is time delayed with respect to the reference signal (see Fig. 1), thus providing depth information. The time delay is measured by cross-correlating the reflected signal against the reference signal using a lock-in sensor based on the homodyne principle. Under the assumption that both the illumination source and the sensor are modulated by a sinusoidal signal, recovering the correlation function is simplified, as the correlation function is itself a phase shifted sinusoid. The amplitude and phase shift of this sinusoid can be recovered from only three measurements, although four measurements are usually employed in practice [2, 3], a technique known as the four bucket method. The four bucket method has two advantages over the three bucket approach: it is mathematically simpler, and it offers better common mode rejection and higher resilience to fixed pattern noise, a common limiting factor in many current ToF sensors.
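The frequency-to-wavelength figures quoted above follow from λ = c/f. The short sketch below checks them; the helper names are illustrative, and the unambiguous range c/(2f) is the standard round-trip relation rather than something stated in the text.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def modulation_wavelength(freq_hz):
    """Wavelength of the amplitude modulation envelope, in meters."""
    return C / freq_hz


def unambiguous_range(freq_hz):
    """Distance at which the round-trip phase reaches 2*pi and wraps."""
    return C / (2.0 * freq_hz)


for f in (1e6, 10e6, 100e6):
    print(f"{f / 1e6:6.0f} MHz: wavelength {modulation_wavelength(f):8.2f} m, "
          f"unambiguous range {unambiguous_range(f):7.2f} m")
```

At 1 MHz the modulation wavelength is just under 300 m, and at 100 MHz it is just under 3 m, in line with the range given above.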
Due to implementation in digital logic, square waves are used instead of sinusoidal signals for both the reference signal and the emitted amplitude modulated illumination. However, square waves contain higher order harmonics, which give rise to higher order harmonics in the correlation signal. If the number of samples taken per period of the correlation signal is less than twice the order of the highest harmonic present, aliasing will occur. In the case of the four bucket method, the higher order harmonics alias onto the first harmonic, which causes measurement errors [2,4]. This is a non-systematic error known as wiggling error. This error can be seen in Fig. 2 as well as .
The result is that depth is no longer a linear function of phase. Fortunately, the function is still monotonic, so a correction term can be employed. Unfortunately, the exact emitted waveform as well as the sensor modulation function depend on factors such as transmission line impedance and the capacitance of the laser diode and pixel. These factors are not only difficult to make identical between cameras, but they also vary between pixels on the same sensor, as can be seen in Fig. 2 in . Beyond that, they also change over time and with hardware temperature. The current approach is to perform a lookup table (LUT) based correction by calibrating each device for the expected working frequency. This process requires complex hardware and time . Although some works allow this calibration to be done by the end user, such as , it is still a difficult process for the end user to perform, is required for every working frequency, and generally cannot be performed in place, as a specific target is required (straight walls in this case).
Another thing to note is that, because it modifies the depth information based on a specific model, the LUT approach also inhibits other processing that uses multi-frequency data, such as multi-path interference removal and phase unwrapping, both of which address major measurement errors in ToF range cameras.
Instead of calibrating the camera, Payne et al. [8] modified its operation to phase shift the reference signal during the integration period. This modification cancels the third harmonic, but it decreases the precision of the measurement.
A different hardware based approach was taken by Microsoft with the Kinect for the Xbox One, which utilizes a three bucket method rather than a four bucket one [9, 10]. This approach aliases the third harmonic down to DC instead of onto the main correlation signal, and thus only sees the effects of the fifth harmonic and up, reducing distortion by a factor of about 3.5 (see Fig. 2). However, as the Kinect already uses three frequencies, our approach can be used to reduce the measurement error even further, which is more viable than calibration in this case.
In this paper we take an alternative approach. We show that the distortion due to the higher harmonics shows up as multi-path interference “ghost” reflections. We then show that multi-path handling techniques can be used to mitigate the need for camera calibration, and we provide an analysis of the expected error. For our experiments we utilize a previously developed closed form solution for multi-path interference using three frequencies [11] that can be implemented efficiently in hardware. The resulting residual model error is a function of the ratio of bandwidth to base frequency used. It is important to note, though, that our approach is not linked to a specific multi-path processing method and can utilize any such method.
An added benefit of our approach is that it increases measurement accuracy, improves resilience to amplitude inconsistencies between different frequencies and to multi-path interference, and produces improved SNR.
The multi-path interference problem (also known as multiple returns, inter-reflections or mixed pixels) is an active field of research with AMCW ToF cameras [12–18] that deals with a major limitation of AMCW based depth sensing (see Fig. 1). When light is reflected off objects in the scene, or due to effects such as subsurface scattering, objects may be illuminated by both direct and indirect illumination. This results in multiple signals traveling different path lengths to illuminate the same pixel. These signals then interfere with each other. As the sum of two phase shifted sinusoidal signals is again a phase shifted sinusoidal signal, the end result is that the camera cannot differentiate between the multiple paths and produces a single, wrong depth measurement. This can amount to a large error in both depth and amplitude.
Multiple frequencies have recently proved very useful for reducing multi-path interference in ToF cameras, as well as for global/direct separation using optical techniques, light transport models or a combination thereof. These include such works as Dorrington et al. [14], Godbaz et al. [19], Kirmani et al. [20], Bhandari et al. [11,17], Freedman et al. [18], Gupta et al. [21] and O’Toole et al. [22].
Note that apart from the practical aspect, our work provides an important conclusion that follows from the theoretical understanding of the effects of deviating from the sinusoidal modulation model. That is, for the various multi-path techniques to work, they must use uncalibrated data. Modulation model errors then show up as additional multi-path interference but otherwise preserve the multi-path model. Using calibrated data will distort the signal and break the multi-path model assumptions.
The rest of the paper is organized as follows. Section 2 presents the ToF scene probing model based on the sinusoidal signal probing function. Section 3 analyzes the error magnitude and behavior due to the erroneous probing model. Section 4 casts the model error as a multi-path formulation. Section 5 then presents the experimental results, and finally we present our conclusions in section 6.
2. AMCW ToF camera model
An amplitude modulated continuous wave (AMCW) time-of-flight (ToF) range camera works based on the homodyne principle. It actively probes the scene using a time periodic amplitude modulated signal (see Fig. 1(a)). The camera then uses a lock-in sensor to correlate the received signal against the transmitted signal and, based on that, to measure the phase shift of the received signal. This phase shift can be translated to time, and thus to a distance measurement.
Generally, the modulated signal is assumed to be a sinusoidal signal, although other modulation functions have been proposed (see for example  and references therein). This probing function p can be described as
Using the fact that the correlation function is a cosine with a single known frequency, the ToF lock-in sensor [1, 2] computes depth by using the so called Four Bucket Principle. Four discrete measurements of the correlation function are taken at each pixel
The four bucket trick facilitates an elegant, albeit non-linear, solution to recover the distance and attenuation coefficient. Let us define δm_{k,l} = m[k] − m[l] (where we omitted the ω subscript on m for readability). The reflection coefficient Γ and the phase ϕ_ω can be estimated using
The distance and amplitude calculations with the four bucket method are computationally efficient, minimizing the latency and cost of ToF range cameras.
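As an illustration of the four bucket computation, the following minimal sketch recovers amplitude and phase from four samples. It assumes the textbook convention m[k] = B + A cos(kπ/2 + ϕ) for the sampled correlation function; the sign convention, the function names and the synthetic values are assumptions made for this example and may differ from the paper's own equations.

```python
import numpy as np


def four_bucket(m):
    """Recover (amplitude, phase, offset) from four correlation samples.

    Assumes m[k] = offset + amplitude * cos(k * pi / 2 + phase), k = 0..3.
    """
    m = np.asarray(m, dtype=float)
    d_31 = m[3] - m[1]  # equals 2 * amplitude * sin(phase)
    d_02 = m[0] - m[2]  # equals 2 * amplitude * cos(phase)
    phase = np.arctan2(d_31, d_02) % (2.0 * np.pi)
    amplitude = 0.5 * np.hypot(d_31, d_02)
    offset = m.mean()
    return amplitude, phase, offset


def phase_to_distance(phase, freq_hz, c=299_792_458.0):
    """Convert a round-trip phase shift to a one-way distance in meters."""
    return c * phase / (4.0 * np.pi * freq_hz)


# Synthetic, noise-free samples for an ideal sinusoidal correlation function.
true_amp, true_phase, true_offset = 0.4, 1.2, 1.0
k = np.arange(4)
m = true_offset + true_amp * np.cos(k * np.pi / 2 + true_phase)

amp, phase, offset = four_bucket(m)
print(amp, phase, offset, phase_to_distance(phase, 20e6))
```

Only two differences and one arctangent are needed per pixel, which is why the method is attractive for low latency hardware.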
3. Model induced depth error
As discussed in the previous section, the model used for the ToF camera is that of emitting and receiving a sinusoidal signal, and thus that the correlation function is a phase shifted sinusoidal signal as well.
The hardware uses a square wave as the modulation function for both the illumination and the sensor, and accordingly, that is what the camera tries to both emit and correlate against. The resulting correlation function is a triangular wave rather than a sinusoidal one, which in turn results in errors in the recovery of both the phase and the amplitude.
Fig. 2 shows the resulting recovery error due to the model mismatch for both the four bucket method (solid black line) and three bucket method (dashed blue line). As can be seen, even with the three bucket method that avoids the distortion due to the third harmonic, the error is still significant.
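The behavior described above can be reproduced numerically. The sketch below assumes an ideal triangular correlation function (ideal square waves with no low pass filtering) and estimates the phase from the first DFT bin of N equally spaced samples; the function names and the phase grid are choices made for this illustration only.

```python
import numpy as np


def triangle_correlation(theta):
    """Unit triangle wave with its peak at theta = 0 and period 2*pi."""
    return (2.0 / np.pi) * np.arcsin(np.cos(theta))


def bucket_phase(correlation, phi, n_buckets):
    """Phase of the first DFT bin of n_buckets samples, for each true phase in phi."""
    tau = 2.0 * np.pi * np.arange(n_buckets) / n_buckets
    samples = correlation(tau[:, None] + phi[None, :])
    bin1 = np.sum(samples * np.exp(-1j * tau)[:, None], axis=0)
    return np.angle(bin1)


def max_phase_error(n_buckets, n_points=2001):
    """Largest absolute phase error (radians) over one full period."""
    phi = np.linspace(0.0, 2.0 * np.pi, n_points)
    est = bucket_phase(triangle_correlation, phi, n_buckets)
    err = np.angle(np.exp(1j * (est - phi)))  # wrap to (-pi, pi]
    return np.max(np.abs(err))


err3, err4, err24 = (max_phase_error(n) for n in (3, 4, 24))
print(f"3 buckets: {err3:.4f} rad, 4 buckets: {err4:.4f} rad, "
      f"24 buckets: {err24:.5f} rad")
```

Under these idealized assumptions the three bucket error comes out several times smaller than the four bucket error, and dense sampling of the correlation function makes the model error nearly vanish. Real hardware attenuates the higher harmonics, so measured errors will differ.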
On the practical end, it is hoped that the hardware low passes the signal enough to attenuate the higher harmonics, although in practice the signal is somewhere in between the two extremes. To be clear, neither the sine wave nor the square wave models are correct. To make things more complex, the model changes across frequencies and even between pixels as different harmonics respond differently to frequency changes and variation of impedance across pixels. Per frequency and camera calibration is then performed to build a correction function that is applied to the depth and amplitude measurements. It is important to note that due to hardware tolerance issues for things such as the transmission line impedance and the capacitance of the laser diode and pixel which are hard to control, calibration has to be done per camera and not just per design.
To best understand the effects of the resulting error, we will look at the square wave model (which is mostly true for lower frequencies). Based on Fourier theory, we can model the square wave as a sum of sinusoidal signals. The first three components, i.e., the 1st, 3rd and 5th harmonics, suffice to present the behavior. In this case, our reference signal, similar to Eq. (1), is (where we ignore the modulation depth value λ for simplicity)
Next, we need to remember that when using the four bucket principle, we are sub-sampling the correlation function at 4 points. This aliases the higher harmonic elements back down into the first harmonic, making them indistinguishable in subsequent processing. To understand the resulting signal we need to turn to aliasing theory, as shown in Fig. 3. Sub-sampling can be represented mathematically as multiplying the time domain signal by a Dirac comb, in our case a four point one. This in turn translates to convolution in Fourier space with the Fourier domain Dirac comb. As can be seen in Fig. 3(a), this is a Dirac comb with four bin spacing. Fig. 3(b) shows the resulting aliasing. Since we are sampling at four points (four bucket principle), the output is a 4 bin discrete Fourier transform (DFT), marked by the yellow rectangle, with two positive frequencies and one negative frequency. Each bin is an infinite sum of the resulting convolution over the infinite Fourier series. While the positive fifth harmonic aliases down into the positive first harmonic and the negative fifth harmonic aliases down into the negative first harmonic, the positive third harmonic aliases down into the negative first harmonic and the negative third harmonic aliases down into the positive first harmonic. The resulting coefficients as a function of bin number are
As the negative Fourier coefficients of a real function are the complex conjugates of the positive ones, we end up with a phase inversion of the 3rd, 7th, 11th harmonics and so on (this is the same effect that causes carriage wheels in movies to appear to start spinning backwards and then forwards again).
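The sign pattern described above can be verified with a few lines of code. This sketch samples a single harmonic cos(n(τ + ϕ)) at four points per period and reads the phase of the first DFT bin; the function name and the test phase are arbitrary choices for the illustration.

```python
import numpy as np


def first_bin(harmonic, phi, n_samples=4):
    """Phase and magnitude of the first DFT bin for one sampled harmonic."""
    tau = 2.0 * np.pi * np.arange(n_samples) / n_samples
    samples = np.cos(harmonic * (tau + phi))
    bin1 = np.sum(samples * np.exp(-1j * tau))
    return np.angle(bin1), np.abs(bin1)


phi = 0.3
for n in (1, 3, 5, 7):
    angle, magnitude = first_bin(n, phi)
    print(f"harmonic {n}: aliased phase {angle:+.3f} rad, magnitude {magnitude:.3f}")
```

With ϕ = 0.3 rad, the 1st and 5th harmonics land in the first bin with phases +ϕ and +5ϕ, while the 3rd and 7th land there with phases −3ϕ and −7ϕ.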
In our case, the third harmonic shows up as a ghost object with a phase of −3ϕ, while the fifth harmonic creates a ghost at 5ϕ, and so on. The effective correlation function becomes
To better understand the interaction of the ghosts with the main reflection, we rewrite the recovered amplitude and angle (of the first harmonic) in the complex phasor notation
Fig. 2(b) and 2(a) show the resulting error in amplitude and depth measurements as a function of phase for several correlation function models. The phase and amplitude errors are 90 degrees out of phase with each other: the phase error is maximized when the aliased phases are at 90 degree offsets, at which point the amplitude error is zero, while the amplitude error is maximized when the aliased phases align, at which point the phase error is zero.
Note that error in practice will vary as the hardware low passes higher harmonics based on the frequency used.
To better understand the resulting behavior as well as allow for the application of existing tools to the problem, we shall next show that the resulting measurements coincide with multi-path interference artifacts.
4. Model error in the light of multi-path interference
One of the major drawbacks of AMCW sinusoidal signal modulation arises when facing multi-path interference. Such interference arises when illumination from the modulated light source arrives at the same pixel via multiple paths (see Fig. 1). Multi-path interference is highly scene dependent and can be caused by: inter-reflections, subsurface scattering, volumetric scattering, translucent objects and mixed pixels.
The problem lies in the fact that each propagation path returns a sinusoidal signal of the same frequency but with a different phase shift. However, the sum of several such sinusoidal signals of the same frequency with different phases, is just another sinusoidal signal with the same frequency but with a different phase shift and amplitude. In the case of multi-path, we thus measure a single erroneous depth with no ability to know whether there is a problem in our measurement.
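The collapse of several returns into one sinusoid is easiest to see in phasor form. The following sketch uses two hypothetical paths with made-up amplitudes and phases and checks that their sum is indistinguishable from a single return.

```python
import numpy as np


def combine_paths(amplitudes, phases):
    """Amplitude and phase of the single sinusoid equal to the sum of all paths."""
    phasor = np.sum(np.asarray(amplitudes) * np.exp(1j * np.asarray(phases)))
    return np.abs(phasor), np.angle(phasor)


# Hypothetical direct return (amplitude 1.0, phase 0.8 rad) and a weaker,
# longer indirect return (amplitude 0.5, phase 2.0 rad).
amp, phase = combine_paths([1.0, 0.5], [0.8, 2.0])

t = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
summed = 1.0 * np.cos(t + 0.8) + 0.5 * np.cos(t + 2.0)
single = amp * np.cos(t + phase)
print(amp, phase, np.allclose(summed, single))
```

The recovered phase lies between the two true phases, so the reported depth matches neither path, and nothing in a single-frequency measurement flags the problem.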
One of the main approaches to resolving the multi-path problem is to measure the response of the scene using multiple modulation frequencies [13–15, 17]. In the single path case, the measured amplitude Γ is independent of the modulation frequency ω, while the measured phase is linearly dependent on ω. In the multi-path case this relation breaks down due to the interference between the paths, and at least in theory, multi-path interference can be diagnosed, although not resolved, based on just two measurements. The effect, however, can be subtle, and depends on the relative amplitude of the two paths, as well as on the frequency response of the electronics.
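A minimal sketch of such a two-measurement diagnostic is given below. It assumes ideal sinusoidal probing, a flat frequency response of the electronics, and a frequency pair in which the second frequency is exactly twice the first, so that no phase unwrapping is needed; the function names, tolerance and scene values are hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s


def scene_response(freq_hz, amplitudes, distances):
    """Complex first-harmonic measurement for ideal sinusoidal probing."""
    phases = 4.0 * np.pi * freq_hz * np.asarray(distances) / C
    return np.sum(np.asarray(amplitudes) * np.exp(1j * phases))


def looks_single_path(z_low, z_high, rel_tol=1e-6):
    """Check the single-path relations for a pair with f_high = 2 * f_low."""
    same_amplitude = np.isclose(abs(z_low), abs(z_high), rtol=rel_tol)
    # Doubling the frequency doubles the phase, so the unit phasor is squared.
    predicted = (z_low / abs(z_low)) ** 2
    same_phase = np.isclose(z_high / abs(z_high), predicted, atol=rel_tol)
    return bool(same_amplitude and same_phase)


f_low, f_high = 10e6, 20e6
single = [scene_response(f, [1.0], [1.0]) for f in (f_low, f_high)]
mixed = [scene_response(f, [1.0, 0.5], [1.0, 2.5]) for f in (f_low, f_high)]
print(looks_single_path(*single), looks_single_path(*mixed))
```

In practice the tolerance has to account for noise and for the frequency dependent gain of the hardware, which is why the effect can be hard to detect on real data.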
For the case when K optical paths contribute to the same pixel, the scene response from Eq. (4) becomes
Going back to our model induced error due to deviation from the sinusoidal wave model, let us look at Eq. (19) and (20). We can see that the higher harmonics, after being aliased down to base band, have the same effect as the multi path problem. They appear as ghost objects at corresponding distances, that is, −d/3 for the 3rd harmonic, d/5 for the 5th, etc. This multi path behavior is consistent across frequencies, so each harmonic appears exactly like a multi path reflection would with clean sinusoidal wave probing.
The higher harmonics can however be resolved by probing the scene with multiple frequencies.
5. Experimental results
In theory, any multi-path processing technique will work when utilizing our approach for the correction of the model induced depth errors. For our experimental results we chose to work with the Matrix Pencil based method [23] presented by Feigin et al. [11]. This method is resilient to recovering only some of the excess paths, and in fact allows us to recover only the dominant path, i.e., the true measurement, as we are not interested in the “ghosts” in this case. Also, for the three frequency case and a single dominant path, it can be manipulated to provide a simple closed form solution that is easy to implement in hardware.
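To give a feel for how three equally spaced frequencies can isolate a dominant path, the sketch below uses a generic single-pole, least squares (Prony-type) estimate. It is not the closed form of the cited work, whose details are not reproduced here. The ghost model (phases −3ϕ and 5ϕ with weights 1/9 and 1/25, as for an ideal triangular correlation function), the function names and the scene depth are assumptions for this illustration.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s


def dominant_phase_step(z):
    """Least squares single-pole fit z[j+1] ~ pole * z[j]; returns arg(pole)."""
    z = np.asarray(z, dtype=complex)
    y0, y1 = z[:-1], z[1:]
    pole = np.vdot(y0, y1) / np.vdot(y0, y0)  # vdot conjugates its first argument
    return np.angle(pole) % (2.0 * np.pi)


def depth_from_step(step, delta_f_hz):
    """Depth implied by the phase increment between adjacent frequencies."""
    return C * step / (4.0 * np.pi * delta_f_hz)


def square_wave_phasor(phi):
    """Main return plus aliased 3rd and 5th harmonic ghosts (assumed weights)."""
    return np.exp(1j * phi) + np.exp(-3j * phi) / 9.0 + np.exp(5j * phi) / 25.0


freqs = np.array([5e6, 25e6, 45e6])  # equally spaced, delta f = 20 MHz
true_depth = 2.0                     # meters, hypothetical
phi = 4.0 * np.pi * freqs * true_depth / C

d_clean = depth_from_step(dominant_phase_step(np.exp(1j * phi)), 20e6)
d_ghost = depth_from_step(dominant_phase_step(square_wave_phasor(phi)), 20e6)
print(d_clean, d_ghost)
```

With a clean single path the depth is recovered exactly. With the ghosts present the estimate is only approximate, and the phase step is unambiguous only up to a depth of c/(2Δf).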
We limit our experiments to the three frequency case, as this is considered more or less the practical limit on modern hardware under the constraints of interactive frame rates, and thus looking at more frequencies is of lesser interest.
Let us start with a numerical analysis of the maximum and mean error for the worst case scenario. This is the case when using a square wave modulation function. Fig. 4 shows the maximal (black) and average (blue) phase (depth) error as a function of relative bandwidth Δω/ω1. Dashed lines show the maximal and average errors for the computed phase when using a single frequency. As can be seen, when Δω is too small with respect to the baseline frequency, stability issues can unfortunately degrade the results. However, at a ratio as small as 0.64, the average error is already better than single frequency, and at a ratio of 1 also the maximum error is improved. At a ratio of Δω/ω1 = 10 we get a factor of ×15.6 improvement in error with the maximal error going down to 7 milliradians and average error down to 3 milliradians. At Δω/ω1 = 20 error is down by a factor of ×31, with a maximum error down to 3.5 milliradians and average error down to 1.5 milliradians.
As discussed earlier, at least in theory, the calibration method (state of the art) can achieve a zero error with a single frequency in the noiseless case. In practice however, the error will depend on the accuracy of the calibration, correction function used to interpolate the exact error, as well as deviation in working parameters, component tolerance of the camera and inter-pixel variations. As these values are hardware and manufacturer (calibration processes) dependent, it is difficult to create an encompassing comparison. Within those limits, we will next explore some comparisons to state of the art.
One thing to note is that while our results are expected to improve with a higher bandwidth, the other values are not expected to change.
For the first experiment we used a custom camera to provide ground truth. This camera is built around the PMD ToF sensor and is able to sample the correlation function at arbitrary offsets. For our experiments, we measured the correlation function at 24 points (as opposed to the standard 4 point approach), which avoids artifacts due to the higher harmonics up to the 11th harmonic. We then computed the phase (distance) average over 10 modulation frequencies (between 5 MHz and 50 MHz) to reduce noise induced errors.
Fig. 5 shows the experimental results for real data. Our scene is composed of several surfaces at different depths. Fig. 5(a) and 5(b) show the input amplitude and phase images for the base frequency. In Fig. 5(c) and 5(d) we see the phase recovery error for the four bucket case and for our method using the matrix pencil method with three frequencies, respectively. Recovery was performed based on modulation frequencies of 5 MHz, 25 MHz and 45 MHz, giving a relative bandwidth factor of Δω/ω1 = 4.
From the images we can see that our method provides both a lower error and much lower noise than the standard four bucket method. This can be seen as well in Fig. 6. Fig. 6(a) shows the mean error over each of the six patches depicted in Fig. 5(b) for our method, the base frequency (5 MHz), the best case frequency over the three frequencies used, and the average mean error over the three frequencies. Fig. 6(b) shows the matching standard deviation values for these measurements over each of the patches. What can be seen from these graphs is that our method is always better than the worst case scenario and is usually not far from the best case scenario. In most cases, one of the probing frequencies does provide better results than our method, as the measurement falls close to the ideal point on the phase plot (see Fig. 2). However, in real life we have no way of knowing which frequency that is, as we do not have ground truth to compare to and the ideal case changes with object distance. The extreme cases are patch 5, where two out of the three frequencies are close to an optimal point, and, at the other end, patch 6, where we actually produce better results than all three frequencies. One should note that in all cases our method is better than taking the average over the three frequencies, usually significantly. Using the standard deviation values as a measure of noise in the image, as well as the resulting images in Fig. 5(c) and 5(d), we see that our method also produces significantly less noise than the average across the three frequencies, indicating that it is highly resilient to noise in the input.
To compare to the LUT approach, we used the MESA Swissranger SR4050 with the scientific lock-in module (SLIM), which allows taking measurements at arbitrary frequencies using an external clock in the range of 0.5 MHz to 30 MHz. The internal LUT in the camera was factory calibrated for 20 MHz.
Fig. 7 shows the cross section of the depth recovery of a scene consisting of two straight walls at a 90 degree angle (in polar coordinates). In Fig. 7(a) we see the recovery at several frequencies based on the internal LUT conversion. While the measurements at 15 MHz and 30 MHz are mostly consistent with each other as well as with our recovery (black), as they are close to the calibration point, we can still see a disagreement in the pixel range between 70 and 140. At 10 MHz the results start to degrade significantly as the LUT mismatch becomes significant, and the error becomes quite large going down to 3 MHz.
In Fig. 7(b) we can see the raw data used for the recovery before denoising and LUT correction, compared again to the reconstruction. Note that the disagreement on depth is due to correction for the cable length connecting the illumination to the camera. It is immediately apparent that our method is very resilient to noisy input.
Fig. 8 presents reconstruction results in light of multi path. Fig. 8(a) shows a cross section of the reconstruction comparing the no-multi path measurement using LUT correction (red), the same with multi-path present (blue) as well as our three frequency reconstruction in the presence of multi path. Seeing that there is strong multi path both due to model error as well as actual multi path, it is difficult for the algorithm to pick out the full recovery, although the results are still better than the single frequency case using LUT correction and are stable and less noisy despite the strong multi path as well as high noise in the input.
6. Conclusions
This paper presents an analysis of phase and amplitude measurement errors in ToF cameras due to modulation model mismatches. By showing that these errors appear as a multi-path interference induced error, we allow the use of multi-path techniques to mitigate the problem as an alternative to a calibration based approach. This avoids costly and time consuming per camera calibration, which also interferes with other processing techniques such as multi-path interference removal and phase unwrapping.
An important conclusion of the analysis is that all multi-path interference processing approaches should use raw depth data and account for the additional multi-path interference due to the higher harmonics in the modulation function. Otherwise, calibration correction will introduce distortion into the depth data, especially when dealing with multi-path interference.
References and links
1. S. Foix, G. Alenya, and C. Torras, “Lock-in time-of-flight (tof) cameras: A survey,” IEEE Sensors 11, 1917–1926 (2011).
2. R. Lange and P. Seitz, “Solid-state time-of-flight range camera,” IEEE J. Quantum Electron. 37, 390–397 (2001).
3. R. Schwarte, Z. Xu, H.G. Heinol, J. Olk, R. Klein, B. Buxbaum, H. Fischer, and J. Schulte, “New electro-optical mixing and correlating sensor: facilities and applications of the photonic mixer device (pmd),” Proc. SPIE 3100, 245–253 (1997).
4. “Introduction to the time-of-flight (tof) system design,” User’s Guide SBAU219D, Texas Instruments (2014).
5. A. P. Jongenelen, D. G. Bailey, A. D. Payne, A. A. Dorrington, and D. A. Carnegie, “Analysis of errors in ToF range imaging with dual-frequency modulation,” IEEE Trans. Instrum. Meas. 60, 1861–1868 (2011).
6. M. Lindner and A. Kolb, “Lateral and depth calibration of pmd-distance sensors,” in “Advances in Visual Computing,” 4292 of Lecture Notes in Computer Science (Springer Berlin Heidelberg, 2006), pp. 524–533.
7. A. Belhedi, A. Bartoli, V. Gay-bellile, S. Bourgeois, P. Sayd, and K. Hamrouni, “Depth correction for depth camera from planarity,” in “Proceedings of the British Machine Vision Conference,” (BMVA Press, 2012), pp. 43.1–43.10.
8. A. D. Payne, A. A. Dorrington, M. J. Cree, and D. A. Carnegie, “Improved measurement linearity and precision for amcw time-of-flight range imaging cameras,” Appl. Opt. 49, 4392–4403 (2010).
9. C. S. Bamji, P. O’Connor, T. Elkhatib, S. Mehta, B. Thompson, L. A. Prather, D. Snow, O. C. Akkaya, A. Daniel, A. D. Payne, T. Perry, M. Fenton, and V. H. Chan, “A 0.13 μm cmos system-on-chip for a 512 × 424 time-of-flight image sensor with multi-frequency photo-demodulation up to 130 mhz and 2 gs/s adc,” IEEE J. Solid-State Circuits 50, 303–319 (2015).
10. J. Blake, F. Echtler, and C. Kerl, “libfreenect2 project,” https://github.com/OpenKinect/libfreenect2.
11. M. Feigin, A. Bhandari, S. Izadi, C. Rhemann, M. Schmidt, and R. Raskar, “Resolving multipath interference in kinect: An inverse problem approach,” IEEE Sensors J. PP(99), 1–8 (2015).
12. A. A. Dorrington, M. J. Cree, A. D. Payne, R. M. Conroy, and D. A. Carnegie, “Achieving sub-millimetre precision with a solid-state full-field heterodyning range imaging camera,” Meas. Sci. Technol. 18, 2809 (2007).
13. S. Fuchs, “Multipath interference compensation in time-of-flight camera images,” in 20th International Conference on Pattern Recognition (ICPR, 2010), pp. 3583–3586.
14. A. A. Dorrington, J. P. Godbaz, M. J. Cree, A. D. Payne, and L. V. Streeter, “Separating true range measurements from multi-path and scattering interference in commercial range cameras,” Proc. SPIE 7864, 786404 (2011).
15. D. Jiménez, D. Pizarro, M. Mazo, and S. E. Palazuelos, “Modeling and correction of multipath interference in time of flight cameras,” Image and Vision Computing 32, 1–13 (2014).
16. A. Kadambi, R. Whyte, A. Bhandari, L. Streeter, C. Barsi, A. Dorrington, and R. Raskar, “Coded time of flight cameras: sparse deconvolution to address multipath interference and recover time profiles,” in ACM Transactions on Graphics (SIGGRAPH Asia, 2013), pp. 1–10.
17. A. Bhandari, A. Kadambi, R. Whyte, C. Barsi, M. Feigin, A. Dorrington, and R. Raskar, “Resolving multipath interference in time-of-flight imaging via modulation frequency diversity and sparse regularization,” Opt. Lett. 39, 1705–1708 (2014).
18. D. Freedman, E. Krupka, Y. Smolin, I. Leichter, and M. Schmidt, “SRA: fast removal of general multipath for tof sensors,” arXiv:1403.5919 (2014).
19. J. P. Godbaz, M. J. Cree, and A. A. Dorrington, “Closed-form inverses for the mixed pixel/multipath interference problem in amcw lidar,” Proc. SPIE 8296, 829618 (2012).
20. A. Kirmani, A. Benedetti, and P. Chou, “SPUMIC: Simultaneous phase unwrapping and multipath interference cancellation in time-of-flight cameras using spectral methods,” in IEEE International Conference on Multimedia and Expo (ICME, 2013), pp. 1–6.
21. M. Gupta, S. K. Nayar, M. Hullin, and J. Martin, “Phasor imaging: a generalization of correlation-based time-of-flight imaging,” ACM Trans. Graphics [in press] (2015).
22. M. O’Toole and K. N. Kutulakos, “Visualizing light transport phenomena with a primal-dual coding video camera,” in “ACM SIGGRAPH 2014 Emerging Technologies,” (ACM, 2014), p. 26.
23. H. Yingbo and T. K. Sarkar, “Matrix pencil method for estimating parameters of exponentially damped/undamped sinusoids in noise,” IEEE Trans. Acoust., Speech, Signal Process. 38(6), 814–824 (1990).