## Abstract

A novel dual-frequency pattern is developed which combines a high-frequency sinusoid component with a unit-frequency sinusoid component, where the high-frequency component is used to generate robust phase information, and the unit-frequency component is used to reduce phase unwrapping ambiguities. With our proposed pattern scheme, phase unwrapping can overcome the major shortcomings of conventional spatial phase unwrapping: phase jumping and discontinuities. Compared with conventional temporal phase unwrapped approaches, the proposed pattern scheme can achieve higher quality phase data using a less number of patterns. To process data in real time, we also propose and develop look-up table based fast and accurate algorithms for phase generation and 3-D reconstruction. Those fast algorithms can be applied to our pattern scheme as well as traditional phase measuring profilometry. For a 640×480 video stream, we can generate phase data at 1063.8 frames per second and full 3-D coordinate point clouds at 228.3 frames per second. These achievements are 25 and 10 times faster than previously reported studies.

©2010 Optical Society of America

## 1. Introduction

Structured Light Illumination (SLI) [1,2,3] is an active three dimensional (3-D) shape measurement technique widely used for scientific and industrial purposes [4]. Various algorithms have been proposed to achieve high-accuracy and/or high-speed measurement [5, 6].

Higher frequency patterns have been studied and employed to improve measurement accuracy at the extra computational cost of phase unwrapping. Most unwrapping phase strategies can be fallen in two categories [9]: spatial [10, 5, 11, 12] and temporal [7, 13, 14].

Spatial phase unwrapping approaches assume a continuity of scanned surfaces and likely fail in regions with depth discontinuities (step edges) [9]. Temporal methods, alternatively, use the intermediate phase values typically generated by projecting several additional reference patterns and, therefore, are not efficient for real-time operation. Kim *et al*. [15] proposed an algorithm for high-frequency SLI that avoided direct phase unwrapping with 4*N* ≥ 8 patterns. Li *et al*. [16] proposed a two-frequency grating that integrated both high-frequency and unit-frequency into each pattern with 2*N* ≥ 6 patterns, where the high frequency had to be equal to *N*. Su and Liu [17] treated the algorithm of Li *et al*. as one of two bases in their one-shot pattern strategy.

In this paper, we propose a novel dual-frequency pattern scheme that combines, into a single pattern, a high-frequency SLI pattern with a unit-frequency pattern where the high-frequency component generates wrapped phase, resilient to sensor noise, while the unit-frequency component enables phase unwraping. Differing from Li *et al*.’s two-frequency grating [16], the minimum number of our proposed patterns is 5 for any high frequencies. In Li *et al*.’s method, the minimum number of patterns is 6, and the high frequency term is fixed and related to the number of patterns. Our proposed pattern is defined by a single equation while Li *et al*. employed two equations.

With regard to high-speed or real-time 3-D measurement, researchers have proposed one-shot [2,18,19,20] methods that reconstruct depth from a single, continuously projected pattern, while others have proposed employing a high-speed camera and projector pair. One-shot SLI techniques are attractive for their insensitivity to object motion, where pattern strategies are based on spatial neighborhood coding, but suffering from poor resolution and high computational cost [2]. Therefore, researhers have attempted to achieve the high resolution with the resilience to motion by driving the camera/projector pair at such high frame rates as to minimize impact of object motion. A study of particular importance has been led by Zhang *et al*. [8, 21] who employed a new algorithm for phase generation instead of the time-consuming arctangent function [22] and also took the advantage of the computation capability of GPU to generate the absolute 3-D coordinates of scanned object [6]. For the image in size of 532×500, their new algorithm was 3.4 times faster than using the arctangent function while the frame rate of 3-D reconstruction was sped up from 6 fps to 25.56 fps, by means of the GPU.

To achieve real-time 3-D video acquisition and reconstruction, this paper also introduces a lossless lookup table (LUT) method for phase, intensity/texture, and depth data that works in combination with a high-speed projector/camera pair widely used in phase measurement profilometry (PMP) [23]. LUT-based methods have been known for the low computational cost and employed in many applications [24]. We develop the novel method that extracts phase and intensity/texture information from the incoming PMP video by reducing the computational complexity of the arctangent and square-root functions. Furthermore, since no LUT-based method has been introduced for inverting the camera’s calibration matrix to produce 3-D point clouds in real time, we introduce such a method that can also be applied to other triangulation-based 3-D reconstruction techniques.

As will be demonstrated in this paper, our experimental system achieves a processing rate of 714.29 fps to generate phase and texture video and 228.83 fps to produce 3-D point clouds using just one core of an Intel Core 2 Duo Quad Q9650 processor running at 3.0 GHz. No additional cores or GPUs [6] are needed, but using such resources would, of course, increase our frame rate if necessary. For the proposed novel dual-frequency pattern scheme, where the number of patterns is set to 6, as shown in Sec. 4, our fast LUT-based algorithms can be also applied.

## 2. Phase measuring profilometry

PMP is a popular method of SLI due to its high reliability and high accuracy [25]. In practice, either vertical or horizontal sinusoid patterns are projected onto a target object so that the vertical or horizontal correspondence information, between the camera and the projector, can be directly derived from the computed phase data. The PMP patterns are described as:

where (*x ^{p},y^{p}*) is the coordinates of a pixel in the projector,

*I*is the intensity of that pixel,

^{p}_{n}*A*and

^{p}*B*are some constants,

^{p}*f*is the frequency of the sine wave,

*n*represents the phase-shift index, and

*N*is the total number of phase shift. Figure (1) shows a group of sine wave patterns with

*N*= 3,

*f*= 1,

*A*= 127.5 and

^{p}*B*= 127.5 for 8-bits color depth projector are projected.

^{p}For reconstruction, a camera captures each image where the sine wave pattern is distorted by the scanned surface topology, resulting in the patterned images expressed as:

where (*x ^{c},y^{c}*) is the coordinates of a pixel in the camera while

*I*(

^{c}_{n}*x*) is the intensity of that pixel. To simplify the notation, the coordinates index both in the camera and projector will be removed from our equation henceforth. The term

^{c},y^{c}*A*is the averaged pixel intensity across the pattern set, which can be derived according to:

^{c}such that the image *A ^{c}* is equal to an intensity or texture photograph of the scene. Correspondingly, the term

*B*is the intensity modulation of a given pixel and is derived from

^{c}*I*as:

^{c}_{n}such that *B ^{c}* can be thought of as the amplitude of the sinusoid reflecting off of a point on the target surface. If

*I*is constant or less affected by the projected sinusoid patterns,

^{c}_{n}*B*will be close to zero. Thus

^{c}*B*is employed as a shadow noise detector/filter [25] such that the shadow-noised regions, with small

^{c}*B*values, are discarded from further processing. Figure 1 shows, an example scene with a background that includes a fluorescent ceiling light, which over saturates the CMOS cameras pixel and, thereby, erases any signal from the SLI projector. In Fig. 1,

^{c}*A*looks like a standard video frame absent any indication of the projected pattern sequence

^{c}*I*. In contrast,

^{p}_{n}*B*, shown in Fig. 1, looks very similar to

^{c}*A*except that it only shows texture in those areas of the scene that significantly reflect the projected sequence

^{c}*I*. Given the significance of

^{p}_{n}*B*as an indicator of the projected signal strength, the binarized image in Fig. 1 shows only those pixels greater in magnitude to a user defined threshold. It is these pixels that will ultimately be used to reconstruct our 3-D surface with ignored pixels being considered too noisy as to relay an reliable depth information.

^{c}Of the reliable pixels with sufficiently large *B ^{c}*,

*ϕ*represents the phase value of the captured sinusoid pattern derived as:

In Fig. 1, we show the resulting phase image corresponding to the scene from sine wave patterns. For reference, Fig. 1 also shows the phase values for all pixels, including those considered unreliable according to *B ^{c}*.

Having derived the phase image *ϕ*, we likewise have derived a unique correspondence value *y ^{p}* for every camera pixel (

*x*) through the linear equation

^{c},y^{c}*ϕ*(

*x*) = 2

^{c},y^{c}*πy*. The 3-D world coordinates of the scanned object can, therefore, be derived through triangulation with the projector [7] where, under the assumption that the camera and projector are accurately modeled using the pinhole lens model such that their perspective matrices,

^{p}*M*and

^{c}*M*, are given by:

^{p}The 3-D world coordinates *X ^{w}, Y^{w}*, and

*Z*are defined according to [7] as

^{w}It is this matrix inversion as well as the actangent computation in Eq. (5) that will later prove to be the bottleneck preventing real-time surface reconstruction as will be discussed in Sec. 4.3.

## 3. Dual-frequency pattern strategy

For the purpose of minimizing the effects of sensor noise by means of high-frequency PMP while minimizing the computational expense of performing phase unwrapping, we propose a dual-frequency pattern set defined as:

where *I ^{p}_{n}* is the intensity of a pixel in the projector,

*A*,

^{p}*B*

^{p}_{1}and

*B*

^{p}_{2}are some constants to make value of

*I*between 0 and 255 for a 8-bit color depth projector,

^{p}_{n}*f*is the high frequency of the sine wave,

_{h}*f*is the unit frequency of the sine wave and equals to 1,

_{u}*n*represents the phase-shift index, and

*N*is the total number of phase shift and is larger or equal to 5. Figure 2 illustrates one proposed pattern along with its profile for

*N*= 5,

*n*= 0,

*f*= 16,

_{h}*A*= 127.5,

^{p}*B*

^{p}_{1}= 102, and

*B*

^{p}_{2}= 25.5 for 8-bits per pixel grayscale intensity.

Using this novel pattern set, Eq. (2) becomes:

where *I ^{c}_{n}* is the intensity of a pixel in the camera. The term

*A*is still the averaged pixel intensity across the pattern set, which can be derived by Eq. 3 such that the image

^{c}*A*is equal to an intensity or texture photograph of the scene. Correspondingly, the term

^{c}*B*

^{c}_{1}is the intensity modulation of a given pixel corresponding to

*ϕ*and is derived from

_{h}*I*as:

^{c}_{n}with *m* = 1 such that *B ^{c}*

_{1}can be thought of as the amplitude of the sinusoid reflecting off of a point on the target surface.

Similar to traditional PMP, if *I ^{c}_{n}* is constant or less affected by the projected sinusoid patterns, then

*B*

^{c}_{1}will be close to zero, indicating that

*B*

^{c}_{1}is very sensitive to noise in either the camera, projector, and/or ambient light. Thus

*B*

^{c}_{1}is employed as an indicator of the signal to noise ratio such that excessively noisy regions, with small

*B*

^{c}_{1}values, are discarded from further processing [25].

*B*

^{c}_{2}is derived from

*I*by Eq. 10 with

^{c}_{n}*m*= 2 and it is the intensity modulation corresponding to

*ϕ*. Of the reliable pixels with sufficiently large

_{u}*B*

^{c}_{1}, the phase-pair, (

*ϕ*), is derived as:

_{h}, ϕ_{u}where *ϕ _{h}* represents the wrapped phase value of the captured pattern and

*ϕ*represents the base phase used to unwrap

_{u}*ϕ*.

_{h}## 4. LUT-based processing

LUT-based programming is a well known and widely used technique [24] for minimizing processing latency. In this section, we implement LUT-based processing for deriving the modulation, *B ^{c}*, and phase,

*ϕ*, terms for traditional PMP that is suitable for all triangulation-based 3-D measurement, including our proposed dual-frequency pattern set. The proposed LUT-based processing takes advantage of the need by real-time systems to use as few patterns as possible by using the 8-bits per pixel, of the captured pattern set, as the indices into the LUT. By having LUTs that account for every possible combination of captured pixel value over the pattern set while storing double-precision results, the proposed scheme is completely lossless compared to traditional processing.

#### 4.1. Modulation

As a first step in reducing the computational complexity associated with Eqs. (4), (5), and (7), we simplify the determination of *B ^{c}* by rewriting Eq. (4) as:

for *N* = 3,

for *N* = 4, or

for *N* = 6. Noting that we need only solve these equations for 8-bits per pixel *I ^{c}_{n}* images, we can implement a modulation look-up table, MLUT, that, for

*N*= 3, is defined according to

where integer indices *V* and *U* are derived from

for *N* = 4, is defined according to

where integer indices *V* and *U* are derived from

or for *N* = 6, is defined according to

where integer indices *V* and *U* are derived from

The double-precision results of Eq. (15), (17) or (19) are stored in the MLUT. In contrast to Eq. (4), MLUT reduces the run-time computation cost of modulation without losing accuracy, where the size of the LUT is determined by the number of bits per pixel of the camera sensor and the number of patterns being projected.

For our proposed dual-frequency pattern scheme, the minimum number of patterns is 5, but to take advantage of our LUT-based processing, we set the number of patterns to 6 where the modulation terms *B ^{c}*

_{1}and

*B*

^{c}_{2}, in Eq. (9), are described as:

and

In practice, we need only calculate *B ^{c}*

_{1}as a shadow filter and for representing object texture. The MULT for

*B*

^{c}_{1}is the same as Eq. (19).

#### 4.2. Phase

As our second step to producing real-time 3-D video with texture, we intend to minimize the computational complexity associated with generating phase video where the arctangent function has long been considered a significant obstacle for fringe pattern analysis [26]. Previous approaches to this problem include Huang *et al*.’s cross ratio algorithm [22] and Guo *et al*.’s approximation algorithm [26]. However, these methods reduce computational cost at the expense of accuracy and are not faster than our LUT-based algorithms.

As we did with Eq. (4) for *B ^{c}*, we simplify Eq. (5) according to the number of patterns

*N*such that

for *N* = 3,

for *N* = 4, or

for *N* = 6. Again based on the fact that the intensity values of grabbed images are range-limited integers, we can implement these calculations through a phase LUT (PLUT), for *N* = 3, defined according to

where *U* and *V* are defined in Eq. (16); for *N* = 4, defined according to

where *U* and *V* are defined in Eq. (18); or for *N* = 6, defined according to

where *U* and *V* are defined in Eq. (20). The double-precision results are stored in the PLUT. Thus, the time-consuming arctangent operation is pre-performed, and phase values are obtained by accessing the pre-computed PLUT whose size is, again, determined by the number of bits per pixel of the sensor as well as the number of patterns projected with no loss in accuracy. Compared to Eqs. (5), PLUT avoids computing arctangent function at run-time such that the computational cost of phase is greatly reduced without introducing distortion.

For our proposed dual-frequency pattern scheme, the more accurate wrapped phase *ϕ _{h}* and the coarse unit frequency phase

*ϕ*pair in the Eq. (8) is rewritten as

_{u}The PLUT for *ϕ _{h}* is the same as the Eq. (28), and the PLUT for

*ϕ*is defined as

_{u}where *V* and *U* are derived from

Once *ϕ _{h}* and

*ϕ*are obtained, phase unwrapping can be also achieved in real-time.

_{u}#### 4.3. 3-D point cloud

Having obtained both the phase and modulation images by means of LUTs, we begin our derivation of a LUT for the purpose of implementing Eq. (7). After matrix inverse and matrix multiplication operations, Eq (7) can be expanded into direct algebraic forms as

$${Y}^{w}={M}_{y}({x}^{c},{y}^{c})+{N}_{y}({x}^{c},{y}^{c})T$$

$${Z}^{w}={M}_{z}({x}^{c},{y}^{c})+{N}_{z}({x}^{c},{y}^{c})T$$

where *T* = (*C*(*x ^{c},y^{c}*)

*y*+ 1)

^{p}^{-1}and

*M*and

_{x}, M_{y}, M_{z}, N_{x}, N_{y}*N*are defined in the Appendix.

_{z}We then note that, based on the mapping of world coordinates to the camera plane [7] by *M ^{c}*, if

*Z*is calculated according to

^{w}then *X ^{w}* and

*Y*can be computed as

^{w}where

$${F}_{x}({x}^{c},{y}^{c})=\frac{\left({m}_{22}^{c}{m}_{34}^{c}-{m}_{24}^{c}{m}_{32}^{c}\right){x}^{c}+\left({m}_{14}^{c}{m}_{32}^{c}-{m}_{12}^{c}{m}_{34}^{c}\right){y}^{c}+\left({m}_{12}^{c}{m}_{24}^{c}-{m}_{14}^{c}{m}_{22}^{c}\right)}{\left({m}_{21}^{c}{m}_{32}^{c}-{m}_{22}^{c}{m}_{31}^{c}\right){x}^{c}+\left({m}_{12}^{c}{m}_{31}^{c}-{m}_{11}^{c}{m}_{32}^{c}\right){y}^{c}+\left({m}_{11}^{c}{m}_{22}^{c}-{m}_{12}^{c}{m}_{21}^{c}\right)},$$

$${E}_{y}({x}^{c},{y}^{c})=\frac{\left({m}_{23}^{c}{m}_{31}^{c}-{m}_{21}^{c}{m}_{33}^{c}\right){x}^{c}+\left({m}_{11}^{c}{m}_{33}^{c}-{m}_{13}^{c}{m}_{31}^{c}\right){y}^{c}+\left({m}_{13}^{c}{m}_{21}^{c}-{m}_{11}^{c}{m}_{23}^{c}\right)}{\left({m}_{21}^{c}{m}_{32}^{c}-{m}_{22}^{c}{m}_{31}^{c}\right){x}^{c}+\left({m}_{12}^{c}{m}_{31}^{c}-{m}_{11}^{c}{m}_{32}^{c}\right){y}^{c}+\left({m}_{11}^{c}{m}_{22}^{c}-{m}_{12}^{c}{m}_{21}^{c}\right)},\mathrm{and}$$

$${F}_{y}({x}^{c},{y}^{c})=\frac{\left({m}_{21}^{c}{m}_{32}^{c}-{m}_{22}^{c}{m}_{31}^{c}\right){x}^{c}+\left({m}_{12}^{c}{m}_{31}^{c}-{m}_{11}^{c}{m}_{32}^{c}\right){y}^{c}+\left({m}_{11}^{c}{m}_{22}^{c}-{m}_{12}^{c}{m}_{21}^{c}\right)}{\left({m}_{21}^{c}{m}_{32}^{c}-{m}_{22}^{c}{m}_{31}^{c}\right){x}^{c}+\left({m}_{12}^{c}{m}_{31}^{c}-{m}_{11}^{c}{m}_{32}^{c}\right){y}^{c}+\left({m}_{11}^{c}{m}_{22}^{c}-{m}_{12}^{c}{m}_{21}^{c}\right)}.$$

Implementing the 7 parameters *M _{z}, N_{z}, C, E_{x}, E_{y}, F_{x}*, and

*F*by means of table look-up for indices (

_{y}*x*) (camera column and row indices), reduces the total computational complexity associated with deriving the 3-D point cloud from the phase term to 7 look-ups, 4 additions, 3 multiplications, and 1 division, which is significantly less than what is involved in performing matrix inversion and matrix multiplication, as required by Eq. (7). It should be noted that the method presented in Eqs. (32) and (33) can be applied to all triangulation-based, 3-D coordinate, reconstruction techniques including stereo vision, 3-D from video, time-of-flight, and so on.

^{c},y^{c}## 5. Experiments

In order to test the performance of the proposed pattern scheme and LUT-based processing, we programmed the experimental system of Fig. 3 using Microsoft Visual Studio 2005 with managed C++. The imaging sensor is an 8-bits per pixel, monochrome, Prosilica GC640M, gigabit ethernet camera with a frame rate of 120 fps and 640×480 pixel resolution. The projector is composed of a Texas Instrument’s Discovery 1100 board with ALP-1 controller and LED-OM with 225 ANSI lumens. The projector has a maximum frame rate of 150 fps at a resolution of 1024×768 and 8-bits per pixel grayscale. The camera and projector are synchronized by an external triggering circuit. As our processing unit, we used a Dell Optiplex 960 with an Intel Core 2 Duo Quad Q9650 processor running at 3.0 GHz.

Our experiments include: (1) scanning a static scene with the proposed dual-frequency pattern scheme employing 5 patterns without using LUT-based processing; (2) scanning static scene with the traditional PMP with unit frequency for *N* = 3, 4 and 6 involving LUT-based processing; (3) scanning a moving hand with dual-frequency patterns for *N* = 6 by means of LUT-based processing; and (4) scanning a moving white table tennis ball and reconstructing it with a known sphere model.

In the first experiment, the image of Fig. 4 (top-left) shows the scanned static objects. The projected patterns’ parameters, in the Eq. (8), are *N* = 5, *f _{h}* = 6,

*f*= 1,

_{u}*A*= 155,

^{p}*B*

^{p}_{1}= 80 and

*B*

^{p}_{2}= 20. The minimum value in each pattern is 55 in order to prevent underflow of captured images using our camera. The visualizations of the extracted parameters

*I*

^{c}_{0},

*A*,

^{c}*B*

^{c}_{1}and

*B*

^{c}_{2}are shown in Fig. 4 (top center), (top right), (bottom left), and (bottom right). The image in Fig. 4 (bottom center) shows the binarized

*B*

^{c}_{1}, to be used as a shadow noise filter for phase generation, using a threshold of 10.

The visualizations of Eq. (11) are illustrated in Fig. 5 (top-left) and (top-center) where shadow noise was filtered by the binarized *B ^{c}*

_{1}while Fig. 5 (bottom-left) and (bottom-center) show the 250th column curve crossing the angel. In comparison, the unit phase term is noisy while the wrapped phase has high quality but needs to be unwrapped. Because we obtain the wrapped phase and the base phase simultaneously, the unwrapping algorithm for multi-frequency PMP can be performed to generate the final, high-quality and non-ambiguous phase. No neighborhood technique need be employed. The final unwrapped phase image without shadow noise is shown in Fig. 5 (top-right) and the 250th column curve crossing the angel is shown in Fig. 5 (bottom-right). Now once the unwrapped phase is obtained, 3-D world coordinates of the scanned object can be computed by triangulation between the camera and projector. Figure 6 shows the reconstructed 3-D point clouds with texture (top) and depth rendering (bottom) in front (left), side (center), and top view (right).

*A*acts as the texture for the 3-D point clouds in this experiment.

^{c}For our second experiment, a static object was scanned using traditional PMP with unit frequency with *N* = 3, 4, and 6 for the purpose of measuring the speed at which our LUT-based algorithms perform without phase unwrapping. Although the processor has four cores, our reported processing rates for each LUT algorithm are based on using just a one core. And although *B ^{c}* could be used as a shadow noise filter, thus reducing the number of pixels that would need to be processed while deriving

*ϕ*, we computed a phase value for the full set of 640×480 pixels such that our observed processing times represent an upper limit. Under these conditions, the average computation time over 1,000 runs for modulation

*B*and phase

^{c}*ϕ*was then reported in Table 1.

*B*acts as texture in this experiment.

^{c}In an overview of Table 1 for the total processing time including texture generation, phase generation, and 3-D reconstruction, our LUT-based computations are 173.31 fps for *N* = 3, 164.20 fps for *N* = 4, and 139.28 for *N* = 6 versus, using traditional PMP, 15.16 fps for *N* = 3, 15.38 fps for *N* = 4, and 14.55 fps for *N* = 6, which means our LUT-based algorithms is 10× faster than traditional PMP algorithms. Using the general modulation equation, Eqs. (4), and the general phase equation, (5), the processing time is increased with larger *N*, because larger *N* means more sin(·), cos(·) and summation computations. The processing time and rates are listed in Table 1.

With the simplified modulation equations of Eq. (12), (13), and (14), the processing time decreases significantly for *N* = 4 (2.34 *ms*) compared with *N* = 3 (6.72 *ms*) because, before performing square-root computation, there are only 2 subtractions, 1 addition, and 2 square computations for *N* = 4. There are 3 subtractions, 1 addition, 2 multiplications, and 2 square computations for *N* = 3. Using the simplified phase equation, Eq. (23), (24), and (25), the processing time is almost the same for different *N* because, although the processing time for the basic computations, such as addition and subtraction, is still varied for different *N*. Such basic processing time is negligible compared to the processing time for the processing hungry arctangent function.

Furthermore, it is found from Table 1 that the performance of MLUT and PLUT is the same for different *N* because the computations for the integer indices, *U* and *V*, are the same for both MLUT and PLUT with the same *N* as is the time for accessing MLUT and PLUT. However, the processing time for MLUT/PLUT is increased with increasing *N* because the time for accessing the image buffer, *I ^{c}_{n}*, increases; therefore, accessing this buffer is the predominant factor in this case. In practice when we want to access MLUT and PLUT, the computations of

*U*and

*V*need only be performed once. So Table 1 also shows the processing time of the row marked as “MLUT and PLUT, combined” is less than the summation time of the row marked as “MLUT: Eq. (15), (17) and then (19)” and the row marked as “PLUT: Eq. (26), (27) and then (28)” in which their

*U*and

*V*were computed respectively.

For testing 3-D reconstruction using a single processing core, we implemented reconstruction by means of matrix-inversion (Eq. (7)) versus our LUT-based implementation of Sec. 4.3. With our 3-D LUT algorithm, the frame rate of 3-D reconstruction is 228.83 fps for 640×480 image resolution, which is 10.3 times faster than matrix inversion by means of Eq (7). For added comparison, Zhang *et al*. reported a reconstruction frame rate of 25.56 fps with a resolution of 532×500 when performing matrix inversion by means of GPU processing on an nVidia Quadro FX 3450 [6]. For Zhang *et al*.’s 2 + 1 pattern strategy [21], our LUT analysis can be also applied, and the performance is shown in Table 1.

In our third experiment,we scanned a human subject’s hand gestures with our dual-frequency pattern set (*N* = 6) using LUT-based processing. Different from our second experiment, the most recent *N* video frames are captured live and stored in a shared image buffer by a camera thread. At the same time, a second thread performs modulation and phase generation from the shared buffer and stores the computed data into a modulation and phase buffer. A third thread, simultaneously, performs 3-D reconstruction using data from this modulation buffer, storing the 3-D point cloud results into a 3-D data buffer. Finally, a fourth thread displays the data from the 3-D data buffer using OpenGL. Because the speed of the camera/projector pair is 120 fps, the speed of final 3-D reconstruction is also 120 fps while the display speed is limited by the LCD display at 40–50 fps.

Figure 7 shows a human subject’s hand gestures. In the case of Fig. 7 (left), we show the entire system along with the subject’s hand as a way of demonstrating that video is being acquired and displayed on screen and in real-time where there is no noticeable delay between the gesture’s performed by the subject and displayed on screen. On the LCD panel, you can see the XYZ axes labeled as blue, green, and red lines, respectively, while the 3-D point cloud is also being reflected off the XZ plane as a way of better illustrating the hand coming forward and away from the camera. In the video clips of Fig. 7 (center) and (right), we show the subject’s hand held in front of a white cardboard box. The purpose of these clips is to show that we can track multiple, disjoint objects, since there are no phase ambiguities when using dual-frequency pattern scheme. For added inspection, the images in Fig. 8 show the live frames of hand poses with front view and top view where, from visual inspection, one can see the texture/intensity encoding of the points derived from *B ^{c}*. Points with low modulation strength are not displayed.

As a final and quantitative evaluation of our proposed methodology, we scanned a texture-less, white table tennis ball, with a radius of 19 mm, swinging like a pendulum on a 20 cm long string. We used our proposed dual-frequency pattern scheme with *N* = 6, *f _{h}* = 4,

*f*= 1,

_{u}*A*= 155 and

^{p}*B*

^{p}_{1}=

*B*

^{p}_{2}= 50. Our PLUT and 3-D LUT were employed for phase generation and 3-D reconstruction. From the reconstructed point clouds of the moving ball, we found the best-fit sphere with variable position and radius. From this, we compared the estimated sphere radius with the true radius. Plotted in Fig. 9 (top), is this estimated sphere radius. The averaged distance of the table tennis ball point cloud, in Fig. 9 (bottom), was estimated as the centroid of all points in the cloud.

The exact position of the point cloud centroid is of no quantitative significance since we do not have a true measure to compare, but it does indicate the points in time when the ball was at the extreme points of the pendulum swing. What Fig. 9 shows is that we get an accurate and stable measurement of the ball radius at the extreme points of the swing, where the ball comes to a complete stop before changing direction. The mid-way point has the least stable radius estimate due to this being the point in the swing where the ball is moving at its highest velocity. At its farthest point from the sensor, noise in the reconstruction results in a sphere radius that is its smallest because the points of the cloud are not, in any way, constrained to be side by side in the reconstruction. So two neighboring pixels may end up being on the front and back of the best-fit sphere. This behavior is evident in the distance estimate, which has a high degree of variability. At its closest point to the sensor, the point cloud is sufficiently flat as to have the entire point cloud residing on the front-side of the best-fit sphere, resulting in a smooth an consistent distance estimate.

## 6. Conclusion and future works

In this paper, we presented a novel dual-frequency pattern strategy along with three LUT-based algorithms for calculating the modulation and phase terms along with reconstructing 3-D point clouds based upon a popular method of SLI. The proposed pattern coding technique embeds low and high frequency components into a single pattern where high-quality, high-frequency phase is unwrapped according to low-frequency phase. No any intermediate patterns are needed nor are surface discontinuities an issue. With regards to reconstruction quality, we note that scan accuracy is strongly influenced by features of pattern design, which were also considered in this paper. Being that our proposed method requires at least five patterns, future work will look at reducing this pattern number further.

With regards to LUT-based processing, especially, phase generation and 3-D reconstruction contributions of this paper are of particular importance to real-time 3-D reconstructing. Compared with both traditional PMP algorithm (Table 1) and previous published literatures [6,8,22], the improvement in frame rate performance for phase generation and 3-D reconstruction is significant. Furthermore, the methods presented here are not strictly tied to traditional PMP and the proposed dual-frequency approach where the 2+1 method, introduced by Wizinowich [27] and modified by Zhang in his SLI system [21], can be performed at the equivalent of 719 fps (Table 1) when implemented by means of MLUT and PLUT.

As presented using an 8 bits per pixel sensor, the LUT-based method is lossless but requires significant memory, depending on the number of patterns used. This memory consumption can, of course, be manipulated to use only the most significant bits of the incoming video stream to reduce the overall memory requirements for MLUT and PLUT. The LUTs used for 3-D reconstruction are limited in size by the number of pixels forming the sensor, so reducing the memory requirements here corresponds to reducing the overall field of view of the camera. While losing accuracy in MLUT may be insignificant for most applications, quantizing the incoming video to reduce the memory consumed by PLUT incurs a penalty in phase accuracy. For our experimental system, the variance in the phase error (in radians) ranged from 0.031 for 3-pattern, 4-bits per pixel LUTs to 0.001811 for 3-pattern, 10-bits per pixel LUTs.

## Appendix: Definitions of *M*_{x}, M_{y}, M_{z}, N_{x}, N_{y} and *N*_{z}

_{x}, M

_{y}, M

_{z}, N

_{x}, N

_{y}

_{z}

where

$${a}_{{x}_{2}}={m}_{12}^{c}{m}_{33}^{c}{m}_{34}^{p}{+m}_{13}^{c}{m}_{34}^{c}{m}_{32}^{p}+{m}_{14}^{c}{m}_{32}^{c}{m}_{33}^{p}{-m}_{12}^{c}{m}_{34}^{c}{m}_{33}^{p}-{m}_{13}^{c}{m}_{32}^{c}{m}_{34}^{p}-{m}_{14}^{c}{m}_{33}^{c}{m}_{32}^{p},$$

$${a}_{{x}_{3}}={m}_{12}^{c}{m}_{24}^{c}{m}_{33}^{p}{+m}_{13}^{c}{m}_{22}^{c}{m}_{34}^{p}+{m}_{14}^{c}{m}_{23}^{c}{m}_{32}^{p}{-m}_{12}^{c}{m}_{23}^{c}{m}_{34}^{p}-{m}_{13}^{c}{m}_{24}^{c}{m}_{32}^{p}-{m}_{14}^{c}{m}_{22}^{c}{m}_{33}^{p},$$

$${a}_{{y}_{1}}={m}_{21}^{c}{m}_{33}^{c}{m}_{34}^{p}{+m}_{23}^{c}{m}_{34}^{c}{m}_{31}^{p}+{m}_{24}^{c}{m}_{31}^{c}{m}_{33}^{p}{-m}_{21}^{c}{m}_{34}^{c}{m}_{33}^{p}-{m}_{23}^{c}{m}_{31}^{c}{m}_{34}^{p}-{m}_{24}^{c}{m}_{33}^{c}{m}_{31}^{p},$$

$${a}_{{y}_{2}}={m}_{11}^{c}{m}_{34}^{c}{m}_{33}^{p}{+m}_{13}^{c}{m}_{31}^{c}{m}_{34}^{p}+{m}_{14}^{c}{m}_{33}^{c}{m}_{31}^{p}{-m}_{11}^{c}{m}_{33}^{c}{m}_{34}^{p}-{m}_{13}^{c}{m}_{34}^{c}{m}_{31}^{p}-{m}_{14}^{c}{m}_{31}^{c}{m}_{33}^{p},$$

$${a}_{{y}_{3}}={m}_{11}^{c}{m}_{23}^{c}{m}_{34}^{p}{+m}_{13}^{c}{m}_{24}^{c}{m}_{31}^{p}+{m}_{14}^{c}{m}_{21}^{c}{m}_{33}^{p}{-m}_{11}^{c}{m}_{24}^{c}{m}_{33}^{p}-{m}_{13}^{c}{m}_{21}^{c}{m}_{34}^{p}-{m}_{14}^{c}{m}_{23}^{c}{m}_{31}^{p},$$

$${a}_{{z}_{1}}={m}_{21}^{c}{m}_{34}^{c}{m}_{32}^{p}{+m}_{22}^{c}{m}_{31}^{c}{m}_{34}^{p}+{m}_{24}^{c}{m}_{32}^{c}{m}_{31}^{p}{-m}_{21}^{c}{m}_{32}^{c}{m}_{34}^{p}-{m}_{22}^{c}{m}_{34}^{c}{m}_{31}^{p}-{m}_{24}^{c}{m}_{31}^{c}{m}_{32}^{p},$$

$${a}_{{z}_{2}}={m}_{11}^{c}{m}_{32}^{c}{m}_{34}^{p}{+m}_{12}^{c}{m}_{34}^{c}{m}_{31}^{p}+{m}_{14}^{c}{m}_{31}^{c}{m}_{32}^{p}{-m}_{11}^{c}{m}_{34}^{c}{m}_{32}^{p}-{m}_{12}^{c}{m}_{31}^{c}{m}_{34}^{p}-{m}_{14}^{c}{m}_{32}^{c}{m}_{31}^{p},$$

$${a}_{{z}_{3}}={m}_{11}^{c}{m}_{24}^{c}{m}_{32}^{p}{+m}_{12}^{c}{m}_{21}^{c}{m}_{34}^{p}+{m}_{14}^{c}{m}_{22}^{c}{m}_{31}^{p}{-m}_{11}^{c}{m}_{22}^{c}{m}_{34}^{p}-{m}_{12}^{c}{m}_{24}^{c}{m}_{31}^{p}-{m}_{14}^{c}{m}_{21}^{c}{m}_{32}^{p},$$

$${b}_{{x}_{1}}={m}_{22}^{c}{m}_{33}^{c}{m}_{24}^{p}{+m}_{23}^{c}{m}_{34}^{c}{m}_{22}^{p}+{m}_{24}^{c}{m}_{32}^{c}{m}_{23}^{p}{-m}_{22}^{c}{m}_{34}^{c}{m}_{23}^{p}-{m}_{23}^{c}{m}_{32}^{c}{m}_{24}^{p}-{m}_{24}^{c}{m}_{33}^{c}{m}_{22}^{p},$$

$${b}_{{x}_{2}}={m}_{12}^{c}{m}_{34}^{c}{m}_{23}^{p}{+m}_{13}^{c}{m}_{32}^{c}{m}_{24}^{p}+{m}_{14}^{c}{m}_{33}^{c}{m}_{22}^{p}{-m}_{12}^{c}{m}_{33}^{c}{m}_{24}^{p}-{m}_{13}^{c}{m}_{34}^{c}{m}_{22}^{p}-{m}_{14}^{c}{m}_{32}^{c}{m}_{23}^{p},$$

$${b}_{{x}_{3}}={m}_{12}^{c}{m}_{23}^{c}{m}_{24}^{p}{+m}_{13}^{c}{m}_{24}^{c}{m}_{22}^{p}+{m}_{14}^{c}{m}_{22}^{c}{m}_{23}^{p}{-m}_{12}^{c}{m}_{24}^{c}{m}_{23}^{p}-{m}_{13}^{c}{m}_{22}^{c}{m}_{24}^{p}-{m}_{14}^{c}{m}_{23}^{c}{m}_{22}^{p},$$

$${b}_{{y}_{1}}={m}_{21}^{c}{m}_{34}^{c}{m}_{23}^{p}{+m}_{23}^{c}{m}_{31}^{c}{m}_{24}^{p}+{m}_{24}^{c}{m}_{33}^{c}{m}_{21}^{p}{-m}_{21}^{c}{m}_{33}^{c}{m}_{24}^{p}-{m}_{23}^{c}{m}_{34}^{c}{m}_{21}^{p}-{m}_{24}^{c}{m}_{31}^{c}{m}_{23}^{p},$$

$${b}_{{y}_{2}}={m}_{11}^{c}{m}_{33}^{c}{m}_{24}^{p}{+m}_{13}^{c}{m}_{34}^{c}{m}_{21}^{p}+{m}_{14}^{c}{m}_{31}^{c}{m}_{23}^{p}{-m}_{11}^{c}{m}_{34}^{c}{m}_{23}^{p}-{m}_{13}^{c}{m}_{31}^{c}{m}_{24}^{p}-{m}_{14}^{c}{m}_{33}^{c}{m}_{21}^{p},$$

$${b}_{{y}_{3}}={m}_{11}^{c}{m}_{24}^{c}{m}_{23}^{p}{+m}_{13}^{c}{m}_{21}^{c}{m}_{24}^{p}+{m}_{14}^{c}{m}_{23}^{c}{m}_{21}^{p}{-m}_{11}^{c}{m}_{23}^{c}{m}_{24}^{p}-{m}_{13}^{c}{m}_{24}^{c}{m}_{21}^{p}-{m}_{14}^{c}{m}_{21}^{c}{m}_{23}^{p},$$

$${b}_{{z}_{1}}={m}_{21}^{c}{m}_{32}^{c}{m}_{24}^{p}{+m}_{22}^{c}{m}_{34}^{c}{m}_{21}^{p}+{m}_{24}^{c}{m}_{31}^{c}{m}_{22}^{p}{-m}_{21}^{c}{m}_{34}^{c}{m}_{22}^{p}-{m}_{22}^{c}{m}_{31}^{c}{m}_{24}^{p}-{m}_{24}^{c}{m}_{32}^{c}{m}_{21}^{p},$$

$${b}_{{z}_{2}}={m}_{11}^{c}{m}_{34}^{c}{m}_{22}^{p}{+m}_{12}^{c}{m}_{31}^{c}{m}_{24}^{p}+{m}_{14}^{c}{m}_{32}^{c}{m}_{21}^{p}{-m}_{11}^{c}{m}_{32}^{c}{m}_{24}^{p}-{m}_{12}^{c}{m}_{34}^{c}{m}_{21}^{p}-{m}_{14}^{c}{m}_{31}^{c}{m}_{22}^{p},$$

$${b}_{{z}_{3}}={m}_{11}^{c}{m}_{22}^{c}{m}_{24}^{p}{+m}_{12}^{c}{m}_{24}^{c}{m}_{21}^{p}+{m}_{14}^{c}{m}_{21}^{c}{m}_{22}^{p}{-m}_{11}^{c}{m}_{24}^{c}{m}_{22}^{p}-{m}_{12}^{c}{m}_{21}^{c}{m}_{21}^{p}-{m}_{14}^{c}{m}_{22}^{c}{m}_{21}^{p},$$

$${c}_{1}={m}_{21}^{c}{m}_{32}^{c}{m}_{33}^{p}{+m}_{22}^{c}{m}_{33}^{c}{m}_{31}^{p}+{m}_{23}^{c}{m}_{31}^{c}{m}_{32}^{p}{-m}_{21}^{c}{m}_{33}^{c}{m}_{32}^{p}-{m}_{22}^{c}{m}_{31}^{c}{m}_{33}^{p}-{m}_{23}^{c}{m}_{32}^{c}{m}_{31}^{p},$$

$${c}_{2}={m}_{11}^{c}{m}_{33}^{c}{m}_{32}^{p}{+m}_{12}^{c}{m}_{31}^{c}{m}_{33}^{p}+{m}_{13}^{c}{m}_{32}^{c}{m}_{31}^{p}{-m}_{11}^{c}{m}_{32}^{c}{m}_{33}^{p}-{m}_{12}^{c}{m}_{33}^{c}{m}_{31}^{p}-{m}_{13}^{c}{m}_{31}^{c}{m}_{32}^{p},$$

$${c}_{3}={m}_{11}^{c}{m}_{22}^{c}{m}_{33}^{p}{+m}_{12}^{c}{m}_{23}^{c}{m}_{31}^{p}+{m}_{13}^{c}{m}_{21}^{c}{m}_{32}^{p}{-m}_{11}^{c}{m}_{23}^{c}{m}_{32}^{p}-{m}_{12}^{c}{m}_{21}^{c}{m}_{33}^{p}-{m}_{13}^{c}{m}_{22}^{c}{m}_{31}^{p},$$

$${d}_{1}={m}_{21}^{c}{m}_{33}^{c}{m}_{22}^{p}{+m}_{22}^{c}{m}_{31}^{c}{m}_{23}^{p}+{m}_{23}^{c}{m}_{32}^{c}{m}_{21}^{p}{-m}_{21}^{c}{m}_{32}^{c}{m}_{23}^{p}-{m}_{22}^{c}{m}_{33}^{c}{m}_{21}^{p}-{m}_{23}^{c}{m}_{31}^{c}{m}_{22}^{p},$$

$${d}_{2}={m}_{11}^{c}{m}_{32}^{c}{m}_{23}^{p}{+m}_{12}^{c}{m}_{33}^{c}{m}_{21}^{p}+{m}_{13}^{c}{m}_{31}^{c}{m}_{22}^{p}{-m}_{11}^{c}{m}_{33}^{c}{m}_{22}^{p}-{m}_{12}^{c}{m}_{31}^{c}{m}_{23}^{p}-{m}_{13}^{c}{m}_{32}^{c}{m}_{21}^{p},$$

$${d}_{3}={m}_{11}^{c}{m}_{23}^{c}{m}_{22}^{p}{+m}_{12}^{c}{m}_{21}^{c}{m}_{23}^{p}+{m}_{13}^{c}{m}_{22}^{c}{m}_{21}^{p}{-m}_{11}^{c}{m}_{22}^{c}{m}_{23}^{p}-{m}_{12}^{c}{m}_{23}^{c}{m}_{21}^{p}-{m}_{13}^{c}{m}_{21}^{c}{m}_{22}^{p}.$$

## Acknowledgements

This work was sponsored, in part, by National Institute of Hometown Security, Somerset, KY, contract 17-07-UK, and by the Kentucky Science and Technology Company Inc., under contract KSEF-148-502-08-226.

## References and links

**1. **S. Zhang, D. Royer, and S. T. Yau, “High-resolution, real-time-geometry video acquisition system,” ACM SIG-GRAPH (2006).

**2. **C. Guan, L. G. Hassebrook, and D. L. Lau, “Composite structured light pattern for three-dimensional video,” Opt. Express **11**, 406–417 (2003). [CrossRef] [PubMed]

**3. **Y. Wang, Q. Hao, A. Fatehpuria, D. L. Lau, and L. G. Hassebrook, “Data acquisition and quality analysis of 3-dimensional fingerprints,” in IEEE conference on Biometrics, Identity and Security, Tampa, Florida, (2009).

**4. **J. Salvi, J. Pages, and J. Batlle, “Pattern codification strategies in structured light systems,” Pattern Recognition **37**, 827–849 (2004). [CrossRef]

**5. **S. Zhang, X. Li, and S.-T. Yau, “Multilevel quality-guided phase unwrapping algorithm for real-time three-dimensional shape reconstruction,” Appl. Opt. **46**, 50–57 (2007). [CrossRef]

**6. **S. Zhang, D. Royer, and S.-T. Yau, “GPU-assisted high-resolution, real-time 3-d shape measurement,” Opt. Express **14**, 9120–9129 (2006). [CrossRef] [PubMed]

**7. **J. Li, L. G. Hassebrook, and C. Guan, “Optimized two-frequency phase-measuring-profilometry light-sensor temporal-noise sensitivity,” J. Opt. Soc. Am. A **20**, 106–115 (2003). [CrossRef]

**8. **S. Zhang and P. S. Huang, “High-resolution, real-time three-dimensional shape measurement,” Opt. Eng. **45**, 123601 (2006). [CrossRef]

**9. **J. M. Huntley and H. O. Saldner, “Error—reduction methods for shape measurement by temporal phase unwrapping,” J. Opt. Soc. Am. A **14**, 3188–3196 (1997). [CrossRef]

**10. **S. Li, W. Chen, and X. Su, “Reliability—guided phase unwrapping in wavelet-transform profilometry,” Appl. Opt. **47**, 3369–3377 (2008). [CrossRef] [PubMed]

**11. **Y. Shi, “Robust phase unwrapping by spinning iteration,” Opt. Express **15**, 8059–8064 (2007). [CrossRef]

**12. **M. Costantini, “A novel phase unwrapping method based on network programming,” IEEE Trans. Geoscience and Remote Sensing **36**, 813–821 (1998). [CrossRef]

**13. **D. S. Mehta, S. K. Dubey, M. M. Hossain, and C. Shakher, “Simple multifrequency and phase—shifting fringe—projection system based on two—wavelength lateral shearing interferometry for three-dimensional profilometry,” Appl. Opt. **44**, 7515–7521 (2005). [CrossRef] [PubMed]

**14. **G. Pedrini, I. Alexeenko, W. Osten, and H. J. Tiziani, “Temporal phase unwrapping of digital hologram sequences,” Appl. Opt. **42**, 5846–5854 (2003). [CrossRef] [PubMed]

**15. **E.-H. Kim, J. Hahn, H. Kim, and B. Lee, “Profilometry without phase unwrapping using multi-frequency and four-step phase-shift sinusoidal fringe projection,” Opt. Express **17**, 7818–7830 (2009). [CrossRef] [PubMed]

**16. **J.-L. Li, H.-J. Su, and X.-Y. Su, “Two-frequency grating used in phase-measuring profilometry,” Appl. Opt. **36**, 277–280 (1997). [CrossRef] [PubMed]

**17. **W.-H. Su and H. Liu, “Calibration-based two-frequency projected fringe profilometry: a robust, accurate, and single-shot measurement for objects with large depth discontinuities,” Opt. Express **14**, 9178–9187 (2006). [CrossRef] [PubMed]

**18. **S. Y. Chen, Y. F. Li, and J. Zhang, “Vision processing for realtime 3-d data acquisition based on coded structured light,” IEEE Trans. Image Processing **17**, 167–176 (2008). [CrossRef]

**19. **M. Takeda and K. Mutoh, “Fourier transform profilometry for the automatic measurement of 3-d object shapes,” Appl. Opt. **22**, 3977–3982 (1983). [CrossRef] [PubMed]

**20. **T. P. Koninckx and L. V. Gool, “Real-time range acquisition by adaptive structured light,” IEEE Trans. Pattern Anal. Mach Intell. **28**, 432–445 (2006). [CrossRef] [PubMed]

**21. **S. Zhang and S.-T. Yau, “High-speed three-dimensional shape measurement system using a modified two-plus-one phase-shifting algorithm,” Opt. Eng. **46**, 113603 (2007). [CrossRef]

**22. **P. S. Huang and S. Zhang, “Fast three-step phase-shifting algorithm,” Appl. Opt. **45**, 5086–5091 (2006). [CrossRef] [PubMed]

**23. **M. Halioua and H. Liu, “Optical three-dimensional sensing by phase measuring profilometry,” Opt. Lasers in Eng. **11**, 185–215 (1989). [CrossRef]

**24. **S. F. Frisken, R. N. Perry, A. P. Rockwood, and T. R. Jones, “Adaptively sampled distance fields: A general representation of shape for computer graphics,” in Proceedings of the 27th annual conference on Computer graphics and interactive techniques, 249–254 (2000).

**25. **X. Su, G. von Bally, and D. Vukicevic, “Phase-stepping grating profilometry: utilization of intensity modulation analysis in complex objects evaluation,” Opt. Commun. **98**, 141–150 (1993). [CrossRef]

**26. **H. Guo and G. Liu, “Approximations for the arctangent function in efficient frige pattern analysis,” Opt. Express **15**, 3053–3066 (2007). [CrossRef] [PubMed]

**27. **P. L. Wizinowich, “Phase shifting interferometry in the presence of vibration: A new algorithm and system,” Appl. Opt. **29**, 3271–3279 (1990) [CrossRef] [PubMed]