## Abstract

We present a method for the visual measurement of the 3D position and orientation of a moving target. Three-dimensional sensing is based on stereo vision, while high resolution results from a pseudo-periodic pattern (PPP) fixed onto the target. The PPP is chosen to suit image processing based on phase computations. We describe the experimental setup, the image processing and the system calibration. The reported resolutions are in the micrometer range for the target position $(x, y, z)$ and $5.3 \times 10^{-4}$ rad for the target orientation $(\theta_x, \theta_y, \theta_z)$. These performances should be appreciated with respect to the vision system used: each image pixel corresponds to an actual area of $0.3 \times 0.3~\text{mm}^2$ on the target, while the PPP is made of 1 mm elementary dots with a period of 2 mm. Target tilts as large as $\pi/4$ with respect to the $Z$ axis of the system are allowed.

© 2010 Optical Society of America

## 1. Introduction

Stereo vision is a powerful concept for perceiving the environment in three dimensions. However, reconstructing three-dimensional scenes from binocular images is a difficult task, in which the computer vision community has been very active and creative. Recently, impressive progress has been reported in algorithms for depth reconstruction from stereo image pairs [1, 2], in the combination of monocular and binocular cues [3] and in the synthesis of new views from a limited set of fixed cameras [4, 5].

From the optical point of view, stereo vision involves a trade-off between the 3D definition achieved and the spatial extent of the resolved volume. Low imaging magnifications are required for both large depths of focus and wide fields of view, while high magnifications are necessary for image sharpness and the resulting improvement in 3D definition. This contradictory constraint can, however, be relaxed by performing subpixel (or subvoxel) image processing. Indeed, a high level of pixel interpolation allows significant improvements in image definition while retaining wide fields of observation and large depths of focus. Our paper introduces such a subvoxel approach in the specific case of the 3D localization of a labeled target.

Our approach differs from usual stereo vision since the observed scene is already known. Our purpose is to accurately localize a labeled target in all six degrees of freedom. The target label is a PPP chosen to achieve subvoxel resolution. Computations are thus based on convenient features known to be present in all recorded images. Such processing of an appropriate pattern has already been used successfully for two-dimensional measurements [6–8]. Here, this working principle is extended to three-dimensional measurements by using a stereo vision configuration. The setup described demonstrates 3D target localization with a position resolution in the micrometer range and an angular resolution of $5.3 \times 10^{-4}$ rad.

Our aim is to evaluate whether such a technique can provide a visual position sensor for the closed-loop servo-control of autonomous devices acting at the sub-millimeter scale. Our long-term purpose is to monitor the 3D motions of surgical tools for practitioner training and for the development of tele-operated instruments.

## 2. Measurement Principle

#### 2.1. Basics of phase computations for in-plane position measurement

In previous works [6–8], in-plane position measurements with subpixel resolution were demonstrated using a PPP such as the one represented in Fig. 1.a. This pattern, made of a regular distribution of dots, is fixed on the target and observed by the vision system. The Fourier spectrum of this PPP (Fig. 1.b) exhibits a few lobes corresponding to its spatial directions. A band-pass filter is then applied twice to this spectrum, centered on the lobes $(u_1, v_1)$ and $(u_2, v_2)$ respectively. After inverse Fourier transform, we obtain the magnitude image of Fig. 1.c and the wrapped phase map of Fig. 1.d, both corresponding to the lobe $(u_1, v_1)$. The magnitude image clearly determines the coarse position of the pattern, while the wrapped phase encodes its fine position with respect to the image pixel frame. Since the dot pattern is known to be regular, the phase map can be unwrapped (cf. Fig. 1.e) and fitted by a least-squares plane. The same filtering process is applied to the second spectral lobe $(u_2, v_2)$, and a second phase plane, perpendicular to the first one, is obtained as represented in Fig. 1.f. Since the PPP is known to be made of $N$ periods of dots, the unwrapped phase planes correspond to phase excursions of $(-N\pi, +N\pi)$ in both directions, with a phase equal to zero for the central dot. The unwrapped phase plane equations are thus adjusted with the appropriate $2k\pi$ constants, and the PPP center is obtained as the $(x, y)$ position at which both phase plane equations are equal to zero. This analytic determination of the center position provides subpixel resolution thanks to least-squares fitting and to noise rejection by spectral filtering. The complete mathematical description of the technique can be found elsewhere [9].
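A one-dimensional sketch of this phase computation may help fix ideas. It assumes NumPy and replaces the 2D dot pattern with a toy row profile (a raised cosine under a smooth Gaussian envelope; `simulate_row` is a hypothetical helper), and the band-pass half-width of half the fundamental frequency is an arbitrary choice. The filtering, unwrapping, least-squares fit and $2k\pi$ selection follow the steps described above, but this is an illustration, not the authors' implementation.

```python
import numpy as np

def simulate_row(n_pixels, period_px, center_px, envelope_px):
    """Toy 1D profile of a row of dots (hypothetical model): a raised cosine
    with zero phase at center_px, under a Gaussian envelope for the finite pattern."""
    x = np.arange(n_pixels, dtype=float)
    carrier = 0.5 + 0.5 * np.cos(2 * np.pi * (x - center_px) / period_px)
    envelope = np.exp(-0.5 * ((x - center_px) / envelope_px) ** 2)
    return carrier * envelope

def locate_center(profile, period_px):
    """Subpixel position where the unwrapped phase of the fundamental lobe is zero."""
    n = profile.size
    freqs = np.fft.fftfreq(n)                 # cycles per pixel
    f0 = 1.0 / period_px
    # Band-pass centred on the positive lobe only (rejects DC and the -f0 lobe).
    band = np.abs(freqs - f0) < 0.5 * f0
    analytic = np.fft.ifft(np.fft.fft(profile) * band)
    magnitude = np.abs(analytic)
    # The magnitude gives the coarse position and the support used for fitting.
    coarse = float(np.argmax(magnitude))
    support = np.nonzero(magnitude > 0.5 * magnitude.max())[0]
    phase = np.unwrap(np.angle(analytic[support]))
    slope, intercept = np.polyfit(support, phase, 1)   # least-squares phase line
    # Zeros of the line are at (2*k*pi - intercept) / slope; keep the one nearest
    # the coarse estimate, i.e. the central dot with zero unwrapped phase.
    k = np.round((slope * coarse + intercept) / (2 * np.pi))
    return (2 * np.pi * k - intercept) / slope

profile = simulate_row(256, 8.0, 127.37, 20.0)
print(locate_center(profile, 8.0))   # close to 127.37
```

In the 2D case the same steps are run once per spectral lobe, and the pattern center is where the two fitted phase planes both vanish.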

The experimental resolution depends on the actual signal-to-noise ratio; it is typically a few $10^{-3}$ pixel with a standard 8-bit CCD camera, while the in-plane orientation is also derived from the phase plane equations with a resolution better than $10^{-3}$ rad. In the case of Fig. 1.a, however, the pattern symmetry induces a $\pi/2$ ambiguity in the in-plane orientation; this point is dealt with in Section 3.1. Next, this subpixel method is applied to 3D measurements by using a stereo vision configuration.
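Turning two fitted phase planes into a subpixel position and an in-plane angle reduces to a 2×2 linear solve. The sketch below assumes each plane is stored as coefficients $(a, b, c)$ of $\varphi(x, y) = a x + b y + c$ in radians; `make_planes` is a hypothetical helper that builds synthetic planes for an assumed rotation, a 6.67-pixel period (2 mm divided by the roughly 0.3 mm pixel footprint) and an assumed center. Targets other than $(0, 0)$, such as $(\pm 10\pi, \pm 10\pi)$, locate other dots of the pattern. For a symmetric square lattice the angle is only defined modulo $\pi/2$, which is the ambiguity mentioned above.

```python
import math

def make_planes(theta, period_px, x0, y0):
    """Synthetic phase planes of a square dot lattice rotated by theta (rad),
    with zero phase at the central dot (x0, y0). Hypothetical test data."""
    k = 2 * math.pi / period_px
    c, s = math.cos(theta), math.sin(theta)
    plane1 = (k * c, k * s, -k * (c * x0 + s * y0))
    plane2 = (-k * s, k * c, -k * (-s * x0 + c * y0))
    return plane1, plane2

def solve_phase_point(plane1, plane2, phi1=0.0, phi2=0.0):
    """Pixel (x, y) at which plane1 equals phi1 and plane2 equals phi2."""
    a1, b1, c1 = plane1
    a2, b2, c2 = plane2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("phase planes are parallel")
    r1, r2 = phi1 - c1, phi2 - c2
    # Cramer's rule for a1*x + b1*y = r1, a2*x + b2*y = r2.
    return (r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det

def in_plane_angle(plane1):
    """Direction of the phase gradient, folded into [0, pi/2)."""
    a1, b1, _ = plane1
    return math.atan2(b1, a1) % (math.pi / 2)

def period_of(plane):
    """Pattern period in pixels: 2*pi divided by the phase-gradient norm."""
    a, b, _ = plane
    return 2 * math.pi / math.hypot(a, b)

p1, p2 = make_planes(0.3, 6.67, 320.25, 240.8)
print(solve_phase_point(p1, p2))                               # about (320.25, 240.8)
print(solve_phase_point(p1, p2, 10 * math.pi, 10 * math.pi))   # a diagonal dot
```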

#### 2.2. Stereo vision configuration and 3D work space calibration

The 3D capabilities of stereo vision are based on triangulation. The same point is seen at different positions in the left and right images because of their different observation angles. This mismatch between the positions of a point in the left and right images depends on its depth. Depth information is thus retrieved from the observed mismatch through a geometrical model of the setup. We use these geometrical properties in the usual way and simply exploit the subpixel positions of the PPP center in the left and right images to retrieve depth with improved resolution. The PPP is observed by means of the stereo vision system represented in Fig. 2. We used two monochrome CCD cameras (μeye UI-2210SE-M, 640 × 480 pixels) equipped with identical 12 mm focal length lenses (Computar 1:1.2, 1/2″). The target is placed at about 50 cm from the camera plane and consists of the PPP stuck on a rotation stage (Thorlabs CR1-Z7), itself mounted on an $XYZ$ combination of three linear displacement motors (Physik Instrumente M403.1DG).
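Why subpixel positions improve depth can be seen with the idealized rectified (parallel-axis) stereo model $Z = fB/d$, which is simpler than the converging, calibrated geometry actually used here. Differentiating gives $\Delta Z \approx Z^2 \Delta d / (fB)$. The numbers below are hypothetical except the 500 mm working distance: an assumed 10 μm pixel pitch behind the 12 mm lens (so $f \approx 1200$ pixels) and an assumed 150 mm baseline.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_mm):
    """Rectified pinhole stereo: depth Z = f * B / d (Z in the units of B)."""
    return focal_px * baseline_mm / disparity_px

def depth_change(depth_mm, focal_px, baseline_mm, disparity_change_px):
    """First-order depth change for a small disparity change: dZ = Z**2 / (f * B) * dd."""
    return depth_mm ** 2 / (focal_px * baseline_mm) * disparity_change_px

FOCAL_PX = 1200.0      # hypothetical: 12 mm focal length / 10 um pixel pitch
BASELINE_MM = 150.0    # hypothetical camera separation
Z_MM = 500.0           # working distance quoted in the text

d = FOCAL_PX * BASELINE_MM / Z_MM                        # 360 px of disparity
print(depth_from_disparity(d, FOCAL_PX, BASELINE_MM))    # 500.0
for step in (1.0, 1e-2, 2e-3):     # whole-pixel versus subpixel matching
    um = 1000 * depth_change(Z_MM, FOCAL_PX, BASELINE_MM, step)
    print(f"disparity step {step:g} px -> depth step {um:.1f} um")
```

Under these assumptions a whole-pixel disparity step is worth more than a millimeter of depth, while a few thousandths of a pixel bring it down to the micrometer range.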

The geometrical characterization of our setup results from a calibration procedure, carried out with a toolbox widely used in the computer vision community and freely available on the internet [10]. For this purpose, we used a set of planar checkerboards as observed objects, as described in the toolbox protocol. The intrinsic and extrinsic parameters of our setup are summarized in Table 1. They allow the $(X, Y, Z)$ position of the pattern center to be derived from its subpixel positions in the left and right images.
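Once intrinsic and extrinsic parameters are known, each pair of subpixel positions can be turned into a 3D point. The sketch below uses standard linear (DLT) triangulation with NumPy; it is one common choice and not necessarily what the authors' geometrical model does, and the calibration values are hypothetical placeholders for Table 1. OpenCV's `cv2.triangulatePoints` performs the same kind of linear step.

```python
import numpy as np

def projection_matrix(K, R, t):
    """P = K [R | t] for a pinhole camera."""
    return K @ np.hstack([R, np.reshape(t, (3, 1))])

def project(P, X):
    """Pixel coordinates (u, v) of the 3D point X."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P_left, P_right, uv_left, uv_right):
    """Linear (DLT) triangulation from two subpixel image positions."""
    rows = []
    for P, (u, v) in ((P_left, uv_left), (P_right, uv_right)):
        rows.append(u * P[2] - P[0])   # u * (p3 . X) - p1 . X = 0
        rows.append(v * P[2] - P[1])   # v * (p3 . X) - p2 . X = 0
    _, _, vt = np.linalg.svd(np.array(rows))
    X = vt[-1]                         # null vector of the 4x4 system
    return X[:3] / X[3]

def rot_y(beta):
    c, s = np.cos(beta), np.sin(beta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Hypothetical calibration: identical intrinsics, right camera 150 mm to the
# side and turned 0.3 rad toward the left camera's optical axis.
K = np.array([[1200.0, 0.0, 320.0], [0.0, 1200.0, 240.0], [0.0, 0.0, 1.0]])
R_right = rot_y(0.3)
C_right = np.array([150.0, 0.0, 0.0])          # right camera centre (mm)
P_left = projection_matrix(K, np.eye(3), np.zeros(3))
P_right = projection_matrix(K, R_right, -R_right @ C_right)

X_true = np.array([5.0, -3.0, 500.0])          # mm, left-camera frame
uv_l, uv_r = project(P_left, X_true), project(P_right, X_true)
print(triangulate(P_left, P_right, uv_l, uv_r))  # close to [5, -3, 500]
```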

## 3. Experiments and Results

#### 3.1. Angular measurements

The complete characterization of the target position requires the measurement of the angles $(\Theta_X, \Theta_Y, \Theta_Z)$ to complement the $(X, Y, Z)$ coordinates of the target center. The angles are retrieved by determining the 3D positions of four additional points on the pattern, as illustrated in Fig. 3.b. Figure 3 presents images of the pattern recorded simultaneously by the left and right cameras. We first notice that one white dot is missing on the bottom-right diagonal of the pattern. This dot was removed deliberately in order to break the pattern symmetry and thus avoid $\pi/2$ ambiguities in the in-plane orientation. In Fig. 3.b, the three dots equivalent to the missing one under the fourfold symmetry of the pattern are marked with red arrows. This set of four points, i.e. the missing one and its three counterparts, corresponds to unwrapped phase values equal to $(-10\pi, -10\pi)$, $(-10\pi, 10\pi)$, $(10\pi, -10\pi)$ and $(10\pi, 10\pi)$. The subpixel coordinates corresponding to these phase values are derived from the least-squares phase plane equations in both images. These coordinates are then fed into the geometrical model obtained from the system calibration to yield the 3D positions of these four additional points. The target orientation $(\Theta_X, \Theta_Y, \Theta_Z)$ is then derived from their spatial distribution with respect to the target center position. At this stage, however, $\Theta_Z$ is obtained with a $\pi/2$ ambiguity. This ambiguity is easily removed by identifying the position of the missing dot: the gray level intensity is checked in the recorded images at the four points marked in Fig. 3.b, and the missing dot is the one with the lowest observed intensity. In this way, the full target position and orientation are reconstructed unambiguously.
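The orientation step above is described qualitatively. One standard way to implement it, not necessarily the authors', is to fit a rigid rotation (the Kabsch/SVD method) between nominal pattern-frame coordinates of the four diagonal points and their triangulated 3D positions, then read Euler angles from it. The sketch assumes NumPy, pattern-frame coordinates of ±10 mm (five 2 mm periods, from the ±10π phases), the missing dot at (10, −10) mm, a Z·Y·X Euler convention, and that a π/2 mislabeling of the phase lobes appears as a cyclic shift of the four points; all of these are assumptions. As in the text, the darkest of the four sampled gray levels selects the correct labeling.

```python
import numpy as np

# Hypothetical pattern-frame coordinates (mm) of the dots at phases (+-10pi, +-10pi),
# in counter-clockwise order; index 3, (10, -10), is assumed to be the missing dot.
MODEL = np.array([[10.0, 10.0, 0.0], [-10.0, 10.0, 0.0],
                  [-10.0, -10.0, 0.0], [10.0, -10.0, 0.0]])
MISSING = 3

def fit_rotation(model, measured):
    """Kabsch: rotation R minimising sum ||R @ m_i + T - p_i||^2."""
    a = model - model.mean(axis=0)
    b = measured - measured.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # rule out a reflection
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

def zyx_angles(R):
    """(theta_x, theta_y, theta_z) assuming R = Rz @ Ry @ Rx."""
    return (np.arctan2(R[2, 1], R[2, 2]),
            -np.arcsin(np.clip(R[2, 0], -1.0, 1.0)),
            np.arctan2(R[1, 0], R[0, 0]))

def target_orientation(measured, intensities):
    """measured[i]: 3D point labelled as MODEL[i] from the phase planes, which is
    only correct up to a pi/2 relabelling; the darkest point is the missing dot."""
    j = int(np.argmin(intensities))
    shift = (MISSING - j) % 4
    model = np.roll(MODEL, -shift, axis=0)    # model[i] = MODEL[(i + shift) % 4]
    return zyx_angles(fit_rotation(model, measured))
```

The target center could be added as a fifth correspondence at (0, 0, 0); with noisy data this slightly stabilizes the fitted rotation.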

Figure 4 presents the target orientation as reconstructed experimentally for uniformly distributed angle values. In the figure, the perimeter formed by the four diagonal points is represented, while the red arrow starting from the missing dot position indicates the pattern normal for each tilt angle. In practice, the PPP visibility is progressively degraded as the target normal rotates away from the optical axis of the vision system. As can be seen in Fig. 4, tilt angles as large as $\pm\pi/4$ are allowed around the $X$ and $Y$ axes, while the full $2\pi$ range is resolved around $Z$. The actual PPP visibility can be appreciated in the videos of Fig. 5, which present the images recorded by the left and right cameras during the target rotation as well as the reconstructed positions. The target position is correctly reconstructed even though the recorded shape of the PPP changes with the tilt angle. This holds even for the rotation around the $X$ axis, which produces larger and asymmetric pattern distortions.

The angular measurements presented in this paper are limited to rotations around a single axis because of hardware limitations: the only rotation stage available was mounted successively in the appropriate positions to explore the three possible rotations. More complex target trajectories could therefore not be performed to illustrate the six-degree-of-freedom capabilities of the method. However, any orientation can be obtained by composing rotations about the three explored axes and should therefore be handled by the proposed scheme.

#### 3.2. Method repeatability

We evaluated the capabilities of the method through repeatability measurements and displacement reconstructions. Our repeatability test consists of performing one hundred measurements without any motorized displacement of the target. This elementary test was repeated at 51 target locations spaced 20 μm apart along the $Z$ direction, for a total excursion of 1 mm. The dispersions of the reconstructed position and orientation were evaluated at each target location, giving the statistics presented in Table 2. In the table, the *Worst* and *Best* lines are each based on a single set of 100 measurements, while the *Mean* lines result from all 5100 measurements. In the first three columns, the *best* and *worst* position values are in good agreement with the 1 mm displacement applied to the target. In the statistical position parameters, a factor of about four is observed between the worst and best cases. One reason for this is the sensitivity of the setup to external disturbances; however, we are unable to separate the contribution of the method itself from that of environmental disturbances. We therefore consider the average over the 51 data sets to be an imperfect but representative estimate of the method repeatability. Given by the observed standard deviations, it is 0.53, 0.52 and 2.06 μm along the $X$, $Y$ and $Z$ axes respectively. The resulting *standard deviation volume* is thus as small as 0.57 μm³. The dispersion of the reconstructed positions is represented in Fig. 6 for two cases. Fig. 6.a corresponds to a position whose statistics are close to the average ones; its full scales are 2.5 μm in both $X$ and $Y$ and 10 μm in $Z$. These should be compared with the optical resolution of the vision system, equal to 60 μm in the object plane. Fig. 6.b corresponds to the worst case observed among the 51 data sets. Clearly, a few points move away from the main cloud. The dotted lines between points show that these points result from consecutively recorded images, so this continuous perturbation can be attributed to environmental disturbances occurring during data recording. In spite of this external noise, the full scales are only 7 μm in both $X$ and $Y$ and 30 μm in $Z$, and the standard deviation of the $(x, y)$ coordinates corresponds to only $2 \times 10^{-2}$ image pixel. These results demonstrate that stereo vision benefits from the subpixel capabilities of phase measurements to achieve subvoxel resolution in 3D target position reconstruction.
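The repeatability statistics are easy to reproduce from raw measurements. The sketch below uses Python's `statistics.stdev` (sample standard deviation, $n-1$ denominator); whether the authors used the sample or population form is not stated, and with 100 measurements the two differ by about 0.5 %. The function name `repeatability` is hypothetical. Multiplying the three mean standard deviations reproduces the 0.57 μm³ quoted above.

```python
import statistics

def repeatability(points):
    """Per-axis sample standard deviations of repeated (x, y, z) measurements of a
    static target, and their product (the 'standard deviation volume')."""
    sigmas = tuple(statistics.stdev([p[axis] for p in points]) for axis in range(3))
    return sigmas, sigmas[0] * sigmas[1] * sigmas[2]

# Mean standard deviations reported in the text (um).
sx, sy, sz = 0.53, 0.52, 2.06
print(f"standard deviation volume: {sx * sy * sz:.2f} um^3")   # 0.57 um^3
```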

Concerning orientation repeatability, the lower part of Table 2 shows that the variations in the mean target orientation remain below $10^{-4}$ rad over the 51 target positions and for the three rotation axes. Peak-to-valley fluctuations are half those observed for position, while the standard deviation remains very stable over the 51 data sets and below $2 \times 10^{-4}$ rad. Orientation thus seems less sensitive than position to external disturbances, while also providing an excellent level of performance.

#### 3.3. Displacement reconstruction and method specifications

Figure 7 presents the target displacements as reconstructed with our method while a driving signal tracing the acronyms of the authors' institutions inside a diamond-shaped frame was applied to the translation stages. The 3D position sensing capabilities of the method are clearly validated, and the full-scale extent of the figure along the three directions demonstrates its high level of performance. In the zoom of Fig. 7.b, the 200 μm full scale in $X$ and $Y$ corresponds to a pattern displacement of only 2/3 of a pixel in the recorded images. In spite of this small shift, the complete acronyms can be reconstructed within this tiny volume thanks to the subvoxel capabilities of the method. A detailed view of this measurement is presented in the video of Fig. 8, which makes the subvoxel capabilities of the method more evident.

##### Measurement volume

The measurement volume could be described by the usual laws of geometrical optics as a function of sensor size, working distance, and focal length and aperture of the lens. The observation field of the stereo vision configuration can also be derived from Fig. 3. The PPP is made of 15 × 15 dots with a 2 mm period. The PPP extent is thus 29 mm in both directions, and the observation field is evaluated to be 161 × 116 mm². To perform 3D position measurements, the PPP has to remain within the recorded images. The allowed $XY$ target displacements are thus reduced to 128 × 83 mm², i.e. 106 cm² (with a 2 mm margin kept around the image). The depth of focus was measured to be larger than 200 mm. Furthermore, the subpixel method has been found to accommodate defocus and allows measurements over extended depths [11]. The measurement volume is thus of the order of $10^3$ cm³ or even larger.
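The arithmetic behind these figures can be checked in a few lines. The sketch assumes that the pattern extent is $(n-1)$ periods plus one 1 mm dot, that the 2 mm margin applies on each side of the field, and it takes the 200 mm depth of focus as a lower bound, so the resulting volume is a lower estimate.

```python
def pattern_extent_mm(n_dots, period_mm, dot_mm):
    """Edge-to-edge size of a row of n_dots dots of size dot_mm spaced period_mm."""
    return (n_dots - 1) * period_mm + dot_mm

def allowed_travel_mm(field_mm, extent_mm, margin_mm):
    """Travel keeping the whole pattern in the field, with a margin on each side."""
    return field_mm - extent_mm - 2 * margin_mm

extent = pattern_extent_mm(15, 2.0, 1.0)        # 29 mm
tx = allowed_travel_mm(161.0, extent, 2.0)      # 128 mm
ty = allowed_travel_mm(116.0, extent, 2.0)      # 83 mm
area_cm2 = tx * ty / 100.0                      # about 106 cm^2
volume_cm3 = area_cm2 * 200.0 / 10.0            # times >= 20 cm depth: about 2.1e3 cm^3
print(extent, tx, ty, round(area_cm2, 1), round(volume_cm3))
```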

##### Method resolution

The method resolution is derived from the repeatability tests and, following a common statistical rule [12], is defined as three times the observed standard deviation. With this definition, the angular resolution is the same for the three directions and equal to $5.3 \times 10^{-4}$ rad. The position resolution is 1.6 μm in the $X$ and $Y$ directions and 6.2 μm in $Z$. Given the magnification of the vision system, a single image pixel corresponds to an actual size of about 300 μm on the target, so the subvoxel capabilities of the method are clearly demonstrated. The lower performance along the $Z$ direction is due to the geometrical configuration: the ratio between lateral and depth resolutions depends on the angle formed by the optical axes of the two cameras. These capabilities are valid for the configuration used in the experiments; the resolution can be adjusted through the system parameters, especially the camera signal-to-noise ratio and the optical magnification. In the latter case, the measurement volume is also affected.
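The 3σ figures follow directly from the mean standard deviations reported in Section 3.2. The sketch below recomputes them and expresses them as fractions of the roughly 300 μm object-side pixel quoted above; it also back-computes the angular standard deviation implied by $5.3 \times 10^{-4}$ rad, which agrees with the bound below $2 \times 10^{-4}$ rad stated earlier. The names are illustrative only.

```python
PIXEL_UM = 300.0                                # object-side pixel size quoted in the text
SIGMA_UM = {"X": 0.53, "Y": 0.52, "Z": 2.06}    # mean standard deviations (um)
SIGMA_ANGLE_RAD = 5.3e-4 / 3                    # implied by the 3-sigma angular resolution

def resolution(sigma, k=3.0):
    """Resolution taken as k standard deviations (k = 3 in the text)."""
    return k * sigma

for axis, sigma in SIGMA_UM.items():
    r = resolution(sigma)
    print(f"{axis}: {r:.1f} um, i.e. {r / PIXEL_UM:.4f} pixel")
print(f"implied angular sigma: {SIGMA_ANGLE_RAD:.2e} rad")
print(f"200 um full scale of Fig. 7.b: {200 / PIXEL_UM:.2f} pixel")
```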

The conversion from method resolution to method precision or uncertainty is application-dependent, since systematic errors might be introduced when passing from the stereo vision coordinate system to the application coordinate system. Such systematic errors are avoided by choosing the stereo vision coordinate system as the application position reference. In this case, measurement precision and uncertainty are equal to the resolution as defined by the $3\sigma$ values.

## 4. Conclusions

This paper presents a stereo vision configuration suited for the 3D positioning of a labeled target in all six degrees of freedom with subvoxel resolution. In a stereo vision configuration, subvoxel capability is valuable since it relaxes the antagonism between resolution, depth of focus and extent of the field of observation. In our demonstration, the resolution obtained is in the micrometer range, which usually requires the use of microscope objectives. By achieving this with a 12 mm focal length lens, we make this micrometer-range resolution compatible with an extended resolved space of about 10 cm along each spatial direction. In the present configuration, the PPP is assumed to be present in each recorded image. This restriction can be removed by using a position encryption technique in the pattern design to accommodate any field of observation, as demonstrated in other works [13, 14].

The experiments reported here were carried out with postprocessing based on Matlab routines. Video-rate measurements are not yet possible, but a bandwidth of a few hertz could be achieved by choosing a more time-efficient programming language. Furthermore, the current development of FPGA-based image processing [15] is very promising for future high-rate reconstruction of the target position.

## Acknowledgments

The authors acknowledge the Administrative Department of Science, Technology and Innovation of Colombia (Colciencias) and the Universidad Industrial de Santander, Colombia, for financial support of the first author's doctoral studies, which made this work possible.

## References and links

**1. **T. Kanade and C. Zitnick, “A cooperative algorithm for stereo matching and occlusion detection,” IEEE Trans. Pattern Anal. Mach. Intell. **22**(7), 675–684 (2000).

**2. **D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” Int. J. Comput. Vis. **47**(1), 7–42 (2002).

**3. **A. Saxena, S. H. Chung, and A. Y. Ng, “3-D depth reconstruction from a single still image,” Int. J. Comput. Vis. **76**(1), 53–69 (2008).

**4. **W. Matusik, C. Buehler, R. Raskar, S. J. Gortler, and L. McMillan, “Image-based visual hulls,” in *Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques* (ACM Press/Addison-Wesley Publishing Co., 2000), pp. 369–374.

**5. **C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, “High-quality video view interpolation using a layered representation,” *ACM SIGGRAPH 2004 Papers*, (ACM, 2004), pp. 600–608.

**6. **P. Sandoz, J. C. Ravassard, S. Dembélé, and A. Janex, “Phase-sensitive vision technique for high accuracy position measurement of moving targets,” IEEE Transactions on Instrumentation and Measurement **49**(44), 867–872 (2000).

**7. **P. Sandoz, V. Bonnans, and T. Gharbi, “High-accuracy position and orientation measurement of extended two-dimensional surfaces by a phase-sensitive vision method,” Appl. Opt. **41**(26), 5503–5511 (2002).

**8. **P. Sandoz, B. Trolard, D. Marsaut, and T. Gharbi, “Microstructured surface element for high-accuracy position measurement by vision and phase measurement,” Ibero-American Conf. RIAO-OPTILAS, Proc. SPIE **5622**, 606–611 (2004).

**9. **P. Sandoz, J. M. Friedt, and E. Carry, “In-plane rigid-body vibration mode characterization with a nanometer resolution by stroboscopic imaging of a microstructured pattern,” Rev. Sci. Instrum. **78**, 023706 (2007).

**10. **J. Y. Bouguet, Camera calibration toolbox for matlab (2008). *http://www.vision.caltech.edu/bouguetj/calibdoc.*

**11. **J. A. Galeano-Zea, P. Sandoz, and L. Robert, “Position encryption of extended surfaces for subpixel localization of small-sized fields of observation,” in *Proc. IEEE on International Symposium on Optomechatronic Technologies*, (IEEE, 2009), pp. 21–27.

**12. **R. J. Hansman, “Characteristics of instrumentation,” in *The Measurement, Instrumentation, and Sensors Handbook*, J. G. Webster, ed. (Springer-Verlag, 1999).

**13. **P. Sandoz, R. Zeggari, L. Froelhy, J. L. Prétet, and C. Mougin, “Position referencing in optical microscopy thanks to sample holders with out-of-focus encoded patterns,” J. Microsc. **255**(3), 293–303 (2007).

**14. **J. A. Galeano-Zea, P. Sandoz, E. Gaiffe, J. L. Prétet, and C. Mougin, “Pseudo-periodic encryption of extended 2-D surfaces for high accurate recovery of any random zone by vision,” Int. J. Optomechatronics **4**(1), 65–82 (2010).

**15. **J. Batlle, J. Marti, P. Ridao, and J. Amat, “A new FPGA/DSP-based parallel architecture for real-time image processing,” Real-Time Imaging **8**(5), 345–356 (2002).