
3D localization of a labeled target by means of a stereo vision configuration with subvoxel resolution

Open Access

Abstract

We present a method for the visual measurement of the 3D position and orientation of a moving target. Three-dimensional sensing is based on stereo vision, while high resolution results from a pseudo-periodic pattern (PPP) fixed onto the target. The PPP is suited to image processing based on phase computations. We describe the experimental setup, the image processing and the system calibration. The resolutions reported are in the micrometer range for the target position (x, y, z) and 5.3 × 10⁻⁴ rad for the target orientation (θx, θy, θz). These performances have to be appreciated with respect to the vision system used: every image pixel corresponds to an actual area of 0.3 × 0.3 mm² on the target, while the PPP is made of elementary dots of 1 mm with a period of 2 mm. Target tilts as large as π/4 are allowed with respect to the Z axis of the system.

© 2010 Optical Society of America

1. Introduction

Stereo vision is a powerful concept for perceiving the environment in three dimensions. However, the reconstruction of three-dimensional scenes from binocular images is a difficult task in which the computer vision community has been very active and creative. Recently, impressive progress has been reported in algorithms for depth reconstruction from stereo image pairs [1, 2], in the combination of monocular and binocular cues [3], and in the synthesis of new views from a limited set of fixed cameras [4, 5].

From the optical point of view, stereo vision assumes a trade-off between the 3D definition achieved and the spatial extent of the resolved volume. Low imaging magnifications are required for both an increased depth of focus and a wide field of view, while high magnifications are necessary for image sharpness and the consequent 3D definition improvements. This contradictory constraint can, however, be relaxed by performing subpixel (or subvoxel) image processing. Indeed, a high level of pixel interpolation allows significant image definition improvements while accommodating wide observation fields and depths of focus. Our paper introduces such a subvoxel approach in the specific case of the 3D localization of a labeled target.

Our approach differs from usual stereo vision since the observed scene is already known. Our purpose is to accurately localize a labeled target in all six degrees of freedom. The target label is made of a PPP chosen to achieve subvoxel resolutions. Computations are thus based on convenient features known to be present in all recorded images. Such processing of an appropriate pattern has already been used successfully in the case of two-dimensional measurements [6–8]. This working principle is extended here to three-dimensional measurements by using a stereo vision configuration. The setup described allows the demonstration of 3D target localization with a position resolution in the micrometer range and an angular resolution of 5.3 × 10⁻⁴ rad.

Our aim is to evaluate the capability of such a technique to provide a visual position sensor to be used in a closed loop for the servo control of autonomous devices acting at the sub-millimeter scale. Our long-term purpose is to monitor the 3D motions of surgical tools for practitioner training and for the development of tele-operated instruments.

2. Measurement Principle

2.1. Basics of phase computations for in-plane position measurement

In previous works [6–8], in-plane position measurements with subpixel resolution were demonstrated using a PPP as represented in Fig. 1.a. This pattern, made of a regular distribution of dots, is fixed on the target and observed by the vision system. The Fourier spectrum of this PPP (Fig. 1.b) exhibits a few lobes corresponding to its spatial directions. A bandpass filter is then applied twice to this spectrum, centered on the lobes (u1, v1) and (u2, v2) respectively. After inverse Fourier transform, we obtain the magnitude image of Fig. 1.c and the wrapped phase map of Fig. 1.d, both corresponding to the lobe (u1, v1). The magnitude image clearly determines the coarse position of the pattern, while the wrapped phase encodes its fine position with respect to the image pixel frame. Since the dot pattern is known to be regular, the phase map can be unwrapped (cf. Fig. 1.e) and fitted by a least-squares plane. The same filtering process is applied to the second spectral lobe (u2, v2), and a second phase plane, perpendicular to the first one, is obtained as represented in Fig. 1.f. Since the PPP is known to be made of N periods of dots, the unwrapped phase planes correspond to phase excursions of (−Nπ, +Nπ) in both directions, with a phase equal to zero for the central dot. The unwrapped phase plane equations are thus adjusted with the appropriate 2kπ constants, and the PPP center is obtained as the (x, y) position for which both phase plane equations are equal to zero. This analytic determination of the center position provides subpixel resolution thanks to least-squares fitting and to noise rejection by spectral filtering. The complete mathematical description of the technique can be found elsewhere [9].
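
As an illustration of this processing chain, the Python sketch below performs the bandpass filtering of one spectral lobe, the phase unwrapping and the least-squares plane fit, then intersects the two zero-phase planes to obtain the center. It is a minimal sketch, not the authors' implementation (their routines were written in Matlab); the function names, the lobe coordinates and the filter half-width `bw` are illustrative assumptions, and the 2kπ adjustment that anchors zero phase on the central dot is omitted for brevity.

```python
import numpy as np
from skimage.restoration import unwrap_phase  # 2D phase unwrapping

def phase_plane(img, lobe, bw=4):
    """Bandpass-filter one spectral lobe, unwrap the resulting phase and
    fit it by a least-squares plane a*x + b*y + c. Returns (a, b, c)."""
    F = np.fft.fftshift(np.fft.fft2(img))
    mask = np.zeros(F.shape)
    r, c = lobe                                   # lobe (row, col) in the shifted spectrum
    mask[r - bw:r + bw + 1, c - bw:c + bw + 1] = 1.0
    g = np.fft.ifft2(np.fft.ifftshift(F * mask))
    mag, phi = np.abs(g), unwrap_phase(np.angle(g))
    keep = mag > 0.5 * mag.max()                  # fit only inside the modulus window
    ys, xs = np.nonzero(keep)
    A = np.column_stack([xs, ys, np.ones(xs.size)])
    (a, b, c), *_ = np.linalg.lstsq(A, phi[keep], rcond=None)
    return a, b, c

def pattern_center(img, lobe1, lobe2):
    """Subpixel (x, y) where both fitted phase planes cross zero."""
    a1, b1, c1 = phase_plane(img, lobe1)
    a2, b2, c2 = phase_plane(img, lobe2)
    M = np.array([[a1, b1], [a2, b2]])
    return np.linalg.solve(M, [-c1, -c2])         # center in pixel coordinates
```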

Fig. 1 Fourier processing of the dot pattern: (a) recorded image of the pattern; (b) Fourier spectrum of the pattern image; (c) modulus and (d) wrapped phase of the inverse Fourier transform after bandpass filtering of lobe (u1, v1); (e) unwrapped phase of (d) within the modulus window; (f) unwrapped phase in the complementary direction, i.e. from lobe (u2, v2).

The experimental resolution depends on the actual signal-to-noise ratio; it is usually of a few 10⁻³ pixel using a standard 8-bit CCD camera, while the in-plane orientation is also derived from the phase plane equations with a resolution better than 10⁻³ rad. In the case of Fig. 1.a, however, the pattern symmetry induces a π/2 ambiguity in the in-plane orientation; this point will be dealt with later. Next, this subpixel method is applied to 3D measurements by using a stereo vision configuration.

2.2. Stereo vision configuration and 3D work space calibration

The 3D capabilities of stereo vision are based on triangulation: the same point is seen at different positions in the left and right images because of the different observation angles. This mismatch between the positions of a point in the left and right images depends on its depth, so depth information is retrieved from the observed mismatch through a geometrical model of the setup. We use these geometrical properties in the usual way, simply benefiting from the subpixel positions of the PPP center in the left and right images to retrieve depth with improved resolution. The PPP is observed by means of a stereo vision system as represented in Fig. 2. We used two monochrome CCD cameras (μEye UI-2210SE-M, 640 × 480 pixels) equipped with identical 12 mm focal length lenses (Computar 1:1.2, 1/2″). The target is placed at about 50 cm from the camera plane and is made of the PPP stuck on a rotation stage (Thorlabs CR1-Z7), itself placed on an XYZ combination of three linear displacement motors (Physik Instrumente M-403.1DG).
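
As a hedged illustration of the triangulation step, the following sketch recovers a 3D point from the pair of subpixel image coordinates and the two 3 × 4 projection matrices delivered by calibration. It uses the standard linear (DLT) method, not necessarily the authors' exact implementation.

```python
import numpy as np

def triangulate(P_left, P_right, xl, xr):
    """Linear (DLT) triangulation: each view contributes two rows of a
    homogeneous system A X = 0, solved in the least-squares sense by SVD."""
    def rows(P, xy):
        x, y = xy
        return np.array([x * P[2] - P[0],
                         y * P[2] - P[1]])
    A = np.vstack([rows(P_left, xl), rows(P_right, xr)])
    X = np.linalg.svd(A)[2][-1]      # right singular vector of smallest singular value
    return X[:3] / X[3]              # inhomogeneous (X, Y, Z)
```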

Fig. 2 Stereo vision configuration and localization of the measurement volume (about 10 cm wide in X and Y, and 20 cm in Z).

The geometrical characterization of our setup results from a calibration procedure, carried out with a specific toolbox widely used in the computer vision community and freely available on the internet [10]. For this purpose, we used a set of planar checkerboards as observed objects, as described in the toolbox protocol. The intrinsic and extrinsic parameters of our setup are summarized in Table 1. They allow the derivation of the (X, Y, Z) position of the pattern center from its pair of subpixel positions in the left and right images.
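
As a sketch of how such calibration parameters are used, the intrinsic matrices and the stereo extrinsics (R, t) can be assembled into the projection matrices required by the triangulation above. All numerical values below are placeholders for illustration only, not the calibrated values of Table 1.

```python
import numpy as np

def projection_matrix(K, R=np.eye(3), t=np.zeros(3)):
    """P = K [R | t], with the left camera frame taken as the world frame."""
    return K @ np.hstack([R, t.reshape(3, 1)])

# Placeholder intrinsics (focal lengths and principal point, in pixels).
K = np.array([[2000.0, 0.0, 320.0],
              [0.0, 2000.0, 240.0],
              [0.0, 0.0, 1.0]])
# Placeholder extrinsics: vergence rotation about Y and baseline along X.
a = np.deg2rad(12.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-100.0, 0.0, 0.0])    # baseline in mm
P_left, P_right = projection_matrix(K), projection_matrix(K, R, t)
```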

3. Experiments and Results

3.1. Angular measurements

The complete characterization of the target position requires the measurement of the angles (ΘX, ΘY, ΘZ) to complement the (X, Y, Z) coordinates of the target center. The angles are retrieved through the determination of the 3D positions of four complementary points on the pattern, as illustrated in Fig. 3.b. Figure 3 presents images of the pattern as recorded simultaneously by the left and right cameras. We first notice that one white dot is missing on the right-bottom diagonal of the pattern. This point was removed deliberately in order to break the pattern symmetry and thus avoid π/2 ambiguities in the in-plane orientation. In Fig. 3.b, the dots symmetrical to the missing one with respect to the pattern corners are marked with red arrows. This set of four points, i.e. the missing one and its three counterparts, corresponds to unwrapped phase values of (−10π, −10π), (−10π, 10π), (10π, −10π) and (10π, 10π). The subpixel coordinates corresponding to these phase values are derived from the least-squares phase plane equations in both images. These coordinates are then fed into the geometrical model issued from the system calibration to obtain the 3D positions of these four additional points. The target orientation (ΘX, ΘY, ΘZ) is then derived from their spatial distribution with respect to the target center position. At this stage, however, ΘZ is obtained with a π/2 ambiguity. The latter is easily removed by identifying the position of the missing dot: the gray-level intensity is checked in the recorded images at the four complementary points marked in Fig. 3.b, and the missing dot is simply the position with the lowest intensity observed. In this way, the target position is finally reconstructed unambiguously.
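
The sketch below illustrates one way to derive the orientation from the four triangulated points and to resolve the π/2 ambiguity from the gray levels. It is a hedged reconstruction under stated assumptions: the paper does not specify the authors' Euler convention, so a common X-Y-Z convention is used, and all names are illustrative.

```python
import numpy as np

def target_orientation(p_mm, p_mp, p_pm, p_pp):
    """Orientation from the four 3D points reconstructed at unwrapped phase
    values (s1*10π, s2*10π); p_ab is the point with signs (a, b) = (±, ±).
    Returns (theta_x, theta_y, theta_z) for an X-Y-Z Euler convention."""
    ex = (p_pm + p_pp) - (p_mm + p_mp)              # direction of increasing phase 1
    ex = ex / np.linalg.norm(ex)
    ey = (p_mp + p_pp) - (p_mm + p_pm)              # direction of increasing phase 2
    ez = np.cross(ex, ey); ez /= np.linalg.norm(ez) # pattern normal
    ey = np.cross(ez, ex)                           # enforce orthonormality
    R = np.column_stack([ex, ey, ez])               # pattern frame in world frame
    theta_x = np.arctan2(R[2, 1], R[2, 2])
    theta_y = np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2]))
    theta_z = np.arctan2(R[1, 0], R[0, 0])
    return theta_x, theta_y, theta_z

def missing_dot_index(gray_levels):
    """π/2 disambiguation of theta_z: the missing dot is the candidate
    position with the lowest gray-level intensity among the four."""
    return int(np.argmin(gray_levels))
```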

Fig. 3 Images of the pattern as recorded by the left (a) and right (b) cameras.

Figure 4 presents the target orientation as reconstructed experimentally for uniformly distributed angle values. In the figure, the perimeter formed by the four diagonal points is represented, while the red arrow starting from the missing dot position indicates the pattern normal for each tilt angle. In practice, the PPP visibility is progressively degraded as the target normal rotates away from the optical axis of the vision system. As can be seen in Fig. 4, tilt angles as large as ±π/4 are allowed around the X and Y axes, while the full 2π range is resolved around Z. The actual PPP visibility can be appreciated in the videos of Fig. 5, which present the images recorded by the left and right cameras during the target rotation as well as the reconstructed positions. The target position is properly reconstructed even though the recorded shape of the PPP is altered as a function of the tilt angle. This capability is verified even for the rotation around the X axis, which produces increased and asymmetrical pattern distortions.

Fig. 4 Evaluation of the maximum allowed target tilt with respect to the system axes: (−45°, 45°) around X (a) and Y (b); (0°, 360°) around Z (c). Red lines indicate the pattern normal at each reconstructed position.

Fig. 5 (a) (Media 1), (b) (Media 2), and (c) (Media 3) Video presentations of target rotation around the three axes. The reconstructed perimeter formed by the four diagonal points is presented alongside the images recorded by the left and right cameras.

The angular measurements presented in this paper are limited to rotations around a single axis because of hardware limitations: the single rotation stage available was successively set in the adequate position to explore the three possible rotations. More complicated target trajectories could therefore not be described as an arbitrary demonstration of the six-degree-of-freedom capabilities of the method. However, any position corresponds to a linear combination of the explored ones and should be addressed by the proposed scheme.

3.2. Method repeatability

We evaluated the method capabilities through repeatability measurements and displacement reconstructions. Our repeatability test consists of performing one hundred measurements without motorized displacement of the target. This elementary test was reproduced 51 times at target locations spaced 20 μm apart along the Z direction, for a total excursion of 1 mm. The reconstructed position and orientation dispersions were evaluated at each target position, and the statistical data presented in Table 2 were obtained. In the table, the Worst and Best lines are based on a single set of 100 measures, while the Mean lines result from 5100 measures. In the first three columns, we can see that the best and worst position values are in good agreement with the 1 mm displacement applied to the target. In the statistical position parameters, a factor of about four is observed between the worst and best cases. One reason for this is the sensitivity of the setup to external disturbances. However, we are unable to distinguish the contribution of the actual method capabilities from that of environmental disturbances. We therefore consider the average over these 51 measurements to be an imperfect but representative estimate of the method repeatability. The latter is evaluated to be 0.53, 0.52 and 2.06 μm along the X, Y and Z axes respectively, as given by the standard deviations observed. The resulting standard-deviation volume is thus as small as 0.57 μm³.

The dispersion of the reconstructed positions is represented in Fig. 6 for two cases. Fig. 6.a corresponds to a position whose statistics are close to the average ones. In this figure, the full scales are 2.5 μm in both X and Y and 10 μm in Z. They have to be compared with the optical resolution of the vision system, equal to 60 μm in the object plane. Fig. 6.b corresponds to the worst case observed among the 51 sets of data. A few points clearly move away from the main cloud. The dotted lines between points show that these points result from images recorded consecutively; this continuous perturbation can thus be attributed to environmental disturbances occurring during data recording. In spite of this external noise, the full scales are only 7 μm in both X and Y and 30 μm in Z, and the standard deviation of the (x, y) coordinates corresponds to only 2 × 10⁻² image pixel. These results demonstrate that stereo vision benefits from the subpixel capabilities of phase measurements to achieve subvoxel resolution in 3D target position reconstruction.
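
As a minimal sketch of how the statistics of Table 2 can be computed, the following assumes the 5100 reconstructed positions are stacked in a (51, 100, 3) array; ranking the 51 clouds by their summed standard deviation to pick the best and worst cases is our assumption, not a detail stated in the paper.

```python
import numpy as np

def repeatability_stats(runs):
    """Repeatability statistics, assuming `runs` has shape (51, 100, 3):
    100 static (X, Y, Z) measurements at each of the 51 Z locations."""
    std = runs.std(axis=1)                      # per-location standard deviation
    pv = runs.max(axis=1) - runs.min(axis=1)    # per-location peak-to-valley
    spread = std.sum(axis=1)                    # crude ranking of the 51 clouds
    return {
        "mean_std": std.mean(axis=0),           # e.g. (0.53, 0.52, 2.06) μm here
        "best_std": std[spread.argmin()],
        "worst_std": std[spread.argmax()],
        "mean_pv": pv.mean(axis=0),
    }
```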

Table 1. Calibration results obtained for our stereo vision setup (see ref. [10] for parameter definitions)

Fig. 6 XYZ position deviations measured for 100 measurements without target displacement: (a) one case with statistics close to the average observed; (b) worst case observed.

Concerning the orientation repeatability, we can see in the lower part of Table 2 that the variations of the mean target orientation remain lower than 10⁻⁴ rad over the 51 target positions and for the three rotation axes. The peak-to-valley fluctuations are about half those observed for the position, while the standard deviation remains very stable over the 51 data sets and lower than 2 × 10⁻⁴ rad. The orientation thus seems to be less sensitive than the position to external disturbances, while also providing an excellent level of performance.

Table 2. Repeatability statistics obtained for position and angle reconstruction while sweeping the target over 1 mm in steps of 20 μm and performing 100 measurements without motorized target displacement at each position. The mean-value columns compare the average values obtained over the 51 data clouds, while the peak-to-valley and standard-deviation columns correspond to the single data cloud of interest

3.3. Displacement reconstruction and method specifications

Figure 7 presents the target displacements as reconstructed with our method while a driving signal describing the acronyms of the author institutions inside a diamond-like frame was applied to the translation stages. The 3D position sensing capabilities of the method are clearly validated, and the full-scale extent of the figure in the three directions demonstrates its high level of performance. In the zoom of Fig. 7.b, the 200 μm full scale in X and Y corresponds to a pattern displacement of only 2/3 of a pixel in the recorded images. In spite of this small shift, the complete acronyms can be reconstructed within this tiny volume thanks to the subvoxel capabilities of the method. A detailed view of this measurement is presented in the video of Fig. 8, making the subvoxel capabilities of the method even more evident.

Fig. 7 (a) 3D reconstructed target displacements while a “FEMTO-ST UIS GOTS” drive signal was applied to the motors within a diamond-like contour. (b) Enlarged view of a tiny volume corresponding to a pattern shift of only 2/3 of a pixel in the recorded images. Full acronym extension: 122.8 μm in X, 94.1 μm in Y, 54.1 μm in Z.

Fig. 8 Video presentation of 3D reconstructed target displacements. (Media 4)

Measurement volume

The measurement volume can be described by the usual laws of geometrical optics as a function of the sensor size, the working distance, and the focal length and aperture of the lens. The observation field of the stereo vision configuration can also be derived from Fig. 3. The PPP is made of 15 × 15 dots with a 2 mm period; its extent is thus 29 mm in both directions, and the observation field is evaluated to be 161 × 116 mm². To perform 3D position measurements, the PPP has to remain fully present in the recorded images. The allowed XY target displacements are thus reduced to 128 × 83 mm², i.e. 106 cm² (with a 2 mm margin kept around the image). The depth of focus was measured to be larger than 200 mm. Furthermore, the subpixel method has been found to accommodate defocus and allows measurements over extended depths [11]. The measurement volume is thus of the order of 10³ cm³ or even larger.
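
The arithmetic behind these figures can be summarized as follows; all values are taken from the text, and the depth is conservatively truncated to the measured 200 mm depth of focus.

```python
# Field-of-view bookkeeping behind the measurement-volume estimate.
field_x, field_y = 161.0, 116.0     # observation field, mm
ppp = 29.0                          # PPP extent, mm (15 dots, 2 mm period)
margin = 2.0                        # margin kept around the image, mm
allow_x = field_x - ppp - 2 * margin            # 128 mm
allow_y = field_y - ppp - 2 * margin            # 83 mm
area_cm2 = allow_x * allow_y / 100.0            # ≈ 106 cm²
volume_cm3 = area_cm2 * 20.0                    # × 200 mm depth ≈ 2 × 10³ cm³
print(allow_x, allow_y, round(area_cm2), round(volume_cm3))
```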

Method resolution

The method resolution is derived from the repeatability tests; following a usual statistical rule [12], it is defined as three times the standard deviation observed. With this definition, the angular resolution is equivalent for the three directions and equal to 5.3 × 10⁻⁴ rad. The position resolution is 1.6 μm in the X and Y directions and 6.2 μm in Z. With the vision system magnification, a single image pixel corresponds to an actual size of about 300 μm on the target; the subvoxel capabilities of the method are thus clearly demonstrated. The lower performance along the Z direction is due to the geometrical configuration: the ratio between the lateral and vertical resolutions depends on the actual angle formed by the optical axes of the two cameras. These capabilities are valid for the configuration used in the experiments. In fact, the resolution can be adjusted through the actual system parameters, especially the camera signal-to-noise ratio and the optical magnification; in the latter case, the measurement volume is also affected.
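
A short worked check of the 3σ rule against the standard deviations of Table 2 (the angular σ is back-computed from the quoted resolution):

```python
import numpy as np

sigma_xyz = np.array([0.53, 0.52, 2.06])   # μm, mean std dev along X, Y, Z
print(3 * sigma_xyz)                        # → [1.59 1.56 6.18] ≈ 1.6, 1.6, 6.2 μm
sigma_angle = 5.3e-4 / 3                    # rad, consistent with < 2 × 10⁻⁴ rad
print(3 * sigma_angle)                      # → 5.3e-4 rad
```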

The conversion from method resolution to method precision or uncertainty is application dependent, since systematic errors might be introduced when passing from the stereo vision coordinate system to the application coordinate system. Such systematic errors would be avoided by choosing the stereo vision coordinate system as the application position reference. In this case, the measurement precision and uncertainty would be equal to the resolution as defined by the 3σ values.

4. Conclusions

This paper presents a stereo vision configuration suited for the 3D positioning of a labeled target in all six degrees of freedom with subvoxel resolution. In a stereo vision configuration, the subvoxel capability is precious since it relaxes the antagonism between resolution, depth of focus and extent of the field of observation. In our demonstration, the resolution obtained is in the micrometer range, which usually requires the use of microscope objectives. By achieving this with a 12 mm focal length lens, we make this micrometer range of resolution compatible with an extended resolved space of about 10 cm along each spatial direction. In the present configuration, the PPP is assumed to be present in each recorded image. This restriction can be avoided by using a position encryption technique in the pattern design to accommodate any field of observation, as demonstrated in other works [13, 14].

The experiments reported here were carried out using post-processing based on Matlab routines. We are unable to perform video-rate measurements at the moment, but a bandwidth of a few hertz could be achieved by choosing a more time-efficient programming language. Furthermore, the current development of FPGA-based image processing [15] is very promising for future high-rate reconstruction of the target position.

Acknowledgments

The authors acknowledge the Administrative Department of Science, Technology and Innovation of Colombia (Colciencias) and the Universidad Industrial de Santander, Colombia, for the financial support of the first author’s doctoral studies, which allowed the realization of this work.

References and links

1. T. Kanade and C. Zitnick, “A cooperative algorithm for stereo matching and occlusion detection,” IEEE Trans. Pattern Anal. Mach. Intell. 22(7), 675–684 (2000). [CrossRef]  

2. D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” Int. J. Comput. Vis. 47(1), 7–42 (2002). [CrossRef]  

3. A. Saxena, S. H. Chung, and A. Y. Ng, “3-D depth reconstruction from a single still image,” Int. J. Comput. Vis. 76(1), 53–69 (2008). [CrossRef]  

4. W. Matusik, C. Buehler, R. Raskar, S. J. Gortler, and L. McMillan, “Image-based visual hulls,” in Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (ACM Press/Addison-Wesley Publishing Co., 2000), pp. 369–374.

5. C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, “High-quality video view interpolation using a layered representation,” ACM SIGGRAPH 2004 Papers, (ACM, 2004), pp. 600–608.

6. P. Sandoz, J. C. Ravassard, S. Dembélé, and A. Janex, “Phase-sensitive vision technique for high accuracy position measurement of moving targets,” IEEE Trans. Instrum. Meas. 49(4), 867–872 (2000). [CrossRef]

7. P. Sandoz, V. Bonnans, and T. Gharbi, “High-accuracy position and orientation measurement of extended two-dimensional surfaces by a phase-sensitive vision method,” Appl. Opt. 41(26), 5503–5511 (2002). [CrossRef]   [PubMed]  

8. P. Sandoz, B. Trolard, D. Marsaut, and T. Gharbi, “Microstructured surface element for high-accuracy position measurement by vision and phase measurement,” Ibero-American Conf. RIAO-OPTILAS, Proc. SPIE 5622, 606–611 (2004).

9. P. Sandoz, J. M. Friedt, and E. Carry, “In-plane rigid-body vibration mode characterization with a nanometer resolution by stroboscopic imaging of a microstructured pattern,” Rev. Sci. Instrum. 78, 023706 (2007). [CrossRef]   [PubMed]  

10. J.-Y. Bouguet, “Camera Calibration Toolbox for Matlab” (2008), http://www.vision.caltech.edu/bouguetj/calibdoc.

11. J. A. Galeano-Zea, P. Sandoz, and L. Robert, “Position encryption of extended surfaces for subpixel localization of small-sized fields of observation,” in Proceedings of the IEEE International Symposium on Optomechatronic Technologies (IEEE, 2009), pp. 21–27.

12. R. J. Hansman, “Characteristics of instrumentation,” in The Measurement, Instrumentation, and Sensors Handbook, J. G. Webster, ed. (Springer-Verlag, 1999).

13. P. Sandoz, R. Zeggari, L. Froehly, J. L. Prétet, and C. Mougin, “Position referencing in optical microscopy thanks to sample holders with out-of-focus encoded patterns,” J. Microsc. 225(3), 293–303 (2007). [CrossRef]

14. J. A. Galeano-Zea, P. Sandoz, E. Gaiffe, J. L. Prétet, and C. Mougin, “Pseudo-periodic encryption of extended 2-D surfaces for high accurate recovery of any random zone by vision,” Int. J. Optomechatronics 4(1), 65–82 (2010). [CrossRef]  

15. J. Batlle, J. Marti, P. Ridao, and J. Amat, “A new FPGA/DSP-based parallel architecture for real-time image processing,” Real-Time Imaging 8(5), 345–356 (2002). [CrossRef]  

Supplementary Material (4)

Media 1: AVI (2735 KB)     
Media 2: AVI (2288 KB)     
Media 3: AVI (2718 KB)     
Media 4: AVI (1472 KB)     
