## Abstract

This article presents a fast and accurate method for measuring human faces for medical applications. To encode an object point, several random patterns are projected onto it. A correlation technique that takes only the area of one pixel into account is used to locate the homologous points. It is shown that band-limited random patterns are helpful for noise reduction. A comparison of the point cloud of a measured plane with an ideal one showed a standard deviation of less than 50*μ*m. Furthermore, a depth difference of 20*μ*m is detectable.

© 2006 Optical Society of America

## 1. Introduction

The advantages of optical measurements, such as fast data acquisition, non-contact operation, and the possibility of measuring soft tissue, are exploited in a wide range of technical, medical and security applications.

The aim of this work is to achieve a precise 3-D model of a human face for computer-aided surgery in dentistry. Since the patients are mainly children, fast data acquisition is essential. The measurement time should be less than 3*s* and the measurement accuracy better than 100*μ*m. Furthermore, a cost-saving measurement setup is desired.

## 2. Method

Photogrammetric techniques use the same basic principle as human stereo vision to obtain 3-D information about the environment: images of the object are captured from two different perspectives. Pairs of image points that result from the same object point are called homologous points. Given these points, the object can be reconstructed using triangulation methods.
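The triangulation step can be sketched as follows. This is a minimal linear (DLT) triangulation of one homologous pair; the camera matrices and the object point used here are illustrative, not the calibrated ones from the actual setup:

```python
import numpy as np

def triangulate(P_l, P_r, u_l, u_r):
    """Recover a 3-D point from a pair of homologous image points,
    given the projection matrices P_l, P_r of the two cameras."""
    # Each image point contributes two linear constraints on X
    A = np.vstack([
        u_l[0] * P_l[2] - P_l[0],
        u_l[1] * P_l[2] - P_l[1],
        u_r[0] * P_r[2] - P_r[0],
        u_r[1] * P_r[2] - P_r[1],
    ])
    # The solution is the right singular vector with smallest singular value
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # dehomogenize

# Illustrative geometry: left camera at the origin, right camera
# shifted 100 mm along x (normalized, distortion-free cameras)
P_l = np.hstack([np.eye(3), np.zeros((3, 1))])
P_r = np.hstack([np.eye(3), np.array([[-100.0], [0.0], [0.0]])])
X_true = np.array([20.0, 30.0, 1100.0])           # mm
u_l = (P_l @ np.append(X_true, 1))[:2] / (P_l @ np.append(X_true, 1))[2]
u_r = (P_r @ np.append(X_true, 1))[:2] / (P_r @ np.append(X_true, 1))[2]
X_rec = triangulate(P_l, P_r, u_l, u_r)
print(X_rec)  # ≈ [20. 30. 1100.]
```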

A sketch of the technical realization is given in Fig. 1. For image acquisition, a convergent arrangement of two cameras is applied. The camera model used to describe the process of image capturing is the pinhole camera. For a precise reconstruction, all parameters of this model have to be known exactly. The parameters can be divided into intrinsic and extrinsic ones. The most important intrinsic parameter is the ratio of the distance between the projection centre and the image plane to the pixel size. Further intrinsic parameters are the coordinates of the principal point, which is the perpendicular projection of the projection centre ($\vec{x}_{cl}$ and $\vec{x}_{cr}$) onto the corresponding image plane. In addition to the pinhole model, anisotropy and shear have been taken into account; distortion is not yet included. The six extrinsic camera parameters (three for the centre of projection and three for the angles of rotation) describe the position of the camera in an external world coordinate system by a simple Euclidean transformation. To reduce the number of parameters, the world coordinate system is identified with the system of the left camera. The intrinsic parameters of both cameras are determined by a preceding calibration procedure using a planar calibration pattern [1].
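The pinhole model with these parameter groups can be sketched as follows; the focal length and pixel size match the setup described later, but the principal point and the test point are illustrative:

```python
import numpy as np

# Sketch of the pinhole camera model: intrinsics (focal length /
# pixel size ratio, principal point) and extrinsics (R, t).
f_mm, pixel_mm = 25.0, 0.00465       # focal length, pixel size (mm)
fx = fy = f_mm / pixel_mm            # distance projection centre-image plane, in pixels
cx, cy = 640.0, 480.0                # principal point (image centre, illustrative)
K = np.array([[fx, 0, cx],
              [0, fy, cy],
              [0,  0,  1]])

R = np.eye(3)                        # world frame identified with left camera frame,
t = np.zeros(3)                      # so the extrinsic transform is the identity

def project(X_world):
    X_cam = R @ X_world + t          # extrinsic Euclidean transformation
    u = K @ X_cam                    # intrinsic mapping
    return u[:2] / u[2]              # perspective division

uv = project(np.array([0.0, 0.0, 1100.0]))
print(uv)  # a point on the optical axis maps to the principal point: [640. 480.]
```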

From the located homologous points, the essential matrix is calculated with the normalized eight-point algorithm [2]. The extrinsic parameters are then extracted from this matrix using quaternions [3]. This procedure makes the arrangement insensitive to environmental changes, because the relative orientation of the cameras is determined from the homologous points themselves.
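A minimal numerical check of the epipolar constraint that the eight-point algorithm inverts (it estimates $E$ from homologous pairs) might look like this; the relative orientation and the object point are purely illustrative:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

# Illustrative relative orientation: right camera rotated 20 deg about
# the vertical axis and translated w.r.t. the left camera
theta = np.deg2rad(20.0)
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
t = np.array([-0.4, 0.0, 0.1])       # metres
E = skew(t) @ R                      # essential matrix

X = np.array([0.05, 0.02, 1.1])      # object point in the left camera frame
x_l = X / X[2]                       # normalized image coordinates, left
X_r = R @ X + t
x_r = X_r / X_r[2]                   # normalized image coordinates, right
res = abs(x_r @ E @ x_l)
print(res)                           # ≈ 0: homologous points satisfy x_r^T E x_l = 0
```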

The major task of a stereophotogrammetric measurement system is the solution of the correspondence problem. The surface texture of many objects, such as human faces, is too homogeneous for the detection of homologous points. This problem can be eliminated by projecting an intensity pattern onto the object. Commonly used area-based correlation techniques [4] are limited in the achievable precision because of the deformation between the two images. This problem is solved with a correlation technique that reduces the correlation area to the size of one pixel [5]. Especially at higher profile gradients, this leads to a denser and more accurate point cloud. To realize this, a sequence of about 20 stochastically generated patterns is projected onto the object. Experiments showed that fewer patterns yield more outliers and poorer accuracy, while more patterns do not improve the accuracy. As a result, every point of the object surface is encoded by an individual sequence of intensity values. The intensity sequence of a point $\vec{u}_l$ in the left camera is then compared with those of the points lying on the related epipolar line $ep_{\vec{u}_l}$ in the right camera to find the corresponding point $\vec{u}_r$. The correlation coefficient $\rho$ between the intensity sequences is used as a matching criterion:

$$\rho = \frac{\sum_{i=1}^{N}\left(l_i-\bar{l}\right)\left(r_i-\bar{r}\right)}{\sqrt{\sum_{i=1}^{N}\left(l_i-\bar{l}\right)^2\,\sum_{i=1}^{N}\left(r_i-\bar{r}\right)^2}}$$

Here $l_i$ is the intensity at the current position in picture $i$ of the left camera, and $\bar{l}$ the average intensity of the pixel over all $N$ images; the terms for the right camera are analogous. This implies that the transformation between the intensities of homologous points is only linear. Therefore, nonlinearities, for example in the gain of the cameras or due to angle-dependent scattering, may lead to systematic measurement errors. A point is accepted as homologous if $\rho$ exceeds a certain threshold (e.g. $\rho_{th} = 0.9$). This threshold is essential to suppress remaining outliers, which mainly occur if an object point is visible in only one camera.
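As a sketch, the matching criterion can be written directly as follows; the pattern sequences below are synthetic stand-ins for the intensities recorded under the projected random patterns:

```python
import numpy as np

def corr_coeff(l, r):
    """Correlation coefficient between the intensity sequence of a
    left-camera pixel and a candidate right-camera pixel over N images."""
    l, r = np.asarray(l, float), np.asarray(r, float)
    dl, dr = l - l.mean(), r - r.mean()
    return (dl @ dr) / np.sqrt((dl @ dl) * (dr @ dr))

rng = np.random.default_rng(0)
l = rng.random(20)                                   # N = 20 projected patterns
# A true match: the same signal under a linear intensity transform plus noise
r_match = 0.8 * l + 10.0 + rng.normal(0, 0.01, 20)
r_other = rng.random(20)                             # an unrelated pixel
print(corr_coeff(l, r_match))                        # close to 1: accepted
print(corr_coeff(l, r_other))                        # well below the threshold
```

Note that the criterion is by construction invariant to an affine intensity transform between the two sequences, which is exactly the linearity assumption stated above.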

## 3. Experimental setup

The previously described stereophotogrammetric method is realized by an experimental setup that uses two FireWire cameras with 1.3 MP sensors (1280 × 960, pixel size = 4.65*μ*m) and a commercially available XGA projector (1024 × 768). The focal length of the camera lenses is 25mm, so each camera has a diagonal angle of view of about 2*θ* = 17°. The lateral resolution, determined by pixel size and focal length, is about 0.4mm, whereas the longitudinal resolution covers a range of 0.4mm to 0.8mm, depending on the angle between the optical axes of the cameras. In this setup the maximum allowed angle between the optical axes is limited by the nose of the person, because both sides of the nose need to be visible as well as possible in both cameras. We used an angle of 20°, which leads to a longitudinal resolution of 0.8mm. The distance between one camera and the measured person is about 1.1m. The measurement volume is about 250 × 200 × 180mm (H × W × D).
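As a rough cross-check of the quoted figures, using only the numbers stated above:

```python
import math

# Back-of-the-envelope check of the setup geometry (sketch).
pixels_h, pixels_v = 1280, 960
pixel_size_mm = 0.00465
f_mm = 25.0

sensor_diag = math.hypot(pixels_h, pixels_v) * pixel_size_mm  # ≈ 7.44 mm
aov = 2 * math.degrees(math.atan(sensor_diag / (2 * f_mm)))
print(round(aov))        # 17 degrees, as stated

distance_mm = 1100.0
footprint = pixel_size_mm / f_mm * distance_mm
# One pixel subtends ≈ 0.2 mm at 1.1 m; the quoted lateral resolution of
# 0.4 mm presumably corresponds to two pixel footprints (sampling limit).
print(round(footprint, 2))
```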

## 4. Data processing and optimization

#### 4.1. Subpixel interpolation

It is self-evident that an object point which is mapped to the middle of one pixel in the left camera will not be mapped to the middle of a pixel in the right camera. Thus, for precise measurements, the position of the corresponding pixel has to be located with subpixel accuracy. The captured images are sampled signals, so the original signal can be reconstructed at any subpixel position. Because of the long computing time, exact reconstruction using sinc interpolation is not feasible. The simplest interpolation method is the bilinear one, which uses the 4 nearest neighbours to calculate the desired value. First we interpolate the intensity values at a specified subpixel position in each image and afterwards compute the correlation coefficient *ρ* for this position. The position is shifted until *ρ* reaches a maximum. Figure 2 shows an example of a bilinearly computed subpixel correlation function in a 2×2 sensor field. The central value (u = v = 100) corresponds to the maximum of the integer-based search. This example shows that a maximum can occur in any of the four quadrants. Therefore, four search algorithms are required. The function displayed in Fig. 3 is computed with a bicubic algorithm [6].
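The bilinear interpolation step can be sketched as follows; the 2×2 image below is a toy example:

```python
import numpy as np

def bilinear(img, u, v):
    """Bilinearly interpolate the intensity at subpixel position (u, v)
    from the 4 nearest neighbours (u horizontal, v vertical)."""
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * img[v0, u0] +
            du * (1 - dv) * img[v0, u0 + 1] +
            (1 - du) * dv * img[v0 + 1, u0] +
            du * dv * img[v0 + 1, u0 + 1])

img = np.array([[0.0, 1.0],
                [2.0, 3.0]])
print(bilinear(img, 0.5, 0.5))  # 1.5, the mean of the 4 neighbours
```

In the method described above, this interpolation is applied at the same subpixel position in each of the *N* images before *ρ* is evaluated there.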

This interpolation method takes the 16 nearest neighbours into account to compute an interim value. As a result, the new correlation function has only a single maximum. Therefore, only one optimization algorithm is needed. The reason for this behaviour is that between two adjacent quadrants only 50% of the input data for the subpixel interpolation remain unchanged for the bilinear interpolation, compared to 75% for the bicubic one. It should be mentioned that a single maximum cannot be guaranteed for the bicubic interpolation, but this holds in practice for most relevant cases. The disadvantage of the time-consuming bicubic algorithm is compensated by the fact that only a single search algorithm is needed.

#### 4.2. Pattern structures

The disadvantage of the faster interpolation algorithms compared to sinc interpolation is their poorer transfer functions. As a result, the initially used binary patterns yield no reasonable results. Therefore, the patterns should be limited in their spatial frequency. Additionally, the minimal frequency of the patterns should also be restricted, because low frequencies lead to large homogeneous areas in the patterns, which produce flat correlation functions as shown in Fig. 2 and Fig. 3. The result of such flat correlation functions is higher noise. To avoid the negative influence of the pixelized patterns, the projector was slightly defocused.

Figure 4 gives an example of a binary pattern, in which every 2×2 pixel block was randomly switched to black or white. The other two patterns are the Fourier transforms of random spectra. For the right image, the high and low frequencies were suppressed; for the middle one, only the high frequencies.
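Such a band-limited pattern can be generated as the inverse Fourier transform of a band-pass-masked random spectrum; the band limits and pattern size below are illustrative:

```python
import numpy as np

def bandlimited_pattern(h, w, f_lo, f_hi, seed=0):
    """Random pattern whose spatial frequencies lie in [f_lo, f_hi]
    (in cycles per pixel), built from a masked random spectrum."""
    rng = np.random.default_rng(seed)
    spectrum = rng.normal(size=(h, w)) + 1j * rng.normal(size=(h, w))
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    f = np.hypot(fx, fy)
    spectrum = spectrum * ((f >= f_lo) & (f <= f_hi))  # band-pass mask
    pattern = np.real(np.fft.ifft2(spectrum))
    # Normalize to 0..1 grey values for projection
    return (pattern - pattern.min()) / (pattern.max() - pattern.min())

p = bandlimited_pattern(256, 256, 0.05, 0.2)
print(p.shape)
```

Suppressing the lowest frequencies (`f_lo > 0`) avoids the large homogeneous areas mentioned above; the upper limit keeps the pattern within the transfer function of the interpolation.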

## 5. Results

#### 5.1. Evaluation of the measurement method

To verify the accuracy of the measurement system, two well-known objects were tested. First, for quantification of the minimal resolvable height step, a matt-finished aluminium plate with milled grooves from 5*μ*m to 160*μ*m was used. To separate this feature of interest from deficiencies caused by imperfect calibration, a two-dimensional polynomial fit of fourth order was subtracted. As a result, Fig. 5 shows both the resolved step height of 20*μ*m and the improved measurement quality obtained with the optimized illumination structures. No additional filtering of the data was carried out.

To check the absolute measurement accuracy, a matt-finished calibrated granite plate was used. The height and width of this plate filled the full field of view. All influences of the imperfect determination of the intrinsic and extrinsic calibration parameters of the cameras, of the unconsidered distortion, and of the computation of the coordinates with the subpixel interpolation can be seen. No filtering and no global fitting except subtraction of the best-fit plane were applied to the results shown in Fig. 6. The absolute error over the full field is less than 0.3mm, whereas the standard deviation (rms) is smaller than 50*μ*m. The ratio of the rms error to the realized measurement field height of 250mm is better than 2∗10^{-4}.
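The flatness evaluation can be sketched as a least-squares plane fit followed by the rms of the residuals; the point cloud below is synthetic, standing in for the measured granite-plate data:

```python
import numpy as np

# Synthetic stand-in for a measured plane: a tilted plane over the
# 250 x 200 mm field plus 50 um of Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 250, 5000)                              # mm
y = rng.uniform(0, 200, 5000)                              # mm
z = 0.01 * x - 0.02 * y + 3.0 + rng.normal(0, 0.05, 5000)  # mm

# Best-fit plane z = a*x + b*y + c by linear least squares
A = np.column_stack([x, y, np.ones_like(x)])
coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
residuals = z - A @ coeffs
rms = np.sqrt(np.mean(residuals ** 2))
print(rms)   # ≈ 0.05 mm, i.e. the injected 50 um noise level
```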

#### 5.2. 3-D-Measurements of human faces

Figure 7 gives an example of a measured human face. The related point cloud consists of more than 7∗10^{5} points. The computing time is about 4min on an AMD XP2600+ with 1GB RAM. The detailed views were taken from the initial point cloud on the left side and show that even in regions that are difficult to measure (e.g. the eyelashes or the glossy sclera), dense point clouds with low noise were achieved. The gaps in the point cloud under the eyes and on the nose are caused by missing homologous points in the right camera. Of course the pupil could not be measured, because the cameras do not receive an analyzable signal from there. It may seem that the iris under the cornea was also measured; however, the refraction of the cornea was not considered. The upper eyelashes are very hard to detect due to their small size and their movement during the measurement period.

## 6. Conclusion

This article shows that band-limited projection patterns in combination with subpixel interpolation can improve the capability of 3-D measurements by stereophotogrammetry. The experimentally determined values for the sensitivity of step-height detection and for the full-field measurement uncertainty are 20*μ*m and 50*μ*m (rms), respectively, whereas the absolute error is less than 0.3mm. The accuracy with respect to the lateral measuring range is about 2 ∗ 10^{-4}. The achieved accuracy is sufficient for medical measurements of human faces. The short image-acquisition time (< 3 seconds), the low hardware requirements and the self-calibration of the extrinsic parameters are additional advantages of this method. So far, the assumption mentioned above of a linear intensity transformation between homologous points has not led to noticeable measurement errors.

In further developments, the digital projector will be replaced by a cheaper analogue projection unit, which will allow non-pixelized projection patterns. Furthermore, distortion has to be added to the camera model to overcome the main geometry error. For this, stable calibration algorithms have to be implemented.

## Acknowledgments

This project was supported by the Thuringian Ministry of Science, Research and Culture under the topic ‘3-D shape measurement for function-orientated diagnostic and therapy in dentistry’.

## References and links

**1. **Y. Ma, S. Soatto, and J. Kosecka, *An Invitation to 3-D Vision* (Springer, 2003)

**2. **R. I. Hartley, “In Defense of the Eight-Point Algorithm,” in *IEEE Transactions on Pattern Analysis and Machine Intelligence*, Vol. 19, No. 6, pp. 580–593, (1997)

**3. **O. Faugeras, *Three-Dimensional Computer Vision (Artificial Intelligence)* (MIT Press, 1993)

**4. **F. Devernay, O. Bantiche, and E. Coste-Manire, “Structured light on dynamic scenes using standard stereoscopy algorithms,” in *Rapport de recherche de l’INRIA*, No. 4477, (June 2002), http://www.inria.fr/rrrt/rr-4477.html

**5. **P. Albrecht and B. Michaelis, “Stereo Photogrammetry with Improved Spatial Resolution,” in *14th International Conference on Pattern Recognition*, pp. 845–849, (1998)

**6. **I. E. Abdou and K. Y. Wong, “Analysis of Linear Interpolation Schemes for Bi-Level Image Applications,” in *IBM Journal of Research and Development*, Vol. 26, No. 6, pp. 667–680, (1982)