Abstract

We present a method for three-dimensional (3D) tracking of a human finger from a monocular sequence of images. To recover the third dimension from the two-dimensional images, we exploit the fact that the motion of the human arm is highly constrained by the dependencies between the elbow and the forearm and by the physical limits on joint angles. We use these anthropometric constraints to derive the 3D trajectory of a gesticulating arm. The system is fully automated, requiring no human intervention, and can be used as a visualization tool, as a user-input interface, or as part of a gesture-analysis system in which 3D information is important.

© 2004 Optical Society of America





Figures (22)

Fig. 1. (a) Stick model of human upper body. (b) Axes conventions used in this paper.

Fig. 2. Block diagram of the system.

Fig. 3. Example of a skin image and training mask pair.

Fig. 4. Example of dot product calculations. Left, K = 3; center and right, K = 2.

Fig. 5. (a) Contour of the arm, with the highest dot product marked. (b) Contour of the arm and head regions, with the highest dot product marked.

Fig. 6. Example where the elbow was incorrectly identified as the fingertip (the highest-dot-product point, left image) and corrected with anthropometric constraints (right image).

Fig. 7. Upper-body skeletons as obtained by the system.

Fig. 8. Example of occlusion. The first pair (top) shows the person and the output of skin detection just before the start of occlusion. The second pair (middle) shows the occlusion state (the head is occluded by the arm). The third pair (bottom) shows the person and the corresponding output just after occlusion.

Fig. 9. Unsmoothed (left) and smoothed (right) trajectories of recorded finger positions.

Fig. 10. Image showing the user's fully extended arm.

Fig. 11. Ambiguity due to degenerate observation of a moving point on a spherical trajectory.

Fig. 12. Ambiguity as viewed after the application of physical constraints. The ambiguous portion is shown by the dotted outer curve.

Fig. 13. Various views of a user's 3D pose.

Fig. 14. Frames from a semicircular motion.

Fig. 15. 3D trajectory output from a semicircular motion.

Fig. 16. Experimental setup and axes conventions as viewed from the top.

Fig. 17. Person performing a triangular gesture as seen from (a) the front-view camera and (b) the side-view camera. (c) From left to right: XY, XZ, and YZ projections of the obtained trajectory and the same trajectory from an arbitrary view with a graphical doll interface (axes conventions are the same as in Fig. 16).

Fig. 18. Person performing a spiral gesture as seen from (a) the front-view camera and (b) the side-view camera. (c) From left to right: XY, XZ, and YZ projections of the obtained trajectory and the same trajectory from an arbitrary view with a graphical doll interface (axes conventions are the same as in Fig. 17).

Fig. 19. Person performing a circular gesture (left) and the gesture trajectory (right).

Fig. 20. Two people performing a saddle-point gesture (left) and the gesture trajectories (right).

Fig. 21. Person performing a spiral gesture (left) and the gesture trajectory (right).

Fig. 22. Validation of results for (a) circular gesture, (b) spiral gesture, (c) triangular gesture, and (d) parabolic gesture. The first column shows the 2D projections of the system-generated trajectories that are most similar to those obtained from the side-view camera. The second column shows the trajectories manually obtained from the side-view camera. The third column shows both trajectories superimposed, with the column 1 trajectories drawn as broken lines.
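The captions of Figs. 4–6 indicate that the fingertip is located as the contour point with the highest dot product, computed over the skin-region boundary with a neighbor offset K, and that anthropometric constraints correct cases where this maximum falls on the elbow instead (Fig. 6). The exact formulation is given in the full article; the sketch below assumes one common form of this measure, the dot product between the vectors from each contour point to its Kth neighbors on either side (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def fingertip_index(contour, K=3):
    """Index of the contour point with the highest dot product between the
    vectors to its Kth neighbors on either side. At a sharp tip both
    neighbor vectors point back along the finger, so their dot product is
    large and positive; on a straight stretch of contour they point in
    opposite directions and the dot product is negative."""
    pts = np.asarray(contour, dtype=float)
    ahead = np.roll(pts, -K, axis=0) - pts   # vector to the Kth next point
    behind = np.roll(pts, K, axis=0) - pts   # vector to the Kth previous point
    return int(np.argmax(np.sum(ahead * behind, axis=1)))
```

As Fig. 6 illustrates, the raw maximum alone can land on the elbow; a candidate inconsistent with the expected upper-arm and forearm lengths is rejected by the anthropometric constraints.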

Equations (4)


\[ (cx)^2 + (cy)^2 + Z^2 = (cr)^2, \quad c^2(x^2 + y^2) + Z^2 = c^2 r^2, \quad Z = \pm c\,(r^2 - l^2)^{1/2}, \ \text{where } l^2 = x^2 + y^2, \quad Z = \pm k\,(r^2 - l^2)^{1/2}, \ \text{where } k = c. \tag{1} \]

\[ Z_{\mathrm{elbow}} = \pm k\,(r_H^2 - l_H^2)^{1/2}, \qquad Z_{\mathrm{finger}} = Z_{\mathrm{elbow}} \pm k\,(r_L^2 - l_L^2)^{1/2}. \tag{2} \]

\[ Z_{\mathrm{elbow}} = k\,(r_H^2 - l_H^2)^{1/2}. \tag{3} \]

\[ Z_{\mathrm{finger}} = Z_{\mathrm{elbow}} + k\,(r_L^2 - l_L^2)^{1/2} = k\left[ (r_H^2 - l_H^2)^{1/2} + (r_L^2 - l_L^2)^{1/2} \right]. \tag{4} \]
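A minimal numeric sketch of Eqs. (1)–(4), assuming k is the camera scale factor (k = c) calibrated from a frame with the arm fully extended (Fig. 10), r_H and r_L are the calibrated image-plane lengths of the upper arm and forearm, and l_H and l_L are the projected lengths observed in the current frame. The positive roots are taken as in Eqs. (3) and (4); the sign ambiguity of Eq. (2) is resolved in the paper by the physical constraints (Figs. 11 and 12). All names and numbers here are illustrative:

```python
import math

def segment_depth(r, l):
    """Depth offset of one limb segment, from Eq. (1): sqrt(r^2 - l^2).
    r is the calibrated image-plane length of the segment (arm fully
    extended); l is its projected length in the current frame. The value
    is clamped at 0 to guard against measurement noise making l > r."""
    return math.sqrt(max(r * r - l * l, 0.0))

def finger_depth(k, r_H, l_H, r_L, l_L):
    """Eqs. (3) and (4): Z_elbow = k (r_H^2 - l_H^2)^(1/2) and
    Z_finger = Z_elbow + k (r_L^2 - l_L^2)^(1/2), taking the positive
    roots; the paper resolves the sign ambiguity of Eq. (2) with the
    physical constraints on joint angles."""
    z_elbow = k * segment_depth(r_H, l_H)
    z_finger = z_elbow + k * segment_depth(r_L, l_L)
    return z_elbow, z_finger

# Illustrative numbers: upper arm 100 and forearm 90 image units when fully
# extended, foreshortened to 80 and 60 units in the current frame; k = 1.
z_elbow, z_finger = finger_depth(k=1.0, r_H=100.0, l_H=80.0, r_L=90.0, l_L=60.0)
print(z_elbow, z_finger)  # 60.0  127.08 (= 60 + sqrt(90^2 - 60^2))
```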
