Abstract

We address the problem of body pose tracking in a multiple-camera setup, with the aim of recovering body motion robustly and accurately. Tracking is performed in three-dimensional (3D) space using 3D data, including a colored volume and 3D optical flow, which are reconstructed at each time step. We introduce strategies for computing 3D optical flow from multiple cameras that achieve efficient and robust 3D motion estimation. Body pose estimation starts with a prediction based on 3D optical flow and is then cast as a lower-dimensional global optimization problem. Our method uses a voxel-based subject-specific body model, exploits multiple 3D image cues, and incorporates physical constraints into a stochastic particle-based search initialized from the deterministic prediction and from stochastic sampling. The result is a robust 3D pose tracker. Experiments on publicly available sequences demonstrate the robustness and accuracy of our approach.
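As a rough illustration (not the authors' implementation), the prediction-then-stochastic-search step described above can be sketched as follows; `predict_from_flow` and `energy` are hypothetical placeholders for the 3D-optical-flow prediction and the 3D-cue fitness function:

```python
import numpy as np

def track_pose_step(x_prev, predict_from_flow, energy,
                    n_particles=100, sigma=0.05, rng=None):
    """One tracking step in the spirit of the abstract: a deterministic
    flow-based prediction seeds a stochastic particle search, and the
    lowest-energy candidate pose is kept.

    `predict_from_flow` and `energy` are hypothetical stand-ins for the
    paper's 3D-optical-flow prediction and volume-matching cost."""
    rng = rng or np.random.default_rng(0)
    x_pred = predict_from_flow(x_prev)          # deterministic prediction
    half = n_particles // 2
    candidates = [x_pred]                       # keep the prediction itself
    # stochastic samples around the prediction ...
    candidates += [x_pred + sigma * rng.standard_normal(x_pred.shape)
                   for _ in range(half)]
    # ... and around the previous pose, guarding against a bad prediction
    candidates += [x_prev + sigma * rng.standard_normal(x_prev.shape)
                   for _ in range(n_particles - half - 1)]
    return min(candidates, key=energy)          # global best over particles
```

Because the candidate set always contains the deterministic prediction, the search can only improve on it under the chosen energy.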

© 2012 Optical Society of America


References


  1. J. Deutscher, A. Blake, and I. Reid, “Articulated body motion capture by annealed particle filtering,” in IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2000), Vol. 2, pp. 126–133.
  2. C. Theobalt, J. Carranza, M. A. Magnor, and H. P. Seidel, “Enhancing silhouette-based human motion capture with 3D motion fields,” in IEEE Pacific Conference on Computer Graphics and Applications (IEEE, 2003), pp. 185–193.
  3. I. Mikic, M. Trivedi, E. Hunter, and P. Cosman, “Human body model acquisition and tracking using voxel data,” Int. J. Comput. Vis. 53, 199–223 (2003).
  4. R. Kehl, M. Bray, and L. V. Gool, “Full body tracking from multiple views using stochastic sampling,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE, 2005), Vol. 2, pp. 129–136.
  5. L. Mundermann, S. Corazza, and T. P. Andriacchi, “Accurately measuring human movement using articulated ICP with soft-joint constraints and a repository of articulated models,” in IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2007), pp. 1–6.
  6. J. Gall, C. Stoll, E. de Aguiar, C. Theobalt, B. Rosenhahn, and H.-P. Seidel, “Motion capture using joint skeleton tracking and surface estimation,” in IEEE Conference on Computer Vision and Pattern Recognition (2009), pp. 1746–1753.
  7. J. MacCormick and M. Isard, “Partitioned sampling, articulated objects, and interface-quality hand tracking,” in European Conference on Computer Vision, Vol. 1843 of Lecture Notes in Computer Science (Springer, 2000), pp. 3–19.
  8. C. Sminchisescu, A. Kanaujia, Z. Li, and D. Metaxas, “Discriminative density propagation for 3D human motion estimation,” in IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2005), Vol. 1, pp. 390–397.
  9. M. Bray, E. Koller-Meier, and L. V. Gool, “Smart particle filtering for high-dimensional tracking,” Comput. Vis. Image Understanding 106, 116–129 (2007).
  10. Z. Zhang, H. S. Seah, C. K. Quah, and J. Sun, “A markerless motion capture system with automatic subject-specific body model acquisition and robust pose tracking from 3D data,” in IEEE International Conference on Image Processing (IEEE, 2011), pp. 525–528.
  11. A. Laurentini, “The visual hull: a new tool for contour-based image understanding,” in 7th Scandinavian Conference on Image Analysis (Springer, 1991), pp. 993–1002.
  12. R. Szeliski, “Rapid octree construction from image sequences,” CVGIP: Image Understanding 58, 23–32 (1993).
  13. G. K. Cheung, T. Kanade, J.-Y. Bouguet, and M. Holler, “A real time system for robust 3D voxel reconstruction of human motions,” in IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2000), Vol. 2, pp. 714–720.
  14. W. Matusik, C. Buehler, R. Raskar, S. Gortler, and L. McMillan, “Image-based visual hulls,” in Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (ACM, 2000), pp. 369–374.
  15. W. Matusik, C. Buehler, and L. McMillan, “Polyhedral visual hulls for real-time rendering,” in Proceedings of the 12th Eurographics Workshop on Rendering Techniques (Springer, 2001), pp. 115–126.
  16. S. Lazebnik, E. Boyer, and J. Ponce, “On computing exact visual hull of solids bounded by smooth surfaces,” in IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2001), Vol. 1, pp. I156–I161.
  17. A. Ladikos, S. Benhimane, and N. Navab, “Efficient visual hull computation for real-time 3D reconstruction using CUDA,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (IEEE, 2008), pp. 1–8.
  18. S. Vedula, S. Baker, P. Rander, R. Collins, and T. Kanade, “Three-dimensional scene flow,” IEEE Trans. Pattern Anal. Machine Intell. 27, 475–480 (2005).
  19. M. Gong and Y.-H. Yang, “Disparity flow estimation using orthogonal reliability-based dynamic programming,” in IEEE International Conference on Pattern Recognition (IEEE, 2006), pp. 70–73.
  20. R. L. Carceroni and K. N. Kutulakos, “Multi-view scene capture by surfel sampling: from video streams to non-rigid 3D motion, shape and reflectance,” Int. J. Comput. Vis. 49, 175–214 (2002).
  21. T. Can, A. O. Karali, and T. Aytac, “Detection and tracking of sea-surface targets in infrared and visual band videos using the bag-of-features technique with scale-invariant feature transform,” Appl. Opt. 50, 6302–6312 (2011).
  22. C. Theobalt, J. Carranza, M. Magnor, and H.-P. Seidel, “Combining 3D flow fields with silhouette-based human motion capture for immersive video,” Graph. Models 22, 540–547 (2004).
  23. J.-Y. Bouguet, “Pyramidal implementation of the Lucas-Kanade feature tracker,” Microprocessor Research Labs Tech. Rep. (2000).
  24. K. Levenberg, “A method for the solution of certain non-linear problems in least squares,” Q. Appl. Math. 2, 164–168 (1944).
  25. Z. Zhang, H. S. Seah, and J. Sun, “A hybrid particle swarm optimization with cooperative method for multi-object tracking,” presented at the IEEE Congress on Evolutionary Computation, Brisbane, Australia, 10–15 June 2012.
  26. NVIDIA, CUDA Programming Guide, http://www.nvidia.com/object/cudahomenew.html.
  27. J. Marzat, Y. Dumortier, and A. Ducrot, “Real-time dense and accurate parallel optical flow using CUDA,” 7th International Conference WSCG (2009).
  28. L. Sigal and M. J. Black, “HumanEva: synchronized video and motion capture dataset for evaluation of articulated human motion,” Tech. Rep. (Brown University, 2006).
  29. T. Horprasert, D. Harwood, and L. S. Davis, “A statistical approach for real-time robust background subtraction and shadow detection,” in IEEE International Conference on Computer Vision (IEEE, 1999), pp. 1–19.




Figures (11)

Fig. 1.

Approach overview.

Fig. 2.

Voxel-based subject-specific body model. (a) The body model comprises a skeleton model and a colored voxel-based shape model. (b) An alternative representation that we often use for clearer visualization of results. (c) The model has five open kinematic chains and 36 degrees of freedom.

Fig. 3.

View-independent 3D scene flow of one frame in a “Wheel” sequence. For visualization, each scene flow vector is drawn as a 3D arrow colored from blue (small motion) to red (large motion); the arrow’s length and direction give the magnitude and direction of the scene flow vector. Green denotes voxels at which no valid scene flow is recovered.

Fig. 4.

(a) Flow-reconstruction error for each frame of the “Wheel” sequence. (b) One example frame: the reconstructed voxel points are shown in green, and the points obtained by “flowing” the voxels of the previous frame are shown in red.

Fig. 5.

Typical tracking failures. The reconstructed volume is shown as a green point cloud, and the body model is rendered at the estimated pose; note that only the skeleton of the body model is shown for comparison. (a) The left forearm protrudes into the torso. (b) The pose estimate loses accuracy in the left-forearm parameters.

Fig. 6.

(a) Volumetric reconstruction time for different voxel resolutions. (b) Ratio of invalid scene flows and (c) flow-reconstruction error at voxel resolutions N=100 and N=128.

Fig. 7.

Scene flow computation for different LK window sizes w with pyramidal level l=3: (a) w=40 and (b) w=70. In this example frame, the motion of the left calf is very large; setting w=40 produces many noisy scene flows, while w=70 yields a much more robust reconstruction.

Fig. 8.

(a) Reconstructed scene flow and the result of the tracker using scene flow. (b) Result obtained by hierarchical optimization without using scene flow information.

Fig. 9.

Average joint position errors of the three methods on each sequence.

Fig. 10.

Example tracking results. Each sample shows the volume data (green), the scene flow estimate, and the tracked body pose, where only the skeleton of the body model is shown. Top two rows: “Dance” sequence, where every 40th frame between frame 196 and frame 440 is shown. Middle row: “Wheel” sequence (frames 70, 95, 110, 143, 166). Bottom row: “Handstand” sequence (frames 75, 90, 103, 127, 264, 390).

Fig. 11.

(a) Relative joint position error of HumanEva-II sequence S2 (frames 1–50). (b) One example estimated scene flow and the tracked result.

Tables (1)


Table 1. Algorithm Parameters and Timings of GPU and CPU Versions of Scene Flow Computation on the “Wheel” Public Multi-Image Sequence [6]

Equations (15)


$$\hat{q}^{w}=M(\phi)\prod_{n=i}^{1}M\bigl(\theta_{n}^{i}\bigr)\,\hat{q}^{l},$$

$$\frac{d\mathbf{u}_{j}}{dt}=\frac{\partial \mathbf{u}_{j}}{\partial \mathbf{p}_{i}}\frac{d\mathbf{p}_{i}}{dt},$$

$$B\,\frac{d\mathbf{p}_{i}}{dt}=\mathbf{U},$$

$$e_{\mathrm{fr}}=\frac{1}{n_{t}^{*}}\sum_{i=0}^{n_{t}^{*}}\bigl\|\mathbf{p}_{i}^{*}-F\bigl(\mathbf{p}_{i}^{*},V_{t+1}\bigr)\bigr\|,$$

$$F(\theta_{j})=\sum_{k\in B_{j}}\sum_{i}\bigl\|\bigl(q_{i}^{k}(\theta_{j})-q_{i}^{k}(x_{t})\bigr)-s_{i}^{k}\bigr\|^{2},$$

$$E_{k}=\frac{1}{n_{k}}\sum_{i}^{n_{k}}\bigl\|q_{i}^{k}-F\bigl(q_{i}^{k},V_{t+1}\bigr)\bigr\|^{2},$$

$$E_{v}(\hat{x}_{t+1})=\frac{1}{N_{m}}\sum_{i=1}^{N_{m}}\bigl\|q_{i}(\hat{x}_{t+1})-F\bigl(q_{i}(\hat{x}_{t+1}),V_{t+1}\bigr)\bigr\|^{2},$$

$$\lambda_{i}=\begin{cases}\delta, & q_{i}\in V_{\mathrm{vol}},\\ \delta_{1}, & q_{i}\in B_{\mathrm{others}},\\ \delta_{2}, & q_{i}\in F_{v}^{\mathrm{others}},\\ 1, & \text{otherwise},\end{cases}$$

$$\hat{E}_{v}(\hat{x}_{t+1})=\frac{1}{N_{m}}\sum_{i=1}^{N_{m}}\lambda_{i}\bigl\|q_{i}(\hat{x}_{t+1})-F\bigl(q_{i}(\hat{x}_{t+1}),V_{t+1}\bigr)\bigr\|^{2},$$

$$E_{a}(\hat{x}_{t+1})=\frac{1}{2}\sum_{b}w_{b}\sum_{c=1}^{2}\Bigl(1-\sum_{i=1}^{K}p_{c,i}\,q_{c,i}(\hat{x}_{t+1})\Bigr),$$

$$E_{s}(\hat{x}_{t+1})=\Bigl\|\tfrac{1}{2}\bigl(x_{t-1}+\hat{x}_{t+1}\bigr)-x_{t}\Bigr\|^{2},$$

$$E(\hat{x}_{t+1})=\hat{E}_{v}(\hat{x}_{t+1})+w_{a}E_{a}(\hat{x}_{t+1})+w_{s}E_{s}(\hat{x}_{t+1}),$$

$$X_{i,k}=Q_{i,k}^{*}+N(0,1)\,\bigl|P_{m,k}-P_{g,k}\bigr|+\alpha\bigl(X_{j,k}-X_{k,k}\bigr),$$

$$X_{i}\sim N\bigl(x_{t+1}^{*},\Sigma_{\mathrm{pred}}\bigr),$$

$$X_{i}\sim N\bigl(x_{t},\Sigma_{\mathrm{prev}}\bigr).$$
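The per-voxel linear system B dp/dt = U above stacks one chain-rule constraint per camera; the 3D motion is then the least-squares solution. A minimal sketch, with assumed input conventions (a list of 2×3 projection Jacobians and the matching 2D flow vectors):

```python
import numpy as np

def solve_scene_flow(jacobians, flows_2d):
    """Recover one voxel's 3D motion dp/dt from multi-camera 2D optical flows.

    Each camera j contributes the constraint du_j/dt = (du_j/dp) dp/dt.
    Stacking the 2x3 projection Jacobians into B and the 2D flow vectors
    into U gives B dp/dt = U, solved here in the least-squares sense.
    (Illustrative sketch only; input conventions are assumptions.)"""
    B = np.vstack(jacobians)       # (2N, 3) stacked Jacobians
    U = np.concatenate(flows_2d)   # (2N,)   stacked 2D flows
    dp, *_ = np.linalg.lstsq(B, U, rcond=None)
    return dp                      # estimated 3D scene flow of the voxel
```

With two or more cameras in general position, B has rank 3 and the solution is unique; degenerate viewing geometry leaves a motion component unconstrained, which is one reason flows at some voxels may be rejected as invalid.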
