Abstract

Estimating scene information, such as the ground/non-ground regions, the relative depth of the ground, and the unevenness of the ground, is important for applications such as video surveillance and map building. Previous research in this field relies on specific assumptions that are difficult to satisfy in practical situations. In this paper, a practical algorithm is proposed to estimate scene information from monocular video. Using pedestrian detection results accumulated over a period of time, a Pedestrian-Scene Map (PS Map), consisting of the average pedestrian width and the pedestrian occurrence probability at each position in the scene, is learned by integrating pedestrian samples of different sizes at different positions. The relative depth of the ground region, the ground/non-ground regions, and the unevenness of the ground can then be measured from the PS Map. Experimental results demonstrate the proposed method's effectiveness with a stationary, uncalibrated camera in unconstrained environments.
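As a concrete illustration of how such a map can be accumulated, the following is a minimal Python sketch, assuming pedestrian detections are available as (x, y, w) triples (image position of a pedestrian's reference point and its bounding-box width); the function name, data layout, and normalization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_ps_map(detections, height, width):
    """Accumulate a Pedestrian-Scene Map (PS Map) over a scene of size height x width.

    `detections` is assumed to be an iterable of (x, y, w) triples: the image
    column/row of a detected pedestrian's reference point (e.g. the foot of its
    bounding box) and the bounding-box width in pixels.
    Returns:
      p_map -- occurrence probability of a pedestrian at each pixel
      w_map -- average detected pedestrian width at each pixel
    """
    counts = np.zeros((height, width))
    width_sum = np.zeros((height, width))
    for x, y, w in detections:
        counts[y, x] += 1
        width_sum[y, x] += w

    total = counts.sum()
    p_map = counts / total if total > 0 else counts  # occurrence probability
    w_map = np.divide(width_sum, counts,
                      out=np.zeros_like(width_sum), where=counts > 0)
    return p_map, w_map

# Pixels with p_map > 0 are candidate ground, and relative depth can be read
# off as D(x, y) proportional to 1 / w_map(x, y) wherever w_map is nonzero.
```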

© 2008 Optical Society of America


References

  1. D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," International Journal of Computer Vision 47(1-3), 7-42 (2002).
  2. D. Forsyth and J. Ponce, Computer Vision: A Modern Approach (Prentice Hall, 2003).
  3. R. Zhang, P. S. Tsai, J. E. Cryer, and M. Shah, "Shape from shading: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence 21(8), 690-706 (1999).
  4. A. Criminisi, I. Reid, and A. Zisserman, "Single view metrology," International Journal of Computer Vision 40, 123-148 (2000).
  5. D. Hoiem, A. Efros, and M. Hebert, "Geometric Context from a Single Image," Proceedings of the IEEE International Conference on Computer Vision 2, 1284-1291 (2005).
  6. D. Hoiem, A. Efros, and M. Hebert, "Putting Objects in Perspective," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2, 2137-2144 (2006).
  7. M. Greiffenhagen, V. Ramesh, D. Comaniciu, and H. Niemann, "Statistical modeling and performance characterization of a real-time dual camera surveillance system," Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition 2, 335-342 (2000).
  8. S. G. Jeong et al., "Real-Time Lane Detection for Autonomous Vehicle," Proceedings of the IEEE International Symposium on Industrial Electronics (ISIE 2001), pp. 1466-1471 (2001).
  9. N. Krahnstoever and P. R. S. Mendonca, "Bayesian autocalibration for surveillance," Proceedings of the IEEE International Conference on Computer Vision 2, 1858-1865 (2005).
  10. A. Saxena, S. H. Chung, and A. Y. Ng, "3-D Depth Reconstruction from a Single Still Image," International Journal of Computer Vision (2007), http://ai.stanford.edu/~asaxena/learningdepth/.
  11. "Terminology Relating to Traveled Surface Characteristics," Annual Book of ASTM Standards, American Society for Testing and Materials (ASTM) (1999).
  12. "High Capacity Laser Profilograph," http://www.cedex.es/cec/documenti/survey.htm.
  13. S. Se and M. Brady, "Vision-based Detection of Staircases," Proceedings of the Fourth Asian Conference on Computer Vision (ACCV), pp. 535-540 (2000).
  14. V. Nair and J. Clark, "An unsupervised, online learning framework for moving object detection," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2, 317-324 (2004).
  15. Z. Zhou and M. Li, "Tri-training: exploiting unlabeled data using three classifiers," IEEE Transactions on Knowledge and Data Engineering 17(11), 1529-1541 (2005).
  16. P. Viola and M. Jones, "Rapid Object Detection Using a Boosted Cascade of Simple Features," Proceedings of the International Conference on Computer Vision and Pattern Recognition 1, 511-518 (2001).
  17. W. H. Ittelson, "Size as a cue to distance: static localization," American Journal of Psychology 64, 54-67 (1951).
  18. A. Yonas, L. Pettersen, and C. E. Granrud, "Infants' sensitivity to familiar size as information for distance," Child Development 53, 1285-1290 (1982).

Figures (6)

Fig. 1.

The relationship between the world coordinate system and the image coordinate system.

Fig. 2.

The pedestrian bounding boxes for three points at different positions in the scene.

Fig. 3.

The flow chart of the Tri-training algorithm [15].

Fig. 4.

A square on the ground in the real world and the corresponding trapezoid in the image.

Fig. 5.

(a): The corridor with stairs. (b): Receiver operating characteristic of the classifiers on the test set. (c): The estimated occurrence probability of pedestrians in the scene, p(x,y). (d): The estimated average width of pedestrians in the scene, w(x,y).

Fig. 6.

Experimental results for the estimation of the corridor scene. (a): The corridor scene with the real ground region marked by a green line. (b): Coarse result of the estimated ground/non-ground region. (c): Final result of the estimated ground/non-ground region. (d): The corridor scene with the real unevenness region marked by a red line. (e): Coarse result of the estimated unevenness region. (f): Final result of the estimated unevenness region. (g): The real depth of the ground relative to the bottom of the scene. (h): Coarse result of the estimated depth of the ground relative to the bottom of the scene. (i): Final result of the estimated depth of the ground relative to the bottom of the scene.

Tables (1)


Table 1. Errors of the estimated scene information corresponding to Fig. 6.

Equations (22)

Equations on this page are rendered with MathJax.

\frac{y_c(x_r, y_r, z_r)}{z_r} = \frac{y_i}{f},
\frac{x_i}{x_r} = \frac{f}{z_r}.
\Delta x_i = \frac{f}{z_r} \, \Delta x_r.
w_i(x_i, y_i) = w_r f \, \frac{1}{z_r}.
D(x_i, y_i) \propto \frac{1}{w_i(x_i, y_i)},
p(x_i, y_i) = \psi\{ e[ s(x_r, y_r, z_r) ] \}.
e[ s(x_r, y_r) ] = \psi^{-1}[ p(x_i, y_i) ],
S_r \approx \frac{z_r^3}{f^2 y_c} S_i.
\psi( s(x_r, y_r) ) \propto z_r^3,
e[ s(x_r, y_r) ] = \psi^{-1}[ p(x_i, y_i) ] \propto \frac{p(x_i, y_i)}{z_r^3}.
e[ s(x_r, y_r) ] = p(x_i, y_i) \, w_i^3(x_i, y_i).
G(x_i, y_i) = \begin{cases} 255 & \text{if } p(x_i, y_i) \neq 0 \\ 0 & \text{if } p(x_i, y_i) = 0 \end{cases},
\text{Error rate of estimated } G(x,y) = \frac{\text{number of mislabeled pixels in } G(x,y)}{\text{total number of pixels in } G(x,y)},
\text{Error rate of estimated } E(x,y) = \frac{\text{number of mislabeled pixels in } E(x,y)}{\text{total number of pixels in } E(x,y)},
\text{Average error of estimated } D(x,y) = \frac{\sum_{i=1}^{N} (d_{ei} - d_{ri})^2}{N},
\begin{cases} y_{i1} = \frac{f}{z_{r1}} y_c, \\ y_{i2} = \frac{f}{z_{r2}} y_c. \end{cases}
\begin{cases} x_{iA} = \frac{f}{z_{rA}} x_{rA}, \\ x_{iB} = \frac{f}{z_{rB}} x_{rB}, \\ z_{rA} = z_{rB}. \end{cases}
S_i = ( x_{iB} - x_{iA} )( y_{i2} - y_{i1} ).
S_i = \frac{f^2 y_c}{z_{rA} \, z_{r1} \, z_{r2}} S_r,
y_{iA} - y_{i1} = y_{i2} - y_{iA},
z_{rA} = \frac{2 z_{r1} z_{r2}}{z_{r1} + z_{r2}}.
S_i = \frac{f^2 y_c ( z_{r1} + z_{r2} )}{2 z_{r1}^2 z_{r2}^2} S_r.
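As a quick numerical sanity check (not part of the paper), the sketch below verifies that the two expressions for S_i above coincide once z_rA is the harmonic mean of z_r1 and z_r2, and that for z_r1 ≈ z_r2 ≈ z_r they reduce to the approximation S_r ≈ z_r^3 S_i / (f^2 y_c); all numeric values (f, y_c, z_r1, z_r2) are illustrative assumptions.

```python
# Numeric check of the projected-area relations, with illustrative values for
# the focal length f, camera height y_c, and the near/far depths of the square.
f, y_c = 800.0, 2.5          # assumed values (pixels, metres)
z_r1, z_r2 = 10.0, 10.5      # near and far edge of the ground square
S_r = 1.0                    # real-world area of the square

# Depth of the trapezoid's horizontal mid-line (harmonic mean of z_r1, z_r2).
z_rA = 2 * z_r1 * z_r2 / (z_r1 + z_r2)

# Image area of the trapezoid, computed from the two equivalent forms above.
S_i_a = f**2 * y_c / (z_rA * z_r1 * z_r2) * S_r
S_i_b = f**2 * y_c * (z_r1 + z_r2) / (2 * z_r1**2 * z_r2**2) * S_r
assert abs(S_i_a - S_i_b) < 1e-9

# For z_r1 close to z_r2, S_r is recovered as approximately z_r^3 S_i / (f^2 y_c).
z_r = 0.5 * (z_r1 + z_r2)
print(S_r, z_r**3 / (f**2 * y_c) * S_i_a)   # the two values are close
```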
