Abstract

Estimating scene information, such as the ground/non-ground regions, the relative depth of the ground, and the unevenness of the ground, is important for applications such as video surveillance and map building. Previous research in this field relies on specific assumptions that are difficult to satisfy in practical situations. In this paper, a practical algorithm is proposed to estimate scene information from monocular video. From pedestrian detection results collected over a period of time, the Pedestrian-Scene Map (PS Map), consisting of the average width of a pedestrian and the occurrence probability of a pedestrian at each position of the scene, is learned by integrating pedestrian samples of different sizes at different positions in the scene. The relative depth of the ground region, the ground/non-ground regions, and the unevenness of the ground can then be measured from the PS Map. Experimental results illustrate the effectiveness of the proposed method with a stationary, uncalibrated camera in an unconstrained environment.
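
As a rough illustration of the PS Map idea, the sketch below accumulates pedestrian samples into a per-pixel occurrence probability and average width, then reads a ground label and a relative depth from them. It is not the authors' implementation: the function names, the use of the bounding-box foot point as the sample position, and the normalization of the probability by the total number of samples are all assumptions made for this example.

```python
from collections import defaultdict


def build_ps_map(detections):
    """Accumulate a toy PS Map from (x, y, width) pedestrian samples.

    Assumption: (x, y) is the pixel where the pedestrian touches the ground
    (e.g. bottom-centre of the bounding box) and width is the box width in pixels.
    """
    count = defaultdict(int)
    width_sum = defaultdict(float)
    for x, y, width in detections:
        count[(x, y)] += 1
        width_sum[(x, y)] += width

    total = sum(count.values())
    # Occurrence probability p(x, y): share of all samples seen at this pixel.
    p = {pos: n / total for pos, n in count.items()}
    # Average pedestrian width w(x, y) at this pixel.
    w = {pos: width_sum[pos] / n for pos, n in count.items()}
    return p, w


def ground_label(p, pos):
    """Return 255 (ground) where pedestrians have been observed, else 0."""
    return 255 if p.get(pos, 0.0) != 0 else 0


def relative_depth(w, pos, ref_pos):
    """Depth of pos relative to ref_pos, taking depth proportional to 1 / width."""
    return w[ref_pos] / w[pos]


# Hypothetical samples: two detections near the camera, one farther away.
detections = [(10, 100, 40.0), (10, 100, 44.0), (12, 50, 21.0)]
p_map, w_map = build_ps_map(detections)
```

With these made-up samples, a pedestrian at (12, 50) appears half as wide as one at (10, 100), so the sketch places that position at twice the depth of the reference position.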

© 2008 Optical Society of America


References

  1. D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” International Journal of Computer Vision 47(1–3), 7–42 (2002).
  2. D. Forsyth and J. Ponce, Computer Vision: A Modern Approach (Prentice Hall, 2003).
  3. R. Zhang, P. S. Tsai, J. E. Cryer, and M. Shah, “Shape from shading: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence 21(8), 690–706 (1999).
  4. A. Criminisi, I. Reid, and A. Zisserman, “Single view metrology,” International Journal of Computer Vision 40, 123–148 (2000).
  5. D. Hoiem, A. Efros, and M. Hebert, “Geometric Context from a Single Image,” Proceedings of the IEEE International Conference on Computer Vision 2, 1284–1291 (2005).
  6. D. Hoiem, A. Efros, and M. Hebert, “Putting Objects in Perspective,” Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on 2, 2137–2144 (2006).
  7. M. Greiffenhagen, V. Ramesh, D. Comaniciu, and H. Niemann, “Statistical modeling and performance characterization of a real-time dual camera surveillance system,” Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition 2, 335–342 (2000).
  8. S. G. Jeong et al., “Real-Time Lane Detection for Autonomous Vehicle,” IEEE International Symposium on Industrial Electronics Proceedings (ISIE 2001), pp. 1466–1471 (2001).
  9. N. Krahnstoever and P. R. S. Mendonca, “Bayesian autocalibration for surveillance,” Proceedings of the IEEE International Conference on Computer Vision 2, 1858–1865 (2005).
  10. A. Saxena, S. H. Chung, and A. Y. Ng, “3-D Depth Reconstruction from a Single Still Image,” International Journal of Computer Vision (2007), http://ai.stanford.edu/asaxena/learningdepth/.
  11. “Terminology relating to traveled surface characteristics,” Annual Book of ASTM Standards, American Society for Testing and Materials (ASTM) (1999).
  12. “High Capacity Laser Profilograph,” http://www.cedex.es/cec/documenti/survey.htm.
  13. S. Se and M. Brady, “Vision-based Detection of Stair-cases,” Proceedings of the Fourth Asian Conference on Computer Vision (ACCV), pp. 535–540 (2000).
  14. V. Nair and J. Clark, “An unsupervised, online learning framework for moving object detection,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2, 317–324 (2004).
  15. Z. Zhou and M. Li, “Tri-training: exploiting unlabeled data using three classifiers,” IEEE Transactions on Knowledge and Data Engineering 17(11), 1529–1541 (2005).
  16. P. Viola and M. Jones, “Rapid Object Detection Using a Boosted Cascade of Simple Features,” Proceedings of International Conference on Computer Vision and Pattern Recognition 1, 511–518 (2001).
  17. W. H. Ittelson, “Size as a cue to distance: static localization,” American Journal of Psychology 64, 54–67 (1951).
  18. A. Yonas, L. Pettersen, and C. E. Granrud, “Infants’ sensitivity to familiar size as information for distance,” Child Development 53, 1285–1290 (1982).

Figures (6)

Fig. 1. The relationship between the world coordinate and the image coordinate.

Fig. 2. The pedestrian bounding boxes for three points at different positions in the scene.

Fig. 3. The flow chart for the Tri-training algorithm [15].

Fig. 4. A square on the ground in the real world and the corresponding trapezoid in the image.

Fig. 5. (a): The corridor with stairs. (b): Receiver operating characteristic of the classifiers on the test set. (c): The estimated occurrence probability of pedestrians in the scene p(x,y). (d): The estimated average width of pedestrians in the scene w(x,y).

Fig. 6. Experimental results for the corridor scene. (a): The corridor scene with the real ground region marked with a green line. (b): Coarse result of the estimated ground/non-ground region. (c): Final result of the estimated ground/non-ground region. (d): The corridor scene with the real unevenness region marked with a red line. (e): Coarse result of the estimated unevenness region. (f): Final result of the estimated unevenness region. (g): The real depth of the ground relative to the bottom of the scene. (h): Coarse result of the estimated depth of the ground relative to the bottom of the scene. (i): Final result of the estimated depth of the ground relative to the bottom of the scene.

Tables (1)


Table 1. Errors of the estimated scene information corresponding to Fig. 6.

Equations (22)


$$\frac{y_c(x_r, y_r, z_r)}{z_r} = \frac{y_i}{f},$$
$$\frac{x_i}{x_r} = \frac{f}{z_r}.$$
$$\Delta x_i = \frac{f}{z_r}\,\Delta x_r.$$
$$w_i(x_i, y_i) = w_r f \frac{1}{z_r}.$$
$$D(x_i, y_i) \propto \frac{1}{w_i(x_i, y_i)},$$
$$p(x_i, y_i) = \psi\{e[s(x_r, y_r, z_r)]\}.$$
$$e[s(x_r, y_r)] = \psi^{-1}[p(x_i, y_i)],$$
$$S_r \approx \frac{z_r^3}{f^2 y_c}\,S_i.$$
$$\psi(s(x_r, y_r)) \propto z_r^{-3},$$
$$e[s(x_r, y_r)] = \psi^{-1}[p(x_i, y_i)] \propto p(x_i, y_i)\,z_r^3.$$
$$e[s(x_r, y_r)] = \frac{p(x_i, y_i)}{w_i^3(x_i, y_i)}.$$
$$G(x_i, y_i) = \begin{cases} 255 & \text{if } p(x_i, y_i) \neq 0 \\ 0 & \text{if } p(x_i, y_i) = 0 \end{cases},$$
$$\text{Error rate of estimated } G(x, y) = \frac{\text{number of mislabeled pixels in } G(x, y)}{\text{total number of pixels in } G(x, y)},$$
$$\text{Error rate of estimated } E(x, y) = \frac{\text{number of mislabeled pixels in } E(x, y)}{\text{total number of pixels in } E(x, y)},$$
$$\text{Average error of estimated } D(x, y) = \frac{\sum_{i=1}^{N} (d_{ei} - d_{ri})^2}{N},$$
$$\begin{cases} y_{i1} = \dfrac{f}{z_{r1}}\,y_c, \\ y_{i2} = \dfrac{f}{z_{r2}}\,y_c. \end{cases}$$
$$\begin{cases} x_{iA} = \dfrac{f}{z_{rA}}\,x_{rA}, \\ x_{iB} = \dfrac{f}{z_{rB}}\,x_{rB}, \\ z_{rA} = z_{rB}. \end{cases}$$
$$S_i = (x_{iB} - x_{iA})(y_{i2} - y_{i1}).$$
$$S_i = \frac{f^2 y_c}{z_{rA}\,z_{r1}\,z_{r2}}\,S_r,$$
$$y_{iA} - y_{i1} = y_{i2} - y_{iA},$$
$$z_{rA} = \frac{2 z_{r1} z_{r2}}{z_{r1} + z_{r2}}.$$
$$S_i = \frac{f^2 y_c (z_{r1} + z_{r2})}{2 z_{r1}^2 z_{r2}^2}\,S_r.$$
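
The last few relations describe how a patch of ground of area S_r maps to an image area S_i. A small numerical check can make this concrete. The sketch below projects a ground square through a pinhole model (x_i = f x_r / z_r, y_i = f y_c / z_r), measures the resulting trapezoid directly, and compares it with the closed form and with the approximation S_i ≈ f² y_c S_r / z_r³. The focal length, camera height, and square placement are hypothetical values chosen only for illustration, and the function names are not from the paper.

```python
def project(x_r, z_r, f, y_c):
    """Pinhole projection of a ground point: x_i = f*x_r/z_r, y_i = f*y_c/z_r."""
    return f * x_r / z_r, f * y_c / z_r


def shoelace_area(points):
    """Area of a simple polygon given its vertices in order."""
    area = 0.0
    n = len(points)
    for k in range(n):
        x0, y0 = points[k]
        x1, y1 = points[(k + 1) % n]
        area += x0 * y1 - x1 * y0
    return abs(area) / 2.0


def image_area_closed_form(s_r, z_1, z_2, f, y_c):
    """S_i = f^2 * y_c * (z_1 + z_2) / (2 * z_1^2 * z_2^2) * S_r."""
    return f ** 2 * y_c * (z_1 + z_2) / (2.0 * z_1 ** 2 * z_2 ** 2) * s_r


def image_area_cubic(s_r, z_r, f, y_c):
    """Approximation S_i ~ f^2 * y_c / z_r^3 * S_r, valid when z_1 is close to z_2."""
    return f ** 2 * y_c / z_r ** 3 * s_r


# Hypothetical setup: focal length in pixels, camera height in metres,
# and a 1 m x 1 m ground square whose edges lie at depths 20 m and 21 m.
f, y_c = 800.0, 2.5
x_a, x_b = -0.5, 0.5
z_1, z_2 = 21.0, 20.0
s_r = (x_b - x_a) * (z_1 - z_2)

corners = [
    project(x_a, z_2, f, y_c),
    project(x_b, z_2, f, y_c),
    project(x_b, z_1, f, y_c),
    project(x_a, z_1, f, y_c),
]
measured = shoelace_area(corners)
closed = image_area_closed_form(s_r, z_1, z_2, f, y_c)
cubic = image_area_cubic(s_r, 0.5 * (z_1 + z_2), f, y_c)
```

Under this pinhole model the closed form reproduces the measured trapezoid area up to rounding, and the cubic approximation stays within a fraction of a percent for a square this far from the camera; the gap grows as the square gets closer or larger relative to its depth.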
