Abstract

This paper describes a real-time foreground segmentation method for monocular video sequences in video teleconferencing. Background subtraction is widely used for foreground segmentation with static cameras, but its results are usually not accurate enough for background substitution tasks. In this paper, we propose a novel two-step strategy for fast and accurate foreground segmentation: initial foreground segmentation followed by fine foreground segmentation. In the first step, a moving object is roughly segmented using the background subtraction method. In the second step, the initial result is refined by a region-based segmentation method and a foreground history map (FHM)–based segmentation that combines temporal and spatial information. The segmentation accuracy of the proposed algorithm was evaluated against a ground truth of manually cropped foregrounds. The experimental results show that the proposed algorithm improves segmentation accuracy with respect to Horprasert’s well-known algorithm.
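The two-step strategy can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the parameter values, and the plain absolute-difference likelihood standing in for Horprasert's brightness/chromaticity distortion model are all assumptions.

```python
import numpy as np

def segment_frame(frame, bg_model, fhm, alpha=0.2, beta=0.5, theta=20.0):
    """Two-step sketch: (1) rough pixel-wise background subtraction,
    (2) refinement by blending with the foreground history map (FHM)."""
    # Step 1: initial foreground likelihood. A plain absolute difference
    # stands in for the brightness/chromaticity distortion model.
    likelihood = np.abs(frame.astype(float) - bg_model.astype(float))

    # Step 2: temporal-spatial fusion -- blend the current likelihood
    # (scaled to [0, 1]) with the FHM, then threshold.
    fused = beta * (likelihood / 255.0) + (1.0 - beta) * fhm
    mask = fused > (theta / 255.0)

    # Update the FHM as an exponential average of past binary results.
    fhm = alpha * fhm + (1.0 - alpha) * mask.astype(float)
    return mask, fhm
```

In use, the function would be called once per frame, carrying the returned FHM forward so that pixels that were recently foreground remain likely to be labeled foreground.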

© 2010 Optical Society of America


References


  1. V. Kolmogorov, A. Criminisi, A. Blake, G. Cross, and C. Rother, “Bi-layer segmentation of binocular stereo video,” in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (IEEE, 2005), pp. 407-414.
  2. A. Criminisi, J. Shotton, A. Blake, and P. H. S. Torr, “Gaze manipulation for one-to-one teleconferencing,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 2003), pp. 191-198.
  3. H. Luo and A. Eleftheriadis, “Model based segmentation and tracking of head-and-shoulder video objects for real time multimedia services,” IEEE Trans. Multimedia 5, 379-389 (2003).
  4. L. Zhao and L. S. Davis, “Closely coupled object detection and segmentation,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 2005), pp. 454-461.
  5. S. Mills and K. Novins, “Motion segmentation in long image sequences,” in Proceedings of the 11th British Machine Vision Conference (Academic, 2000), pp. 162-171.
  6. C. Stauffer and W. E. L. Grimson, “Learning patterns of activity using real-time tracking,” IEEE Trans. Pattern Anal. Mach. Intell. 22, 747-757 (2000).
  7. J.-H. Ahn and H. Byun, “Human silhouette extraction method using region based background subtraction,” Lect. Notes Comput. Sci. 4418, 412-420 (2007).
  8. C. Stauffer and W. Grimson, “Adaptive background mixture models for real-time tracking,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 1999), pp. 246-252.
  9. T. Horprasert, D. Harwood, and L. Davis, “A statistical approach for real time robust background subtraction and shadow detection,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 1999), pp. 1-19.
  10. J. J. L. Barron, D. J. Fleet, and S. S. Beauchemin, “Performance of optical flow techniques,” Int. J. Comput. Vis. 12, 43-77 (1994).
  11. D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,” IEEE Trans. Pattern Anal. Mach. Intell. 24, 603-619 (2002).
  12. B. C. Ko and J.-Y. Nam, “Object-of-interest image segmentation using human attention and semantic region clustering,” J. Opt. Soc. Am. A 23, 2462-2470 (2006).



Figures (9)

Fig. 1

Overview of the proposed algorithm.

Fig. 2

Initial foreground segmentation using Horprasert’s background subtraction method: (a) original image; (b) brightness distortion likelihood log(f_b(α_p | p)) (truncated at 255); (c) chromaticity distortion likelihood η log(f_c(γ_p | p)), where η = 5 (truncated at 255); (d) result of initial foreground segmentation (foreground pixels are white, background pixels are black).

Fig. 3

Results of region-based foreground segmentation: (a) result of mean-shift segmentation; (b) detection of the uncertain region (foreground is white, background is black, and the uncertain region is gray); (c) re-labeling of the uncertain region (the re-estimated foreground region is red online, the re-estimated background region is blue online); (d) result of the region-based foreground segmentation.

Fig. 4

An FHM as an object moves toward the right (scaled by 255). In an FHM, the pixel intensity is the foreground probability; brighter values correspond to a higher foreground probability (α = 0.2).

Fig. 5

Evaluation criteria for foreground segmentation performance: (a) original image, (b) ground truth, (c) foreground segmentation result, (d) error pixels (red online for under-segmented pixels, blue online for over-segmented pixels).

Fig. 6

Comparison of the foreground segmentation results obtained with the three algorithms: (a) original image, (b) pixel-based initial foreground segmentation, (c) region-based foreground segmentation, (d) FHM-based foreground segmentation.

Fig. 7

Segmentation accuracy (percent) at every tenth frame. (a)–(e) show the segmentation accuracy for each set of test data: (a) Data 1, (b) Data 2, (c) Data 3, (d) Data 4, (e) Data 5. The solid red curve indicates the accuracy of the final segmentation result; the proposed FHM-based method is quite robust.

Fig. 8

Comparison of foreground segmentation accuracies (percent).

Fig. 9

Final segmentation results: (a) Data 1: frames 230, 260, 410, 690. (b) Data 2: frames 290, 330, 400, 470. (c) Data 3: frames 180, 240, 260, 390. (d) Data 4: frames 260, 410, 500, 810. (e) Data 5: frames 230, 400, 510, 650.

Tables (2)


Table 1 Comparison of the Foreground Segmentation Results Obtained with the Three Sub-Algorithms


Table 2 Comparison of the Foreground Segmentation Results Obtained with Horprasert’s Algorithm, Stauffer’s Algorithm, and the Proposed Algorithm

Equations (14)


l(p) = \log f_b(\alpha_p \mid p) - \eta \log f_c(\gamma_p \mid p),
f_D(p) = \begin{cases} F & \text{if } l(p) > \theta_p, \\ B & \text{otherwise.} \end{cases}
P_c(R_i) = \left| \left( \tfrac{1}{2} - \tfrac{n_i(F)}{n_i(F) + n_i(B)} \right) \times 2 \right|,
P_{\mathrm{SFP}}(R_i) = \frac{1}{n_i} \sum_{p \in R_i} l(p),
\text{where } R_i \in U \text{ and } n_i \text{ is the number of pixels in } R_i,
f_D(R_i) = \begin{cases} F & \text{if } P_{\mathrm{SFP}}(R_i) > \theta_R, \\ B & \text{otherwise,} \end{cases} \quad R_i \in U.
\mathrm{FHM}^t(p) = \alpha \, \mathrm{FHM}^{t-1}(p) + (1 - \alpha) \, O_{\mathrm{region}}^t(p),
\mathrm{FHM}^0(p) = 0, \quad \forall p,
P_{\mathrm{TSFP}}^t(p) = \beta \times P_{\mathrm{SFP}}^t(p) + (1 - \beta) \, \mathrm{FHM}^t(p),
P_{\mathrm{SFP}}^t(p) = P_{\mathrm{SFP}}^t(R), \quad p \in R,
D_{\mathrm{final}}(p) = \begin{cases} F & \text{if } P_{\mathrm{TSFP}}^t(p) > \theta_f, \\ B & \text{otherwise.} \end{cases}
E_u = \frac{n(G - (G \cap R))}{n_G},
E_o = \frac{n(R - (G \cap R))}{n_R},
A = 1 - (E_u + E_o).
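The FHM update and the evaluation criteria can be checked with a short numeric sketch. This is a hedged illustration, not the authors' code: the function names and the boolean-mask representation of foreground regions are assumptions; here G and R are represented as binary masks, with n(·) the pixel count.

```python
import numpy as np

def update_fhm(fhm_prev, region_result, alpha=0.2):
    # FHM^t(p) = alpha * FHM^{t-1}(p) + (1 - alpha) * O_region^t(p)
    return alpha * fhm_prev + (1.0 - alpha) * region_result.astype(float)

def segmentation_accuracy(ground_truth, result):
    # E_u: ground-truth pixels missed by the result (under-segmentation),
    # E_o: result pixels outside the ground truth (over-segmentation),
    # A = 1 - (E_u + E_o).
    g = ground_truth.astype(bool)   # G: manually cropped foreground
    r = result.astype(bool)         # R: segmentation result
    inter = g & r                   # G ∩ R
    e_u = (g & ~inter).sum() / g.sum()
    e_o = (r & ~inter).sum() / r.sum()
    return 1.0 - (e_u + e_o)
```

With a perfect segmentation both error terms vanish and A = 1; labeling only half of the ground-truth pixels, with no false positives, gives E_u = 0.5, E_o = 0, and A = 0.5.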
