Abstract

A hierarchical classifier (cascade) is proposed for target detection. In building an optimal cascade we considered three heuristics: (1) a frontier-following approximation, (2) controlled error rates, and (3) weighting. Simulations on synthetic data with various underlying distributions were carried out, and we found the weighting heuristic to be the best in terms of both computational complexity and error rates. We thus initiate a systematic comparison of several heuristics that can be used to build a hierarchical model, and we discuss the implications and promise of the cascade architecture as well as the techniques that can be integrated into this framework. The best-performing heuristic, the weighting algorithm, was applied to an infrared (IR) data set, where it was found to outperform some state-of-the-art approaches that use the same type of simple classifier.

© 2004 Optical Society of America

References

  1. L. Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone, Classification and Regression Trees (Wadsworth, Belmont, Calif., 1984).
  2. J. R. Quinlan, C4.5: Programs for Machine Learning (Morgan Kaufmann, Los Altos, Calif., 1993).
  3. K. P. Bennett, N. Cristianini, J. Shawe-Taylor, D. Wu, “Enlarging the margins in perceptron decision trees,” Mach. Learning 41, 295–313 (2000).
  4. A. Said, W. A. Pearlman, “A new, fast, and efficient image codec based on set partitioning in hierarchical trees,” IEEE Trans. Circuits Syst. Video Technol. 6, 243–250 (1996).
  5. J. M. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients,” IEEE Trans. Signal Process. 41, 3445–3462 (1993).
  6. P. J. Burt, E. H. Adelson, “The Laplacian pyramid as a compact image code,” IEEE Trans. Commun. 31, 532–540 (1983).
  7. M. Elad, Y. Hel-Or, R. Keshet, “Pattern detection using a maximal rejection classifier,” Pattern Recogn. Lett. 23, 1459–1471 (2002).
  8. Y. Hel-Or, H. Hel-Or, “Real time pattern matching using projection kernels,” Interdisciplinary Center Tech. Rep. CS-2002-1 (2002), http://www.faculty.idc.ac.il/toky/Publications/publications.htm.
  9. Y. Hel-Or, H. Hel-Or, “Generalized pattern matching using orbit decomposition,” presented at the International Conference on Image Processing, Barcelona, Spain (September 2003), http://www.faculty.idc.ac.il/toky/Publications/publications.htm.
  10. P. Viola, M. Jones, “Robust real-time object detection,” presented at the ICCV Workshop on Statistical and Computational Theories of Vision, Vancouver, B.C., Canada (July 2001).
  11. B. Heisele, T. Serre, S. Mukherjee, T. Poggio, “Feature reduction and hierarchy of classifiers for fast object detection in video images,” in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001) (IEEE Computer Society Press, Los Alamitos, Calif., 2001), Vol. 2, pp. 18–24.
  12. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer-Verlag, New York, 2001).
  13. N. Cristianini, J. Shawe-Taylor, Support Vector Machines and Other Kernel-Based Learning Methods (Cambridge U. Press, New York, 2000).
  14. V. Vapnik, The Nature of Statistical Learning Theory (Springer-Verlag, Berlin, 1995).
  15. J. H. Friedman, “Greedy function approximation: a gradient boosting machine,” Ann. Statist. 29, 1189–1232 (2001).
  16. J. H. Friedman, “Getting Started with MART in R,” tutorial (April 2002), http://www-stat.stanford.edu/~jhf/.
  17. J. Friedman, T. Hastie, R. Tibshirani, “Additive logistic regression: a statistical view of boosting,” Ann. Statist. 28, 337–407 (2000).
  18. R. E. Schapire, Y. Freund, P. Bartlett, W. S. Lee, “Boosting the margin: a new explanation for the effectiveness of voting methods,” Ann. Statist. 26, 1651–1686 (1998).
  19. M. Zhu, T. Hastie, “Feature extraction for non-parametric discriminant analysis,” J. Comput. Graphic. Statist. (to be published).

Figures (9)

Fig. 1. A set of linear classifiers can quickly find the convex hull of a target region.

Fig. 2. Target region with holes. A radial-basis SVM will find a good classifier in such a situation, whereas a set of linear classifiers will not.

Fig. 3. Illustration of the frontier.

Fig. 4. Two artificial data sets with their theoretical boundaries. (a) Example 1: two Gaussian distributions with different variances. (b) Example 2: two mixed uniform distributions with different mixing parameters, viewed in the first two dimensions.

Fig. 5. Comparison of the three algorithms for Example 1: (a) f curves for the training data, (b) c-ROC curves for the testing data. The controlled-error-rate (CER) algorithm is always the worst, and the weighting algorithm is nearly as good as the propagating algorithm. On the testing data, the similarity of the weighting and propagating algorithms indicates good generalization in this setting.

Fig. 6. Comparison of the three algorithms of Fig. 5 for Example 2: (a) f curves for the training data, (b) c-ROC curves for the testing data. Conclusions similar to those for Example 1 can be drawn.

Fig. 7. Filled circles, points on the f curve obtained by propagating on the training data; open circles, points on the c-ROC curve for the same classifiers applied to the testing data. Curves connect corresponding pairs. (a) Example 1, (b) Example 2. The plots show how the error rates change when classifiers trained on the training data are applied to the testing data.

Fig. 8. Infrared image data: (a) f curve, (b) c-ROC curve, with a comparison with the MRC.

Fig. 9. Comparison of the MRC, weighting, and SVM algorithms on the infrared data. In this case the SVM yields the best performance. The weighting algorithm uses a single-coordinate classifier, x_j > a. The result demonstrates the importance of using a more complex classifier at each stage.

Equations (35)

Input data $\mathbf{x}$ pass through the stages in order: if $f_1(\mathbf{x}) + b_1 < 0$, assign class $C_1$; else if $f_2(\mathbf{x}) + b_2 < 0$, assign $C_1$; else if $f_3(\mathbf{x}) + b_3 < 0$, assign $C_1$; $\ldots$; otherwise assign class $C_2$.
$f_j(\mathbf{x}) + b_j \geq 0$,
$f_j(\mathbf{x}) = x_j$,
$f_j(\mathbf{x}) = \mathbf{c}^T \mathbf{x}$.
$f_j(\mathbf{x}) = \mathbf{c}^T \mathbf{x} + \mathbf{x}^T M \mathbf{x}$.
$f_j(\mathbf{x}) = \sum_{k=1}^{N} \alpha_k K(\mathbf{x}, \mathbf{x}_k)$,
$f_j(\mathbf{x}) + b_j \geq 0$ for all $j$;
$y = \begin{cases} 1, & f_1(\mathbf{x}) \geq t_1,\; f_2(\mathbf{x}) \geq t_2,\; \ldots,\; f_L(\mathbf{x}) \geq t_L, \\ 0, & \text{otherwise,} \end{cases}$
$x_j \geq c$,
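As a concrete illustration of the decision rule above, the following minimal sketch (Python, not taken from the paper) evaluates a rejection cascade in which each stage computes a linear score and rejects a sample as clutter the first time the score falls below its threshold. The stage vectors and thresholds used here are invented for the example.

import numpy as np

def cascade_predict(x, stages):
    """Evaluate a rejection cascade on a single sample x.

    stages is a list of (c_j, t_j) pairs; stage j computes the linear score
    f_j(x) = c_j . x and rejects x (label 0, clutter) as soon as the score
    falls below t_j.  Only samples that clear every stage are labeled 1.
    """
    for c, t in stages:
        if np.dot(c, x) < t:          # early rejection at a cheap stage
            return 0
    return 1                          # survived all L stages: target

# Hypothetical two-stage cascade in two dimensions (values invented).
stages = [(np.array([1.0, 0.0]), 0.2),
          (np.array([0.5, 0.5]), 0.4)]
print(cascade_predict(np.array([0.6, 0.7]), stages))   # 1
print(cascade_predict(np.array([0.1, 0.9]), stages))   # 0, rejected at stage 1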
$N = N_1 + N_2$, \quad $N_{2i-2} = N_{2i-1} + N_{2i}$, $i = 2, 3, \ldots, L$, \quad $N_{2i-2}^{j} = N_{2i-1}^{j} + N_{2i}^{j}$, $i = 2, 3, \ldots, L$, $j = 0, 1$.
$\sum_{i=1}^{L} N_{2i-1}^{1}$.
$N_{2L}^{0}$.
$N = N^{0} + N^{1}$, \quad $N_i = N_i^{0} + N_i^{1}$, $i = 1, 2, \ldots, 2L$.
$E_{10}(h) = \sum_{i=1}^{L} N_{2i-1}^{1}$, \quad $E_{01}(h) = N_{2L}^{0}$.
$f(k; L) = \min_{h \in \mathcal{H}_L} E_{01}(h)$, subject to $E_{10}(h) \leq k$,
$E_{01}(h) = f(E_{10}(h); L)$.
$E_{10}(h^{*}(k, L)) = k$, \quad $E_{01}(h^{*}(k, L)) = f(k; L)$.
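The two error counts $E_{10}(h)$ and $E_{01}(h)$ can be tallied directly on a labeled sample set. The sketch below is a straightforward reading of the definitions above (targets rejected at some stage, and clutter that survives all stages), with the same assumed linear-stage representation as in the previous sketch; it is illustrative, not the paper's code.

import numpy as np

def cascade_errors(X, y, stages):
    """Tally the two cascade error counts on labeled data.

    E10: targets (y == 1) rejected at some stage (missed detections).
    E01: clutter (y == 0) accepted by every stage (false alarms).
    Each stage is a (c_j, t_j) pair for the linear score c_j . x.
    """
    e10 = e01 = 0
    for xi, yi in zip(X, y):
        accepted = all(np.dot(c, xi) >= t for c, t in stages)
        if yi == 1 and not accepted:
            e10 += 1
        elif yi == 0 and accepted:
            e01 += 1
    return e10, e01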
$x_j > a$.
$R_1(h) = \{\mathbf{x} : h(\mathbf{x}) = 1\}$.
$r^{*}(A, k_1) = \arg\min_{h \in \mathcal{H}(A, k_1)} E_{01}(h)$.
$\min_{k_1 \leq k} \; E_{01}(h(k_1, l)) + E_{01}\bigl(r^{*}(R_1(h(k_1, l)), k - k_1)\bigr)$.
$A_i = R_1(h(i, l))$, $i = 0, 1, \ldots, K$.
$r^{*}(A_i, k)$, $k = 0, 1, \ldots, K - i$,
$\tilde{h}_0(\mathbf{x}) = 1$ for all $\mathbf{x}$.
$f(k) = \min_{L} f(k; L)$.
$h^{*}(k) = \arg\min_{h^{*}(k, L),\; L = 1, 2, 3, \ldots} E_{01}(h^{*}(k, L))$.
$s^{*}(A, \lambda) = \arg\min_{h \in A} \; E_{01}(h) + \lambda E_{10}(h)$,
$\tilde{h}_0(\mathbf{x}) = 1$ for all $\mathbf{x}$.
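The weighted objective $E_{01}(h) + \lambda E_{10}(h)$ suggests a greedy way to grow a cascade. The following sketch is only a loose illustration of such a weighting-style construction, not the authors' algorithm: it assumes single-coordinate stages of the form x_j > a (as in Fig. 9), picks at each step the stage that minimizes the weighted error of the cascade built so far, and stops when no stage improves the objective.

import numpy as np

def build_weighted_cascade(X, y, lam, max_stages=10):
    """Greedily grow a cascade of single-coordinate stages x[:, j] > a.

    A sample is accepted only if it passes every stage.  Each candidate
    stage is scored by the weighted error of the whole cascade so far,
        E01 + lam * E10,
    where E10 counts targets (y == 1) rejected by any stage and E01 counts
    clutter (y == 0) that survives all stages.  A sketch of a weighting-style
    heuristic under these assumptions, not the paper's procedure.
    """
    n_samples, n_features = X.shape
    surviving = np.ones(n_samples, dtype=bool)   # samples not yet rejected
    missed = 0                                   # cumulative E10
    best_total = np.sum((y == 0) & surviving)    # weighted error with no stages
    stages = []
    for _ in range(max_stages):
        best = None
        for j in range(n_features):
            for a in np.unique(X[:, j]):
                keep = surviving & (X[:, j] > a)
                e10 = missed + np.sum((y == 1) & surviving & ~keep)
                e01 = np.sum((y == 0) & keep)
                total = e01 + lam * e10
                if best is None or total < best[0]:
                    best = (total, j, a, keep, e10)
        if best is None or best[0] >= best_total:
            break                                # no stage lowers the weighted error
        best_total, j, a, surviving, missed = best
        stages.append((j, a))
    return stages

# Hypothetical usage on random data (for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, size=(100, 2)),    # targets
               rng.normal(-1.0, 1.0, size=(300, 2))])  # clutter
y = np.array([1] * 100 + [0] * 300)
print(build_weighted_cascade(X, y, lam=5.0, max_stages=3))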
$x_1^2 + x_2^2 = \frac{20}{9} \log\frac{30}{7}$.
$f_i(X) = \alpha_i\, \frac{\mathbf{1}(X \in A_1)}{|A_1|} + (1 - \alpha_i)\, \frac{\mathbf{1}(X \in A)}{|A|}$, $i = 0, 1$.
$f_i(\mathbf{x}) \geq 0$.
$f_i(\mathbf{x}) = \frac{1}{2\pi\,|\Sigma_i|^{1/2}} \exp\!\left(-\frac{1}{2}\, \mathbf{x}^T \Sigma_i^{-1} \mathbf{x}\right)$, $i = 0, 1$.
$\Pr(Y = i \,|\, X = \mathbf{x}) = \frac{f_i(\mathbf{x}) \Pr(Y = i)}{\sum_{i=0,1} f_i(\mathbf{x}) \Pr(Y = i)}$, $i = 0, 1$.
$\log\frac{\Pr(Y = 1 \,|\, X = \mathbf{x})}{\Pr(Y = 0 \,|\, X = \mathbf{x})} = \log\frac{f_1(\mathbf{x})}{f_0(\mathbf{x})} + \log\frac{\Pr(Y = 1)}{\Pr(Y = 0)} = -\frac{1}{2}\, \mathbf{x}^T \Sigma_1^{-1} \mathbf{x} + \frac{1}{2}\, \mathbf{x}^T \Sigma_0^{-1} \mathbf{x} + \log\frac{|\Sigma_0|^{1/2}}{|\Sigma_1|^{1/2}} + \log\frac{3}{7} = -\frac{9}{20}\,(x_1^2 + x_2^2) + \log\frac{30}{7} \geq 0$.
$x_1^2 + x_2^2 \leq \frac{20}{9} \log\frac{30}{7}$.
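The boundary radius in Example 1 can be checked numerically. The covariances $\Sigma_1 = I$, $\Sigma_0 = 10 I$ and priors 0.3/0.7 used below are assumptions inferred from the constants $-9/20$ and $\log(30/7)$ in the derivation; with them, the log odds vanish exactly on the circle above.

import numpy as np

# Log odds of class 1 vs. class 0 for Example 1 (two zero-mean Gaussians).
# Sigma_1 = I, Sigma_0 = 10*I and priors 0.3/0.7 are inferred assumptions.
def log_odds(x):
    r2 = float(np.dot(x, x))
    return (-0.5 * r2 + 0.5 * r2 / 10.0
            + np.log(np.sqrt(100.0))      # log(|Sigma_0|^(1/2) / |Sigma_1|^(1/2)) = log 10
            + np.log(3.0 / 7.0))          # log of the prior ratio

r = np.sqrt(20.0 / 9.0 * np.log(30.0 / 7.0))    # theoretical boundary radius, ~1.80
print(r)
print(log_odds(np.array([r, 0.0])))             # ~0: exactly on the Bayes boundary
print(log_odds(np.array([0.5, 0.5])))           # > 0: inside, classified as class 1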
