Abstract

Models of visual attention have focused predominantly on bottom-up approaches that ignore structured contextual and scene information. I propose a model of contextual cueing for attention guidance based on the global scene configuration. It is shown that the statistics of low-level features across the whole image can be used to prime the presence or absence of objects in the scene and to predict their location, scale, and appearance before exploring the image. In this scheme, visual context information becomes available early in the visual processing chain, allowing modulation of the saliency of image regions and providing an efficient shortcut for object detection and recognition.

© 2003 Optical Society of America
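The mechanism the abstract describes can be illustrated with a minimal sketch: a holistic "gist"-style descriptor computed from low-level features pooled over the whole image, used to bias a bottom-up saliency map toward image regions where the target is likely to appear. This is an assumption-laden toy, not the paper's actual implementation (which uses multiscale Gabor features and learned mixture models); `gist_features`, `contextual_saliency`, and the regression weights `w_loc` are hypothetical names introduced here for illustration.

```python
import numpy as np

def gist_features(img, n_blocks=4, orientations=4):
    """Coarse global descriptor: oriented gradient energy pooled on an
    n_blocks x n_blocks grid. A crude stand-in for the multiscale
    oriented-filter features used in gist-style scene representations."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)                       # edge energy
    ang = np.mod(np.arctan2(gy, gx), np.pi)      # orientation in [0, pi)
    h, w = img.shape
    feats = []
    for i in range(n_blocks):
        for j in range(n_blocks):
            sl = (slice(i * h // n_blocks, (i + 1) * h // n_blocks),
                  slice(j * w // n_blocks, (j + 1) * w // n_blocks))
            # energy-weighted orientation histogram for this block
            hist, _ = np.histogram(ang[sl], bins=orientations,
                                   range=(0.0, np.pi), weights=mag[sl])
            feats.extend(hist)
    v = np.asarray(feats)
    return v / (np.linalg.norm(v) + 1e-8)

def contextual_saliency(saliency_map, gist, w_loc, sigma=0.15):
    """Modulate a bottom-up saliency map by a context-driven prior on the
    target's vertical location: the prior is a Gaussian over normalized
    row position whose mean is a linear readout of the gist vector.
    w_loc is a hypothetical weight vector, assumed learned offline from
    (scene, target location) pairs."""
    h, w = saliency_map.shape
    mean_y = float(np.clip(w_loc @ gist, 0.0, 1.0))   # predicted row, in [0, 1]
    rows = np.linspace(0.0, 1.0, h)[:, None]
    prior = np.exp(-0.5 * ((rows - mean_y) / sigma) ** 2)
    return saliency_map * prior                       # context gates saliency
```

Because the gist is computed from whole-image statistics, the contextual prior is available before any local scanning of the image: regions inconsistent with the predicted target location are suppressed, which is the "shortcut" the abstract refers to.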


References


  1. L. Itti, C. Koch, E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE Trans. Pattern Anal. Mach. Intell. 20, 1254–1259 (1998).
  2. T. Lindeberg, “Detecting salient blob-like image structures and their scales with a scale-space primal sketch: a method for focus-of-attention,” Int. J. Comput. Vision 11, 283–318 (1993).
  3. A. Treisman, G. Gelade, “A feature integration theory of attention,” Cogn. Psychol. 12, 97–136 (1980).
  4. A. Shashua, S. Ullman, “Structural saliency: the detection of globally salient structures using a locally connected network,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE Computer Society Press, Los Alamitos, Calif., 1988), pp. 321–327.
  5. J. M. Wolfe, “Guided Search 2.0: a revised model of visual search,” Psychon. Bull. Rev. 1, 202–228 (1994).
  6. R. P. N. Rao, G. J. Zelinsky, M. M. Hayhoe, D. H. Ballard, “Modeling saccadic targeting in visual search,” in Advances in Neural Information Processing Systems, D. S. Touretzky, M. C. Mozer, M. E. Hasselmo, eds. (MIT Press, Cambridge, Mass., 1996), Vol. 8, pp. 830–836.
  7. B. Moghaddam, A. Pentland, “Probabilistic visual learning for object representation,” IEEE Trans. Pattern Anal. Mach. Intell. 19, 696–710 (1997).
  8. A. L. Yarbus, Eye Movements and Vision (Plenum, New York, 1967).
  9. I. Biederman, R. J. Mezzanotte, J. C. Rabinowitz, “Scene perception: detecting and judging objects undergoing relational violations,” Cogn. Psychol. 14, 143–177 (1982).
  10. S. E. Palmer, “The effects of contextual scenes on the identification of objects,” Memory Cognit. 3, 519–526 (1975).
  11. R. A. Rensink, J. K. O’Regan, J. J. Clark, “To see or not to see: the need for attention to perceive changes in scenes,” Psychol. Sci. 8, 368–373 (1997).
  12. R. A. Rensink, “The dynamic representation of scenes,” Visual Cogn. 7, 17–42 (2000).
  13. J. M. Henderson, A. Hollingworth, “High-level scene perception,” Annu. Rev. Psychol. 50, 243–271 (1999).
  14. P. De Graef, D. Christiaens, G. d’Ydewalle, “Perceptual effects of scene context on object identification,” Psychol. Res. 52, 317–329 (1990).
  15. M. M. Chun, Y. Jiang, “Contextual cueing: implicit learning and memory of visual context guides spatial attention,” Cogn. Psychol. 36, 28–71 (1998).
  16. H. Arsenio, A. Oliva, J. M. Wolfe, “Exorcising ‘ghosts’ in repeated visual search,” J. Vision 2, 733a (2002).
  17. P. G. Schyns, A. Oliva, “From blobs to boundary edges: evidence for time and spatial scale dependent scene recognition,” Psychol. Sci. 5, 195–200 (1994).
  18. S. Thorpe, D. Fize, C. Marlot, “Speed of processing in the human visual system,” Nature 381, 520–522 (1996).
  19. M. C. Potter, E. I. Levy, “Recognition memory for a rapid sequence of pictures,” J. Exp. Psychol. 81, 10–15 (1969).
  20. M. C. Potter, “Meaning in visual search,” Science 187, 965–966 (1975).
  21. T. Sanocki, W. Epstein, “Priming spatial layout of scenes,” Psychol. Sci. 8, 374–378 (1997).
  22. A. Oliva, P. G. Schyns, “Coarse blobs or fine edges? Evidence that information diagnosticity changes the perception of complex visual stimuli,” Cogn. Psychol. 34, 72–107 (1997).
  23. A. Oliva, P. G. Schyns, “Diagnostic color blobs mediate scene recognition,” Cogn. Psychol. 41, 176–210 (2000).
  24. A. Oliva, A. Torralba, “Modeling the shape of the scene: a holistic representation of the spatial envelope,” Int. J. Comput. Vision 42, 145–175 (2001).
  25. D. Noton, L. W. Stark, “Scanpaths in eye movements during pattern perception,” Science 171, 308–311 (1971).
  26. D. A. Chernyak, L. W. Stark, “Top-down guided eye movements,” IEEE Trans. Syst. Man Cybern. 31, 514–522 (2001).
  27. T. M. Strat, M. A. Fischler, “Context-based vision: recognizing objects using information from both 2-D and 3-D imagery,” IEEE Trans. Pattern Anal. Mach. Intell. 13, 1050–1065 (1991).
  28. J. K. Tsotsos, S. M. Culhane, W. Y. K. Wai, Y. H. Lai, N. Davis, F. Nuflo, “Modeling visual attention via selective tuning,” Artif. Intell. 78, 507–545 (1995).
  29. A. Torralba, P. Sinha, “Statistical context priming for object detection: scale selection and focus of attention,” in Proceedings of the International Conference on Computer Vision (IEEE Computer Society Press, Los Alamitos, Calif., 2001), Vol. 1, pp. 763–770.
  30. A. Torralba, “Contextual modulation of target saliency,” in Advances in Neural Information Processing Systems, T. G. Dietterich, S. Becker, Z. Ghahramani, eds. (MIT Press, Cambridge, Mass., 2002), Vol. 14, pp. 1303–1310.
  31. C. Koch, S. Ullman, “Shifts in visual attention: towards the underlying circuitry,” Hum. Neurobiol. 4, 219–227 (1985).
  32. D. Parkhurst, K. Law, E. Niebur, “Modeling the role of salience in the allocation of overt visual attention,” Vision Res. 42, 107–123 (2002).
  33. J. M. Wolfe, “Visual search,” in Attention, H. Pashler, ed. (University College London Press, London, 1998).
  34. D. J. Field, “Relations between the statistics of natural images and the response properties of cortical cells,” J. Opt. Soc. Am. A 4, 2379–2394 (1987).
  35. B. A. Olshausen, D. J. Field, “Emergence of simple-cell receptive field properties by learning a sparse code for natural images,” Nature 381, 607–609 (1996).
  36. B. Schiele, J. L. Crowley, “Recognition without correspondence using multidimensional receptive field histograms,” Int. J. Comput. Vision 36, 31–50 (2000).
  37. C. Carson, S. Belongie, H. Greenspan, J. Malik, “Region-based image querying,” in Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries (IEEE Computer Society Press, Los Alamitos, Calif., 1997), pp. 42–49.
  38. M. M. Gorkani, R. W. Picard, “Texture orientation for sorting photos at a glance,” in Proceedings of the IEEE International Conference on Pattern Recognition (IEEE Computer Society Press, Los Alamitos, Calif., 1994), Vol. 1, pp. 459–464.
  39. M. Szummer, R. W. Picard, “Indoor-outdoor image classification,” in Proceedings of the IEEE International Workshop on Content-Based Access of Image and Video Databases (IEEE Computer Society Press, Los Alamitos, Calif., 1998), pp. 42–51.
  40. A. Jepson, W. Richards, D. Knill, “Modal structures and reliable inference,” in Perception as Bayesian Inference, D. Knill, W. Richards, eds. (Cambridge U. Press, Cambridge, UK, 1996), pp. 63–92.
  41. A. Treisman, “Properties, parts and objects,” in Handbook of Human Perception and Performance, K. R. Boff, L. Kaufman, J. P. Thomas, eds. (Wiley, New York, 1986), pp. 35.1–35.70.
  42. B. Heisele, T. Serre, S. Mukherjee, T. Poggio, “Feature reduction and hierarchy of classifiers for fast object detection in video images,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE Computer Society Press, Los Alamitos, Calif., 2001), Vol. 2, pp. 18–24.
  43. P. Viola, M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE Computer Society Press, Los Alamitos, Calif., 2001), Vol. 1, pp. 511–518.
  44. S. Ullman, M. Vidal-Naquet, E. Sali, “Visual features of intermediate complexity and their use in classification,” Nat. Neurosci. 5, 682–687 (2002).
  45. M. Riesenhuber, T. Poggio, “Hierarchical models of object recognition in cortex,” Nat. Neurosci. 2, 1019–1025 (1999).
  46. S. Edelman, “Computational theories of object recognition,” Trends Cogn. Sci. 1, 296–304 (1997).
  47. A. Torralba, A. Oliva, “Depth perception from familiar structure,” IEEE Trans. Pattern Anal. Mach. Intell. 24, 1226–1238 (2002).
  48. M. P. Eckstein, J. S. Whiting, “Visual signal detection in structured backgrounds. I. Effect of number of possible spatial locations and signal contrast,” J. Opt. Soc. Am. A 13, 1777–1787 (1996).
  49. M. Swain, D. Ballard, “Color indexing,” Int. J. Comput. Vision 7, 11–32 (1991).
  50. R. Rosenholtz, “A simple saliency model predicts a number of motion popout phenomena,” Vision Res. 39, 3157–3163 (1999).
  51. A. P. Dempster, N. M. Laird, D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Stat. Soc. Ser. B 39, 1–38 (1977).
  52. N. Gershenfeld, The Nature of Mathematical Modeling (Cambridge U. Press, Cambridge, UK, 1999).
  53. M. I. Jordan, R. A. Jacobs, “Hierarchical mixtures of experts and the EM algorithm,” Neural Comput. 6, 181–214 (1994).


Tsotsos, J. K.

J. K. Tsotsos, S. M. Culhane, W. Y. K. Wai, Y. H. Lai, N. Davis, F. Nuflo, “Modeling visual attention via selective tuning,” Artif. Intell. 78, 507–545 (1995).
[CrossRef]

Ullman, S.

S. Ullman, M. Vidal-Naquet, E. Sali, “Visual features of intermediate complexity and their use in classification,” Nat. Neurosci. 5, 682–687 (2002).
[PubMed]

C. Koch, S. Ullman, “Shifts in visual attention: towards the underlying circuitry,” Hum. Neurobiol. 4, 219–227 (1985).

A. Shashua, S. Ullman, “Structural saliency: the detection of globally salient structures using a locally connected network,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE Computer Society Press, Los Alamitos, Calif., 1988), pp. 321–327.

Vidal-Naquet, M.

S. Ullman, M. Vidal-Naquet, E. Sali, “Visual features of intermediate complexity and their use in classification,” Nat. Neurosci. 5, 682–687 (2002).
[PubMed]

Viola, P.

P. Viola, M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (IEEE Computer Society Press, Los Alamitos, Calif., 2001), Vol. 1, pp. 511–518.

Wai, W. Y. K.

J. K. Tsotsos, S. M. Culhane, W. Y. K. Wai, Y. H. Lai, N. Davis, F. Nuflo, “Modeling visual attention via selective tuning,” Artif. Intell. 78, 507–545 (1995).
[CrossRef]

Whiting, J. S.

M. P. Eckstein, J. S. Whiting, “Visual signal detection in structured backgrounds I. Effect of number of possible spatial locations and signal contrast,” J. Opt. Soc. Am. A 13, 1777–1787 (1996).
[CrossRef]

Wolfe, J. M.

H. Arsenio, A. Oliva, J. M. Wolfe, “Exorcising ‘ghosts’ in repeated visual search,” J. Vision 2, 733a (2002).
[CrossRef]

J. M. Wolfe, “Guided search 2.0. A revised model of visual search,” Psychon. Bull. Rev. 1, 202–228 (1994).
[CrossRef] [PubMed]

J. M. Wolfe, “Visual search,” in Attention, H. Pashler, ed. (University College London Press, London, 1998).

Yarbus, A. L.

A. L. Yarbus, Eye Movements and Vision (Plenum, New York, 1967).

Zelinsky, G. J.

R. P. N. Rao, G. J. Zelinsky, M. M. Hayhoe, D. H. Ballard, “Modeling saccadic targeting in visual search,” in Advances in Neural Information Processing Systems, D. S. Touretzky, M. C. Mozer, M. E. Hasselmo, eds. (MIT Press, Cambridge, Mass., 1996), Vol. 8, pp. 830–836.




Figures (9)

Fig. 1

Examples of scene/context influences on object recognition and search. (a) Examples with increasing contextual information while the local target information remains constant; recognition improves drastically as background information is added. (b) Scene information affects the efficiency of search for a target (a person). The context acts in two competing ways: (1) it introduces distractors, but (2) it also offers more constraints on the target location. Most previous models of attention focus on modeling the masking effects (saliency maps).

Fig. 2

Images that are similar in terms of global spatial properties tend to be composed of similar objects in similar spatial arrangements.17,24 Since scene semantics may be available early in visual processing, these regularities suggest an efficient procedure for object search in a new scene: look for objects where they were found in similar environments.

Fig. 3

Scheme that incorporates contextual information to select candidate target locations. The scheme consists of two parallel pathways: the first processes local image information, and the second globally encodes the pattern of activation of the feature maps. Contextual information is obtained by projecting the feature maps onto the (holistic) principal components. In the task of looking for a person in the image, the saliency map, which is task independent, selects image regions that are salient in terms of local orientations and spatial frequencies. The contextual priming (task dependent), however, drives attention to the image regions that can contain the target object (e.g., sidewalks for a pedestrian). Combining context and saliency gives better candidates for the location of the target.
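The global pathway described in this caption can be illustrated by pooling filter-energy maps and projecting them onto a set of principal components, as in the a_n = Σ_{x,k} |v(x, k)| ψ_n(x, k) projection of the Equations section. The following is only a sketch under assumed shapes; the filter bank and the PCA basis here are random placeholders, not the ones used in the article:

```python
import numpy as np

def global_context_features(feature_maps, pca_basis):
    """Project the magnitudes of local feature maps onto a holistic
    PCA basis, giving low-dimensional context features a_n
    (a_n = sum over x and k of |v(x, k)| * psi_n(x, k))."""
    # feature_maps: (K, H, W) outputs of K oriented/band-pass filters
    # pca_basis:    (N, K, H, W) principal components psi_n(x, k)
    mags = np.abs(feature_maps)  # |v(x, k)|
    return np.tensordot(pca_basis, mags, axes=([1, 2, 3], [0, 1, 2]))

# Toy usage with random maps and a random basis (illustration only).
rng = np.random.default_rng(0)
v = rng.standard_normal((8, 16, 16))       # 8 feature maps on a 16x16 grid
psi = rng.standard_normal((4, 8, 16, 16))  # 4 holistic principal components
a = global_context_features(v, psi)
print(a.shape)  # (4,)
```

The result is a short vector summarizing the whole image, which is what allows contextual priming to operate before any local region is inspected.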

Fig. 4

Contextual priming of superordinate object categories (1, people; 2, furniture; 3, vegetation; 4, vehicles). The heights of the bars show the model’s predictions of the likelihood P(o|vC) of finding members of these four categories in each scene.

Fig. 5

Model results on context-driven focus of attention in the task of looking for faces (left) and vegetation (right). Examples of real-world scenes and the image regions with the largest likelihood P(x, o|vC) = P(x|o, vC)P(o|vC). The two foci of attention for each image show how the task (o = faces or o = trees) changes the way attention is deployed in the image when scene/context information is considered. The factor P(o|vC) is included here to illustrate that attention is not driven to any image region when the target object o is inconsistent with the context (e.g., trees in an indoor scene or pedestrians on a highway).

Fig. 6

Scale priming from a familiar context. (a) Examples of scenes and the model’s estimate of the size of a face at the center of focus of attention. (b) Scale estimation results plotted against ground truth.

Fig. 7

Selection of prototypical object appearances based on contextual cues.

Fig. 8

(a) Input image (color is not taken into account). The task is to look for pedestrians. (b) Bottom-up saliency map, S(x). (c) Context-driven focus of attention, Sc(x). The image region in the shadow is not relevant for the task, and its saliency is suppressed. (d) Points corresponding to the largest saliency S(x). (e) Image regions with the largest saliency when contextual priming is included, Sc(x).
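The combination shown in panel (c) is a pointwise product, Sc(x) = S(x) P(x|o, vC) P(o|vC). A minimal sketch, assuming a precomputed bottom-up saliency map and a contextual location prior (both hypothetical toy arrays here, not outputs of the article's model):

```python
import numpy as np

def context_modulated_saliency(S, location_prior, p_object_given_context):
    """Sc(x) = S(x) * P(x|o, vC) * P(o|vC): bottom-up saliency gated by
    a task-dependent contextual prior on the target's location."""
    return S * location_prior * p_object_given_context

# Toy example: flat saliency, prior concentrated on the lower image half
# (e.g., pedestrians expected near the ground plane).
S = np.ones((4, 4))
prior = np.zeros((4, 4))
prior[2:, :] = 0.125
Sc = context_modulated_saliency(S, prior, p_object_given_context=0.8)
print(Sc[0, 0], Sc[3, 0])  # 0.0 0.1
```

Note how the upper rows are suppressed entirely even though their bottom-up saliency is nonzero, mirroring the suppression of the shadowed region in the figure.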

Fig. 9

The strength of contextual features in providing priors for objects depends on two factors: (1) how well the contextual features differentiate between different scenes and (2) how strong the relationship is between the object of interest and the scene.

Equations (13)


v_k(\mathbf{x}) = \sum_{\mathbf{x}'} i(\mathbf{x}')\, g_k(\mathbf{x} - \mathbf{x}'),
P(O \mid v_L, v_C) = \frac{1}{P(v_L \mid v_C)}\, P(v_L \mid O, v_C)\, P(O \mid v_C).
P(O \mid v_C) = P(t \mid x, v_C, o)\, P(x \mid v_C, o)\, P(o \mid v_C).
a_n = \sum_{\mathbf{x}} \sum_{k} \lvert v(\mathbf{x}, k) \rvert\, \psi_n(\mathbf{x}, k).
S_c(\mathbf{x}) = S(\mathbf{x})\, P(\mathbf{x} \mid o, v_C)\, P(o \mid v_C),
P(v_L \mid v_C) = \sum_{i=1}^{N} b_i\, G(v_L; \mu_i, X_i),
G(v_L; \mu, X) = \frac{\exp\!\left[-\frac{1}{2}(v_L - \mu)^T X^{-1} (v_L - \mu)\right]}{(2\pi)^{N/2} \lvert X \rvert^{1/2}}.
h_i^k(t) = \frac{b_i^k\, G(v_t; \mu_i^k, X_i^k)}{\sum_{i=1}^{L} b_i^k\, G(v_t; \mu_i^k, X_i^k)},
b_i^{k+1} = \frac{\sum_{t=1}^{N_t} h_i^k(t)}{\sum_{i=1}^{L} \sum_{t=1}^{N_t} h_i^k(t)},
\mu_i^{k+1} = \frac{\sum_{t=1}^{N_t} h_i^k(t)\, v_t}{\sum_{t=1}^{N_t} h_i^k(t)},
X_i^{k+1} = \frac{\sum_{t=1}^{N_t} h_i^k(t)\,(v_t - \mu_i^{k+1})(v_t - \mu_i^{k+1})^T}{\sum_{t=1}^{N_t} h_i^k(t)}.
P(v_C \mid o) = \sum_{i=1}^{N} b_i\, G(v_C; a_i, A_i),
P(x, v_C \mid o) = \sum_{i=1}^{N} b_i\, G(x; x_i, X_i)\, G(v_C; v_i, V_i).
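The EM updates for the Gaussian mixtures above (responsibilities h, then weights b, means μ, and covariances X) can be sketched directly in NumPy. This is an illustrative toy implementation on synthetic 1D data, not the article's actual training setup:

```python
import numpy as np

def gaussian(v, mu, X):
    """Multivariate normal density G(v; mu, X)."""
    d = v - mu
    n = mu.size
    return np.exp(-0.5 * d @ np.linalg.solve(X, d)) / (
        (2 * np.pi) ** (n / 2) * np.linalg.det(X) ** 0.5)

def em_step(V, b, mus, Xs):
    """One EM iteration for a mixture of L Gaussians.
    V: (Nt, d) samples; b: (L,) weights; mus: (L, d); Xs: (L, d, d)."""
    Nt, L = V.shape[0], b.size
    # E-step: responsibilities h_i(t)
    h = np.array([[b[i] * gaussian(V[t], mus[i], Xs[i])
                   for i in range(L)] for t in range(Nt)])
    h /= h.sum(axis=1, keepdims=True)
    # M-step: new weights (sum over i and t of h equals Nt), means, covariances
    b_new = h.sum(axis=0) / Nt
    mus_new = (h.T @ V) / h.sum(axis=0)[:, None]
    Xs_new = np.empty_like(Xs)
    for i in range(L):
        D = V - mus_new[i]
        Xs_new[i] = (h[:, i, None] * D).T @ D / h[:, i].sum()
    return b_new, mus_new, Xs_new

# Toy usage: two well-separated 1D clusters, two mixture components.
rng = np.random.default_rng(1)
V = np.concatenate([rng.normal(-3, 1, (50, 1)), rng.normal(3, 1, (50, 1))])
b = np.full(2, 0.5)
mus = np.array([[-1.0], [1.0]])
Xs = np.tile(np.eye(1), (2, 1, 1))
for _ in range(20):
    b, mus, Xs = em_step(V, b, mus, Xs)
print(np.sort(mus.ravel()))  # means converge near the cluster centers
```

In the article's setting, V would hold local feature vectors (for the object likelihood) or concatenated location/context vectors (for the priors); the update rules are identical.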
