Abstract

This Letter presents a computational model of visual saliency based on a new definition: saliency is novelty, which guides the deployment of visual attention. Novelty is defined as arising from regions that are dissimilar to the global scene. The approach consists of two stages. First, a global perspective is obtained: the scene is represented by a visual vocabulary, and a novelty factor is introduced for each visual word according to the "repetition suppression principle." Second, a local perspective is obtained: each local region is represented by its histogram of visual word occurrence, and its saliency is defined as the overall novelty factor of its visual words. Experimental results demonstrate good performance of the proposed model on complex scenes and fair consistency with human eye fixation data.

© 2012 Optical Society of America
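To make the first stage concrete, the following is a minimal Python sketch, not the authors' implementation: it assumes patch features have already been quantized into visual-word indices (the Letter builds its vocabulary from color and texture features), and the function name and zero-novelty convention for unseen words are illustrative assumptions.

```python
import numpy as np

def stage1_novelty(word_ids, vocab_size):
    """Stage 1 sketch: count each visual word's occurrences over the
    whole scene (its global frequency), then assign each word a novelty
    factor inversely proportional to that frequency, following the
    repetition suppression principle: words repeated often across the
    scene are suppressed, rare words are novel.

    `word_ids` is a 1-D integer array of pre-quantized visual-word
    indices for all patches in the scene (an assumption; the Letter
    quantizes color and texture features into its vocabulary).
    """
    freq = np.bincount(word_ids, minlength=vocab_size).astype(float)
    # Words absent from the scene get novelty 0 here; this is an
    # implementation choice, not specified in the Letter.
    phi = np.where(freq > 0, 1.0 / np.maximum(freq, 1.0), 0.0)
    return freq, phi
```

Stage 2 then scores each local region by weighting its own word-occurrence histogram with these factors, as formalized in Eqs. (4) and (5) below.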


References


  1. O. Ben-Shahar and G. Ben-Yosef, J. Opt. Soc. Am. A 25, 1974 (2008).
  2. B. C. Ko and J. Y. Nam, J. Opt. Soc. Am. A 23, 2462 (2006).
  3. E. Vazquez, T. Gevers, M. Lucassen, J. van de Weijer, and R. Baldrich, J. Opt. Soc. Am. A 27, 613 (2010).
  4. L. Itti, C. Koch, and E. Niebur, IEEE Trans. Pattern Anal. Mach. Intell. 20, 1254 (1998).
  5. N. D. B. Bruce and J. K. Tsotsos, in Advances in Neural Information Processing Systems 18, Y. Weiss, B. Schölkopf, and J. Platt, eds. (MIT, 2005), pp. 155–162.
  6. X. Hou and L. Zhang, in 2007 IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2007), pp. 2280–2287.
  7. Y. N. Xu, Y. Zhao, C. F. Jin, Z. F. Qu, L. P. Liu, and X. D. Sun, Opt. Lett. 35, 475 (2010).
  8. X. Hou and L. Zhang, in Advances in Neural Information Processing Systems 21, D. Koller, Y. Bengio, D. Schuurmans, and L. Bottou, eds. (Curran, 2009), pp. 681–688.
  9. J. Forster, J. Exp. Psychol. Gen. 140, 364 (2011).
  10. C. Ranganath and G. Rainer, Nat. Rev. Neurosci. 4, 193 (2003).
  11. N. D. B. Bruce, Fixation data and AIM MATLAB code, http://www-sop.inria.fr/members/Neil.Bruce/#SOURCECODE.
  12. T. Ojala, M. Pietikäinen, and T. Mäenpää, IEEE Trans. Pattern Anal. Mach. Intell. 24, 971 (2002).




Figures (6)

Fig. 1.

Stage 1—global perspective. First, a visual vocabulary for the scene is built. Second, a novelty factor is generated for each visual word following the "repetition suppression principle."

Fig. 2.

Differences in visual word occurrence between the background and the object. The background shares many similarities with the global scene, so its distribution of visual word occurrence changes little; the object has distinguishable properties, so its distribution changes substantially.

Fig. 3.

Stage 2—local perspective. The saliency values of the object and the background differ greatly. Objects generally possess properties that distinguish them from the background, so the visual words of the object mostly carry high novelty factors, resulting in high saliency. The background, which shares many similarities with the overall scene, is composed of visual words with low novelty factors, resulting in low saliency.

Fig. 4.

Comparison between fixations predicted by our model and the ground truth. From left to right: the input images, the saliency maps generated by our model, and human eye fixations.

Fig. 5.

Saliency maps and object maps. (a1) Original image, of size 460×288 pixels; (a2) human-labeled map; (b)–(f) saliency maps from the models of Itti et al. [4], Bruce and Tsotsos [5], Hou and Zhang [6,8], and this work, respectively; (g)–(k) object maps for (b)–(f), respectively. The threshold of the object maps was set to 0.8.

Fig. 6.

Saliency maps and object maps. (a1) Original image, of size 402×262 pixels; (a2) human-labeled map; (b)–(f) saliency maps from the models of Itti et al. [4], Bruce and Tsotsos [5], Hou and Zhang [6,8], and this work, respectively; (g)–(k) object maps for (b)–(f), respectively. The threshold of the object maps was set to 0.8.

Tables (1)


Table 1. ROC Areas of Different Models
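Table 1 compares models by ROC area against human fixations. As a hedged sketch of how such a score is commonly computed for a saliency map against binary fixation labels (the Letter's exact evaluation protocol is not reproduced here), the rank-based Mann–Whitney form of the ROC area can be used:

```python
import numpy as np

def roc_area(saliency_map, fixation_mask):
    """ROC area via the Mann-Whitney statistic: the probability that a
    fixated pixel receives a higher saliency value than a non-fixated
    one. Assumes both classes are present; ties are broken by ordinal
    rank, which is adequate for continuous-valued saliency maps."""
    scores = saliency_map.ravel()
    labels = fixation_mask.ravel().astype(bool)
    ranks = scores.argsort().argsort() + 1  # 1-based ranks, low to high
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```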

Equations (5)


$$I = \{\operatorname{frq}(W_k^f)\}, \qquad W_k^f \in \Omega, \tag{1}$$
$$\Omega = \{W_k^f\} = \left\{\left[W_1^{\mathrm{color}}, \ldots, W_{N_{\mathrm{color}}}^{\mathrm{color}}\right];\ \left[W_1^{\mathrm{texture}}, \ldots, W_{N_{\mathrm{texture}}}^{\mathrm{texture}}\right]\right\}, \tag{2}$$
$$\varphi_k^f = 1 / \operatorname{frq}(W_k^f). \tag{3}$$
$$I_m = \{\operatorname{frq}_m(W_k^f)\}, \qquad W_k^f \in \Omega, \tag{4}$$
$$\operatorname{sal}(I_m) = \sum_{f \in F} \sum_{k=1}^{N_f} \operatorname{frq}_m(W_k^f) \cdot \varphi_k^f. \tag{5}$$
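A minimal Python sketch of Eq. (5) follows; the dictionary layout over the two feature channels and the toy numbers are illustrative assumptions, not the authors' code.

```python
import numpy as np

def region_saliency(local_freqs, phis):
    """Eq. (5): sal(I_m) = sum over features f and words k of
    frq_m(W_k^f) * phi_k^f -- the region's word-occurrence histogram
    weighted by the global novelty factors of Eq. (3)."""
    return sum(float(local_freqs[f] @ phis[f]) for f in phis)

# Toy check: a region dominated by a globally rare color word scores
# higher than one composed of common words.
phis = {"color": np.array([0.025, 0.2, 1.0]), "texture": np.array([1 / 30, 1 / 16])}
rare = {"color": np.array([0.0, 1.0, 8.0]), "texture": np.array([2.0, 0.0])}
common = {"color": np.array([9.0, 0.0, 0.0]), "texture": np.array([0.0, 2.0])}
assert region_saliency(rare, phis) > region_saliency(common, phis)
```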
