Abstract

A model is presented, consonant with current views regarding the neurophysiology and psychophysics of motion perception, that combines the outputs of a set of spatiotemporal motion-energy filters to extract optical flow. The output velocity is encoded as the peak in a distribution of velocity-tuned units that behave much like cells of the middle temporal area of the primate brain. The model appears to deal with the aperture problem as well as the human visual system does, since it extracts the correct velocity for patterns that have large differences in contrast at different spatial orientations, and it simulates psychophysical data on the coherence of sine-grating plaid patterns.

© 1987 Optical Society of America

References

  1. S. T. Barnard, W. B. Thompson, “Disparity analysis of images,” IEEE Trans. Pattern Anal. Mach. Intell. PAMI-2, 333–340 (1980).
    [Crossref]
  2. B. K. P. Horn, B. G. Schunck, “Determining optical flow,” Artif. Intell. 17, 185–203 (1981).
    [Crossref]
  3. J. K. Kearney, W. B. Thompson, “An error analysis of gradient-based methods for optical flow estimation,” IEEE Trans. Pattern Anal. Mach. Intell. PAMI-9, 229–244 (1987).
    [Crossref]
  4. An error analysis of gradient-based methods3 confirms that a major problem with the approach is that the largest errors occur where the image is most highly textured, precisely where there is the greatest amount of motion information.
  5. A. B. Watson, A. J. Ahumada, “A look at motion in the frequency domain,” (NASA-Ames Research Center, Moffett Field, Calif., 1983).
  6. A. B. Watson, A. J. Ahumada, “Model of human visual-motion sensing,” J. Opt. Soc. Am. A 2, 322–342 (1985).
    [Crossref] [PubMed]
  7. E. H. Adelson, J. R. Bergen, “Spatiotemporal energy models for the perception of motion,” J. Opt. Soc. Am. A 2, 284–299 (1985).
    [Crossref] [PubMed]
  8. J. P. H. van Santen, G. Sperling, “Elaborated Reichardt detectors,” J. Opt. Soc. Am. A 2, 300–321 (1985).
    [Crossref] [PubMed]
  9. D. J. Fleet, “The early processing of spatio-temporal visual information,” M.S. thesis (University of Toronto, Toronto, Canada, 1984).
  10. D. J. Fleet, A. D. Jepson, “A cascaded filter approach to the construction of velocity selective mechanisms,” (University of Toronto, Toronto, Canada, 1984).
  11. E. C. Hildreth, “Computations underlying the measurement of visual motion,” Artif. Intell. 23, 309–355 (1984).
    [Crossref]
  12. M. Fahle, T. Poggio, “Visual hyperacuity: spatio-temporal interpolation in human vision,” Proc. R. Soc. London Ser. B 213, 451–477 (1981).
    [Crossref]
  13. In their earlier paper Watson and Ahumada5 also employ Gabor filters but not Gabor energy. In their later work6 they abandon Gabor filters. Adelson and Bergen7 do not use Gabor filters, but they do compute motion energy.
  14. D. Gabor, “Theory of communication,”J. Inst. Electr. Eng. 93, 429–457 (1946).
  15. J. G. Daugman, “Two-dimensional spectral analysis of cortical receptive field profiles,” Vision Res. 20, 846–856 (1980).
    [Crossref]
  16. J. G. Daugman, “Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters,” J. Opt. Soc. Am. A 2, 1160–1169 (1985).
    [Crossref] [PubMed]
  17. Complexity is defined as the order of magnitude, O( ), of the number of operations required for a computation.
  18. P. Burt, “Fast algorithms for estimating local image properties,” Comput. Vision Graphics Image Process. 21, 368–382 (1983).
    [Crossref]
  19. Filters higher in the pyramid achieve their peak response for patterns with lower spatial frequency but with the same temporal frequency. Thus the lower-frequency filters have their greatest outputs for patterns moving at greater velocities. Psychophysical evidence (see Watson and Ahumada6 for references) suggests that human motion channels exhibit such a relationship between spatial frequency and velocity. This makes sense from a computational viewpoint since patterns containing only high spatial frequencies may move at only low velocities, whereas patterns containing only lower spatial frequencies may move at greater velocities (see the discussion in Subsection 2.A on sampling and temporal aliasing).
  20. P. E. Gill, W. Murray, M. H. Wright, Practical Optimization (Academic, New York, 1981).
  21. Equation (10) subtracts f(u, v) from a contrast-dependent value, Σᵢ(mᵢ)². This was an arbitrary choice, and a number of other contrast-dependent values might be substituted. Further investigation may indicate which, if any, of these measures most closely models the physiology.
  22. R. A. Hummel, S. W. Zucker, “On the foundations of relaxation labeling processes,” IEEE Trans. Pattern Anal. Mach. Intell. PAMI-5, 267–287 (1983).
    [Crossref]
  23. D. Terzopoulos, “Regularization of inverse visual problems involving discontinuities,”IEEE Trans. Pattern Anal. Mach. Intell. 8, 413–424 (1986).
    [Crossref]
  24. If the stimulus has uncorrelated random phase, then the dc problem may be alleviated by using only sine-phase filters: a phase-independent motion energy can be computed from sine-phase filters alone by averaging their squared outputs within appropriately sized windows.
  25. Brownian fractal functions (see Mandelbrot26 for definitions and references) are characterized by similarity across scales and have an expected power spectrum that falls off as P(ω) = ω^(−β) for some constant β. Fractals may be used to generate natural-looking textures (a brief spectral-synthesis sketch is given after this reference list).
  26. B. B. Mandelbrot, The Fractal Geometry of Nature (Freeman, New York, 1983).
  27. Special Interest Group on Computer Graphics (SIGGRAPH), Association for Computing Machinery.
  28. Since the image velocities in the Yosemite fly-through image sequence are as high as 5 pixels per frame, we must use three levels from the pyramid. In future research, I hope to develop a rule for automatically combining estimates from the different levels. For now, I simply pick the level that is most appropriate for a given image region: the level-zero estimate is chosen if the actual velocity is between 0 and 1.25 pixels per frame, the level-one estimate if it is between 1.25 and 2.5 pixels per frame, and the level-two estimate if it is between 2.5 and 5.0 pixels per frame (a small sketch of this selection rule is given after this reference list). In the Yosemite fly-through image sequence, there are regions of low contrast adjacent to high-contrast regions (e.g., the face of El Capitan and the cloud region are of low contrast). This exacerbates the smoothing problem discussed in Subsection 4.B. For this image sequence, contrast was first equalized by computing the zero crossings (see Hildreth11 for references) of each image. The model was then applied to the resulting zero-crossing image sequence.
  29. Area MT is also known as V5.
  30. The maximum of Eq. (10) can be located to any precision by using a finer or coarser grid. Also, the grid need be only of limited extent, since bandpass filtering limits the range of possible velocities (as discussed in Subsection 2.A).
  31. Preliminary investigation indicates that absolute value may be substituted for squaring.
  32. The model does not always recover the correct pattern-flow velocity for sine-grating plaids; e.g., for plaids made up of gratings with equal contrasts, equal spatial frequencies, and equal speeds, the model’s estimates are in error (correct direction of motion but wrong speed) when the spatial frequencies of the gratings are not equal to the spatial-frequency tuning of the filters or when the angle between the gratings differs from 90 deg. The model might be more robust with respect to these factors if it utilized more motion-energy filters tuned to a greater number of orientations and spatial frequencies.
  33. E. H. Adelson, J. A. Movshon, “Phenomenal coherence of moving visual patterns,” Nature 300, 523–525 (1982).
    [Crossref] [PubMed]
  34. E. H. Adelson, Media-Technology Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139 (personal communication).
  35. An important psychophysical experiment is to measure the perceived velocity of plaids with various relative angles, contrasts, and spatial frequencies.
  36. M. P. doCarmo, Differential Geometry of Curves and Surfaces (Prentice-Hall Inc., Englewood Cliffs, N. J., 1976).
  37. The normal-flow direction is perpendicular to the direction of minimum curvature, and the normal-flow speed is the dot product of any position along the ridge with the normal-flow direction.
  38. Different subjects were used to collect the data in Figs. 13(a) and 14(a). Thus the data in these two plots are inconsistent with each other, requiring that different curvature thresholds be used to generate Figs. 13(b) and 14(b). The eventual goal is to simulate all the data for one subject with one choice of parameters.
  39. W. T. Newsome, M. S. Gizzi, J. A. Movshon, “Spatial and temporal properties of neurons in macaque MT,” Invest. Ophthalmol. Vis. Sci. Suppl. 24, 106 (1983).
  40. J. A. Movshon, Department of Psychology, New York University, New York, New York 10003 (personal communication).
  41. K. Naka, W. A. H. Rushton, “S-potentials from luminosity units in the retina of fish (Cyprinidae),” J. Physiol. 188, 587–599 (1966).
  42. S. Marcelja, “Mathematical description of the responses of simple cortical cells,” J. Opt. Soc. Am. 70, 1297–1300 (1980).
  43. J. McLean-Palmer, J. Jones, L. Palmer, “New degrees of freedom in the structure of simple receptive fields,” Invest. Ophthalmol. Vis. Sci. Suppl. 26, 265 (1985).
  44. J. McLean-Palmer, Department of Neuroscience, University of Pennsylvania, Philadelphia, Pennsylvania 19104 (personal communication).
  45. R. Emerson, M. Citron, W. Vaughn, S. Klein, “Substructure in directionally selective complex receptive fields of cat,” Invest. Ophthalmol. Vis. Sci. Suppl. 27, 16 (1986).
  46. D. E. Rumelhart, J. L. McClelland, eds., Parallel Distributed Processing: Explorations in the Microstructure of Cognition (MIT, Cambridge, Mass., 1986).
  47. A. Mikami, W. T. Newsome, R. H. Wurtz, “Mechanisms of direction and speed selectivity in the middle temporal visual area (MT) of the macaque monkey,” Invest. Ophthalmol. Vis. Sci. Suppl. 24, 107 (1983).
  48. D. J. Felleman, J. H. Kaas, “Receptive-field properties of neurons in middle temporal visual area (MT) of owl monkeys,” J. Neurophysiol. 52, 488–513 (1984).
    [PubMed]
  49. J. A. Movshon, E. H. Adelson, M. S. Gizzi, W. T. Newsome, “The analysis of moving visual patterns,” in Experimental Brain Research Supplementum II: Pattern Recognition Mechanisms, C. Chagas, R. Gattass, C. Gross, eds. (Springer-Verlag, New York, 1985), pp. 117–151.
  50. J. Allman (Division of Biology, California Institute of Technology, Pasadena, California 91125), in his talk at the 1986 Symposium on Computational Models in Human Vision at the Center for Visual Sciences, University of Rochester.
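
The following sketch illustrates the spectral synthesis mentioned in note 25: white noise is shaped in the Fourier domain so that its expected power spectrum falls off roughly as ω^(−β), which yields a natural-looking fractal texture. It is a minimal illustration under assumed parameter choices (image size, β, normalization), not code from the paper.

```python
import numpy as np

def fractal_texture(size=256, beta=2.0, seed=0):
    """Synthesize a Brownian-fractal texture whose expected power
    spectrum falls off roughly as P(omega) ~ omega**(-beta)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((size, size))

    # Radial spatial frequency of each Fourier coefficient.
    fx = np.fft.fftfreq(size)
    fy = np.fft.fftfreq(size)
    radius = np.sqrt(fx[None, :] ** 2 + fy[:, None] ** 2)
    radius[0, 0] = radius[0, 1]  # avoid division by zero at the DC term

    # Shape the amplitude spectrum as omega**(-beta/2) so that the
    # power spectrum falls off as omega**(-beta).
    spectrum = np.fft.fft2(noise) * radius ** (-beta / 2.0)
    texture = np.real(np.fft.ifft2(spectrum))

    # Zero mean, unit variance, for use as a test texture.
    return (texture - texture.mean()) / texture.std()
```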
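
Note 28's rule for choosing a pyramid level can be written as a short lookup: each coarser level covers a velocity range twice as large as the one below it. A minimal sketch, assuming the ranges quoted in the note (1.25, 2.5, and 5.0 pixels per frame):

```python
def pyramid_level_for_speed(speed, base_limit=1.25, num_levels=3):
    """Return the pyramid level whose velocity range covers `speed`
    (pixels per frame): level 0 for (0, 1.25], level 1 for
    (1.25, 2.5], level 2 for (2.5, 5.0], following note 28."""
    limit = base_limit
    for level in range(num_levels):
        if speed <= limit:
            return level
        limit *= 2.0
    raise ValueError("speed exceeds the range covered by the pyramid")
```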

Figures (16)

Fig. 1

Spatiotemporal orientation (redrawn from Ref. 7). (a) A vertical bar translating to the right. (b) The space–time cube for a vertical bar moving to the right. (c) An xt slice through the space–time cube. The orientation of the edges in the xt slice is the horizontal component of the velocity. Motion is like orientation in space–time, and spatiotemporally oriented filters can be used to detect it.

Fig. 2

Perspective views of a two-dimensional sine-phase Gabor function and its power spectrum.

Fig. 3

The power spectra of the 12 motion-sensitive Gabor-energy filters are positioned in pairs on a cylinder in the spatiotemporal-frequency domain (temporal-frequency axis pointing up). Each symmetrically positioned pair of ellipsoids represents the power spectrum of one filter. The plane represents the power spectrum of a translating texture. A filter gives a large output for a stimulus that has much power near the centers of its corresponding ellipsoids and a relatively small output for a stimulus that has little power there. Each velocity corresponds to a different tilt of the plane and thus to a different distribution of outputs over the collection of motion-energy mechanisms.

Fig. 4

A problem analogous to that of extracting velocity: estimating the slope of a line that passes through the origin by viewing it with a finite number of circular windows. The upper window has many points within it, while the lower one has very few; in other words, the line must pass close to the center of the upper window while staying far from the center of the lower one.

Fig. 5

(a) Fourteen natural textures (the two texture squares at the upper left are the same, and so are the two at the upper right). Each texture square was used to generate motion sequences translating 1/2 pixel per frame in each of eight directions. The velocities extracted by the model are accurate to within 10%. (b) Example flow field extracted from a motion sequence generated from the straw texture in the upper-left-hand corner of (a). The actual motion was (−0.5, 0.0). The mean of the extracted velocities is (−0.473, −0.04), and the standard deviation for both the horizontal and vertical components is 0.01.

Fig. 6

A rotating random-dot sphere. (a) A frame from the motion sequence. (b) The actual flow field. (c) Flow field extracted by the model. (d) Difference between (b) and (c).

Fig. 7

(a) One frame of an image sequence flying through Yosemite valley. (b) The actual flow field. (c) Flow field extracted by the model. (d) Difference between (b) and (c).

Fig. 8

Distribution of outputs of velocity-tuned units for a random-dot field moving leftward and downward 1 pixel per frame. Each point in the image corresponds to a different velocity; for example, v = (0, 0) is at the center of the image and v = (2, 2) at the top right-hand corner. The maximum in the distribution of outputs corresponds to the velocity extracted by the model. Units in the brighter regions have positive outputs, and units in the darker regions have negative (inhibited) outputs.

Fig. 9

The perceived motion of two superimposed moving gratings lies at the intersection of the constraint lines drawn perpendicular to the two normal-velocity vectors. (a) A single moving grating: the diagonal line indicates the locus of velocities compatible with the motion of the grating. (b), (c) Plaids composed of two moving gratings. The lines give the possible motions of each grating alone; their intersection is the only shared motion and corresponds to what is seen. (Redrawn from Ref. 32.) A small worked example of this intersection-of-constraints construction is given after the figure captions.

Fig. 10

(a) Flow field extracted by the model for a plaid pattern made up of a sine grating moving leftward 1 pixel per frame plus a sine grating moving downward 1 pixel per frame. The combined motion extracted by the model is 1 pixel leftward and 1 pixel downward in each frame. (b) Flow field for a plaid pattern made up of a sine grating moving leftward 1 pixel per frame plus a sine grating moving downward and leftward 1/4 pixel per frame. The counterintuitive combined motion is leftward 1 pixel per frame and upward 1/2 pixel per frame, as shown in the flow field extracted by the model. The spatial frequency of the gratings for both (a) and (b) was 0.25 cycle pixel−1.

Fig. 11

Distribution of outputs of velocity-tuned units for sine-grating plaids made up of orthogonal gratings. The gratings moved 1 pixel frame−1 leftward and downward, and their spatial frequency was 0.25 cycle pixel−1. (a) The two component gratings had the same contrast. The maximum in the distribution of outputs corresponds to the velocity extracted by the model. (b) One grating had twice the contrast of the other grating. (c) One grating had four times the contrast of the other grating. (d) One grating had zero contrast; the aperture problem is evident, as there is a ridge of maxima. Each velocity-tuned unit along this ridge has the same output (to within 1 part in 100,000).

Fig. 12

The influence of contrast on the coherence of sine-grating plaids. (a) One grating had a fixed contrast of 0.3, while the other was of variable contrast. The two gratings moved at an angle of 135 deg, both had a spatial frequency of 1.6 cycles deg−1, and both moved at 3 deg sec−1. The plot shows the probability that the observer judged the two gratings to be coherent. The dotted lines indicate the test-grating contrast needed to attain threshold (50% probability) coherence. Subject, EHA. (Replotted from Ref. 32.) (b) One grating had a fixed contrast of 0.3, while the other was of variable contrast. The two gratings moved at an angle of 120°, both had a spatial frequency of 0.25 cycle pixel−1, and their speeds were chosen so that the coherent plaid moved at a speed of 2/3 pixel frame−1. The plot shows the curvature measure as the contrast of the test grating was varied. The dotted lines indicate the test-grating contrast needed to attain threshold (0.006 curvature) coherence.

Fig. 13

The influence of spatial frequency on the coherence of sine-grating plaids. (a) One grating had a fixed contrast of 0.3, while the other was of variable contrast. The two gratings moved at an angle of 135 deg, and both moved at 3 deg sec−1. The test grating was of variable contrast and variable spatial frequency. The plot shows the threshold contrast for coherence for a range of test spatial frequencies when the first grating was fixed at 2.2 cycles deg−1. Subject, PA. (Replotted from Ref. 32.) (b) One grating had a fixed contrast of 0.3 and a fixed spatial frequency of 0.25 cycle pixel−1, while the other was of variable contrast and spatial frequency. The two gratings moved at an angle of 120 deg, and their speeds were chosen so that the coherent plaid moved at a speed of 2/3 pixel per frame. A fixed value was chosen as the threshold value for the curvature measure. This value was chosen in order to match the psychophysical data in (a) for the case when the fixed grating and the test grating were of equal spatial frequency. For each test grating, the plot shows the contrast needed at that spatial frequency for the curvature measure to attain that value.

Fig. 14

The influence of angle on the coherence of sine-grating plaids. (a) One grating had a fixed contrast of 0.3, while the other was of variable contrast. The spatial frequency of one grating was fixed at 2.4 cycles deg−1, and that of the second grating was fixed at 1.2 cycles deg−1. As the angle between the two gratings varied, their speeds were chosen so that the coherent plaid moved at a fixed speed of 7.5 deg sec−1. The plot shows the threshold contrast for coherence for a range of angles. Subject, EHA. (b) One grating had a fixed contrast of 0.3, and both had a fixed spatial frequency of 0.25 cycle pixel−1. The speed of the gratings was chosen so that the coherent plaid moved at a fixed speed of 2/3 pixel per frame. A fixed value was chosen as the threshold value for the curvature measure. [This value was chosen in order to match the psychophysical data in (a) for an angle of 120 deg.] For each angle, the plot shows the test-grating contrast needed for the curvature measure to attain that value.

Fig. 15

Comparing the model with physiology. The model’s computations are simply a series of linear steps (weighted sums) alternating with point nonlinearities.

Fig. 16

Direction tuning of component- and pattern-flow model units. (a) Response of a typical component-flow unit as a function of direction of motion for moving gratings that were matched to the unit’s preferred speed and spatial frequency. (b) Direction tuning of the same component-flow unit for sine-grating plaids; the tuning curve has two lobes, indicating that the unit responds when either of the two component sine gratings moves at the preferred velocity, similar to component MT cells. (c) Direction tuning of a pattern-flow unit for gratings. (d) Direction tuning of the same pattern-flow unit for plaids; the single lobe indicates that the unit responds to the combined pattern motion regardless of the motion of the component gratings, similar to pattern MT cells.
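
As a worked example of the intersection-of-constraints construction in Fig. 9, the sketch below recovers a plaid's pattern velocity from the normal velocities of its two component gratings by solving the pair of constraint equations nᵢ · v = sᵢ. The function name and the example values are illustrative; they are not taken from the paper.

```python
import numpy as np

def intersection_of_constraints(normal1, speed1, normal2, speed2):
    """Given each grating's unit normal direction and normal speed,
    return the single velocity consistent with both constraint lines
    (n_i . v = s_i)."""
    n = np.array([normal1, normal2], dtype=float)
    s = np.array([speed1, speed2], dtype=float)
    return np.linalg.solve(n, s)  # singular if the gratings are parallel

# One grating moving leftward at 1 pixel/frame, the other moving
# downward at 1 pixel/frame, as in Fig. 10(a):
v = intersection_of_constraints((-1.0, 0.0), 1.0, (0.0, -1.0), 1.0)
# v == array([-1., -1.]): the plaid moves 1 pixel leftward and
# 1 pixel downward per frame.
```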

Equations (13)


\[ v = \omega_t / \omega_x . \tag{1} \]
\[ \omega_t = u\,\omega_x + v\,\omega_y , \tag{2} \]
\[ g(t) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{t^2}{2\sigma^2}\right)\sin(2\pi\omega t) . \tag{3} \]
\[ g(x,y,t) = \frac{1}{(2\pi)^{3/2}\sigma_x\sigma_y\sigma_t}\exp\!\left[-\left(\frac{x^2}{2\sigma_x^2}+\frac{y^2}{2\sigma_y^2}+\frac{t^2}{2\sigma_t^2}\right)\right]\sin(2\pi\omega_{x_0}x+2\pi\omega_{y_0}y+2\pi\omega_{t_0}t) , \tag{4} \]
\[ G(\omega_x,\omega_y,\omega_t) = \tfrac{1}{4}\exp\{-4\pi^2[\sigma_x^2(\omega_x-\omega_{x_0})^2+\sigma_y^2(\omega_y-\omega_{y_0})^2+\sigma_t^2(\omega_t-\omega_{t_0})^2]\} + \tfrac{1}{4}\exp\{-4\pi^2[\sigma_x^2(\omega_x+\omega_{x_0})^2+\sigma_y^2(\omega_y+\omega_{y_0})^2+\sigma_t^2(\omega_t+\omega_{t_0})^2]\} . \tag{5} \]
\[ \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} f(x,y,t)^2\,dx\,dy\,dt = \frac{1}{8\pi^3}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \lvert F(\omega_x,\omega_y,\omega_t)\rvert^2\,d\omega_x\,d\omega_y\,d\omega_t = \frac{1}{8\pi^3}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} P(\omega_x,\omega_y,\omega_t)\,d\omega_x\,d\omega_y\,d\omega_t , \tag{6} \]
\[ \mathcal{R}(u,v,k) = \frac{k^2}{2}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \exp\{-4\pi^2[\sigma_x^2(\omega_x-\omega_{x_0})^2+\sigma_y^2(\omega_y-\omega_{y_0})^2+\sigma_t^2(u\omega_x+v\omega_y-\omega_{t_0})^2]\}\,d\omega_x\,d\omega_y . \tag{7} \]
\[ \mathcal{R}(u,v,k) = H_4(u,v,k)\exp[-4\pi^2\sigma_x^2\sigma_y^2\sigma_t^2 H_1(u,v)] , \quad H_1(u,v)=\frac{H_2(u,v)}{H_3(u,v)} , \quad H_2(u,v)=(u\omega_{x_0}+v\omega_{y_0}+\omega_{t_0})^2 , \quad H_3(u,v)=(u\sigma_x\sigma_t)^2+(v\sigma_y\sigma_t)^2+(\sigma_x\sigma_y)^2 , \quad H_4(u,v,k)=\frac{k^2}{8\pi[H_3(u,v)]^{1/2}} . \tag{8} \]
\[ f(u,v) = \sum_{i=1}^{12}\left[\frac{\bar{m}_i\,\mathcal{R}_i(u,v)}{\bar{\mathcal{R}}_i(u,v)} - m_i\right]^2 . \tag{9} \]
\[ F(u,v) = \sum_{i=1}^{12}(m_i)^2 - f(u,v) = \sum_{i=1}^{12}\left\{(m_i)^2 - \left[\frac{\bar{m}_i\,\mathcal{R}_i(u,v)}{\bar{\mathcal{R}}_i(u,v)} - m_i\right]^2\right\} , \tag{10} \]
\[ F_j = \sum_{i=1}^{12}\left[(m_i)^2 - \left(\frac{\bar{m}_i\,w_{ij}}{\bar{w}_{ij}} - m_i\right)^2\right] , \tag{11} \]
\[ \sin(\omega_{t_0}t+\omega_{x_0}x+\omega_{y_0}y) = \sin(\omega_{t_0}t)\cos(\omega_{x_0}x)\cos(\omega_{y_0}y) - \sin(\omega_{t_0}t)\sin(\omega_{x_0}x)\sin(\omega_{y_0}y) + \cos(\omega_{t_0}t)\sin(\omega_{x_0}x)\cos(\omega_{y_0}y) + \cos(\omega_{t_0}t)\cos(\omega_{x_0}x)\sin(\omega_{y_0}y) , \tag{12} \]
\[ \cos(\omega_{t_0}t+\omega_{x_0}x+\omega_{y_0}y) = \cos(\omega_{t_0}t)\cos(\omega_{x_0}x)\cos(\omega_{y_0}y) - \cos(\omega_{t_0}t)\sin(\omega_{x_0}x)\sin(\omega_{y_0}y) - \sin(\omega_{t_0}t)\sin(\omega_{x_0}x)\cos(\omega_{y_0}y) - \sin(\omega_{t_0}t)\cos(\omega_{x_0}x)\sin(\omega_{y_0}y) . \tag{13} \]
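
To make the closed-form expressions above concrete, the following sketch evaluates the predicted Gabor-energy response for a candidate velocity via H2, H3, and H4 [Eq. (8)] and then locates the best-fitting velocity with the kind of grid search described in note 30. The variable names, the coarse grid, and the single global contrast normalization are illustrative assumptions; the paper's own normalization [Eq. (9)] may differ in detail.

```python
import numpy as np

def predicted_energy(u, v, k, wx0, wy0, wt0, sx, sy, st):
    """Closed-form predicted response of one Gabor-energy filter
    (center frequency wx0, wy0, wt0; widths sx, sy, st; gain k) to a
    translating random texture with velocity (u, v); cf. Eq. (8)."""
    H2 = (u * wx0 + v * wy0 + wt0) ** 2
    H3 = (u * sx * st) ** 2 + (v * sy * st) ** 2 + (sx * sy) ** 2
    H4 = k ** 2 / (8.0 * np.pi * np.sqrt(H3))
    return H4 * np.exp(-4.0 * np.pi ** 2 * sx ** 2 * sy ** 2 * st ** 2 * H2 / H3)

def best_velocity(measured, filters, grid=np.linspace(-2.0, 2.0, 41)):
    """Grid search for the velocity whose predicted filter responses
    best match the measured motion energies in the least-squares sense."""
    best, best_err = None, np.inf
    for u in grid:
        for v in grid:
            predicted = np.array([predicted_energy(u, v, **f) for f in filters])
            scale = measured.sum() / predicted.sum()  # factor out overall contrast
            err = np.sum((scale * predicted - measured) ** 2)
            if err < best_err:
                best, best_err = (u, v), err
    return best
```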
