This note shows that the Riemann-space interpretation of motion vision developed by Barth and Watson is neither necessary for their results, nor sufficient to handle an intrinsic coordinate problem. Recasting the Barth-Watson framework as a classical velocity-solver (as in computer vision) solves these problems.
© Optical Society of America
The model of spatiotemporal vision by Barth and Watson [1] casts the spacetime image f(x,y,t) into a hypersurface in (x,y,t,f), on which is defined a Riemann curvature tensor R. There is a problem with this interpretation: The native units of x, y, t, and f are quite disparate (x and y having length units, t having time units, and image intensity f having luminance units such as foot-lamberts), and the components of R depend on how these units are chosen. Furthermore, a Riemann tensor implies a metric space (complete with distances and angles), and justifying this structure would require an intrinsic connection among the units of x, y, t, and f. Such a connection was made between x and t in relativity theory through the vacuum speed of light c (so that ct and x have the same units). However, connecting the units of x, y, t, and f in such a fundamental way seems to pose a deeper challenge, rendering the Riemann constructs of distances and angles more difficult to justify.
Here is a solution to the problem that involves considerable reinterpretation, but no loss to the basic results in [1].
First, it is instructive to note the extent to which Barth and Watson use the elements of R themselves, as opposed to ratios of R components or one factor in a particular R component. The formalism in the offset equations (1–7) clearly uses ratios of R elements. In motion detection, Ref. [1] uses components of R as decision points according as these components are zero or nonzero. These decisions could be replaced by ratios of R elements, with the proviso that the denominator in such a ratio is nonzero. Finally, in the simulations of motion percepts, the authors use low-pass-filtered versions of the numerator of the fraction defining an R component, and not the R component itself. A similar numerator, divorced from its denominator, is also the 3D curvature operator referred to in the motion-detection section (see also [2]). From these observations, I conclude that there is nothing essential about the R components themselves in the formalism, for these components are dissected or ratioed as needed for the motion-vision model.
Dissecting the R component into numerator and denominator (and using only the numerator) may have been a sensible thing to do, given the units problem I have noted. The denominators of the R components (Eq. 2 in [1]) exacerbate the coordinate problem because they are not homogeneous to any order in the coordinates. However, the denominators are all the same, so they cancel in ratios of R elements. Furthermore, ratios of R elements have no dependence on the units of image intensity, and all the numerators are homogeneous functions of each of the coordinates (x, y, t, and f). Therefore, making the theory out of ratios of R components helps solve the coordinate problem.
Now, the numerators of the R components are all 2-by-2 determinants, and some ratios of these determinants comprise a Cramer’s-rule solution to two linear equations. The linear system in question is clear from the Appendix of [1]. For an (x,y) image in rigid motion with constant velocity (a,b), there is a fundamental identity:
\[
a f_x + b f_y + f_t = 0. \tag{1}
\]
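[Barth and Watson define $F(x,y,t) = a f_x + b f_y + f_t$ and set its partial derivatives in x, y, t equal to zero, which is correct, but the original function F is also zero.] Now, one can write the partial derivatives of Eq. 1 above:
\[
\begin{aligned}
a f_{xx} + b f_{xy} + f_{xt} &= 0,\\
a f_{xy} + b f_{yy} + f_{yt} &= 0,\\
a f_{xt} + b f_{yt} + f_{tt} &= 0.
\end{aligned} \tag{2}
\]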
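Any pair of these equations can be solved for a and b by Cramer’s rule. For example, solving the first two equations gives
\[
a = \frac{f_{xy} f_{yt} - f_{xt} f_{yy}}{f_{xx} f_{yy} - f_{xy}^2}, \qquad
b = \frac{f_{xt} f_{xy} - f_{xx} f_{yt}}{f_{xx} f_{yy} - f_{xy}^2}. \tag{3}
\]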
The expressions for a and b above are, respectively, $R_{3221}/R_{2121}$ and $-R_{3121}/R_{2121}$ in Eq. 2 of [1]. Furthermore, Eqs. 4 and 5 of [1] enumerate the solutions of all the pairs of equations in Eq. 2 above.
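The Cramer's-rule solution of the first two derivative equations can be checked numerically. The Python sketch below is only an illustration: the test pattern `pattern`, the helper names, the finite-difference step `h`, and the threshold `eps` are assumptions introduced here and do not come from Barth and Watson. It builds a rigidly translating image f(x,y,t) = g(x - at, y - bt), estimates the second partial derivatives by central differences, and forms the two determinant ratios.

```python
import math

def pattern(u, v):
    # Hypothetical test image g(u, v); chosen only because its Hessian
    # determinant is nonzero everywhere.
    return u * u + u * v + 3.0 * v * v + math.sin(u)

def moving_image(a, b):
    # f(x, y, t) = g(x - a t, y - b t): rigid translation at velocity (a, b).
    return lambda x, y, t: pattern(x - a * t, y - b * t)

def second_partial(f, point, i, j, h=1e-3):
    # Central-difference estimate of d^2 f / (dq_i dq_j), where q = (x, y, t).
    def shifted(si, sj):
        q = list(point)
        q[i] += si * h
        q[j] += sj * h
        return f(*q)
    return (shifted(1, 1) - shifted(1, -1)
            - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)

def solve_velocity(f, point, eps=1e-9):
    # Solve  a f_xx + b f_xy = -f_xt,  a f_xy + b f_yy = -f_yt  by Cramer's rule.
    X, Y, T = 0, 1, 2
    fxx = second_partial(f, point, X, X)
    fxy = second_partial(f, point, X, Y)
    fyy = second_partial(f, point, Y, Y)
    fxt = second_partial(f, point, X, T)
    fyt = second_partial(f, point, Y, T)
    det = fxx * fyy - fxy * fxy  # common denominator of both ratios
    if abs(det) < eps:
        return None  # no unique solution from this pair of equations
    a = (fxy * fyt - fxt * fyy) / det
    b = (fxt * fxy - fxx * fyt) / det
    return a, b

# Estimate the velocity of a pattern translating at (1.5, -0.8).
estimate = solve_velocity(moving_image(1.5, -0.8), (0.7, 0.2, 0.5))
```

With these assumed inputs, `estimate` should agree with (1.5, -0.8) up to finite-difference error. For a pattern that varies in one direction only, such as sin(x - t), the common denominator vanishes and the function returns `None`, which is the nonzero-denominator proviso in numerical form.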
It follows that all the quantities in [1] are ratios of determinants of the Cramer’s-rule implementation of a velocity-solver, of which many are available in computer vision. This interpretation is more parsimonious than the Riemann space, and also lays to rest the question of the coordinate units: The image-intensity units cancel, and velocity components a and b carry the selected units for x/t without further bookkeeping.
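The claim about units can be illustrated with a small Python sketch. Everything named here is an assumption made for illustration: the quadratic pattern g(u,v) = u² + uv + 3v², the function names, and the scale factors `k`, `length_scale`, and `time_scale`. The second derivatives of f = k g(x - at, y - bt) are written analytically in rescaled coordinates x' = length_scale·x, y' = length_scale·y, t' = time_scale·t, and the determinant ratios are then formed as before.

```python
def hessian_entries(a, b, k=1.0, length_scale=1.0, time_scale=1.0):
    # Second partials of f = k * g(x - a t, y - b t), with the hypothetical
    # pattern g(u, v) = u^2 + u v + 3 v^2, expressed in rescaled units.
    guu, guv, gvv = 2.0, 1.0, 6.0
    L, T = length_scale, time_scale
    fxx = k * guu / L**2
    fxy = k * guv / L**2
    fyy = k * gvv / L**2
    fxt = -k * (a * guu + b * guv) / (L * T)
    fyt = -k * (a * guv + b * gvv) / (L * T)
    return fxx, fxy, fyy, fxt, fyt

def cramer_velocity(fxx, fxy, fyy, fxt, fyt):
    # Ratios of 2-by-2 determinants; assumes the denominator is nonzero.
    det = fxx * fyy - fxy * fxy
    return (fxy * fyt - fxt * fyy) / det, (fxt * fxy - fxx * fyt) / det

# Same motion, three unit choices.
base = cramer_velocity(*hessian_entries(1.5, -0.8))
brighter = cramer_velocity(*hessian_entries(1.5, -0.8, k=37.0))
rescaled = cramer_velocity(
    *hessian_entries(1.5, -0.8, length_scale=100.0, time_scale=1000.0))
```

In this sketch, `brighter` equals `base` because each determinant scales as k², so the intensity factor cancels in the ratio. `rescaled` equals `base` multiplied by `length_scale / time_scale`, which is just the velocity expressed in the new length and time units.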
This approach solves the problem of coordinate units, but discards the Riemann-space interpretation. No harm is done to the results and substantive theory in [1], except perhaps to make worrisome the assurance of nonzero denominators in R-component ratios that are used as switches for the motion-detection algorithm. In these cases, the numerators (2×2 determinants) of the R-components can be used instead of the R-components or their ratios. Thus the R components never seem to enter distinctly and testably into the motion-vision model in [1].
On the basis of this observation, it may be profitable to revisit the authors’ theorem [3] that “a compact surface is completely determined by its points with non-zero Gaussian curvature.” There may be a generalization of this theorem to spaces in which curvatures cannot be defined, but in which all the determinants survive for the computations I have noted here. Also, it is interesting to ask why the ratios of curvature elements are the same as the ratios of velocity-solving determinants. An answer may lie in expressions of curvature in terms of determinants of the first and second fundamental forms of Gauss.
1. E. Barth and A. B. Watson, “A geometric framework for nonlinear visual coding,” Opt. Express 7, 155–165 (2000), http://www.opticsexpress.org/oearchive/source/23045.htm
2. C. Zetsche and E. Barth, “Direct detection of flow discontinuities by 3D-curvature operators,” Pattern Recognition Lett. 12, 771–779 (1991).
3. C. Mota and E. Barth, “On the uniqueness of curvature features,” Proc. Artificial Intell. (Dynamische Perzeption). Köln: Infix Verlag, v. 9, pp. 175–178 (2000).