The third dimension in the reproduction of real scenes in three-dimensional displays is commonly subject to scale changes. The geometry of the situation is laid out, permitting the depth rendition of displays to be characterized and subjected to empirical examination. Psychophysical experiments are presented showing, even when geometrical deformations have been factored out, specific deviations from veridicality in observers’ depth reports for stereograms of simple static patterns devoid of secondary cues.
© 2011 Optical Society of America
When a scene is photographed by a three-dimensional (3D) camera and the stereogram viewed on a screen, the apparent depth may differ from that in the actual target. The analysis of the situation has two phases: how the process of reproduction changes the geometry of the optical stimulus presented to the viewer, and how observers judge the third visual dimension in such displays. After laying out the principles of the former, some aspects of the latter are probed.
1A. Geometry of Visual Stimulus Reconstruction
Because the visual process operates on the image formed on the retina, and this is best specified in terms of the angle subtended by the components of the target configuration at the eye, it is advantageous to transform target locations from coordinates in the Cartesian x, y, z system to those more appropriate for stereoscopic vision: the bipolar angles of elevation, latitude, and parallax, θ, φ, and γ, based on a, the fixed lateral separation of the twin optics (Fig. 1). Starting with the horizontal plane (y = 0), which contains this base line, one establishes the midsagittal plane (x = 0), along which the target distance z is measured, as normal to it and bisecting the base line. Finally, planes orthogonal to these are called frontoparallel, or frontal for short, within which lateral distances x and vertical distances y are measured. Details of coordinate transformations are given elsewhere [1], but for large z and targets in, or close to, the horizontal and midsagittal planes, a first-order approximation allows substitution of angles in radians for their tangents, and the transformation from (x, y, z) to (θ, φ, γ) takes the form
θ = y/z, φ = x/z, γ = a/z.
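As a rough numerical check on the first-order transformation, the sketch below compares the small-angle forms θ = y/z, φ = x/z, γ = a/z with one exact construction of the same angles. The function names, the choice of measuring θ and φ from the midpoint of the base line, the horizontal-plane construction of γ, and the sample values (a = 0.065 m, z = 0.6 m) are illustrative assumptions rather than anything specified in the text.

```python
import math

def bipolar_first_order(x, y, z, a):
    """Small-angle bipolar coordinates (radians): elevation, latitude, parallax."""
    theta = y / z   # elevation
    phi = x / z     # latitude
    gamma = a / z   # binocular parallax
    return theta, phi, gamma

def bipolar_exact(x, y, z, a):
    """One exact construction (assumed here): theta and phi are measured from
    the midpoint of the base line; gamma is the angle, taken in the horizontal
    plane, between the lines of sight from the two optics at x = -a/2, +a/2."""
    theta = math.atan2(y, math.hypot(x, z))
    phi = math.atan2(x, z)
    gamma = math.atan2(x + a / 2, z) - math.atan2(x - a / 2, z)
    return theta, phi, gamma

# Hypothetical target: 1 cm right of and above the z axis, 60 cm away.
approx = bipolar_first_order(0.01, 0.01, 0.6, 0.065)
exact = bipolar_exact(0.01, 0.01, 0.6, 0.065)
errors = [abs(p - q) for p, q in zip(approx, exact)]
```

For targets this close to the horizontal and midsagittal planes the three errors stay well below a milliradian, which is the sense in which tangents and angles are interchangeable here.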
If the scene is recorded by a camera pair separated by the distance a, with optical axes converged on a plane at a distance z along the z axis, the point where the axes meet will be imaged in the center of the right and left receiving planes, each with its own coordinates in a system based on the angles θ and φ at the principal points of the respective optics (Fig. 1). These differ only by a multiplicative constant from the equivalent values in the object space of the recording devices.
Now take a second object point close to the first one, a small distance Δy above it, and shift the first point forward along the z axis by a small distance Δz, where Δz ≪ z. Rays from the latter will generate an image-sided intercept along the x axis whose magnitude in the φ coordinates, distributed between the right and left sides, is aΔz/z² to first order. Hence, in (φ, θ) coordinates, the shifted point is imaged at (±aΔz/2z², 0), with opposite signs in the two image planes, and the second point at (0, Δy/z) in both.
Transfer from the recording device’s image planes to the viewing plane involves merely multiplicative constants all around. For the particular case Δy = Δz, i.e., two points just as far apart in the frontoparallel plane as one is in front of the other, the ratio of the binocular difference in the φ coordinates marked out in the image planes (which corresponds to the parallax difference of the two points) to their separation is a/z. Because these quantities are associated with the recording in the camera’s object space, they will be identified as the recording separation a, the recording distance z, and the recording depth rendition R = a/z. The ratio is independent of any magnification factor that may have been introduced in the process of reconstruction of the right and left images in the stereo display.
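The ratio described above can be checked numerically. The following is a minimal sketch, assuming the first-order parallax a/z and hypothetical sample values; the function names and the `magnification` parameter are introduced only for illustration.

```python
def parallax(a, z):
    """First-order binocular parallax (radians) of a point at distance z."""
    return a / z

def disparity(a, z, dz):
    """Parallax difference produced by moving a point at distance z nearer by dz."""
    return parallax(a, z - dz) - parallax(a, z)

def frontal_angle(dy, z):
    """Angle (radians) subtended by a frontal-plane separation dy at distance z."""
    return dy / z

def rendition_ratio(a, z, step, magnification=1.0):
    """Disparity/width ratio for equal steps in depth and in the frontal plane.
    A common magnification scales both terms and cancels in the ratio."""
    d = magnification * disparity(a, z, step)
    w = magnification * frontal_angle(step, z)
    return d / w

# Hypothetical recording: optics 6.5 cm apart, target at 2 m, 1 cm steps.
a, z, step = 0.065, 2.0, 0.01
ratio = rendition_ratio(a, z, step)
ratio_magnified = rendition_ratio(a, z, step, magnification=3.0)
```

With a step that is small relative to z, `ratio` lies within about one percent of a/z, and the magnified version gives the same value.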
This leads to the immediate question: the images of an object element with equal extents in depth and in the frontal plane, acquired with a given recording separation and distance, are now viewed on a display screen at some distance by an observer with a given interocular separation; what is the depth rendition? Specifically, will the observer’s retinal images on viewing this screen reproduction have the spatial properties that correspond to two points in the observer’s physical object space whose separations in depth and in the frontal plane are also equal?
That particular condition is satisfied when the depth renditions of the recording and viewing situations are equal, and it implies that the relation between extents in the frontal plane and in the third dimension that were found in the object has been preserved in this image reconstruction. Its physical realization is that the angle made by the observer’s interocular distance at the viewing screen is the same as that made by the separation of the optics of the recording device at the target being photographed. If the recording angle exceeds the viewing angle, depth is expanded relative to the frontal view; if it is smaller, depth is compressed or foreshortened.
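A short sketch of this comparison follows, under the first-order geometry in which each situation is characterized by the ratio of optical separation to distance. The names `a_rec`, `z_rec`, `a_view`, `z_view`, the tolerance, and the sample numbers are hypothetical.

```python
def depth_rendition(separation, distance):
    """Angle (radians, first order) made by the optical separation at the given distance."""
    return separation / distance

def relative_depth_scale(a_rec, z_rec, a_view, z_view):
    """Recording rendition divided by viewing rendition.
    1 means depth and frontal extents keep their original proportion."""
    return depth_rendition(a_rec, z_rec) / depth_rendition(a_view, z_view)

def classify(scale, tol=1e-6):
    """Label the stimulus geometry; tol is an arbitrary illustrative tolerance."""
    if abs(scale - 1.0) <= tol:
        return "homeomorphic"
    return "expanded" if scale > 1.0 else "compressed"

# Hypothetical case: camera optics 6.5 cm apart, target at 2 m,
# viewed by an observer with 6.5 cm interocular distance at 60 cm.
scale = relative_depth_scale(0.065, 2.0, 0.065, 0.6)
label = classify(scale)
```

In this example the stored disparity is only 0.3 of what a real cube at the screen would produce, so the stimulus is foreshortened; doubling both the camera separation and nothing else would raise the scale to 0.6.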
Figure 2 illustrates the relationship between a target element C a small distance in front of the fixation plane and the representations (projections) in the fixation plane of the right and left eyes’ images of it. One can think of it as a point on the front surface of a small cubical element whose back surface is situated in the fixation plane. As this target recedes from the camera or eye, the angle subtended by its width diminishes inversely with distance, and the angular measure of its depth as the inverse square. Hence, the depth rendition R, which is the ratio of depth to width, decreases as the inverse of the target distance. On the other hand, for an observer viewing a stereo image of a cube with the separation of the right and left eyes’ markers (Fig. 2) fixed by virtue of the initial recording, increasing the viewing distance increases the depth rendition, as is readily visualized from the geometry governing the position of C with respect to the fixation plane in Fig. 2: for a constant marker separation and t, the distance of C from the fixation plane increases with the viewing distance. That R decreases with target distance in the recording situation while it increases with screen distance in the viewing situation is an apparent paradox that is entirely a consequence of the geometry and does not require explanations involving observer responses.
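The viewing half of this argument can be made concrete with similar triangles. The sketch below assumes crossed screen markers a fixed distance apart and an observer centered in front of them; the function name, parameter names, and sample values are illustrative only.

```python
def depth_in_front_of_screen(marker_sep, interocular, view_dist):
    """Distance of the reconstructed point C in front of the screen.

    The lines of sight from the two eyes through their (crossed) screen
    markers intersect at C; similar triangles give
        marker_sep / d = interocular / (view_dist - d).
    All lengths are in the same unit.
    """
    return marker_sep * view_dist / (interocular + marker_sep)

# Hypothetical display: markers 5 mm apart, interocular distance 6.5 cm.
d_near = depth_in_front_of_screen(0.005, 0.065, 0.6)
d_far = depth_in_front_of_screen(0.005, 0.065, 1.2)
```

With the screen markers unchanged, doubling the viewing distance doubles the geometrical depth of C while the width marked out on the screen stays the same, which is the sense in which depth rendition grows with viewing distance.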
The condition of equal recording and viewing depth renditions has been called homeomorphic by von Rohr [2]. In it, the optical stimulus in the observer’s view of the stereoimages of a small cube is that of an actual cube, having the same structure and proportions but not necessarily the same actual size. Still, in some situations relative size does matter, even in the face of identical proportions, a point obviously appreciated by von Rohr, because in his original analysis he distinguishes the homeomorphic view from what he calls tautomorphic. The difference between them is that a tautomorphic view presents the observer with a display whose geometrical properties are identical to those of the original scene, whereas in a homeomorphic view they are merely in the same proportions. Since a given visual subtense varies inversely with distance, whereas the disparity between its components varies inversely as the square of its distance, and the perceptual estimate of an object’s distance from the observer enters into depth judgment, the distinction matters. Hence, for many purposes, specification of the depth rendition R of a display needs to be supplemented with size information.
1B. Viewing Stereo Images
The discussion so far has dealt with purely geometrical optical stimulus aspects of viewing and reproduction with stereoscopic devices. Its translation into retinal-image positions is direct and unproblematic. What has been said so far, therefore, is also representative of the spatial relationship of the signals conveyed into the viewer’s central nervous system, though not of the resultant percept. The distinction between the two, i.e., the realm of binocular visual stimuli, with which the discussion has been concerned so far, and the observer’s subjective visual space, is mandatory. It has occasionally been conjectured that the objective depth steps match measure for measure those in the visual experience [3, 4, 5], but this is far from the case [6, 7, 8].
The geometry has here been reduced to the essential kernel of its metric, namely the ratio of the extents in the reconstruction of equal small steps in the target’s frontal plane and the third dimension. Perfect rendition would make this ratio unity: in the observer’s view of the reconstituted image, the depth/width ratio is the same as if the target itself were at the location of the screen.
Conservation of this angular disparity/width ratio, the condition for preserving the stimulus integrity of a configuration for stereoscopic viewing purposes, may not be obeyed by the neural mechanisms. For a ratio to remain invariant, its two constituents have to change at the same rate. The neural processing for disparity, which is distributed between the two eyes, differs from that for target extent, available in monocular view. It is a matter of empirical investigation what happens perceptually when there is a proportionate scale change, but this faces a serious problem. In geometry, including application in the bipolar coordinate system and the transfer from recording to viewing situations, the three dimensions of space are treated as equal. But, in the human visual system, structurally and functionally, there are categorical differences between them—two belong to one submodality (representation of the two dimensions of visual angle in retinotopy) and the third to the submodality of representation of binocular parallax between the two eyes. The divide between the two is even more pronounced perceptually. Whereas few conceptual or practical impediments are expected when a horizontal interval provides a comparison for, say, a vertical one in judgments of whether a configuration in a frontoparallel plane appears square, it is not obvious whether this would extend also to the depth judgment of whether a 3D configuration appears a cube.
To establish a nexus between the optical stimulus and psychophysical measurements of 3D reconstructed stereoscopic views, experiments are reported that are designed, once the stimulus geometry of the representation has been defined, to ascertain how well the observer’s depth percept conforms to it. They differ from the several previous explorations of this problem by being restricted to sparsely demarcated small signals close to the fixation point, more or less at arm’s length. They are therefore representative of displays on personal computers and handheld devices, but some control experiments point to their more general validity.
2. EXPERIMENTAL INVESTIGATION
Viewing binocular stimulus panels on a computer monitor, observers made judgments of the apparent disposition in the depth dimension of components of simple patterns composed of small dots or white lines on a black background. The actual displays are described separately for each of the experiments in Subsection 2B. Screens were viewed at distances at which each pixel subtended a fixed visual angle, or a fixed fraction or multiple thereof. Panels for the right and left eyes were shown side by side and superimposed by a mirror stereoscope or, in the case of some observers, by free fusion. Lines were high-contrast white against a black background and shown in the center of a rectangular frame.
Utilizing the psychophysical procedure of adjustment, the observer used computer keyboard keys to set the disparity of a single feature until the criterion was met, registered the setting, and repeated this step after it had been randomly reset. Thirty such repetitions constituted a run and yielded the average setting and a standard error of this value. For each parameter identified in Subsection 2B, usually three such runs, obtained on different days, were averaged.
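The bookkeeping for a run can be sketched as follows. The text does not state how the standard error was computed, so the usual sample standard deviation divided by the square root of the number of settings is assumed; the function names and the placeholder numbers are hypothetical and are not data from the study.

```python
import math
import statistics

def run_summary(settings):
    """Mean and standard error of the mean for one run of adjustment settings."""
    mean = statistics.mean(settings)
    sem = statistics.stdev(settings) / math.sqrt(len(settings))
    return mean, sem

def average_runs(run_means):
    """Unweighted average of run means obtained on different days."""
    return statistics.mean(run_means)

# Placeholder disparity settings (arbitrary units), purely illustrative.
example_mean, example_sem = run_summary([1.0, 2.0, 3.0])
overall = average_runs([1.0, 2.0, 3.0])
```

An unweighted average of run means is one plausible reading of "were averaged"; weighting by the inverse variance of each run would be an alternative.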
Observers included the author and several undergraduate biology students naïve as to result expectation. Their myopic refractive errors had been corrected with spectacle or contact lenses, and their optometric status, including visual and stereoscopic acuity, was otherwise unremarkable. The experimental protocol was approved by the Institutional Review Board, The Committee for the Protection of Human Subjects.
2B1. How Well Can Disparity-Generated Depth be Measured by Distances in the Frontal Plane?
Observers are adept at detecting depth perturbations (“Are all three lines coplanar?”) and even comparing larger depth steps (“Place target A so that its depth difference from target B is half that of target C”). But such measurements are performed by making judgments within the realm of depth signals and do not say anything about the overall scale, i.e., the “gain,” in the relationship between disparity and apparent depth. If the whole depth dimension is stretched, this cannot be demonstrated by any stimulus set that is entirely contained within it. For this, an outside standard is needed. Of all possible modes of comparing stereoscopic depth, the most closely allied is distance within the frontal plane, because it also defines location in visual space, though, as mentioned above, it arises via a different neural apparatus.
This concept was implemented in two different ways, the display always being presented on a screen at a fixed distance from the observer, right and left eye views superimposed. In the first, the observer manipulated the disparity of the middle of a line triplet to create a 3D configuration that gave the appearance of looking into the interior of a cube bisected along a diagonal, or, when reversed, of looking at a cube with one corner pointing toward the observer, in other words, to give the appearance of a right angle in depth [Fig. 3 (left)].
In the second, the observer was asked to change the disparity of one target so that the depth difference separating it from an established plane appears equal in magnitude to a distance between two points within this plane shown binocularly. Because disparity introduces horizontal shifts in the right and left targets, the three points, two for the comparison distance within the plane and one for depth difference, were placed one above the other.
From the results, shown in Fig. 4, it is clear that the procedures yield good measures of the estimates of the depth generated by disparity, with satisfying concordance between two quite different test methods. In Figure 4 all quantities are in angles; spatial intervals in the frontal plane are expressed in visual angle at the observer’s eyes, and the difference in binocular parallax between the fixation plane and the target setting is the disparity angle for this observer. They translate directly to retinal distances; for the case of disparity, this is necessarily distributed between the two eyes.
When these angular quantities are translated into the actual locations that real targets would be occupying in the observer’s object space (Fig. 2), the answer to the second question emerges.
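The translation from angles to object-space distances can be sketched as below, assuming the first-order parallax a/z, crossed disparity measured relative to the fixation plane, and an observer with interocular separation `a` viewing a screen at distance `z`. The function names and numbers are hypothetical.

```python
import math

ARCMIN = math.pi / (180 * 60)  # one minute of arc in radians

def frontal_extent(width_angle, z):
    """Frontal-plane distance corresponding to a visual angle (radians) at distance z."""
    return width_angle * z

def depth_interval(disparity, a, z):
    """Distance in front of the fixation plane corresponding to a crossed disparity.

    The target parallax is a/z + disparity, so the target lies at
    a / (a/z + disparity); subtracting this from z gives
    disparity * z**2 / (a + disparity * z).
    """
    return disparity * z * z / (a + disparity * z)

# Hypothetical setting: 10 arcmin of disparity matched to a 10 arcmin interval,
# interocular separation 6.5 cm, screen at 60 cm.
a, z = 0.065, 0.6
width_m = frontal_extent(10 * ARCMIN, z)
depth_m = depth_interval(10 * ARCMIN, a, z)
```

Equal angles do not correspond to equal distances: at this viewing distance a given disparity angle stands for a much larger depth interval than the same angle does for a frontal interval, which is why the comparison has to be made in object space.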
2B2. How True Is the Measure of Depth That Has Been Secured by Comparison with Distance in the Frontal Plane?
The data for four observers were now converted from angles subtended at the eye to corresponding frontal-plane and depth distances in the observers’ object space and plotted in Fig. 5, demonstrating a remarkably linear relationship. The perceived spatial metric generated by stereoscopic viewing of simple tokens on a computer monitor is surprisingly substantive. A glance at the scale along the axes reveals, however, that it fails by a considerable margin to match the world of stimuli.
Before considering this further, two control experiments were performed. In the first, all variables remained identical throughout except that, by suitable combination of target placements on the screen, the observer was forced to utilize varying amounts of fusional convergence. This produced only insubstantial differences; in trained observers, imposed eye vergence changes, in a range spanning 7–30 prism diopters, did not materially influence the apparent depth associated with a fixed binocular disparity in an otherwise unchanged visual scene (Table 1).
In Subsection 1B above it was pointed out that observation distance is a most relevant variable. Though all geometrical factors have been controlled and their impact eliminated, the experimental procedure involves the apparent size of the spatial comparison interval, and this is known to be affected when observers, as they often do in natural surroundings, perceive a target to be nearer or farther than actually situated. Hence, one of the experiments was replicated by one observer with both double and half the original screen distances, all other retinal-image and eye-vergence variables remaining invariant throughout. The results, in Fig. 6, show that, in an abstract situation, observation distance per se unaccompanied by other changes has only a very minor influence.
Though this study has been couched to address issues of 3D recording and display as they present themselves to users of current devices and has been informed by the differential geometry introduced into the subject by Luneburg, the questions raised have long been a concern to researchers. Quite early in stereoscopy, Brewster [9] inquired into the veridicality of what is seen in stereoscopes, and the subject was treated further by Helmholtz [10] and von Kries [3]. The most insightful of the early studies was one by Heine [11], who credited Hering with suggesting it. Heine judged what 3D configurations of three rods would give him the percept of an isosceles triangle in depth and found that they were in fact isosceles triangles if they were within a particular range of distances. He used the term “Orthoskopie” for the situation when there was a match between the physical stimulus and the percept but realized that, if unsupported by secondary cues, this was the exception rather than the rule.
Because it speaks to the divide between a psychophysical stimulus and the observer’s percept, the distinction between Heine’s “Orthoskopie” and von Rohr’s “homeomorphic” rendition needs to be emphasized. Von Rohr applied the term homeomorphic to a stimulus rendition in which the 3D structure of an original configuration is preserved in what is being presented to the observer. In Heine’s Orthoskopie it is not necessarily the physical stimulus pattern in the reproduced display but the observer’s percept that is true to the original physical structure.
Though Heine’s paper, which tackled the problem directly, has been all but ignored, the topic has by no means remained dormant. Direct comparisons of frontal-plane intervals with intervals in the third dimension were conducted by Gogel [6] and Foley [8], whose studies featured large depth intervals, where one has to cope with operational factors of eye movement, possible diplopia, and the reduced sensitivity in nonfoveal vision. Moreover, at the time, they felt the need to take issue with a pre-existing theoretical framework, never firmly anchored empirically, which assumed that Heine’s Orthoskopie was the norm. The ingenious approach of looking into depth curvature, in which some invariance is expected, provides further insight into the situation. Its most effective implementation was by Johnston. Using stereograms featuring sheets of random dots, she found the 3D curvature that was needed for them to appear to the observer to constitute circular cylinders. These settings were performed for cylinders of a range of diameters at three observation distances. This is in essence the same paradigm employed in the experiments described above: matching a disparity by visual subtense. But instead of a cube delineated by a minimal number of discrete markers, Johnston used the criterion of a circular cylinder formed by a quasi-continuous pattern. It is, therefore, salutary to recognize the resulting differences: discrete small-signal depth stimuli, representative of fixed-screen stereo displays within arm’s length, yield a rather good linear depth/width relationship, quite robust to observation distance, that no longer holds in more elaborate situations.
The data presented here, and some of Johnston’s as well, however, reveal a prominent lack of orthoscopy, evident when it is computed how far the observers’ settings deviate from the stimulus that would be generated if an equivalent real structure had been presented in the location occupied by the display screen. Observers report a considerably foreshortened view of the configuration; for them to see the same depth as in the original, more disparity is needed than is mandated by geometry.
A conventional explanation involves the reciprocal apparent size/distance relationship: the larger the retinal-image representation of a known object, the nearer it is deemed to be (for a review, see the invaluable handbook [14]). Applied to the present case in a stringent form, it would assume the following formulation. The apparent size of the comparison spatial interval in the frontal plane is used as a yardstick for the depth attributed to the disparity. The closer the plane appears, the larger the estimate of the comparison interval and hence the smaller the estimate of the physical depth attributed to a given disparity. (This is because the nearer the display plane, the more angular disparity, relative to frontal-plane extent, would be generated by a cubical element.) The extent to which the displayed disparity has to be increased for the desired match to be satisfied would be a measure of the reduced apparent screen distance.
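Taken literally, this formulation lets an apparent screen distance be computed from a depth match. The sketch below assumes the first-order relation in which a small cubical element at distance z produces a disparity equal to its angular width times a/z; the function names and sample values are hypothetical, and the calculation illustrates the stringent form of the argument rather than a result from the experiments.

```python
def geometric_disparity(width_angle, a, z):
    """Disparity (radians) that a small cube of the given angular width
    would produce at distance z, to first order."""
    return width_angle * a / z

def implied_screen_distance(width_angle, disparity_set, a):
    """Distance at which the observer's disparity setting would be that of a
    real cubical element with the given angular width."""
    return a * width_angle / disparity_set

# Hypothetical observer: interocular separation 6.5 cm, screen at 60 cm,
# comparison interval of 0.01 rad, disparity set to twice the geometric value.
a, z, w = 0.065, 0.6, 0.01
d_geo = geometric_disparity(w, a, z)
z_implied = implied_screen_distance(w, 2 * d_geo, a)
```

In this example, a setting of twice the geometrically required disparity corresponds to an implied screen distance of half the actual one.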
The data in Fig. 5 can be interpreted as signifying a misjudgment of the screen position: it is perceived to be closer than its actual location and this leads secondarily to the compacted view of a cubical element and the augmentation of the depth setting beyond veridicality. For viewing 3D movies on a remote screen, this is certainly a perceptual mandate if the display includes components with uncrossed disparities that would be located behind the screen.
But overall, a numerical application of the apparent size/distance reciprocal relationship gives unconvincingly low estimates of perceived screen distance. And conversely (Table 1), strongly activated convergence, ordinarily associated with close-up viewing, seems not to have an influence here, though it has been reported elsewhere [11, 15].
Stimuli throughout have been of the most abstract kind, comprising just two dots or lines in the fixation plane and an additional one whose disparity was adjusted and judged by the observer. Thus the many secondary cues on which the usual depth judgment in a real scene depends are absent. A few preliminary experiments were conducted to find out whether this remains the case when the target configuration is enriched by creating screen images with more cues, such as the full outline of all visible contours of small cubes presented face-on. This stratagem did little to change the overall conclusion that the depth in simple static geometrical patterns fails to be perceived veridically, even when shown in homeomorphic view, i.e., when their 3D stimulus structure has been preserved. Because movement is also a visual primitive, conflated with depth at a fairly early neural level [16], the expansion of the analysis to dynamic displays seems indicated. The current results, however, suggest that perceptual factors predominate in any reports of undistorted depth in stereoscopic views of textured or natural scenes; the elucidation of their properties will, as a prerequisite, have to be based on the principles of the stimulus geometry here presented.
The concept of depth rendition is introduced and defined as the ratio of the depth to lateral distance in the image of a small cube captured by a 3D recording device and reconstructed on a screen for binocular view of observers. It is independent of magnification and equal to the ratio of the optical separation of the twin recording or viewing devices to the recording or viewing distance.
A rendition is called homeomorphic if the depth rendition of the recording situation is preserved in the viewing situation, but this does not imply that the viewer’s perceived depth in a structure in the image matches that of the original. Orthoscopy, veridical depth view in a reconstituted structure, is not the rule in the absence of secondary cues.
Experiments are presented that show rather good metric, if considerably foreshortened, properties of visual space in observers’ views of simple stereograms viewed in 3D displays.
1. R. K. Luneburg, Mathematical Analysis of Binocular Vision (Princeton University, 1947).
2. M. von Rohr, Die binokularen Instrumente (Springer, 1907).
3. J. von Kries, Notes in H. von Helmholtz Handbuch der physiologischen Optik, 3rd ed. (Voss, 1910), Vol. III.
4. G. A. Fry, “Visual perception of space,” Am. J. Optom. Arch. Am. Acad. Optom. 27, 531–553 (1950).
5. K. N. Ogle, “Precision and validity of stereoscopic depth perception from double images,” J. Opt. Soc. Am. 43, 906–913 (1953).
6. W. C. Gogel, “Perceived frontal size as a determiner of perceived binocular depth,” J. Psychol. 50, 119–131 (1960).
8. J. M. Foley, “Depth, size and distance in stereoscopic vision,” Percept. Psychophys. 3, 265–274 (1968).
9. D. Brewster, The Stereoscope: Its History, Theory, and Construction (Murray, 1856).
10. H. Helmholtz, Handbuch der Physiologischen Optik. Abt. 3 (Voss, 1867).
11. L. Heine, “Über Orthoskopie,” Albrecht von Graefes Arch. Ophthalmol. 51, 563–572 (1900).
14. I. P. Howard and B. J. Rogers, Seeing in Depth (Porteous, 2002), Vol. 2.
15. E. Wist, “Eye movements and space perception,” Bibl. Ophthalmol. 82, 348–357 (1972).
16. G. C. DeAngelis, B. G. Cumming, and W. T. Newsome, “Cortical area MT and the perception of stereoscopic depth,” Nature 394, 677–680 (1998).