When using stereographic image pairs to create three-dimensional (3D) images, a deep depth of field in the original scene enhances the depth perception in the 3D image. The omnifocus video camera has no depth of field limitations and produces images that are in focus throughout. By installing an attachment on the omnifocus video camera, real-time super deep stereoscopic pairs of video images were obtained. The deeper depth of field creates a larger perspective image shift, which makes greater demands on the binocular fusion of human vision. A means of reducing the perspective shift without harming the depth of field was found.
©2012 Optical Society of America
Taking the example of a music concert scene being recorded for a TV program, even though the singer is in sharp focus, the members of the band in the background are almost always out of focus. This is true even with major national television stations. Ordinary video cameras cannot focus both the singer and the members of the band in the background at the same time.
3D movies are essentially made by compiling two images from an ordinary 2D video camera into a pair of stereoscopic images [1,2]. If the background or foreground in the 2D images is out of focus to begin with, then this impairs the 3D perception of the entire scene.
The omnifocus video camera [3] removes this limitation. All objects in the scene, whether extremely near or extremely far, single or multiple, are in focus with high image quality, without physically moving any optics in the camera to refocus. Figure 1 shows an example of a video image with a super deep depth of field.
In Fig. 1, the matchstick is 30 cm from the camera, the bust is 3 m, and the Daruma in the background is 6 m from the camera. All objects are simultaneously focused. The image of the finger in the foreground is so sharply focused that even the fingerprints are easily recognizable.
It is the purpose of this investigation to report 3D imaging with a super deep depth of field. The stereoscopic images are recorded by using an attachment that converts the omnifocus video camera into a 3D omnifocus video camera. Figure 2 summarizes the operation of the omnifocus video camera without the attachment. The details are reported in [3].
The camera consists of an array of color video cameras combined with a unique distance mapping camera called the Divcam (divergence-ratio axi-vision camera) [4], which measures distance pixel by pixel using the divergence-induced decay of two point sources of IR light as a measuring stick. A condensed version of the principle of operation of the Divcam is included in Appendix A. The Divcam and the color video cameras are aimed at the same scene, but each color camera is focused at a different distance. The Divcam provides real-time distance information for every pixel in the scene. A pixel selection utility uses the distance information to select individual pixels from the multiple video outputs in order to generate the final single video display that is in focus throughout.
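The pixel selection step can be sketched in a few lines of NumPy. This is a minimal illustration only, not the actual utility: the function name, array shapes, and nearest-focal-plane selection rule are assumptions made for the sketch.

```python
import numpy as np

def fuse_omnifocus(frames, focus_distances, distance_map):
    """For every pixel, pick the frame whose focal distance is closest
    to the Divcam-measured distance at that pixel.

    frames:          (N, H, W, 3) color frames, each focused at a different distance
    focus_distances: (N,) focal distance of each frame, in meters
    distance_map:    (H, W) per-pixel distance from the Divcam, in meters
    """
    frames = np.asarray(frames, dtype=float)
    focus_distances = np.asarray(focus_distances, dtype=float)
    # Index of the best-focused frame for each pixel.
    idx = np.abs(distance_map[None, :, :]
                 - focus_distances[:, None, None]).argmin(axis=0)
    h, w = distance_map.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Assemble the final in-focus image pixel by pixel.
    return frames[idx, rows, cols]
```

Each output pixel comes from the frame focused nearest to that pixel's measured distance, so the assembled frame is in focus throughout.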
The principles of 3D imaging are reviewed (Figs. 3 and 4), and these principles are used to explain the attachment (Fig. 5) for converting the omnifocus video camera into a 3D omnifocus video camera with a super deep depth of field. Binocular fusion problems, which arise from the excessive depth of field, are solved by modifying the polarized light method. The optimum location for viewing the super deep stereogram and the achievable depth resolution are analyzed using a horopter diagram.
2. Principle of Operation
Before describing the attachment, it is useful to review the general principle of operation of 3D imaging on which the attachment is based. Figure 3 shows the arrangement for making a stereogram of a scene composed of a coin placed in the foreground and a Daruma placed in the far background. The stereoscopic images consist of two photos taken from two spots that are approximately an interpupillary distance (6.5 cm) apart. Let the image taken from the right spot be called the right eye view image, and the other, the left eye view image. It should be noted that the coin appears at a spot shifted to the left of the Daruma in the right eye view image, while the coin appears at a spot shifted to the right of the Daruma in the left eye view image. These shifts are called perspective shifts. The difference in the appearance of the same scene in the right and left eye view images is called the disparity or parallax effect. It is this disparity that the brain uses to construct the 3D image. The amount of the perspective shift is an important factor in determining the depth of the image, and at the same time, large shifts can regrettably create binocular fusion problems in human vision.
Figure 4 shows an arrangement for viewing a 3D image from a stereoscopic pair of images based on the polarized light method. The right eye view image is posted on the left side of the monitor, while the left eye view image is posted on the right side of the monitor; thus, the posted images are transposed. The line connecting the coin in the left eye view image on the monitor with the observer’s left eye and the line connecting the coin in the right eye view image on the monitor with the observer’s right eye constitute a crisscross light path, which is indicated by the solid lines in Fig. 4. The intersection of these two lines is where the observer perceives the existence of the coin. Thus, the image of the coin appears to pop out of the monitor.
Besides the crisscross light path shown in the solid lines in Fig. 4, there is another possible light path connecting the observer's eyes: the parallel light path indicated by the dotted lines. This parallel light path has to be blocked; otherwise, the 3D perception of the image of the coin is destroyed, and the observer simply sees a picture of two coins on the monitor instead of one single 3D image of the coin in front of the monitor. It is only after blocking the parallel light path that the 3D image of the coin is observed. The polarized light method uses two orthogonal directions of polarized light to display the image associated with the crisscross path and block the image associated with the parallel path, whereas the anaglyph method [5] uses two basic colors of light to satisfy this condition. In our polarized light method, polarized light from a liquid crystal display (LCD) monitor is used. The light from the LCD monitor is always polarized for the monitor to function [6]. Let us say that the light from the monitor is vertically linearly polarized.
The 25 μm thick cellophane sheet [7] acts as a half-wave plate and can be used to rotate the direction of polarization of the light [8–10]. The left half of the monitor, where the right eye view image is displayed, is covered by the cellophane sheet so that the light from this portion is converted into horizontally polarized light. Thus, the left half of the monitor is polarized horizontally, while the right half is polarized vertically.
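The polarization rotation by the cellophane can be checked with Jones calculus. The sketch below assumes the cellophane behaves as an ideal half-wave plate with its fast axis at 45° to the vertical; the function name and conventions are illustrative, not from the text.

```python
import numpy as np

def half_wave_plate(theta):
    """Jones matrix of a half-wave plate with fast axis at angle theta
    (radians) from horizontal; it mirrors linear polarization about the
    fast axis, i.e. rotates a linear state by 2*theta."""
    c, s = np.cos(2 * theta), np.sin(2 * theta)
    return np.array([[c, s], [s, -c]])

vertical = np.array([0.0, 1.0])      # vertically polarized light from the LCD
hwp = half_wave_plate(np.pi / 4)     # cellophane fast axis at 45 degrees
out = hwp @ vertical                 # emerges horizontally polarized
```

With the fast axis at 45°, the vertical state [0, 1] is mapped to the horizontal state [1, 0], which is exactly the conversion the cellophane performs on the left half of the monitor.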
The observer wears a pair of cross-polarized glasses. The right eyepiece is polarized horizontally, and the left eyepiece is polarized vertically. Thus, the observer’s right eye is able to see the right eye view image but unable to see the left eye view image. Similarly, the left eye is able to see the left eye view image but unable to see the right eye view image. Thus, the crisscross light path is established, and the observer perceives a 3D image of the coin at point P in Fig. 4.
Referring to the geometry shown at the bottom of Fig. 4, the length $p$ of the protrusion is calculated from the similar triangles formed by the crisscross light path as

$p = \dfrac{S d}{e + S}$,

where $S$ is the separation of the two coin images on the monitor, $e$ is the interpupillary distance, and $d$ is the distance of the observer from the monitor.
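The protrusion geometry can be evaluated numerically. The sketch below assumes the similar-triangle relation for the crisscross path, p = d·S/(e + S); the function name and parameter values are hypothetical.

```python
def protrusion(shift, eye_separation=0.065, viewing_distance=2.0):
    """Distance (m) by which the fused image appears in front of the
    screen, from the similar triangles of the crisscross light path.

    shift: on-screen separation of the left/right images of the object (m)
    """
    return viewing_distance * shift / (eye_separation + shift)
```

For example, when the on-screen shift equals the interpupillary distance, the image pops out halfway between the screen and the observer; larger shifts push it further out.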
3. 3D Attachment to the Omnifocus Video Camera
The attachment to the omnifocus video camera performs the operations mentioned in the previous section in real time. The major functions required for the attachment are (1) both left and right eye view images of the same scene are taken simultaneously, and (2) the positions of the left and right eye view images are transposed before they are fed to the omnifocus video camera.
Figure 5 shows the schematics of the attachment. The same scene is taken in by way of the left and right external surface mirrors and is then reflected by total internal reflection from one of the 3 cm sides of an equilateral prism of 3 cm height. Both the left (represented by the red line) and right (represented by the green line) eye view images are funneled into the omnifocus video camera. The prism not only combines the two images but also transposes them.
The light path represented by the black line indicates a possible unwanted light path that enters the input lens of the omnifocus video camera directly, bypassing the external surface mirrors. This light path is strategically blocked by a combination of a 2.5 cm diameter aperture and a 50 cm focal length concave lens.
When the depth of field is super deep, the angle of the external mirrors with respect to the center line critically influences the coverage of the scene. The external surface mirrors were therefore aligned with the aid of a laser beam temporarily positioned at the back focal point of the input lens of the omnifocus video camera.
4. Experimental Results
In order to study the effect of a super deep depth of field, the objects being photographed were spread as much as possible from one end of the laboratory to the other. No special lighting source was used; the room was lit to ordinary brightness by an array of fluorescent ceiling lamps. The object distances are indicated in Fig. 6. The nearest object, a 17 mm diameter Canadian silver coin, is located 60 cm from the camera. The life-size white clay bust is located 3 m from the camera, and the 60 cm high Daruma, a Japanese symbol of perseverance and good luck, is located 6 m from the camera, at the furthest end of the laboratory.
Figure 7 shows the super deep stereogram of the scene shown in Fig. 6. The striking characteristic of the super deep stereogram is the excessive amount of perspective shift. The perspective shift is so large that the stereoscopic pair of images of the coin almost touches both edges of the stereogram. The value of the perspective shift is larger than that of an ordinary stereogram by an order of magnitude. The depth of field of the reconstructed 3D image is, accordingly, extra large. But a large perspective shift means that an observer needs a large power of binocular fusion to view the stereogram as one image.
Ideally, the reconstructed 3D image does not differ from the real scene viewed by the naked eye. Before the experiments, an observer positioned his eyes at the location of the camera in Fig. 6 and viewed the scene. At first glance, every object in the scene seemed to be in focus, and nothing was unusual about the setup. But when the observer concentrated his attention on interrogating the relief pattern of the yacht on the 17 mm diameter coin, the Daruma appeared as a double image, indicating that the binocular fusion limits of his vision had been exceeded. When the observer then looked back at the details of the Daruma's whiskers and made this object his fixation point (object of precedence), the whiskers returned to a single image, but this time the coin doubled. The observer had to place the fixation point (precedence) on an object in order to see it as a single image. This phenomenon is known as binocular rivalry.
Next, the real scene was replaced by the stereogram of the super deep 3D image reconstructed on the 22 in. monitor. The observer could view every object in the same way as the naked eye saw the real objects. When the observer concentrated his attention on interrogating the relief pattern of the yacht on the coin in the foreground, the binocular fusion of the eyes was broken and the background Daruma appeared as a double image. When the observer chose the Daruma as the fixation point and concentrated his attention on the Daruma's whiskers, the coin in turn became a double image due to binocular rivalry, just as happened when the real objects were viewed by the naked eye.
The depth range of the super deep stereogram is so deep that the observer's power of binocular fusion fails against the excessive amount of perspective shift. The depth of field of the super deep 3D image is limited only by the amount of perspective shift that allows binocular fusion. In order to exploit the full benefits of the super deep stereogram, one must find a way to reduce the amount of perspective shift without sacrificing the deep depth of field.
5. Modification on the Display of the Super Deep Stereogram
In the viewing system shown in Fig. 4, the left and right eye view images are butted side by side in the same plane. In this arrangement, the perspective shift is at least the width of the image, regardless of the location of the object. In the case of anaglyphs or a time sharing 3D TV set (field-sequential stereoscopic television), or even a 3D movie [1,12], the left and right eye view images overlay one another, thus reducing the overall perspective shift.
A. Anaglyph Format
The idea of overlaying the images was realized by converting the obtained stereogram into anaglyph format. Figure 8 shows the anaglyph fabricated using the left and right eye view images shown in Fig. 7. The left eye view image was converted into red duotone, and the right eye view image into green duotone. When the two color images were flattened into one, the images of the Daruma, which is the furthest object, were aligned with each other. The perspective shift of the Daruma thus became zero, and that of the coin was reduced to almost one half of its value in Fig. 7.
The observer wears a pair of glasses with the red color filter on the right eye and with the green color filter on the left eye. With this arrangement, the image of the Daruma appears on the surface of the monitor, but the image of the coin protrudes from the monitor surface. The problem of binocular fusion is significantly suppressed using the anaglyph format. The anaglyph, however, has the disadvantage of losing the color information of the image. The idea of overlaying images was introduced into the polarized light method to shorten the perspective shift and yet retain the color information, as described in the next subsection.
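The duotone overlay described above can be sketched with NumPy. The function name, the grayscale duotone step, and the horizontal roll used to align the farthest object are illustrative assumptions, not the actual processing pipeline of the text.

```python
import numpy as np

def make_anaglyph(left_rgb, right_rgb, align_shift=0):
    """Red/green duotone anaglyph: the left eye view goes into the red
    channel and the right eye view into the green channel, as in the text.
    `align_shift` (pixels) slides the right view horizontally so that the
    farthest object overlaps in the two views (zero perspective shift for
    the background)."""
    left = left_rgb.mean(axis=2)                          # left view -> gray
    right = np.roll(right_rgb.mean(axis=2), align_shift, axis=1)
    out = np.zeros(left.shape + (3,), dtype=left.dtype)
    out[..., 0] = left      # red channel: left eye view
    out[..., 1] = right     # green channel: right eye view
    return out              # blue channel stays empty
```

Flattening the two duotones into one frame halves the remaining shift of foreground objects while the aligned background sits on the monitor plane, at the cost of losing the original color information.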
B. Modified Polarized Light Method Using Perpendicular Monitors
Based on the knowledge obtained from the anaglyph, the polarized light method was modified to reduce the amount of perspective shift and thereby ease the strain on binocular fusion. The stereoscopic pair of images, which were butted side by side on the same LCD monitor in Fig. 4, was separated onto two LCD monitors arranged perpendicular to each other, as shown in Fig. 9.
The right eye view image is displayed on the front monitor, and the left eye view image is displayed on the side monitor. The direction of polarization of the front monitor was rotated to the horizontal from the vertical direction by covering the monitor surface with a 25 μm cellophane sheet. The left and right eye view images are combined using a half mirror.
It should be noted that the reflection from the half mirror reverses the left and right parity of the left eye view image. The parity of this image was therefore reversed before it was displayed on the monitor.
Now that the left and right eye views are displayed on two separate monitors, the relative orientation of the two eye view images becomes adjustable. The relative orientations of the two monitors were adjusted in such a way that the background Daruma images from both monitors were laid almost coincident with each other. In this way the perspective shift of the Daruma becomes zero and that of the coin is significantly reduced. The Daruma appears on the plane of the monitor, and the image of the coin protrudes from this plane.
With this arrangement, the size of the screen is doubled, the high color fidelity of the image is maintained, and the problem of binocular fusion is diminished even when the size of the stereogram is enlarged to the full size of the 22 in. screen. There is an optimum viewing range for this arrangement, which is calculated in the next section.
A superb 3D image with a deep depth of field was observed in full color fidelity. The perceived size of the objects in the 3D image is close to the real size of the objects. An observer who is standing a couple of meters in front of the monitor can really appreciate the depth of the scene and feels as if he is in the real scene instead of just seeing a 3D image on the monitor. This is a very important achievement for a future step toward super deep 3D movies.
C. Binocular Fusion
The optimum range for viewing the stereogram free from the problem of binocular fusion is calculated. The 3D image viewing system shown in Fig. 9 is represented on a horopter diagram [1,11,13–15] in Fig. 10. The furthest object, which is the Daruma, is taken as the fixation position. The left and right eyes are involuntarily converged so that the image of the Daruma is focused at the foveae $F_L$ and $F_R$, respectively. The images of the coin take the crisscross light path due to the manipulation of the polarized light. The light path from the right coin reaches point $P_R$ at incident angle $\delta_R$, which is to the left of the fovea, whereas that from the left coin reaches point $P_L$ at incident angle $\delta_L$, which is to the right of the fovea.

If the incident angle with respect to the line to the fovea in one eye is the same as the angle in the other eye, the brain senses no disparity. In fact, the horopter is the locus of zero disparity. If, however, the incident angle in one eye differs from that in the other eye, the brain senses a disparity and uses the information to determine the distance from the Daruma. In the present case, $\delta_R$ and $\delta_L$ are in opposite directions, and the total disparity is $\delta = \delta_R + \delta_L$, which contributes to the stereoscopic perception of the coin with respect to the Daruma. If the disparity is too large, the brain cannot fuse the images seen by the two eyes into one image, and the observer sees a double image of the coin.
Referring to the diagram in Fig. 10, the total disparity $\delta$ can be represented by the difference between the peripheral angle $\phi_D$ of the Daruma and the peripheral angle $\phi_C$ of the coin as

$\delta = \phi_D - \phi_C$.  (4)

It is also possible to represent the total disparity $\delta$ by the peripheral angle subtended by the two coin images at point $O$. Equation (4) can be proven using trigonometric geometry: the peripheral angles subtended by the same arc are equal, which gives Eqs. (5) and (6), and combining Eqs. (5) and (6), Eq. (4) is obtained. Equation (4) is further rewritten in terms of the physical dimensions indicated at the bottom of Fig. 4 as

$\delta \approx S/d$,  (7)

where $S$ is the separation of the two coin images on the monitor and $d$ is the distance of the observer from the monitor.
The condition to be free from binocular fusion problems [11] is that the horizontal disparity satisfies the limit given by Eq. (8). Combining Eqs. (7) and (8), the minimum distance of observation is calculated: the observer should stay at least 3.5 m from the monitor. Since the size of the stereogram is large, a distance of a few meters is not excessive. It is analogous to a movie theater, where one generally sits several meters from the screen. An extremely realistic super deep 3D image is observed. If one wishes to view from closer, all that is needed is to reduce the size of the stereogram on the monitor accordingly.
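The minimum viewing distance can be estimated from the small-angle relation disparity ≈ shift/distance. In the sketch below, the 2° fusion limit and the 12 cm on-screen shift are illustrative assumed values, not the ones measured in the text.

```python
import math

def min_viewing_distance(screen_shift, max_disparity_deg=2.0):
    """Smallest viewing distance (m) at which the on-screen separation
    `screen_shift` (m) of the two images of an object keeps the disparity
    within the binocular fusion limit `max_disparity_deg` (degrees).
    Small-angle approximation: disparity ~ shift / distance."""
    return screen_shift / math.radians(max_disparity_deg)
```

With the assumed values, a 12 cm shift and a 2° limit give a minimum distance of roughly 3.4 m, of the same order as the 3.5 m quoted above; moving closer requires shrinking the stereogram proportionally.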
D. Depth Resolution of the Image
The depth resolution of the 3D image is calculated. The minimum change in the disparity that healthy eyes can still detect [11] is given by Eq. (9). Substituting Eqs. (3) and (9) into Eq. (10), an approximate expression for the depth resolution can be found as Eq. (11). Inserting into Eq. (11) the values of the parameters associated with the coin object in Fig. 9, the depth resolution is obtained.
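The order of magnitude of the depth resolution follows from the standard stereoscopic relation Δd ≈ δ·d²/e. The sketch below uses a textbook stereoacuity of 20 arcsec and an interpupillary distance of 6.5 cm; these are assumed values, not the parameters of the actual measurement.

```python
import math

def depth_resolution(distance, stereo_acuity_arcsec=20.0, eye_separation=0.065):
    """Smallest resolvable depth difference (m) at `distance` (m), from
    the standard stereoscopic relation dd = delta * d**2 / e, where delta
    is the minimum detectable disparity and e the interpupillary distance.
    The 20 arcsec acuity is a typical textbook value, not taken from the
    measurement in the text."""
    delta = stereo_acuity_arcsec * math.pi / (180 * 3600)  # arcsec -> rad
    return delta * distance ** 2 / eye_separation
```

Under these assumptions an object at 2 m can be depth-resolved to a few millimeters, and the resolution degrades with the square of the distance.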
Evaluation of a 3D image is subjective, but to the author’s and his colleagues’ eyes, there is a distinct difference between viewing an ordinary 3D image and a super deep 3D image. The sensation generated by seeing a super deep 3D image is phenomenal.
Not only have the annoying problems of blurriness at the extremities of the 3D image been resolved by using the omnifocus video camera, but the upper limit of the depth of field caused by binocular fusion limitations has also been practically removed from the polarized light method by arranging two separate LCD monitors in perpendicular orientation.
The range of the optimum location to view the super deep 3D image free from binocular fusion problems as well as the obtainable depth resolution were calculated using the horopter diagram. This investigation has many ramifications that are still being explored, including the future of the 3D movie industry.
Appendix A: Principle of Operation of the Divcam
The intensity of a point source decays with the inverse square of the distance because of the divergence of the light. The Divcam (divergence-ratio axi-vision camera) [4] is unique in that the decay of two illuminating IR point light sources is used as a means of mapping the distance. As shown in Fig. 11, two similar IR LEDs are installed to illuminate the same object, one in front of the IR CCD camera and the other behind the camera. The procedure is summarized in three steps.
- Step 1. A snapshot of the scene is taken while the scene is illuminated only by the front IR LED.
- Step 2. A second snapshot is taken while the scene is illuminated only by the rear IR LED.
- Step 3. The ratio of the intensities of the two images is used to calculate the distance to the object. By taking the ratio of the two snapshots, the contribution of the reflectivity of the object is cancelled. Thus, the camera provides the correct distance regardless of the color or surface condition of the object.
Incorporating an IR bandpass filter, the Divcam can operate under room light. Resolutions of better than 10 mm over a range of meters and 0.5 mm over a range of decimeters were achieved. Special features of this camera are its high resolution, real-time operation, simplicity, compactness, light weight, portability, and yet low fabrication cost.
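The ratio-to-distance inversion of Step 3 can be illustrated as follows. The sketch assumes the front LED sits a distance d from the object and the rear LED a distance d + a, where a is a hypothetical LED separation; under inverse-square decay the intensity ratio is ((d + a)/d)², which inverts to d = a/(√ratio − 1).

```python
import math

def divcam_distance(ratio, led_separation=0.5):
    """Distance (m) to the object from the ratio of the two snapshot
    intensities (front-LED snapshot over rear-LED snapshot).
    Inverse-square decay gives ratio = ((d + a) / d)**2, so
    d = a / (sqrt(ratio) - 1). The object reflectivity cancels in the
    ratio, as stated in Step 3."""
    return led_separation / (math.sqrt(ratio) - 1.0)
```

For instance, with a 0.5 m LED separation, an intensity ratio of 2.25 corresponds to an object 1 m away; ratios closer to 1 indicate more distant objects.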
The author was reassured of the importance of the project during the course of this investigation by a letter from Mr. Juca Cassioli indicating his keen interest in seeing the super deep 3D images made possible by the omnifocus video camera. The author would like to extend sincere appreciation to Ms. Mary Jean Giliberto for proofreading the manuscript.
1. T. Izumi, The Basics of 3-Dimensional Images (Ohmsha, 1995).
2. T. Okoshi, “Three-dimensional displays,” Proc. IEEE 68, 548–564 (1980). [CrossRef]
3. K. Iizuka, “Omnifocus video camera,” Rev. Sci. Instrum. 82, 045105 (2011). [CrossRef]
4. K. Iizuka, “Divergence ratio axi-vision camera (Divcam): a distance mapping camera,” Rev. Sci. Instrum. 77, 045111 (2006). [CrossRef]
5. K. Iizuka, “Welcome to the wonderful world of 3D: anaglyph experiments,” Opt. Photon. News 18(2), 24–29 (2007). [CrossRef]
6. K. Iizuka, “Light touch: experiments with liquid crystal displays,” Opt. Photon. News 13(11), 12–13 (2002).
7. K. Iizuka, “Cellophane as a half-wave plate and its use for converting a laptop computer screen into a three dimensional display,” Rev. Sci. Instrum. 74, 3636–3639 (2003). [CrossRef]
8. B. E. A. Saleh and M. C. Teich, Fundamentals of Photonics (Wiley, 1991).
9. E. Hecht and A. Zajac, Optics (Addison Wesley, 1974).
10. K. Iizuka, Elements of Photonics (Wiley, 2002).
11. K. N. Ogle, “On the limits of stereoscopic vision,” J. Exp. Psychol. 44, 253–259 (1952). [CrossRef]
12. K. Iizuka, Engineering Optics, 3rd ed. (Springer-Verlag, 2008).
13. T. Okoshi, Three-Dimensional Imaging Techniques (Asakura Shoten, 1991) (in Japanese).
14. K. M. Schreiber, J. M. Hillis, H. R. Filippini, C. M. Schor, and M. S. Banks, “The surface of the empirical horopter,” J. Vision 8(3), 7 (2008). [CrossRef]
15. M. Schreiber, D. B. Tweed, and C. M. Schor, “The extended horopter: quantifying retinal correspondence across changes of 3D eye position,” J. Vision 6(1), 6 (2006). [CrossRef]