We describe a novel real-time depth-mapping camera, the Gain-modulated Axi-Vision Camera, in which pulsed laser light is combined with a gain-modulated camera. A depth resolution of 2.4 mm was obtained, which is finer than that of the previously reported depth-mapping Axi-Vision Camera. Pixel-by-pixel depth information for 768×493 pixels is obtainable at one half of the video frame rate (15 Hz). A short movie clip is attached that illustrates the depth measurement operation. The merits of the Gain-modulated Axi-Vision Camera are high resolution, real-time operation, and a relatively simple optical system. These merits arise primarily from the ultra-short exposure time provided by the pulsed laser diode.
©2004 Optical Society of America
A camera that can map depth information simultaneously with the color image will no doubt find a variety of applications; likely applications include 3D TV, CG animation, biomedical imaging, and robotic vision, to name a few.
There is a vast literature on the subject of depth mapping methods. We will attempt to highlight only recent publications so that the interested reader can go back through their cited references.
Another interesting scheme for depth mapping involves the projection of structured patterns. A structured pattern is projected onto an object, and the surface shape of the object is determined from the distortion of the projected pattern. Various kinds of projected patterns have been used, including stripe patterns, sinusoidal patterns, coded patterns, spatially intensity-modulated patterns, and Moiré patterns [12–14].
The majority of the above-mentioned methods, with few exceptions, are applied to stationary objects. When applied to moving objects, these methods have shortcomings in either mapping speed or the number of detectable points, or they have difficulty in obtaining pixel-by-pixel depth information at video frame rates with resolution high enough for TV video images.
In contrast, the time-of-flight methods are well suited to high-speed real-time operation and require minimal information processing [15–19]. Our work is based on time-of-flight methods. The prototype Axi-Vision Camera uses an ultra-fast shutter to sample the instantaneous distribution of light reflected from objects that are illuminated by high-speed, intensity-modulated light. This prototype has been further modified to make it suitable for creating a virtual studio in high-definition (HD) TV broadcasting systems. The HDTV Axi-Vision Camera has been used to produce TV programs in which live scenes were composited with computer-generated graphics; these TV programs were broadcast from an NHK station in Japan.
In this paper, we propose a variation on the Axi-Vision Camera in which pulsed laser light replaces the function of the ultra-fast shutter and a gain-modulated camera replaces the need for intensity modulation of the illumination. Owing to the faster exposure time of the system, the depth resolution of this new camera is higher than that of the HDTV Axi-Vision Camera. In addition to the improved resolution, the new camera has the advantage of being less expensive to build than the HDTV Axi-Vision Camera because it does not require an ultra-fast shutter.
2. Principle of the Gain-modulated Axi-Vision Camera
Figure 1 shows a block diagram of the Gain-modulated Axi-Vision Camera.
Objects are illuminated by the infrared pulsed laser. The gain of the camera that captures an image of the objects is linearly increased with respect to time. Thus, the light reflected from farther distances is amplified with a larger camera gain than that from nearer distances. The output image intensity of far objects is higher than that of near objects. The image intensity represents the distance to the object. With an increase in the speed of the gain modulation, the sensitivity of the depth measurement is increased.
Referring to the time diagrams shown in Fig. 2, let us derive an expression to calculate the desired depth from the measured quantities. Let us start with the case of linearly ascending camera gain (or to be more exact, the ascending gain of an image intensifier that is connected to the CCD camera through a relay lens).
Let the object at depth d be illuminated by the pulsed light launched from the laser at time tp. The light reflected from the object returns to the camera at tp+2d/v, where v is the speed of light. The gain of the camera is linearly increased as shown in Fig. 2(a) and is expressed by
where g is the rate of ascending gain of the camera. The detected signal I+ from the camera is
where the object is assumed to be a point object with backscattering cross-section ρ, and I0 is the intensity of the incident pulsed laser light.
Note that the detected signal I+ alone is not enough to determine the desired depth d. It contains an unknown factor, ρI0g/(4πd²)², that appears as the first factor of Eq. (2).
Luckily, this first factor can be removed by just one more measurement of the same object, this time using the linearly descending gain of the camera as shown in Fig. 2(b). The gain of the camera is expressed by
where Tg is the period of gain modulation of the camera. Therefore, signal I- taken with the descending gain of the camera is
Note that the first factor of Eq. (4) is identical to that of Eq. (2). The quotient R of Eq. (2) divided by Eq. (4) is therefore free from the first factor of Eqs. (2) and (4). It is important to note that the quotient R, being free from this factor, depends on neither ρ nor I0.
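The displayed equations are not reproduced in this copy of the text. One reconstruction that is consistent with the definitions and equation numbers used in the surrounding paragraphs is sketched below. The time origin at the start of each gain ramp and the form chosen for the descending gain are assumptions, not the authors' confirmed expressions.

```latex
% Assumed: t = 0 at the start of the gain ramp, and both ramps share the rate g.
\begin{align}
G_{+}(t) &= g\,t, \tag{1}\\
I_{+} &= \frac{\rho I_{0} g}{(4\pi d^{2})^{2}}\left(t_{p} + \frac{2d}{v}\right), \tag{2}\\
G_{-}(t) &= g\,(T_{g} - t), \tag{3}\\
I_{-} &= \frac{\rho I_{0} g}{(4\pi d^{2})^{2}}\left(T_{g} - t_{p} - \frac{2d}{v}\right), \tag{4}\\
R &= \frac{I_{+}}{I_{-}} = \frac{t_{p} + 2d/v}{T_{g} - t_{p} - 2d/v}, \tag{5}\\
d &= \frac{v}{2}\left(\frac{R}{1+R}\,T_{g} - t_{p}\right). \tag{6}
\end{align}
```

Under these assumptions, Eq. (6) follows from solving Eq. (5) for 2d/v, and the common first factor of Eqs. (2) and (4) cancels in R.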
3. Details of the Gain-modulated Axi-Vision Camera
The details of the Gain-modulated Axi-Vision Camera are shown in Fig. 3. An infrared laser diode (LD) is used to generate the illuminating pulse. The wavelength of the LD is chosen to be 803 nm because it is outside the visible range and so does not interfere with the room light necessary for the color camera. The full width at half maximum (FWHM) of the pulsed light is 68 ps, and the average power output of the pulsed light is 9.7 mW at a 40 MHz repetition rate.
The gain-modulated camera consists of an objective camera lens, band-pass optical filter, image intensifier, relay lens, and CCD camera. The objective camera lens first forms the input optical image of the objects on the photocathode of the image intensifier. This optical image is then converted into an electron image by the photoelectric effect. The converted electrons are accelerated toward a microchannel plate (MCP) that in turn amplifies the intensity of the electron image.
When a higher voltage with respect to the photocathode is applied to the MCP, more electrons are introduced into the MCP and the gain of the camera is increased. Fast gain modulation is achieved by quickly changing the level of this bias voltage. The period Tg of gain modulation is 1 ns.
At the final stage, the amplified electron image from the MCP is converted back into an optical image on the phosphor plate of the image intensifier. This optical image is focused by the relay lens onto the CCD camera.
A function generator provides the trigger signals for both triggering the LD and modulating the gain of the image intensifier. When the switch is flipped to port A, the LD is pulsed during the ascending cycle of the triangular gain modulation. When the same switch is flipped to port B, the trigger signal is delayed by an inserted delay line and the LD is pulsed during the descending cycle of the triangular gain modulation.
Instead of switching between ports A and B every period of the triangular gain modulation, the switch stays on port A for 33 ms, during which the image is captured 1.32×10^6 times with the ascending gain and the outputs are accumulated in the CCD. The switch is then flipped to port B for the next 33 ms, during which the image is captured 1.32×10^6 times with the descending gain and the outputs are again accumulated in the CCD.
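The accumulation count can be checked with a few lines of arithmetic. The sketch below uses only the 40 MHz repetition rate and the 33 ms dwell time quoted in the text; the variable names are illustrative.

```python
# Values quoted in the text.
REPETITION_RATE_HZ = 40e6  # LD pulse repetition rate
DWELL_S = 33e-3            # time the switch stays on one port

# Number of laser pulses accumulated in the CCD during one dwell.
pulses_per_dwell = REPETITION_RATE_HZ * DWELL_S

# One depth frame needs two dwells: one ascending, one descending.
depth_frame_period_s = 2 * DWELL_S
depth_frame_rate_hz = 1.0 / depth_frame_period_s

print(f"{pulses_per_dwell:.3g} pulses per dwell")
print(f"{depth_frame_rate_hz:.1f} depth frames per second")
```

The pulse count matches the 1.32×10^6 captures stated above, and the two-dwell period gives a depth frame rate of roughly 15 Hz.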
The accumulation of the charges in the CCD in this manner significantly enhances the camera sensitivity. The switching pulse of the 33-ms period is extracted from the vertical synchronization signal of the color camera. From the CCD camera output, the depth is calculated in the signal processor according to Eqs. (5) and (6). The total acquisition time for the two images of I+ and I- with 768×493 pixel frames is 66 ms, and the frame rate of the depth measurement is one half that of an ordinary color television camera.
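As an illustration of the per-pixel calculation, here is a minimal sketch. It assumes a simple linear-ramp model in which I+ is proportional to (tp + 2d/v) and I- is proportional to (Tg - tp - 2d/v) with the same proportionality factor. That model, the function names, and the synthetic test values are assumptions made for illustration and are not taken from the authors' signal processor.

```python
V_LIGHT = 2.998e8  # speed of light in m/s


def depth_from_pair(i_plus, i_minus, t_g, t_p=0.0):
    """Depth in metres for one pixel, or None if the pixel has no signal.

    Assumed model (not confirmed by the text): I+ is proportional to
    (t_p + 2d/v) and I- to (t_g - t_p - 2d/v), with the same factor.
    """
    total = i_plus + i_minus
    if total <= 0:
        return None
    fraction = i_plus / total  # equals R / (1 + R) with R = I+ / I-
    return 0.5 * V_LIGHT * (fraction * t_g - t_p)


def depth_map(img_plus, img_minus, t_g, t_p=0.0):
    """Apply depth_from_pair pixel by pixel to two equally sized images."""
    return [
        [depth_from_pair(a, b, t_g, t_p) for a, b in zip(row_p, row_m)]
        for row_p, row_m in zip(img_plus, img_minus)
    ]


def simulate_pair(d, t_g, reflectance, t_p=0.0):
    """Synthetic (I+, I-) for a target at depth d under the assumed model."""
    t_arrival = t_p + 2.0 * d / V_LIGHT
    return reflectance * t_arrival, reflectance * (t_g - t_arrival)


T_G = 1e-9  # gain-modulation period quoted in the text (1 ns)
bright = simulate_pair(0.050, T_G, reflectance=1.0)
dark = simulate_pair(0.050, T_G, reflectance=0.2)
print(depth_from_pair(*bright, T_G), depth_from_pair(*dark, T_G))
```

Both synthetic targets should yield a depth close to 0.05 m despite their different reflectance, which is the point of taking the ratio of the two images.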
The color video of the objects is simultaneously captured by a separate color camera. The visible light reflected from the object is reflected by both the ordinary mirror and the dichroic mirror and reaches the color camera. The near-infrared light reflected from the object passes through the dichroic mirror and reaches the CCD camera.
4. Performance test of the camera
4.1 Camera output as a function of object distance
The output from the camera is plotted as a function of the distance to the object in Fig. 4.
The object was a vertically posted white paper panel. Although the measured curve is relatively straight around 500 mm, it starts to curve as the ends of the measured range are approached. The modulation of the image intensifier gain was not perfectly triangular near the edges of the triangle. This is likely the cause of the curvature toward the ends of the measured range.
The root-mean-square value σ of the noise voltage of the output depth video signal was measured, and the depth corresponding to 3σ defined the depth resolution. The noise level of the depth image of an object in the center of the range was measured by a video signal analyzer. Its RMS value σ was 4.3 mVrms; 3σ is 1.8% of the video signal level. The depth resolution defined by the depth corresponding to 3σ in Fig. 4 was 2.4 mm for a distance of 500 mm from the camera to the objects in the center of the range.
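The noise figures above can be cross-checked with a short calculation. The sketch below uses only the quoted values (σ = 4.3 mV, 3σ = 1.8% of the signal level, 2.4 mm resolution). The implied full-scale signal level and depth span are inferences that assume a locally linear output-versus-distance curve; they are not values stated in the text.

```python
# Values quoted in the text.
sigma_mv = 4.3                # RMS noise of the depth video signal
three_sigma_fraction = 0.018  # 3*sigma as a fraction of the video signal level
depth_resolution_mm = 2.4     # depth corresponding to 3*sigma

three_sigma_mv = 3 * sigma_mv

# Inferred quantities, assuming the output is linear in depth near mid-range.
implied_signal_mv = three_sigma_mv / three_sigma_fraction
implied_span_mm = depth_resolution_mm / three_sigma_fraction

print(f"3*sigma = {three_sigma_mv:.1f} mV")
print(f"implied video signal level = {implied_signal_mv:.0f} mV")
print(f"implied depth span = {implied_span_mm:.0f} mm")
```

Under the linearity assumption, the implied depth span comes out at roughly 130 mm.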
4.2 Depth imaging
An object with multiple steps was arranged as shown in Fig. 5(a). Five flat panels were arranged like a staircase. The surface of each panel was white paper, and the step depth was 10 mm. The distance from the camera to the center of the object was 500 mm. The height of the panel was 60 mm. The average intensity of the illuminating light was 9.2 µW/cm².
Fig. 5. (a) Geometry of the object and camera. (b) Image I+ with ascending gain. (c) Image I- with descending gain. (d) Depth image and video signal from the signal processor.
Figure 5(b) shows the captured video signal image I+ when the gain of the image intensifier was in the ascending cycle, while Fig. 5(c) shows the captured video signal image I- when the gain of the image intensifier was in the descending cycle. Compare the shapes of the staircases for these two cases. The step height for the upward staircase in I+ is shorter than that for the downward staircase in I-. This may be because the divergence effect ρI0g/(4πd²)² counteracted the gain increase with d/v in the I+ image, while the same divergence effect assisted the gain decrease with d/v in the I- image.
Figure 5(d) shows the result of the depth measurement. The near object was expressed as a bright image and the far object was expressed as a dark image. The fluctuation in the step height of the staircase shape was approximately ±1.2 mm with a 40-mm-thick object located 500 mm from the camera.
4.3 Video clip from depth-mapping using the Gain-modulated Axi-Vision Camera
Figure 6 consists of videos taken when a mask of a long-nosed Tengu, a mountain goblin from Japanese mythology, was put on a turntable to demonstrate the real-time depth mapping operation. The mask was 160 mm wide, 200 mm high and 160 mm deep. The nose of the mask was 100 mm long. The output from the pulsed LD was expanded by a lens to about a 15° angle. The diameter of the circular area of illumination was approximately 200 mm at a distance of 850 mm. The area was illuminated by infrared pulsed light with an average light intensity of 1.1 µW/cm². The object was also illuminated by fluorescent room light of 550 lx, which has weak components in the near-infrared region of the spectrum. A depth image with 768×493 pixels was obtainable during each update interval of 1/15 s. The camera system can simultaneously capture the depth image and an ordinary video image of the object. The output video signals were in standard TV format.
Figures 6(a) and (b) show the video images during the ascending and descending gain of the camera, respectively. Figure 6(c) shows the color-coded depth image of the object that was calculated from the videos of (a) and (b). Red represents near objects, and blue, far objects. Even though the object is in motion and has pronounced variations in surface conditions such as color, roughness, glare, and convex and concave curvatures, no irregularities are recognizable in the depth video.
Finally, the similarities and dissimilarities between the Gain-modulated Axi-Vision Camera and the prototype Axi-Vision Camera are pointed out. While the prototype Axi-Vision Camera uses the ultra-fast shutter [17, 20], the Gain-modulated Axi-Vision Camera leaves the shutter open at all times and uses an ultra-narrow pulsed laser for illumination to capture an instantaneous image. Thus, it is important to use an optical filter which effectively rejects the visible range spectrum because the residual spectrum significantly increases the error of the system’s depth measurement. A shorter period Tg of modulation also minimizes the influence of the residual spectrum, but the period Tg of the gain modulation cannot be made arbitrarily small because the range of the measurable depth is shortened accordingly.
One of the most important factors determining the resolving power is the pulse width of the laser. At present, the fastest available shutter speed of a camera with an image intensifier is about 1 ns, whereas lasers with much shorter pulse widths (on the order of picoseconds) are readily available. That is the main reason why the Gain-modulated Axi-Vision Camera (which is based on short laser pulses) has a much higher depth resolving power than the prototype Axi-Vision Camera (which is based on the ultra-fast shutter).
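To give a sense of scale for this comparison, the sketch below converts a time interval into the depth interval that light covers on a round trip. It uses the 68 ps laser FWHM and the 1 ns shutter speed quoted in the text. The resulting figures are only a rough measure of temporal blur under this simple conversion, not the noise-limited depth resolution reported in Section 4.

```python
V_LIGHT = 2.998e8  # speed of light in m/s


def round_trip_depth_mm(duration_s):
    """Depth interval (mm) covered by light during duration_s on a round trip."""
    return 0.5 * V_LIGHT * duration_s * 1e3


laser_fwhm_s = 68e-12  # pulse width of the LD, from the text
shutter_s = 1e-9       # fastest image-intensifier shutter, from the text

laser_blur_mm = round_trip_depth_mm(laser_fwhm_s)
shutter_blur_mm = round_trip_depth_mm(shutter_s)

print(f"68 ps pulse  -> about {laser_blur_mm:.1f} mm")
print(f"1 ns shutter -> about {shutter_blur_mm:.0f} mm")
print(f"ratio        -> about {shutter_blur_mm / laser_blur_mm:.1f}x")
```

The same conversion applied to the 1 ns gain-modulation period gives a depth window of about 150 mm, which illustrates why shortening Tg also shortens the measurable range.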
Moreover, in the prototype Axi-Vision Camera, which uses the ultra-fast shutter and intensity-modulated light, illumination is necessary even while the shutter is not open. In the Gain-modulated Axi-Vision Camera system, on the other hand, the light is on only for ultra-short periods of time. Thus, the illumination power of the Gain-modulated Axi-Vision Camera is lower than that of the prototype Axi-Vision Camera, and the light is used more efficiently for depth measurement. This increased efficiency may be advantageous from the viewpoint of eye safety.
The Gain-modulated Axi-Vision Camera costs less to fabricate than the prototype. The most expensive part of the prototype Axi-Vision Camera is the ultra-fast shutter associated with the image intensifier. The Gain-modulated Axi-Vision Camera does away with this ultra-fast shutter by replacing it with a laser diode.
A novel depth-mapping camera, the Gain-modulated Axi-Vision Camera, was reported. This system combines a short pulsed infrared laser diode with a gain-modulated camera. The system details and performance of the camera were reported. The camera can capture a depth image with 768×493 pixels at a 2.4-mm depth resolution with a 15-Hz repetition rate. A short video clip demonstrated the real time operation of such a camera. The quality of the video is excellent, despite the fact that the object is in motion and has pronounced variations in surface shape and texture.
The authors thank Mary Jean Giliberto for her comments and articulate proofreading of the manuscript and Ayumi Iizuka for recording the narration.
References and links
1. H. Shimotahira, K. Iizuka, S.-C. Chu, C. Wah, F. Costen, and Y. Yoshikuni, “Three-dimensional laser microvision,” Appl. Opt. 40, 1784–1794 (2001).
2. H. Shimotahira, K. Iizuka, F. Taga, and S. Fujii, “3D laser microvision,” in Optical Methods in Biomedical and Environmental Science, H. Ohzu and S. Komatsu, eds. (Elsevier, New York, 1994), pp. 113–116.
3. T. Kanamaru, K. Yamada, T. Ichikawa, T. Naemura, K. Aizawa, and T. Saito, “Acquisition of 3D image representation in multimedia ambiance communication using 3D laser scanner and digital camera,” in Three-Dimensional Image Capture and Applications III, B. D. Corner and J. H. Nurre, eds., Proc. SPIE 3958, 80–89 (2000).
4. D. A. Green, F. Blais, J.-A. Beraldin, and L. Cournoyer, “MDSP: a modular DSP architecture for a real-time 3D laser range sensor,” in Three-Dimensional Image Capture and Applications V, B. D. Corner, R. P. Pargas, and J. H. Nurre, eds., Proc. SPIE 4661, 9–19 (2002).
5. V. H. Chan and M. Samaan, “Spherical/cylindrical laser scanner for geometric reverse engineering,” in Three-Dimensional Image Capture and Applications VI, B. D. Corner, P. Li, and R. P. Pargas, eds., Proc. SPIE 5302, 33–40 (2004).
6. T. Kanade, A. Yoshida, K. Oda, H. Kano, and M. Tanaka, “A stereo machine for video-rate dense depth-mapping and its new applications,” in Proceedings of the 15th Computer Vision and Pattern Recognition Conference (San Francisco, California, 1996), pp. 196–202.
7. Y. Oike, M. Ikeda, and K. Asada, “Design and implementation of real-time 3-D image sensor with 640×480 pixel resolution,” IEEE J. Solid-State Circuits 39, 622–628 (2004).
8. G. Frankowski, M. Chen, and T. Huth, “Real-time 3D shape measurement with digital stripe projection by Texas Instruments micro mirror devices DMD,” in Three-Dimensional Image Capture and Applications III, B. D. Corner and J. H. Nurre, eds., Proc. SPIE 3958, 90–105 (2000).
9. M. H. Demers, J. D. Hurley, R. C. Wulpern, and J. R. Grindon, “Three dimensional surface capture for body measurement using projected sinusoidal patterns,” in Three-Dimensional Image Capture and Applications, R. N. Ellson and J. H. Nurre, eds., Proc. SPIE 3023, 13–25 (1997).
10. T. Abe, K. Tokai, Y. Yamaguchi, O. Nishikawa, and T. Iyoda, “New range finder based on the re-encoding method and its application to 3D object modeling,” in Three-Dimensional Image Capture and Applications V, B. D. Corner, R. P. Pargas, and J. H. Nurre, eds., Proc. SPIE 4661, 20–29 (2002).
11. T. Azuma, K. Uomori, and A. Morimura, “Real-time active range finder using light intensity modulation,” in Three-Dimensional Image Capture and Applications II, B. D. Corner and J. H. Nurre, eds., Proc. SPIE 3640, 11–20 (1999).
12. Y.-B. Choi and S.-W. Kim, “Phase-shifting grating projection moiré topography,” Opt. Eng. 37, 1005–1010 (1998).
13. J.-T. Oh, S.-Y. Lee, and S.-W. Kim, “Scanning projection grating moiré topography,” in Three-Dimensional Image Capture and Applications III, B. D. Corner and J. H. Nurre, eds., Proc. SPIE 3958, 46–51 (2000).
15. R. Schwarte, Z. Xu, H.-G. Heinol, J. Olk, and B. Buxbaum, “New optical four-quadrant phase-detector integrated into a photogate array for small and precise 3D-cameras,” in Three-Dimensional Image Capture and Applications, R. N. Ellson and J. H. Nurre, eds., Proc. SPIE 3023, 119–128 (1997).
16. H. Höfler, V. Jetter, and E. Wagner, “3D-profiling by optical demodulation with an image intensifier,” in Three-Dimensional Image Capture and Applications II, B. D. Corner and J. H. Nurre, eds., Proc. SPIE 3640, 21–27 (1999).
17. M. Kawakita, K. Iizuka, T. Aida, H. Kikuchi, H. Fujikake, J. Yonai, and K. Takizawa, “Axi-Vision Camera (real-time distance-mapping camera),” Appl. Opt. 39, 3931–3939 (2000).
18. G. J. Iddan and G. Yahav, “3D imaging in the studio (and elsewhere…),” in Three-Dimensional Image Capture and Applications IV, B. D. Corner, J. H. Nurre, and R. P. Pargas, eds., Proc. SPIE 4298, 48–55 (2001).
19. B. Büttgen, T. Oggier, R. Kaufmann, P. Seitz, and N. Blanc, “Demonstration of a novel drift field pixel structure for the demodulation of modulated light waves with application in three-dimensional image capture,” in Three-Dimensional Image Capture and Applications VI, B. D. Corner, P. Li, and R. P. Pargas, eds., Proc. SPIE 5302, 9–20 (2004).
20. M. Kawakita, K. Iizuka, H. Nakamura, I. Mizuno, T. Kurita, T. Aida, Y. Yamanouchi, H. Mitsumine, T. Fukaya, H. Kikuchi, and F. Sato, “High-definition real-time depth-mapping TV camera: HDTV Axi-Vision Camera,” Opt. Express 12, 2781–2794 (2004), http://www.opticsexpress.org/abstract.cfm?URI=OPEX-12-12-2781.
21. M. Kawakita, K. Iizuka, T. Aida, T. Kurita, and H. Kikuchi, “Real-time three-dimensional video image composition by depth information,” IEICE Electronics Express 1, 237–242 (2004), http://www.jstage.jst.go.jp/article/elex/1/9/1_237/_article.