We developed a real-time capture and reconstruction system for three-dimensional (3D) live scenes. In previous research, we used integral photography (IP) to capture 3D images and then generated holograms from the IP images to implement a real-time reconstruction system. In this paper, we use a 4K (3,840 × 2,160) camera to capture IP images and 8K (7,680 × 4,320) liquid crystal display (LCD) panels for the reconstruction of holograms. We investigate two methods for enlarging the 4K images that were captured by integral photography to 8K images. One of the methods increases the number of pixels of each elemental image. The other increases the number of elemental images. In addition, we developed a personal computer (PC) cluster system with graphics processing units (GPUs) for the enlargement of IP images and the generation of holograms from the IP images using the fast Fourier transform (FFT). We used the Compute Unified Device Architecture (CUDA) as the development environment for the GPUs. The FFT is performed using the CUFFT (CUDA FFT) library. As a result, we developed an integrated system for performing all processing from the capture to the reconstruction of 3D images by using these components and successfully used this system to reconstruct a 3D live scene at 12 frames per second.
©2012 Optical Society of America
Holography is an ideal technology for displaying three-dimensional (3D) images. Information from 3D objects is recorded on a hologram by the effect of light interference and is reconstructed from the hologram by the effect of light diffraction. A computer-generated hologram (CGH) is generated by simulating the propagation of light from 3D objects. In electronic holography, 3D objects are reconstructed by displaying the computer-generated hologram on a spatial light modulator (SLM) such as a liquid crystal display (LCD) panel [2–4]. 3D movies can be presented by displaying computer-generated holograms in real time [5–8]. Recently, various systems for implementing this have been developed. One is a system for reconstructing 3D objects from computer graphics models in real time [9, 10]. Another is a system for reconstructing 3D objects in real time using a range camera such as the Kinect from Microsoft as a 3D image input device [11–13].
To realize ultra-realistic communications, our research team uses integral photography (IP) for recording 3D images and electronic holography for displaying 3D images. We developed the reconstruction component for electronic holography using 8K (7,680 × 4,320) LCD panels and achieved a diagonal size of 4 cm and a viewing angle of 15 degrees for the reconstructed image. We also developed the capture component for integral photography and implemented a 3D live-scene reconstruction system by generating holograms from those IP images in real time. Most recently, we developed a system using a 4K (3,840 × 2,160) camera for integral photography and 4K LCD panels for electronic holography.
For this paper, we developed a real-time reconstruction system with graphics processing units (GPUs) for 8K electronic holography. We use a 4K IP camera for capturing 3D images and the 3D images are reconstructed by generating 8K holograms from the 4K IP images. As a result, we developed an integrated system for performing all processing from the capture to the reconstruction of 3D images by using these components and successfully used this system to reconstruct a 3D live scene at 12 frames per second. We report the composition and performance of our system below.
2. Generation of holograms from IP images
2.1 Capture of 3D images by integral photography
Integral photography is a technique for capturing and displaying a 3D image on a photographic plate, which was invented by M. G. Lippmann in 1908. Figure 1 shows the method of capturing a 3D image by integral photography. A lens array is placed between a 3D object and a capturing medium such as a photographic plate. The lens array is made up of many small lenses, which are called elemental lenses. By placing the photographic plate at the focal points of the elemental lenses, object beams are captured in elemental images on the photographic plate near each elemental lens. The size and location of each elemental image are equal to those of each elemental lens. Since object beams propagated from various angles are recorded in the elemental image, a 3D image can be captured under natural light. Normally, to reproduce a 3D image from this captured IP image, we should rotate each elemental image by 180 degrees and observe the 3D object with the lens array located at its original position. This is because the image is inverted top to bottom and left to right, since the direction from which it is observed differs from the direction from which it was captured.
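The 180-degree rotation of each elemental image can be sketched as follows. This is an illustrative NumPy snippet of ours, not code from the system described here; the function name and the toy image sizes are assumptions.

```python
import numpy as np

def rotate_elemental_images(ip_image, elem_size):
    """Rotate every elemental image of an IP image by 180 degrees.

    ip_image  : 2-D array whose height and width are multiples of elem_size.
    elem_size : edge length in pixels of one (square) elemental image.
    """
    h, w = ip_image.shape
    out = np.empty_like(ip_image)
    for i in range(0, h, elem_size):
        for j in range(0, w, elem_size):
            block = ip_image[i:i + elem_size, j:j + elem_size]
            # A 180-degree rotation is a flip along both axes.
            out[i:i + elem_size, j:j + elem_size] = block[::-1, ::-1]
    return out

# Toy example: 2 x 2 elemental images of 2 x 2 pixels each.
ip = np.arange(16).reshape(4, 4)
rotated = rotate_elemental_images(ip, 2)
```

Applying the function twice returns the original IP image, which is a quick sanity check of the rearrangement.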
The lens system for capturing IP images with the 4K video camera is shown in Fig. 2. f is the focal length of the lens array and f' is the focal length of the field lens. The number of elemental lenses is 240H × 135V, the diameter of each elemental lens is 0.8 mm, and f is 8.6 mm. The diameter of the field lens is 300 mm and f' is 600 mm. We constructed a telecentric optical system for capturing parallel light beams from the IP image. The IP image is scaled down to fit the CMOS sensor of the 4K video camera.
2.2 Fast parallel computing method for generating holograms
In this section, we describe our method for generating holograms from IP images. Interference fringe patterns formed by the object light and the reference light are recorded on the hologram. Figure 3 shows a virtual optical system for generating a hologram from the IP image. In Fig. 3, the IP image is placed at a location separated from the lens array by a distance equal to the focal length. Similarly, the hologram is placed on the other side of the lens array. f is the focal length of the lens array constituted by elemental lenses and d is the distance from the lens array to the hologram. D is the diameter of an elemental image and DH is the diameter of an elemental hologram. One elemental hologram is generated by simulating the propagation of the object light from one elemental image when the reference light is parallel to the optical axis. It is evaluated using the following equations:

g_2(x_2, y_2) = \frac{e^{ikf}}{i\lambda f} \iint g_1(x_1, y_1) \exp\left\{ \frac{ik}{2f} \left[ (x_2 - x_1)^2 + (y_2 - y_1)^2 \right] \right\} dx_1 \, dy_1   (1)

g_2'(x_2, y_2) = g_2(x_2, y_2) \exp\left[ -\frac{ik}{2f} \left( x_2^2 + y_2^2 \right) \right]   (2)

g_3(x_3, y_3) = \frac{e^{ikd}}{i\lambda d} \iint g_2'(x_2, y_2) \exp\left\{ \frac{ik}{2d} \left[ (x_3 - x_2)^2 + (y_3 - y_2)^2 \right] \right\} dx_2 \, dy_2   (3)

Equations (1) and (3) are Fresnel diffraction integrals, where g_1 is the complex amplitude on the IP image, \lambda is the wavelength, and k = 2\pi/\lambda is the wave number. Equation (2) is the phase variation of the object light caused by transmitting the object light through the lens array. If we assume that f is equal to d, then by substituting Eq. (1) and Eq. (2) into Eq. (3), we obtain the following Fourier transform:

g_3(x_3, y_3) = \frac{e^{2ikf}}{i\lambda f} \iint g_1(x_1, y_1) \exp\left[ -\frac{i 2\pi}{\lambda f} \left( x_3 x_1 + y_3 y_1 \right) \right] dx_1 \, dy_1   (4)
In a limited integration range, we obtain the following discrete Fourier transform (DFT) corresponding to Eq. (4):

g_3(m, n) = \frac{e^{2ikf}}{i\lambda f} \sum_{p=0}^{M-1} \sum_{q=0}^{N-1} g_1(p, q) \exp\left[ -i 2\pi \left( \frac{mp}{M} + \frac{nq}{N} \right) \right]   (5)

where (p, q) and (m, n) are the pixel indices on the elemental image and the elemental hologram, and M and N are the numbers of sampling points [14]. Moreover, the real part of g_3(x_3, y_3) is the light intensity distribution of the elemental hologram when the reference light is assumed to be parallel light. The sampling intervals \Delta x_1, \Delta y_1 on the elemental image and \Delta x_3, \Delta y_3 on the elemental hologram then satisfy

\Delta x_1 \Delta x_3 = \frac{\lambda f}{M}, \quad \Delta y_1 \Delta y_3 = \frac{\lambda f}{N}   (6)

Equation (6) shows that the parameters of the virtual optical system in Fig. 3 are determined by M and N. Since we can determine the values of M and N arbitrarily, the fast Fourier transform (FFT) can be performed efficiently by substituting suitable values for M and N. Moreover, the calculation of each elemental hologram can be performed in parallel because elemental images correspond one-to-one with elemental holograms.
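As a concrete illustration of Eq. (5) and its block-parallel structure, the following NumPy sketch (our own illustrative stand-in for the CUDA/CUFFT implementation; all function names and toy sizes are assumptions) computes each elemental hologram as the real part of a zero-padded 2-D DFT of the corresponding elemental image and assembles the full hologram block by block:

```python
import numpy as np

def elemental_hologram(elem_image, M, N):
    """One elemental hologram: real part of a zero-padded 2-D DFT of the
    elemental image (plane-wave reference, f = d).  M and N set the DFT
    size and can be chosen FFT-friendly."""
    field = np.fft.fft2(elem_image, s=(M, N))   # discrete version of Eq. (4)
    return np.real(field)                       # fringe term for a plane-wave reference

def hologram_from_ip(ip_image, elem, M, N):
    """Assemble the hologram; elemental images map one-to-one to elemental
    holograms, so every block can be computed independently (in parallel)."""
    rows, cols = ip_image.shape[0] // elem, ip_image.shape[1] // elem
    holo = np.empty((rows * M, cols * N))
    for r in range(rows):
        for c in range(cols):
            block = ip_image[r*elem:(r+1)*elem, c*elem:(c+1)*elem]
            holo[r*M:(r+1)*M, c*N:(c+1)*N] = elemental_hologram(block, M, N)
    return holo

# Toy IP image: 2 x 2 elemental images of 4 x 4 pixels each,
# transformed into 2 x 2 elemental holograms of 16 x 16 pixels each.
ip = np.random.rand(8, 8)
holo = hologram_from_ip(ip, 4, 16, 16)
```

In the actual system the same structure appears at full scale: 16 × 16 pixel elemental images of the 4K IP image become larger elemental holograms tiling the 8K panel.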
The optical system for electronic holography with 8K LCD panels is shown in Fig. 4. This optical system consists of RGB lasers (R: 632.8 nm, G: 523.8 nm, B: 473 nm), 8K LCD panels to display holograms (pixel pitch: 4.8 μm), and a system of lenses. Reference  contains details of that optical system. Parallel light beams are generated by the collimator lenses in Fig. 4. The elemental holograms in Fig. 3 are in-line Fresnel holograms. For that reason, a conjugate image and direct light interfere with the reconstruction of a good 3D image. We avoid that issue by using a combination of a single-sideband method and half-zone-plate processing at LC shutter 2 [20,21]. LC shutters 1 and 2 are used only as spatial filters in this paper.
3. Interpolation and enlargement of IP images
An image captured by integral photography differs from a typical image in that each elemental image of the IP image carries different object beam information. Generally, with integral photography, the more elemental images there are, the higher the resolution of the 3D image that can be captured on the IP image. On the other hand, the larger the elemental images are, the deeper the 3D image that can be captured on the IP image [22–24]. Therefore, we implemented two methods for enlarging 4K IP images to 8K IP images. Figure 5 shows the details of the two methods.

One method increases the number of pixels of each elemental image: each elemental image is enlarged independently by a technique such as bi-cubic interpolation. In the example of Fig. 5, an IP image with 2 × 2 elemental images of 4 × 4 pixels becomes an IP image with 2 × 2 elemental images of 8 × 8 pixels.
The other method for enlarging IP images increases the number of elemental images by interpolating between pixels collected from the same location of each elemental image. We assume that we have the same IP image as before in Fig. 5. First, 16 images (each consisting of the 4 pixels A, B, C, and D in the upper right of Fig. 5) are generated by collecting object beams propagated from the same angle. The number of pixels in each of these images equals the number of elemental images in the original IP image, which is 2 × 2. Next, the number of pixels in these images is increased by a technique such as bi-cubic interpolation, so that each image generated by collecting object beams propagated from the same angle has 4 × 4 pixels. At this point, pixel a is generated between pixel A and pixel B in Fig. 5. Finally, each pixel in the 16 enlarged images is relocated to its optically correct position. As a result, the elemental image containing pixel a is generated between the elemental images containing pixels A and B. The enlarged IP image therefore has 16 × 16 pixels, each elemental image has 4 × 4 pixels, and the number of elemental images is 4 × 4 in Fig. 5. This method can be thought of as elemental image interpolation.
Additionally, the flow of increasing the number of elemental images for a live scene is shown in Fig. 6. First, 256 ( = 16 × 16) sub images are generated from an IP image by collecting object beams propagated from the same angle. Each sub image has 240 × 135 pixels, which equals the number of elemental images in the 4K IP image. Next, each sub image is enlarged to 480 × 270 pixels. Finally, each pixel in the enlarged sub images is relocated to its optically correct position. The number of elemental images becomes 129,600 ( = 480 × 270) because each sub image now has 480 × 270 pixels. As a result, an IP image with an increased number of elemental images is generated.
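The gather-enlarge-scatter flow above can be sketched in NumPy as follows. This is our own illustrative code with a toy-sized image, and we substitute nearest-neighbour repetition for the bi-cubic interpolation used in the actual system; the function name is hypothetical.

```python
import numpy as np

def enlarge_by_elemental_interpolation(ip, p, scale=2):
    """Increase the number of elemental images of an IP image.

    ip    : IP image with square elemental images of p x p pixels.
    p     : pixels per elemental image side.
    scale : enlargement factor for the sub images (nearest neighbour here;
            the paper's system uses bi-cubic interpolation).
    """
    E_h, E_w = ip.shape[0] // p, ip.shape[1] // p
    # Step 1: gather p*p sub images; sub[i, j] collects the (i, j) pixel of
    # every elemental image, i.e. rays sharing one propagation angle.
    sub = ip.reshape(E_h, p, E_w, p).transpose(1, 3, 0, 2)   # (p, p, E_h, E_w)
    # Step 2: enlarge each sub image.
    sub_big = sub.repeat(scale, axis=2).repeat(scale, axis=3)
    # Step 3: scatter pixels back to their optically correct positions,
    # giving (scale*E_h) x (scale*E_w) elemental images of p x p pixels.
    return sub_big.transpose(2, 0, 3, 1).reshape(scale * E_h * p, scale * E_w * p)

ip = np.arange(64.0).reshape(8, 8)   # 2 x 2 elemental images of 4 x 4 pixels
big = enlarge_by_elemental_interpolation(ip, 4)
```

The toy 8 × 8 input becomes a 16 × 16 IP image with 4 × 4 elemental images of 4 × 4 pixels each, mirroring the 240 × 135 → 480 × 270 growth in the live system.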
4. Real-time capture and reconstruction system with multiple GPUs
Figure 7 shows an overview of a real-time capture and reconstruction system with multiple GPUs. Figure 8 shows photographs of the actual system, and Table 1 shows its specifications. Our system consists of a 4K IP camera component, a special-purpose component for generating 8K holograms from 4K IP images, and a 3D display component for electronic holography. References  and  contain details of the 4K IP camera component and 3D display component, respectively. Details of the special-purpose component with multiple GPUs are described below.
The 4K IP camera outputs 4-channel HD (1,920 × 1,080) video signals. The special-purpose component for generating holograms consists of 4 personal computers (PCs), each with a video capture card and a GPU interface card. Each of those PCs captures HD IP images and transmits them from the local memory on the PC to the global memory on an external GPU. Those IP images are then rearranged into a layout that can be processed efficiently on the external GPU. The external GPU enlarges the HD IP images to 4K IP images by either method described in Section 3 and generates 4K holograms from the 4K IP images using the transformation described by Eq. (5). Finally, the outputs of the 4 external GPUs are synchronized to form an 8K hologram, which is displayed.
The red arrows in Fig. 7 show the data flows in our system. Our system has only unidirectional data flows because each elemental hologram is generated from its elemental image independently, so the parallel processing is performed without extra communication. This is one of the reasons we were able to achieve a real-time capture and reconstruction system.
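The communication-free parallelism can be illustrated with a toy NumPy computation (our own sketch, not the system's code): splitting the IP image into quadrants, generating the hologram of each quadrant independently, and tiling the results reproduces the monolithically computed hologram exactly, because each elemental hologram depends only on its own elemental image.

```python
import numpy as np

def block_holo(ip, p, M):
    """Hologram as independent per-elemental-image FFTs (plane-wave reference)."""
    rows, cols = ip.shape[0] // p, ip.shape[1] // p
    out = np.empty((rows * M, cols * M))
    for r in range(rows):
        for c in range(cols):
            out[r*M:(r+1)*M, c*M:(c+1)*M] = np.real(
                np.fft.fft2(ip[r*p:(r+1)*p, c*p:(c+1)*p], s=(M, M)))
    return out

rng = np.random.default_rng(0)
ip = rng.random((8, 8))   # toy stand-in for a 4K IP image: 2 x 2 elemental images

# One worker per quadrant, with no communication between workers ...
top = np.hstack([block_holo(ip[:4, :4], 4, 8), block_holo(ip[:4, 4:], 4, 8)])
bot = np.hstack([block_holo(ip[4:, :4], 4, 8), block_holo(ip[4:, 4:], 4, 8)])
tiled = np.vstack([top, bot])

# ... yields exactly the hologram computed monolithically.
whole = block_holo(ip, 4, 8)
```

In the real system the "workers" are the 4 PC/GPU pairs, each handling one HD quarter of the 4K IP image.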
5. Experimental results

5.1 Performance of the system

This section describes the performance of our system. The lens array consists of 240 × 135 elemental lenses. The diameter of one elemental lens is 0.8 mm and the focal length is 8.6 mm. Therefore, we can capture 3D objects with a horizontal size of 192 mm and a vertical size of 108 mm. The number of pixels in each elemental image is 16 × 16 because the capture device has 4K resolution (3,840 / 240 = 16). Two types of 8K IP images are generated by the methods described in Section 3.
- 8K IP (1): Method of increasing the number of elemental images. The number of elemental images is 480 × 270, the size of each elemental image is 16 × 16 pixels, the focal length of an elemental lens is 0.693 mm for the wavelength of green, and the diameter of each elemental lens is 0.0768 mm.
- 8K IP (2): Method of increasing the size of each elemental image. The number of elemental images is 240 × 135, the size of each elemental image is 32 × 32 pixels, the focal length of an elemental lens is 1.386 mm for the wavelength of green, and the diameter of each elemental lens is 0.1536 mm.
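The elemental-lens diameters quoted above follow directly from the pixel counts and the 4.8 μm pixel pitch of the 8K LCD panels, and the focal length scales with the elemental image size. A quick numerical check (illustrative Python; the variable names are ours):

```python
# Consistency check of the two virtual lens-array configurations
# against the 4.8 um pixel pitch of the 8K LCD panels.
pitch_mm = 4.8e-3   # 4.8 um expressed in mm

# 8K IP (1): 16 x 16 pixel elemental images -> 0.0768 mm lens diameter.
d1 = 16 * pitch_mm
# 8K IP (2): 32 x 32 pixel elemental images -> 0.1536 mm lens diameter.
d2 = 32 * pitch_mm

# Focal lengths (mm) quoted for green: doubling the elemental image
# size doubles the focal length.
f1, f2 = 0.693, 1.386
ratio = f2 / f1
```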
Table 2 shows the performance of our system for the two situations shown in (1) and (2) above. "Interpolation" in Table 2 is the frame rate when capturing, enlarging, and displaying IP images. In this case, the FFT for generating holograms from IP images is not performed by the GPUs, so the 8K IP image itself is displayed on the LCD panels, which is useful for debugging. "Interpolation + FFT" in Table 2 is the frame rate when performing all of the processing, including the FFT for generating holograms from IP images.
There is only a small difference between situations (1) and (2) when the GPUs execute only "Interpolation". However, there is a larger difference between situations (1) and (2) when the GPUs execute "Interpolation + FFT": although CUFFT is efficient when performing one large FFT in parallel, it is less efficient when performing many small FFTs in parallel.
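The batching issue can be illustrated in NumPy (our own sketch; CUFFT itself addresses it with batched plans such as cufftPlanMany): many small 2-D FFTs can be expressed either as a loop of individual calls or as one batched call over a leading axis. Both give identical results, but only the batched form amortizes the per-transform overhead.

```python
import numpy as np

rng = np.random.default_rng(1)
elems = rng.random((100, 32, 32))   # 100 elemental images of 32 x 32 pixels

# Looping over many small FFTs: the per-call overhead dominates.
looped = np.stack([np.fft.fft2(e) for e in elems])

# One batched call over the leading axis: the pattern CUFFT's batched
# plans are designed for.
batched = np.fft.fft2(elems, axes=(-2, -1))
```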
5.2 Reconstructed images and movies
Figure 9 shows photographs of reconstructed images created using method 8K IP (1). The 3D objects were placed within approximately 30 cm of the lens array when they were captured by the 4K IP camera. The scale of the reconstructed objects is about one sixth that of the real objects due to a convex lens located between the lens array and the 4K camera in Fig. 8(a). The camera is focused approximately 4 mm from the hologram in Fig. 9(a) and approximately 50 mm from the hologram in Fig. 9(b). Objects located away from the camera focus are blurred. From this, it is apparent that the 3D objects are reconstructed correctly. The same holds for method 8K IP (2).
Figure 10 shows movies of 3D objects reconstructed by our system using the two methods described in Section 3. In our system, the 4K IP image is divided into 4 HD IP images as shown in Fig. 11. There were incomplete elemental images along the boundaries of the 4 IP images because of skew. The data gaps in the captured images are filled by referring to nearby pixels in a manner that does not add much computational load. As a result, we succeeded in reconstructing a 3D image with little visible effect at the boundaries of the 4K IP image.
Figures 10(a) and 10(b) are movies of reconstructed images created using methods 8K IP (1) and 8K IP (2), respectively. These movies were captured while changing the camera focus from the cube to the background '3D' letters. There is little difference between the reconstructed images created with the two methods. Therefore, method 8K IP (2) is more suitable for our system because its reconstruction frame rate is higher than that of method 8K IP (1).
Figure 10(c) is a combined movie. The main video shows reconstructed images captured by a digital camera, and the sub video in the bottom-left corner shows the real objects captured by another digital camera. We succeeded in capturing and reconstructing 3D objects in real time; the transmission delay of our system is about one second.
6. Conclusions and future work
We developed a real-time capture and reconstruction system that consists of a 4K IP camera component, a special-purpose component for generating 8K holograms from 4K IP images, and a 3D display component for electronic holography. We used two methods for enlarging the 4K images that were captured by integral photography to 8K images. Finally, we successfully used this system to reconstruct a 3D live scene at 12 frames per second.
One of our future topics of research will be to increase the performance of the special-purpose component for generating 8K holograms from 4K IP images. The reconstruction component for electronic holography with 8K (7,680 × 4,320) LCD panels can reconstruct 3D images with a maximum viewing angle of 15 degrees at 20 fps. Since this is a time-multiplexed system for expanding the viewing-zone angle of reconstructed images, the calculation cost increases when the viewing-zone angle is expanded in our system. Therefore, it is necessary to increase the performance of our system.
Furthermore, in this paper, we used two methods for enlarging the 4K images that were captured by integral photography to 8K images. However, there was little difference between the reconstructed images created with methods 8K IP (1) and 8K IP (2) in Fig. 10. Since our goal in this paper is the development of a real-time capture and reconstruction system, we do not discuss the difference between the reconstructed images obtained by these two methods. However, the difference in reconstructed image quality obtained with these two methods becomes more significant when we generate IP images that are enlarged 3 or more times from the 4K IP images. We will carry out experiments regarding this in the future.
The authors sincerely thank Mr. Hisayuki Sasaki of NICT, Tokyo, Japan, for his insightful comments and suggestions.
References and links
1. G. Tricoles, “Computer generated holograms: an historical review,” Appl. Opt. 26(20), 4351–4357 (1987), http://www.opticsinfobase.org/ao/abstract.cfm?URI=ao-26-20-4351. [CrossRef] [PubMed]
2. P. St-Hilaire, S. A. Benton, M. Lucente, M. L. Jepsen, J. Kollin, H. Yoshikawa, and J. Underkoffler, “Electronic display system for computational holography,” Proc. SPIE 1212, 174–182 (1990). [CrossRef]
3. N. Hashimoto, S. Morokawa, and K. Kitamura, “Real-time holography using the high-resolution LCTV-SLM,” Proc. SPIE 1461, 291–302 (1991). [CrossRef]
4. K. Sato, K. Higuchi, and H. Katsuma, “Holographic television by liquid crystal devices,” Proc. SPIE 1667, 19–31 (1992). [CrossRef]
5. M. Lucente and T. A. Galyean, “Rendering interactive holographic images,” Proc. ACM SIGGRAPH 95, 387–394 (1995).
6. M. Lucente, “Holographic bandwidth compression using spatial subsampling,” Opt. Eng. 35(6), 1529–1537 (1996). [CrossRef]
7. M. Lucente, “Computational holographic bandwidth compression,” IBM Syst. J. 35(3.4), 349–365 (1996). [CrossRef]
8. T. Okada, S. Iwata, O. Nishikawa, K. Matsumoto, H. Yoshikawa, K. Sato, and T. Honda, “The fast computation of holograms for the interactive holographic 3D display system,” Proc. SPIE 2577, 33–40 (1995). [CrossRef]
9. A. Shiraki, N. Takada, M. Niwa, Y. Ichihashi, T. Shimobaba, N. Masuda, and T. Ito, “Simplified electroholographic color reconstruction system using graphics processing unit and liquid crystal display projector,” Opt. Express 17(18), 16038–16045 (2009), http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-17-18-16038. [CrossRef] [PubMed]
10. Y. Ichihashi, N. Masuda, M. Tsuge, H. Nakayama, A. Shiraki, T. Shimobaba, and T. Ito, “One-unit system to reconstruct a 3-D movie at a video-rate via electroholography,” Opt. Express 17(22), 19691–19697 (2009), http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-17-22-19691. [CrossRef] [PubMed]
11. J. Barabas, S. Jolly, D. E. Smalley, and V. M. Bove Jr., “Diffraction specific coherent panoramagrams of real scenes,” Proc. SPIE 7957, 795702, 795702-7 (2011). [CrossRef]
12. J. Barabas, S. Jolly, D. E. Smalley, and V. Michael Bove Jr., “Depth perception and user interface in digital holographic television,” Proc. SPIE 8281, 828109, 828109-6 (2012). [CrossRef]
13. Y. Takaki and J. Nakamura, “Development of a Holographic Display Module Using a 4k2k-SLM Based on the Resolution Redistribution Technique,” in Digital Holography and Three-Dimensional Imaging, OSA Technical Digest (Optical Society of America, 2012), paper DM2C.5 http://www.opticsinfobase.org/abstract.cfm?URI=DH-2012-DM2C.5.
14. T. Senoh, T. Mishina, K. Yamamoto, R. Oi, and T. Kurita, “Viewing-zone-angle-expanded color electronic holography system using ultra-high-definition liquid crystal displays with undesirable light elimination,” J. Display Technol. 7(7), 382–390 (2011), http://www.opticsinfobase.org/jdt/abstract.cfm?URI=jdt-7-7-382. [CrossRef]
15. T. Mishina, R. Oi, J. Arai, F. Okano, and M. Okui, “Three-dimensional image reconstruction of real objects with electronic holography using 4K2K liquid crystal panels,” in Proceedings of the 14th International Display Workshops (IDW’07), 3D3–4L, 2253–2254 (2007).
16. K. Yamamoto, T. Mishina, R. Oi, T. Senoh, and T. Kurita, “Real-time color holography system for live scene using 4K2K video system,” Proc. SPIE 7619, 761906, 761906-10 (2010). [CrossRef]
17. M. G. Lippmann, “Épreuves réversibles donnant la sensation du relief,” J. Phys. 7, 821–825 (1908).
18. J. Arai, F. Okano, H. Hoshino, and I. Yuyama, “Gradient-index lens-array method based on real-time integral photography for three-dimensional images,” Appl. Opt. 37(11), 2034–2045 (1998), http://www.opticsinfobase.org/ao/abstract.cfm?URI=ao-37-11-2034. [CrossRef] [PubMed]
19. R. V. Pole, “3D Imagery and Holograms of Objects Illuminated in White Light,” Appl. Phys. Lett. 10(1), 20–22 (1967). [CrossRef]
20. H. Hoshino, F. Okano, H. Isono, and I. Yuyama, “Analysis of resolution limitation of integral photography,” J. Opt. Soc. Am. A 15(8), 2059–2065 (1998), http://www.opticsinfobase.org/josaa/abstract.cfm?URI=josaa-15-8-2059. [CrossRef]
21. T. Naemura, T. Yoshida, and H. Harashima, “3-D computer graphics based on integral photography,” Opt. Express 8(4), 255–262 (2001), http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-8-4-255. [CrossRef] [PubMed]
22. T. Georgiev, K. C. Zheng, B. Curless, D. Salesin, S. Nayar, and C. Intwala, “Spatio-Angular Resolution Tradeoff in Integral Photography,” Proceedings of Eurographics Symposium on Rendering, 263–272 (2006).
23. O. Bryngdahl and A. Lohmann, “Single-Sideband Holography,” J. Opt. Soc. Am. 58(5), 620–624 (1968), http://www.opticsinfobase.org/josa/abstract.cfm?URI=josa-58-5-620. [CrossRef]
24. T. Mishina, F. Okano, and I. Yuyama, “Time-Alternating Method Based on Single-Sideband Holography with Half-Zone-Plate Processing for the Enlargement of Viewing Zones,” Appl. Opt. 38(17), 3703–3713 (1999), http://www.opticsinfobase.org/ao/abstract.cfm?URI=ao-38-17-3703. [CrossRef] [PubMed]