## Abstract

We have applied the graphics processing unit (GPU) to computer generated holograms (CGH) to overcome the high computational cost of CGH and have compared the speed of a GPU implementation to a standard CPU implementation. The calculation speed of a GPU (GeForce 6600, nVIDIA) was found to be about 47 times faster than that of a personal computer with a Pentium 4 processor. Our system can realize real-time reconstruction of a 64-point 3-D object at video rate using a liquid-crystal display of resolution 800×600.

© 2006 Optical Society of America

## 1. Introduction

Electroholography systems using computer-generated holograms (CGH) are considered to have the potential to realize three-dimensional (3-D) television, as holography is the only technology that can directly record and reconstruct a 3-D image [1, 2]. However, CGH requires high-performance computational power for real-time reconstruction. The calculation cost of CGH is proportional to *M*×*N*, where *M* is the number of points of the hologram (the display resolution) and *N* is the number of points of the 3-D object.
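To give a feel for the *M*×*N* scaling, the following Python sketch counts the (hologram point, object point) pairs for the 800×600 display and 70-point object used later in this paper. The function names and the 30 frames-per-second figure are illustrative assumptions, not part of the original work.

```python
def cgh_evaluations(width: int, height: int, n_points: int) -> int:
    """Number of (hologram point, object point) pairs: M x N."""
    m = width * height          # M: number of hologram points
    return m * n_points         # N: number of object points


def evaluations_per_second(width: int, height: int, n_points: int, fps: int) -> int:
    """Pairs that must be evaluated each second to sustain `fps` frames per second."""
    return cgh_evaluations(width, height, n_points) * fps


if __name__ == "__main__":
    print(cgh_evaluations(800, 600, 70))             # pairs per hologram
    print(evaluations_per_second(800, 600, 70, 30))  # pairs per second at an assumed 30 fps
```

Each pair costs at least one cosine evaluation, so doubling either the display resolution or the number of object points doubles the work.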

Researchers have developed fast algorithms that can calculate CGH more than 10 times faster than a direct calculation algorithm [3, 4, 5, 6]. However, at present even these fast algorithms cannot reconstruct electroholography at video rate. In another approach, dedicated hardware for electroholography has been studied. The research group of the MIT Media Lab has developed a special-purpose computational board to implement a holographic video system. It uses an array of regularly spaced holographic elements as the unit of a fringe pattern of CGH [7, 8, 9, 10]. They recorded a calculation speed 50 times faster than a workstation of the time. Since 1992, our research group has also been studying a special-purpose hardware system for holography, named HORN (HOlographic ReconstructioN) [11].

Current graphics hardware has programmable graphics pipelines. The graphics processing unit (GPU) is a full-function streaming processor with high floating-point performance. The programmable blocks of these pipelines are called “shaders”. We have written a program for the GPU, called a “shader program”, using the High Level Shader Language (HLSL) and a graphics API (DirectX or OpenGL). GPUs have already been used for general numerical simulation [12, 13, 14, 15]. In the study of CGH, the research group of the MIT Media Lab used a GPU as the computational subsystem of the display to generate a video signal [16]. In this paper, we go further and use the shaders and a shader program for the CGH calculation itself.

This paper is structured as follows. In section 2, we present the equation for CGH. In section 3, we demonstrate how to apply GPU to CGH. In section 4, we detail the performance of GPU. In the last section, we discuss our results and further studies.

## 2. Computer generated hologram

In this study, we adopt a simple in-line hologram algorithm for CGH so that the speed-up ratio can be estimated straightforwardly. In this method, a plane-wave reference light is incident perpendicularly on the hologram, and the intensity at each hologram point is calculated by simple arithmetic operations, as follows [3].

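For a 3-D object comprising *N* points, the intensity of the hologram at $(x_\alpha, y_\alpha)$, $I(x_\alpha, y_\alpha)$, is given by

$$I(x_\alpha, y_\alpha) = \sum_{j=1}^{N} A_j \cos\left(\frac{2\pi}{\lambda}\, r_{\alpha j}\right), \tag{1}$$

$$r_{\alpha j} = \sqrt{(x_\alpha - x_j)^2 + (y_\alpha - y_j)^2 + z_j^2}, \tag{2}$$

where the indices $\alpha$ and $j$ indicate the hologram and the object, respectively. $A_j$ is the intensity of the object point and $\lambda$ is the wavelength of the reference light. Furthermore, since $x_{\alpha j}, y_{\alpha j} \ll z_j$ in the system, where $x_{\alpha j} = x_\alpha - x_j$ and $y_{\alpha j} = y_\alpha - y_j$, eq. (2) can be approximated by the following expression using the Fresnel approximation,

$$r_{\alpha j} \approx z_j + \frac{x_{\alpha j}^2 + y_{\alpha j}^2}{2 z_j}. \tag{3}$$

The first term on the right-hand side can be omitted, giving:

$$r_{\alpha j} \approx \frac{x_{\alpha j}^2 + y_{\alpha j}^2}{2 z_j}. \tag{4}$$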
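As a quick numerical check of the Fresnel approximation, the sketch below compares the exact point-to-point distance $\sqrt{x_{\alpha j}^2 + y_{\alpha j}^2 + z_j^2}$ with the approximated form $z_j + (x_{\alpha j}^2 + y_{\alpha j}^2)/(2 z_j)$. The function names, the 5 mm lateral offset, the 1 m depth and the 633 nm wavelength are assumptions chosen only for illustration.

```python
import math


def exact_distance(dx: float, dy: float, z: float) -> float:
    """Exact distance between a hologram point and an object point."""
    return math.sqrt(dx * dx + dy * dy + z * z)


def fresnel_distance(dx: float, dy: float, z: float) -> float:
    """Fresnel approximation of the same distance, valid when dx, dy << z."""
    return z + (dx * dx + dy * dy) / (2.0 * z)


if __name__ == "__main__":
    dx, dy, z = 5e-3, 0.0, 1.0       # metres; illustrative values only
    wavelength = 633e-9              # metres; assumed red laser line
    err = fresnel_distance(dx, dy, z) - exact_distance(dx, dy, z)
    print("distance error [m]:", err)
    print("error in wavelengths:", err / wavelength)
```

For these values the error is a very small fraction of a wavelength, which is why the approximation is acceptable when the lateral offsets are much smaller than the depth.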

## 3. Implementation

Computer graphics requires many floating-point calculations, more than a CPU can perform in real time. The GPU, a processor dedicated to graphics, was developed for this reason. A GPU has a vertex shader, a rasterizer and a pixel shader (Fig. 1). The vertex shader transforms geometry, the rasterizer rasterizes geometry and the pixel shader draws pixels. The vertex shader and pixel shader are programmable. Each shader includes 4-dimensional vector processors (32-bit floating-point arithmetic units). For example, nVIDIA’s GeForce 6600 has three vertex shaders and eight pixel shaders.

We use Microsoft’s High Level Shader Language (HLSL) for the shader program and DirectX 9.0c as the graphics API. In the program we have developed, the positions of the points of the 3-D object and other parameters are stored in the registers of the GPU. The intensity of the hologram is stored in the graphics buffer and is output to the screen of a personal computer. In this way we can calculate CGH in a pixel shader.

Below we show a part of a program we have developed. This program calculates the intensity of a hologram by applying eq. (1) and eq. (4). In this program, “float2” is a 2-dimensional float vector type and “float4” is a 4-dimensional float vector type. The function “PS” returns the color of the pixel. “VS_OUTPUT” is the output of the vertex shader and has “Tex.x” and “Tex.y”, which are the coordinates of the pixel (each ranging from 0 to 1). “WIDTH” and “HEIGHT” are the width and height of the hologram. “k” is a constant (= 1/(2λ)). “black” and “white” are constants for the color of a pixel. The variable “a” is the position on the hologram and “model[]” holds the positions of the points of the 3-D object. “DATA” is the number of points in the 3-D object. “model[]” and “DATA” are stored in the registers of the GPU. The function “dot(a,b)” calculates the scalar product of “a” and “b” and is provided by the HLSL library.

The program we developed is loaded into the GPU, and the pixel shader performs the calculation according to this program.
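As a rough CPU-side illustration of the per-pixel logic described above (it is not the authors' HLSL listing), the following Python sketch mirrors the roles of “PS”, “Tex”, “WIDTH”, “HEIGHT”, “k”, “model[]”, “black” and “white”. Several details are assumptions: all $A_j$ are taken as 1, positions are expressed in units of the pixel pitch, and the pixel is set to white when the summed cosine is positive and to black otherwise.

```python
import math

WIDTH, HEIGHT = 800, 600            # hologram size in pixels (from the text)
BLACK = (0.0, 0.0, 0.0, 1.0)        # assumed RGBA value for "black"
WHITE = (1.0, 1.0, 1.0, 1.0)        # assumed RGBA value for "white"


def ps(tex_x, tex_y, model, k):
    """Return the color of one hologram pixel.

    tex_x, tex_y: pixel coordinates in the range 0..1 (like Tex.x, Tex.y).
    model: sequence of (x, y, z) object points, in pixel-pitch units (assumed).
    k: the constant 1/(2*lambda), with lambda in the same units.
    """
    ax = tex_x * WIDTH               # position "a" on the hologram
    ay = tex_y * HEIGHT
    total = 0.0
    for xj, yj, zj in model:         # loop over the object points
        dx = ax - xj
        dy = ay - yj
        # Phase from the Fresnel-approximated path difference, A_j = 1.
        total += math.cos(2.0 * math.pi * k * (dx * dx + dy * dy) / zj)
    # Binarization is an assumption made for this sketch.
    return WHITE if total > 0.0 else BLACK


if __name__ == "__main__":
    points = [(400.0, 300.0, 1000.0)]   # one illustrative object point
    print(ps(0.5, 0.5, points, k=0.5))  # pixel directly in front of the point
```

On a GPU the body of `ps` would run once per pixel in parallel across the pixel shaders; on a CPU it has to be called in a double loop over all 800 × 600 pixels.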

## 4. Performance

We use an nVIDIA GeForce 6600 as the GPU. Table 1 shows the specifications of the GeForce 6600. It has three vertex shaders and eight pixel shaders. We compared the calculation time of the GPU with that of a CPU. The specifications of the personal computer used are as follows: Intel Pentium 4 3.2-GHz CPU, 2.0 GB of memory, Linux operating system (kernel 2.4.26) and Intel C++ compiler Ver. 8.1. Table 2 shows the calculation time for generating a hologram with an 800 × 600 grid from a 3-D object (a star) consisting of 70 points. The calculation time of the GPU is 67 msec, while the calculation time of the CPU is 2980 msec. Hence, the calculation speed of the GPU is about 47 times faster than that of the CPU.

We also compared hologram images generated by the GPU and the CPU (Fig. 2). The hologram image produced by the GPU agrees well with that of the CPU.

Secondly, we compared reconstructed images from a hologram generated by the GPU and the CPU (Fig. 3). The reconstructed image from the hologram produced by the GPU agrees well with that of the CPU.

The system used for the reconstruction of the hologram is described in detail in Ref. [11].

Finally, we show an example of real-time electroholography by GPU in Fig. 4. The object (a 64-point circle) is moved by operation from a PC. The GPU calculates the CGH in 33 msec, which corresponds to the video frame rate.

## 5. Conclusion and discussion

We have applied GPU to CGH. The hologram images generated by a GPU agree well with those generated using a CPU. The calculation speed of the GPU (for a 70-point 3-D object and a hologram image size of 800 × 600) is about 47 times faster than that of a CPU. In our method, however, the number of points in the 3-D object is limited to 100 on a GeForce 6600 because of the limited number of registers in the GPU, although this limitation is expected to be addressed. We can create a hologram from several 3-D objects by saving the results of calculations in VRAM temporarily, but reading data from VRAM takes time, so calculating the CGH of multiple objects on a GPU is problematic. A special-purpose computer system would be useful for generating CGH from multiple objects [11].
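The idea of working around the 100-point register limit by accumulating partial results can be sketched as follows. This is a CPU-side illustration under stated assumptions, not the authors' implementation: `buffer_value` stands in for a per-pixel value kept in VRAM between passes, all $A_j$ are taken as 1, and the helper names are hypothetical.

```python
import math

REGISTER_LIMIT = 100   # maximum object points per pass (from the text)


def split_into_passes(points, limit=REGISTER_LIMIT):
    """Split the object points into batches of at most `limit` points."""
    return [points[i:i + limit] for i in range(0, len(points), limit)]


def partial_sum(ax, ay, batch, k):
    """Summed cosine term at hologram position (ax, ay) for one batch."""
    return sum(
        math.cos(2.0 * math.pi * k * ((ax - x) ** 2 + (ay - y) ** 2) / z)
        for x, y, z in batch
    )


def multipass_sum(ax, ay, points, k, limit=REGISTER_LIMIT):
    """Accumulate the batches one pass at a time."""
    buffer_value = 0.0                 # stand-in for the value stored in VRAM
    for batch in split_into_passes(points, limit):
        buffer_value += partial_sum(ax, ay, batch, k)   # read, add, write back
    return buffer_value


if __name__ == "__main__":
    pts = [(float(i), float(2 * i), 1000.0 + i) for i in range(250)]
    print(len(split_into_passes(pts)))            # number of passes needed
    print(multipass_sum(400.0, 300.0, pts, 1e-3))
```

Each additional pass implies one more read and write of the intermediate buffer, which is the VRAM access cost the text identifies as the bottleneck.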

The Scalable Link Interface (SLI) has been developed by nVIDIA. Using SLI, several graphics cards can be used in one PC, and a computer-graphics calculation can be divided among the cards so that the load on each card is the same. SLI systems are expected to improve the efficiency of CGH calculations.
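Because every hologram pixel costs the same amount of work for a given object, an equal load per card corresponds to an equal share of the pixels. The sketch below divides the hologram rows among a number of cards; it only illustrates the arithmetic of an even split and makes no claim about how the SLI driver actually distributes work. The function name is hypothetical.

```python
def split_rows(height: int, n_cards: int):
    """Divide `height` hologram rows into `n_cards` contiguous, near-equal ranges.

    Returns a list of (start_row, end_row) pairs, end exclusive.
    """
    base, extra = divmod(height, n_cards)
    ranges = []
    start = 0
    for card in range(n_cards):
        rows = base + (1 if card < extra else 0)   # spread any remainder
        ranges.append((start, start + rows))
        start += rows
    return ranges


if __name__ == "__main__":
    print(split_rows(600, 2))   # two cards sharing a 600-row hologram
```

With two cards and a 600-row hologram, each card would handle 300 rows, so in the ideal case the calculation time per frame would be halved.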

In the near future, we plan to perform CGH using several GPUs simultaneously. To confirm the efficiency of such a parallel system, we are developing a parallel system consisting of two graphics cards using SLI.

## Acknowledgments

We would like to thank Dr. T. Shimobaba for his useful advice. This research was partly supported by a Grant-in-Aid for Scientific Research (C) (17560031).

## References and links

**1. **P. St. Hilaire, S. A. Benton, M. Lucente, M. L. Jepsen, J. Kollin, H. Yoshikawa, and J. Underkoffler, “Electronic display system for computational holography,” Proc. SPIE **1212–20**, 174–182 (1990).

**2. **G. Tricoles, “Computer generated holograms: an historical review,” Appl. Opt. **26**, 4351–4360 (1987).

**3. **M. Lucente, “Interactive Computation of Holograms Using a Look-Up Table,” J. Electron. Imaging **2**, 28–34 (1993).

**4. **H. Yoshikawa, S. Iwase, and T. Oneda, “Fast computation of Fresnel holograms employing difference,” Proc. SPIE **3956**, 48–55 (2000).

**5. **K. Matsushima and M. Takai, “Recurrence formulas for fast creation of synthetic three-dimensional holograms,” Appl. Opt. **39**, 6587–6594 (2000).

**6. **T. Shimobaba and T. Ito, “An efficient computational method suitable for hardware of computer-generated hologram with phase computation by addition,” Comp. Phys. Commun. **138**, 44–52 (2001).

**7. **J. A. Watlington, M. Lucente, C. J. Sparrell, V. M. Bove Jr., and I. Tamitani, “A hardware architecture for rapid generation of electro-holographic fringe patterns,” Proc. SPIE **2406–23**, 172–183 (1995).

**8. **M. Lucente and T. A. Galyean, “Rendering interactive holographic images,” Proc. ACM SIGGRAPH **95**, 387–394 (1995).

**9. **M. Lucente, “Diffraction-Specific Fringe Computation for Electro-Holography,” Ph.D. Thesis, Dept. of Electrical Engineering and Computer Science, Massachusetts Institute of Technology (1994).

**10. **M. Lucente, “Computational holographic bandwidth compression,” IBM Systems Journal **35**, 349–365 (1996).

**11. **T. Ito, N. Masuda, K. Yoshimura, A. Shiraki, T. Shimobaba, and T. Sugie, “A special-purpose computer for electroholography HORN-5 to realize a real-time reconstruction,” Opt. Express **13**, 1923–1932 (2005), http://www.opticsexpress.org/abstract.cfm?URI=OPEX-13-6-1923

**12. **J. Bolz, I. Farmer, E. Grinspun, and P. Schröder, “Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid,” SIGGRAPH 03 Proceedings (2003).

**13. **C. Thompson, S. Hahn, and M. Oskin, “Using Modern Graphics Architectures for General-Purpose Computing: A Framework and Analysis,” Proceedings of 35th International Symposium on Microarchitecture (MICRO-35), 306–320 (2002).

**14. **J. Krüger and R. Westermann, “Linear Algebra Operators for GPU Implementation of Numerical Algorithms,” SIGGRAPH 03 Proceedings (2003).

**15. **nVIDIA Corporation, “GPU Gems,” Addison Wesley (2004).

**16. **V. M. Bove Jr., W. J. Plesniak, T. Quentmeyer, and J. Barabas, “Real-Time Holographic Video Images with Commodity PC Hardware,” Proc. SPIE Stereoscopic Displays and Applications **5664A** (2005).