## Abstract

We report the real-time generation of a large computer-generated hologram (CGH) using the wavefront recording plane (WRP) method with the aid of a graphics processing unit (GPU). The WRP method consists of two steps: the first step calculates a complex amplitude on a WRP, which is a virtual plane placed between a three-dimensional (3D) object and a CGH, from the 3D object. The second step obtains a CGH by calculating the diffraction from the WRP to the CGH. The disadvantages of the previous WRP method include the inability to record a large 3D object that exceeds the size of the CGH, and the difficulty of implementing all the steps on a GPU. We improved the WRP method using Shifted-Fresnel diffraction to solve the former problem, and all the steps could be implemented on a GPU. We show optical reconstructions from a 1,920 × 1,080 phase-only CGH generated from about 3 × 10^{4} object points at over 90 frames per second. In other words, the improved method obtained a large CGH with about 6 mega pixels (1,920 × 1,080 × 3) from the object points at video rate.

© 2012 Optical Society of America

## 1. Introduction

Computer-generated-hologram (CGH) based three-dimensional (3D) displays have attractive features, such as requiring no special glasses and fully satisfying 3D perception (convergence, binocular disparity, motion parallax, and so forth) owing to their holographic nature [1, 2]. However, the computational time required to generate a CGH has prevented the realization of a practical CGH-based 3D display [3]. To solve this problem, accelerating methods for CGH calculation have been proposed toward real-time operation. Yoshikawa and Yamaguchi proposed an image CGH method that performs real-time 3D color reconstruction: by placing the 3D object close to the CGH, the method calculates only a small region of the CGH instead of the entire region [4, 5]. Although the image CGH method achieved real-time reconstruction, it could not reconstruct a 3D object in the Fresnel region.

We proposed a wavefront-recording plane (WRP) method [6, 7] that extended the image CGH method to the Fresnel region. The WRP method consists of two steps: the first step calculates a complex amplitude on a WRP, which is a virtual plane placed between a 3D object and a CGH, from the 3D object. The second step obtains a CGH by calculating the diffraction from the WRP to the CGH. Tsang et al. improved the calculation time of the WRP method [8]. The disadvantages of the previous WRP methods include the inability to record a large 3D object exceeding the CGH size, and the difficulty of implementing the first step on a GPU because of random access to the GPU memory.

In this paper, we report our work in which we improved the WRP method using Shifted-Fresnel diffraction [9] to solve the former problem, with all the steps implemented on a GPU. We show optical reconstructions from a 1,920 × 1,080 phase-only CGH generated from about 3 × 10^{4} object points, whose size exceeds that of the CGH, at over 90 frames per second. In other words, the improved method can obtain a large CGH with about 6 mega pixels (1,920 × 1,080 × 3) from the object points at video rate.

In Section 2, we explain the WRP method for large 3D objects and the implementation of all the steps of the WRP method on the GPU. In Section 3, we present the results of the optical experiments. Section 4 concludes this work.

## 2. WRP method for a large 3D object and implementation of all steps of the WRP method on the GPU

The WRP method reduces the computational complexity of CGH generation by a two-step calculation, as shown in Fig. 1(a). In this section, we describe our improvements for the following problems of previous WRP methods:

- A large 3D object exceeding the CGH size cannot be recorded on a CGH.
- It is difficult to implement the first step of the WRP method on a GPU.

#### 2.1. Wavefront-recording plane method for a large 3D object

First, we describe the improvement for problem 1. In the first step of the WRP method, we record the complex amplitude $u_w(x_w, y_w)$ on a WRP from a 3D object, which consists of *N* object points, each expressed by the *j*-th coordinates $(x_j, y_j, z_j)$ and intensity $A_j$ of the object point, using the following equation:

$$u_w(x_w, y_w) = \sum_{j=1}^{N} A_j \exp\left(i \frac{2\pi}{\lambda} r_{wj}\right), \qquad r_{wj} = \sqrt{(x_w - x_j)^2 + (y_w - y_j)^2 + d_j^2}, \tag{1}$$

where $\lambda$ is the wavelength, $d_j = z_j - z_w$ is the perpendicular distance between the WRP and a 3D object point, and $i=\sqrt{-1}$. When the WRP is placed close to the 3D object, the object light traverses only small regions on the WRP. Therefore, the computational amount of Eq. (1) can be reduced, compared with the case in which the WRP is not introduced.
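As a hedged illustration of this first step, the superposition of Eq. (1) can be sketched in NumPy; the function name, the 1-D grid, and the point format are our simplifications (the paper's implementation runs on a GPU):

```python
import numpy as np

def wrp_field(points, n_px, pitch, wavelength):
    """Sketch of Eq. (1): superpose spherical waves from object points
    on a 1-D WRP of n_px samples (a 2-D WRP adds a second axis)."""
    x_w = (np.arange(n_px) - n_px / 2) * pitch    # WRP sample coordinates
    k = 2 * np.pi / wavelength                    # wavenumber 2*pi/lambda
    u_w = np.zeros(n_px, dtype=complex)
    for x_j, d_j, a_j in points:                  # each point: (x_j, d_j, A_j)
        r = np.sqrt((x_w - x_j) ** 2 + d_j ** 2)  # distance r_wj to each sample
        u_w += a_j * np.exp(1j * k * r)           # accumulate the spherical wave
    return u_w
```

Restricting each point's contribution to a small window around it, as the WRP method does, shrinks the range of `x_w` visited per point and hence the cost.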

The second step calculates the complex amplitude *u*(*x,y*) on the CGH using the diffraction calculation between the WRP and the CGH. Previous WRP methods used Fresnel diffraction:

$$u(x, y) = \frac{\exp\left(i\frac{2\pi}{\lambda}z\right)}{i\lambda z} \int\!\!\!\int u_w(x_w, y_w) \exp\left(i\frac{\pi}{\lambda z}\left((x - x_w)^2 + (y - y_w)^2\right)\right) dx_w\, dy_w \tag{2}$$

$$= \mathcal{F}^{-1}\left[\mathcal{F}\left[u_w(x, y)\right] \mathcal{F}\left[h(x, y)\right]\right], \tag{3}$$

where $\mathcal{F}[\cdot]$ and $\mathcal{F}^{-1}[\cdot]$ are the Fourier and inverse Fourier operators, *z* is the perpendicular distance between the WRP and the CGH, and the impulse response is $h\left(x,y\right)=\text{exp}\left(i\frac{\pi}{\lambda z}\left({x}^{2}+{y}^{2}\right)\right)$.
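The convolution form of Eq. (3) maps directly onto two forward FFTs and one inverse FFT. A minimal NumPy sketch (the function name and grid choices are ours):

```python
import numpy as np

def fresnel_fft(u_w, pitch, wavelength, z):
    """Fresnel propagation in the convolution form of Eq. (3):
    u = IFFT[ FFT[u_w] * FFT[h] ], with h the Fresnel impulse response."""
    ny, nx = u_w.shape
    x = (np.arange(nx) - nx / 2) * pitch
    y = (np.arange(ny) - ny / 2) * pitch
    X, Y = np.meshgrid(x, y)
    h = np.exp(1j * np.pi / (wavelength * z) * (X**2 + Y**2))
    # ifftshift moves the centred impulse response to the FFT origin
    return np.fft.ifft2(np.fft.fft2(u_w) * np.fft.fft2(np.fft.ifftshift(h)))
```

Note that both planes share the same pitch and pixel count here, which is exactly the limitation discussed next.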

Lastly, we calculate an amplitude-type CGH using the light intensity *I*(*x,y*) = |*u*(*x,y*) + *R*(*x,y*)|^{2}, where *R*(*x,y*) is a reference light, or a phase-only CGH using the argument of *u*(*x,y*), namely *θ*(*x,y*) = arg{*u*(*x,y*)}, where the operator arg{·} takes the argument of *u*(*x,y*).
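The two encodings can be written down directly; `amplitude_cgh` and `kinoform` are our illustrative names, not functions from the paper:

```python
import numpy as np

def amplitude_cgh(u, r):
    """Amplitude-type CGH: interference intensity I = |u + R|^2."""
    return np.abs(u + r) ** 2

def kinoform(u):
    """Phase-only CGH (kinoform): theta = arg{u}."""
    return np.angle(u)
```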

Because the Fresnel diffraction of Eq. (3) is calculated with the same sampling pitch and the same number of pixels on the WRP and the CGH, the size of a 3D object is limited by the size of the WRP or the CGH.

There are two ways of solving this problem. The first is to increase the number of pixels of the WRP and the CGH. However, this increases the calculation time and the memory required by the second step of the WRP method, because the Fresnel diffraction must then be calculated with a larger number of pixels. The second is to use different sampling pitches on the WRP and the CGH without changing their numbers of pixels.

We adopt the second solution. Of course, when such a calculation is performed with the direct integral of Eq. (2), the calculation time increases prohibitively because the fast Fourier transform (FFT) cannot be used. To avoid this problem, we used the Shifted-Fresnel diffraction [9] proposed by Muffoletto et al., which handles off-axis and scaling operations by means of a scaled Fourier transform and is therefore capable of setting different sampling pitches on the two planes. Restrepo and Garcia-Sucerquia proposed essentially the same method using the Bluestein transform [10]. As indicated above, several formulations of the Shifted-Fresnel diffraction already exist. Defining the ratio *m* = *p*_{1}/*p*_{2}, where *p*_{1} and *p*_{2} are the sampling pitches on the WRP and the CGH respectively (as shown in Fig. 1(a)), we derive the Shifted-Fresnel diffraction using the simple relation
${\left(x-m{x}_{w}\right)}^{2}=m{\left(x-{x}_{w}\right)}^{2}+\left({m}^{2}-m\right){x}_{w}^{2}+\left(1-m\right){x}^{2}$, and the Shifted-Fresnel diffraction is expressed by:

$$u(x, y) = \phi(x, y)\, \mathcal{F}^{-1}\Big[\mathcal{F}\big[u_w(x_w, y_w)\, \phi_w(x_w, y_w)\big]\, \mathcal{F}\big[h_m(x, y)\big]\Big], \tag{4}$$

where $\phi(x,y)=\text{exp}\left(i\frac{\pi (1-m)}{\lambda z}\left(x^2+y^2\right)\right)$, $\phi_w(x_w,y_w)=\text{exp}\left(i\frac{\pi ({m}^{2}-m)}{\lambda z}\left(x_w^2+y_w^2\right)\right)$, and $h_m(x,y)=\text{exp}\left(i\frac{\pi m}{\lambda z}\left(x^2+y^2\right)\right)$. Equation (4) yields a reconstructed image *m* times the size, without increasing the required memory and calculation time as compared with Eq. (3).
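The relation above, and the three-FFT structure it leads to, can be checked numerically. In the sketch below, `shifted_fresnel` and its grouping of phase factors reflect our reading of the derivation, not code from the paper:

```python
import numpy as np

# Check the decomposition behind the Shifted-Fresnel derivation:
# (x - m*x_w)^2 = m*(x - x_w)^2 + (m^2 - m)*x_w^2 + (1 - m)*x^2
rng = np.random.default_rng(0)
x, x_w, m = rng.normal(size=3)
lhs = (x - m * x_w) ** 2
rhs = m * (x - x_w) ** 2 + (m**2 - m) * x_w**2 + (1 - m) * x**2
assert np.isclose(lhs, rhs)

def shifted_fresnel(u_w, p2, m, wavelength, z):
    """Shifted-Fresnel sketch: the decomposition turns the scaled Fresnel
    integral into one convolution, i.e. three FFTs plus phase factors."""
    ny, nx = u_w.shape
    xs = (np.arange(nx) - nx / 2) * p2
    ys = (np.arange(ny) - ny / 2) * p2
    X, Y = np.meshgrid(xs, ys)
    c = np.pi / (wavelength * z)
    phi_w = np.exp(1j * c * (m**2 - m) * (X**2 + Y**2))  # WRP-side factor
    h_m = np.exp(1j * c * m * (X**2 + Y**2))             # convolution kernel
    phi = np.exp(1j * c * (1 - m) * (X**2 + Y**2))       # CGH-side factor
    conv = np.fft.ifft2(np.fft.fft2(u_w * phi_w) *
                        np.fft.fft2(np.fft.ifftshift(h_m)))
    return phi * conv
```

For *m* = 1 the extra phase factors vanish and the function reduces to the ordinary convolution form of Eq. (3).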

#### 2.2. GPU implementation of all of the steps in the WRP method

Figure 2 shows two GPU implementations of the first step of the WRP method. We use the compute unified device architecture (CUDA) as the programming environment for NVIDIA GPUs; the same approach is also available on AMD GPUs. In both implementations, the object points and the complex amplitude $u_w(x_w, y_w)$ are stored in the global memory on the GPU board.

In Fig. 2(a), each GPU thread (GT) is assigned to an object point one-to-one. Each GT reads an object point from the global memory and calculates multiple complex amplitudes on the WRP from that one object point. Therefore, when the GTs write the complex amplitudes, the memory access becomes random. This severely decreases the memory-access speed because the writes to the global memory cannot be coalesced, and as a result the calculation time increases. Moreover, the memory accesses may conflict when GTs simultaneously write complex amplitudes to the same memory address (dashed red arrows in Fig. 2(a)). Although the conflict can be avoided with CUDA atomic operations, these also decrease the memory-access speed. For these reasons, we do not recommend the implementation of Fig. 2(a).

Instead, we use the implementation of Fig. 2(b), in which each GT is assigned to a sampling point on the WRP one-to-one. Each GT reads the object points corresponding to it from the global memory and serially accumulates the one complex amplitude corresponding to it using Eq. (1). Note that the complex amplitude calculated by a GT is stored in the GT's register memory, which is the fastest memory on a GPU. Therefore, unlike the implementation of Fig. 2(a), no GT accesses the global memory to write a complex amplitude until the calculation of Eq. (1) is completed. After the calculation of Eq. (1) is completed, all GTs write their complex amplitudes in parallel, so that the memory access is coalesced.
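The difference between the two assignments can be illustrated in NumPy as a serial stand-in for GPU threads (the function names are ours). Both produce the same field, but only the gather pattern of Fig. 2(b) gives each "thread" exclusive ownership of its output sample:

```python
import numpy as np

def wrp_scatter(points, xs, wavelength):
    """Fig. 2(a) style: loop over points; each point writes to many
    WRP samples (on a GPU these writes are random and may conflict)."""
    k = 2 * np.pi / wavelength
    u = np.zeros(xs.size, dtype=complex)
    for x_j, d_j, a_j in points:
        r = np.sqrt((xs - x_j) ** 2 + d_j ** 2)
        u += a_j * np.exp(1j * k * r)            # many writes per point
    return u

def wrp_gather(points, xs, wavelength):
    """Fig. 2(b) style: each WRP sample sums over all points locally."""
    k = 2 * np.pi / wavelength
    u = np.empty(xs.size, dtype=complex)
    for n, x_w in enumerate(xs):                 # one "thread" per WRP sample
        acc = 0j                                 # register-like local accumulator
        for x_j, d_j, a_j in points:
            r = np.sqrt((x_w - x_j) ** 2 + d_j ** 2)
            acc += a_j * np.exp(1j * k * r)
        u[n] = acc                               # single write, coalesced style
    return u
```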

In CUDA, the WRP is actually divided into “blocks”, a CUDA term for a group of GTs, shown as the blue box in Fig. 2(b). We used a block size of 16 × 16 GTs. Each block has a shared memory, which is a fast memory, so the shared memory is used for storing the object points corresponding to the block, read from the global memory. Before starting the GPU calculation, we pre-compute on a CPU the allocation of object points to the corresponding blocks, according to the small regions on the WRP that each object point's light traverses (e.g., Eq. (3) in [7]); the computational complexity of this step is only *O*(*N*).
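This *O*(*N*) pre-computation can be sketched as a simple bucketing pass on the CPU. The sketch below is hypothetical: the 1-D layout, block width, and per-point region radius are illustrative values, not the paper's:

```python
def assign_points_to_blocks(points, pitch, block_px=16, radius_px=8):
    """Map each object point to the WRP blocks its light traverses.
    Returns {block_index: [point_index, ...]} for a 1-D WRP; one pass
    over the N points, so the cost is O(N)."""
    buckets = {}
    for j, (x_j, _d_j, _a_j) in enumerate(points):
        centre = int(round(x_j / pitch))          # nearest WRP sample
        first = (centre - radius_px) // block_px  # first block touched
        last = (centre + radius_px) // block_px   # last block touched
        for b in range(first, last + 1):
            buckets.setdefault(b, []).append(j)
    return buckets
```

Each GPU block then loads only its own bucket of points into shared memory before accumulating Eq. (1).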

For the second step of the WRP method, we used the CUFFT library for the FFT operations in Eq. (4). This equation involves three FFT operations; however, we calculate $\mathcal{F}[h_m(x, y)]$ on a CPU only once before the GPU calculation, and it is then kept in the global memory.

## 3. Results

In this section, we present the optical results obtained with the improved WRP method described in Section 2. We used Microsoft Windows XP Professional Service Pack 3 as the operating system, a computer with an Intel Core i7 920 processor (2.8 GHz) and 3 GB of memory, and Microsoft Visual C++ 2008 as the compiler. In addition, we used one NVIDIA GeForce GTX 580 GPU board and CUDA 3.2 for the GPU programming.

The calculation conditions were as follows: the 3D objects were about 50 × 50 × 50 mm, so the depth range of *d*_{j} was about 0 ≤ *d*_{j} ≤ 50 mm. We used *z* = 1 m, *λ* = 633 nm, *p*_{2} = 8 *μ*m, and a CGH of 1,920 × 1,080 pixels. We calculated CGHs from two 3D objects, “Tyranno” and “Earth”, consisting of 11,646 and 30,492 points, respectively. All of the calculations on the CPU used only one CPU thread. In addition, we used a point-light-source LED with a wavelength of 633 nm and an emitting diameter of 50 *μ*m as the reference light, and a phase-modulating LCD panel (HEO-1080 by Holoeye) with a pixel pitch of 8 *μ*m and 1,920 × 1,080 pixels.

We measured the calculation times of phase-only CGHs (kinoforms) from the two objects, whose sizes exceed that of the CGH, and show the optical reconstructions. In the following calculations, we used the GPU implementation of the WRP method described in Section 2.2.

First, for comparison, we used the previous WRP method, i.e., Eq. (1) and Eq. (3). In this calculation, we set *p*_{1} = *p*_{2} = 8 *μ*m, namely *m* = 1, and had to expand the original 1,920 × 1,080 pixels of the CGH and WRP to 6,144 × 3,072 pixels, namely about 50 × 24 mm, in order to record 3D objects larger than the CGH. The middle column of Table 1 shows the calculation times, and Fig. 3(a) shows the optical reconstruction movie of “Earth” from the phase-only CGH. As shown, the reconstructed image is cut into upper and lower parts because the size of the reconstructed image exceeds that of the CGH and WRP.

In contrast, the improved WRP method using Eq. (1) and Eq. (4) improved the calculation times and solved the problem of the reconstructed-image size. In this calculation, we set *m* = 8, giving a WRP size of about 139 × 61 mm, and *p*_{2} = 8 *μ*m. Unlike the method above, we did not change the original 1,920 × 1,080 pixels of the CGH and WRP. The right column of Table 1 shows the calculation times of the improved WRP method, and Fig. 3(b) shows the optical reconstruction movie of “Earth” from the phase-only CGH. Unlike Fig. 3(a), the image was reconstructed in its entirety without increasing the calculation time. Figure 3(c) shows the optical reconstruction movie of “Tyranno” from the phase-only CGH. These CGHs could be calculated at over 90 frames per second. In other words, the improved WRP method obtained a large CGH with about 6 mega pixels (1,920 × 1,080 × 3) from the object points at video rate. Enlarging the CGH size is very important because the viewing angle of the reconstructed image is proportional to the CGH size; see, for example, the work of Yaras, Kang, and Onural [11].

Comparing numerical reconstructions from a CGH generated by the previous WRP method with those from a CGH generated by the improved WRP method, blurring on the order of micrometers was evident in the improved method; however, this blurring is almost indiscernible to the naked eye.

## 4. Conclusion

We proposed an improved WRP method that calculates a CGH from a 3D object larger than the CGH by using the Shifted-Fresnel diffraction. We also overcame the difficulty of implementing the first step of the WRP method on a GPU.

## Acknowledgments

This work is supported by the Ministry of Internal Affairs and Communications, Strategic Information and Communications R&D Promotion Programme (SCOPE), Japan Society for the Promotion of Science (JSPS) KAKENHI (Young Scientists (B) 23700103) 2011, and the NAKAJIMA FOUNDATION.

## References and links

**1. **C. Slinger, C. Cameron, and M. Stanley, “Computer-Generated Holography as a Generic Display Technology,” Computer **38**, 46–53 (2005). [CrossRef]

**2. **S. A. Benton and V. M. Bove Jr., *Holographic Imaging* (Wiley-Interscience, 2008). [CrossRef]

**3. **M. Lucente, “Interactive Computation of holograms using a Look-up Table,” J. Electron. Imaging , **2**, 28–34 (1993). [CrossRef]

**4. **T. Yamaguchi, G. Okabe, and H. Yoshikawa, “Real-time image plane full-color and full-parallax holographic video display system,” Opt. Eng. **46**, 125801 (2007). [CrossRef]

**5. **H. Yoshikawa, T. Yamaguchi, and R. Kitayama, “Real-Time Generation of Full color Image Hologram with Compact Distance Look-up Table,” OSA Topical Meeting on Digital Holography and Three-Dimensional Imaging 2009, DWC4 (2009).

**6. **T. Shimobaba, N. Masuda, and T. Ito, “Simple and fast calculation algorithm for computer-generated hologram with wavefront recording plane,” Opt. Lett. **34**, 3133–3135 (2009). [CrossRef] [PubMed]

**7. **T. Shimobaba, H. Nakayama, N. Masuda, and T. Ito, “Rapid calculation algorithm of Fresnel computer-generated-hologram using look-up table and wavefront-recording plane methods for three-dimensional display,” Opt. Express **18**, 19504–19509 (2010). [CrossRef] [PubMed]

**8. **P. Tsang, W.-K. Cheung, T.-C. Poon, and C. Zhou, “Holographic video at 40 frames per second for 4-million object points,” Opt. Express **19**, 15205–15211 (2011). [CrossRef] [PubMed]

**9. **R. P. Muffoletto, J. M. Tyler, and J. E. Tohline, “Shifted Fresnel diffraction for computational holography,” Opt. Express **15**, 5631–5640 (2007). [CrossRef] [PubMed]

**10. **J. F. Restrepo and J. Garcia-Sucerquia, “Magnified reconstruction of digitally recorded holograms by Fresnel Bluestein transform,” Appl. Opt. **49**, 6430–6435 (2010). [CrossRef] [PubMed]

**11. **F. Yaras, H. Kang, and L. Onural, “Circular holographic video display system,” Opt. Express **19**, 9147–9156 (2011). [CrossRef] [PubMed]