
Special-purpose computer for electroholography in embedded systems

Open Access

Abstract

Electroholography is attracting attention as an ideal 3D image presentation method. However, its enormous computational complexity makes such methods difficult to realize. Typically, large cluster systems comprising a control PC and field-programmable gate array (FPGA) acceleration boards are used to reduce the hologram calculation time. However, such systems cannot be used for embedded 3D systems, such as a head-mounted display. In this study, we developed a compact holographic computer using a Xilinx Zynq UltraScale+ MPSoC with an ARM CPU and an FPGA on a single chip. The proposed system can reproduce 3D video at 15 frames per second on a spatial light modulator of 1,920 $\times$ 1,080 pixels from an object represented by a point cloud of 6,500 points. The proposed system can be embedded in various systems, e.g., head-mounted displays and 3D televisions.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Recently, commercial stereoscopic imaging systems that employ a head-mounted display (HMD) have become available. Such systems are used in a wide range of fields, such as entertainment, education, and medical applications. However, it has been reported that such systems cannot reproduce all the depth cues required for people to perceive a 3D image [1]. With long-term use, this can result in nausea and fatigue, i.e., the vergence-accommodation conflict (VAC) [1–6].

Currently, binocular disparity is used in most stereoscopic image presentation systems. However, systems that rely on binocular disparity cannot reproduce all depth cues; thus, the VAC [1–6] occurs. Holography is a 3D image presentation method that can reproduce all depth cues. Holography was developed by Gabor in 1947 [7] to improve the accuracy of electron microscopes. More recently, holography has been applied to electronic display systems [8]; this is referred to as electroholography. Electroholography is attracting increasing attention as an alternative to the binocular disparity method and is considered an ideal 3D image presentation method. Holography is the only technology that can record and reconstruct a 3D image with all depth cues directly. With the development of 3D display systems that use holography, breakthroughs in 3D display technology are expected.

In electroholography, a computer-generated hologram (CGH) that stores the amplitude and phase distribution of a 3D image must be computed. The computational complexity of a CGH increases in proportion to the hologram size multiplied by the number of object points. In a real-time system, such as a 3D television, the time required to calculate a CGH must be short enough for humans to perceive smooth motion. Unfortunately, the enormous computational complexity and calculation time of electroholography make it difficult to realize [9].

Several systems have employed field-programmable gate arrays (FPGAs) [10–12] or graphics processing units (GPUs) [13–17] to accelerate the calculation of CGHs. However, such systems tend to be large. When electroholography is used in a real-time system, such as an HMD, it is preferable, in terms of load balancing, for each 3D system to calculate its own CGHs rather than rely on a shared computing resource; thus, we consider that small, embeddable CGH calculation systems are required. Previous studies have investigated compact optical systems for electroholography [18,19]; however, few studies have employed small computers to reduce computation time [20,21].

In this paper, we describe the development of a special-purpose computer for electroholography for embedded devices. The remainder of this paper is organized as follows. In Section 2, we describe the overall system design, including the optical system and its mounting method. In Section 3, the proposed special-purpose computer is described and evaluated. A discussion and suggestions for future work are provided in Section 4.

2. System design

2.1 CGH calculation algorithm

In this study, a point cloud was used. A point cloud represents an object’s surface using point light sources. For CGHs, the point cloud model has a significant computational advantage because each point can be calculated independently, enabling parallel computation [22]. Previous studies have demonstrated that computation time can be reduced using many-core GPU-based parallel computation with the point cloud model [14,15].

With the point cloud model, a CGH is calculated as follows:

$$I(x_a,y_a) = \sum_{j=1}^{M} A_j \cos \left[ \frac{\pi}{\lambda} \cdot \frac{(x_a-x_j)^2+(y_a-y_j)^2}{z_{j}} \right].$$
Here, $z_j \gg x_j, y_j$, where the coordinates of the $j$-th point light source are $(x_j, y_j, z_j)$. Equation (1) expresses the hologram $I(x_a, y_a)$ as the sum of the complex-amplitude contributions from a point cloud comprising $M$ point light sources. Here, $A_j$ (fixed at 1) is the amplitude of the point light source, and $\lambda$ is the wavelength of the reference light.
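For reference, a straightforward (unoptimized) CPU implementation of Eq. (1) can be sketched in C as follows. This sketch is illustrative only; the data layout and function names are our own and do not reproduce the actual implementation described in Section 3.

```c
#include <math.h>

#define PI 3.14159265358979f

/* One point light source; the amplitude A_j is fixed at 1, as in the text. */
typedef struct {
    float x, y, z;
} Point;

/* One CGH pixel I(xa, ya) following Eq. (1). lambda is the reference-light
 * wavelength, expressed in the same units as the coordinates. */
static float cgh_pixel(float xa, float ya, const Point *pts, int m, float lambda)
{
    float sum = 0.0f;
    for (int j = 0; j < m; ++j) {
        float dx = xa - pts[j].x;
        float dy = ya - pts[j].y;
        sum += cosf(PI / lambda * (dx * dx + dy * dy) / pts[j].z);
    }
    return sum;
}

/* Fill a width x height hologram; pitch is the SLM pixel interval (6.4 um). */
void cgh_frame(float *hologram, int width, int height, float pitch,
               const Point *pts, int m, float lambda)
{
    for (int ya = 0; ya < height; ++ya)
        for (int xa = 0; xa < width; ++xa)
            hologram[ya * width + xa] =
                cgh_pixel(xa * pitch, ya * pitch, pts, m, lambda);
}
```

Because each pixel (and each point within a pixel) is independent, both loops can be parallelized directly, which is the property exploited by the GPU and FPGA implementations discussed below.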

2.2 Optical environments

The proposed system (including the optical system) is shown in Fig. 1. For the holographic display, a reflection-type spatial light modulator (SLM; 1,920 $\times$ 1,080 pixels) is used to display CGHs. Here, the pixel interval is 6.4 $\mu$m, the distance of the reproduced image from the hologram is 0.5 m, and the wavelength of the reference light is 532 nm.

Fig. 1. Optical environment. The proposed special-purpose computer is mounted on a ZCU102 evaluation board.

2.3 Proposed special-purpose computer

To reduce CGH computation time, we developed a special-purpose computer using a Xilinx Zynq UltraScale+ MPSoC with an ARM CPU and an FPGA mounted on a single chip. Because the ARM CPU and FPGA comprise a single system on chip, a general-purpose OS (e.g., Linux) can run on the ARM CPU. The OS can manage the point cloud data and easily cooperate with various frameworks, such as OpenGL and OpenCV; thus, a highly versatile standalone system can be created. The performance of the ARM CPU is lower than that of a CPU in a general-purpose PC; however, the proposed system, which uses the FPGA as an accelerator, achieves high computational performance for this special-purpose calculation.

In the initial research stage, the Xilinx Zynq UltraScale+ MPSoC ZCU102 evaluation kit (ZCU102) was used to implement the proposed system. Here, a ZU9EG-2FFB1156I was used as the mounted FPGA chip. Table 1 shows the logical resources of the ZCU102.

Table 1. Logical resources of the ZCU102. “Available” indicates the total number of each resource. Block RAM is the on-chip memory, and DSP48 is a high-speed arithmetic unit.

2.4 Implementation

The architecture of the proposed special-purpose computer is shown in Fig. 2. In this system, Ubuntu 16.04.4 LTS 64-bit (Linux kernel 4.14.0) runs on the ARM CPU. The ARM CPU performs various roles, such as management of the point cloud data, FPGA control, and CGH screen output. The ARM CPU is a quad-core Cortex-A53 (1,200 MHz) with 4 GB of main memory. Although the ARM CPU has lower processing performance than a general-purpose CPU, its performance is sufficient for our purpose. The FPGA is used to reduce calculation time by employing multiple calculation pipelines, each specialized for computing a single CGH pixel, named “intensity calculators.” There are 810 intensity calculators, each with an operating frequency of 250 MHz.

Fig. 2. Block architecture of the proposed special-purpose computer, including an ARM Mali-400 MP2 display controller.

ARM’s Advanced eXtensible Interface (AXI) facilitates communication between the ARM CPU and the FPGA. In the first CGH calculation step, the ARM CPU controls the point cloud data and sends the data to the FPGA via the AXI. When the CGH calculation is complete, the FPGA sends the result to the CPU via direct memory access (DMA) over the AXI. After obtaining the CGH data, the CPU sends the CGH to the display controller, which is connected to the SLM via DisplayPort.
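For illustration only, a minimal host-side control flow on a Zynq-class device could map the accelerator’s AXI address window into user space, write the point cloud, and start the calculation, as sketched below. The physical base address, register offsets, and polling scheme are hypothetical placeholders and are not the actual register map of the proposed system; the DMA transfer of the finished hologram back to the CPU (handled by a DMA driver) is omitted.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical AXI address window and register layout (illustration only). */
#define ACC_BASE   0xA0000000UL   /* example physical base of the AXI slave */
#define ACC_SPAN   0x00100000UL   /* 1 MiB window                           */
#define REG_CTRL   0x0000         /* write 1 to start a frame (assumed)     */
#define REG_NPTS   0x0004         /* number of points (assumed)             */
#define REG_STATUS 0x0008         /* bit 0 = done (assumed)                 */
#define PTS_OFFSET 0x1000         /* point cloud buffer inside the window   */

static int send_frame(const float *xyz, uint32_t npts)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;
    volatile uint8_t *acc = mmap(NULL, ACC_SPAN, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, ACC_BASE);
    if (acc == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Copy the point cloud (x, y, z per point) into the accelerator buffer. */
    memcpy((void *)(acc + PTS_OFFSET), xyz, npts * 3 * sizeof(float));
    *(volatile uint32_t *)(acc + REG_NPTS) = npts;
    *(volatile uint32_t *)(acc + REG_CTRL) = 1;          /* start            */

    /* Busy-wait until the accelerator reports completion (sketch only). */
    while ((*(volatile uint32_t *)(acc + REG_STATUS) & 1) == 0)
        ;

    munmap((void *)acc, ACC_SPAN);
    close(fd);
    return 0;
}
```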

2.5 Light intensity calculator module

The architecture of the light intensity calculator (LIC) in the FPGA is shown in Fig. 3.

Fig. 3. Block architecture of the light intensity calculator (LIC). The LIC contains 810 intensity calculators and operates at 250 MHz.

The point cloud data stored in main memory by the ARM CPU are read at high speed via DMA, and the input point cloud data are stored in Block RAM in a single column in the order $x_j, y_j, z_j$. An output port is connected to the intensity calculator module, which is a single-pixel calculation pipeline. A maximum of 12,000 points can be stored in Block RAM.
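As a simple illustration, the host can flatten the point cloud into this single-column order before the DMA transfer. The helper below is our own; the float representation is an assumption, since the word lengths actually used on the FPGA are those shown in Fig. 4.

```c
#include <stdlib.h>

#define MAX_POINTS 12000   /* Block RAM capacity stated in the text */

/* Flatten the point cloud into a single column in the order
 * x_0, y_0, z_0, x_1, y_1, z_1, ..., matching the Block RAM layout described
 * above. Returns NULL if m exceeds the capacity or allocation fails. */
float *pack_point_cloud(const float *x, const float *y, const float *z, int m)
{
    if (m < 0 || m > MAX_POINTS)
        return NULL;
    float *buf = malloc((size_t)m * 3 * sizeof(float));
    if (buf == NULL)
        return NULL;
    for (int j = 0; j < m; ++j) {
        buf[3 * j + 0] = x[j];
        buf[3 * j + 1] = y[j];
        buf[3 * j + 2] = z[j];
    }
    return buf;
}
```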

2.6 Intensity calculator module

The architecture of the intensity calculator pipeline is shown in Fig. 4, where the number under each slash indicates the word length of the data on that line.

Fig. 4. Block architecture of the special-purpose pipeline.

To compute a CGH using the FPGA, we modify Eq. (1) as follows.

$$z_{j-inv} = \frac{\pi}{\lambda z_j}.$$
$$I(x_a,y_a) = \sum_{j=1}^{M}\cos \left[ \{ (x_a - x_j)^2 + (y_a - y_j)^2 \} z_{j-inv} \right].$$
Note that Eq. (2) can be pre-calculated by the CPU. Division operations are computationally expensive on an FPGA but well suited to the CPU; therefore, the division is pre-calculated by the CPU. As a result, the FPGA only needs to perform additions, subtractions, and multiplications (squaring), which reduces the resource usage and speeds up the calculation.
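A C sketch of this division-free reformulation, mirroring Eqs. (2) and (3), is shown below. It is our own illustration: the CPU precomputes $z_{j-inv}$ once per point, so the per-pixel inner loop contains only subtractions, multiplications, and a cosine.

```c
#include <math.h>

#define PI 3.14159265358979f

/* Precompute z_inv[j] = pi / (lambda * z[j]) on the CPU (Eq. (2)). */
void precompute_zinv(const float *z, float *z_inv, int m, float lambda)
{
    for (int j = 0; j < m; ++j)
        z_inv[j] = PI / (lambda * z[j]);
}

/* One CGH pixel using the division-free form of Eq. (3). */
float cgh_pixel_fast(float xa, float ya, const float *x, const float *y,
                     const float *z_inv, int m)
{
    float sum = 0.0f;
    for (int j = 0; j < m; ++j) {
        float dx = xa - x[j];
        float dy = ya - y[j];
        sum += cosf((dx * dx + dy * dy) * z_inv[j]);   /* no division here */
    }
    return sum;
}
```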

To speed up the calculation, intensity calculators can execute Eq. (3) in parallel. For the cosine calculation in Eq. (3), we use a lookup table method and obtain the result by referencing a memory table [10]. The calculation precision required for CGH calculation has been investigated previously [10].
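The lookup-table cosine can be sketched as follows. The table size (2,048 entries here) and the quantization are our assumptions; the precision actually required for CGH calculation is discussed in [10].

```c
#include <math.h>

#define PI        3.14159265358979f
#define LUT_BITS  11
#define LUT_SIZE  (1 << LUT_BITS)      /* 2,048 entries over one period */

static float cos_lut[LUT_SIZE];

/* Fill the table once; index i corresponds to the phase 2*pi*i/LUT_SIZE. */
void init_cos_lut(void)
{
    for (int i = 0; i < LUT_SIZE; ++i)
        cos_lut[i] = cosf(2.0f * PI * (float)i / (float)LUT_SIZE);
}

/* Table-based cosine: wrap the phase into one period, then index the table.
 * On the FPGA, this reduction amounts to truncating a fixed-point phase word. */
float cos_table(float phase)
{
    float turns = phase / (2.0f * PI);          /* phase in revolutions */
    turns -= floorf(turns);                     /* wrap into [0, 1)     */
    int idx = (int)(turns * (float)LUT_SIZE) & (LUT_SIZE - 1);
    return cos_lut[idx];
}
```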

Registers (denoted “reg” in Fig. 4) are inserted between the respective arithmetic units. This increases the utilization ratio of the arithmetic units and the throughput of the LIC module; this method is referred to as pipelining. On each clock cycle, the data of one point light source advance through the pipeline, so each arithmetic unit operates continuously, which improves calculation efficiency. In the circuit shown in Fig. 4, six clocks are required to calculate the intensity, one clock is required for Block RAM address specification, and one clock is required to specify the x-y coordinate. Therefore, eight clocks are required to calculate the intensity in this module.

3. Evaluation

3.1 Calculation time

Table 2 compares the time required to calculate a CGH of $1,920 \times 1,080$ pixels from point cloud data with 6,500 points on the CPU, GPU, and FPGA.

Table 2. Calculation time

Here, the CPU environment was an Intel Xeon E5-2697 v2 (2.70 GHz) with 64 GB of main memory running CentOS Linux 7.1.1503 (Core). The CGH calculation on the CPU was implemented in single-precision floating point (float) using the Intel C compiler 16.0.1.150. Note that all cores were used and the calculation was executed in parallel.

The GPU environment was a Jetson TX1 embedded system using CUDA 8.0; this implementation also used single-precision floating point (float).

For the time measurement, we used the C clock_gettime function to measure from the transfer of the point cloud data to the end of the computation of the entire hologram. Note that, for the FPGA, the transfer time of the hologram data is included because it cannot be separated from the total calculation time.
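The measurement pattern is essentially the following; the transfer and computation routines are placeholders for the actual code, and the choice of CLOCK_MONOTONIC is ours.

```c
#include <stdio.h>
#include <time.h>

/* Placeholders standing in for the actual routines of the proposed system. */
static void transfer_point_cloud(void) { /* send the point cloud */ }
static void compute_hologram(void)     { /* compute the full CGH */ }

/* Measure from the transfer of the point cloud data to the end of the
 * computation of the entire hologram, as described in the text. */
static double measure_frame_time(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    transfer_point_cloud();
    compute_hologram();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

int main(void)
{
    printf("elapsed: %.6f s\n", measure_frame_time());
    return 0;
}
```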

As shown in Table 2, the proposed system achieved a speedup of 28.1 times over the CPU and 19.5 times over the GPU.

3.2 Performance

The proposed system requires $I + D - 1$ steps to calculate a single pixel, where $I$ is the number of calculation steps (here, the number of point light sources), $D$ is the pipeline depth, and $H$ [Hz] is the operating frequency. The theoretical time required to calculate a single pixel in the proposed system is:

$$t(single\;pixel)=\frac{I+D-1}{H}.$$
Since the operating frequency of the pipeline is 250 MHz and the LIC module calculates the $1,920 \times 1,080$ pixels with 810-way parallelism (the number of intensity calculators; Fig. 3), the theoretical calculation time for point cloud data comprising 6,500 points is given as follows:
$$ t(all\;pixels)=\frac{1920 \times 1080}{810} \times \frac{8 + 6500-1}{250 \times 10^6 } = 0.0662s.$$
The calculation efficiency (theoretical value /measured value) is 99.8%. These results demonstrate that the proposed special-purpose computer is a highly efficient calculation circuit.

3.3 Resource usage

In the proposed system, speedup is achieved by operating many special-purpose modules in parallel; i.e., the number of intensity calculators is the primary factor determining calculation time. The proposed system operates 810 parallel pipelines at 250 MHz.

Table 3 shows the FPGA resource utilization. Here, the DSP48 slices are used for the multiplication operations in the computational pipeline. Each light intensity calculation pipeline performs three multiplications; i.e., three DSP48s are used per pipeline. The Block RAM is primarily used to store the point cloud data. In the proposed system, the number of intensity calculators cannot be increased further because DSP48 usage has nearly reached the resource limit.

Table 3. FPGA resource usage
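As a rough cross-check of the DSP48 utilization (assuming the ZU9EG provides 2,520 DSP48 slices in total, a figure taken from the device data sheet rather than from the text above), the 810 pipelines with three DSP48s each account for
$$810 \times 3 = 2430 \approx 0.96 \times 2520,$$
i.e., roughly 96% of the available DSP48s, which is consistent with the statement that the number of intensity calculators cannot be increased further.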

3.4 Reconstructed image

Figure 5 shows an image of the point cloud data (original data) and the reconstructed image obtained using the optical system shown in Fig. 1. The point cloud comprises 6,500 points. The time required to display 100 scenes was 6.6554 s, which demonstrates that the proposed system can display 3D video at 15.0 frames per second (fps).

Fig. 5. (a) Point cloud image (original data) and (b) reconstructed image (see Visualization 1).

4. Discussion and future work

In this study, we developed a compact holographic computer using a Xilinx Zynq UltraScale+ MPSoC with an ARM CPU and an FPGA on a single chip. The proposed system can process point cloud data with 6,500 points at 15.0 fps, representing a speedup of 19.5 times compared to the Jetson TX1 embedded GPU system. The proposed system outperforms the other embedded architecture, which demonstrates the superiority of the proposed architecture. With 6,500 points, a complicated object, such as the artificial satellite in Fig. 5, can be expressed. Thus, we consider the proposed system sufficient for displaying simple information on a small-scale device, such as an HMD. In addition, the performance of the proposed system is sufficient for guidance or medical assistance systems that display simple figures.

In the proposed system, the number of intensity calculators could not be increased further because DSP48 usage had nearly reached the resource limit. In this study, Eq. (1) was used as the CGH calculation method; however, other algorithms can reduce the computational load [23]. Such an algorithm is expected to reduce the number of DSP48s required per intensity calculator module; thus, by adopting it, the proposed system is expected to handle larger point clouds. It had been thought that sufficient 3D reconstructed images could not be obtained with the performance of current SLMs; however, reconstructed images with good image quality have been obtained in the literature [24,25]. By using these methods, our next system is expected to provide better demonstrations.

Note that the evaluation board (ZCU102) is not small and cannot be mounted in an embedded system (e.g., an HMD). However, by employing a special-purpose board built around the single chip, the proposed system will be small enough to be mounted in an HMD. Therefore, we plan to develop such a board in future work.

References

1. G. Kramida, “Resolving the vergence-accommodation conflict in head-mounted displays,” IEEE Trans. Vis. Comput. Graph. 22(7), 1912–1931 (2016).

2. S. R. Bharadwaj and T. R. Candy, “Accommodative and vergence responses to conflicting blur and disparity stimuli during development,” J. Vis. 9(11), 4 (2009).

3. D. Hoffman, A. Girshick, K. Akeley, and M. Banks, “Vergence-accommodation conflicts hinder visual performance and cause visual fatigue,” J. Vis. 8(3), 33 (2008).

4. M. Lambooij, M. Fortuin, I. Heynderickx, and W. IJsselsteijn, “Visual discomfort and visual fatigue of stereoscopic displays: a review,” J. Imaging Sci. Technol. 53(3), 030201 (2009).

5. S. Reichelt, R. Häussler, G. Fütterer, and N. Leister, “Depth cues in human visual perception and their realization in 3D displays,” Proc. SPIE 7690, 76900B (2010).

6. T. Bando, A. Iijima, and S. Yano, “Visual fatigue caused by stereoscopic images and the search for the requirement to prevent them: a review,” Displays 33(2), 76–83 (2012).

7. D. Gabor, “A new microscopic principle,” Nature 161(4098), 777–778 (1948).

8. P. St-Hilaire, S. A. Benton, M. E. Lucente, M. L. Jepsen, J. Kollin, H. Yoshikawa, and J. S. Underkoffler, “Electronic display system for computational holography,” Proc. SPIE 1212 (1990).

9. M. Lucente, “Interactive three-dimensional holographic displays: seeing the future in depth,” SIGGRAPH Comput. Graph. 31(2), 63–67 (1997).

10. Y. Ichihashi, H. Nakayama, T. Ito, N. Masuda, T. Shimobaba, A. Shiraki, and T. Sugie, “HORN-6 special-purpose clustered computing system for electroholography,” Opt. Express 17(16), 13895–13903 (2009).

11. T. Sugie, T. Akamatsu, T. Nishitsuji, R. Hirayama, N. Masuda, H. Nakayama, Y. Ichihashi, A. Shiraki, M. Oikawa, N. Takada, Y. Endo, T. Kakue, T. Shimobaba, and T. Ito, “High-performance parallel computing for next-generation holographic imaging,” Nat. Electron. 1(4), 254–259 (2018).

12. P. Tsang, J. Liu, T. Poon, and K. Cheung, “Fast generation of hologram sub-lines based on field programmable gate array,” in Adv. Imaging, paper DWC2 (2009).

13. N. Masuda, T. Ito, T. Tanaka, A. Shiraki, and T. Sugie, “Computer generated holography using a graphics processing unit,” Opt. Express 14(2), 603–608 (2006).

14. Y. Pan, X. Xu, S. Solanki, X. Liang, R. B. A. Tanjung, C. Tan, and T.-C. Chong, “Fast CGH computation using S-LUT on GPU,” Opt. Express 17(21), 18543–18555 (2009).

15. H. Niwase, N. Takada, H. Araki, H. Nakayama, A. Sugiyama, T. Kakue, T. Shimobaba, and T. Ito, “Real-time spatiotemporal division multiplexing electroholography with a single graphics processing unit utilizing movie features,” Opt. Express 22(23), 28052–28057 (2014).

16. L. Ahrenberg, P. Benzie, M. Magnor, and J. Watson, “Computer generated holography using parallel commodity graphics hardware,” Opt. Express 14(17), 7636–7641 (2006).

17. P. W. M. Tsang, A. S. M. Jiao, and T.-C. Poon, “Fast conversion of digital Fresnel hologram to phase-only hologram based on localized error diffusion and redistribution,” Opt. Express 22(5), 5060–5066 (2014).

18. T. Ichikawa, T. Yoneyama, and Y. Sakamoto, “CGH calculation with the ray tracing method for the Fourier transform optical system,” Opt. Express 21(26), 32019–32031 (2013).

19. E. Murakami, Y. Oguro, and Y. Sakamoto, “Study on compact head-mounted display system using electro-holography for augmented reality,” IEICE Trans. Electron. E100.C(11), 965–971 (2017).

20. T. Ito and T. Shimobaba, “One-unit system for electroholography by use of a special-purpose computational chip with a high-resolution liquid-crystal display toward a three-dimensional television,” Opt. Express 12(9), 1788–1793 (2004).

21. T. Shimobaba, A. Shiraki, N. Masuda, and T. Ito, “Electroholographic display unit for three-dimensional display by use of special-purpose computational chip for holography and reflective LCD panel,” Opt. Express 13(11), 4196–4201 (2005).

22. T. Nishitsuji, T. Shimobaba, T. Kakue, and T. Ito, “Review of fast calculation techniques for computer-generated holograms with the point-light-source-based model,” IEEE Trans. Ind. Inform. 13(5), 2447–2454 (2017).

23. T. Shimobaba and T. Ito, “An efficient computational method suitable for hardware of computer-generated hologram with phase computation by addition,” Comput. Phys. Commun. 138(1), 44–52 (2001).

24. Y. Takaki and K. Fujii, “Viewing-zone scanning holographic display using a MEMS spatial light modulator,” Opt. Express 22(20), 24713–24721 (2014).

25. A. Maimone, A. Georgiou, and J. S. Kollin, “Holographic near-eye displays for virtual and augmented reality,” ACM Trans. Graph. 36(4), 1–16 (2017).

Supplementary Material (1)

Visualization 1: Optically reconstructed movie of electroholography created with the proposed system.
