Computer-generated holography at high resolutions is a computationally intensive task. Efficient algorithms are needed to generate holograms at acceptable speeds, especially for real-time and interactive applications such as holographic displays. We propose a novel technique to generate holograms using a sparse basis representation in the short-time Fourier space combined with a wavefront-recording plane placed in the middle of the 3D object. By computing the point spread functions in the transform domain, we update only a small subset of the precomputed largest-magnitude coefficients to significantly accelerate the algorithm over conventional look-up table methods. We implement the algorithm on a GPU, and report a speedup factor of over 30. We show that this transform is superior over wavelet-based approaches, and show quantitative and qualitative improvements over the state-of-the-art WASABI method; we report accuracy gains of 2dB PSNR, as well improved view preservation.
© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
Digital holography is a technology that allows for recording and reconstructing both the amplitude and phase of electromagnetic wave fields. It can account for all visual cues, such as continuous parallax, depicts no accommodation-vergence conflict and supports resolutions up to the diffraction limit of the wavelength. Because of these properties, it is often considered as the ultimate display technology.
Acquiring digital holograms with an optical setup for visualization purposes poses several practical limitations: the need for a specialized optical setup, restrictions on camera resolution, pixel pitch and object sizes, contaminations with aberrations and/or speckle noise and the need of a physical object for recording. These problems are not present Computer-Generated Holography (CGH), where holograms are numerically computed by simulating the diffraction of light.
CGH for display systems comes with its own challenges: large space-bandwidth product are needed to get decent viewing angles and hologram dimensions, corresponding to resolutions of at least 108 pixels. Furthermore, unlike in conventional computer graphics, every point in the scene can potentially affect every hologram pixel, resulting into a many-to-many relationship. These factors make CGH a very computationally intensive task.
That is why many techniques have been developed to address this problem. CGH methods can be classified into many categories, which all have a trade-off in terms of realism, calculation time and content types that can be generated. There are polygon methods [1–4], point cloud methods [5–14], RGB+Depth methods , holographic stereograms [16–18] and hybrid methods [19, 20]. We will focus on point cloud methods, and elaborate on how signal sparsity can contribute to drastically improve computation performance.
The notion of sparsity is a well-known concept in signal processing literature. It refers to the property that signals drawn from a particular source can be well-approximated by a few coefficients in a certain basis. This property is very useful for many applications, such as: (1) compression, where only a small subset of the coefficients need to be stored to accurately represent the compressed signal; (2) denoising, since the noise is generally uncorrelated with the signal it will be spread out over many low-amplitude coefficients which can be thresholded away; (3) compressed sensing, where prior information on the sparsity enables one to accurately reconstruct a measured signal at sampling rates well below the Nyquist bound.
In particular, sparsity has been leveraged for CGH. Albeit often not expressed in these terms, many CGH methods leverage the sparsity of the holographic signal in some bases where only a fraction of the coefficients need to be updated; in doing so, they substantially accelerate the generation of holograms. The Wavefront Recording Plane method (WRP)  will express the signal in a backpropagated hologram, effectively representing the signal in a convolved space using e.g. the Angular Spectrum Method (ASM) . There, point-spread functions (PSFs) will only affect a limited amount of pixels in the WRP depending on their distance, rendering them sparse in the spatial domain. This principle can be extended further by using multiple parallel WRPs, and allocating points to their respective nearest WRP . Holographic stereograms approximate PSFs by piecewise continuous wavefronts in the Short-Time Fourier Transform (STFT) domain by updating a single coefficient per block corresponding to pieces of planar wavefronts . More recently, the WASABI method was proposed where PSFs are expressed in the wavelet domain . Here, Shimobaba et. al. pioneered the use of CGH with a sparsifying transform using wavelet shrinkage. Wavelets are a multi-scale representation of a signal, which can successfully sparsify smooth signals where neighbouring pixels are highly correlated. This reduces the amount of needed coefficients to accurately represent PSFs. This method was later combined with the WRP method to further reduce calculation time .
In this work, we improve over existing sparse CGH methods by updating the hologram in the STFT domain. Our novel contributions in this manuscript are the following:
- We study the properties of sparse CGH systems, and analyze the properties and requirements of transforms in the space-frequency domain to optimize quality and calculation times
- We demonstrate the superiority of the STFT over existing transforms by reporting an increase in the objective and subjective quality of the generated hologram
- We validate its applicabililty by implementing the algorithm on a GPU using CUDA, and report a 30-fold speedup over a CUDA implementation of the conventional method.
2. Optimizing sparsity
2.1. The wavefront recording plane method
Holograms can be computed from a point cloud, describing a collection of luminous points. Points with complex amplitudes aj and positions (xj, yj, zj) will create a diffraction pattern on the hologram plane U at e.g. z = 0, given by
Directly evaluating this sum is computationally very costly, since it involves updating every pixel for every point in the point cloud. The goal is therefore to compute the hologram in a transformed space, and to maximize the sparsity of the transformed holographic signal while minimizing computation time. Higher sparsity means fewer needed coefficient updates for CGH and better signal approximation leading to higher quality. Moreover, the used transform needs to be efficiently computable. Otherwise, it would defeat the purpose of accelerating CGH in the first place. As an extreme example, one could use an ASM propagation for every point; the ASM expresses diffraction between parallel planes as a convolution, namely:
Optimizing the sparsity/speed balance will require some analysis of the signal properties of holographic PSFs. First, by backpropagating the hologram closer to the object, we will reduce their spatial footprints, as shown in Fig. 1. However, this choice is still suboptimal: we can further reduce the average support size by placing the WRP in the middle of the point cloud. One problem that occurs is that calculating the PSF at small distances from the WRP using Eq. (1) will result in too coarse approximations, and amplitudes tending to infinity. This can be circumvented by representing PSFs as propagated Dirac functions instead (i.e. using an ASM-propagated single point).
2.2. The space-frequency representation
Furthermore, a particularly useful method to analyze holograms is to study at them in the space-frequency domain  (also referred to as the time-frequency domain, the Wigner space representation, or phase space), where signals are simultaneously depicted in space and (instantaneous) frequency. Diffraction of light can be described as geometric operations in space-frequency domain, such as: shears for planar Fresnel diffraction, rotations describe fractional Fourier transforms modeling diffraction between spherical surfaces, translations can model object tilts, etc. These are very useful for intuitive manipulations of holographic signals.
PSFs are especially simple in this representation: because the instantaneous frequency is well-defined in every point, it can be described by a graph in space-frequency domain. We will restrict our analysis to 1D-signals to simplify the model, as to allow for a visual representation of the corresponding 2D space-frequency domain. In the 1D-case, a PSF is described as a 2D curve η(x) in the space-frequency representation. If we use the Fresnel approximation of a PSF :Fig. 2(a).
PSFs at a non-zero distance from the WRP can be sparsified further by compositing other transforms with the WRP method (with e.g. wavelets in the WASABI method). Many transforms can be viewed in the space-frequency domain as well; this can be done in terms of their “Heisenberg boxes”: every basis function will have a particular footprint in the space-frequency domain. There is a trade-off between space and frequency resolution which is called the Heisenberg uncertainty principle, expressing that there must be a lower bound on the space-frequency support of a basis function. Since reversible transforms need to capture all the signal’s information, they must cover the whole space-frequency domain, which can often be represented as a particular tiling of the space-frequency plane.
In Fig. 2, we represent three representations: spatial, wavelets and STFT. The spatial domain representation will have perfect spatial localization but no localization in frequency; therefore, all pixels in the support will be affected, amounting to the WRP method.
Wavelets are a multi-resolution representation, depicted in Fig. 2(b) with three levels. The high-level sub-bands will have better spatial localization, while the low-level ones have better frequency localization. This property is very useful for natural imagery, because they tend to have 1/f2-shaped autocorrelation behavior , for which wavelets are highly effective. However, holographic PSFs are highly symmetric in the frequency domain, causing the wavelet sparsity to drop substantially compared to natural imagery. Moreover, wavelets transforms are generally real-valued, meaning that they cannot distinguish the positive from the negative frequency components; thus, the wavelet coefficient will respond to signals as if they are present at locations on the opposite side of the frequency axis, depicted by the green Heisenberg boxes in Fig. 2(b). Finally, when sparsifying the wavelet coefficients, the highest subband coefficients will be discarded first after thresholding because of their lower average magnitudes. This will cause an uneven degradations of the views: this effect has been documented in , where wavelet compression of holograms gives rise to a “key-hole” effect, referring to the vanishing of the large viewing angles. This effect is also demonstrated in section 4. That is why we propose to use the block-wise STFT: as seen in Fig. 2(c), PSFs will cover less Heisenberg boxes resulting in better sparsity, it can distinguish positive and negative frequencies and exhibits symmetry in both spatial and frequency domains. Interestingly, the Accurate phase-added stereogram  can be viewed as a special case of our proposed method: they use an analytic approximation of a sparsified signal containing a single coefficient per block, without any WRPs.
2.3. Sparsity evaluation
We can numerically evaluate and compare the sparsity as well by keeping a small subset of the largest coefficient magnitudes of transformed PSFs. In Fig. 3, we plot the Normalized Mean Square Error (NMSE) of a sparsified PSF as a function of its distance to the hologram plane, namely
We observe that wavelets perform better than using no transform, followed by STFT when the point distance surpasses a few mm. As the distance increases, larger STFT block sizes will perform better. This can be explained by the fact that when the slope of the PSF in the space-frequency domain will decrease, the optimum shifts and it becomes preferable to trade increased frequency resolution for decreased spatial resolution. In the limit to infinite distance, the slopes will be horizontal; we then get Fourier holography, so using a single FFT on the whole hologram will become optimal.
In the next section, we will explain how the STFT can be used to improve sparse hologram generation.
For the experiments, we implement two versions of our framework: a baseline method using precomputed basis functions in the spatial domain (the “spatial version”), based on the conventional approach  combined with a WRP, and a sparse version using the STFT (the “STFT version”). Diagrams summarizing the pipelines are shown in Fig. 4 and Fig. 5. To maximize sparsity, we place the WRP in the middle of the point cloud in both pipelines. We compute a lookup table by discretizing the z-values of the points into K equidistant quantization levels. The extent (or support) wk of a PSF can be found using the grating equation, determined by the hologram pixel pitch p, and wavelength λ:
For the spatial version, we directly store the cropped PSFs (according to its width wk) in the look-up table. For the STFT version, we crop the PSFs to the next multiple of B = 16 and transform them with an STFT using B × B blocks. We only store 4 coefficients per block, amounting to approximately s = 1.6% of the total coefficient count. However, the STFT is not translation-invariant for all pixel shifts, just like the discrete wavelet transform; it is only invariant to translations which are a multiple of the block size. This means we need to store B × B translated versions for every transformed PSF.
The STFT LUT thus has three dimensions, totaling to B × B × K entries. Formally, the index of the needed LUT entry for a point (qx, qy, qz) is:
For better memory coalescence, we assume that the highest-amplitude coefficients are grouped together in a b ×b subblock, where b = 2 (this is justified because of the well-defined instantaneous frequencies of PSFs as denoted in the Section 2.2). Thus we only need to store one extra byte per block in the look-up tables to denote the index value within a block. This is illustrated in Fig. 6.
In summary, for every possible translation (or shift) and depth quantization level, we store four 2×2-grouped STFT coefficients and their group position index within the B × B block.
We implemented the algorithm for a GPU using CUDA. The WRP (which has the same resolution as the target hologram) is divided into large 2048 × 2048 tiles, which are computed independently. This is not only to limit GPU global memory usage, but also to avoid evaluating PSFs for points lying far from the target tile. This is done by distributing the point cloud over the tiles according to their (x, y)-position. Note that when points are close to tile edges, they must be allocated to both tiles sharing the edge.
The preloaded look-up tables are stored in global GPU memory. For both methods (spatial and STFT), the additions of the PSF entries happen in shared memory for improved speed. This has as an additional advantage that after all the PSF additions, the inverse STFT can be computed immediately in the shared memory as well, which has negligible computational cost (we report average calculation times of roughly 25 ns per STFT block). All computations are done using complex-valued 32-bit floating-point numbers (amounting to 8 bytes per pixel).
3.1. Complexity analysis
This subsection will summarize the memory and computation complexities of the proposed algorithm. Aside from the memory requirement of the hologram itself, which can be constrained with the use of tiling, the main memory requirements are imposed by the stored look-up tables. Every LUT entry has a width wk (see Eq. (6)), and thus a memory footprint which is proportional to . The three properties that affect the total LUT size are the maximum distance zmax of a point to the WRP, the quantization level count K and the hologram parameters (p, λ).
In Big-O notation, for a conventional WRP the memory footprint of the LUT is:
For the midpoint WRP, zmax is halved, while K remains the same. The memory footprint is thus reduced by a factor of 4. If we omit the entries with negative zk because of the complex conjugate property as mentioned before, then the memory requirement is halved once more.
For the STFT method, we must take the block size B and sparsity s into account as well. Hence, the space complexity is given by
Concerning calculation time, we have to consider 3 steps: LUT additions, the transform stage (if applicable) and the ASM propagation step. For the STFT method, the respective time complexities are:Table 1), meaning that the sparsity s will become one of the main factors determining speed. The db4 wavelet transform of WASABI can be implemented efficiently with linear time complexity as well by using the lifting scheme and interleaving.
For our experiments, we generate a hologram of the “Venus” point cloud consisting of 105 points. The hologram resolution is 4096×4096, with a pixel pitch p = 4 µm and wavelength λ = 633 nm. The center of the point cloud was placed at 1.7 cm from the hologram plane, with the WRP placed at the center as explained in the previous sections. The look-up tables were quantized in 100 entries, spanning a distance of 1.6 cm as to fully encompass the Venus head. The LUT took 135 MB in total. The hologram was computed on a machine with an Intel Core i7 6700K CPU, 32GB RAM and a NVIDIA GeForce 1080 GTX GPU. The code was compiled for CUDA 5.0 and ran on the OS Windows 10. The ASM propagation was also computed on the GPU, using the FFT libraries of MATLAB R2017b with a precomputed filter. Renderings are shown in Fig. 7.
We compare three configurations: the spatial method, the STFT method, and a version based on the WASABI method using db4 wavelets. The latter was computed by sparsifying the look-up table entries with the weighted 2-level db4 transform as explained in , with a sparsity of 1.6%, matching the STFT approach (see Fig. 8).
The total calculation time of the spatial method is 3.97 s, while the STFT method only took 0.13 s (a breakdown for every stage is shown in Table 1). This means we achieved a speedup factor of over 30. We did not implement an optimized CUDA version of the WASABI method for comparison, but we expect very similar calculation times to the STFT method since the coefficient count and overall transform structure is akin to the STFT version.
For the quality, we report a Peak Signal-to-Noise Ratio (PSNR) of 29.33dB for the STFT method compared to a PSNR of 27.30dB for the WASABI method, indicating an improvement of about 2dB. The general quality improvement for other b × b subblock sizes for B = 16 is shown in Fig. 9; we can see that the STFT subblock approximation is a better approximation than db4 wavelets, unless the amount of kept coefficients is very high (which would hamper calculation efficiency).
The 2dB improvement can be generalized to other 3D objects: because the CGH is a linear combination of all PSFs, the distortion will be an accumulation of all individual point distortions. Therefore, the quality difference will not be affected by the object’s shape, but only by the (average) point distance, as delineated in Section 2.3.
We also included a visual quality evaluation in Fig. 7. We reconstructed a view from the left and right by cropping a smaller aperture from the leftmost and rightmost side of the hologram. We can see that both views are well-preserved in the STFT method w.r.t. to the spatial method. Unfortunately, the WASABI method loses more information at the large viewing angles, resulting in darkening of the sides of Venus. The suppression of the high frequencies is visible in the Fourier magnitudes in Fig. 7. This can also be seen by investigating the sparsified PSF entries as seen in Fig. 8, showing that the high frequencies containing the large viewing angles are suppressed by the wavelets.
We present a framework for accelerating CGH by computing the hologram in the transform domain. We performed an analysis of the signal properties of holograms, described and implemented a technique for efficiently computing it on GPU and evaluate the performance on a point cloud. We report a speedup with factor of 30 over the conventional approach, and improve with 2dB over the WASABI method, with better preservation of viewing angles.
European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013)/ERC Grant Agreement (n.617779 (INTERFERE)).
The Venus (Head sculpture) model is courtesy of Direct Dimensions Inc (www.dirdim.com).
References and links
2. Y. Pan, Y. Wang, J. Liu, X. Li, and J. Jia, “Fast polygon-based method for calculating computer-generated holograms in three-dimensional display,” Appl. Opt. 52, A290–A299 (2013). [CrossRef] [PubMed]
3. K. Matsushima, M. Nakamura, and S. Nakahara, “Silhouette method for hidden surface removal in computer holography and its acceleration using the switch-back technique,” Opt. Express 22, 24450–24465 (2014). [CrossRef] [PubMed]
4. J.-H. Park, S.-B. Kim, H.-J. Yeom, H.-J. Kim, H. Zhang, B. Li, Y.-M. Ji, S.-H. Kim, and S.-B. Ko, “Continuous shading and its fast update in fully analytic triangular-mesh-based computer generated hologram,” Opt. Express 23, 33893–33901 (2015). [CrossRef]
5. M. E. Lucente, “Interactive computation of holograms using a look-up table,” Journal of Electronic Imaging 2, 28–34 (1993). [CrossRef]
7. Y. Pan, X. Xu, S. Solanki, X. Liang, R. B. A. Tanjung, C. Tan, and T.-C. Chong, “Fast cgh computation using s-lut on gpu,” Opt. Express 17, 18543–18555 (2009). [CrossRef]
9. A. Symeonidou, D. Blinder, A. Munteanu, and P. Schelkens, “Computer-generated holograms by multiple wavefront recording plane method with occlusion culling,” Opt. Express 23, 22149–22161 (2015). [CrossRef] [PubMed]
10. N. Takada, T. Shimobaba, H. Nakayama, A. Shiraki, N. Okada, M. Oikawa, N. Masuda, and T. Ito, “Fast high-resolution computer-generated hologram computation using multiple graphics processing unit cluster system,” Appl. Opt. 51, 7303–7307 (2012). [CrossRef] [PubMed]
11. H. Niwase, N. Takada, H. Araki, H. Nakayama, A. Sugiyama, T. Kakue, T. Shimobaba, and T. Ito, “Real-time spatiotemporal division multiplexing electroholography with a single graphics processing unit utilizing movie features,” Opt. Express 22, 28052–28057 (2014). [CrossRef] [PubMed]
12. H. Niwase, N. Takada, H. Araki, Y. Maeda, M. Fujiwara, H. Nakayama, T. Kakue, T. Shimobaba, and T. Ito, “Real-time electroholography using a multiple-graphics processing unit cluster system with a single spatial light modulator and the infiniband network,” Optical Engineering 55, 55–56 (2016). [CrossRef]
14. D. Arai, T. Shimobaba, T. Nishitsuji, T. Kakue, N. Masuda, and T. Ito, “An accelerated hologram calculation using the wavefront recording plane method and wavelet transform,” Optics Communications 393, 107–112 (2017). [CrossRef]
15. N. Okada, T. Shimobaba, Y. Ichihashi, R. Oi, K. Yamamoto, M. Oikawa, T. Kakue, N. Masuda, and T. Ito, “Band-limited double-step fresnel diffraction and its application to computer-generated holograms,” Opt. Express 21, 9192–9197 (2013). [CrossRef] [PubMed]
17. M. Yamaguchi, H. Hoshino, T. Honda, and N. Ohyama, “Phase-added stereogram: calculation of hologram using computer graphics technique,” (SPIE,1993).
21. J. W. Goodman, Introduction to Fourier Optics (Roberts and Company Publishers, 2004), 3rd ed.
22. K. Wolf, Geometric Optics on Phase Space, Theoretical and Mathematical Physics (SpringerBerlin Heidelberg, 2004).