Optica Publishing Group

Performance analysis of versatile video coding for encoding phase-only hologram videos

Open Access

Abstract

In recent decades, holographic technology has made significant progress with the development of novel hologram generation methods and three-dimensional rendering devices. Nevertheless, the substantial size of holographic data presents a significant challenge to its practical application and thus necessitates an efficient coding solution. In this study, we evaluate the efficiency of various coding tools within the state-of-the-art video coding standard, Versatile Video Coding, for encoding videos of computer-generated phase-only holograms. Specifically, we examine the coding performance of transform, in-loop filter, and screen-content coding tools. Through extensive encoding experiments and various statistical analyses, we investigate the limitations of existing standard codecs that do not account for the unique signal characteristics of phase-only holograms (POHs). The effects of coding artifacts on the visual quality of numerical reconstructions rendered from compressed POHs are also analyzed in detail. These comprehensive performance evaluations provide valuable insights for developing efficient coding strategies for POH videos.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Holography has rapidly advanced owing to the development of efficient hologram generation methods and three-dimensional (3D) rendering devices. Its applications include digital holographic microscopy, augmented reality displays, and entertainment [1]. A hologram is a physical medium that records interference patterns that can reproduce the 3D light field of a real object through diffraction. Unlike a photograph, which records only the amplitude of light, a hologram captures both the amplitude and the phase information of light. Specifically, the hologram records the patterns produced by the interference between the object and reference beams. The transmittance or refractive index of the medium is modulated by the brightness and darkness of the interference pattern. When the same reference beam is incident on the hologram, the original wavefronts of light can be completely reconstructed by diffraction, and the volumetric object is displayed with its restored 3D shape, without a vergence-accommodation conflict [2].

Holograms can be classified into two types based on their method of generation [3]. The first type is the optically captured hologram (OCH), in which the interference patterns are recorded on a film made of a special material, such as silver halide. However, these films are not reusable once the patterns have been recorded. The second type is the computer-generated hologram (CGH), a more flexible approach that stores digitized (quantized) patterns by simulating an appropriate propagation model for a point-cloud light source representing virtual objects. A spatial light modulator (SLM) that electrically modulates the wavefronts may then be used as the display device [4].

Digital holography is regarded as a premier technology for three-dimensional representation, offering a truly immersive viewing experience. However, the huge data size of holograms hinders their commercial utilization [5]. Specifically, it is important to attain an adequate viewing angle for the virtual objects rendered by holograms. This can be accomplished by using a small pixel pitch (i.e., the size of a unit pixel), as the viewing angle is inversely proportional to the pixel pitch of a holographic display. Hence, a high resolution is necessary, which significantly expands the hologram's data size. Moreover, the quantization of floating-point raw hologram data is typically carried out with a high number of bits per pixel (bpp) to preserve maximum information in the reconstructed object over a deep depth range, which further amplifies the data size. Table 1 compares the data size of a 4K ultra-high-definition (UHD) image at 10 bpp with that of a hologram image. A resolution of 4K at 10 bpp is commonly sufficient for good quality in natural images, whereas a resolution of 16K at 16 bpp is required to attain sufficient quality when a hologram is rendered; we therefore compared these two commonly used cases. The data size of the latter is approximately 52 times larger than that of the former. Hence, it is imperative to devise a highly efficient coding method for digital holograms.
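The roughly 52-fold figure can be checked with simple arithmetic. The frame dimensions below (3840×2160 for 4K UHD, 16384×16384 for the 16K hologram, three color channels each) are assumed from the text:

```python
def frame_size_bits(width, height, channels, bpp):
    """Raw (uncompressed) size of one frame in bits."""
    return width * height * channels * bpp

uhd_4k = frame_size_bits(3840, 2160, 3, 10)      # 4K UHD image at 10 bpp
holo_16k = frame_size_bits(16384, 16384, 3, 16)  # 16K hologram at 16 bpp

print(holo_16k / uhd_4k)  # ~51.8, i.e., roughly 52 times larger
```

At 16K/16 bpp a single raw color frame is about 1.5 GiB, which makes the need for efficient compression immediate.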


Table 1. Comparison of data size between a 4K image and hologram image.

Over the past decades, multiple endeavors have been made to efficiently encode hologram data. Unlike traditional photography, holograms possess distinct signal properties. In particular, due to the prevalence of low-frequency components in natural images, significant compaction of energy can be achieved in the frequency domain, enabling the use of discrete cosine/sine transforms (DCT/DST) for effective transform coding. Furthermore, a majority of images exhibit objects in horizontal and vertical arrangements. This attribute can be leveraged in a predictive coding framework to significantly enhance coding efficiency. However, holograms exhibit completely distinct signal characteristics from natural photography. For instance, the complex fringe interference patterns of holograms are highly random, exhibiting numerous high-frequency components and weak directionality. Thus, conventional coding techniques are inadequate in achieving desirable coding results for digital hologram data.

Many studies have been conducted on the compression of holograms to take advantage of their inherent characteristics. In a previous study [6], the authors applied entropy coding methods such as LZ77, LZW, and Huffman encoding to phase-shifting holograms. Nonuniform and vector quantization schemes were proposed in [7], considering the difference between the dynamic range of holograms and natural images. The quantization of wavelet coefficients was proposed in [8], while nonuniform lossless coding using companding quantization was introduced in [9]. Compressive sensing-based methods using a sparse matrix representation were introduced in [10]. With regard to transform coding, discrete wavelet transform and its variants were proposed in [11] to further tailor the adaptability to local geometries and improve the coding performance. The authors also proposed a unitary transform for modeling deep holographic scenes using a generalization of linear canonical transforms [12]. Other approaches include the vector lifting method [13], matching pursuits-based method [14], and a lossy coding method that used row/column-based uniform downsampling with spline interpolation [15].

The JPEG community is working on a new project named JPEG Pleno Holography [16], which aims to develop an efficient coding method for hologram images and to establish the final standard in 2024. For this endeavor, two coding scenarios for holograms were introduced: hologram-domain coding and object-domain coding, as depicted in Fig. 1 [17]. In hologram-domain coding, the hologram itself is encoded, and the decoded hologram is then rendered to a numerical reconstruction (NR) using an appropriate propagation model, such as the Fresnel or angular spectrum method. In object-domain coding, the uncompressed hologram is first rendered to an NR, which is then encoded. However, in the latter scenario, the decoded NR must be backpropagated to the hologram domain in order to generate other NRs with different reconstruction distances. In addition, determining the appropriate reconstruction distance for encoding a specific NR remains a subject of debate. On the other hand, in the former scenario, once the hologram is encoded, an NR at any reconstruction distance can be generated from the decoded hologram without iterative forward/backward propagation. For these reasons, the JPEG Pleno standardization group has provided guidelines recommending the mandatory use of the hologram-plane coding pipeline, while object-plane coding is optional for proponents [17]. Therefore, in this paper, we conducted encoding experiments in the hologram domain.
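To make the rendering step concrete, here is a minimal NumPy sketch of the angular spectrum method for propagating a decoded hologram to an NR. The grid size is illustrative, and the unit-amplitude assumption reflects that a POH stores only phase; the wavelength, pixel pitch, and distance match the values given later for the ETRI dataset:

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pitch, z):
    """Propagate a complex hologram field by distance z [m] using the
    angular spectrum method; field is sampled at the given pixel pitch [m]."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pitch)
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    # Transfer function; evanescent components (negative argument) are zeroed.
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    H = np.where(arg > 0, np.exp(2j * np.pi * z * np.sqrt(np.maximum(arg, 0))), 0)
    return np.fft.ifft2(np.fft.fft2(field) * H)

# A decoded POH carries only phase; reconstruction assumes unit amplitude.
poh_phase = np.random.default_rng(0).uniform(0, 2 * np.pi, (256, 256))
nr = np.abs(angular_spectrum_propagate(np.exp(1j * poh_phase), 532e-9, 8e-6, 0.25))
```

Because propagation is a single FFT-filter-IFFT pass, NRs at arbitrary distances can be produced from one decoded hologram, which is the practical advantage of the hologram-domain pipeline noted above.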


Fig. 1. Two scenarios for hologram encoding.


Hologram image compression using existing image/video coding standards has also been actively studied. In [18], the authors conducted a comprehensive benchmarking study on hologram image coding by comparing several standard codecs, such as JPEG, JPEG2K, H.264/AVC-Intra [19], and HEVC-Intra [20]. They also performed benchmarking comparisons on several holographic databases with different representation formats, such as amplitude, phase, real, and imaginary formats. The experimental results can be summarized as follows. 1) In both the hologram and object domains, HEVC-Intra, H.264/AVC-Intra, JPEG2K, and JPEG exhibited high coding performance, in that order. 2) The amplitude component is the easiest to encode, while the phase component is usually the most difficult because it is highly random. 3) The real and imaginary components have similar coding efficiency. 4) Coding in the object domain can potentially outperform coding in the hologram domain, owing to the higher levels of spatial/temporal correlation.

The authors in [18] conducted experimental comparisons and related analyses to evaluate the coding performance of existing image compression standards on holographic data. However, a thorough examination of the performance of each coding tool within the latest standard codec, Versatile Video Coding (VVC), is still lacking. Such an examination is crucial for researchers to comprehend the constraints of state-of-the-art codecs and identify areas for improvement. Furthermore, while the performance of holographic image coding has been evaluated [18], corresponding assessments for holographic video coding are missing. In this work, we examine the performance of various coding tools, with a particular focus on transform, in-loop filter, and screen-content coding tools, in VVC, the latest and most efficient video coding standard, for holographic videos, to furnish insights for the development of new techniques that enhance coding efficiency. To this end, we conducted comprehensive tool-on and tool-off tests on a computer-generated phase-only hologram (POH) video dataset and performed an in-depth analysis to identify the limitations of existing video coding standards. In addition, we analyzed the influence of coding artifacts in compressed holograms on the perceptual quality of NRs. Distinct from the typical coding artifacts found in standard videos, the artifacts in NRs exhibit unique characteristics, such as a decrease in average brightness, the dispersion of speckle noise, and the appearance of extraneous fake objects. These factors must be taken into account when designing a phase-only hologram coding scheme.

The remainder of this paper is organized as follows. In Section 2, we introduce a computer-generated POH video dataset from the Electronics and Telecommunications Research Institute (ETRI). The overall framework of the video coding standard is described in Section 3. Next, the performance of several coding tools in VVC is presented and analyzed in Section 4, and the analysis of the subjective video quality of NRs is detailed in Section 6. Finally, concluding remarks are provided in Section 7.

2. ETRI phase-only hologram video dataset

Because light waves are expressed in the form of a complex number, digital holograms can be represented in various formats for efficient encoding or rendering. For example, a full-complex hologram can be in either a real-imaginary form or an amplitude-phase form. There is also an interferogram format that utilizes a set of interference patterns with three different phase shifts (0, $\pi /2$, and $\pi$) of the reference wave. The JPEG Pleno Holography activity has made available a collection of hologram image databases, procured from a variety of academic institutions and companies, for research purposes [21]. The databases contain different types of full-complex holograms, such as monochrome holograms, color holograms, CGHs, and OCHs, with different resolutions (972$\times$972 to 16384$\times$16384) and scene depths.

Because available spatial light modulators (SLMs) can modulate only the amplitude or the phase signal individually, the use of amplitude-only holograms (AOHs) or POHs is more practical. It is known that AOHs incur a DC bias and twin images during the reconstruction process, which can reduce image quality. On the other hand, POHs, which modulate only the phase, can mitigate these issues because they are theoretically free of zeroth-order diffraction and twin images, although such artifacts may still appear owing to the less-than-optimal efficiency of presently available phase SLMs. For these reasons, we used an ETRI dataset in the form of POHs for the encoding experiments in this work. For the details of the ETRI dataset, please refer to [22]. The ETRI dataset consists of six computer-generated POHs derived from 3D content. The process of acquiring a computer-generated POH is depicted in Fig. 2. First, the 3D content is rendered as an RGB image and a depth map. The RGB+depth content is sliced into multiple depth layers, parallel to the hologram plane, using a layer-based approach. Because each layer is parallel to the hologram plane, the Fresnel propagation model can be applied to calculate the propagation of light between the layer and the hologram plane. By performing this process for all layers and superimposing the results, the full-complex hologram of the virtual object is generated [23]. Subsequently, the random phase method [24] was applied to obtain the POH from the full-complex hologram, discarding only the amplitude component. In the generation of the ETRI dataset, phase values in the range of 0 to 2$\pi$ were quantized to a bit depth of 8 bpp (values between 0 and 255). Each POH is a video hologram of 33 frames, with each consecutive frame capturing the object scene at an increment of 1°. This means that the entire sequence covers 33° of the 3D virtual scene. The resolution of all POHs is 1920 $\times$ 1080. The wavelengths of the red, green, and blue components used in the CGH generation process are 660 nm, 532 nm, and 473 nm, respectively. The pixel pitch is 8 $\mu$m, and the reconstruction distance from the hologram plane is 0.25 m.
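The final step above, mapping phases in $[0, 2\pi)$ to 8-bit integers, can be sketched as follows. The function names are ours; the actual ETRI pipeline is detailed in [22]–[24]:

```python
import numpy as np

def complex_to_poh_8bit(hologram):
    """Convert a full-complex hologram to an 8-bit phase-only hologram,
    discarding the amplitude component entirely."""
    phase = np.angle(hologram) % (2 * np.pi)          # wrap into [0, 2*pi)
    return np.round(phase / (2 * np.pi) * 255).astype(np.uint8)

def poh_to_complex(poh_8bit):
    """Inverse mapping used before numerical reconstruction (unit amplitude)."""
    return np.exp(1j * poh_8bit.astype(np.float64) / 255 * 2 * np.pi)
```

The quantization error per pixel is bounded by half a step, i.e., $\pi/255$ radians, which is why 8 bpp suffices for the phase channel here even though full-complex data is often stored at higher bit depths.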


Fig. 2. Process of acquiring computer-generated POH.


Fig. 3. ETRI computer-generated phase-only hologram dataset [22]. (first and third rows) The center (17th) frames of the six POHs called bumbameuboi, bunny, can, cube, dragon, and head, respectively. (second and fourth rows) The corresponding numerical reconstructions.


Table 2. Summary of the ETRI dataset of computer-generated phase-only holograms.

3. Overall framework of video coding standard

In this section, we summarize the overall framework of VVC, as shown in Fig. 4. For decades, video coding standards have been developed jointly by ISO/IEC MPEG and ITU-T VCEG, with each generation aiming to double the coding efficiency of its predecessor and to enable high-speed parallel processing. Standard codecs adopt a block-based coding scheme, whereby the image frames are divided into basic, non-overlapping units known as coding tree units (CTUs). These units are processed by the encoder in raster scan order. In addition, standard codecs are grounded in a hybrid coding scheme that sequentially executes several core encoding modules. To be more specific, 1) either intra- or inter-prediction is performed to generate the predictor ($P$) that best fits the original signal ($O$), and the residual signal with the spatial or temporal redundancy removed is computed ($R=O-P$). 2) The residual signal is then fed into a transform module to obtain the energy-compacted coefficients ($C$) in the low frequencies. 3) Taking into account the human visual system's (HVS) relative insensitivity to high-frequency components, a quantization module achieves lossy compression by permitting errors in the transform coefficients, with the compression level being regulated by the quantization parameter (QP). 4) Subsequently, an entropy coding engine, such as context-adaptive binary arithmetic coding (CABAC), generates a bitstream by removing statistical redundancy from the quantized transform coefficients ($\hat{C}$). 5) Finally, in-loop filters, such as luma mapping with chroma scaling (LMCS), the deblocking filter (DF), sample adaptive offset (SAO), and the adaptive loop filter (ALF), are applied in sequence to reduce blocking artifacts and reconstruction errors and further improve coding efficiency.
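Steps 1)–3) and their inverses can be illustrated with a toy block codec. This is a sketch, not the normative VVC process: an orthonormal floating-point DCT stands in for VVC's integer transforms, and a direct quantization step size is used instead of the normative QP-to-step mapping (approximately $2^{(QP-4)/6}$):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are frequency vectors)."""
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] /= np.sqrt(2)
    return M * np.sqrt(2 / n)

def hybrid_encode_block(O, P, qstep):
    """One pass of the hybrid loop for an NxN block:
    residual -> transform -> quantize -> dequantize -> inverse -> reconstruct."""
    n = O.shape[0]
    D = dct_matrix(n)
    R = O - P                       # 1) prediction residual
    C = D @ R @ D.T                 # 2) separable 2D transform
    C_hat = np.round(C / qstep)     # 3) quantization (the lossy step)
    R_hat = D.T @ (C_hat * qstep) @ D  # dequantize and inverse transform
    return P + R_hat                # reconstruction fed to the in-loop filters
```

Increasing `qstep` (i.e., the QP) coarsens the coefficients and lowers both the bitrate and the reconstruction fidelity, which is the trade-off the later RD-curve discussion measures.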


Fig. 4. Overall encoder framework of VVC.


Recently, the Joint Video Expert Team (JVET), organized by ISO/IEC MPEG and ITU-T VCEG, finalized the next-generation video coding standard, VVC, in response to the growing demand for enhanced video compression and support for a wider range of media content, such as high dynamic range/wide color gamut video, 360° video, and computer-generated or screen-captured video. VVC incorporates a novel block structure and several innovative coding tools, which are thoroughly described in [25]. As a result, VVC is known to achieve a substantial reduction in bit rate, approximately 40$\sim$50%, compared to HEVC while maintaining the same visual quality for natural videos.

4. Performance analysis of VVC for phase-only holograms

4.1 Encoding experiment setup

Six POHs of the ETRI dataset were used in the experiments. To achieve a thorough evaluation of performance, encoding was performed on the Main10 profile using the VVC reference software VTM18.0 [26], with four QP values (37, 42, 47, and 51) and two encoding configurations, AI (All Intra) and RA (Random Access with a group-of-pictures size of 32). It is worth mentioning that conventional video coding usually employs four QP values (22, 27, 32, 37). However, these low QP values do not adequately distinguish the subjective visual quality of the NRs generated from compressed holograms, which is why higher QP values were chosen in our study. In generating the anchor test results, we also compared the encoding results for POHs under different color spaces (YCbCr and RGB) and chroma subsampling methods (4:4:4 and 4:2:0 for YCbCr). As shown in Fig. 2, the POH image was initially generated in the RGB 4:4:4 format. The color space conversion from RGB 4:4:4 to YCbCr 4:4:4 was performed by applying mapping functions that separate the luminance (Y) and chrominance (Cb, Cr) components. Furthermore, the chroma subsampling from YCbCr 4:4:4 to YCbCr 4:2:0 was performed by downsampling the two chrominance components (Cb and Cr) by a scale factor of 0.5 in each dimension while preserving the luminance component (Y). For the remaining encoding conditions, we adhered to the Common Test Conditions (CTC) of VVC [27]. To evaluate the effectiveness of several coding tools in VVC, we performed turn-off tests, as shown in Table 3. A turn-off test assesses the efficacy of a specific coding tool: the target tool is initially activated as the default configuration in the reference codec, the same tool is deactivated in the test codec, and the two encoding results are compared to observe the impact of the tool on coding performance in terms of the Bjontegaard delta rate (BD-rate) [28].
BD-rate is a widely used metric in the field of video compression that quantifies the reduction in bitrate at the same visual quality level. Negative and positive BD-rate values indicate a coding gain and a coding loss, respectively, of the test codec over the reference codec. The greater the absolute BD-rate value, the more pronounced the difference in coding performance between the reference and test codecs. To assess the suitability of VVC, a codec optimized for natural videos, for encoding holographic videos, we compared the encoding outcomes of the ETRI dataset with those of the CTC video sequences. The CTC sequences encompass various natural videos of differing resolutions and were employed during the development of the VVC codec [27].
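The BD-rate computation can be sketched as follows, using the common cubic-polynomial formulation of Bjontegaard's method (production implementations typically use piecewise-cubic interpolation instead, so treat this as illustrative):

```python
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Bjontegaard delta rate [%]: average bitrate difference at equal quality.
    Fits a cubic polynomial of log-rate vs. PSNR for each codec and integrates
    the gap over the overlapping quality interval."""
    p_ref = np.polyfit(psnr_ref, np.log10(rate_ref), 3)
    p_test = np.polyfit(psnr_test, np.log10(rate_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (10 ** avg_diff - 1) * 100  # negative => test codec saves bitrate
```

For example, a test codec that needs 10% less bitrate than the reference at every quality level yields a BD-rate of exactly -10%.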


Table 3. List of turn-on and turn-off encoding tests on the ETRI dataset using the AI and RA configurations of VVC for the Main10 or Main10-444 profile. The selected coding tools are marked with a ‘$\checkmark$’ symbol (transform-related and in-loop filter-related tools for the turn-off test, and the SCC-related tool for the turn-on test).

4.2 Statistical analysis of intra-/inter-mode selections

An analysis of the distribution of intra or inter-mode coding units (CUs) can provide valuable insights into the prediction types that may be effective for encoding hologram data. This information can help guide the design and optimization of encoding algorithms and techniques, leading to more effective and efficient hologram data compression. To this end, we compared the mode selection statistics resulting from encoding six POH videos of the ETRI dataset and six CTC video sequences of 4K resolution (class-A). The experiments were performed for the AI and RA configurations, respectively. For intra prediction, we categorized the statistics of the chosen modes into three groups: 1) angular modes, 2) non-angular modes (DC and Planar), 3) matrix-based intra prediction (MIP) mode. Note that, in VVC, intra prediction can be executed utilizing a single reference line or multiple reference lines (MRLP) for the angular modes and DC mode. For inter prediction, the selected modes were simply categorized into two categories: 1) skip mode and 2) the other inter-modes (including advanced motion vector prediction (AMVP), subblock-based temporal motion vector prediction (SBTMVP), Merge, Affine and so on).

The results are shown in Table 4, Table 5 and Fig. 5. Figure 5(a) presents a visual representation of the selected modes for a single frame of the POH encoded using the AI configuration. The CUs marked with circle, square, and arrow symbols indicate that these are coded in DC, Planar, and Angular modes, respectively, and unmarked coding units use the MIP mode. As shown in Table 4, the distribution of intra-modes of POHs differs significantly from that of natural videos. For example, the DC mode (39.5%) was selected much more frequently than the Planar mode (6.5%) in the case of POHs, whereas the corresponding percentages for the CTC sequences were 6.2% and 30.4%, respectively. For the DC mode, a single predictor value, which is the average of the adjacent reference samples of the current coding unit, is used. Conversely, for the Planar mode, predictor values are obtained by a position-dependent linear combination of four reference samples, which is suitable for local areas with gradually changing pixel values. As demonstrated in Fig. 3, the POH is unsuitable for the Planar mode because the correlation between neighboring pixels is very low. Additionally, the total percentage of angular modes and MIP mode selected for POHs was approximately 10% lower than that for the CTC sequences. This indicates that the directional prediction by such modes is ineffective for POHs due to their intricate fringe patterns, particularly in the background areas that are primarily composed of speckle noise.
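To make the contrast between the two non-angular modes concrete, here is a simplified floating-point sketch of the DC and planar predictors (HEVC-style planar; VVC's normative versions use integer arithmetic and additional refinements such as PDPC, so this is illustrative only):

```python
import numpy as np

def dc_predict(top, left):
    """DC mode: one flat predictor value, the mean of the reference samples."""
    return np.full((len(left), len(top)), np.mean(np.concatenate([top, left])))

def planar_predict(top, left, top_right, bottom_left):
    """Simplified planar mode: a position-dependent blend of horizontal and
    vertical linear interpolations across the block."""
    n = len(top)
    pred = np.zeros((n, n))
    for y in range(n):
        for x in range(n):
            h = (n - 1 - x) * left[y] + (x + 1) * top_right   # horizontal ramp
            v = (n - 1 - y) * top[x] + (y + 1) * bottom_left  # vertical ramp
            pred[y, x] = (h + v) / (2 * n)
    return pred
```

Planar pays off only when pixel values vary smoothly toward the references; for the noise-like fringe patterns of POHs the flat DC average is the safer predictor, consistent with the 39.5% versus 6.5% split reported above.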


Fig. 5. Graphical examples of the selected modes for the POH (bumbameuboi) using QP37. (a) An encoded frame using the AI configuration, where the Planar, DC, and Angular intra-modes are respectively represented by the square, circle, and arrow symbols. CUs that do not have any markers are encoded using the MIP mode. (b) An encoded frame using the RA configuration, where the color blue represents CUs coded in intra-modes, while the red and green represent CUs coded in inter-modes. The red arrow and the green arrow represent the motion vectors in the reference picture lists L0 and L1, respectively.


Table 4. The ratio of coding units (CUs) encoded with four intra-modes for the All-Intra (AI) configuration: six POH videos of the ETRI dataset and six CTC test sequences (Class A) from VVC were encoded using VTM18.0 with a YCbCr-4:2:0 format. The intra-modes were divided into three categories, non-angular modes (Planar / DC), angular modes, and matrix-based intra prediction (MIP) mode. For more detailed information on the intra-modes, please refer to [25].


Table 5. The ratio of coding units (CUs) encoded with intra-mode or inter-mode for the Random Access (RA) configuration: six POH videos of the ETRI dataset and six CTC test sequences (Class A) from VVC were encoded using VTM18.0 with a YCbCr-4:2:0 format. The intra-modes were divided into three categories, non-angular modes (Planar / DC), angular modes, and matrix-based intra prediction (MIP) mode. The inter-modes are categorized into two groups: skip mode and the other modes (AMVP, SBTMVP, Merge, Affine, and so on). For more detailed information on the intra-/inter-modes, please refer to [25].

In the RA configuration, the skip mode (70.2%), the other inter-modes (19.7%), and intra-modes (10.1%) were selected for POHs in that order of frequency, as shown in Table 5. The corresponding percentages for the CTC sequences were 53.3%, 27.1%, and 19.6%, respectively. An example of mode selection for a single POH frame is shown in Fig. 5(b), where blue-colored coding units denote those coded in intra-modes, while the other colors represent those coded in inter-modes. Here, the skip mode is a type of inter-mode that reconstructs the current coding unit using only the predictor from adjacent coding units, without signaling motion information or a residual signal. Therefore, it is often chosen when the signaling overhead outweighs the distortion in the rate-distortion optimization (RDO) process, as in low-bitrate encoding. Meanwhile, the statistics of the selected inter-modes for POHs differ notably from those of the CTC sequences. For example, the skip mode is selected far more frequently than the other inter-modes, such as AMVP, Merge, and Affine. This implies that the motion-compensated inter-prediction scheme is ineffective for POHs, and the encoder is inclined to select inter-modes that primarily focus on bitrate reduction (i.e., the skip mode).

In summary, the intra-/inter-prediction tools of existing video coding standards are inefficient for POH coding. This is mainly because POHs generally have weak spatial and temporal correlations between neighboring pixels: the spatial correlation is weakened by the random interference patterns, while speckle noise hinders the temporal correlation. The ineffectiveness of the prediction tools has a detrimental effect on overall coding performance. Specifically, Fig. 6 shows the average RD-curves of six POHs in the YCbCr 4:2:0 format using HM16.22 (the reference software of HEVC) and VTM18.0 with the AI and RA configurations. First, unlike the typical RD-curve of a normal video, which rises rapidly in the low-bpp range and saturates in the high-bpp range, the RD-curves of POHs exhibit a linear relationship between bpp and PSNR. This means that the coding performance is mainly influenced by the QP values involved in quantization, while the other coding tools are less influential. Second, a comparison of the RD-curves of HEVC and VVC for POHs reveals that VVC has a slightly better coding gain than HEVC over the entire bpp range. It is worth noting that VVC is generally known to provide a BD-rate gain of up to 40% for CTC video sequences, but the coding gains of VVC for POHs are comparatively limited (-25.2% and -23.0% in the AI and RA configurations, respectively), implying that POHs are more challenging to compress than regular videos.


Fig. 6. RD-curves of the average results for the ETRI database encoded using HM16.22 and VTM18.0 in the AI and RA configurations.


4.3 Different color spaces and chroma subsampling methods

To encode normal video sequences, standard codecs typically use the YCbCr color space to decompose the signal into luminance (luma, Y) and chrominance (chroma, Cb and Cr) components. Taking advantage of the fact that the HVS is more sensitive to changes in brightness than to changes in color, the chroma components are subsampled to a quarter of the number of luma samples, i.e., the 4:2:0 format.
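As a sketch, the conversion and subsampling can be written as follows. Full-range BT.709 coefficients are assumed here; the actual CTC conversion follows the configured color primaries and limited-range offsets:

```python
import numpy as np

def rgb_to_ycbcr_bt709(rgb):
    """Full-range BT.709 RGB -> YCbCr for values in [0, 1]; Cb/Cr centered at 0.5."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    cb = (b - y) / 1.8556 + 0.5   # 1.8556 = 2 * (1 - Kb)
    cr = (r - y) / 1.5748 + 0.5   # 1.5748 = 2 * (1 - Kr)
    return np.stack([y, cb, cr], axis=-1)

def subsample_420(chroma):
    """4:4:4 -> 4:2:0 for one chroma plane by 2x2 averaging."""
    h, w = chroma.shape
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```

After 4:2:0 subsampling the two chroma planes together hold only half a luma plane's worth of samples, which is why the raw source is half the size of its 4:4:4 counterpart.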

To compare the effects of different video formats on POH coding performance, we conducted encoding experiments with the YCbCr-4:2:0, YCbCr-4:4:4, and RGB-4:4:4 formats using the Main10 or Main10-444 profile. The results are shown in Table 6 and Fig. 7. In a nutshell, among the tested input formats, YCbCr-4:2:0 is the most efficient, followed by YCbCr-4:4:4 and RGB-4:4:4. Specifically, for the luma component, YCbCr-4:2:0 significantly outperformed the other formats, providing BD-rate gains of 54.0% and 79.1% relative to YCbCr-4:4:4 and RGB-4:4:4, respectively, in the RA configuration. For the chroma components, the coding gains were even more substantial. The superiority of the YCbCr-4:2:0 format is to be expected, however, as chroma subsampling halves the size of the original source video compared to YCbCr-4:4:4 and RGB-4:4:4.


Fig. 7. RD-curves of the encoding results using combinations of two color spaces (YCbCr and RGB) and two chroma subsampling methods (4:4:4 and 4:2:0).


Table 6. Comparison of encoding results resulting from different color spaces and chroma subsampling methods for POHs using VTM18.0.

More interestingly, the YCbCr-4:4:4 format also exhibits a substantial coding gain of approximately 55.3% for the luma component compared to the RGB-4:4:4 format, despite having the same source video signal size. This implies that the YCbCr color space is a more suitable option than RGB for both normal video and POH coding. To analyze the reason for these results, the absolute Pearson correlation coefficient (PCC) between the three channels of the YCbCr and RGB color spaces was calculated for POHs and CTC sequences, as a measure of the pixel-wise covariance between two channel images. The average results are presented in Table 7. For normal CTC video sequences, the RGB format showed high PCC values of up to 0.90, while the YCbCr format had a maximum PCC value of 0.62 owing to the decorrelation of the luma and chroma channels through the color transform. For POHs, the RGB format demonstrated near-zero PCC values, while the YCbCr format had PCC values of up to 0.55. Thus, when encoding POHs in the RGB format, the near-zero correlations between channels reduced the coding gain, whereas the YCbCr format could improve the performance of coding tools that leverage inter-channel cross-correlation.
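The inter-channel PCC used here is simply the pixel-wise Pearson correlation between each pair of channels, which can be sketched as:

```python
import numpy as np

def channel_pcc(img):
    """Absolute Pearson correlation coefficients between the three channels
    of an HxWx3 image, for the pairs (0,1), (0,2), and (1,2)."""
    flat = img.reshape(-1, 3).T          # one row of pixel values per channel
    c = np.corrcoef(flat)                # 3x3 correlation matrix
    return abs(c[0, 1]), abs(c[0, 2]), abs(c[1, 2])
```

Values near 1 indicate redundant channels that cross-component tools can exploit, while values near 0 (as for RGB POHs) leave such tools nothing to work with.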


Table 7. Average absolute Pearson correlation coefficients between the three channels for POHs and CTC sequences.

4.4 Performance analysis of transform-related tool

POHs are characterized by a substantial amount of high-frequency components. It is therefore meaningful to examine the applicability of the transform tools in VVC to POH encoding, because the quantization procedure that follows the transform is designed to sacrifice high-frequency information, exploiting the human visual system's reduced sensitivity to such components.

In VVC, a set of transforms is employed to reduce the spatial correlation in prediction errors. For POHs, however, the residual signal exhibits limited spatial correlation and contains many high-frequency components. For instance, as demonstrated in Fig. 8, a comparison between the residual images and their corresponding histograms for a selected normal video (Tango) and POH (bumbameubui) reveals that POHs exhibit weak spatial correlation and a widely dispersed distribution of pixel values, in contrast to normal videos. Thus, for POHs, the transform is ineffective in terms of energy compaction: a large amount of the original signal's high-frequency information is lost during quantization, degrading the visual quality of the reconstructed video. To handle such cases, VVC provides the transform skip mode (TSM), which has been shown to improve coding efficiency by bypassing the transform on the prediction residual. An experimental evaluation of the TSM was conducted using the Main10 profile of VVC, with the results presented in Table 8 and Fig. 9. The results indicate that disabling TSM incurred BD-rate losses for the luma component of 0.9% and 8.0% in the AI and RA configurations, respectively, which demonstrates the usefulness of TSM for POH coding.
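The energy-compaction argument can be verified with a small sketch: a smooth residual concentrates its DCT energy in the low-frequency corner, whereas a noise-like residual, as in POHs, leaves it spread across all coefficients. The helpers `dct2` and `low_freq_energy` below are illustrative only, not part of VTM:

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II via the transform matrix."""
    n = block.shape[0]
    k = np.arange(n)
    C = np.sqrt(2 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] /= np.sqrt(2)
    return C @ block @ C.T

def low_freq_energy(block, keep=4):
    """Fraction of (mean-removed) signal energy in the keep x keep
    low-frequency corner of the DCT coefficient matrix."""
    coeffs = dct2(block - block.mean())
    return (coeffs[:keep, :keep] ** 2).sum() / (coeffs ** 2).sum()

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 16)
smooth = np.outer(x, x)                 # ramp-like residual (natural video)
noisy = rng.standard_normal((16, 16))   # noise-like residual (POH)

assert low_freq_energy(smooth) > 0.9    # energy compacts well
assert low_freq_energy(noisy) < 0.5     # energy stays dispersed
```

When the second case dominates, quantizing in the transform domain destroys signal rather than noise, which is exactly the situation TSM is designed to avoid.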


Fig. 8. The residual images and corresponding histograms of a single frame of (a) normal video (Tango) and (b) POH (bumbameubui).


Fig. 9. RD-curves of the turn-off test for the transform-skip mode.


Table 8. A summary of the turn-off test for the transform-skip mode.

4.5 Performance analysis of in-loop filters

VVC incorporates multiple in-loop filters, applied in the following order: LMCS, DF, SAO, and ALF, as illustrated in Fig. 4. While the DF and SAO were already employed in HEVC, the LMCS and ALF are newly introduced. These in-loop filters mitigate coding artifacts and enhance the quality of the reconstructed frame. Given the limitations of the prediction modules in encoding POHs, it is necessary to examine the coding performance of the in-loop filters. The encoding results for the four in-loop filters of VVC are summarized in Fig. 10 and Table 9.


Fig. 10. RD-curves of the turn-off tests for four in-loop filters of VVC using the Main profile, (a) LMCS, (b) DF, (c) SAO, and (d) ALF, and (e) all in-loop filters.


Table 9. A summary of the turn-off test for the in-loop filtering tools of VVC.

4.5.1 LMCS

The LMCS modifies the dynamic range of the input signal through a luma mapping function, with the inverse mapping applied to the reconstructed video prior to the other in-loop filters. Furthermore, chroma residual scaling adjusts the chroma signal in accordance with the luma mapping, effectively balancing the bits allocated to coding the luma and chroma samples. As shown in Table 9, the turn-off test produced no change in BD-rate, meaning that the LMCS has no effect on the coding performance. To understand this observation, the histograms of five natural images from the Kodak image dataset [29] were compared with those of six POHs from the ETRI dataset. The results, depicted in Fig. 11, reveal that the histograms of the POHs are uniformly distributed owing to the iterative CGH generation method used for the ETRI dataset, which may explain the lack of improvement from the LMCS.


Fig. 11. Histograms of (a) five natural images from the Kodak image dataset and (b) six POHs of the ETRI dataset.


4.5.2 Deblocking filter

The DF is an essential tool for reducing visible artifacts at block boundaries, enhancing both the subjective quality of the reconstructed video and the coding efficiency. The main deblocking filter design of HEVC is retained in VVC but has been updated to accommodate the new block structures and to support longer deblocking filters. Furthermore, the filter employs luma-adaptive deblocking to better handle a variety of video signals. As shown in Table 9, turning off the DF incurs a BD-rate loss of up to 14.0% in the RA configuration, whereas the AI configuration shows no discernible change in performance. This unexpectedly large gain in the POH encoding experiment remains unexplained for now, as the DF provides a BD-rate gain of only 1.04% in the RA configuration on CTC sequences [25]. Further investigation is necessary to shed light on the reason for this discrepancy.

4.5.3 SAO

The SAO filter is applied after the deblocking filter to bring the reconstructed frame closer to the original frame by adding an offset value to each pixel. In the VVC standard, the SAO filter is inherited unchanged from HEVC [30] and offers two modes. The first, band offset, divides the full range of pixel values into 32 bands and applies offset values to four consecutive bands to reduce errors. The second, edge offset, uses a 3 $\times$ 3 mask to classify the reconstructed pixel at the center of the mask into one of four directions and applies an offset value based on the relationships between adjacent pixel values. In the turn-off tests on POHs, disabling SAO resulted in marginal coding losses of 0.9% and 0.2% in the AI and RA configurations, respectively. This may be because the subsequent ALF, which serves a similar purpose as SAO (as explained in the following subsection), compensates for most of the coding gain achieved by SAO. Table 10 lists the mode selection ratio for the luma component when encoding POHs. The edge offset mode was selected up to 54.8% of the time, while the band offset mode was rarely selected. The band offset mode is not a preferred option because the speckle noise leaves no homogeneous regions, and this lack of uniformity makes the band offset mode less effective for encoding POHs.
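For illustration, a simplified per-pixel sketch of the edge-offset classification along one of the four directions is given below; the category numbering follows the HEVC/VVC convention, but the helper `eo_category` itself is our own illustration:

```python
def eo_category(a, c, b):
    """SAO edge-offset category of center pixel c, given its two
    neighbors a and b along one direction of the 3x3 mask."""
    if c < a and c < b:
        return 1                  # local valley: positive offset pulls it up
    if (c < a and c == b) or (c == a and c < b):
        return 2                  # concave corner
    if (c > a and c == b) or (c == a and c > b):
        return 3                  # convex corner
    if c > a and c > b:
        return 4                  # local peak: negative offset pulls it down
    return 0                      # monotonic region: no offset applied

assert eo_category(5, 3, 5) == 1  # valley
assert eo_category(3, 5, 3) == 4  # peak
assert eo_category(3, 4, 5) == 0  # monotonic
```

The signaled offsets then nudge each category toward the original signal; in speckle-dominated POHs, almost every pixel is a local extremum, which is consistent with the strong preference for the edge offset mode in Table 10.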


Table 10. Selection ratio of SAO modes for luma component in POH encoding.

4.5.4 ALF

The ALF enhances the reconstructed video signal through a Wiener filtering approach at the encoder. It derives the optimal filter coefficients that minimize the error between the original and restored videos, and encodes the derived coefficients for transmission. The filter has a diamond-shaped region of support of 7 $\times$ 7 for the luma component and 5 $\times$ 5 for the chroma components. The ALF classifies a current block into one of 25 classes based on the directionality and activity of the pixels within the block: local gradients are calculated from the surrounding pixels and classified into five directionalities, each further divided into five activity levels. In addition, the ALF clips the difference between the current pixel and its surrounding pixels, introducing nonlinearity into the filtering; this limits the influence of the surrounding pixels when the difference is large. The ALF is known to yield BD-rate gains of 2.31% and 4.35% in turn-on tests on the CTC test sequences for AI and RA, respectively [25]. However, in the turn-off tests on POHs, the corresponding BD-rate losses are significantly larger, at 7.8% and 9.8%. This indicates that in-loop filtering is crucial for mitigating the poor performance of the prediction modules in POH encoding. Notably, the ALF is an exceptionally effective coding tool even for highly random video sources.
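The Wiener derivation above amounts to a least-squares fit. The toy sketch below uses a square 3 $\times$ 3 support instead of VVC's diamond shapes and omits the classification and clipping steps; the helper `wiener_coeffs` is our own illustration:

```python
import numpy as np

def wiener_coeffs(recon, orig):
    """Least-squares (Wiener) FIR coefficients over a 3x3 support,
    minimizing ||orig - filter(recon)||^2, as ALF does per class."""
    h, w = recon.shape
    # Each column of A is one shifted copy of the reconstruction.
    A = np.stack([recon[i:h - 2 + i, j:w - 2 + j].ravel()
                  for i in range(3) for j in range(3)], axis=1)
    b = orig[1:h - 1, 1:w - 1].ravel()
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return c  # 9 coefficients, one per 3x3 position

rng = np.random.default_rng(0)
orig = rng.standard_normal((32, 32))
recon = orig + 0.1 * rng.standard_normal((32, 32))  # simulated coding noise

c = wiener_coeffs(recon, orig)
filtered = sum(c[3 * i + j] * recon[i:30 + i, j:30 + j]
               for i in range(3) for j in range(3))
mse_before = ((orig[1:31, 1:31] - recon[1:31, 1:31]) ** 2).mean()
mse_after = ((orig[1:31, 1:31] - filtered) ** 2).mean()
assert mse_after <= mse_before  # identity filter is in the feasible set
```

Because the identity filter is one admissible solution, the fitted filter can never do worse in-sample, which is why ALF remains effective even on sources as random as POHs.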

Finally, as shown in Table 9, disabling all four in-loop filters results in BD-rate losses of 10.7% and 32.0% for AI and RA, respectively.

4.6 Performance analysis of screen content coding (SCC) related tool

VVC provides screen-content coding tools, namely intra-block copy (IBC), palette coding, and adaptive color transform (ACT), which aim to improve the coding performance for special types of video content such as computer-generated graphics, text, and animation. Among these SCC-related tools, we analyze the effect of IBC on the POH coding performance because the POHs of the ETRI dataset are also computer-generated holograms. IBC [31] is an efficient block-matching technique that predicts the current block from reference blocks located in the reconstructed regions of the same picture. Because this tool adds a displacement-compensated coding method to intra prediction, which originally worked only through extrapolation from adjacent reference pixels, the prediction accuracy can be significantly improved. As shown in Fig. 12(a), IBC can be effective for screen content with repetitive patterns of textures or graphics within the same picture, which are also present in POHs. For example, as shown in Fig. 12(b), owing to the predominantly repetitive patterns within POHs, there is a high likelihood of finding a suitable match for the current coding unit (CU) in the reconstructed areas. In [32], a turn-on test for IBC was conducted on Text and Graphics with Motion (TGM) and natural content using VTM 9.0 with the YCbCr-4:2:0 format. TGM, which represents screen content, achieved significant improvements of 45.6% and 36.7% for AI and RA, respectively, in terms of the YCbCr average, whereas the natural sequences achieved minimal improvements of 1.6% and 0.1%. Unlike the previous turn-off tests, we performed a turn-on test for IBC, since it is disabled by default in VTM. Table 11 demonstrates that IBC provides significant BD-rate gains of 14.0% and 10.7% for the luma component in the AI and RA configurations, respectively. These results confirm that IBC is a highly effective tool for encoding POHs.
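The block-matching idea behind IBC can be sketched as an exhaustive SAD search restricted to a causal (already reconstructed) area. The helper `ibc_search` below is our own illustration and ignores VVC's actual search-range and memory constraints:

```python
import numpy as np

def ibc_search(recon, cur, top, left):
    """Find the block vector minimizing the SAD between the current
    block and candidate blocks in the rows fully above the current CU."""
    bh, bw = cur.shape
    best, bv = np.inf, (0, 0)
    for y in range(0, top - bh + 1):               # causal area only
        for x in range(0, recon.shape[1] - bw + 1):
            sad = np.abs(recon[y:y + bh, x:x + bw] - cur).sum()
            if sad < best:
                best, bv = sad, (y - top, x - left)
    return bv, best

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, (64, 64)).astype(np.int64)
frame[40:48, 40:48] = frame[8:16, 8:16]   # repeated pattern, as in POHs
bv, sad = ibc_search(frame, frame[40:48, 40:48], top=40, left=40)
assert sad == 0 and bv == (8 - 40, 8 - 40)  # exact match found at (8, 8)
```

When the fringe pattern repeats, as in Fig. 12(b), a near-zero SAD match like this is often available, which explains the large BD-rate gains in Table 11.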


Fig. 12. An example of intra-block copy: (a) screen content and (b) POH.


Table 11. A summary of the turn-on test for IBC: measuring BD-rate results using VTM18.0 between the reference codec and test codec with the IBC tool turned-on.

5. Additional performance analysis of VVC for POH with different characteristics

In this section, we evaluate the performance of VVC on a more comprehensive dataset that includes backgrounds, multiple objects, various motion patterns, and mutually occluding objects. For this purpose, we selected three B-COM holograms (Ballet4k, Breakdancers2k, and Cars2k) from [33]. Cars2k was introduced in [34], while Ballet4k and Breakdancers2k were created as holographic videos from the multiview-plus-depth data of [35] using the method described in [36]. Ballet4k and Breakdancers2k have full backgrounds and a variety of object movements, whereas Cars2k, which has no background, shows various motions of the objects along a specified path. All three B-COM holograms consist of multiple objects that mutually occlude each other. The angular spectrum method was used as the propagation model. The resolutions are 3840 $\times$ 2160 or 1920 $\times$ 1080 with a bit depth of 8 bpp. The wavelengths of the red, green, and blue components used in the CGH generation process are 640 $nm$, 532 $nm$, and 473 $nm$, respectively, and the pixel pitches are 3.74 $\mu m$ or 6.4 $\mu m$. A summary of the B-COM dataset is provided in Table 12, and thumbnail images of the three color holograms and their corresponding NRs are shown in Fig. 13. Because the B-COM dataset consists of full complex holograms, we utilized only the phase component for encoding. Experiments identical to those in Table 3 were conducted to evaluate the coding performance of different color spaces and chroma subsampling, TSM, in-loop filters, and IBC on the B-COM dataset. Encoding was performed in the same environment as in Section 4.1, changing only the dataset.


Fig. 13. B-COM computer-generated hologram dataset [33,34]. (first row) The phase images called Ballet4k, Breakdancers2k, and Cars2k, respectively. (second row) The corresponding numerical reconstructions.


Table 12. Summary of the B-COM computer-generated hologram dataset [33].

5.1 Different color spaces and chroma subsampling methods on B-COM dataset

To evaluate the effects of different video formats on the coding performance for the B-COM dataset, we conducted encoding experiments using the YCbCr-4:2:0, YCbCr-4:4:4, and RGB-4:4:4 formats. As shown in Table 13, YCbCr-4:2:0 is the best-performing format, followed by YCbCr-4:4:4 and RGB-4:4:4, consistent with the results for the POHs of the ETRI dataset. Specifically, for the luma component in the RA configuration, YCbCr-4:2:0 significantly outperformed the other formats, with BD-rate gains of 61.0% and 93.6% compared with YCbCr-4:4:4 and RGB-4:4:4, respectively. The coding gains for the chroma components were generally more substantial, reaching up to 100%.


Table 13. Comparison of encoding results on POHs of the B-COM dataset from different color spaces and chroma subsampling methods using VTM18.0.

5.2 Performance analysis of turn-on and turn-off encoding tests on B-COM dataset

Turn-off tests were conducted to assess the encoding performance of the transform-related and in-loop filter-related tools, and a turn-on test to evaluate the SCC-related tool, on the POHs of the B-COM dataset. The encoding results are summarized in Table 14.


Table 14. Results of turn-on and turn-off encoding tests on POHs of the B-COM dataset using VTM 18.0, with transform-related & in-loop filter-related tools for the turn-off test, and SCC-related tool for the turn-on test.

The DF, ALF, and IBC tools, which demonstrated significant performance improvements on the POHs of the ETRI dataset when activated, also exhibited substantial performance enhancements on the B-COM dataset. Specifically, in the turn-off test with the AI configuration, DF showed a marginal BD-rate gain of 0.1% for the luma component, but in the RA configuration it incurred a significant BD-rate loss of 31.8%. Disabling ALF in the AI and RA configurations resulted in encoding losses of 1.4% and 5.1%, respectively, for the luma component, and enabling IBC in the AI and RA configurations yielded encoding gains of 4.6% and 4.9%, respectively, for the luma component.

On the other hand, deactivating LMCS in the AI configuration yielded a significant encoding gain of 18.8% for the luma component on the B-COM dataset, in contrast to its encoding results on the ETRI dataset. This indicates that LMCS can degrade the encoding performance for POHs in the AI configuration.

Overall, the experimental results for the B-COM dataset exhibited a similar trend to those for the ETRI dataset.

6. Analysis on subjective video quality of numerical reconstructions

Although it is the hologram video that is encoded, what is perceptually displayed to the user is the NR, making the subjective quality of the NR more significant than that of the decoded hologram. In this section, we analyze the impact of POH encoding on the visual quality of the NR and delineate the distinctive attributes of the coding artifacts that are noticeable in the NRs.

As shown in Fig. 1, the hologram is rendered to the NR using an appropriate propagation model, such as the angular spectrum or Fraunhofer diffraction formula. For the ETRI dataset, the NR was calculated using the Fresnel model ($P$) as follows:

$$\begin{aligned} u(x,y) & =P\{s\}(x,y)\\ & = F^{-1}\{F\{s\}(f_x,f_y)\exp(j2\pi z\sqrt{\lambda^{-2}-f_x^{2}-f_y^{2}})\}(x,y) \end{aligned}$$
where $s$ and $u$ are the light waves in the source and destination planes, and $F$ and $F^{-1}$ are the forward and inverse Fourier transforms, respectively. $(f_x, f_y)$ are the coordinates in the frequency domain, $\lambda$ is the wavelength of light, $k=\frac {2\pi }{\lambda }$ is the wavenumber, and $z$ is the user-defined reconstruction distance.
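Assuming uniform sampling with a pixel pitch `pitch`, the propagation above can be sketched with FFTs as follows; this is an illustrative implementation, not the ETRI NR software, and the sample parameters are arbitrary:

```python
import numpy as np

def propagate(s, z, wavelength, pitch):
    """Numerically propagate a complex field s (H x W) over distance z
    using the transfer function of the equation above."""
    h, w = s.shape
    fx = np.fft.fftfreq(w, d=pitch)                # frequency coordinates
    fy = np.fft.fftfreq(h, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    arg = wavelength ** -2 - FX ** 2 - FY ** 2
    H = np.exp(2j * np.pi * z * np.sqrt(np.maximum(arg, 0)))
    H[arg < 0] = 0                                 # drop evanescent waves
    return np.fft.ifft2(np.fft.fft2(s) * H)

# A POH stores only phase, so the field is reconstructed from exp(j*phase).
phase = np.random.default_rng(0).uniform(0, 2 * np.pi, (256, 256))
u = propagate(np.exp(1j * phase), z=0.1, wavelength=532e-9, pitch=8e-6)

# |H| = 1 for propagating waves, so the total energy is preserved.
assert np.isclose((np.abs(u) ** 2).sum(), 256 * 256)
```

The unit-magnitude transfer function also explains why phase errors introduced by compression translate directly into amplitude errors in the reconstruction plane, as analyzed below.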

It should be noted that evaluating the subjective quality of NRs rendered from compressed digital holograms is a challenging task. While JPEG Pleno has recently initiated efforts to develop and standardize a subjective evaluation protocol, this work is still in its early stages. The primary obstacle to evaluating the visual quality of holograms is the need for a 3D holographic-rendering device to properly display the holographic information. Although the latest devices are capable of showing complete 3D plenoptic information, they typically have a limited viewing angle. Consequently, a common approach to assessing subjective quality is to use 2D displays to present NRs at different distances or viewports, so that various object scenes with different focal points can be evaluated.

In conventional videos, common coding artifacts are blockiness, blurring, and ringing, each of which degrades the original content in distinct ways. For instance, blockiness caused by block-based coding leads to unpleasant line artifacts surrounding block boundaries, whereas blurring due to the loss of high-frequency components smudges the original content. The degree of compression, indicated by the QP value, directly affects the severity of the coding artifacts; the greater the compression, the more pronounced the artifacts become.

The artifacts present in compressed POHs are akin to those found in natural videos. However, when the POH is converted to the NR, the resulting rendering artifacts exhibit different characteristics. First, the greater the compression applied to the POH, the darker the NR becomes. For example, the first row of Fig. 14 depicts the NRs of bunny: the original NR without coding and the four NRs generated by encoding the POHs with four QP values (37, 42, 47, and 51), from left to right. As the QP value increased, the NRs of the compressed POHs became increasingly dark compared with the original NR. To analyze this phenomenon statistically, the mean ($\mu$) and standard deviation ($std$) of each NR were calculated. For the POH bunny, the mean value remained approximately 32 irrespective of the QP value, while the $std$ decreased from 42.6 to 10.2.


Fig. 14. Coding artifacts observed in the NR. (first row) Examples of the decrease in brightness as the QP value increases: the original NR and four NRs when the POHs are encoded with four QP values (37, 42, 47, and 51), from left to right. (second row) Examples of the temporal video quality fluctuations that occur in RA-configuration encoding. The picture order count (POC) is the display order. The two NRs with POC5 and POC13 are rendered from the POH compressed with QP45; the other three NRs are rendered from the POHs compressed with QP42.


To explore why the mean value of the overall NR image remained relatively constant despite the reduced brightness of the foreground object, we divided the NR of the POH bumbameuboi into foreground and background regions and calculated their respective $\mu$ and $std$. Figure 15 shows an example of this analysis: the first row displays the original NR, and the second row displays the NR of the POH coded with QP45. As anticipated, after POH encoding, the mean energy level in the foreground region decreased from 68.4 to 41.6, with the dynamic range narrowing toward low pixel intensities. In the background region, the energy level increased from 14.1 to 27.7, while the average $std$ remained relatively constant. A comparison of the two yellow circles in Fig. 15 (viewed at 300% magnification for clarity) indicates a significant increase in speckle noise in the second row, which negatively impacts the subjective quality. It is worth noting that speckle noise was intentionally added during POH generation to achieve a sufficient viewing angle, despite its adverse impact on video quality. Another issue with speckle noise is that it degrades the spatial and temporal correlation of the video, leading to a significant reduction in coding efficiency. Therefore, compression-friendly POH generation methods that introduce less speckle noise, particularly in foreground objects, such as the iterative Gerchberg-Saxton (GS) [37], mixed-region amplitude freedom (MRAF) [38], or stochastic gradient descent (SGD) [39] methods, should be considered.


Fig. 15. Histogram analysis of the foreground (F) and background (B) region of the NR rendered from the POH bumbameuboi. The circled regions are magnified 300% for easy viewing: (a) without coding and (b) with coding using QP47.


Furthermore, in the RA-configuration tests, encoding with high QP values can cause temporal distortions similar to motion judder. For instance, in the second row of Fig. 14, five NRs were rendered from the POH coded in the RA configuration with QP45. In the caption, POC denotes the picture order count, which indicates the display order. Because RA encoding is based on the hierarchical B-frame prediction structure, the two NRs with higher temporal IDs, POC5 and POC13, were encoded with a higher QP value than the remaining three, resulting in temporal distortions. Specifically, a temporal fluctuation in brightness was observed that appears as a motion judder artifact. One possible reason is the frequent selection of the skip mode in inter prediction, which leads to a loss of detail in the original CUs.

The distortions in the NR images after POH compression can also be observed in the experimental results of Fig. 16. First, as the QP increases in both the AI and RA configurations, the objective image quality metric, peak signal-to-noise ratio (PSNR), decreases. In terms of subjective quality, evident degradation is perceptible from QP 47. Combining the results of Fig. 14 and Fig. 16, discernible distortions in the NR images manifest at relatively high compression rates, specifically from QP 42 onward. Hence, to minimize subjective quality distortions in the NR images, a lower QP value, i.e., below 42, is recommended during encoding.


Fig. 16. NR images according to compression conditions and PSNR calculated against the original. The original NR and four NRs when the POHs are encoded with four QP values (37, 42, 47, and 51), from left to right. The first row shows AI configuration results, and the second row shows RA configuration results. PSNR is calculated in the central 500 $\times$ 500 region of the images.


ETRI provides NR software that converts a POH to the corresponding NR. While this software can modify the reconstruction distance of the NR, it cannot alter the viewing angle (i.e., the size and location of the viewing window). For this reason, we conducted an indirect analysis to explore how compression impacts the quality of NRs across viewing angles, using 1) error maps between the original and decoded POHs and 2) frequency-domain images transformed from the POHs. To this end, we conducted two additional experiments.

The results of the first experiment, the absolute error maps between the original and decoded POHs, are presented in Fig. 17. We checked whether the errors are uniformly distributed within the images in terms of the mean absolute error (MAE) for different QP values. At QP 37 and QP 42, each block exhibits similar MAE values, whereas at QP 47 and QP 51, evident differences between the central and peripheral areas of the images are observed. Because the errors are uniformly distributed at QP 37 and QP 42, any degradation of the NR should be similar across views; in the case of higher QP values, however, the quality of reconstruction for off-axis views may deteriorate.
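The block-wise MAE maps used in this experiment can be computed with a simple reshape, as sketched below; the helper `block_mae_map`, the 64-pixel block size, and the toy error model are assumptions for illustration:

```python
import numpy as np

def block_mae_map(orig, dec, bs=64):
    """Mean absolute error per bs x bs block (any remainder is cropped)."""
    h, w = (orig.shape[0] // bs) * bs, (orig.shape[1] // bs) * bs
    diff = np.abs(orig[:h, :w].astype(np.float64) - dec[:h, :w])
    return diff.reshape(h // bs, bs, w // bs, bs).mean(axis=(1, 3))

rng = np.random.default_rng(0)
orig = rng.uniform(0, 255, (256, 256))
dec = orig + rng.uniform(-2, 2, (256, 256))   # mild, spatially uniform error

mae = block_mae_map(orig, dec)
assert mae.shape == (4, 4)
# Spatially uniform error, as at low QP: all blocks show similar MAE.
assert mae.std() / mae.mean() < 0.1
```

A large spread of block MAEs, as seen at QP 47 and QP 51, is the signature of spatially non-uniform degradation and hence of view-dependent quality loss.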


Fig. 17. Absolute error map between the original POHs and the decoded POHs. The results are obtained through experiments using the ETRI’s POH (cube), with each image depicting the average absolute error of block-patches. Each column corresponds to different QP values (37, 42, 47, and 51), where the first row shows AI configuration results, and the second row represents the RA configuration results. MAE denotes the mean absolute error for each block.


In the second experiment, Fig. 18 presents the absolute error maps between the frequency-domain images transformed from the original and compressed POHs of cube. The central region of an error map indicates the loss of low-frequency components relative to the original POH, while regions away from the center signify the extent of high-frequency loss. Comparing the error maps for QP 37 and QP 47 in both the AI and RA configurations shows that, as the QP value increases, the differences in the high-frequency region become larger than those in the low-frequency region. Given this noticeable loss of high-frequency components, which can deteriorate the quality of the NR, a recommendation similar to that of the first experiment can be drawn: QP values of 42 or lower should be used.
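Such a frequency-domain error map can be sketched as follows; the helper `freq_error_map` and the crude box-blur stand-in for coding loss are our own assumptions, not the experiment's actual pipeline:

```python
import numpy as np

def freq_error_map(orig, dec):
    """Absolute error between centered DFT magnitudes: the map's center
    shows low-frequency loss, its periphery high-frequency loss."""
    F = lambda im: np.fft.fftshift(np.fft.fft2(im))
    return np.abs(np.abs(F(orig)) - np.abs(F(dec)))

rng = np.random.default_rng(0)
orig = rng.random((128, 128))
# Stand-in for coding loss that removes fine detail: a 2x2 box blur
# applied by circular convolution in the frequency domain.
k = np.ones((2, 2)) / 4
dec = np.real(np.fft.ifft2(np.fft.fft2(orig) * np.fft.fft2(k, (128, 128))))

err = freq_error_map(orig, dec)
c = 128 // 4
low = err[c:-c, c:-c].mean()                              # central half
high = (err.sum() - err[c:-c, c:-c].sum()) / (128 * 128 - 64 * 64)
assert high > low   # per-coefficient error concentrates at high frequencies
```

The same comparison of central versus peripheral error is what Fig. 18 visualizes for the actual decoded POHs.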


Fig. 18. Absolute error maps between frequency domain images transformed from the original POHs and decoded POHs generated from encoding with QP 37 and QP 47 for both AI and RA configurations. Each image was transformed using 2D Discrete Fourier Transform (DFT). The center area of the images represents the loss of low-frequency components, while moving away from the center indicates the loss of high-frequency components in the error map images.


Finally, there is a unique visual artifact that is specific to holography and not typically observed in regular videos. This artifact, shown in the red dotted box in Fig. 15(b), is caused by the distorted propagating wavefield emitted by the decoded hologram. The POHs of the ETRI dataset are based on the Fresnel propagation model, which assumes that the reconstruction plane is located at a particular distance. Thus, even a small error in the decoded POH can result in a substantial displacement error in the reconstruction plane. This issue is particularly noticeable when encoding at low bitrates and can lead to spurious reconstructions.

7. Conclusion

In this paper, we investigated the performance of various coding tools in VVC for encoding computer-generated POH videos. The experimental results demonstrate that the existing standard video codecs do not fully consider the unique characteristics of POHs, resulting in suboptimal coding performance. We also discussed why certain coding tools performed better or worse for hologram video coding than for regular video coding. In particular, DF, ALF, and IBC demonstrated significant improvements. Finally, in our experiments, the activation of IBC, coupled with the concurrent activation of both TSM and all in-loop filters, yielded the best coding performance, representing the optimal encoding configuration for the POH videos of the ETRI dataset in VVC. Given the large size of holographic data, it is essential to develop efficient coding solutions for the successful deployment of holography technology. Our analysis of the limitations of existing video codecs can offer valuable insights for designing coding tools optimized not only for POHs but also for other holograms that share the randomness of fringe interference patterns.

Future research will focus on improving the coding efficiency of holograms by leveraging the special properties of POHs, such as 2$\pi$ circular periodicity [22]. We will also investigate the relationship between the coding artifacts of compressed digital holograms and the subjective and objective quality of NRs to enhance the perceptual quality of 3D objects for viewers.

Funding

Hanyang University (HY-2023-2571).

Acknowledgments

This work was supported by the research fund of Hanyang University (HY-2023-2571). We thank ETRI (Electronics and Telecommunications Research Institute) for providing the phase-only CGH database.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available.

References

1. W. Osten, A. Faridian, P. Gao, K. Körner, D. Naik, G. Pedrini, A. K. Singh, M. Takeda, and M. Wilke, “Recent advances in digital holography,” Appl. Opt. 53(27), G44 (2014). [CrossRef]  

2. P. Hariharan, Basics of Holography (Cambridge University, 2002).

3. J. Geng, “Three-dimensional display technologies,” Adv. Opt. Photonics 5(4), 456 (2013). [CrossRef]  

4. C. Maurer, A. Jesacher, S. Bernet, and M. Ritsch-Marte, “What spatial light modulators can do for optical microscopy,” Laser Photonics Rev. 5(1), 81–101 (2010). [CrossRef]  

5. R. Corda and C. Perra, “Hologram domain data compression: Performance of standard codecs and image quality assessment at different distances and perspectives,” IEEE Trans. Broadcast. 66(2), 292–309 (2020). [CrossRef]  

6. T. J. Naughton, Y. Frauel, B. Javidi, and E. Tajahuerce, “Compression of digital holograms for three-dimensional object reconstruction and recognition,” Appl. Opt. 41(20), 4124 (2002). [CrossRef]  

7. P. A. Cheremkhin and E. A. Kurbatova, “Numerical comparison of scalar and vector methods of digital hologram compression,” in SPIE Proceedings, Y. Sheng, C. Yu, and C. Zhou, eds. (SPIE, 2016).

8. A. Shortt, T. J. Naughton, and B. Javidi, “Compression of digital holograms of three-dimensional objects using wavelets,” Opt. Express 14(7), 2625 (2006). [CrossRef]  

9. A. E. Shortt, T. J. Naughton, and B. Javidi, “A companding approach for nonuniform quantization of digital holograms of three-dimensional objects,” Opt. Express 14(12), 5129 (2006). [CrossRef]  

10. Y. Rivenson, A. Stern, and B. Javidi, “Overview of compressive sensing techniques applied in holography [invited],” Appl. Opt. 52(1), A423 (2013). [CrossRef]  

11. D. Blinder, C. Schretter, H. Ottevaere, A. Munteanu, and P. Schelkens, “Unitary transforms using time-frequency warping for digital holograms of deep scenes,” IEEE Trans. Comput. Imaging 4(2), 206–218 (2018). [CrossRef]  

12. T. Birnbaum, A. Ahar, D. Blinder, C. Schretter, T. Kozacki, and P. Schelkens, “Wave atoms for digital hologram compression,” Appl. Opt. 58(22), 6193 (2019). [CrossRef]  

13. Y. Xing, M. Kaaniche, B. Pesquet-Popescu, and F. Dufaux, “Vector lifting scheme for phase-shifting holographic data compression,” Opt. Eng. 53(11), 112312 (2014). [CrossRef]  

14. P. Gioia, A. Gilles, M. Cagnazzo, B. Pesquet, and A. E. Rhammad, “View-dependent compression of digital hologram based on matching pursuit,” in Optics, Photonics, and Digital Technologies for Imaging Applications V, P. Schelkens, T. Ebrahimi, and G. Cristóbal, eds. (SPIE, 2018).

15. H. Zhang, W. Zhou, D. Leber, Z. Hu, X. Yang, P. W. M. Tsang, and T.-C. Poon, “Development of lossy and near-lossless compression methods for wafer surface structure digital holograms,” J. Micro/Nanolithogr., MEMS, MOEMS 14(4), 041304 (2015). [CrossRef]  

16. “JPEG Pleno Holography,” https://jpeg.org/jpegpleno/holography.html (2019).

17. R. K. Muhamad, A. Ahar, T. Birnbaum, A. Gilles, S. Mahmoudpour, and P. Schelkens, “JPEG Pleno Holography Common Test Conditions 8.0,” (2022).

18. J. P. Peixeiro, C. Brites, J. Ascenso, and F. Pereira, “Holographic data coding: Benchmarking and extending HEVC with adapted transforms,” IEEE Trans. Multimedia 20(2), 282–297 (2018). [CrossRef]  

19. T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol. 13(7), 560–576 (2003). [CrossRef]  

20. G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Trans. Circuits Syst. Video Technol. 22(12), 1649–1668 (2012). [CrossRef]  

21. “JPEG Pleno Database,” http://plenodb.jpeg.org/ (2023).

22. K.-J. Oh, J. Kim, and H. Y. Kim, “A new objective quality metric for phase hologram processing,” ETRI Journal 44(1), 94–104 (2021). [CrossRef]  

23. J.-S. Chen and D. P. Chu, “Improved layer-based method for rapid hologram generation and real-time interactive holographic display applications,” Opt. Express 23(14), 18143 (2015). [CrossRef]  

24. E. Buckley, “Holographic laser projection technology,” SID Symposium Digest 39(1), 1074 (2008). [CrossRef]  

25. B. Bross, Y.-K. Wang, Y. Ye, S. Liu, J. Chen, G. J. Sullivan, and J.-R. Ohm, “Overview of the versatile video coding (VVC) standard and its applications,” IEEE Trans. Circuits Syst. Video Technol. 31(10), 3736–3764 (2021). [CrossRef]  

26. “Reference software of VVC (VTM),” https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware (2020).

27. F. Bossen, J. Boyce, X. Li, V. Seregin, and K. Sühring, “VTM common test conditions and software reference configurations for SDR video,” JVET (2020).

28. G. Bjontegaard, “Improvements of the BD-PSNR model,” ITU-T (2008).

29. “Kodak Lossless True Color Image Suite,” https://r0k.us/graphics/kodak/ (2013).

30. C.-M. Fu, E. Alshina, A. Alshin, Y.-W. Huang, C.-Y. Chen, C.-Y. Tsai, C.-W. Hsu, S.-M. Lei, J.-H. Park, and W.-J. Han, “Sample adaptive offset in the HEVC standard,” IEEE Trans. Circuits Syst. Video Technol. 22(12), 1755–1764 (2012). [CrossRef]  

31. X. Xu, S. Liu, T.-D. Chuang, Y.-W. Huang, S.-M. Lei, K. Rapaka, C. Pang, V. Seregin, Y.-K. Wang, and M. Karczewicz, “Intra block copy in HEVC screen content coding extensions,” IEEE J. Emerg. Sel. Topics Circuits Syst. 6(4), 409–419 (2016). [CrossRef]  

32. X. Xu and S. Liu, “Overview of screen content coding in recently developed video coding standards,” IEEE Trans. Circuits Syst. Video Technol. 32(2), 839–852 (2022). [CrossRef]  

33. “B-COM Holographic Videos,” https://hologram-repository.labs.b-com.com/#/holographic-videos (2023).

34. A. Gilles, P. Gioia, N. Madali, A. E. Rhammad, and L. Morin, “Open access dataset of holographic videos for codec analysis and machine learning applications,” in International Conference on Quality of Multimedia Experience (2023).

35. C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, “High-quality video view interpolation using a layered representation,” ACM Trans. Graph. 23(3), 600–608 (2004). [CrossRef]  

36. A. Gilles, P. Gioia, R. Cozot, and L. Morin, “Computer generated hologram from multiview-plus-depth data considering specular reflections,” in International Conference on Multimedia & Expo Workshops (IEEE, 2016).

37. C. Chang, J. Xia, L. Yang, W. Lei, Z. Yang, and J. Chen, “Speckle-suppressed phase-only holographic three-dimensional display based on double-constraint gerchberg–saxton algorithm,” Appl. Opt. 54(23), 6994 (2015). [CrossRef]  

38. C. Zhang, Y. Hu, W. Du, P. Wu, S. Rao, Z. Cai, Z. Lao, B. Xu, J. Ni, J. Li, G. Zhao, D. Wu, J. Chu, and K. Sugioka, “Optimized holographic femtosecond laser patterning method towards rapid integration of high-quality functional devices in microchannels,” Sci. Rep. 6(1), 1 (2016). [CrossRef]  

39. C. Chen, B. Lee, N.-N. Li, M. Chae, D. Wang, Q.-H. Wang, and B. Lee, “Multi-depth hologram generation using stochastic gradient descent algorithm with complex loss function,” Opt. Express 29(10), 15089 (2021). [CrossRef]  

Data availability

Data underlying the results presented in this paper are not publicly available.



Figures (18)

Fig. 1. Two scenarios for hologram encoding.
Fig. 2. Process of acquiring computer-generated POH.
Fig. 3. ETRI computer-generated phase-only hologram dataset [22]. (first and third rows) The center (17th) frames of the six POHs, called bumbameuboi, bunny, can, cube, dragon, and head. (second and fourth rows) The corresponding numerical reconstructions.
Fig. 4. Overall encoder framework of VVC.
Fig. 5. Graphical examples of the modes selected for the POH (bumbameuboi) using QP 37. (a) A frame encoded using the AI configuration, where the Planar, DC, and Angular intra-modes are represented by the square, circle, and arrow symbols, respectively. CUs without any marker are encoded using the MIP mode. (b) A frame encoded using the RA configuration, where blue represents CUs coded in intra-modes, and red and green represent CUs coded in inter-modes. The red and green arrows represent the motion vectors in reference picture lists L0 and L1, respectively.
Fig. 6. RD-curves of the average results for the ETRI dataset encoded using HM16.22 and VTM18.0 in the AI and RA configurations.
Fig. 7. RD-curves of the encoding results using combinations of two color spaces (YCbCr and RGB) and two chroma subsampling formats (4:4:4 and 4:2:0).
Fig. 8. The residual images and corresponding histograms of a single frame of (a) a normal video (Tango) and (b) a POH (bumbameuboi).
Fig. 9. RD-curves of the turn-off test for the transform-skip mode.
Fig. 10. RD-curves of the turn-off tests for four in-loop filters of VVC using the Main profile: (a) LMCS, (b) DF, (c) SAO, (d) ALF, and (e) all in-loop filters.
Fig. 11. Histograms of (a) five natural images from the Kodak image dataset and (b) six POHs of the ETRI dataset.
Fig. 12. An example of intra-block copy: (a) screen content and (b) POH.
Fig. 13. B-COM computer-generated hologram dataset [33,34]. (first row) The phase images called Ballet4k, Breakdancers2k, and Cars2k. (second row) The corresponding numerical reconstructions.
Fig. 14. Coding artifacts observed in the NR. (first row) Examples of the decrease in brightness as the QP value increases: the original NR and four NRs from POHs encoded with four QP values (37, 42, 47, and 51), from left to right. (second row) Examples of temporal video quality fluctuations occurring in RA-configuration encoding. The picture order count (POC) is the display order. Two NRs with POC 5 and POC 13 are rendered from the POH compressed with QP 45; the other three NRs are rendered from POHs compressed with QP 42.
Fig. 15. Histogram analysis of the foreground (F) and background (B) regions of the NR rendered from the POH bumbameuboi. The circled regions are magnified 300% for easier viewing: (a) without coding and (b) with coding using QP 47.
Fig. 16. NR images under different compression conditions, with PSNR calculated against the original. The original NR and four NRs from POHs encoded with four QP values (37, 42, 47, and 51), from left to right. The first row shows AI-configuration results and the second row shows RA-configuration results. PSNR is calculated over the central 500×500 region of the images.
Fig. 17. Absolute error maps between the original and decoded POHs. The results are obtained using ETRI's POH (cube), with each image depicting the average absolute error of block patches. Each column corresponds to a different QP value (37, 42, 47, and 51); the first row shows AI-configuration results and the second row shows RA-configuration results. MAE denotes the mean absolute error for each block.
Fig. 18. Absolute error maps between frequency-domain images transformed from the original POHs and from POHs decoded after encoding with QP 37 and QP 47 for both the AI and RA configurations. Each image was transformed using the 2D discrete Fourier transform (DFT). In the error maps, the center area represents the loss of low-frequency components, while regions farther from the center indicate the loss of high-frequency components.

Tables (14)

Table 1. Comparison of data size between a 4K image and a hologram image.

Table 2. Summary of the ETRI dataset of computer-generated phase-only holograms.

Table 3. List of turn-on and turn-off encoding tests on the ETRI dataset using the AI and RA configurations of VVC for the Main10 or Main10-444 profile. The selected coding tools are marked with a ‘ ’ symbol, with transform-related and in-loop filter-related tools for the turn-off test, and the SCC-related tool for the turn-on test.

Table 4. The ratio of coding units (CUs) encoded with four intra-modes for the All-Intra (AI) configuration: six POH videos of the ETRI dataset and six CTC test sequences (Class A) from VVC were encoded using VTM18.0 in the YCbCr 4:2:0 format. The intra-modes are divided into three categories: non-angular modes (Planar/DC), angular modes, and the matrix-based intra prediction (MIP) mode. For more detailed information on the intra-modes, please refer to [25].

Table 5. The ratio of coding units (CUs) encoded with intra-mode or inter-mode for the Random Access (RA) configuration: six POH videos of the ETRI dataset and six CTC test sequences (Class A) from VVC were encoded using VTM18.0 in the YCbCr 4:2:0 format. The intra-modes are divided into three categories: non-angular modes (Planar/DC), angular modes, and the matrix-based intra prediction (MIP) mode. The inter-modes are categorized into two groups: skip mode and the other modes (AMVP, SBTMVP, Merge, Affine, and so on). For more detailed information on the intra-/inter-modes, please refer to [25].

Table 6. Comparison of encoding results from different color spaces and chroma subsampling methods for POHs using VTM18.0.

Table 7. Average absolute Pearson correlation coefficients between the three channels for POHs and CTC sequences.

Table 8. A summary of the turn-off test for the transform-skip mode.

Table 9. A summary of the turn-off test for the in-loop filtering tools of VVC.

Table 10. Selection ratio of SAO modes for the luma component in POH encoding.

Table 11. A summary of the turn-on test for IBC: BD-rate results measured with VTM18.0 between the reference codec and the test codec with the IBC tool turned on.

Table 12. Summary of the B-COM computer-generated hologram dataset [33].

Table 13. Comparison of encoding results on POHs of the B-COM dataset from different color spaces and chroma subsampling methods using VTM18.0.

Table 14. Results of turn-on and turn-off encoding tests on POHs of the B-COM dataset using VTM18.0, with transform-related and in-loop filter-related tools for the turn-off test, and the SCC-related tool for the turn-on test.

Equations (1)

$$u(x,y)=\mathcal{P}\{s\}(x,y)=\mathcal{F}^{-1}\!\left\{\mathcal{F}\{s\}(f_x,f_y)\,\exp\!\left(j2\pi z\sqrt{\lambda^{-2}-f_x^{2}-f_y^{2}}\right)\right\}(x,y)$$
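The propagation operator above (the angular spectrum method: forward Fourier transform, multiplication by the free-space transfer function, inverse transform) can be sketched in NumPy as follows. This is a generic illustration, not the authors' code; the function name, parameters, and the choice to suppress evanescent components (where the square root would be imaginary) are assumptions.

```python
import numpy as np

def angular_spectrum_propagate(s, z, wavelength, pitch):
    """Propagate a complex field s by distance z (angular spectrum method).

    s          : 2D complex array, the source field (e.g., exp(j*phase) of a POH)
    z          : propagation distance in meters
    wavelength : light wavelength in meters
    pitch      : sampling pitch of the field in meters
    """
    ny, nx = s.shape
    # Spatial frequency grids f_x, f_y matching the FFT sample ordering
    fx = np.fft.fftfreq(nx, d=pitch)
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    # Argument of the square root in the transfer function: 1/lambda^2 - fx^2 - fy^2
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    # Transfer function exp(j*2*pi*z*sqrt(arg)); evanescent waves (arg < 0) set to 0
    H = np.where(arg >= 0,
                 np.exp(1j * 2 * np.pi * z * np.sqrt(np.maximum(arg, 0.0))),
                 0.0)
    # F^{-1}{ F{s} * H }
    return np.fft.ifft2(np.fft.fft2(s) * H)
```

For a uniform (plane-wave) input the result is the same field multiplied by the global phase factor exp(j·2πz/λ), which provides a quick sanity check of the implementation.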