## Abstract

Three-dimensional (3D) light field displays require image data captured from a large number of regularly spaced viewpoints to produce a 3D image. Generating these images sequentially is generally inefficient, because a large number of rendering operations are repeated across viewpoints. Current 3D image generation based on traditional single-viewpoint computer graphics techniques is not well suited to the task of generating images for light field displays. A highly parallel multi-view polygon rasterization (PMR) algorithm for 3D multi-view image generation is presented. Based on the coherence of the triangle rasterization calculation among different viewpoints, the related rasterization algorithms, including primitive setup, the plane function, and barycentric coordinate interpolation in screen space, are derived. To verify the proposed algorithm, a hierarchical soft rendering pipeline on the GPU is designed and implemented. Several groups of images of 3D objects are used to verify the performance of the PMR method, and correct 3D light field images can be generated in real time.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Many efforts have been devoted to the research and implementation of three-dimensional (3D) technologies, including the capture, processing, and display of 3D images [1–5]. Broad applications have been discussed, such as entertainment, manufacturing, and biomedical applications. Because the content of a 3D light field image is difficult to obtain from a real scene, and camera acquisition easily introduces errors, computer graphics methods are often used to generate the light field image.

For computer graphics, a graphics pipeline is a conceptual model that describes the steps a graphics system performs to render a 3D scene to a 2D screen [6–10]. Based on the programmability of graphics pipelines, several generation methods for multi-view display were presented [11–20]. Different from single-viewpoint generation, multi-view generation has high information density, and more specialized generation methods designed for that purpose need to be explored. However, the current generation of 3D displays uses traditional single-viewpoint computer graphics techniques, which are not well suited to the task of synthesizing images for 3D light field displays. The rasterization stage is non-programmable, and the structure of the pipeline is not easily customized. Instead, with programmable GPUs, unconventional rendering tasks such as multi-view rasterization can be implemented with a soft renderer [21].

Here, a parallel polygon rasterization method is presented for multi-view light field image generation, which can be finished in one pass. Due to the expansion of the view dimension, a new primitive for parallelized polygon rasterization is constructed. The positional relationship of vertices in screen space across all viewpoints is obtained, and the vertex positions in different viewpoints can be computed directly from a reference viewpoint. The plane function and barycentric coordinate interpolation method are derived to compute the color of pixels. A soft rendering pipeline on the GPU is designed and implemented. The proposed method is implemented and verified in several 3D scenes. Finally, the performance of our rendering pipeline is analyzed and compared with other multi-view image generation methods.

## 2. Related work

Many different methods for computer-generated multi-view images were presented. With the programmable graphics pipeline, some methods were implemented, such as multiple ray cluster rendering [11] and multiple orthographic frustum combing [12]. Based on the path of light propagation and the optical structure of the display, the generation methods of viewpoint vector rendering [13,14] and backward ray tracing [15,16] were presented. Some other methods are based on virtual viewpoint generation and image processing, such as multiple reference views depth-image based rendering (MDIBR) [17]. The multiple viewpoint rendering (MVR) [18] method considers the image set as a single spatio-perspective volume, and the coordinate transformation of vertices between viewpoints is explored.

Efficient rasterization methods and architectures for real-time rendering were also developed [21]. Pineda presented a parallel algorithm for the rasterization of polygons that is particularly well suited for 3D Z-buffered graphics implementations [22]. The algorithm represents each edge of a polygon by a linear edge function with a value greater than zero on one side of the edge and less than zero on the opposite side. Based on the above algorithm, a hierarchical polygon tiling algorithm was presented in which recursive subdivision of image space is driven by coverage masks that classify a convex polygon as inside, outside, or intersecting cells in an image hierarchy [23]. The resulting algorithm performs subdivision and visibility computations very rapidly while visiting only the cells in the image hierarchy that are crossed by visible edges in the output image. This method is widely used in GPU hardware rendering [24] but is no longer applicable to multi-view rendering.

As GPU performance becomes almost irresistibly appealing, applications of all sorts are being recast to make use of GPUs. Although the current rendering pipeline is highly programmable, there are long-standing limitations in the hardware rasterization pipeline that restrict the set of algorithms that can benefit from hardware acceleration. Therefore, researchers have presented several effective software rendering pipelines, such as FreePipe [25], CUDARaster [26], Piko [27], and cuRE [28]. cuRE is a real-time graphics pipeline implemented entirely in software on a modern GPU; it features a fully concurrent, multi-stage, streaming design with dynamic load balancing and operates efficiently within bounded memory. The power of a software approach lies in the ability to tailor the graphics pipeline to any given application. To validate our algorithm, cuRE is used as the software rasterization framework in which our method is implemented. In addition, the dynamic parallelism technique provides flexibility for our pipeline design.

## 3. Parallel multi-view polygon rasterization method

The typical structure of the light field display system is shown in Fig. 1(a). The synthesized image on the LCD is obtained from computer-generated multi-view images, and the light rays from the synthesized image are integrated by the lenticular lens array and diffuser so that they reconstruct a 3D image of the virtual object [3]. According to the path of rays in the optical system, the multi-view image information in the synthesized image can be captured by multiple cameras. As shown in Fig. 1(b), the capture cameras are arranged in a regularly spaced linear array located at the camera plane. The optical axis directions of the cameras are identical, and all the cameras focus on the same plane at a certain depth via a shear transform.

In traditional single-view rasterization rendering, only one viewpoint image can be generated at a time, so multiple rendering passes are required to generate multi-view images. Each viewpoint image is generated by reading in the model, transforming it into world space, repeatedly transforming the scene geometry based on the position of a moving camera, and performing shading calculations. Although some operations, such as view-independent lighting, can be performed once for the entire image sequence, most operations are executed once per view and cannot be performed in parallel across viewpoints.

The PMR method draws 3D models by inherently computing multiple viewpoints of the scene at once. The primitive shown in Fig. 2(b) is called a multi-view triangle array (MTA), which is similar to a 3-prism formed by 6 vertices. The PMR reduces the cost of producing the multi-view images by performing as many calculations as possible just once per sequence rather than once per viewpoint. Accordingly, the result of rasterization is a 3D volume, where the X/Y axes represent the indices of horizontal and vertical pixels, and the Z axis represents the viewpoint number (0∼N). By considering the image sequence as a whole, PMR reduces the cost of scene synthesis. In the algorithm, multiplications are transformed into accumulation operations as much as possible. For example, in the case of *n* viewpoints and *m* pixels covered per triangle, *38mn* multiplications are required by the traditional method, while only *30m+8mn* multiplications are needed by our method. Compared with traditional methods, it has higher parallelism and lower time complexity.
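The claimed operation counts can be checked with a small sketch. The per-pixel constants 38, 30, and 8 are taken directly from the text; the viewpoint and pixel counts below are illustrative:

```python
def mults_traditional(n_views, m_pixels):
    # Traditional method: the full 38 multiplications per covered pixel
    # are repeated in every viewpoint (figure taken from the text).
    return 38 * m_pixels * n_views

def mults_pmr(n_views, m_pixels):
    # PMR: a one-off 30m setup shared by all views, plus 8
    # multiplications per pixel per view (figures from the text).
    return 30 * m_pixels + 8 * m_pixels * n_views

# With 96 viewpoints and, say, 100 pixels covered per triangle,
# PMR needs well under a quarter of the multiplications.
n, m = 96, 100
print(mults_traditional(n, m), mults_pmr(n, m))  # 364800 79800
```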

Rasterization is the process in which each individual primitive is broken down into discrete elements called fragments, based on the sample coverage of the primitive. In the rasterization stage, each triangle is processed through the following steps, as shown in Fig. 2. Firstly, the primitive MTA is set up based on viewpoint re-projection. Secondly, the voxels covered by the MTA are found based on plane functions. Thirdly, each voxel's properties are obtained based on barycentric coordinate interpolation. The details of the steps are as follows.

#### 3.1 MTA setup by viewpoint re-projection

There is a certain spatial relationship among vertex positions in different viewpoints, and the screen coordinates of vertices in one viewpoint can be transformed from those in any other viewpoint directly by view re-projection [29]. As shown in Fig. 3(a), vertices in the reference and destination viewpoints correspond one to one, and the basic mapping rule conforms to Eq. (1), where *n* is the sequence number of the viewpoint, *N* is the total number of viewpoints, and *T* is the distance between the reference and destination viewpoints in the *horizontal* direction.

Most computer graphics scene databases consist of polygons instead of just points. Polygons are graphics primitives that describe patches of surfaces. Most computer graphics algorithms and hardware are optimized for triangle drawing, sometimes even more so than for line drawing. As shown in Fig. 3(b), a triangle is defined by 3 non-collinear vertices, and the vertices of a triangle in the destination viewpoint can be obtained by view re-projection.

In the rendering process with *n* viewpoints, the left-most viewpoint is marked as *view_0*, and the right-most viewpoint is marked as *view_N*. The screen coordinates of the vertices (*A*_{n}, *B*_{n}, *C*_{n}) in the *n*-th viewpoint *view_n* can be obtained by linear interpolation between *view_0* (*A*_{0}, *B*_{0}, *C*_{0}) and *view_N* (*A*_{N}, *B*_{N}, *C*_{N}). The *view_0* is regarded as the reference viewpoint, and the positions of the vertices in *view_n* are expressed as Eq. (3),
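This linear interpolation can be sketched as follows, assuming horizontal-parallax capture where only the x coordinate varies between views (the vertex names and coordinates are illustrative):

```python
def reproject_vertex(v0, vN, n, N):
    """Screen position of a vertex in view_n, linearly interpolated
    between the reference view_0 and the right-most view_N.
    Only the horizontal coordinate changes between viewpoints."""
    x0, y0 = v0
    xN, yN = vN
    t = n / N
    # y is identical in every view for a horizontal-parallax setup
    return (x0 + t * (xN - x0), y0)

A0, AN = (100.0, 50.0), (148.0, 50.0)
print(reproject_vertex(A0, AN, 48, 96))  # halfway between the end views
```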

#### 3.2 Plane function

The edge function is a linear function which can be used to classify points on a 2D plane that is subdivided by a line into three regions: the points to the “left” of the line, the points to the “right” of the line, and the points on the line [22]. In single-view rasterization, the edge equation is a common method to calculate pixel coverage. As shown in Fig. 4(a), when the pixel q is on the same side of the three edges, the results of the 3 edge equations are all positive, and the pixel q is covered by the triangle ΔA_{n}B_{n}C_{n}. In multi-view rasterization, because of the added viewpoint dimension, the edge function is extended to the plane function. The 3 directed planes of an MTA can be represented by 3 plane functions, as shown in Fig. 4(b). Each plane function is negative for points to the “left” of the plane, positive for points to the “right”, and zero for points on the plane. The four points on the same side (such as A_{0}, B_{0}, B_{N}, A_{N}) are coplanar because the points of different viewpoints are only offset in the horizontal direction. For example, the normals of the 3 planes A_{0}B_{0}B_{N}A_{N}, B_{0}C_{0}C_{N}B_{N}, and C_{0}A_{0}A_{N}C_{N} can be expressed separately as

If the voxel q (x, y, z) is on the same side of the three planes, i.e., satisfies inequality Eq. (6), the voxel is covered by the MTA.
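A minimal sketch of this coverage test, assuming vertices expressed as (x, y, v) with v the viewpoint index (the specific triangle geometry below is illustrative):

```python
def sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

def dot(u, v):
    return u[0]*v[0] + u[1]*v[1] + u[2]*v[2]

def mta_covers(A0, B0, C0, AN, BN, CN, q):
    """True if voxel q = (x, y, v) lies on the same side of all three
    side planes of the MTA prism. Each side plane passes through an
    edge in view_0 and the matching edge in view_N."""
    planes = [(A0, B0, BN), (B0, C0, CN), (C0, A0, AN)]
    values = [dot(cross(sub(p1, p0), sub(p2, p0)), sub(q, p0))
              for p0, p1, p2 in planes]
    return all(v >= 0 for v in values) or all(v <= 0 for v in values)

# A triangle shifted 2 pixels to the right between view_0 and view_N (N = 96)
A0, B0, C0 = (0, 0, 0), (10, 0, 0), (0, 10, 0)
AN, BN, CN = (2, 0, 96), (12, 0, 96), (2, 10, 96)
print(mta_covers(A0, B0, C0, AN, BN, CN, (3, 3, 48)))    # inside -> True
print(mta_covers(A0, B0, C0, AN, BN, CN, (20, 20, 48)))  # outside -> False
```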

Like the edge function, the plane function is easy to update incrementally. For example, the plane function values of neighboring voxels are described as:
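The incremental property can be sketched as follows (the coefficients are illustrative): for a plane function F(x, y, v) = a·x + b·y + c·v + d, a neighboring voxel differs from the current value by a single coefficient, so stepping through the volume needs only additions.

```python
def make_plane_fn(a, b, c, d):
    # F(x, y, v) = a*x + b*y + c*v + d, the plane function of one MTA side
    return lambda x, y, v: a * x + b * y + c * v + d

a, b, c, d = 3.0, -2.0, 0.5, 7.0
F = make_plane_fn(a, b, c, d)

x, y, v = 10, 20, 5
base = F(x, y, v)
# Moving to a neighboring voxel changes F by exactly one coefficient,
# so a rasterizer stepping through the volume avoids multiplications:
assert F(x + 1, y, v) == base + a   # step in x
assert F(x, y + 1, v) == base + b   # step in y
assert F(x, y, v + 1) == base + c   # step in the view direction
print(base)
```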

This property is well suited for a multi-view rasterizer, which steps through a polygon, generating one or a group of fragments at each step.

#### 3.3 Barycentric coordinate interpolation

If voxels are covered by the MTA, the attribute values of these voxels need to be determined. In 3D models, the most common vertex attributes are color, normal, and texture coordinates. These attribute values of covered voxels can be obtained by interpolating 6 vertices of MTA. For the vertex interpolation of MTA, the linear interpolation is performed in the Z direction to obtain the areas of triangles, and then the barycentric coordinate interpolation of the triangle is performed. When the view ID is a constant, the plane function degenerates to an edge function. As shown in Fig. 4(b), for the *v-th* viewpoint, the edge function is expressed as

The area of the triangle *A*_{0}*B*_{0}*C*_{0} is expressed as

The attribute values of any point inside the triangle are expressed as the weighted average of the 3 vertices. The weights (*W*_{1}, *W*_{2}, *W*_{3}) can be obtained by the barycentric coordinates method. Barycentric coordinates express the position of any point located on the triangle with three scalars [30]. The location of this point includes any position inside the triangle, any position on any of the three edges of the triangle, or any one of the three triangle vertices themselves. According to the barycentric coordinates method, the barycentric coordinates can be obtained from the results of the edge functions and the area of the triangle, as expressed in Eq. (11). The attribute value of any voxel (*x*, *y*, *v*) inside the MTA is expressed as:
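At a fixed viewpoint, the interpolation reduces to standard 2D barycentric weights computed from edge functions; a sketch (the vertex positions and colors are illustrative):

```python
def edge(a, b, p):
    # 2D edge function: twice the signed area of triangle (a, b, p)
    return (b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0])

def interp_attrs(A, B, C, attrs, p):
    """Barycentric interpolation of per-vertex attributes at point p
    inside triangle ABC (one fixed viewpoint, where the plane function
    has degenerated to an edge function)."""
    area = edge(A, B, C)
    w0 = edge(B, C, p) / area  # weight of vertex A
    w1 = edge(C, A, p) / area  # weight of vertex B
    w2 = edge(A, B, p) / area  # weight of vertex C
    return tuple(w0*x + w1*y + w2*z for x, y, z in zip(*attrs))

A, B, C = (0.0, 0.0), (10.0, 0.0), (0.0, 10.0)
colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]  # per-vertex RGB
centroid = (10/3, 10/3)
print(interp_attrs(A, B, C, colors, centroid))  # ~ (1/3, 1/3, 1/3)
```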

## 4. Implementation of PMR with GPU

In this section, a hierarchical software rasterization rendering pipeline is designed. The structure of the pipeline is illustrated in Fig. 5. Our pipeline consists of four stages: primitive setup, tile rasterizer, fine rasterizer, and pixel encoding. Each stage is executed as a separate CUDA kernel launch with the dynamic parallelism technique [31]. The whole pipeline adopts a hierarchical structure to accelerate calculation: if the pixels in a bounding box are judged not to be covered, the next stage is not launched [32]. Primitive setup performs culling, clipping, snapping, and the calculation of plane functions. The tile rasterizer performs axis-aligned bounding box (AABB) tests at different levels, and tiles outside the AABB are discarded. The fine rasterizer processes each frame buffer tile, computing exact coverage for the overlapping triangles and executing the shader. Finally, the multi-view images are encoded into the synthesized image. The resolution of our display device is 7680×4320, with 96 viewpoints. The resolution of each viewpoint is 1024×512, which is divided into 256×128×24 tiles. Each tile includes 4×4×4 voxels.
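The tile decomposition quoted above is self-consistent: dividing each 1024×512 view image and the 96-view stack into 4×4×4-voxel tiles yields the stated 256×128×24 tile grid.

```python
# Per-view resolution and viewpoint count quoted in the text
width, height, views = 1024, 512, 96
tile_x, tile_y, tile_v = 4, 4, 4   # voxels per tile in x, y, and the view dimension

tiles = (width // tile_x, height // tile_y, views // tile_v)
print(tiles)  # (256, 128, 24): the tile grid stated in the text
```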

#### 4.1 Primitive setup

In this stage, the differentials, plane functions, and other data for the triangle are computed. These data are saved in global memory for interpolation of the various shading data in later stages. Each CUDA thread is given the task of processing one triangle created by a previously executed vertex shader. After reading the vertex positions, vertex clipping and culling are performed. If any of the vertices are outside the near or far clip plane, the triangle is processed by the clipper [33].

Multiple culling tests are performed for each triangle. Because multiple view frustums exist for different viewpoints, the results of view culling tests differ. To balance efficiency and correctness, conservative culling is performed using a single frustum that contains the union of all the view frustums, as shown in Fig. 6(a). Only primitives that are entirely or partially inside the view frustum need to be rendered. One way to speed up the rendering process is to compare the bounding box of each object to the view frustum. If the bounding box is outside the frustum, then the geometry it encloses can be omitted from rendering. If instead the bounding box is inside or intersecting the frustum, then the contents of that bounding box may be visible and must be sent through the rendering pipeline. Furthermore, if the signed area is negative or zero and backface culling is enabled, the triangle is discarded. The backface culling results can be obtained by linear interpolation, so we only need to judge the values at *view_0* and *view_N*, as shown in Fig. 6(b).
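Because the y coordinates are fixed across views while the x coordinates vary linearly with the view index, the signed screen-space area is linear in the view index, so the two-endpoint test suffices; a sketch (the triangle geometry is illustrative):

```python
def signed_area2(A, B, C):
    # Twice the signed screen-space area of triangle ABC
    return A[0]*(B[1]-C[1]) + B[0]*(C[1]-A[1]) + C[0]*(A[1]-B[1])

def backfacing_everywhere(tri0, triN):
    """Only x varies (linearly) between views while y is fixed, so the
    signed area is linear in the view index; if the triangle is
    back-facing at both view_0 and view_N, it is back-facing in every
    intermediate view."""
    return signed_area2(*tri0) <= 0 and signed_area2(*triN) <= 0

tri0 = [(0, 0), (10, 0), (0, 10)]   # counter-clockwise: front-facing
triN = [(5, 0), (15, 0), (5, 10)]   # same winding after re-projection
print(backfacing_everywhere(tri0, triN))  # False: never culled
```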

#### 4.2 Tile rasterization

According to the result of clipping and culling in the primitive setup stage, the tile rasterization stage is launched, as shown in Fig. 7. Triangles are fetched from the input queue in global memory, and each thread is given the task of processing one triangle. Each CUDA thread block has one thread, and the number of blocks is equal to the number of triangles. The AABB test is performed to discard tiles that are not covered by the MTA. The plane function calculations are performed on all the 3D tiles through several iterations, and the tiles covered by the MTA are obtained. A 3D bitmask is used to record the results, and each bit in the bitmask represents whether a tile is covered. Finally, according to the result of the AABB test, the fine rasterizer is launched with a thread count equal to the number of tiles. The triangle information and bitmask are passed to the fine raster kernel as input.

#### 4.3 Fine rasterization

Each thread is given the task of processing one tile, and the voxel coverage is computed through 4×4×4 iterations, as shown in Fig. 8. The coordinates of the current tile are calculated from the thread number and the input triangle parameters. The surviving threads continue to execute the shader. Firstly, the plane functions and the barycentric coordinate interpolation of attributes are calculated. Then, the depth is computed, and the fragment is discarded if the depth test fails. Finally, shading and other calculations are performed. After the color of each voxel is obtained, it is saved to the frame buffer in global memory. To avoid memory conflicts, atomic operations are used in the saving process. To reduce the influence of atomic operations on efficiency in parallel computing, we change the order of view calculations among different triangles, so that the color values at the same address are not modified at the same time.

#### 4.4 Pixel encoding stage

According to the parameters of the display device, the synthesized image can be obtained by pixel encoding from the multi-view images [34,35]. Each sub-pixel of the synthesized image comes from a different viewpoint image, as shown in Fig. 9. The calculation rules of the view number *v* of each sub-pixel (*i*, *j*, *k*) are as follows:

where *p*_{u} is the lens pitch, *p*_{h} is the horizontal subpixel pitch, and *N* is the total number of viewpoints. *floor(x)* is a truncation function that maps a real number *x* to the largest previous integer. Each thread is given the task of processing one pixel, which is regarded as the post-processing of the rendering pipeline. The computations of different threads are similar and non-interfering, so all the threads can be assigned in one thread block.
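The exact encoding equation is not reproduced in this excerpt; the following is a generic sub-pixel-to-view mapping sketched under common lenticular assumptions (an unslanted vertical lens, so the row index *j* is unused), not the authors' formula:

```python
import math

def view_number(i, j, k, p_u, p_h, N):
    """Hypothetical sub-pixel-to-view mapping (NOT the paper's Eq.):
    the horizontal offset of sub-pixel (i, j, k) within its lenticular
    lens, as a fraction of the lens pitch p_u, selects one of N views.
    j (the row) is unused because an unslanted vertical lens is
    assumed; slanted-lens displays also fold j into the offset."""
    offset = (3 * i + k) * p_h % p_u   # horizontal offset inside the lens
    return math.floor(offset * N / p_u)

# 96 views behind a lens spanning 96 sub-pixels of unit pitch
print(view_number(0, 0, 0, 96.0, 1.0, 96))
print(view_number(10, 0, 2, 96.0, 1.0, 96))
```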

## 5. Results

#### 5.1 Experimental configuration

Here, a dense-viewpoint horizontal parallax light field display system shown in Fig. 10 is used, and its numerical parameters in the experiment are summarized in Table 1. In the experiments, a computer with an Intel Core i9-9900K CPU @ 3.60 GHz and an NVIDIA GeForce RTX 2080 GPU is used. The algorithm is implemented with OpenGL and CUDA 10.2.

#### 5.2 Generation results

After calculation, the generated results are obtained. The multi-view images and the left-most and right-most viewpoint images are shown in Fig. 11. After pixel encoding, the synthesized image is generated, as shown in Fig. 12. The reconstructed 3D images displayed from different perspectives are shown in Fig. 13. Our algorithm can correctly generate images of different viewpoints at the same time.

#### 5.3 Performance analysis

The influence of resolution and viewpoint number on the efficiency of each stage is tested, as shown in Fig. 14. The primitive setup and tile rasterizer stages are affected by the number of triangles. The fine rasterizer is greatly affected by both the number of triangles and the resolution. The pixel encoding stage is only related to the resolution. These results are consistent with expectations.

Our method is compared with previous real-time generation methods, such as cuRE and MDIBR. The software renderer cuRE, based on traditional single-view rasterization, is implemented with CUDA and has the same framework as our method. The renderer of the MDIBR [17] method is implemented with OpenGL. As shown in Fig. 15, when the total resolution is fixed, the calculation speed of our method is not significantly affected by changes in the viewpoint number. Therefore, our method is suitable for multi-view image generation. Although our approach is implemented only in software, it is the fastest when the viewpoint number is greater than 10. For parallel computing with CUDA, optimization operations are used to ensure that our program remains compatible with a large number of viewpoints. These optimization operations may become an obstacle to efficiency when the number of viewpoints is small, which causes lower efficiency when the viewpoint number is less than 10.

## 6. Conclusion

In summary, a highly parallel triangle rasterization algorithm for 3D light field image generation is presented. Because of the added viewpoint dimension, the algorithms for primitive setup, fragment coverage, and interpolation are re-derived. To verify the algorithm, a hierarchical GPU software renderer is designed. The experiments show that the rasterizer produces correct rasterization results and that real-time generation can be realized: the frame rate stays above 60 fps at 7680×4320 resolution with millions of vertices. In theory, compared with traditional rasterization, the proposed algorithm exhibits obvious advantages in time complexity. If our rasterization algorithm is implemented in hardware, its advantages will be even more obvious.

## Funding

National Natural Science Foundation of China (61705014, 61905017); Fundamental Research Funds for the Central Universities (2019PTB-018, 2019RC13); National Key Research and Development Program of China (2017YFB1002900); BUPT Excellent Ph.D. Students Foundation (CX2020224).

## Disclosures

The authors declare no conflicts of interest. This work is original and has not been published elsewhere.

## References

**1. **M. Martínez-Corral and B. Javidi, “Fundamentals of 3D imaging and displays: a tutorial on integral imaging, light-field, and plenoptic systems,” Adv. Opt. Photonics **10**(3), 512 (2018). [CrossRef]

**2. **G. Lv, Q. Wang, W. Zhao, and J. Wang, “3D display based on parallax barrier with multiview zones,” Appl. Opt. **53**(7), 1339–1342 (2014). [CrossRef]

**3. **J. Wen, X. Yan, X. Jiang, Z. Yan, F. Fan, P. Li, Z. Chen, and S. Chen, “Integral imaging based light field display with holographic diffusor: principles, potentials and restrictions,” Opt. Express **27**(20), 27441 (2019). [CrossRef]

**4. **X. Sang, X. Gao, X. Yu, S. Xing, Y. Li, and Y. Wu, “Interactive floating full-parallax digital three-dimensional light-field display based on wavefront recomposing,” Opt. Express **26**(7), 8883 (2018). [CrossRef]

**5. **X. Yu, X. Sang, X. Gao, D. Chen, B. Liu, L. Liu, C. Gao, and P. Wang, “Dynamic three-dimensional light-field display with large viewing angle based on compound lenticular lens array and multi-projectors,” Opt. Express **27**(11), 16024 (2019). [CrossRef]

**6. **S. Marschner, “The Graphics Pipeline,” in *Fundamentals of Computer Graphics* (A K Peters/CRC Press, 2019), pp. 159–182.

**7. **R. L. Cook, L. Carpenter, and E. Catmull, “The reyes image rendering architecture,” in Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1987 (1987), pp. 95–102.

**8. **T. Aila and S. Laine, “Understanding the efficiency of ray traversal on GPUs,” in Proceedings of the HPG 2009: Conference on High-Performance Graphics 2009 (2009), pp. 145–150.

**9. **M. Eldridge, H. Igehy, and P. Hanrahan, “Pomegranate: A fully scalable graphics architecture,” in Proceedings of the ACM SIGGRAPH Conference on Computer Graphics (2000), pp. 443–454.

**10. **D. B. Kirk and W. M. W. Hwu, *Programming Massively Parallel Processors: A Hands-on Approach*: 3rd Edition (2016).

**11. **S. Jiao, X. Wang, M. Zhou, W. Li, T. Hong, D. Nam, J.-H. Lee, E. Wu, H. Wang, and J.-Y. Kim, “Multiple ray cluster rendering for interactive integral imaging system,” Opt. Express **21**(8), 10070 (2013). [CrossRef]

**12. **S. L. Li, Q. H. Wang, Z. L. Xiong, H. Deng, and C. C. Ji, “Multiple orthographic frustum combing for real-time computer-generated integral imaging system,” J. Disp. Technol. **10**(8), 704–709 (2014). [CrossRef]

**13. **K. S. Park, S. W. Min, and Y. Cho, “Viewpoint vector rendering for efficient elemental image generation,” IEICE Trans. Inf. Syst. **E90-D**(1), 233–241 (2007). [CrossRef]

**14. **Z. Yan, X. Yan, X. Jiang, and L. Ai, “Computational integral imaging reconstruction of perspective and orthographic view images by common patches analysis,” Opt. Express **25**(18), 21887 (2017). [CrossRef]

**15. **S. Xing, X. Sang, X. Yu, C. Duo, B. Pang, X. Gao, S. Yang, Y. Guan, B. Yan, J. Yuan, and K. Wang, “High-efficient computer-generated integral imaging based on the backward ray-tracing technique and optical reconstruction,” Opt. Express **25**(1), 330 (2017). [CrossRef]

**16. **J. Wen, X. Yan, X. Jiang, Z. Yan, Y. Wang, and J. Wang, “Nonlinear mapping method for the generation of an elemental image array in a photorealistic pseudoscopic free 3D display,” Appl. Opt. **57**(22), 6375 (2018). [CrossRef]

**17. **Y. Guan, X. Sang, S. Xing, Y. Li, and B. Yan, “Real-Time Rendering Method of Depth-Image-Based Multiple Reference Views for Integral Imaging Display,” IEEE Access **7**, 170545–170552 (2019). [CrossRef]

**18. **M. Halle, “Multiple viewpoint rendering,” in Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1998 (1998), pp. 243–254.

**19. **H. Liao, T. Dohi, and K. Nomura, “Autostereoscopic 3D display with long visualization depth using referential viewing area-based integral photography,” IEEE Trans. Vis. Comput. Graph. **17**(11), 1690–1701 (2011). [CrossRef]

**20. **B. Pang, X. Sang, S. Xing, X. Yu, D. Chen, B. Yan, K. Wang, C. Yu, B. Liu, C. Cui, Y. Guan, W. Xaing, and L. Ge, “High-efficient rendering of the multi-view image for the three-dimensional display based on the backward ray-tracing technique,” Opt. Commun. **405**, 306–311 (2017). [CrossRef]

**21. **J. McCormack and R. McNamara, “Tiled polygon traversal using half-plane edge functions,” in Proceedings of the SIGGRAPH/Eurographics Workshop on Graphics Hardware (2000), pp. 15–21.

**22. **J. Pineda, “A parallel algorithm for polygon rasterization,” in Proceedings of the 15th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1988 (1988), pp. 17–20.

**23. **D. Crisu, S. Vassiliadis, S. D. Cotofana, and P. Liuha, “3D graphics tile-based systolic scan-conversion,” in Conference Record - Asilomar Conference on Signals, Systems and Computers (2004), Vol. 1, pp. 517–521.

**24. **L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, R. Espasa, E. Grochowski, T. Juan, and P. Hanrahan, “Larrabee: A many-core x86 architecture for visual computing,” ACM Trans. Graph. **27**(3), 1–15 (2008). [CrossRef]

**25. **F. Liu, M. C. Huang, X. H. Liu, and E. H. Wu, “FreePipe: A programmable parallel rendering architecture for efficient multi-fragment effects,” in Proceedings of I3D 2010: The 2010 ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (2010), pp. 75–82.

**26. **S. Laine and T. Karras, “High-performance software rasterization on GPUs,” in Proceedings - HPG 2011: ACM SIGGRAPH Symposium on High Performance Graphics (2011), pp. 79–88.

**27. **A. Patney, S. Tzeng, K. A. Seitz, and J. D. Owens, “Piko: A framework for authoring programmable graphics pipelines,” ACM Trans. Graph. **34**(4), 1–13 (2015). [CrossRef]

**28. **M. Kenzel, B. Kerbl, Di. Schmalstieg, and M. Steinberger, “A high-performance software graphics pipeline architecture for the GPU,” ACM Trans. Graph. **37**(4), 1–15 (2018). [CrossRef]

**29. **W. R. Mark and G. Bishop, “Efficient reconstruction techniques for post-rendering 3D image warping,” Univ. of North Carolina at Chapel Hill (1998).

**30. **P. B. Laval, “Mathematics for Computer Graphics - Barycentric Coordinates,” pp. 1–9, (2003).

**31. **J. Gómez-Luna and I. El Hajj, “CUDA dynamic parallelism,” in Programming Massively Parallel Processors: A Hands-on Approach: Third Edition (2017), pp. 275–304.

**32. **N. Greene, “Hierarchical polygon tiling with coverage masks,” in Proceedings of the ACM SIGGRAPH Conference on Computer Graphics (1996), pp. 65–74.

**33. **S. Jung, M. Hong, and M. H. Choi, “Free-form deformation axis aligned bounding box,” in *Lecture Notes in Computer Science* (LNCS, 2009), Vol. 5576, pp. 804–813.

**34. **X. Yu, X. Sang, D. Chen, P. Wang, X. Gao, T. Zhao, B. Yan, C. Yu, D. Xu, and W. Dou, “Autostereoscopic three-dimensional display with high dense views and the narrow structure pitch,” Chin. Opt. Lett. **12**(6), 060008 (2014).

**35. **X. Yan, J. Wen, Z. Yan, T. Zhang, and X. Jiang, “Post-calibration compensation method for integral imaging system with macrolens array,” Opt. Express **27**(4), 4834 (2019). [CrossRef]