A novel model for three-dimensional (3D) interactive control of the viewing parameters of integral imaging systems is established in this paper. Specifically, transformation matrices are derived in an extended homogeneous light field coordinate space based on the interactive control requirements of integral imaging displays. In this model, new elemental images can be synthesized directly from the ones captured in the record process to display 3D images with the expected viewing parameters, and no extra geometrical information of the 3D scene is required in the synthesis process. Computer simulation and optical experimental results show that 3D scenes can be reconstructed with depth control, lateral translation and rotation.
© 2012 OSA
Integral imaging (II) is a promising technology for displaying three-dimensional (3D) images with full parallax and continuous viewpoints. Its advantages include working with incoherent light and being observable without special glasses. An II system contains two processes: the record process and the display process. In the record process, a lenslet array is placed in front of the CCD. The light rays emitted from the 3D scene through the lenslet array are recorded as small images by the CCD. These small images are called elemental images, and together they form an elemental image array (EIA). In the display process, the EIAs are used to reconstruct 3D images by optical II displays or computer visualizations. Various applications of II have been reported, such as 3D object recognition [2–4], depth estimation and extraction [5–7], visualization of partially occluded objects [8–10] and live 3D TV [11–13].
However, II systems suffer from the pseudoscopic problem, in which the reconstructed 3D images are depth reversed. Meanwhile, the parameters of 3D II displays usually do not match those of the pickup systems. Hence, creating EIAs with appropriate parameters from the original EIA captured in the record process is one of the important problems in 3D II systems. In addition, the viewing parameters of the reconstructed 3D images, such as depth position, lateral translation and rotation angle, should be controlled interactively in the synthesis process to obtain the desired 3D images without changing the structure of the II displays. To achieve a pseudoscopic-to-orthoscopic (PO) conversion, Okano et al. presented a digital method in which each elemental image is rotated 180° around its center to display depth-reversed orthoscopic scenes, but EIAs created by this method can be displayed only in the virtual field. A two-step pickup model, first proposed by Ives, is widely used to realize a PO conversion. The principle of this method is to use a virtual display lenslet array to reproduce the 3D scene, after which the new EIAs are picked up by a second lenslet array parallel to the virtual display lenslet array. For interactive control of viewing parameters, the two-step pickup model has been modified to generate EIAs that reconstruct 3D images at expected depth positions. Arai et al. [16,17] proposed a method that controls the depth position by locating the second lenslet array at a corresponding depth position, and new EIAs are calculated by introducing the depth information of the viewing points into the two-step pickup model. Martínez-Corral et al. then proposed a method named smart pixel mapping (SPIM), which permits a PO conversion of the 3D scene and depth position control. They later improved the SPIM method to calculate new EIAs that are fully adapted to the display monitor characteristics, and the updated version of SPIM is called SPOC. Because the two lenslet arrays in this model are parallel, these methods cannot synthesize EIAs that reconstruct 3D images with rotated viewing angles. In 2009, Taguchi et al. developed an interactively controllable II system. They create EIAs by a layer-based rendering method to display 3D images with different convergence planes and parallax.
In this paper, we propose a novel model that creates interactively controlled EIAs by a light field transformation in the homogeneous light field coordinate space, which is extended from the conventional light field coordinate space. Specifically, this new transform model, referred to as the homogeneous light field transformation (HLFT), permits us to synthesize the EIAs directly from the elemental images picked up in the record process, without geometrical information of the 3D scene. Transformation matrices are derived with which 3D images can be displayed with depth control, lateral translation and rotation. Moreover, the EIAs synthesized by the proposed method can be adapted to display devices whose parameters are different from those of the pickup devices.
The paper is organized as follows. Section 2 is devoted to the principle of the HLFT model. In Section 3, we describe the calculation process of the HLFT model in detail. Section 4 is the experimental verification of the HLFT model and discussion. Finally, in Section 5, the achievements of the proposed method are summarized.
2. EIAs synthesis in interactive controllable II systems
In interactive controllable II systems, new EIAs are synthesized from the original EIAs to reconstruct 3D images with desired viewing parameters by appropriate transformation models.
2.1 Principle of the conventional two-step pickup model
In II systems, EIAs are picked up in the record process and integrated into 3D images by the same lenslet array in the display process. However, the lenslet arrays in II displays might not be the same as those in the original II pickup systems; in fact, most II displays have their own parameter settings, which are different from those of the pickup system. Moreover, the reconstructed 3D images are usually pseudoscopic, which means the depth of objects in the reconstructed 3D scene is reversed. Hence the EIAs captured in the record process are not suitable for direct display on II monitors, and it is necessary to synthesize EIAs with a PO conversion and with parameters matching those of the II display. Meanwhile, the parameters of the lenslet array in the II display limit the display depth and viewing angle of the reconstructed 3D images. Thus the viewing parameters of the reconstructed 3D images, such as the depth position, lateral translation and rotation, should also be controlled interactively when synthesizing the new EIAs to obtain the expected reconstructed 3D images in the display process.
The two-step pickup model was proposed to synthesize EIAs that display orthoscopic 3D images, and its modified version SPOC can match the parameters of the synthetic EIAs to the II displays. In the two-step pickup model, two virtual lenslet arrays are placed parallel to each other, as shown in Fig. 1. The original EIA is used to reproduce the 3D scene around the reference plane through the first lenslet array, and then the virtual 3D scene is picked up by the synthetic lenslet array. The depth position of the reconstructed 3D images can be controlled by adjusting the distance ds from the reference plane to the synthetic lenslet array. The depth information of the 3D scene, i.e., the position of the viewpoints [16,17] or the distance do from the first lenslet array to the reference plane [18,19], is required. However, in II displays, the depth information of the 3D scene is usually unknown. Furthermore, because the lenslet arrays are parallel, this two-step pickup model cannot synthesize EIAs corresponding to rotated 3D images.
2.2 Proposed HLFT model
To synthesize EIAs that reconstruct 3D images with interactively controllable viewing parameters, a light field transformation model is presented in this paper. In the proposed model, each pixel in the EIAs is considered as a ray in the light field of the 3D scene, and these rays are parameterized in the homogeneous light field coordinate space defined by the two planes of the lenslet array and its EIA. The synthesizing process is therefore represented as a light field projection transformation between the two homogeneous light field coordinate spaces defined by the original and synthetic lenslet arrays and their EIA planes.
The homogeneous light field coordinate space is extended from the conventional light field coordinate space. As shown in Fig. 2, a light field coordinate space is formed by a global coordinate system XY on the lenslet array plane, with its origin at the center of the lenslet array, and local coordinate systems UV on the EIA plane, with their origins at the center of each elemental image. A ray defined in the 3D spatial coordinate system XYZ by (x-x0)/u0 = (y-y0)/v0 = z/g, where g is the gap between the lenslet array and the EIA plane, has two intersections: (x0, y0) on the XOY plane and (u0, v0) in the corresponding UV coordinate system. Thus it can be expressed as (x0, y0, u0, v0) in the conventional light field coordinate space. Since homogeneous coordinates can represent both translation and rotation concisely by a matrix, the ray is expressed by the coordinate vector t = (x0, y0, u0, v0, 1)′ in the homogeneous light field coordinate space.
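The parameterization above can be illustrated with a minimal sketch. The function names and the numerical values are hypothetical and chosen only for illustration; the sketch simply evaluates the ray equation (x-x0)/u0 = (y-y0)/v0 = z/g at a given depth z.

```python
import numpy as np

def ray_to_homogeneous(x0, y0, u0, v0):
    """Build t = (x0, y0, u0, v0, 1)' for one ray (illustrative helper)."""
    return np.array([x0, y0, u0, v0, 1.0])

def point_on_ray(t, z, g):
    """3D point on the ray at depth z, from (x-x0)/u0 = (y-y0)/v0 = z/g."""
    x0, y0, u0, v0, _ = t
    return np.array([x0 + u0 * z / g, y0 + v0 * z / g, z])

# Hypothetical values: a lenslet centered at (10, -5) and a gap g = 3.
t = ray_to_homogeneous(10.0, -5.0, 1.5, 0.5)
g = 3.0
on_lenslet_plane = point_on_ray(t, 0.0, g)  # (x0, y0, 0)
on_eia_plane = point_on_ray(t, g, g)        # (x0 + u0, y0 + v0, g)
```

At z = 0 the ray passes through (x0, y0), and at z = g it is offset by (u0, v0) from that point, which is consistent with the local UV origin lying at the center of the corresponding elemental image.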
The scheme of the HLFT model is shown in Fig. 3. The original lenslet array and its EIA plane determine a homogeneous light field coordinate space So. The synthetic lenslet array and its EIA, which have the same parameters and are placed at the same position as the II display, determine another homogeneous light field coordinate space Ss. The projection transformation of the light rays between these two coordinate spaces can be represented by a matrix C, and the generation of the interactively controlled EIAs can be implemented by a pixel mapping with a coordinate vector transformation. Specifically, assume a ray emitted from the 3D scene is recorded in the pickup EIA with a coordinate vector to = (xo, yo, uo, vo, 1)′ in coordinate space So, and is denoted as ts = (xs, ys, us, vs, 1)′ in coordinate space Ss. The transformation relationship between these two vectors can be written as Eq. (1): to = C·ts.
Compared with conventional two-step pickup methods, the proposed model creates the interactively controlled EIAs directly by a linear transformation in the homogeneous light field coordinate space, without reproducing the 3D scene. With this model, rotation of the reconstructed 3D images can also be achieved by controlling the rotation angle of the synthetic lenslet array. Moreover, knowledge of the reference plane or other geometrical information about objects in the 3D scene is not necessary, since the transformation matrix is determined only by the relative position of the original lenslet array and the synthetic lenslet array.
3. Calculation of the EIAs with the HLFT model
The calculation of the interactive controllable EIAs with the HLFT model can be realized by two processes. The first process is to calculate the transform matrix C with the parameters set by users, and the second process is pixel mapping.
3.1 Translation and depth control
For the calculation of the EIAs with translation and depth control, the mapping between the two light field coordinate vectors to and ts is represented as
Assuming that the synthetic lenslet array, which defines a spatial coordinate system XYZs, is translated to the position (x, y, z) in the spatial coordinate system XYZo defined by the original lenslet array, the coordinate transformation between the two spatial coordinate systems is as follows:
If x = y = 0, the matrix in Eq. (4) becomes a depth control matrix used to synthesize a new EIA that displays 3D images at the depth set by the user. It is easily seen in Fig. 3 that when the signs of the two parameters go and gs are opposite, the depth of the reconstructed 3D image is reversed. In particular, when go < 0 and gs > 0, a PO conversion is performed.
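As an illustration of how such a translation matrix can arise, the following sketch derives one from the ray equation of Section 2.2 under stated assumptions: the same ray equation holds in both spaces with gaps go and gs, and a point (xs, ys, zs) of the synthetic system sits at (xs + x, ys + y, zs + z) in the original system. The function name is hypothetical, and the sign conventions of the paper's Eq. (4) may differ from the ones used here.

```python
import numpy as np

def translation_matrix(x, y, z, g_o, g_s):
    """5x5 matrix mapping t_s to t_o for a pure translation (assumed conventions).

    A synthetic ray starts at (x_s + x, y_s + y, z) with direction (u_s, v_s, g_s)
    in the original system. Following it back to the plane z = 0 gives
    x_o = x_s + x - u_s * z / g_s, and rescaling the direction to the original
    gap gives u_o = u_s * g_o / g_s (likewise for y and v).
    """
    return np.array([
        [1.0, 0.0, -z / g_s, 0.0, x],
        [0.0, 1.0, 0.0, -z / g_s, y],
        [0.0, 0.0, g_o / g_s, 0.0, 0.0],
        [0.0, 0.0, 0.0, g_o / g_s, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0],
    ])

# Hypothetical ray and translation (lengths in mm).
t_s = np.array([2.0, 3.0, 0.5, -0.25, 1.0])
t_o = translation_matrix(20.0, 30.0, 150.0, 10.0, 10.0) @ t_s

# Depth-only control (x = y = 0) with gaps of opposite sign flips u and v.
t_flip = translation_matrix(0.0, 0.0, 0.0, -10.0, 10.0) @ t_s
```

With gaps of opposite sign, the factor go/gs is negative and the local coordinates (u, v) change sign, which matches the depth reversal described in the text.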
For a rotation, the coordinate vector to of a ray in the original light field coordinate system can be represented by the product of a rotation matrix Cr and the coordinate vector ts in the rotated homogeneous light field coordinate system:
If the synthetic lenslet array is rotated around the X-axis with an angle α, the relationship between these two spatial coordinate systems is
If the synthetic lenslet array is rotated around the Y-axis with an angle β, the conversion relation of the two spatial coordinate systems is
If the synthetic lenslet array is rotated around the Z-axis with an angle γ, the conversion relationship is represented as
To synthesize a new EIA, the transform matrix should be calculated from the translation matrix and the rotation matrices before pixel mapping. For a synthetic EIA whose lenslet array is first translated to the point (x, y, z) in the original spatial coordinate system (expressed as a translation vector vt = (x, y, z, 1)) and then rotated in turn around the X-, Y- and Z-axes by the angles α, β and γ, respectively (expressed as a rotation vector vr = (α, β, γ, 1)), the transform matrix C is written as
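The effect of a combined translation and rotation on a single ray can be checked numerically with a direct 3D computation. The sketch below does not reproduce the matrices of this section; it is a plain geometric ray transfer written under assumptions: the rotation order is X, then Y, then Z (R = Rz·Ry·Rx), the rotation is applied about the origin of the synthetic lenslet array, and that origin is then placed at (x, y, z). All function names are hypothetical.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(c_angle):
    c, s = np.cos(c_angle), np.sin(c_angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def map_ray(t_s, v_t, v_r_deg, g_o, g_s):
    """Map a synthetic-space ray to original-space coordinates (geometric sketch)."""
    x_s, y_s, u_s, v_s, _ = t_s
    alpha, beta, gamma = np.radians(v_r_deg)
    R = rot_z(gamma) @ rot_y(beta) @ rot_x(alpha)  # assumed order: X, Y, Z
    origin = R @ np.array([x_s, y_s, 0.0]) + np.asarray(v_t, dtype=float)
    direction = R @ np.array([u_s, v_s, g_s])
    if np.isclose(direction[2], 0.0):
        raise ValueError("ray is parallel to the original lenslet array plane")
    hit = origin - origin[2] / direction[2] * direction  # intersection with z = 0
    scale = g_o / direction[2]                           # rescale direction to gap g_o
    return np.array([hit[0], hit[1], direction[0] * scale, direction[1] * scale, 1.0])

# Hypothetical examples (lengths in mm, angles in degrees).
pure_translation = map_ray([2.0, 3.0, 0.5, -0.25, 1.0], (20.0, 30.0, 150.0), (0.0, 0.0, 0.0), 10.0, 10.0)
quarter_turn_z = map_ray([1.0, 0.0, 0.0, 0.0, 1.0], (0.0, 0.0, 0.0), (0.0, 0.0, 90.0), 10.0, 10.0)
```

Note that this computation divides by the z-component of the rotated direction, so for rotations about the X- or Y-axis the sketch is not a fixed linear map of ts; it is intended only as an independent check of individual rays.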
3.3 Pixel mapping
The second process is pixel mapping. Let Isijkl denote the value of a pixel in the synthetic EIA that lies in the elemental image in the ith row and the jth column and has indexes k, l within that elemental image. If the pixel is at the center of the central elemental image, all four subscripts i, j, k and l are equal to zero. The corresponding coordinate vector of the pixel Isijkl is represented as ts = (xs, ys, us, vs, 1)′ in the synthetic homogeneous light field coordinate space:
Then the homogeneous coordinate vector to of this ray in the coordinate space defined by the original lenslet array and its EIA is calculated by Eqs. (1) and (12). The corresponding pixel (defined as Ioqrst) recorded in the original EIA can be written as
Thus the pixel values of the synthetic EIA can be obtained by mapping as follows:
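A minimal sketch of such a pixel mapping is given below. It rests on assumptions that are not stated in the text: the EIA is stored as a grayscale 4D array indexed by (elemental image row, elemental image column, pixel row, pixel column), the coordinates are obtained as index times lenslet pitch or pixel size, nearest-neighbor rounding selects the source pixel, and rays that fall outside the original EIA are left at zero. The function and parameter names are hypothetical.

```python
import numpy as np

def index_to_vector(i, j, k, l, pitch, pixel):
    """Centered indices (row i, column j, pixel row k, pixel column l) -> (x, y, u, v, 1)."""
    return np.array([j * pitch, i * pitch, l * pixel, k * pixel, 1.0])

def vector_to_index(t, pitch, pixel):
    """Nearest centered indices (row, column, pixel row, pixel column) of a ray vector."""
    x, y, u, v, _ = t
    return tuple(int(np.rint(a)) for a in (y / pitch, x / pitch, v / pixel, u / pixel))

def synthesize_eia(eia_o, C, shape_s, pitch_s, pixel_s, pitch_o, pixel_o):
    """Fill a synthetic EIA by looking up, for each pixel, the ray t_o = C @ t_s."""
    eia_s = np.zeros(shape_s, dtype=eia_o.dtype)
    half_s = [n // 2 for n in shape_s]        # offsets so that the center index is zero
    half_o = [n // 2 for n in eia_o.shape]
    for idx in np.ndindex(*shape_s):
        i, j, k, l = (a - h for a, h in zip(idx, half_s))
        t_o = C @ index_to_vector(i, j, k, l, pitch_s, pixel_s)
        src = tuple(c + h for c, h in zip(vector_to_index(t_o, pitch_o, pixel_o), half_o))
        if all(0 <= c < n for c, n in zip(src, eia_o.shape)):
            eia_s[idx] = eia_o[src]
    return eia_s

# Hypothetical toy data: 3 x 3 elemental images of 5 x 5 pixels each.
eia_o = np.arange(3 * 3 * 5 * 5).reshape(3, 3, 5, 5)
same = synthesize_eia(eia_o, np.eye(5), eia_o.shape, 1.0, 0.1, 1.0, 0.1)

C_shift = np.eye(5)
C_shift[0, 4] = 1.0  # shift x by one lenslet pitch
shifted = synthesize_eia(eia_o, C_shift, eia_o.shape, 1.0, 0.1, 1.0, 0.1)
```

With the identity matrix the synthetic EIA equals the original, and a shift of one pitch in x moves every elemental image by one column. Interpolation between neighboring pixels could replace the rounding step if smoother results are needed.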
4. Experimental results
To demonstrate the feasibility and validity of the HLFT model, we apply it to generate EIAs with different viewing parameters of the reconstructed 3D images, e.g., depth position, lateral translation, and rotation angle. Computer visualizations and optical experiments are performed to show the reconstructed 3D images.
4.1 Experimental setup and parameters
In the experiments, an apple, a pear and one and a half oranges are modeled in 3DS MAX software to form a virtual 3D scene. The elemental images are then picked up by a previously proposed synthetic aperture method, in which a virtual camera created by the software is translated to pick up all the elemental images one by one, as shown in Fig. 4(a). The focal length of the camera is 15 mm, the aperture is 36 mm, and the field of view (FOV) is 100.4°. The apple, the pear and the center of the one and a half oranges are placed at distances of 120 mm, 180 mm, and 130 mm from the camera, respectively. A set of 31 H × 31 V images with a pitch of p = 10 mm is obtained. The size of each elemental image is 36 × 36 mm with a resolution of 320 × 320 pixels. Figure 4(b) shows a portion of the elemental images.
To verify the proposed model, EIAs are generated from the original set of elemental images, and both computer simulations and optical experiments are performed to reconstruct the 3D images. The reconstruction lenslet array is composed of 27 × 27 lenslets with focal length f = 10.4 mm and pitch p = 7.55 mm, and the size of each elemental image is 29 × 29 pixels. The synthetic EIAs are placed at a distance of g = f = 10.4 mm behind the lenslet array to form an II display with large depth, a configuration proposed in earlier work. In the optical experiments, the synthetic EIAs are printed with a high-resolution inkjet printer (Canon iX4000, maximum resolution 4800 × 1200 dpi). Due to the limited experimental conditions, we use two orthogonally crossed lenticular sheets instead of a lenslet array, following a previously proposed approach, as shown in Fig. 5; the overlapped part of the two cylindrical lenses corresponds to the elemental lenses with rectangular aperture of a conventional lenslet array. In the computer simulations, the visualization is calculated with a previously proposed method. A SONY Cyber-shot DSC-TX100 digital camera with a 16.2-megapixel CMOS sensor (a virtual observer in the computer visualization) is placed at a distance of 1.5 m from the lenslet array to capture the reconstructed 3D images. The left and right perspectives are taken at 60 mm from the center view (approx. 2.3°), respectively.
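The angular quantities of the setup can be cross-checked with elementary trigonometry. The sketch below assumes a pinhole relation, full angle = 2·atan(extent / (2·distance)), and treats the 36 mm "aperture" as the width of the virtual camera's image area; the last value is the nominal viewing angle of the display lenslet array under the common approximation 2·atan(p / 2g). The function name is hypothetical.

```python
import math

def full_angle_deg(extent, distance):
    """Full angle subtended by `extent` seen from `distance` (pinhole assumption)."""
    return math.degrees(2.0 * math.atan(extent / (2.0 * distance)))

# Pickup camera: 36 mm aperture width, 15 mm focal length.
pickup_fov = full_angle_deg(36.0, 15.0)

# Observer 60 mm off-center at 1.5 m from the lenslet array.
perspective_offset = math.degrees(math.atan(60.0 / 1500.0))

# Display lenslet array: pitch 7.55 mm, gap g = f = 10.4 mm.
display_viewing_angle = full_angle_deg(7.55, 10.4)

print(round(pickup_fov, 1), round(perspective_offset, 1), round(display_viewing_angle, 1))
```

Under these assumptions the three values round to 100.4°, 2.3° and 39.9°, in agreement with the figures quoted for the setup.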
4.2 Depth control
Figure 6 shows a synthetic EIA with a translation vector of vt = (0, 0, 0, 1)′. The mean squared error (MSE) and the peak signal-to-noise ratio (PSNR) are calculated as image quality metrics to evaluate the synthetic EIA. The MSE between the synthetic EIA in Fig. 6 and the EIA picked up directly with the same parameters is 26.9, which shows that the squared error between these two EIAs is small. The PSNR is 33.8 dB, which is greater than the commonly accepted value of 30 dB; this means the differences between these two EIAs are less than one thousandth of the scene information, as analyzed in previous work. The values of these two metrics indicate that the EIA synthesized by the proposed model is very similar to the EIA picked up directly by the lenslet array. Thus the proposed model can be applied to synthesize EIAs when the parameters of the II display are different from those of the record system.
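The relationship between the two reported metrics can be verified with the standard definitions. The sketch below assumes 8-bit images (peak value 255), which is not stated explicitly in the text; the function names are illustrative.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images of equal shape."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))

def psnr_from_mse(err, peak=255.0):
    """PSNR in dB for a given MSE, assuming the stated peak value."""
    return 10.0 * np.log10(peak ** 2 / err)

# Reported MSE of the synthetic EIA against the directly picked-up EIA.
reported_psnr = psnr_from_mse(26.9)

# Toy example: two 2 x 2 images that differ by 2 in every pixel.
toy_mse = mse([[10, 20], [30, 40]], [[12, 22], [32, 42]])
```

For an MSE of 26.9 this gives approximately 33.8 dB, so the two reported values are mutually consistent under the 8-bit assumption.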
Figure 7 shows the reconstruction results of the EIA in Fig. 6 with two different viewpoints. The 3D images reconstructed by the computer simulation are shown in Fig. 7(a), and the optical reconstruction results are shown in Fig. 7(b). It is seen that virtual orthoscopic 3D images are reconstructed, and all the fruits are behind the display screen. Additionally, Fig. 7 shows apparent granular noise caused by the wide pitch of the lenslet array, which results in the low resolution of the reconstructed 3D images.
To demonstrate the capability of the HLFT model to control the depth of the displayed 3D images, an EIA is created with its lenslet array moved towards the scene by 180 mm, so that the displayed 3D image moves 180 mm towards the lenslet array, as shown in Fig. 8.
The computer-reconstructed 3D images are shown in Fig. 9(a), and the optical reconstruction results are shown in Fig. 9(b). It is seen that the fruits in Fig. 9 are a little bigger than those in Fig. 7, and the shift of the fruits from the left viewpoint to the right viewpoint is larger. This indicates that the fruits in Fig. 9 are closer to the observer and are in front of the display screen as real orthoscopic images. Thus, depth control of the displayed 3D images is implemented with the synthetic EIAs generated by the proposed model in both the virtual and real fields.
4.3 Translation and rotation
To verify the feasibility of the HLFT model in generating EIAs that translate and rotate the reconstructed 3D images, two EIAs with different translation and rotation vectors are synthesized.
In Fig. 10, an EIA is synthesized by the HLFT model with a rotation vector of (−20°, 30°, 0°, 1)′. A translation vector of (20 mm, 30 mm, 150 mm, 1)′ is used to keep the fruits at the lateral center of the image with an appropriate display depth.
Figure 11(a) shows the computational reconstruction results of the EIA shown in Fig. 10, and the optical results are shown in Fig. 11(b). It is obvious that the relative position between the fruits is different from that in Fig. 7 and Fig. 9, and the pose of the fruits is also rotated as upper-right views of the original scene.
Another EIA generated by the proposed model with a translation vector of (−40 mm, 0 mm, 150 mm, 1)′ and a rotation vector of (0°, −35°, 0°, 1)′ is shown in Fig. 12.
Figure 13(a) shows the computational reconstruction images and Fig. 13(b) shows the optical reconstruction results. From Fig. 13 we can see that the positions and the pose of the fruits are changed, and the reconstructed 3D images seem to be left views of the original scene.
To evaluate the proposed model quantitatively, two EIAs are created, one with a translation vector vt = (30 mm, 40 mm, 0 mm, 1)′ and the other with a rotation vector vr = (0°, 0°, 30°, 1)′. In the reconstructed 3D image with translation, the horizontal and vertical offsets of the target measured in the optical reconstruction experiment are 29.0 mm and 40.5 mm, respectively. For the rotated EIA, the rotation angle of the reconstructed target around the Z-axis measured in the optical reconstruction experiment is 28°. The measured results show good agreement with the viewing parameters preset in the model on the computer. The slight differences might be caused by equipment precision, operation errors, the low resolution of the reconstructed target and so on.
In the proposed model, the 3D scene is considered as a light field composed of many rays. The lenslet array and its EIA plane record the information of the light field, and the display process of an II system is a reproduction of the light field. Thus, when a transformation is done by the proposed model, the light field of the reconstructed 3D image is the same as that of the original 3D scene, but the depth position or the rotation angle is changed according to the translation vector and the rotation vector. Whether the reconstructed 3D image is virtual or real is determined by the relative position between the lenslet array and the objects in the light field; e.g., in Fig. 7 the reconstructed fruits are behind the screen, and in Fig. 9 they are in front of the screen. Therefore, the proposed model can synthesize EIAs with depth control, and the reconstructed 3D images can be displayed in both the real and virtual fields.
Since the viewing angle of II displays is usually limited by the parameters of the lenslet array, the proposed model could help to display 3D images with a wider viewing angle by computation, without changing the hardware structure of the II display. For example, the viewing angle of the reconstruction lenslet array used in this paper is 39.9°, and the measured viewing angle of the optical setup without flipping is about 25°. Figure 10 shows an EIA rotated by −20° around the X-axis and by 30° around the Y-axis, and Fig. 12 shows an EIA rotated by −35° around the Y-axis. With these two EIAs synthesized by the proposed model, we can enlarge the viewing angle of the 3D scene without changing the display lenslet array. However, the information recorded in the original elemental images cannot be increased by the proposed model, and all information in the synthetic EIA comes from the EIA captured in the pickup process. Hence the maximum rotation angle should not exceed half of the FOV of the original lenslet array.
The rays from the 3D scene are sampled simultaneously by the lenslet array and by the pixels on the EIA plane, and the alias-free bandwidth of the sampled light field signal is determined mainly by the pitch of the lenslets and by the minimum and maximum depths of the objects in the 3D scene, as analyzed in [26–28]. Aliasing therefore degrades the quality of the synthetic EIAs and the reconstructed 3D images. In Fig. 7, Fig. 9, Fig. 11 and Fig. 13, the resolution of the reconstructed 3D images is not very high, and the granular noise is visible. This is caused by the wide pitch of the display lenslet array and that of the original elemental images. A change of depth might increase the aliasing, and the effect appears in Fig. 8, Fig. 10 and Fig. 12. To improve the quality of the 3D images, denser sampling is required, which has the drawbacks of a large amount of data and high computational cost. To address this problem, parallel computing should be used to process the densely sampled light field data in future applications. Digital anti-aliasing technology for the synthetic EIAs should also be developed to improve the quality of the 3D images.
In this paper, we have presented a general model for interactive control of viewing parameters in II systems with a transformation in the homogeneous light field coordinate space. The proposed model can be applied to synthesize EIAs from the original elemental images without geometrical information of the 3D scene. In particular, the controllable viewing parameters include not only depth position but also rotation angle, which could enlarge the viewing angle of the reconstructed 3D images without changing the structure of II displays. Computer simulation and optical experimental results show the good performance of the proposed model, where orthoscopic 3D scenes are reconstructed at positions and with rotation angles controlled by users. Because of the concise matrix representation of the proposed model, parallel computing would improve the calculation efficiency, and this transformation technology would be of benefit in many fields in the future, especially live 3D TV systems.
This work is supported by the National Natural Science Foundation of China (61007014) and the Global Research Outreach Program from Samsung Electronics Co. Ltd (HX0112050112).
References and links
1. G. Lippmann, “Epreuves reversibles donnant la sensation du relief,” J. Phys. 7, 821–825 (1908).
6. I. Chung, J.-H. Jung, J. Hong, K. Hong, and B. Lee, “Depth extraction with sub-pixel resolution in integral imaging based on genetic algorithm,” in Digital Holography and Three-Dimensional Imaging, OSA Technical Digest (CD) (Optical Society of America, 2010), paper JMA3.
7. D.-C. Hwang, D.-H. Shin, S.-C. Kim, and E.-S. Kim, “Depth extraction of three-dimensional objects in space by the computational integral imaging reconstruction technique,” Appl. Opt. 47(19), D128–D135 (2008).
8. J.-H. Jung, K. Hong, G. Park, I. Chung, J.-H. Park, and B. Lee, “Reconstruction of three-dimensional occluded object using optical flow and triangular mesh reconstruction in integral imaging,” Opt. Express 18(25), 26373–26387 (2010).
9. D.-H. Shin, B.-G. Lee, and J.-J. Lee, “Occlusion removal method of partially occluded 3D object using sub-image block matching in computational integral imaging,” Opt. Express 16(21), 16294–16304 (2008).
11. W. Matusik and H. Pfister, “3D TV: a scalable system for real-time acquisition, transmission, and autostereoscopic display of dynamic scenes,” ACM Trans. Graph. 23, 814–824 (2004).
13. Y. Taguchi, T. Koike, K. Takahashi, and T. Naemura, “TransCAIP: A live 3D TV system using a camera array and an integral photography display with interactive control of viewing parameters,” IEEE Trans. Vis. Comput. Graph. 15(5), 841–852 (2009).
15. H. E. Ives, “Optical properties of a Lippmann lenticulated sheet,” J. Opt. Soc. Am. 21(3), 171–176 (1931).
16. J. Arai, M. Kawakita, and F. Okano, “Effects of sampling on depth control in integral imaging,” Proc. SPIE 7237, 723710 (2009).
18. M. Martínez-Corral, B. Javidi, R. Martínez-Cuenca, and G. Saavedra, “Formation of real, orthoscopic integral images by smart pixel mapping,” Opt. Express 13(23), 9175–9180 (2005).
19. H. Navarro, R. Martínez-Cuenca, G. Saavedra, M. Martínez-Corral, and B. Javidi, “3D integral imaging display by smart pseudoscopic-to-orthoscopic conversion (SPOC),” Opt. Express 18(25), 25573–25583 (2010).
22. D.-H. Shin, M. Cho, and E.-S. Kim, “Computational implementation of asymmetric integral imaging by use of two crossed lenticular sheets,” ETRI J. 27(3), 289–293 (2005).
23. H. Navarro, R. Martínez-Cuenca, A. Molina-Martín, M. Martínez-Corral, G. Saavedra, and B. Javidi, “Method to remedy image degradations due to facet braiding in 3D integral-imaging monitors,” J. Display Technol. 6(10), 404–411 (2010).
24. H.-B. Xie, X. Zhao, Y. Yang, J. Bu, Z. L. Fang, and X. C. Yuan, “Cross-lenticular lens array for full parallax 3-D display with Crosstalk reduction,” Sci. China Technolog. Sci. 55(3), 735–742 (2012).
25. R. Damasevicius and G. Ziberkas, “Energy consumption and quality of approximate image transformation,” Electron. Electr. Eng. 120, 79–82 (2012).
26. J. X. Chai, X. Tong, S. C. Chan, and H. Y. Shum, “Plenoptic sampling,” in Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH ’00) (ACM Press, 2000), pp. 307–318.
27. C. Zhang and T. Chen, “Spectral analysis for sampling image-based rendering data,” IEEE Trans. Circ. Syst. Video Tech. 13(11), 1038–1050 (2003).