A novel technique for generating arbitrary view images in perspective and orthographic geometry based on integral imaging is proposed. After a three-dimensional object is captured using a lens array, disparity estimation is performed for the pixels at the selected position of each elemental image. According to the estimated disparity, appropriate parts of the elemental images are mapped to synthesize new view images in perspective or orthographic geometry. As a result, the proposed method is capable of generating new view images at arbitrary positions with high resolution and a wide field of view.
© 2008 Optical Society of America
View image generation is a crucial issue for three-dimensional (3D) displays and 3D optical information processing. 3D displays are evolving from two-view methods to multi-view methods in order to provide more natural and fatigue-free 3D images. Currently, 9-view and 13-view 3D displays are commercially available, and efforts to increase the number of views are ongoing. Recently, super multi-view displays, which aim to deliver more than a few tens of view images to the observer, have also been reported.
View images at arbitrary positions are also important in the field of 3D optical information processing. Many 3D optical correlator techniques require a number of view images for detecting and locating 3D objects in a given signal space. Moreover, some techniques utilize view images in orthographic projection geometry (also called sub-images or directional images) for scale- and rotation-invariant detection, which rules out direct image capture using multiple cameras. View images on a dense sampling grid can also serve as input images for generating 3D holograms. For these 3D optical information processing techniques, the number of view images is a key factor determining performance; more view images are therefore desirable.
As the required number of view images increases, their acquisition becomes more important. The naive approach of capturing the view images with the same number of cameras quickly becomes impractical as the number of views grows. An alternative is the intermediate view reconstruction (IVR) technique, which synthesizes a new view image based on the dense disparity map obtained from two view images. IVR, however, still requires at least two cameras, so camera calibration remains a problem. The small number of captured views also limits the accuracy of the view generation.
Recently, integral imaging has been attracting great attention as an effective method for 3D information processing. Integral imaging is a technique for capturing and displaying a 3D scene using a lens array.[7–9] In the pickup process, the elemental lenses of the lens array sample the ray space at their principal points, generating the same number of elemental images. Since the set of elemental images represents the ray space of the 3D object, 3D information of the object is embedded in the elemental images. Hence, by utilizing the elemental images, the 3D information of the object can be manipulated. Based on this property, several techniques, including 3D shape and texture reconstruction and 3D image sensing, have recently been proposed.[10–11]
In this paper, we propose a novel view image generation technique based on integral imaging. Figure 1 shows the concept of the proposed method. After obtaining elemental images of the 3D object using a lens array, the proposed method finds the disparity for the central or directional pixel of each elemental image. Based on the disparity information, the appropriate patches from the elemental images are mapped to the view image plane, synthesizing a new view image. Owing to the dense sampling grid of the lens array, the proposed method is capable of generating with high accuracy not only perspective projection images but also orthographic projection images, which cannot be obtained by direct camera capture. The proposed method requires only one camera and a lens array, making the overall system compact and alleviating the camera calibration process. Since the patches of the elemental images are collected and interpolated appropriately, the full resolution of the elemental images is maintained in the generated view images. The correspondence analysis is performed for only one pixel per elemental image, which reduces the computational cost.
The proposed method is based on the 3D shape and texture reconstruction method of Passalis et al. Passalis et al. first demonstrated the reconstruction of a 3D object mesh model that overcomes the small field of view (FOV) of the elemental image. The main contributions of the proposed method are the mapping of the elemental image patches to the view image plane and the orthographic projection view generation through directional pixel disparity detection. The mapping of the elemental image patches makes it possible to use the full resolution of the elemental images while detecting the disparity only for selected pixels. To the authors’ knowledge, high-resolution orthographic projection view generation based on integral imaging is demonstrated for the first time by the proposed method. Moreover, the use of directional pixels instead of central pixels in the orthographic projection view generation enhances the robustness against disparity estimation error. In the following, we explain the principle of the proposed method and verify its feasibility using simulations and experiments.
2. Disparity estimation and projection geometry
Disparity is defined as the position difference of corresponding points between view images. In the integral imaging pickup setup shown in Fig. 2, the disparity is determined by the position difference of the corresponding points between the elemental images. In order to find the disparity, correspondences must be matched between the elemental images. Among the various methods for correspondence matching, we used the multi-baseline stereo algorithm in this paper. The disparity of $E_{p_1,q_1}(u,\nu)$, denoting the elemental image pixel value at (u,ν) of the [p1, q1]-th elemental image, is determined as the value minimizing the sum of the sum of squared differences (SSSD), which is given by

$$\mathrm{SSSD}(d)=\sum_{(p_2,q_2)\in N}\ \sum_{(u',\nu')\in W}\left[E_{p_1,q_1}(u',\nu')-E_{p_2,q_2}\big(u'+(p_2-p_1)d,\ \nu'+(q_2-q_1)d\big)\right]^2, \qquad (1)$$
where d is the disparity in pixel units, W is the window around (u,ν), and N is the set of indices of the elemental images neighboring the [p1, q1]-th elemental image. Four neighboring elemental images, i.e. those above, below, left of, and right of the given elemental image, were used for the correspondence matching in all simulations and experiments in this paper. After the disparity is measured, the depth Z can be directly calculated from the disparity and the pickup system parameters as

$$Z=\frac{f\varphi}{s\,d}, \qquad (2)$$

where s is the pixel pitch of the CCD, and f and φ are the focal length and the lens pitch of the elemental lens, respectively.
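The correspondence search described above can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it assumes that a point seen at (u,ν) in the [p1, q1]-th elemental image appears shifted by d pixels per lens index in each neighboring elemental image (the sign of the shift depends on the pickup coordinate convention), it searches integer disparities only, and all function names and the synthetic texture are hypothetical.

```python
def texture(x, y):
    """Synthetic integer texture used only for the demo below (an assumption)."""
    return x * x + 3 * y * y + x * y


def make_elemental_images(n_lens, size, d_true):
    """Toy elemental images: lens (p, q) sees the texture shifted by d_true pixels per lens."""
    return {
        (p, q): [[texture(u - p * d_true, v - q * d_true) for u in range(size)]
                 for v in range(size)]
        for p in range(n_lens) for q in range(n_lens)
    }


def sssd(images, p1, q1, u, v, d, half_window=2):
    """Sum over the 4 neighbours of the windowed SSD for candidate disparity d."""
    ref = images[(p1, q1)]
    height, width = len(ref), len(ref[0])
    total = 0.0
    for dp, dq in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        neighbour = images.get((p1 + dp, q1 + dq))
        if neighbour is None:  # lens on the border of the array
            continue
        for dv in range(-half_window, half_window + 1):
            for du in range(-half_window, half_window + 1):
                uu, vv = u + du, v + dv
                us, vs = uu + dp * d, vv + dq * d  # assumed shift convention
                inside = (0 <= uu < width and 0 <= vv < height
                          and 0 <= us < width and 0 <= vs < height)
                if not inside:
                    # Reject candidates whose window leaves the elemental image.
                    return float("inf")
                diff = ref[vv][uu] - neighbour[vs][us]
                total += diff * diff
    return total


def estimate_disparity(images, p1, q1, u, v, candidates, half_window=2):
    """Return the candidate disparity with the smallest SSSD."""
    return min(candidates, key=lambda d: sssd(images, p1, q1, u, v, d, half_window))


# Demo: 3x3 lenses, 20x20 pixels each, true disparity of 3 pixels.
images = make_elemental_images(n_lens=3, size=20, d_true=3)
best = estimate_disparity(images, 1, 1, 10, 10, range(0, 7))
print("estimated disparity:", best)
```

Because the cost is evaluated for a single pixel per elemental image, the search stays cheap even with a brute-force scan over the candidates.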
The accuracy of the disparity detection has a direct effect on the quality of the generated view image in the proposed method. The factor limiting this accuracy is the small baseline, i.e. the separation between the input images in the correspondence matching. The baseline coincides with the elemental lens pitch in the proposed method, and hence it is generally smaller than in usual stereo vision cases using multiple cameras. The small baseline leads to a small disparity at a given depth, making it hard to detect a precise disparity value. In order to keep the disparity at a reasonable value, the depth should be small, since the disparity is inversely proportional to the depth. Consequently, the proposed method is most efficiently applied to the view generation of 3D objects located close to the lens array. The maximum detectable depth is found to be fφ/s by letting d=1 in Eq. (2).
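As a numerical illustration of this depth limit, the following sketch evaluates the inverse relation between depth and disparity, Z = fφ/(sd), which is the form implied by the maximum depth fφ/s at d = 1. It uses the simulation parameters of Section 5 and assumes that the 3.3 mm pinhole gap plays the role of f; the function names are illustrative.

```python
def depth_from_disparity(d, f, pitch, s):
    """Depth Z for a disparity of d pixels (all lengths in the same unit)."""
    return f * pitch / (s * d)


def disparity_from_depth(z, f, pitch, s):
    """Disparity in pixels expected for an object point at depth z."""
    return f * pitch / (s * z)


# Simulation parameters from Section 5 (gap used as f is an assumption).
F_MM, PITCH_MM, PIXEL_MM = 3.3, 1.0, 0.05

max_depth = depth_from_disparity(1, F_MM, PITCH_MM, PIXEL_MM)
d_object = disparity_from_depth(25.0, F_MM, PITCH_MM, PIXEL_MM)
print("maximum detectable depth [mm]:", max_depth)
print("disparity at 25 mm [pixels]:", d_object)
```

With these values the maximum detectable depth is about 66 mm, and an object at 25 mm produces a disparity of about 2.6 pixels, which illustrates why the object has to be placed close to the lens array.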
Projection geometry determines the way a 3D object is mapped to the view plane. Figure 3 shows the perspective and orthographic projection geometries. In perspective projection geometry, the object space is sampled on a grid diverging from a vanishing point, assuming uniform pixel spacing in the view plane. A camera described by the usual pinhole model has perspective projection geometry. Hence each elemental image in the integral imaging pickup setup has perspective projection geometry. In contrast, in orthographic projection geometry, the object space is sampled on a parallel grid without any vanishing point, as shown in Fig. 3(b). A sub-image (also called a directional image), which is a collection of the pixels at the same position in every elemental image, has orthographic projection geometry.
In fact, the elemental images and the sub-images already represent view images in perspective or orthographic projection geometry at many viewing positions. The FOV and the resolution of the naive elemental images and sub-images, however, are seriously limited. The FOV of the elemental image is limited to FOV = 2tan⁻¹(φ/2f), and the resolution of the sub-image does not exceed the number of elemental images; these are too small or too coarse for real applications. The purpose of the proposed work is to eliminate such limitations on the FOV and the resolution. Based on the disparity information of the selected pixels in the elemental images, the proposed method generates wide-FOV and high-resolution view images in perspective or orthographic geometry. In addition, the proposed method can synthesize the view image at an arbitrary position, not limited to the fixed principal points of the elemental lenses as naive elemental images and sub-images are. In the following sections, we explain the principle of the proposed method.
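To make these limits concrete, the sketch below evaluates them for the simulation parameters of Section 5, assuming the 3.3 mm pinhole gap is used as f; the function names are illustrative only.

```python
import math


def elemental_image_fov_deg(pitch, f):
    """FOV of one elemental image, 2*atan(pitch / (2*f)), in degrees."""
    return math.degrees(2.0 * math.atan(pitch / (2.0 * f)))


def naive_sub_image_resolution(n_lens_x, n_lens_y):
    """A naive sub-image has one pixel per elemental image."""
    return (n_lens_x, n_lens_y)


fov = elemental_image_fov_deg(pitch=1.0, f=3.3)
sub_res = naive_sub_image_resolution(40, 40)
print("elemental image FOV [deg]:", round(fov, 2))
print("naive sub-image resolution [pixels]:", sub_res)
```

For a 40×40 array with 20×20 pixels per elemental image, each naive elemental image covers only about 17° and each naive sub-image has only 40×40 pixels, although the whole set of elemental images contains 800×800 pixels.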
3. View image generation in perspective projection geometry
Figure 4 shows the one-dimensional (1D) geometry of the proposed method. First, for every elemental image, the correspondence analysis is performed for the central pixel, i.e. one pixel located at the center of each elemental image, to find its disparity. Based on the detected disparities of the central pixels C1, C2, …,CN in elemental images, the positions of the corresponding object points O1, O2, …, ON are calculated using Eq. (2) as shown in Fig. 4(a). The object points O1, O2, …, ON, then, are projected to the view image plane at a given view point position. Consequently, the central pixels C1, C2, …, CN in elemental images are projected to the view plane points V1, V2, …, VN in the generated view image as shown in Fig. 4(b).
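The two projection steps can be sketched for the 1D case as follows. The geometry is an assumption made for illustration: the n-th elemental lens sits at x = nφ on the plane z = 0, the central pixel looks along the optic axis so that its object point is (nφ, Z_n) with Z_n = fφ/(s d_n), and the synthesized view is a virtual pinhole camera at (x_view, z_view) looking in the +z direction with an image plane at distance g_view. None of these names come from the paper.

```python
def central_pixel_object_points(disparities, f, pitch, s):
    """Object points O_n = (x_n, Z_n) seen by the central pixels C_n."""
    points = []
    for n, d in enumerate(disparities):
        x_n = n * pitch              # optic axis of the n-th lens
        z_n = f * pitch / (s * d)    # depth from disparity
        points.append((x_n, z_n))
    return points


def project_perspective(points, x_view, z_view, g_view):
    """View plane coordinates V_n of the object points for a pinhole view."""
    coords = []
    for x, z in points:
        depth = z - z_view
        if depth <= 0:
            raise ValueError("object point lies behind the view point")
        coords.append(g_view * (x - x_view) / depth)
    return coords


# Three lenses observing a flat object: disparity of 2 pixels everywhere.
object_points = central_pixel_object_points([2, 2, 2], f=3.0, pitch=1.0, s=0.05)
view_points = project_perspective(object_points, x_view=1.0, z_view=0.0, g_view=3.0)
print(object_points)
print(view_points)
```

The resulting coordinates V_n only fix where each central pixel lands in the new view; the texture between neighboring V_n is filled in a separate step.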
The area between the view plane points V1, V2, …, VN in the generated view image is linearly interpolated with the texture in the corresponding area of the elemental images. The area between Vn-1 and Vn on the view plane corresponds to the object area defined by On-1 and On as shown in Fig. 4(c), and hence is interpolated with the elemental image area between Cn and C′n-1, where C′n-1 denotes the corresponding point of Cn-1 in the [n]-th elemental image as shown in Fig. 4(d). In fact, the choice of the elemental image that is used for the area filling is arbitrary. In Fig. 4(d), the area between Cn-1 and C′n in the [n-1]-th elemental image, or any corresponding area in other elemental images, can be used instead of the area between Cn and C′n-1 in the [n]-th elemental image, since they all correspond to the same area between Vn-1 and Vn in the view image plane. The simultaneous use of all the corresponding areas in multiple elemental images has the potential to enhance the resolution of the generated view image. In this case, since simple averaging over multiple elemental images does not provide any real improvement of the resolution, a more sophisticated algorithm such as super-resolution with a sub-pixel-precision dense disparity map is required. In this paper, only one elemental image is selected and used for the view image area filling in order to avoid the need for dense disparity map detection with sub-pixel precision. The selection rule used in the simulations and experiments in this paper is to pick the elemental image in which the corresponding area is located closest to the principal axis, in order to minimize the effect of elemental lens aberration.
Extending to the two-dimensional (2D) case, the area inside the triangle defined by Vm,n, Vm+1,n, Vm,n+1 on the view image plane is filled using the corresponding area in the [m, n]-th elemental image defined by Cm,n, C′m+1,n, C′m,n+1 as shown in Fig. 5. More specifically, the pixel value of a view plane point V satisfying

$$V=V_{m,n}+a\,(V_{m+1,n}-V_{m,n})+b\,(V_{m,n+1}-V_{m,n}),\quad a\ge 0,\ b\ge 0,\ a+b\le 1, \qquad (3)$$

is given by that of the elemental image point E whose location is determined by

$$E=C_{m,n}+a\,(C'_{m+1,n}-C_{m,n})+b\,(C'_{m,n+1}-C_{m,n}), \qquad (4)$$
where a and b are from Eq. (3). Similarly, the area inside the triangle defined by Vm,n, Vm-1,n, Vm,n-1 on the view image plane is interpolated using the corresponding area defined by Cm,n, C′m-1,n, C′m,n-1 in the [m, n]-th elemental image. Each elemental image accounts for two triangular areas on the view image plane, so that the whole view image plane is filled without any holes. Where triangular areas overlap on the view image plane, the Z-buffering technique is used to select the correct pixel value so that the correct occlusion is synthesized.
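The triangle-to-triangle mapping can be sketched as below. The sketch assumes that the view plane point is expressed as V = Vm,n + a(Vm+1,n − Vm,n) + b(Vm,n+1 − Vm,n) with a ≥ 0, b ≥ 0, a + b ≤ 1, and that the same weights a and b are applied to Cm,n, C′m+1,n, C′m,n+1 in the elemental image. Function names are hypothetical, and Z-buffering and pixel resampling are left out.

```python
def triangle_weights(v, v0, v1, v2):
    """Solve v = v0 + a*(v1 - v0) + b*(v2 - v0) for (a, b) by Cramer's rule."""
    e1 = (v1[0] - v0[0], v1[1] - v0[1])
    e2 = (v2[0] - v0[0], v2[1] - v0[1])
    r = (v[0] - v0[0], v[1] - v0[1])
    det = e1[0] * e2[1] - e1[1] * e2[0]
    if det == 0:
        raise ValueError("degenerate triangle")
    a = (r[0] * e2[1] - r[1] * e2[0]) / det
    b = (e1[0] * r[1] - e1[1] * r[0]) / det
    return a, b


def map_view_point_to_elemental(v, view_tri, elem_tri):
    """Return the elemental image point E for view point v, or None if v is outside."""
    a, b = triangle_weights(v, *view_tri)
    if a < 0 or b < 0 or a + b > 1:
        return None  # v belongs to another triangle
    c0, c1, c2 = elem_tri
    return (c0[0] + a * (c1[0] - c0[0]) + b * (c2[0] - c0[0]),
            c0[1] + a * (c1[1] - c0[1]) + b * (c2[1] - c0[1]))


# View-plane triangle and its corresponding triangle in one elemental image.
view_triangle = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
elem_triangle = ((10.0, 10.0), (12.0, 10.0), (10.0, 12.0))

inside = map_view_point_to_elemental((1.0, 1.0), view_triangle, elem_triangle)
outside = map_view_point_to_elemental((4.0, 4.0), view_triangle, elem_triangle)
print(inside, outside)
```

In a full implementation, each view pixel inside the triangle would be looked up in this way and written to the output only if its depth passes the Z-buffer test.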
Since the geometric structure of the object is reconstructed using the disparity information of all central pixels of the elemental images, whole object area covering the lens array can be contained in the generated view image, which alleviates the FOV limitation of each elemental image. Moreover, since texture in all elemental images is projected into the generated view image, the resolution of the elemental images is fully utilized.
4. View image generation in orthographic projection geometry
A view image in orthographic projection geometry can be generated in a similar way, except for the pixel used for the disparity estimation. Figure 6 shows the geometry of the proposed method. Referring to Fig. 6(a), if the desired orthographic projection angle is θ, the pixels at angle θ with respect to the optic axes of the elemental lenses (called directional pixels) are selected for disparity detection and projected to the view image plane. The area between the projected view image points is textured with the corresponding area in the elemental images, in the same way as in the perspective view generation explained in the previous section. Therefore the area between the view image points Vn-1 and Vn is textured using the area between Dn and D′n-1 in the n-th elemental image as shown in Figs. 6(c) and (d). Figure 7 shows a 2D view of the elemental image mapping.
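A minimal sketch of how the directional pixel for a given angle θ might be located is shown below. It assumes a pinhole model in which a ray at angle θ to the optic axis lands at a distance f·tanθ from the center of the elemental image, so that the largest usable angle is reached when this distance equals half the lens pitch. The sign of the offset (image inversion) is ignored, the simulation parameters of Section 5 are used with the pinhole gap as f, and the function names are illustrative.

```python
import math


def directional_pixel_offset(theta_deg, f, s):
    """Distance of the directional pixel from the elemental image center, in pixels."""
    return f * math.tan(math.radians(theta_deg)) / s


def max_projection_angle_deg(pitch, f):
    """Largest angle whose directional pixel still lies inside the elemental image."""
    return math.degrees(math.atan(pitch / (2.0 * f)))


theta_max = max_projection_angle_deg(pitch=1.0, f=3.3)
offset_5deg = directional_pixel_offset(5.0, f=3.3, s=0.05)
offset_max = directional_pixel_offset(theta_max, f=3.3, s=0.05)
print("maximum projection angle [deg]:", round(theta_max, 2))
print("offset at 5 deg [pixels]:", round(offset_5deg, 2))
print("offset at maximum angle [pixels]:", round(offset_max, 2))
```

At the maximum angle the offset equals half the elemental image width (10 pixels for a 20-pixel elemental image), which is the edge of the area recorded behind each lens.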
Since the projection grid is parallel in orthographic projection geometry, the view image is fully characterized by the projection angle (θx, θy) rather than by the view point position (X, Y, Z) as in the perspective projection geometry case. The available range of the projection angle is limited by the maximum deviation of the directional pixels from the optic axis of the elemental lens. Therefore the range of the projection angle (θx, θy) is given by

$$-\tan^{-1}\!\left(\frac{\varphi}{2f}\right)\le\theta_x,\ \theta_y\le\tan^{-1}\!\left(\frac{\varphi}{2f}\right). \qquad (5)$$
One notable point is that the robustness against disparity estimation error is enhanced in the orthographic projection view generation by the use of directional pixels instead of central pixels. Figure 8 shows the effect of the disparity error on the generated view. As shown in Fig. 8(a), a disparity estimation error on the central pixel Cn-1 in the perspective projection case leads to an incorrect view plane image point Ve,n-1, which is the projection of the central pixel Cn-1 on the view plane through the incorrect object point Oe,n-1. This incorrect location of the view image point results in geometric distortion of the object in the generated view. In the orthographic projection case, however, the geometric distortion arising from the disparity error is much reduced owing to the use of directional pixels instead of central pixels. As shown in Fig. 8(b), the location of the view plane image point Vn-1 does not change even when there is a disparity estimation error on the directional pixel Dn-1, because every point on the ray of a directional pixel is projected along that same direction. Hence the directional pixels are always projected to the correct locations in the view image plane, reducing the geometric distortion of the object in the generated view. Only the texture between Vn and Vn-1 is affected by the disparity error, i.e. the elemental image area between Dn and D′e,n-1 is used instead of the correct area between Dn and D′n-1 for filling the view image area between Vn and Vn-1.
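This difference in sensitivity can be checked with a small 1D sketch. The geometry is assumed for illustration: the lens sits at x_lens on the plane z = 0, the central-pixel ray runs along the optic axis, the directional-pixel ray leaves the lens at angle θ, the perspective view is a pinhole at (x_view, z_view) with image distance g_view, and the orthographic view projects along θ onto the plane z = 0. The names are hypothetical.

```python
import math


def perspective_view_coord(x_lens, depth, x_view, z_view, g_view):
    """View coordinate of the central pixel's object point (x_lens, depth)."""
    return g_view * (x_lens - x_view) / (depth - z_view)


def orthographic_view_coord(x_lens, depth, theta_deg):
    """View coordinate of the directional pixel's object point, projected along theta."""
    t = math.tan(math.radians(theta_deg))
    x_object = x_lens + depth * t   # object point on the directional ray
    return x_object - depth * t     # parallel projection back onto z = 0


true_depth, wrong_depth = 25.0, 30.0  # depth with and without a disparity error

persp_shift = abs(
    perspective_view_coord(5.0, true_depth, 0.0, 0.0, 3.3)
    - perspective_view_coord(5.0, wrong_depth, 0.0, 0.0, 3.3)
)
ortho_shift = abs(
    orthographic_view_coord(5.0, true_depth, 5.0)
    - orthographic_view_coord(5.0, wrong_depth, 5.0)
)
print("perspective view point shift:", persp_shift)
print("orthographic view point shift:", ortho_shift)
```

Under these assumptions the perspective view point moves when the depth estimate changes, whereas the orthographic view point stays at the lens position up to floating-point rounding.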
5. Simulation result
The simulation was performed using elemental images produced digitally assuming a pinhole array. The pinhole array consists of 40×40 pinholes with 1 mm spacing. The pixel pitch of the elemental image is assumed to be 0.05 mm, so that each elemental image consists of 20×20 pixels. The gap between the pinhole array and the elemental image plane is 3.3 mm. The object used in the simulation is the hexahedron-shaped 3D model shown in Fig. 9(a), located 25 mm from the pinhole array. The prepared elemental images are shown in Fig. 9(b). The proposed method was applied to these elemental images. Correspondence analysis for each central or directional pixel was done with a window size of 5 pixels using 4 neighboring elemental images, i.e. the elemental images above, below, left, and right. Median filtering and Gaussian smoothing were applied to the detected disparity to obtain a smooth disparity map free of spike noise.
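The post-processing of the disparity map might look like the following sketch. The kernel sizes are not stated in the text, so the 3×3 median window, the 3×3 binomial approximation of the Gaussian, and the edge replication are all assumptions, as are the function names.

```python
from statistics import median

# 3x3 binomial kernel (weights sum to 16), an assumed stand-in for the Gaussian.
GAUSS_3X3 = (1, 2, 1, 2, 4, 2, 1, 2, 1)


def neighbourhood_3x3(grid, r, c):
    """The 9 values around (r, c), with edge values replicated at the border."""
    rows, cols = len(grid), len(grid[0])
    return [grid[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def median_3x3(grid):
    """Median filter that removes isolated spikes."""
    return [[median(neighbourhood_3x3(grid, r, c)) for c in range(len(grid[0]))]
            for r in range(len(grid))]


def gauss_3x3(grid):
    """Weighted average with the binomial kernel."""
    return [[sum(w * x for w, x in zip(GAUSS_3X3, neighbourhood_3x3(grid, r, c))) / 16.0
             for c in range(len(grid[0]))]
            for r in range(len(grid))]


# Flat disparity map of 3 pixels with one spike, one value per elemental image.
disparity = [[3.0] * 5 for _ in range(5)]
disparity[2][2] = 50.0

smoothed = gauss_3x3(median_3x3(disparity))
print(smoothed[2][2])
```

Running the median filter first matters: a Gaussian applied directly to the spike would spread it over the neighboring elemental images instead of removing it.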
Figure 10 shows the simulation results. The detected disparity maps for central pixels and directional pixels are shown in Figs. 10(a) and (b), respectively. Using the disparity map for the central pixels, view images in perspective projection geometry were synthesized at several view points. Figure 10(c) shows the resultant view images in perspective projection geometry. The (X, Y, Z) coordinates shown at the top left of each image represent the view-point position in mm. The left 9 images in Fig. 10(c) show the perspective changes under lateral view-point translation, while the right 3 images show the perspective changes under longitudinal view-point translation. We can see that different perspectives of the object are successfully generated according to the view points.
The slight deformation found in Fig. 10(c) originates from the estimation error of the central pixel disparity map. A possible approach to reducing this deformation is the use of more sophisticated correspondence matching algorithms that exploit multiple elemental images more effectively. Figure 10(d) shows the generated view images in orthographic projection geometry. The (θx, θy) on each image represents the viewing direction of the orthographic projection. Parallel edges of the object remain parallel in the generated images, confirming that the generated images are orthographic. Note that the deformation found in Fig. 10(c) is not visible in the orthographic views shown in Fig. 10(d), owing to the enhanced robustness against disparity estimation error explained in Section 4.
Figure 11 shows the generated view images in comparison with the elemental image or sub-image. Elemental image in Fig. 11(a) is taken from the set of the elemental images shown in Fig. 9(b). We can see the FOV limitation of the elemental image is alleviated in the view image generated by the proposed method in perspective projection geometry. The sub-image in Fig. 11(b) is also synthesized by pixel reassembling from the set of elemental images shown in Fig. 9(b). The resolution enhancement of the proposed method in orthographic projection geometry is obvious from Fig. 11(b).
6. Experimental result
We verified the proposed method experimentally. Figure 12 shows the experimental setup. A Fresnel-type depth control lens and imaging optics were used to obtain a parallel pickup direction. The system parameters are listed in Table 1. The object was located 20 mm from the lens array. The experimental results are shown in Figs. 13–15. Figure 13(a) shows the captured elemental images. The resolution of each elemental image is 27×27 pixels, resulting in 864×891 pixels for the whole set of elemental images, i.e. 32×33 elemental images. The estimated disparity maps are shown in Fig. 13(b) for central pixels and in Fig. 13(c) for directional pixels at (5°, 5°).
The synthesized view images in perspective and orthographic projection geometry are shown in Figs. 14(a) and (b), and their movie versions are shown in Figs. 15(a) and (b), respectively. In the experiment, the view images in orthographic projection geometry were synthesized over an angle range of −6.4° to +6.4° in the horizontal and vertical directions. From Figs. 14(a)–(b) and 15(a)–(b), we can see that the view images are successfully generated at arbitrary view positions from one set of elemental images, which confirms the feasibility of the proposed method.
A novel method to generate arbitrary view images in perspective or orthographic projection geometry using a lens array is proposed. Since only one camera and one lens array are used, the system is compact. Collective use of the elemental images enables view image generation with high resolution and wide FOV. Consequently the proposed method provides an efficient way to acquire numerous view images for multi-view or super multi-view 3D displays. The feasibility of the proposed method is verified by simulations and experiments.
This research was partly supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) Support program supervised by the IITA (Institute of Information Technology Advancement) (IITA-2008-C1090-0801-0018). This work was partly supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (The Regional Research Universities Program/Chungbuk BIT Research-Oriented University Consortium)
References and links
1. W. J. Matusik and H. Pfister, “3D TV: a scalable system for realtime acquisition, transmission, and autostereoscopic display of dynamic scenes,” ACM Trans. Graphics 23, 814–824 (2004).
2. H. Nakanuma, H. Kamei, and Y. Takaki, “Natural 3D display with 128 directional images used for human-engineering evaluation,” in Stereoscopic Displays and Virtual Reality Systems XII, A. J. Woods, M. T. Bolas, J. O. Merritt, and I. E. McDowall, eds., Proc. SPIE 5664, 28–35 (2005).
3. J. Rosen, “Three-dimensional joint transform correlator,” Appl. Opt. 37, 7438–7544 (1998).
4. J.-H. Park, J. Kim, and B. Lee, “Three-dimensional optical correlator using a sub-image array,” Opt. Express 13, 5116–5126 (2005), http://www.opticsinfobase.org/abstract.cfm?URI=oe-13-13-5116.
5. D. Abookasis and J. Rosen, “Computer-generated holograms of three-dimensional objects synthesized from their multiple angular viewpoints,” J. Opt. Soc. Am. A 20, 1537–1545 (2003).
6. L. Zhang, D. Wang, and A. Vincent, “Adaptive reconstruction of intermediate views from stereoscopic images,” IEEE Trans. Circuits Syst. Video Technol. 16, 102–113 (2006).
7. B. Lee, S. Jung, and J.-H. Park, “Viewing-angle-enhanced integral imaging by lens switching,” Opt. Lett. 27, 818–820 (2002).
9. D.-H. Shin, S.-H. Lee, and E.-S. Kim, “Optical display of true 3D objects in depth-priority integral imaging using an active sensor,” Opt. Commun. 275, 330–334 (2007).
10. G. Passalis, N. Sgouros, S. Athineos, and T. Theoharis, “Enhanced reconstruction of three-dimensional shape and texture from integral photography images,” Appl. Opt. 46, 5311–5320 (2007).
12. M. Okutomi and T. Kanade, “A multiple-baseline stereo,” IEEE Trans. Pattern Anal. Mach. Intell. 15, 353–363 (1993).
13. R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision (2nd ed., Cambridge University Press, Cambridge, 2000).
14. B. Lee, J.-H. Park, and S.-W. Min, “Three-dimensional display and information processing based on integral imaging,” in Digital Holography and Three-Dimensional Display, T.-C. Poon, ed. (Springer, New York, USA, 2006), Chapter 12.
15. J.-H. Park, S. Jung, H. Choi, Y. Kim, and B. Lee, “Depth extraction by use of a rectangular lens array and one-dimensional elemental image modification,” Appl. Opt. 43, 4882–4895 (2004).
16. R. Hardie, K. Barnard, and E. Armstrong, “Joint MAP registration and high-resolution image estimation using a sequence of undersampled images,” IEEE Trans. Image Processing 6, 1621–1633 (1997).
17. R. L. Cook, L. Carpenter, and E. Catmull, “The Reyes image rendering architecture,” ACM SIGGRAPH Computer Graphics 21, 95–102 (1987).