This paper presents a computational integral imaging reconstruction method based on the scale-invariant feature transform (SIFT) and patch matching to improve the visual quality of reconstructed 3D view images. The 3D view images reconstructed from elemental images typically suffer from artifacts that degrade the visual quality. To prevent this degradation, we use correct regions obtained from view images captured directly from the original object, or use patch matching, to replace the distorted regions. However, the initial matching regions may not meet our requirements owing to the limitations of the equipment and the inevitable shortcomings of the experimental operation. To solve these problems, we adopt SIFT descriptors and a perspective transform to obtain satisfactory correct regions. We present simulation and experimental results for the 3D view images, together with an evaluation of the corresponding image quality, to test the performance of the proposed method. The results indicate that the proposed method can significantly improve the visual quality of the 3D view images, verifying its feasibility and effectiveness.
© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
1. Introduction
Three-dimensional (3D) imaging and display techniques have received a significant amount of interest owing to their various applications in scientific research, medical treatment, industry, and so on. The authors in [1] proposed an ordered-dithering halftone algorithm to increase the gray levels of a time-multiplexing light field display. The same team introduced a weighted average optimization to the image synthesis process the following year [2]; this method trades off the reconstructed depth range and the image sharpness for multi-projector-type light field displays. A simple and effective method for producing large-scale polymer microlens arrays using screen printing was proposed in [3]. The study in [4] utilized integral imaging to construct a flat-panel see-through 3D display. Integral imaging, proposed by Lippmann in 1908, is one of the most promising 3D display technologies because it provides full-color parallax and continuous viewing points and operates with incoherent light [5–12]. There are two basic parts in a typical integral imaging system: acquisition and display. In the acquisition process, a 3D scene is picked up by a recording device such as a lenslet array with a CCD camera or a camera array. The 3D information from different view points on the same plane or on the same spherical surface is recorded in the elemental images. In the display process, the elemental images are displayed by a projector array or on a display panel with a lenslet array to reconstruct the 3D scene; display is thus the inverse process of acquisition. The display process introduced above denotes the conventional optical reconstruction. This method always requires special optical equipment and provides a low-resolution 3D scene owing to the limitations of the devices. To overcome the disadvantages of optical reconstruction, computational integral imaging reconstruction (CIIR) is considered a feasible method for presenting the 3D scene more clearly.
In addition, computational reconstruction can transform the elemental images to be observed on a 2D display panel, which makes image processing and information extraction easier. The CIIR method rebuilds a 3D scene from the elemental images by computer, and previous studies have presented several methods [15–19]. The researchers in [13] proposed a monospectral II encryption approach with a monospectral camera array and realized optical 3D image encryption and reconstruction by using a geometric calibration algorithm in the monospectral synthetic aperture integral imaging system [14]. In [15], the authors utilized image interpolation algorithms to obtain resolution-enhanced elemental images and magnified them to reconstruct the 3D scene with improved visual quality; however, the interpolation process may introduce extra pixel errors. Some researchers presented a method based on rearrangement of the pixels in the elemental images in [16]. Although this method can rapidly reconstruct a 3D image, the size of the reconstructed 3D image may be incorrect. The researchers in [18] used ray tracing and auto-focusing to rebuild the computational 3D view image. This method solves the out-of-focus blur but performs poorly when the distance between the view point and the lenslet array is large.
In addition, we utilize SIFT to adjust the view images. SIFT descriptors, proposed by Lowe, consist of location, scale, orientation, and feature vectors [20]. The location and scale are determined through extreme-point searching in the Gaussian image pyramid. The orientation is obtained according to the gradient distribution of the pixels in the feature point neighborhood. The feature vector is a 128-dimensional vector that records the detailed gradient information of the local region around the feature point. Owing to their excellent performance in scale-invariant feature searching, SIFT descriptors have been widely used in image retrieval [21,22], image registration [23–25], and object recognition [26–28].
In this paper, we propose a resolution-enhanced CIIR method based on SIFT and patch matching that can improve the visual quality of the reconstructed 3D image. This method utilizes the similarity between the view image at the reconstructed view point and the initial reconstructed 3D image, and adjusts the view image using a matching algorithm based on SIFT descriptors. After adjustment, the new view image is considered as a template for the later reconstruction process. The proposed CIIR method can provide high image quality to the reconstructed 3D scene. In the following sections, we explain the principle of the proposed method and verify its feasibility by simulation and experiments.
2. Proposed computational integral imaging reconstruction method
The proposed integral imaging pickup process and reconstruction process are shown in Fig. 1. In the pickup process, one path uses the conventional pickup method to obtain the elemental image array, and the other path uses a recording device to obtain the view image at the view point to be reconstructed. The elemental image array is used to reconstruct the initial view image at that view point. SIFT descriptors are extracted from the captured view image and the initial reconstructed image to obtain matching point pairs by comparing Euclidean distances between descriptors. The matching point pairs are filtered by the random sample consensus (RANSAC) algorithm. Then, we use the final matching point pairs to calculate the perspective matrix and perform a perspective transformation on the captured view image, yielding an image with the same resolution as the initial reconstructed image. This transformed image, which we call the template image, is compared with the initial reconstructed image to determine the distorted regions. We segment the template image according to the locations of the distorted regions. The regions segmented from the template image, called correct regions, are used to construct or guide the construction of the distorted regions. The reconstruction process is the reverse of the partial pickup process, as shown in Fig. 1. More details are introduced in Section 2.2.
2.1. Optical analysis of integral imaging
From the above introduction, we consider the transformed view image as the template image for the reconstruction because the 3D scene in the display process (as shown in Fig. 2) has a spatial frequency distribution similar to that of the object in the pickup process. Moreover, once the reconstructed view point is decided, the view image obtained from the original object at the corresponding location is essentially identical to the view image reconstructed from the 3D scene at that view point, provided the conditions (ambient light, the locations, and the distances from the view point to the object and to the reconstructed scene, etc.) are the same. Figure 2 illustrates the schematic diagram of integral imaging.
As shown in Fig. 2, we analyze the spatial frequency distribution of the original object and the reconstructed 3D scene in the integral imaging system to support the above claim. In the spatial domain, the light waves scattered from the object $O$ can be written as a superposition of plane waves [29]:

$$O(\mathbf{r}) = \iint A(k_x, k_y) \exp(i\mathbf{k} \cdot \mathbf{r}) \, dk_x \, dk_y, \quad (1)$$

where $\mathbf{r} = (x, y, z)$ is the position vector in 3D space, and $\mathbf{k} = k\hat{\mathbf{k}}$ is the wave vector with amplitude $k = 2\pi/\lambda$ and direction $\hat{\mathbf{k}}$ (the symbol $\hat{\cdot}$ represents a unit vector). The light waves from $O$ (here taken at $z = 0$) are represented by the angular spectrum in Eq. (2):

$$A(k_x, k_y) = \iint O(x, y, 0) \exp[-i(k_x x + k_y y)] \, dx \, dy, \quad (2)$$

where $A(k_x, k_y)$ is the angular spectrum distribution of $O$, and $k_x$ and $k_y$ are the continuous spectrum components. However, the information in the light waves from $O$ recorded by the pickup device is discrete: different elemental images represent different pickup orientations. Therefore, when we adopt an $M \times N$ lenslet array to obtain the elemental images, the light waves picked up by the pickup device can be represented by Eq. (3):

$$\tilde{O}(\mathbf{r}) = \sum_{m=1}^{M} \sum_{n=1}^{N} A(k_{x,m}, k_{y,n}) \exp(i\mathbf{k}_{mn} \cdot \mathbf{r}). \quad (3)$$

Evidently, Eq. (3) is a sampling of Eq. (1), which means the reconstructed 3D scene is a sampled version of the object $O$. Thus, if the view point is determined, the view image to be reconstructed can be obtained from the object $O$.
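As a rough numerical illustration of this sampling relation, the sketch below (our own, using plain FFTs rather than the paper's optical setup) treats the angular spectrum of a 2D field as its Fourier transform and keeps only an n × n grid of spectral samples, mimicking pickup with an n × n lenslet array:

```python
import numpy as np

def sample_angular_spectrum(field, n):
    """Keep an n x n grid of angular-spectrum samples and resynthesize.

    The angular spectrum of a 2D field is its Fourier transform (cf. Eq. (2));
    retaining only n x n discrete spectral samples (cf. Eq. (3)) makes the
    resynthesized field a sampled version of the original.
    """
    A = np.fft.fftshift(np.fft.fft2(field))   # angular spectrum of the field
    mask = np.zeros_like(A)
    h, w = field.shape
    ys = np.linspace(0, h - 1, n).astype(int)  # n discrete pickup directions
    xs = np.linspace(0, w - 1, n).astype(int)
    mask[np.ix_(ys, xs)] = 1.0
    return np.fft.ifft2(np.fft.ifftshift(A * mask))
```

When n equals the full grid size, the original field is recovered exactly; smaller n discards spectral content, just as a coarse lenslet array limits the recorded directions.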
2.2. Computational reconstruction of integral imaging based on SIFT and patch matching
The green parallel lines represent a reconstruction view, and all green points in Fig. 3 compose the reconstructed image of the corresponding view. The resolution of the reconstructed image using the conventional method is the same as the number of lenslets in the lenslet array. Thus, the resolution of the reconstructed image obtained by this method is significantly lower and cannot satisfy the demand for high visual quality. Moreover, periodically extracting pixels from the elemental image array requires the distance between the viewer and the lenslet array to be large enough that the angle between the viewer and each lenslet is approximately equal. However, this requirement is rarely satisfied in practice.
Owing to the disadvantage of the conventional method, we extract pixels non-periodically from the elemental image array to improve the conventional method, as shown in Fig. 4.
The distance g between the display plane and the lenslet array, the distance l between the lenslet array and the imaging plane, and the focal length f obey the Gaussian imaging formula; hence, the mathematical relation of g, l, and f is

1/g + 1/l = 1/f.    (4)

We take the pixel mapped through the lenslet center (as shown in Fig. 4) as the center and obtain a patch with lateral resolution k (the calculation for the vertical resolution is the same). In our method, the maximum value of k is calculated by Eq. (5).
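The non-periodic patch extraction can be sketched as follows. This is a simplified illustration under our own parameterization (patch size `k`, a per-view `offset` standing in for the geometry of Fig. 4), not the paper's exact mapping:

```python
import numpy as np

def reconstruct_view(elemental, k, offset=(0, 0)):
    """Tile one k x k patch from each elemental image into a view image.

    `elemental` has shape (M, N, p, p): an M x N array of p x p elemental
    images. For each lenslet we take the k x k patch centred on the pixel
    the chosen view point maps to (approximated here as the elemental image
    centre plus a per-view `offset`), and tile the patches into an
    (M*k) x (N*k) reconstructed view image.
    """
    M, N, p, _ = elemental.shape
    out = np.zeros((M * k, N * k), dtype=elemental.dtype)
    cy = p // 2 + offset[0]
    cx = p // 2 + offset[1]
    # Clamp the patch so it stays inside each elemental image.
    y0 = int(np.clip(cy - k // 2, 0, p - k))
    x0 = int(np.clip(cx - k // 2, 0, p - k))
    for i in range(M):
        for j in range(N):
            out[i*k:(i+1)*k, j*k:(j+1)*k] = elemental[i, j, y0:y0+k, x0:x0+k]
    return out
```

Taking k pixels per lenslet instead of one raises the view-image resolution from M × N to (M·k) × (N·k), which is the improvement over the conventional single-pixel extraction.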
According to the introduction in Section 2.1, the reconstructed 3D image is a sampling of the original object, implying that if we record the original object at the same view point as in the reconstruction process, we will obtain a view image (which we call the template image) with no distortion. In addition, the pickup resolution of the recording device is high; thus, the template image has high resolution. However, the view image obtained from the original object might differ slightly from the ideal template because of an inappropriate focal length, a slightly inaccurate distance between the camera and the original object, and unavoidable problems in actual camera acquisition such as shaking. To solve these problems, we adjust the view image obtained from the original object according to the initial reconstructed image, which is obtained by the improved non-periodic pixel-extraction method. The main steps of the adjustment process are introduced at the beginning of Section 2 and in Fig. 1. After adjustment, the template image becomes a high-definition ideal template. Then, we need to find the exact distorted regions in the reconstructed image. In our method, we utilize the peak signal to noise ratio (PSNR) to find the distorted regions. If the size of the elemental image array is M × N, the reconstructed image is composed of M × N patches. Therefore, we divide the template image into M × N patches and label the patches of the reconstructed image from left to right and top to bottom; the template image is labeled in the same way. Then, we calculate the PSNR of every pair of identically labeled patches. In our method, if a patch in the reconstructed image has a PSNR less than a threshold, we consider this patch to be distorted. The value of the threshold in our simulation was 25, and those in the experiments were 20, 25 and 30. Then, we record the label of the distorted patch. All labels of the distorted regions are used to extract the corresponding patches from the template image.
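The distorted-region search can be sketched as below; the names `psnr` and `find_distorted_patches` and the patch layout are our own assumptions, with the threshold defaulting to the paper's simulation value of 25:

```python
import numpy as np

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio between two equal-size patches."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def find_distorted_patches(recon, template, k, threshold=25.0):
    """Return labels (row, col) of k x k patches of the reconstructed
    image whose PSNR against the identically labelled template patch
    falls below the threshold."""
    M, N = recon.shape[0] // k, recon.shape[1] // k
    labels = []
    for i in range(M):           # top to bottom
        for j in range(N):       # left to right
            pr = recon[i*k:(i+1)*k, j*k:(j+1)*k]
            pt = template[i*k:(i+1)*k, j*k:(j+1)*k]
            if psnr(pr, pt) < threshold:
                labels.append((i, j))
    return labels
```

The returned labels identify which template patches must be extracted for repair.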
In our simulation and experiments, we designed two methods for the subsequent processing. Method 1 directly transmits the patches of the template image that have the same labels as the distorted regions to the reconstruction process, together with the elemental images and the labels of the distorted regions. Method 2 uses the patches of the template image in the distorted regions to find the good regions in the corresponding elemental images. Here, we calculate the PSNR between the patch from the template image and a same-size patch in the corresponding elemental image. The moving direction of the patch from the template image is shown in Fig. 5 (the yellow arrow), and the patch moves one pixel at a time. In this process, we define the patch in the elemental image with the maximum PSNR as a good region for reconstruction. Then, we record the location of this good region in the elemental image. Finally, the elemental images, the labels of the distorted regions, and the locations of the good regions are transmitted to the reconstruction process.
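Method 2's patch search can be sketched as an exhaustive one-pixel-step scan maximizing PSNR. This brute-force version is our own illustration (a practical implementation would restrict the search range within the elemental image):

```python
import numpy as np

def match_patch(template_patch, elemental_img):
    """Slide the template patch one pixel at a time over the elemental
    image and return the top-left location with maximum PSNR."""
    k = template_patch.shape[0]
    H, W = elemental_img.shape
    best, best_loc = -np.inf, (0, 0)
    t = template_patch.astype(np.float64)
    for y in range(H - k + 1):
        for x in range(W - k + 1):
            win = elemental_img[y:y+k, x:x+k].astype(np.float64)
            mse = np.mean((win - t) ** 2)
            score = np.inf if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
            if score > best:
                best, best_loc = score, (y, x)
    return best_loc
```

Only this (row, col) location needs to be transmitted per distorted region, which is why Method 2's extra data size is much smaller than Method 1's.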
In the reconstruction process, the improved non-periodic pixel-extraction method for integral imaging computational reconstruction is used to obtain the initial reconstructed image. Then, the decoded labeled patches are used to cover the initial reconstructed image at the corresponding regions, and the final reconstructed image is obtained. However, this method (Method 1) causes a large transmission burden. Method 2 utilizes the locations of the good regions in the elemental images to find the good regions and covers the initial reconstructed image with the good regions according to the labels of the distorted regions. If the number of reconstructed images is large, Method 2 is the better choice.
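The covering step shared by both methods (replace each labelled distorted patch of the initial reconstructed image with its repair patch) can be sketched as; the function name is our own:

```python
import numpy as np

def cover_distorted(recon, template, labels, k):
    """Replace each labelled distorted k x k patch of the initial
    reconstructed image with the identically labelled template patch
    (Method 1); Method 2 would source the patch from the matched good
    region of the corresponding elemental image instead."""
    out = recon.copy()
    for (i, j) in labels:
        out[i*k:(i+1)*k, j*k:(j+1)*k] = template[i*k:(i+1)*k, j*k:(j+1)*k]
    return out
```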
3. Simulation and experimental results
In this section, the simulation and experimental results are presented to verify the improvement of our proposed method for CIIR.
3.1. Simulation results
The elemental images used in the simulation were produced digitally through a virtual camera array. The camera array consisted of 50 × 50 cameras with 1.00 mm spacing and 3.00 mm focal length. Each elemental image consisted of 200 × 200 pixels. The object used in the simulation was a combined scene of a soccer ball and two Rubik's cubes located 21 mm away from the camera array, as shown in Fig. 6(a). The elemental images are shown in Fig. 6(b), and the details in the yellow box are shown in the upper right corner. The proposed method is applied to these elemental images. The parameters used in the computational reconstruction of the simulation are listed in Table 1. The viewing angle was approximately 4.5° in this simulation.
As shown in Fig. 7, each image contains three view images known as the left view at (12, 25, 329), the front view at (25, 25, 329), and the right view at (38, 25, 329), from left to right. The inset coordinates indicating spatial position in Fig. 7 are in millimeters. Owing to the limitations regarding the length of this paper, only three view images are shown for each set of simulations. The resolution of each view image reconstructed by the conventional CIIR method [16] is 50 × 50 in our simulation, equal to the number of cameras. Here, we up-sampled each reconstructed view image to 1500 × 1500 for convenient comparison. As shown in Fig. 7(a), we can barely see the soccer ball, and the sharpness of the Rubik's cubes is poor. The visual quality of Figs. 7(b) and 7(c) is almost the same, and it is evident that the edge of the object away from the imaging plane has blocking artifacts. In comparison with Figs. 7(b) and 7(c), the regions at the same positions in Figs. 7(d) and 7(e) are smooth. In our simulation, we took 30 pixels for each patch, indicating a patch lateral resolution of 30. Therefore, the resolution of each view image in Fig. 7 is 1500 × 1500. The details of the reconstructed images are presented in red boxes.
The size of the compressed image patches via JPEG (in Method 1) of Fig. 7(d) is listed in Table 2, which is acceptable when the number of reconstructed images is small. If the image patches are compressed by JPEG2000 or HEVC, the size will become smaller for the same image quality. The size of the compressed location of the good regions in elemental images (in Method 2) of Fig. 7(e) is also listed in Table 2. The extra data size produced in the proposed Method 2 is evidently smaller than that in the proposed Method 1, and the visual quality of Method 1 and Method 2 are almost the same. Therefore, we prefer to adopt Method 2 when the number of reconstructed images is large. The number of distorted patches is also listed in Table 2. Here we add two sets of view data to make the simulation more convincing.
The PSNR and structural similarity index (SSIM) of the reconstructed results at different view points are listed in Table 3. The PSNR and SSIM of the proposed method are evidently higher than those of the conventional method [16] and the method mentioned in [18]. The results obtained by the proposed method are good in terms of both subjective visual quality and objective image quality, implying that the proposed method can improve the visual quality of the reconstructed 3D view image.
3.2. Experimental results
We designed two systems to verify the proposed method. The setups of System 1 and System 2 are shown in Figs. 8(a) and 8(b), respectively. In System 1, we used one camera to simulate the camera array by moving the camera 1 mm each time while capturing, because the cost of a real camera array is very high. In System 2, a camera and a lenslet array were used to capture the elemental images. Although the elemental images captured through the lenslet array contained scattered light, the proposed method remains effective. Designing these two systems makes our experiment more convincing. The parameters used in the computational reconstruction are listed in Table 4 and Table 5. Figures 9(a) and 9(b) show the elemental images used in System 1 and System 2, with details shown in the green and red boxes, respectively. In the acquisition process, the images captured by the camera were cut to appropriate sizes owing to the large acquisition resolution of the camera.
Figure 10 shows the experimental reconstructed results of System 1 at different view points. Similar to the simulation, the resolution of each view image reconstructed using the conventional method [16] was equal to the number of lenslets and was up-sampled for convenient comparison. As shown in Fig. 10(a), the view images are too blurry to see the edge of the Rubik's cube. The poor visual quality of Fig. 10(a) arises because method [16] extracts only one pixel for each lenslet: although up-sampling appears to improve the resolution, the view image carries insufficient effective information from the elemental images for reconstruction. As shown in Figs. 10(b) and 10(c), these two sets of view images look similar. Owing to unavoidable errors in the experimental capturing operation, Figs. 10(b) and 10(c) show several distortions. However, the reason for the distortions in Fig. 10(b) differs slightly from that in Fig. 10(c). The ray tracing CIIR method in [18] is not effective when the view point is far from the lenslet array (or camera array) and the elemental images are close to it; in this case, many adjacent pixels in the reconstructed image extract the pixel at the same position from the elemental images. As for Fig. 10(c), the improved non-periodic pixel-extraction CIIR method takes a patch from each elemental image, which can easily cause blocking artifacts because the method may take the wrong patch when the reconstruction position is far from the view point or the reconstructed imaging plane. Figures 10(d) and 10(e) show the reconstructed results of System 1 using the proposed reconstruction method (Method 1 and Method 2). In our experiments, the PSNR thresholds used in the distorted region search (Section 2.2) are 20, 25 and 30. Owing to the limitations regarding the length of this paper, the threshold is set to 20 in Figs. 10(d) and 10(e). The reconstructed view images demonstrated in Figs. 10(d) and 10(e) have fewer blocking artifacts than Fig. 10(b), and their visual quality is improved. Table 6 lists the extra size of data needed to be transmitted in System 1. The PSNR and SSIM of the reconstructed results in System 1 at different view points are listed in Table 7.
Figure 11 illustrates the experimental reconstructed results of System 2 at different view points. Figure 11(b) shows the reconstructed results using method [18], whose visual quality is worse than that of Fig. 11(c) because method [18] requires the elemental images to be highly accurate, whereas errors are inevitable in the actual operation of the experiment owing to human factors. The reconstructed view images in Fig. 11(d) are clearer, but the experimental results in Fig. 11(e) are worse than those in Fig. 11(c) because the lenslet array used in our experiments disperses the light propagated from the object, which causes the luminance of the elemental images and the template images captured by the camera to be inconsistent. These problems lead to mismatches when using the proposed Method 2. The extra size of data needed to be transmitted for System 2 is listed in Table 8. The PSNR and SSIM of the reconstructed results in our experiments at different view points are listed in Table 9.
As shown in Fig. 10 and Fig. 11, the sharpness of the reconstructed results using the proposed method is better than that of methods [16] and [18], the artifacts are eliminated by the proposed method, and the color accuracy of the proposed method is also better.
The reconstructed results of System 1 and System 2 are shown in Figs. 12 and 13, respectively, when the PSNR threshold used in the distorted region search (Section 2.2) is 20, 25 and 30. A higher threshold means the reconstructed view image is more similar to the template image when using the proposed Method 1. For the proposed Method 2, a higher threshold yields a larger number of distorted patches. However, because the good patches are obtained from the corresponding elemental images instead of the template images, the visual quality of the good patches obtained by patch matching is limited. Thus, the PSNR and SSIM of the proposed Method 2 are lower than those of Method 1, as shown in Table 7 and Table 9. Therefore, the visual quality of the view images reconstructed using Method 2 will not improve continuously as the threshold in the distorted region search increases.
In the proposed method, the distorted regions are repaired under the guidance of the template image. As mentioned in Section 2, the template image is adjusted according to the initial reconstructed image, which is reconstructed by the improved non-periodic pixel-extraction CIIR method. This indicates that the initial reconstructed image should be similar to the view image captured from the original object; otherwise, the template image may not be obtained owing to the high number of mismatches in the SIFT descriptor matching.
4. Conclusion
This paper presents a CIIR method based on SIFT that can reconstruct 3D view images with high visual quality. Unlike the conventional reconstruction method, the proposed method utilizes SIFT to adjust the view images of the original object, which we call template images, and the view images of the reconstructed 3D scene are improved under the guidance of the template images. The proposed method has been proven effective in improving the visual quality of 3D view image reconstruction in computational integral imaging, and its feasibility was verified through simulations and experiments. Owing to the limitations of our experimental conditions, we can currently provide only computational results; our next work will verify the 3D image quality of the proposed method on an optical display platform.
National Key Research and Development Plan of 13th five-year (2017YFB0404800); National Natural Science Foundation of China (NSFC) (61631009); Fundamental Research Funds for the Central Universities (2017TD-19).
1. C. Su, Q. Zhong, Y. Peng, L. Xu, R. Wang, H. Li, and X. Liu, “Grayscale performance enhancement for time-multiplexing light field rendering,” Opt. Express 23(25), 32622–32632 (2015). [CrossRef] [PubMed]
2. Q. Zhong, Y. Peng, H. Li, and X. Liu, “Optimized image synthesis for multi-projector-type light field display,” J. Disp. Technol. 12(12), 1745–1751 (2016). [CrossRef]
3. X. Zhou, Y. Peng, R. Peng, X. Zeng, Y. A. Zhang, and T. Guo, “Fabrication of large-scale microlens arrays based on screen printing for integral imaging 3D display,” ACS Appl. Mater. Interfaces 8(36), 24248–24255 (2016). [CrossRef] [PubMed]
5. X. Xiao, B. Javidi, M. Martinez-Corral, and A. Stern, “Advances in three-dimensional integral imaging: sensing, display, and applications [Invited],” Appl. Opt. 52(4), 546–560 (2013). [CrossRef] [PubMed]
6. Y. Chen, X. Wang, J. Zhang, S. Yu, Q. Zhang, and B. Guo, “Resolution improvement of integral imaging based on time multiplexing sub-pixel coding method on common display panel,” Opt. Express 22(15), 17897–17907 (2014). [CrossRef] [PubMed]
7. N. Chen, J. Yeom, J. H. Jung, J. H. Park, and B. Lee, “Resolution comparison between integral-imaging-based hologram synthesis methods using rectangular and hexagonal lens arrays,” Opt. Express 19(27), 26917–26927 (2011). [CrossRef] [PubMed]
8. H. Deng, Q. H. Wang, L. Li, and D. H. Li, “An integral-imaging three-dimensional display with wide viewing angle,” J. Soc. Inf. Disp. 19(10), 679–684 (2011). [CrossRef]
12. X. Li, Y. Wang, Q. H. Wang, Y. Liu, and X. Zhou, “Modified integral imaging reconstruction and encryption using an improved SR reconstruction algorithm,” Opt. Lasers Eng. 112, 162–169 (2019). [CrossRef]
14. X. Li, M. Zhao, Y. Xing, H. L. Zhang, L. Li, S. T. Kim, X. Zhou, and Q. H. Wang, “Designing optical 3d images encryption and reconstruction using monospectral synthetic aperture integral imaging,” Opt. Express 26(9), 11084–11099 (2018). [CrossRef] [PubMed]
16. M. Cho and B. Javidi, “Computational reconstruction of three-dimensional integral imaging by rearrangement of elemental image pixels,” J. Disp. Technol. 5(2), 61–65 (2009). [CrossRef]
17. K. Inoue, M. Lee, B. Javidi, and M. Cho, “Improved 3D integral imaging reconstruction with elemental image pixel rearrangement,” J. Opt. 20(2), 025703 (2018). [CrossRef]
18. Y. Yuan, S. Yu, X. Wang, and J. Zhang, “Resolution enhanced 3D image reconstruction by use of ray tracing and auto-focus in computational integral imaging,” Opt. Commun. 404, 73–79 (2017). [CrossRef]
19. B. Cho, P. Kopycki, M. Martinez-Corral, and M. Cho, “Computational volumetric reconstruction of integral imaging with improved depth resolution considering continuously non-uniform shifting pixels,” Opt. Lasers Eng. 111, 114–121 (2018). [CrossRef]
20. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis. 60(2), 91–110 (2004). [CrossRef]
21. J. Kalpana and R. Krishnamoorthi, “Color image retrieval technique with local features based on orthogonal polynomials model and SIFT,” Multimedia Tools Appl. 75(1), 49–69 (2016). [CrossRef]
22. G. A. Montazer and D. Giveki, “Content based image retrieval system using clustered scale invariant feature transforms,” Optik (Stuttg.) 126(18), 1695–1699 (2015). [CrossRef]
23. Y. Zhu, S. Cheng, V. Stankovic, and L. Stankovic, “Image registration using BP-SIFT,” J. Vis. Commun. Image Represent. 24(4), 448–457 (2013). [CrossRef]
24. G. Lv, S. W. Teng, and G. Lu, “Enhancing SIFT-based image registration performance by building and selecting highly discriminating descriptors,” Pattern Recognit. Lett. 84, 156–162 (2016). [CrossRef]
25. S. W. Teng, M. T. Hossain, and G. Lu, “Multimodal image registration technique based on improved local feature descriptors,” J. Electron. Imaging 24(1), 013013 (2015). [CrossRef]
27. J. Yu, F. Zhang, and J. Xiong, “An innovative sift-based method for rigid video object recognition,” Math. Probl. Eng. 2014, 138927 (2014). [CrossRef]
28. S. Luo, W. Mou, K. Althoefer, and H. Liu, “Novel tactile-sift descriptor for object shape recognition,” IEEE Sens. J. 15(9), 5001–5009 (2015). [CrossRef]
29. J. W. Goodman, Introduction to Fourier Optics (Roberts and Company Publishers, 2005), Chap. 3.