Abstract

Synthetic aperture integral imaging (SAII) using monocular video with an arbitrary camera trajectory enables casual acquisition of three-dimensional information for scenes of any scale. This paper presents a novel algorithm for computational reconstruction and imaging of scenes in such an SAII system. Handling this unstructured input requires both dense geometry recovery and virtual view rendering; to reduce computational cost and artifacts in both stages, we assume flat surfaces in homogeneous areas and fully exploit the per-frame edges, which are accurately reconstructed beforehand. A dense depth map of each real view is estimated by successively generating two complete depth maps, termed the smoothest-surface and densest-surface depth maps, each respecting a local cue, and then merging them via Markov random field global optimization. High-quality perspective images for any virtual camera array can then be synthesized simply by back-projecting the closest obtained surfaces into the new views. The pixel-level operations throughout most of our pipeline allow high parallelism. Simulation results show that the proposed approach is robust to view-dependent occlusions and lack of texture in the original frames, and can produce recognizable slice images at different depths.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Integral imaging, first proposed in 1908 [1], is among the most promising naked-eye three-dimensional (3D) display techniques to date. It can produce a full-color, full-parallax, and quasi-continuous 3D effect based on an elemental image (EI) array. When a micro-lens array and a single camera are used, each EI records the scene information from a particular viewing direction. Normally, the EIs have low resolution, limited by the camera field of view, and suffer from aberrations due to the small aperture of the micro-lenses. These issues can be avoided by synthetic aperture integral imaging (SAII) [2], which obtains multiple perspective images with a camera array or a translated camera (Fig. 1). However, traditional acquisition methods (e.g. [3,4]) entail high hardware or calibration costs, the inconvenience of dismounting and moving, or limitations on the captured areas. In view of these drawbacks, this paper is devoted to an SAII system that uses an unstructured (i.e. casually shot) monocular video (Fig. 6). This acquisition manner is cheaper, simpler, and faster, and it is also capable of acquiring complete information of scenes at any scale, from small objects to enormous buildings.

 

Fig. 1 Principle of traditional SAII technique. A group of perspective images are recorded by a camera array, and then projected inversely through a corresponding pinhole array. The projections on each depth plane are overlapped and accumulated, generating a slice image.


Since the camera trajectory is arbitrary and the scene geometry is unknown, we introduce a novel approach for computational reconstruction and imaging of 3D scenes in this SAII system. Our pipeline goes through two main stages to generate the perspective images for a given virtual camera array. The first stage, dense geometry recovery, reconstructs a dense 3D scene model from the calibrated video frames, which is then used in the second stage, virtual view rendering, to synthesize new views. Note that it is in fact also possible to produce the slice images directly from the calculated camera extrinsic parameters and depth information, but warping the massive set of frames many times is computationally expensive. Instead, slice image generation can be simulated more simply by shifting and accumulating the virtual perspective images.
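As an illustration of this shift-and-accumulate simulation, the following minimal sketch averages integer-shifted copies of already rendered perspective images to form one slice image. The function name, its arguments, and the simple pinhole disparity model are assumptions for illustration, not the exact procedure of the proposed system.

```python
import numpy as np

def slice_image(views, pitch, focal, depth):
    """Shift-and-accumulate reconstruction of one slice image (sketch).

    views : dict {(row, col): HxWx3 float image} of rendered perspective images
    pitch : spacing between neighboring virtual cameras (scene units)
    focal : focal length of the virtual cameras, in pixels
    depth : distance of the reconstruction plane from the camera array
    """
    h, w, _ = next(iter(views.values())).shape
    acc = np.zeros((h, w, 3))
    shift = focal * pitch / depth                      # per-camera disparity in pixels
    for (r, c), img in views.items():
        dy, dx = int(round(r * shift)), int(round(c * shift))
        acc += np.roll(img, (dy, dx), axis=(0, 1))     # integer shift (wrap-around)
    return acc / len(views)                            # average the overlapped projections
```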

One major challenge in this pipeline is that directly applying classic schemes to the highly redundant video data would incur high computational costs in both stages. Another problem is caused by homogeneous surfaces (with insufficient texture for multi-view matching) in the scene, which are difficult to recover for most algorithms. The resulting geometry errors or holes may lead to artifacts in the synthesized views and displayed 3D images.

The work of Kim et al. [5] has shown that accurate 3D edges can be calculated by adeptly exploiting dense input view samples, and subsequently used to recover the bounded homogeneous areas. However, their pipeline focuses on handling regular light fields and is not suitable for unstructured videos. Inspired by this sparse-to-dense strategy, our SAII approach (Fig. 2) builds on high-quality edge depth maps of densely sampled frames, which are reconstructed beforehand using the ray-wise algorithm of [6] (our previous research).

 

Fig. 2 Pipeline of the proposed SAII approach.


In the proposed approach, the dense geometry recovery stage relies on two complete depth maps. The smoothest-surface depth map assumes flat surfaces in homogeneous areas and uses the per-frame edge depths to perspectively fit the smoothest possible surface in between. This depth map is prone to producing interpolants that connect fore- and background edges but occlude the real surfaces (Fig. 5(d)), particularly when the real and virtual views have distinct viewing directions. The densest-surface depth map is calculated by propagating the estimates with the highest surface density among the former depth maps. It effectively removes the wrong interpolants, because essentially all of these errors are heavily slanted (thus producing low-density surfaces) and are replaced by the correct depths from other views. However, it possibly induces depth discontinuities which are not aligned with object edges (Fig. 5(f)). Therefore, we borrow the idea of Tao et al. [7] and combine the local cues of smoothness and density by merging the above two depth maps of each frame via global minimization of Markov random fields (MRFs). The optimized depth maps of all real views are satisfactory in terms of both cues, which allows us to synthesize new perspective images simply by back-projecting the closest surfaces in the virtual view rendering stage.

Most of the techniques we employ operate per pixel and thus support high parallelism on graphics processing units (GPUs). Simulation results show that the presented approach is robust to view-dependent occlusions and lack of texture in the original frames, which enables recognizable slice images at different depths to be produced in the display part of our SAII system.

2. Related work

3D scene acquisition

To gain more freedom than typical camera-array-based acquisition [8], some studies try to free the cameras from a grid arrangement. In [9] and [10], the cameras are arbitrarily placed but their optical axes must remain parallel. In addition, the strategy of [10] requires all cameras to lie on the same plane. These methods need a great amount of work for camera calibration. Combining sparse view capturing with virtual view synthesis [11] is a good way to simplify the acquisition equipment. Wang et al. [12] use camera pose estimation so that the cameras can be distributed on any curved surface. Their more flexible strategy requires no calibration even if the camera arrangement changes. For lower hardware costs, [13, 14], [15], and [16] respectively propose axially, laterally, and obliquely distributed acquisition with a single camera, so a single calibration is sufficient. All the above methods have difficulties in acquisition around large scenes. The Kinect sensor was first used for integral imaging in [17]. Such depth cameras can move freely and directly output the scene geometry in real time. Zhang et al. [18] utilize the KinectFusion technique for higher accuracy and completeness of the model. However, depth cameras are only suitable for close scenes owing to their short sensing distances. Methods based on a (micro-)lens array have a similar drawback.

Scene reconstruction from video

Early approaches of structure-from-motion (SfM) [19] and simultaneous localization and mapping (SLAM) [20] generate sparse detectable 3D points from monocular video, conveying little semantics of the scene. Advanced SLAM algorithms obtain dense surfaces by replacing [21] or combining [22] point features with planes. Some video-oriented multi-view stereo (MVS) methods [23,24] achieve fast or interactive reconstruction, but they are only usable for object modeling, yield low coverage for homogeneous surfaces, or rely on optical flow. To resolve the matching ambiguities caused by lack of texture, some MVS works employ energy minimization constrained by a smoothness term [24], a domain transform filter [25], or a multi-resolution scheme [26], but may still produce depth discontinuities in large regions. Especially when occlusion occurs, the results of these techniques are not completely visibility-consistent. Consistent models can be obtained by simply removing the estimates which are inconsistent among small-baseline neighboring views or violate the occlusion consistency across large-baseline non-neighbors [24]. Bundle optimization [27], tetrahedra carving [28], cross-view filtering [25,26], and edge-based inter-view invalidation [6,29] have also been adopted.

Virtual view synthesis

Computer generated integral imaging (CGII) and image-based rendering (IBR) are the two mainstream methods of virtual view synthesis. CGII uses a known 3D scene model and renders novel views through a virtual lens array. It includes parallel group rendering [30], viewpoint vector rendering [31], viewpoint-independent rendering [32], and multi-viewpoint rendering [33]. Xing et al. [34] accelerate the CGII process via backward ray-tracing. IBR, in contrast, synthesizes views from a set of images. It can be loosely classified into three categories according to how much geometry is used [35]. Most light field rendering methods [36] fall into the first category; they require a large number of input images whose viewpoints are strictly constrained to a grid. The second category relies on implicit geometry representations, e.g. point correspondences [37] and optical flow [38]. The methods requiring dense and explicit geometry estimates obtained from a few unordered images [39] belong to the third category. One main drawback is that depth errors normally lead to tearing and ghosting in the virtual view, especially in occluded and textureless regions. These artifacts can be avoided with a soft geometry model [40] or a Bayesian formulation [39] that robustly respects the depth uncertainty. The majority of the latter two categories only apply to wide-baseline images. Very few IBR approaches have concentrated on handling monocular videos. Some works [41] depend on small-baseline image warping without serious occlusions. Chaurasia et al. synthesize distinct views using silhouette-aware [42] and local shape-preserving [43] warps to address missing or unreliable geometry. In contrast, our method is much simpler and can be implemented more efficiently, while yielding comparable rendering quality.

3. The proposed approach

The pipeline of the proposed SAII method is shown in Fig. 2. In this paper, the object boundaries and textured regions are collectively called edges. The edges are typically aligned with color discontinuities, and thus easy to detect and to reconstruct from images by imposing the photometric constraint. Considering this, our approach is aimed at outputting a set of slice images for a virtual camera array, using the frames of an unstructured video and their edge depth maps.

In the pre-processing stage, the SfM algorithm of Resch et al. [19] is first executed to compute the camera extrinsics of all frames, the 3D positions of features, and their associated visibilities. Based on this information, we select a dense subset of views from the small-baseline frames to ensure sufficient triangulation angles (letting each pair of neighboring view samples have a viewing angle of about 1°). For convenience, the frames and real views mentioned hereafter all refer to these view samples. For each frame, the edge pixels are extracted where the magnitude of the color gradient, calculated with a 3 × 3 Sobel operator, is larger than 2.5. We use the scheme of [6] to estimate edge depths, because it considers individual visual rays to achieve high precision and stronger robustness to unconstrained camera movements.
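As a small illustration of this edge-extraction step, the sketch below builds a binary edge mask from the color gradient magnitude of a 3 × 3 Sobel operator. The use of scipy.ndimage.sobel and the function name are our own choices; the threshold of 2.5 follows the text, but its effective scale depends on how the image values are normalized, which is not specified here.

```python
import numpy as np
from scipy.ndimage import sobel

def edge_mask(image, threshold=2.5):
    # image: HxWx3 float array. One common definition of the "color gradient
    # magnitude" is used: sum of squared per-channel Sobel responses.
    gx = np.stack([sobel(image[..., c], axis=1) for c in range(3)], axis=-1)
    gy = np.stack([sobel(image[..., c], axis=0) for c in range(3)], axis=-1)
    magnitude = np.sqrt((gx ** 2 + gy ** 2).sum(axis=-1))
    return magnitude > threshold       # True at edge pixels
```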

Dense geometry recovery and virtual view rendering are the next two stages of our pipeline, and also the most important. We describe them in detail below.

3.1. Dense geometry recovery

This stage calculates the corresponding dense depth map for every edge depth map obtained in the pre-processing stage. Most 3D reconstruction methods struggle to recover large areas of homogeneous surfaces, particularly in the presence of heavy occlusions. We do this by first generating two complete depth maps for each real view, each respecting a different local cue, and then merging them in an MRF global optimization process.

3.1.1. Smoothest-surface depth map

Noting that planar geometry is very common in the real world, we use intra-view diffusion of edge depths to fit the smoothest possible surface for the textureless areas in each real view, even if the actual geometry is curved. We call the interpolation result the smoothest-surface depth map.

Figure 3 illustrates the problem of the classic isotropic diffusion scheme on depth data and the idea of the perspective diffusion developed in this paper. The target is to interpolate a depth value for the pixel $\mathbf{p} = (x, y)$ midway between two pixels $\mathbf{p}_1 = (x - w, y)$ and $\mathbf{p}_2 = (x + w, y)$ with depths $d_1$ and $d_2$, respectively. With isotropic diffusion, $\mathbf{p}$ would be assigned the average depth $d_w = \frac{1}{2}(d_1 + d_2)$. However, due to perspective distortion, the corresponding 3D interpolant $P_w$ lies behind the point $P_r$, which is on the flat surface connecting the 3D points of $\mathbf{p}_1$ and $\mathbf{p}_2$, i.e. $P_1$ and $P_2$. This is because the 2D pixel grid does not project exactly to a uniform grid on the slanted surface. One solution to this problem is an anisotropic strategy, e.g. defining a weighting function. We instead employ a simpler, perspective diffusion method based on the visual ray. Note in Fig. 3 that $P_r$ is the intersection between the visual ray $\overrightarrow{OP_r}$ and the 3D line segment $\overline{P_1 P_2}$, both of which can be calculated, so $\mathbf{p}$ can obtain its depth $d_r$ from the back-projection of $P_r$ onto the image plane. We therefore first formulate our diffusion strategy to allow for the fact that the initial depths are scattered, and then incorporate the interpolants obtained separately in the horizontal and vertical directions into the 2D diffusion task.
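As an illustration of this ray-based interpolation, the sketch below computes the perspective-correct depth $d_r$ for a pinhole camera by intersecting the visual ray of $\mathbf{p}$ with the 3D segment between the unprojected points $P_1$ and $P_2$. The intrinsic matrix `K`, the least-squares intersection, and the function name are our own assumptions.

```python
import numpy as np

def perspective_depth(K, p, p1, d1, p2, d2):
    """Depth at pixel p so that its 3D point lies on the segment P1P2.

    K : 3x3 camera intrinsic matrix; p, p1, p2 : pixel coordinates (x, y);
    d1, d2 : known depths of p1 and p2. Returns the perspective-correct depth d_r.
    """
    Kinv = np.linalg.inv(K)
    ray = Kinv @ np.array([p[0], p[1], 1.0])        # viewing ray of p (z component = 1)
    P1 = d1 * (Kinv @ np.array([p1[0], p1[1], 1.0]))
    P2 = d2 * (Kinv @ np.array([p2[0], p2[1], 1.0]))
    # Intersect the ray t*ray with the line P1 + s*(P2 - P1) in a least-squares sense.
    A = np.stack([ray, -(P2 - P1)], axis=1)         # 3x2 system in (t, s)
    t, s = np.linalg.lstsq(A, P1, rcond=None)[0]
    return t * ray[2]                               # depth = z of the intersection
```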

 

Fig. 3 Depth diffusion from two pixels p1 and p2 to the mid-pixel p. Point Pw represents the projected isotropic interpolant, defining the diffused depth as $d_w = \frac{1}{2}(d_1 + d_2)$. Pr is the assumed point on the flat surface (green), and dr is the corresponding real depth. O is the camera center.


The diffusion process performs iterative convolution with a 4-point stencil, only on the pixels without an edge depth, so as to preserve depth discontinuities. Let the diffused result of the $i$th edge depth map $D_\mathrm{edge}^i$ in the $t$th iteration be $D_t^i$, with $D_0^i \equiv D_\mathrm{edge}^i$ for initialization. In order to assign $\mathbf{p}$ the only available depth when one of $\mathbf{p}_1$ and $\mathbf{p}_2$ has no estimate, we include the indicator function $\delta(\cdot)$ (evaluating to 1 for a known depth and 0 otherwise) and denote the horizontally diffused depth as

$$d_x = \delta_1 \delta_2\, d_r + (1 - \delta_1 \delta_2)\left(\delta_1 D_t^i(\mathbf{p}_1) + \delta_2 D_t^i(\mathbf{p}_2)\right), \qquad (1)$$

where $\delta_1 := \delta(D_t^i(\mathbf{p}_1))$ and $\delta_2 := \delta(D_t^i(\mathbf{p}_2))$. The vertically diffused value $d_y$ is calculated similarly using $\mathbf{p}_3 = (x, y - w)$ and $\mathbf{p}_4 = (x, y + w)$. We interpolate these two depths for each non-edge pixel $\mathbf{p}$ using

$$D_{t+1}^i(\mathbf{p}) = \frac{\delta(d_x)\, d_x + \delta(d_y)\, d_y}{\delta_\Sigma}, \qquad (2)$$

with $\delta_\Sigma = \delta(d_x) + \delta(d_y)$ for normalization. If no available depth is found for diffusion, i.e. $\delta_\Sigma = 0$, the result at $\mathbf{p}$ temporarily remains unchanged. The smoothest-surface depth map $D_\mathrm{smoothest}^i$ is determined as soon as the sum of per-pixel depth differences between two consecutive iterations falls below 0.001. However, a small stencil size $w$ can slow down the depth transport, so we use the stencil-shrinking solver [44], initializing $w$ for each interpolated pixel with the Euclidean distance to the nearest edge pixel.
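For illustration, one diffusion iteration with a 4-point stencil might look like the sketch below. It reuses the hypothetical `perspective_depth` helper from the previous sketch, uses a single scalar stencil size `w` instead of the per-pixel shrinking stencil of [44], and loops over pixels in plain Python; it is meant only to mirror Eqs. (1) and (2), not the GPU implementation.

```python
import numpy as np
# perspective_depth is the helper defined in the previous sketch (an assumption).

def axis_interp(depth, K, p, pa, pb):
    da, db = depth[pa[1], pa[0]], depth[pb[1], pb[0]]
    if da > 0 and db > 0:
        return perspective_depth(K, p, pa, da, pb, db)   # Eq. (1), first term
    return da if da > 0 else db                          # fall back to the single known depth

def diffuse_step(depth, edge_mask, w, K):
    """One 4-point-stencil iteration of the depth diffusion (sketch).

    depth: HxW array, 0 where no estimate exists; edge_mask: True at edge pixels,
    which stay fixed; w: scalar stencil size (the paper uses a per-pixel, shrinking one).
    """
    out = depth.copy()
    h, wid = depth.shape
    for y in range(w, h - w):
        for x in range(w, wid - w):
            if edge_mask[y, x]:
                continue                                  # edge depths preserve discontinuities
            dx = axis_interp(depth, K, (x, y), (x - w, y), (x + w, y))
            dy = axis_interp(depth, K, (x, y), (x, y - w), (x, y + w))
            known = [d for d in (dx, dy) if d > 0]
            if known:                                     # Eq. (2); otherwise keep unchanged
                out[y, x] = sum(known) / len(known)
    return out
```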

 

Fig. 4 Edge depth map and our obtained smoothest-surface depth map, as well as the 3D points reconstructed by the isotropic and the proposed perspective diffusion methods. The floor and ceiling are enlarged for a clearer comparison. Isotropic interpolation creates distorted surfaces, while ours produces smooth and flat results.


Figure 4 shows an example of $D_\mathrm{edge}^i$ and our obtained $D_\mathrm{smoothest}^i$. It also compares the 3D points resulting from the isotropic and the proposed perspective diffusion approaches. The distortion from isotropic diffusion produces uneven surfaces on the floor and an oscillation on the large homogeneous ceiling (Fig. 4(d)). In contrast, ours generates flat surfaces in both cases (Fig. 4(e)). See Figs. 5(b) and 5(c) for another example.

$D_\mathrm{edge}^i$ may contain outliers due to insufficient contrast, aliasing artifacts, or reflections in the original scene, resulting in inaccurate estimates in $D_\mathrm{smoothest}^i$. For the later global optimization stage (Sec. 3.1.3), we calculate a confidence map for $D_\mathrm{smoothest}^i$:

$$C_\mathrm{smoothest}^i(\mathbf{p}) = \begin{cases} 0, & \text{if } s(\mathbf{p}) < s_\mathrm{left}(\mathbf{p}) \text{ or } s(\mathbf{p}) < s_\mathrm{right}(\mathbf{p}) \\[4pt] 1 - \exp\!\left(-\dfrac{\min\!\left(|s(\mathbf{p}) - s_\mathrm{left}(\mathbf{p})|^2,\; |s(\mathbf{p}) - s_\mathrm{right}(\mathbf{p})|^2\right)}{2\sigma_s^2}\right), & \text{otherwise} \end{cases} \qquad (3)$$
Here, $s(\mathbf{p})$ denotes the depth score of $D_\mathrm{smoothest}^i(\mathbf{p})$, while $s_\mathrm{left}(\mathbf{p})$ and $s_\mathrm{right}(\mathbf{p})$ are the depth scores computed by shifting the projections of $\mathbf{p}$ in all secondary images one pixel to the left and right along the epipolar lines, respectively. See [6] for the details of the secondary image selection and the definition of the depth score. In principle, $s(\mathbf{p})$ should represent the highest local maximum of all depth scores at $\mathbf{p}$. In this case, Eq. (3) uses the Gaussian function of the difference between $s(\mathbf{p})$ and $s_\mathrm{left}(\mathbf{p})$ or $s_\mathrm{right}(\mathbf{p})$ to assign $\mathbf{p}$ a higher confidence when $s(\mathbf{p})$ is markedly higher. Here, $\sigma_s = 0.3$ is used; a larger value makes the confidence map smoother, which would induce more blurriness in the finally obtained depth map.
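For completeness, a direct, array-based transcription of Eq. (3) might look as follows, assuming the depth scores $s$, $s_\mathrm{left}$, and $s_\mathrm{right}$ have already been computed with the machinery of [6]; the function name and interface are our own.

```python
import numpy as np

def smoothest_confidence(s, s_left, s_right, sigma_s=0.3):
    # s, s_left, s_right: HxW arrays of depth scores (computed as in [6]).
    diff = np.minimum(np.abs(s - s_left) ** 2, np.abs(s - s_right) ** 2)
    conf = 1.0 - np.exp(-diff / (2.0 * sigma_s ** 2))
    conf[(s < s_left) | (s < s_right)] = 0.0    # first case of Eq. (3)
    return conf
```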

3.1.2. Densest-surface depth map

As shown in Figs. 5(c) and 5(d), intra-view depth diffusion may create wrong surfaces between fore- and background edges, in front of the real geometry. These surfaces are generally heavily slanted with respect to the currently processed view and thus contain low-density 3D points. Based on this observation, we remove the interpolation errors by propagating the depths with the highest surface density across all $\{D_\mathrm{smoothest}^i\}$. In this paper, the propagation result is named the densest-surface depth map and denoted by $D_\mathrm{densest}^i$ for the $i$th frame.

 

Fig. 5 Comparisons of the smoothest-surface, densest-surface, and optimal depth maps of the same view, as well as their reconstructed 3D points. The point clouds in (d), (f), and (i) are rotated for better visualization.


To measure the density of the per-view reconstructed points, we calculate a scale map $S^i$ for each $D_\mathrm{smoothest}^i$, where $S^i(\mathbf{p})$ is defined as the maximum distance between the 3D points produced from $\mathbf{p}$ and its 4-neighbors. A smaller scale value means that a locally denser surface is recovered. Let $P_j^i(D^j(\mathbf{q}))$ and $T_j^i(D^j(\mathbf{q}))$ represent the pixel and depth obtained by projecting the pixel $\mathbf{q}$ of the $j$th depth map $D^j$ into the $i$th real view, respectively. The calculation of $D_\mathrm{densest}^i$ can then be formulated as follows:

$$(j^*, \mathbf{q}^*) = \arg\min_{(j, \mathbf{q})} S^j(\mathbf{q}), \quad \text{s.t. } P_j^i\!\left(D_\mathrm{smoothest}^j(\mathbf{q})\right) = \mathbf{p}, \qquad D_\mathrm{densest}^i(\mathbf{p}) = T_{j^*}^i\!\left(D_\mathrm{smoothest}^{j^*}(\mathbf{q}^*)\right) \qquad (4)$$

Figures 5(e) and 5(f) present an example of our generated $D_\mathrm{densest}^i$ and its reconstructed 3D points. It can be seen that the wrong depth interpolants of $D_\mathrm{smoothest}^i$ in Fig. 5(c) are substantially corrected in $D_\mathrm{densest}^i$, producing sharp depth discontinuities at object boundaries, and that the corresponding surfaces connecting fore- and background edges in Fig. 5(d) are eliminated from the scene geometry. However, the above density-based cross-view propagation leads to layered surfaces in some homogeneous areas where the geometry is expected to be flat. This problem is addressed in Sec. 3.1.3.

For the following global optimization stage (Sec. 3.1.3), we also calculate a confidence map for $D_\mathrm{densest}^i$, using the confidences of the depths that constitute $D_\mathrm{densest}^i$:

$$C_\mathrm{densest}^i(\mathbf{p}) = C_\mathrm{smoothest}^{j^*}(\mathbf{q}^*)\; c_\mathrm{color}(\mathbf{p}, \mathbf{q}^*), \qquad c_\mathrm{color}(\mathbf{p}, \mathbf{q}^*) = \exp\!\left(-\frac{\left\| I^i(\mathbf{p}) - I^{j^*}(\mathbf{q}^*) \right\|^2}{2\sigma_c^2}\right) \qquad (5)$$
In Eq. (5), the Gaussian function $c_\mathrm{color}$ of the color difference between $\mathbf{p}$ and the pixel $\mathbf{q}^*$ projected to $\mathbf{p}$ is incorporated as a penalty factor to account for the cases in which $\mathbf{p}$ and $\mathbf{q}^*$ belong to different surfaces. We set $\sigma_c = 8$ for synthetic scenes and 15 for real-world scenes; a larger value enhances the noise robustness of the depth estimation. In our implementation of Eq. (4), we further improve the accuracy of $D_\mathrm{densest}^i$ by only selecting pixels $\mathbf{q}^*$ satisfying $c_\mathrm{color}(\mathbf{p}, \mathbf{q}^*) \geq 0.5$.
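To make Eqs. (4) and (5) concrete, the sketch below scans all views, splats each smoothest-surface depth map into view $i$, and keeps, per pixel, the candidate with the smallest scale value, subject to the color check $c_\mathrm{color} \geq 0.5$. The projection helper, the pose convention (world-to-camera R, t with shared intrinsics K), and all names are assumptions for illustration; the pixel-level operations would run in parallel on the GPU in practice.

```python
import numpy as np

def project_to_view(depth_j, K, R_j, t_j, R_i, t_i):
    """Project all valid pixels of depth map j into view i (pinhole model).

    Returns pixel coordinates (u, v) and depths z in view i, plus the source
    pixel coordinates (x, y) in view j. Poses are world-to-camera (R, t).
    """
    h, w = depth_j.shape
    ys, xs = np.mgrid[0:h, 0:w]
    valid = depth_j > 0
    pix = np.stack([xs[valid], ys[valid], np.ones(valid.sum())])
    pts_j = depth_j[valid] * (np.linalg.inv(K) @ pix)   # 3D points in camera j
    pts_w = R_j.T @ (pts_j - t_j[:, None])              # world coordinates
    proj = K @ (R_i @ pts_w + t_i[:, None])             # into camera i
    z = proj[2]
    u, v = np.round(proj[0] / z).astype(int), np.round(proj[1] / z).astype(int)
    return u, v, z, xs[valid], ys[valid]

def densest_surface(i, depths, scales, confs, images, cams, K, sigma_c=15.0):
    """Sketch of Eqs. (4) and (5): density-based cross-view depth propagation."""
    h, w = depths[i].shape
    best_scale = np.full((h, w), np.inf)
    d_dense = np.zeros((h, w))
    c_dense = np.zeros((h, w))
    for j, (depth_j, (R_j, t_j)) in enumerate(zip(depths, cams)):
        u, v, z, x, y = project_to_view(depth_j, K, R_j, t_j, *cams[i])
        inside = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        for uu, vv, zz, xx, yy in zip(u[inside], v[inside], z[inside], x[inside], y[inside]):
            c_color = np.exp(-np.sum((images[i][vv, uu] - images[j][yy, xx]) ** 2)
                             / (2.0 * sigma_c ** 2))     # Eq. (5), color penalty
            if c_color < 0.5 or scales[j][yy, xx] >= best_scale[vv, uu]:
                continue
            best_scale[vv, uu] = scales[j][yy, xx]       # densest candidate wins
            d_dense[vv, uu] = zz
            c_dense[vv, uu] = confs[j][yy, xx] * c_color
    return d_dense, c_dense
```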

3.1.3. MRF global optimization

Since $D_\mathrm{smoothest}^i$ and $D_\mathrm{densest}^i$ are both estimated from local cues and their characteristics are mutually complementary, we merge them, taking the image structure into account, to resolve the depth ambiguities. To this end, we adopt the method of [7] to globally minimize the MRF energy defined as

$$D_\mathrm{optimal}^i = \arg\min_{Z} \; \lambda_\mathrm{source} \sum_\mathrm{source} \sum_{(x,y)} W_\mathrm{source} \left| Z - Z_\mathrm{source} \right|_{(x,y)} + \lambda_\mathrm{flat} \sum_{(x,y)} \left( \left|\frac{\partial Z}{\partial x}\right|_{(x,y)} + \left|\frac{\partial Z}{\partial y}\right|_{(x,y)} \right) + \lambda_\mathrm{smooth} \sum_{(x,y)} \left| \Delta Z \right|_{(x,y)} \qquad (6)$$
where $D_\mathrm{optimal}^i$ is the obtained optimal depth map, and $Z_\mathrm{source}$ and $W_\mathrm{source}$ denote the initial data terms:

$$\{Z_\mathrm{source}^1, Z_\mathrm{source}^2\} = \{D_\mathrm{smoothest}^i, D_\mathrm{densest}^i\}, \qquad \{W_\mathrm{source}^1, W_\mathrm{source}^2\} = \{C_\mathrm{smoothest}^i, C_\mathrm{densest}^i\} \qquad (7)$$

In Eq. (6), $\lambda_\mathrm{source}$ adjusts the contribution of the two input depth maps. $\lambda_\mathrm{flat}$ controls the first-derivative constraint that enforces flatness of $D_\mathrm{optimal}^i$, and $\lambda_\mathrm{smooth}$ controls the second-derivative (Laplacian) kernel that guarantees $D_\mathrm{optimal}^i$ is overall smooth. To implement this MRF formulation, we set $\lambda_\mathrm{source} = 1$ and $\lambda_\mathrm{flat} = \lambda_\mathrm{smooth} = 2$, and use the solving approach proposed in [7]. Afterward, we further refine $D_\mathrm{optimal}^i$ by removing the depths whose recomputed scale values are larger than 0.001. This post-processing step eliminates the small number of remaining wrong interpolants and the outliers induced by rank deficiency during the optimization.
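The merge is solved with the public code of [7]. For illustration only, the sketch below replaces the L1 penalties of Eq. (6) with squared (L2) ones so that the merge reduces to a sparse linear least-squares problem; this is an approximation under our own naming and border handling, not the solver of [7].

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def merge_depth_maps(d_smooth, c_smooth, d_dense, c_dense,
                     lam_source=1.0, lam_flat=2.0, lam_smooth=2.0):
    """L2 approximation of Eq. (6): weighted data terms + gradient + Laplacian."""
    h, w = d_smooth.shape
    n = h * w

    def diff(m):
        # 1D forward-difference operator; last row zeroed for border handling.
        d = sp.diags([-np.ones(m), np.ones(m - 1)], [0, 1], shape=(m, m)).tolil()
        d[m - 1, :] = 0
        return d.tocsr()

    Dx = sp.kron(diff(w), sp.identity(h), format="csr")   # horizontal gradient
    Dy = sp.kron(sp.identity(w), diff(h), format="csr")   # vertical gradient
    Lap = Dx.T @ Dx + Dy.T @ Dy                            # Laplacian-like operator

    Ws = sp.diags(np.sqrt(np.clip(c_smooth, 0, None)).ravel(order="F"))
    Wd = sp.diags(np.sqrt(np.clip(c_dense, 0, None)).ravel(order="F"))

    A = sp.vstack([np.sqrt(lam_source) * Ws,
                   np.sqrt(lam_source) * Wd,
                   np.sqrt(lam_flat) * Dx,
                   np.sqrt(lam_flat) * Dy,
                   np.sqrt(lam_smooth) * Lap], format="csr")
    b = np.concatenate([np.sqrt(lam_source) * (Ws @ d_smooth.ravel(order="F")),
                        np.sqrt(lam_source) * (Wd @ d_dense.ravel(order="F")),
                        np.zeros(3 * n)])
    z = spla.lsqr(A, b)[0]                                 # sparse least squares
    return z.reshape(h, w, order="F")
```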

The merging results of Figs. 5(c) and 5(e) without and with refinement are shown in Figs. 5(g) and 5(h). It is evident that, after global optimization, the incorrect connections between objects on distinct depth planes in $D_\mathrm{smoothest}^i$ are almost completely cleaned up, while the layered geometry of the homogeneous surfaces in $D_\mathrm{densest}^i$ becomes smooth again. The same conclusion can be drawn by comparing the point clouds shown in Figs. 5(d), 5(f), and 5(i).

3.2. Virtual view rendering

With the dense and accurate depth maps $\{D_\mathrm{optimal}^i\}$ of the individual real views, we can efficiently synthesize a perspective image for any virtual view. Assume that a virtual camera array has been built in the world coordinate system of our reconstructed 3D model, i.e., the intrinsic and extrinsic (rotation and translation) parameters of each virtual camera, as well as the distance between neighboring virtual cameras, are all known. We render the perspective image $I^v$ of the $v$th virtual camera by back-projecting the 3D-point estimates of all $\{D_\mathrm{optimal}^i\}$ to each pixel $\mathbf{p}$ in this view, and then picking the color of the point projected to $\mathbf{p}$ that has the smallest depth, i.e., the closest surface. This simple view synthesis process can be described by

$$(j^*, \mathbf{q}^*) = \arg\min_{(j, \mathbf{q})} T_j^v\!\left(D_\mathrm{optimal}^j(\mathbf{q})\right), \quad \text{s.t. } P_j^v\!\left(D_\mathrm{optimal}^j(\mathbf{q})\right) = \mathbf{p}, \qquad I^v(\mathbf{p}) = I_\mathrm{frame}^{j^*}(\mathbf{q}^*) \qquad (8)$$
where $I_\mathrm{frame}^{j^*}$ denotes the $j^*$th frame sample. See Fig. 9 for our rendered results.
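A minimal sketch of this z-buffer style back-projection follows, under the assumptions of world-to-camera poses (R, t), shared intrinsics K, and illustrative function and variable names; it simply keeps, for every pixel of the virtual view, the color of the closest projected 3D point, as in Eq. (8).

```python
import numpy as np

def render_virtual_view(depth_maps, frames, cams, K, R_v, t_v, h, w):
    """Back-project all optimal depth maps into one virtual view with a z-buffer.

    depth_maps / frames: lists of HxW depth maps and HxWx3 images of the real views;
    cams: list of (R, t) world-to-camera poses of the real views; (R_v, t_v): pose of
    the virtual camera; (h, w): output resolution. Names are illustrative only.
    """
    Kinv = np.linalg.inv(K)
    image = np.zeros((h, w, 3))
    zbuf = np.full((h, w), np.inf)
    for depth, frame, (R, t) in zip(depth_maps, frames, cams):
        hh, ww = depth.shape
        ys, xs = np.mgrid[0:hh, 0:ww]
        valid = depth > 0
        pix = np.stack([xs[valid], ys[valid], np.ones(valid.sum())])
        pts_cam = depth[valid] * (Kinv @ pix)           # 3D points in the real camera
        pts_world = R.T @ (pts_cam - t[:, None])        # back to world coordinates
        proj = K @ (R_v @ pts_world + t_v[:, None])     # project into the virtual view
        z = proj[2]
        u = np.round(proj[0] / z).astype(int)
        v = np.round(proj[1] / z).astype(int)
        inside = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        colors = frame[ys[valid], xs[valid]]
        for ui, vi, zi, ci in zip(u[inside], v[inside], z[inside], colors[inside]):
            if zi < zbuf[vi, ui]:                       # keep the closest surface
                zbuf[vi, ui] = zi
                image[vi, ui] = ci
    return image
```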

 

Fig. 6 Trajectory of the video camera (gray) and visualization of the 15 × 15 virtual camera array (green) used for each tested scene in our experiments. All virtual cameras have the same intrinsic parameters as the real camera.


4. Results

4.1. Datasets and compared methods

Three monocular video datasets with large homogeneous surfaces and unconstrained camera trajectories (Fig. 6) have been used to test our SAII system. Bathroom is a synthetic scene with ground truth depth maps created in the Blender software. Building and Boxes are real-world scenes without known geometry; Building was captured by a camera mounted on a flying drone, and Boxes by a hand-held camera. For each dataset, a subset of 100 frames was pre-selected as the real views, and all acquired images have a resolution of 1920 × 1080.

 

Fig. 7 Comparisons of depth maps against the ground truth. We present (a) the depth maps, (b) the relative error maps, and (c) the distribution measuring the fraction of estimated depths whose relative error is smaller than a given threshold. In (b), blue pixels have no depth, red pixels have an error larger than 0.01, green pixels have no ground truth data, and pixels with an error between 0 and 0.01 are mapped to gray values from 0 to 255.


 

Fig. 8 Edge depth maps and our estimated dense depth maps for the real-world scenes.


Since dense geometry recovery is the key stage of our proposal, we compare our results with three existing depth map estimation schemes: [45] (BAI), [5] (KIM), and [6] (WEI). BAI is a classic MVS approach for sparsely sampled views that uses the PatchMatch technique on a few secondary images, region growing, and post-checking to remove outliers that are inconsistent across views. The light field reconstruction method KIM performs ray-oriented depth sweeping at edges (with 1024 hypotheses) and recovers textureless surfaces by detecting edges in gradually downscaled images. WEI deals with video input and diffuses edge depths as we do, but invalidates wrong interpolants via inter-view checking. Experiments were run on multithreaded CPUs using OpenMP, and on NVIDIA GeForce GTX 680 (KIM) and 1080 Ti (others) GPUs.

4.2. Evaluation

 

Fig. 9 Synthesized images for a real view of the real-world scenes using (a) the smoothest-surface, (b) densest-surface, and (c) optimal depth maps. (c) improves on (a) and (b) by 1.67 dB and 1.09 dB in terms of PSNR for Building, and by 4.08 dB and 1.81 dB for Boxes. The parts in red circles are enlarged for easier comparison. The original images are given in (d).


Figure 7 uses the ground truth for depth map comparisons between ours and the other techniques, and also shows the completeness curves for varying relative error thresholds. Figures 7(a) and 7(b) demonstrate that the homogeneous surfaces reconstructed by BAI and KIM both suffer from discontinuities, whereas WEI and ours perform better. However, according to Fig. 7(c), the depths of WEI have the largest proportion of large relative errors (> 0.002), and the amount of its high-accuracy estimates is smaller than that of our algorithm. We computed the mean relative errors of these four approaches, obtaining 0.0053, 0.0075, 0.0043, and 0.0040, respectively. Therefore, ours yields the highest overall depth accuracy. Although the refinement step of the proposed method produces several holes (accounting for only about 1.3% of the image pixels) in the depth map, our experiments confirm that they have negligible influence on the perspective image synthesis stage (see Fig. 9). Figure 8 shows our estimated depth maps for the real-world scenes.

To evaluate the effect of the global depth optimization (Sec. 3.1.3) in our pipeline, Fig. 9 compares the synthesized images of an arbitrarily selected real view, for which the peak signal-to-noise ratio (PSNR) is calculated, using the smoothest-surface (Fig. 9(a)), densest-surface (Fig. 9(b)), and optimal (Fig. 9(c)) depth maps. Incorrect connections between objects can be found in Fig. 9(a), and considerable noise degrades the image quality in Fig. 9(b). Comparatively, Fig. 9(c) achieves PSNR improvements of 1.67 dB and 1.09 dB over the former two for Building (see the enlarged railings and stripes). For Boxes, the rendering quality is further enhanced by 4.08 dB and 1.81 dB, respectively (see the enlarged letters).

Figure 10 shows the perspective images synthesized by our method for the virtual camera arrays. For all tested scenes, we used the 15 × 15 virtual camera array illustrated in Fig. 6. In the enlarged images, the geometry and textures of each scene can be clearly recognized. Figure 11 presents three of the slice images for each scene, generated from Fig. 10. By observing the regions focused on different depth planes, we can easily identify the objects and textures in the scenes. In particular, the toothpaste in Bathroom, the railings in Building, and the numbers and colored pens in Boxes are all recognizable.

 

Fig. 10 Synthesized images for all virtual views (left) using our estimated depth maps, among which four images (highlighted in yellow) of each scene are enlarged (right).


 

Fig. 11 Slice images at three distances produced by our SAII approach. The focused regions labeled by red circles are enlarged for better visualization.


On average, our SAII system took approximately 23 minutes for pre-processing, 1.6 minutes for calculating the smoothest-surface and densest-surface depth maps (for 100 real views), 49 minutes for global optimization, and 1.2 minutes for synthesizing the new perspective images (for 125 virtual views), for a total of about 74.8 minutes per scene. Note that the global optimization required around 29.4 seconds per view because this step was implemented with the public Matlab code of [7]; converting it to GPU processing is expected to significantly accelerate the whole pipeline. We also implemented the strategy of directly using the video frames to produce a single slice image. There was little difference in quality or speed between its results and ours; however, that method was about 1.4 minutes slower whenever we shifted the depth of the slice image, as explained in Sec. 1.

5. Conclusions

SAII based on unstructured videos offers more freedom in scene information acquisition than other existing SAII techniques. This paper has presented a new algorithm for computational 3D scene reconstruction and imaging in such a system. Considering that edges already capture the primary characteristic structures of many scenes and can be precisely reconstructed by exploiting the abundant input data, we proposed to recover dense scene geometry and render virtual views using the edge depth maps of densely sampled frames. For each frame sample, two complete depth maps are calculated from the sparse depth initializers, separately relying on the smoothness and the density of local 3D surfaces, and are afterward globally merged. This strategy makes our method more robust to occlusions and the absence of texture in the real views. The accurate depth estimates allow efficient and high-quality synthesis of new perspective images, which finally generate clearly distinguishable slice images.

Discussion and future work

The reported approach is particularly suitable for large-scale indoor and urban environments that are mainly composed of prominent edges and planar homogeneous surfaces, even if the surfaces are occluded in some captured images. It should be noted, however, that since our method fits the smoothest possible surfaces from edges, it cannot handle arbitrary scene structures. Our proposal is also limited to scenes that exhibit only Lambertian reflection and are captured under ideal conditions; non-Lambertian properties, varying illumination, or rolling shutter would lead to erroneous depth values and 3D images. These are, however, common problems that most 3D reconstruction and view synthesis techniques still struggle to overcome. In such cases, it may be helpful to respect depth uncertainty during virtual view rendering for stronger tolerance to unreliable scene geometry; this extension is an interesting direction for future research. Moreover, due to the limitations of our experimental conditions, we can currently only provide computational results. Our next work will be to verify the 3D image quality achieved by the proposed method on an optical SAII platform.

Funding

National Key Research and Development Program of China (2017YFB0404800); National Natural Science Foundation of China (NSFC) (61631009); Fundamental Research Funds for the Central Universities (2017TD-19).

References

1. G. Lippmann, “La photographie integrale,” C. R. Acad. Sci. 146, 446–451 (1908).

2. J. S. Jang and B. Javidi, “Three-dimensional synthetic aperture integral imaging,” Opt. Lett. 27, 1144–1146 (2002).

3. X. Li, M. Zhao, Y. Xing, H. L. Zhang, L. Li, S. T. Kim, X. Zhou, and Q. H. Wang, “Designing optical 3d images encryption and reconstruction using monospectral synthetic aperture integral imaging,” Opt. Express 26, 11084–11099 (2018).

4. X. Li, Y. Wang, Q. H. Wang, Y. Liu, and X. Zhou, “Modified integral imaging reconstruction and encryption using an improved sr reconstruction algorithm,” Opt. Lasers Eng. 112, 162–169 (2019).

5. C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. Gross, “Scene reconstruction from high spatio-angular resolution light fields,” ACM Trans. Graph. 32, 73 (2013).

6. J. Wei, B. Resch, and H. P. A. Lensch, “Dense and occlusion-robust multi-view stereo for unstructured videos,” in Conf. Comput. and Robot Vis., (2016).

7. M. W. Tao, S. Hadap, J. Malik, and R. Ramamoorthi, “Depth from combining defocus and correspondence using light-field cameras,” in IEEE Intern. Conf. Comput. Vis., (2013).

8. N. Sabater, G. Boisson, B. Vandame, P. Kerbiriou, F. Babon, M. Hog, R. Gendrot, T. Langlois, O. Bureller, A. Schubert, and V. Allié, “Dataset and pipeline for multi-view light-field video,” in IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, (2017).

9. M. DaneshPanah, B. Javidi, and E. A. Watson, “Three dimensional imaging with randomly distributed sensors,” Opt. Express 16, 6368–6377 (2008).

10. X. Xiao, M. DaneshPanah, M. Cho, and B. Javidi, “3d integral imaging using sparse sensors with unknown positions,” J. Disp. Technol. 6, 614–619 (2010).

11. D. C. Schedl, C. Birklbauer, and O. Bimber, “Optimized sampling for view interpolation in light fields using local dictionaries,” Comput. Vis. Image Und. 168, 93–103 (2018).

12. J. Wang, X. Xiao, and B. Javidi, “Three-dimensional integral imaging with flexible sensing,” Opt. Lett. 39, 6855–6858 (2014).

13. R. Schulein, M. DaneshPanah, and B. Javidi, “3d imaging with axially distributed sensing,” Opt. Lett. 34, 2012–2014 (2009).

14. D. Shin and M. Cho, “3d integral imaging display using axially recorded multiple images,” J. Opt. Soc. Korea 17, 410–414 (2013).

15. M. Guo, Y. Si, Y. Lyu, S. Wang, and F. Jin, “Elemental image array generation based on discrete viewpoint pickup and window interception in integral imaging,” Appl. Opt. 54, 876–884 (2015).

16. Y. Piao, H. Qu, M. Zhang, and M. Cho, “Three-dimensional integral imaging display system via off-axially distributed image sensing,” Opt. Lasers Eng. 85, 18–23 (2016).

17. S. Hong, A. Dorado, G. Saavedra, J. C. Barreiro, and M. Martinez-Corral, “Three-dimensional integral-imaging display from calibrated and depth-hole filtered kinect information,” J. Disp. Technol. 12, 1301–1308 (2016).

18. J. Zhang, X. Wang, Q. Zhang, Y. Chen, J. Du, and Y. Liu, “Integral imaging display for natural scene based on kinectfusion,” Optik-Int. J. Light. Electron Opt. 127, 791–794 (2016).

19. B. Resch, H. P. A. Lensch, O. Wang, M. Pollefeys, and A. Sorkine-Hornung, “Scalable structure from motion for densely sampled videos,” in IEEE Conf. Comput. Vis. Pattern Recognit., (2015).

20. J. Engel, T. Schops, and D. Cremers, “Lsd-slam: Large-scale direct monocular slam,” in European Conf. Comput. Vis., (2014).

21. A. P. Gee, D. Chekhlov, W. Mayol-Cuevas, and A. Calway, “Discovering planes and collapsing the state space in visual slam,” in British Machine Vis. Conf., (2007).

22. A. Kundu, Y. Li, F. Dellaert, F. Li, and J. M. Rehg, “Joint semantic segmentation and 3d reconstruction from monocular video,” in European Conf. Comput. Vis., (2014).

23. R. A. Newcombe and A. J. Davison, “Live dense reconstruction with a single moving camera,” in IEEE Conf. Comput. Vis. Pattern Recognit., (2010).

24. Z. Kang and G. Medioni, “Progressive 3d model acquisition with a commodity hand-held camera,” in IEEE Winter Conf. Appl. Comput. Vis., (2015).

25. R. Rzeszutek and D. Androutsos, “A framework for estimating relative depth in video,” Comput. Vis. Image Und. 133, 15–29 (2015).

26. J. Wei, B. Resch, and H. P. A. Lensch, “Multi-view depth map estimation with cross-view consistency,” in British Machine Vis. Conf., (2014).

27. G. Zhang, J. Jia, T. T. Wong, and H. Bao, “Consistent depth maps recovery from a video sequence,” IEEE Trans. Pattern Anal. Mach. Intell. 31, 974–988 (2009).

28. C. Hoppe, M. Klopschitz, M. Donoser, and H. Bischof, “Incremental surface extraction from sparse structure-from-motion point clouds,” in British Machine Vis. Conf., (2013).

29. J. Wei, B. Resch, and H. P. A. Lensch, “Dense and scalable reconstruction from unstructured videos with occlusions,” in Intern. Symp. on Vis., Modeling and Visual., (2017).

30. S. W. Min, J. Kim, and B. Lee, “New characteristic equation of three-dimensional integral imaging system and its applications,” Jpn. J. Appl. Phys. Lett. 44, L71–L74 (2005).

31. K. S. Park, S. W. Min, and Y. Cho, “Viewpoint vector rendering for efficient elemental image generation,” IEICE Trans. Inf. Syst. E90-D, 233–241 (2007).

32. K. Yanaka, “Integral photography using hexagonal fly’s eye lens and fractional view,” Proc. SPIE 6803, 68031K (2008).

33. M. Halle, “Multiple viewpoint rendering,” in Proc. Comput. Graph. Interactive Technol., (1998).

34. S. Xing, X. Sang, X. Yu, C. Duo, B. Pang, X. Gao, S. Yang, Y. Guan, B. Yan, J. Yuan, and K. Wang, “High-efficient computer-generated integral imaging based on the backward ray-tracing technique and optical reconstruction,” Opt. Express 25, 330–338 (2017).

35. H. Y. Shum, S. C. Chan, and S. B. Kang, Image-Based Rendering (Springer-Verlag, 2006).

36. G. Wu, M. Zhao, L. Wang, Q. Dai, T. Chai, and Y. Liu, “Light field reconstruction using deep convolutional network on epi,” in IEEE Conf. Comput. Vis. Pattern Recognit., (2017).

37. S. M. Seitz and C. R. Dyer, “View morphing,” in Proc. Comput. Graph. Interactive Technol., (1996).

38. S. Vedula, S. Baker, and T. Kanade, “Image-based spatio-temporal modeling and view interpolation of dynamic events,” ACM Trans. Graph. 24, 240–261 (2005).

39. S. Pujades, F. Devernay, and B. Goldluecke, “Bayesian view synthesis and image-based rendering principles,” in IEEE Conf. Comput. Vis. Pattern Recognit., (2014).

40. E. Penner and L. Zhang, “Soft 3d reconstruction for view synthesis,” ACM Trans. Graph. 36, 235 (2017).

41. F. Liu, M. Gleicher, H. Jin, and A. Agarwala, “Content preserving warps for 3d video stabilization,” ACM Trans. Graph. 28, 1–9 (2009).

42. G. Chaurasia, O. Sorkine, and G. Drettakis, “Silhouette aware warping for image based rendering,” Comput. Graph. Forum 30, 1223–1232 (2011).

43. G. Chaurasia, S. Duchene, O. Sorkine-Hornung, and G. Drettakis, “Depth synthesis and local warps for plausible image-based navigation,” ACM Trans. Graph. 32, 1–12 (2013).

44. S. Jeschke, D. Cline, and P. Wonka, “A gpu laplacian solver for diffusion curves and poisson image editing,” ACM Trans. Graph. 28, 116 (2009).

45. C. Bailer, M. Finckh, and H. Lensch, “Scale robust multi view stereo,” in European Conf. Comput. Vis., (2012).


Jeschke, S.

S. Jeschke, D. Cline, and P. Wonka, “A gpu laplacian solver for diffusion curves and poisson image editing,” ACM Trans. Graph. 28, 116 (2009).
[Crossref]

Jia, J.

G. Zhang, J. Jia, T. T. Wong, and H. Bao, “Consistent depth maps recovery from a video sequence,” IEEE Trans. Pattern Anal. Mach. Intell. 31, 974–988 (2009).
[Crossref] [PubMed]

Jin, F.

Jin, H.

F. Liu, M. Gleicher, H. Jin, and A. Agarwala, “Content preserving warps for 3d video stabilization,” ACM Trans. Graph. 28, 1–9 (2009).

Kanade, T.

S. Vedula, S. Baker, and T. Kanade, “Image-based spatio-temporal modeling and view interpolation of dynamic events,” ACM Trans. Graph. 24, 240–261 (2005).
[Crossref]

Kang, S. B.

H. Y. Shum, S. C. Chan, and S. B. Kang, in Image-based rendering, (Springer-Verlag, 2006).

Kang, Z.

Z. Kang and G. Medioni, “Progressive 3d model acquisition with a commodity hand-held camera,” in IEEE Winter Conf. Appl. Comput. Vis., (2015).

Kerbiriou, P.

N. Sabater, G. Boisson, B. Vandame, P. Kerbiriou, F. Babon, M. Hog, R. Gendrot, T. Langlois, O. Bureller, A. Schubert, and V. Allié, “Dataset and pipeline for multi-view light-field video,” in IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, (2017).

Kim, C.

C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. Gross, “Scene reconstruction from high spatio-angular resolution light fields,” ACM Trans. Graph. 32, 73 (2013).
[Crossref]

Kim, J.

S. W. Min, J. Kim, and B. Lee, “New characteristic equation of three-dimensional integral imaging system and its applications,” Jpn. J. Appl. Phys. Lett. 44, L71–L74 (2005).
[Crossref]

Kim, S. T.

Klopschitz, M.

C. Hoppe, M. Klopschitz, M. Donoser, and H. Bischof, “Incremental surface extraction from sparse structure-from-motion point clouds,” in British Machine Vis. Conf., (2013).

Kundu, A.

A. Kundu, Y. Li, F. Dellaert, F. Li, and J. M. Rehg, “Joint semantic segmentation and 3d reconstruction from monocular video,” in European Conf. Comput. Vis., (2014).

Langlois, T.

N. Sabater, G. Boisson, B. Vandame, P. Kerbiriou, F. Babon, M. Hog, R. Gendrot, T. Langlois, O. Bureller, A. Schubert, and V. Allié, “Dataset and pipeline for multi-view light-field video,” in IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, (2017).

Lee, B.

S. W. Min, J. Kim, and B. Lee, “New characteristic equation of three-dimensional integral imaging system and its applications,” Jpn. J. Appl. Phys. Lett. 44, L71–L74 (2005).
[Crossref]

Lensch, H.

C. Bailer, M. Finckh, and H. Lensch, “Scale robust multi view stereo,” in European Conf. Comput. Vis.V, (2012).

Lensch, H. P. A.

J. Wei, B. Resch, and H. P. A. Lensch, “Dense and scalable reconstruction from unstructured videos with occlusions,” in Intern. Symp. on Vis., Modeling and Visual., (2017).

J. Wei, B. Resch, and H. P. A. Lensch, “Multi-view depth map estimation with cross-view consistency,” in British Machine Vis. Conf., (2014).

B. Resch, H. P. A. Lensch, O. Wang, M. Pollefeys, and A. Solkine-Hornung, “Scalable structure from motion for densely sampled videos,” in IEEE Conf. Comput. Vis. Pattern Recognit., (2015).

J. Wei, B. Resch, and H. P. A. Lensch, “Dense and occlusion-robust multi-view stereo for unstructured videos,” in Conf. Comput. and Robot Vis., (2016).

Li, F.

A. Kundu, Y. Li, F. Dellaert, F. Li, and J. M. Rehg, “Joint semantic segmentation and 3d reconstruction from monocular video,” in European Conf. Comput. Vis., (2014).

Li, L.

Li, X.

X. Li, Y. Wang, Q. H. Wang, Y. Liu, and X. Zhou, “Modified integral imaging reconstruction and encryption using an improved sr reconstruction algorithm,” Opt. Lasers Eng. 112, 162–169 (2019).
[Crossref]

X. Li, M. Zhao, Y. Xing, H. L. Zhang, L. Li, S. T. Kim, X. Zhou, and Q. H. Wang, “Designing optical 3d images encryption and reconstruction using monospectral synthetic aperture integral imaging,” Opt. Express 26, 11084–11099 (2018).
[Crossref] [PubMed]

Li, Y.

A. Kundu, Y. Li, F. Dellaert, F. Li, and J. M. Rehg, “Joint semantic segmentation and 3d reconstruction from monocular video,” in European Conf. Comput. Vis., (2014).

Lippmann, G.

G. Lippmann, “La photographie integrale,” C.R Acad. Sci. 146, 446–451 (1908).

Liu, F.

F. Liu, M. Gleicher, H. Jin, and A. Agarwala, “Content preserving warps for 3d video stabilization,” ACM Trans. Graph. 28, 1–9 (2009).

Liu, Y.

X. Li, Y. Wang, Q. H. Wang, Y. Liu, and X. Zhou, “Modified integral imaging reconstruction and encryption using an improved sr reconstruction algorithm,” Opt. Lasers Eng. 112, 162–169 (2019).
[Crossref]

J. Zhang, X. Wang, Q. Zhang, Y. Chen, J. Du, and Y. Liu, “Integral imaging display for natural scene based on kinectfusion,” Optik-Int. J. Light. Electron Opt. 127, 791–794 (2016).
[Crossref]

G. Wu, M. Zhao, L. Wang, Q. Dai, T. Chai, and Y. Liu, “Light field reconstruction using deep convolutional network on epi,” in IEEE Conf. Comput. Vis. Pattern Recognit., (2017).

Lyu, Y.

Malik, J.

M. W. Tao, S. Hadap, J. Malik, and R. Ramamoorthi, “Depth from combining defocus and correspondence using light-field cameras,” in IEEE Intern. Conf. Comput. Vis., (2013).

Martinez-Corral, M.

S. Hong, A. Dorado, G. Saavedra, J. C. Barreiro, and M. Martinez-Corral, “Three-dimensional integral-imaging display from calibrated and depth-hole filtered kinect information,” J. Disp. Technol. 12, 1301–1308 (2016).
[Crossref]

Mayol-Cuevas, W.

A. P. Gee, D. Chekhlov, W. Mayol-Cuevas, and A. Calway, “Discovering planes and collapsing the state space in visual slam,” in British Machine Vis. Conf., (2007).

Medioni, G.

Z. Kang and G. Medioni, “Progressive 3d model acquisition with a commodity hand-held camera,” in IEEE Winter Conf. Appl. Comput. Vis., (2015).

Min, S. W.

K. S. Park, S. W. Min, and Y. Cho, “Viewpoint vector rendering for efficient elemental image generation,” IEICE Trans. Inf. Syst. E90-D, 233–241 (2007).
[Crossref]

S. W. Min, J. Kim, and B. Lee, “New characteristic equation of three-dimensional integral imaging system and its applications,” Jpn. J. Appl. Phys. Lett. 44, L71–L74 (2005).
[Crossref]

Newcombe, R. A.

R. A. Newcombe and A. J. Davison, “Live dense reconstruction with a single moving camera,” in IEEE Conf. Comput. Vis. Pattern Recognit., (2010).

Pang, B.

Park, K. S.

K. S. Park, S. W. Min, and Y. Cho, “Viewpoint vector rendering for efficient elemental image generation,” IEICE Trans. Inf. Syst. E90-D, 233–241 (2007).
[Crossref]

Penner, E.

E. Penner and L. Zhang, “Soft 3d reconstruction for view synthesis,” ACM Trans. Graph. 36, 235 (2017).
[Crossref]

Piao, Y.

Y. Piao, H. Qu, M. Zhang, and M. Cho, “Three-dimensional integral imaging display system via off-axially distributed image sensing,” Opt. Lasers Eng. 85, 18–23 (2016).
[Crossref]

Pollefeys, M.

B. Resch, H. P. A. Lensch, O. Wang, M. Pollefeys, and A. Solkine-Hornung, “Scalable structure from motion for densely sampled videos,” in IEEE Conf. Comput. Vis. Pattern Recognit., (2015).

Pritch, Y.

C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. Gross, “Scene reconstruction from high spatio-angular resolution light fields,” ACM Trans. Graph. 32, 73 (2013).
[Crossref]

Pujades, S.

S. Pujades, F. Devernay, and B. Goldluecke, “Bayesian view synthesis and image-based rendering principles,” in IEEE Conf. Comput. Vis. Pattern Recognit., (2014).

Qu, H.

Y. Piao, H. Qu, M. Zhang, and M. Cho, “Three-dimensional integral imaging display system via off-axially distributed image sensing,” Opt. Lasers Eng. 85, 18–23 (2016).
[Crossref]

Ramamoorthi, R.

M. W. Tao, S. Hadap, J. Malik, and R. Ramamoorthi, “Depth from combining defocus and correspondence using light-field cameras,” in IEEE Intern. Conf. Comput. Vis., (2013).

Rehg, J. M.

A. Kundu, Y. Li, F. Dellaert, F. Li, and J. M. Rehg, “Joint semantic segmentation and 3d reconstruction from monocular video,” in European Conf. Comput. Vis., (2014).

Resch, B.

J. Wei, B. Resch, and H. P. A. Lensch, “Multi-view depth map estimation with cross-view consistency,” in British Machine Vis. Conf., (2014).

B. Resch, H. P. A. Lensch, O. Wang, M. Pollefeys, and A. Solkine-Hornung, “Scalable structure from motion for densely sampled videos,” in IEEE Conf. Comput. Vis. Pattern Recognit., (2015).

J. Wei, B. Resch, and H. P. A. Lensch, “Dense and scalable reconstruction from unstructured videos with occlusions,” in Intern. Symp. on Vis., Modeling and Visual., (2017).

J. Wei, B. Resch, and H. P. A. Lensch, “Dense and occlusion-robust multi-view stereo for unstructured videos,” in Conf. Comput. and Robot Vis., (2016).

Rzeszutek, R.

R. Rzeszutek and D. Androutsos, “A framework for estimating relative depth in video,” Comput. Vis. Image Und. 133, 15–29 (2015).
[Crossref]

Saavedra, G.

S. Hong, A. Dorado, G. Saavedra, J. C. Barreiro, and M. Martinez-Corral, “Three-dimensional integral-imaging display from calibrated and depth-hole filtered kinect information,” J. Disp. Technol. 12, 1301–1308 (2016).
[Crossref]

Sabater, N.

N. Sabater, G. Boisson, B. Vandame, P. Kerbiriou, F. Babon, M. Hog, R. Gendrot, T. Langlois, O. Bureller, A. Schubert, and V. Allié, “Dataset and pipeline for multi-view light-field video,” in IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, (2017).

Sang, X.

Schedl, D. C.

D. C. Schedl, C. Birklbauer, and O. Bimber, “Optimized sampling for view interpolation in light fields using local dictionaries,” Comput. Vis. Image Und. 168, 93–103 (2018).
[Crossref]

Schops, T.

J. Engel, T. Schops, and D. Cremers, “Lsd-slam: Large-scale direct monocular slam,” in European Conf. Comput. Vis., (2014).

Schubert, A.

N. Sabater, G. Boisson, B. Vandame, P. Kerbiriou, F. Babon, M. Hog, R. Gendrot, T. Langlois, O. Bureller, A. Schubert, and V. Allié, “Dataset and pipeline for multi-view light-field video,” in IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, (2017).

Schulein, R.

Seitz, S. M.

S. M. Seitz and C. R. Dyer, “View morphing,” in Proc. Comput. Graph. Interactive Technol., (1996).

Shin, D.

Shum, H. Y.

H. Y. Shum, S. C. Chan, and S. B. Kang, in Image-based rendering, (Springer-Verlag, 2006).

Si, Y.

Solkine-Hornung, A.

B. Resch, H. P. A. Lensch, O. Wang, M. Pollefeys, and A. Solkine-Hornung, “Scalable structure from motion for densely sampled videos,” in IEEE Conf. Comput. Vis. Pattern Recognit., (2015).

Sorkine, O.

G. Chaurasia, O. Sorkine, and G. Drettakis, “Silhouette aware warping for image based rendering,” Comput. Graph. Forum 30, 1223–1232 (2011).
[Crossref]

Sorkine-Hornung, A.

C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. Gross, “Scene reconstruction from high spatio-angular resolution light fields,” ACM Trans. Graph. 32, 73 (2013).
[Crossref]

Sorkine-Hornung, O.

G. Chaurasia, S. Duchene, O. Sorkine-Hornung, and G. Drettakis, “Depth synthesis and local warps for plausible image-based navigation,” ACM Trans. Graph. 32, 1–12 (2013).
[Crossref]

Tao, M. W.

M. W. Tao, S. Hadap, J. Malik, and R. Ramamoorthi, “Depth from combining defocus and correspondence using light-field cameras,” in IEEE Intern. Conf. Comput. Vis., (2013).

Vandame, B.

N. Sabater, G. Boisson, B. Vandame, P. Kerbiriou, F. Babon, M. Hog, R. Gendrot, T. Langlois, O. Bureller, A. Schubert, and V. Allié, “Dataset and pipeline for multi-view light-field video,” in IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, (2017).

Vedula, S.

S. Vedula, S. Baker, and T. Kanade, “Image-based spatio-temporal modeling and view interpolation of dynamic events,” ACM Trans. Graph. 24, 240–261 (2005).
[Crossref]

Wang, J.

Wang, K.

Wang, L.

G. Wu, M. Zhao, L. Wang, Q. Dai, T. Chai, and Y. Liu, “Light field reconstruction using deep convolutional network on epi,” in IEEE Conf. Comput. Vis. Pattern Recognit., (2017).

Wang, O.

B. Resch, H. P. A. Lensch, O. Wang, M. Pollefeys, and A. Solkine-Hornung, “Scalable structure from motion for densely sampled videos,” in IEEE Conf. Comput. Vis. Pattern Recognit., (2015).

Wang, Q. H.

X. Li, Y. Wang, Q. H. Wang, Y. Liu, and X. Zhou, “Modified integral imaging reconstruction and encryption using an improved sr reconstruction algorithm,” Opt. Lasers Eng. 112, 162–169 (2019).
[Crossref]

X. Li, M. Zhao, Y. Xing, H. L. Zhang, L. Li, S. T. Kim, X. Zhou, and Q. H. Wang, “Designing optical 3d images encryption and reconstruction using monospectral synthetic aperture integral imaging,” Opt. Express 26, 11084–11099 (2018).
[Crossref] [PubMed]

Wang, S.

Wang, X.

J. Zhang, X. Wang, Q. Zhang, Y. Chen, J. Du, and Y. Liu, “Integral imaging display for natural scene based on kinectfusion,” Optik-Int. J. Light. Electron Opt. 127, 791–794 (2016).
[Crossref]

Wang, Y.

X. Li, Y. Wang, Q. H. Wang, Y. Liu, and X. Zhou, “Modified integral imaging reconstruction and encryption using an improved sr reconstruction algorithm,” Opt. Lasers Eng. 112, 162–169 (2019).
[Crossref]

Watson, E. A.

Wei, J.

J. Wei, B. Resch, and H. P. A. Lensch, “Dense and occlusion-robust multi-view stereo for unstructured videos,” in Conf. Comput. and Robot Vis., (2016).

J. Wei, B. Resch, and H. P. A. Lensch, “Multi-view depth map estimation with cross-view consistency,” in British Machine Vis. Conf., (2014).

J. Wei, B. Resch, and H. P. A. Lensch, “Dense and scalable reconstruction from unstructured videos with occlusions,” in Intern. Symp. on Vis., Modeling and Visual., (2017).

Wong, T. T.

G. Zhang, J. Jia, T. T. Wong, and H. Bao, “Consistent depth maps recovery from a video sequence,” IEEE Trans. Pattern Anal. Mach. Intell. 31, 974–988 (2009).
[Crossref] [PubMed]

Wonka, P.

S. Jeschke, D. Cline, and P. Wonka, “A gpu laplacian solver for diffusion curves and poisson image editing,” ACM Trans. Graph. 28, 116 (2009).
[Crossref]

Wu, G.

G. Wu, M. Zhao, L. Wang, Q. Dai, T. Chai, and Y. Liu, “Light field reconstruction using deep convolutional network on epi,” in IEEE Conf. Comput. Vis. Pattern Recognit., (2017).

Xiao, X.

J. Wang, X. Xiao, and B. Javidi, “Three-dimensional integral imaging with flexible sensing,” Opt. Lett. 39, 6855–6858 (2014).
[Crossref] [PubMed]

X. Xiao, M. DaneshPanah, M. Cho, and B. Javidi, “3d integral imaging using sparse sensors with unknown positions,” J. Disp. Technol. 6, 614–619 (2010).
[Crossref]

Xing, S.

Xing, Y.

Yan, B.

Yanaka, K.

K. Yanaka, “Integral photography using hexagonal fly’s eye lens and fractional view,” Proc. SPIE 6803, 68031K (2008).

Yang, S.

Yu, X.

Yuan, J.

Zhang, G.

G. Zhang, J. Jia, T. T. Wong, and H. Bao, “Consistent depth maps recovery from a video sequence,” IEEE Trans. Pattern Anal. Mach. Intell. 31, 974–988 (2009).
[Crossref] [PubMed]

Zhang, H. L.

Zhang, J.

J. Zhang, X. Wang, Q. Zhang, Y. Chen, J. Du, and Y. Liu, “Integral imaging display for natural scene based on kinectfusion,” Optik-Int. J. Light. Electron Opt. 127, 791–794 (2016).
[Crossref]

Zhang, L.

E. Penner and L. Zhang, “Soft 3d reconstruction for view synthesis,” ACM Trans. Graph. 36, 235 (2017).
[Crossref]

Zhang, M.

Y. Piao, H. Qu, M. Zhang, and M. Cho, “Three-dimensional integral imaging display system via off-axially distributed image sensing,” Opt. Lasers Eng. 85, 18–23 (2016).
[Crossref]

Zhang, Q.

J. Zhang, X. Wang, Q. Zhang, Y. Chen, J. Du, and Y. Liu, “Integral imaging display for natural scene based on kinectfusion,” Optik-Int. J. Light. Electron Opt. 127, 791–794 (2016).
[Crossref]

Zhao, M.

X. Li, M. Zhao, Y. Xing, H. L. Zhang, L. Li, S. T. Kim, X. Zhou, and Q. H. Wang, “Designing optical 3d images encryption and reconstruction using monospectral synthetic aperture integral imaging,” Opt. Express 26, 11084–11099 (2018).
[Crossref] [PubMed]

G. Wu, M. Zhao, L. Wang, Q. Dai, T. Chai, and Y. Liu, “Light field reconstruction using deep convolutional network on epi,” in IEEE Conf. Comput. Vis. Pattern Recognit., (2017).

Zhou, X.

X. Li, Y. Wang, Q. H. Wang, Y. Liu, and X. Zhou, “Modified integral imaging reconstruction and encryption using an improved sr reconstruction algorithm,” Opt. Lasers Eng. 112, 162–169 (2019).
[Crossref]

X. Li, M. Zhao, Y. Xing, H. L. Zhang, L. Li, S. T. Kim, X. Zhou, and Q. H. Wang, “Designing optical 3d images encryption and reconstruction using monospectral synthetic aperture integral imaging,” Opt. Express 26, 11084–11099 (2018).
[Crossref] [PubMed]

Zimmer, H.

C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. Gross, “Scene reconstruction from high spatio-angular resolution light fields,” ACM Trans. Graph. 32, 73 (2013).
[Crossref]


Figures (11)

Fig. 1 Principle of traditional SAII technique. A group of perspective images are recorded by a camera array, and then projected inversely through a corresponding pinhole array. The projections on each depth plane are overlapped and accumulated, generating a slice image.
Fig. 2 Pipeline of the proposed SAII approach.
Fig. 3 Depth diffusion from two pixels $p_1$ and $p_2$ to the mid-pixel $p$. Point $P_w$ represents the projected isotropic interpolant, defining the diffused depth as $d_w = \tfrac{1}{2}(d_1 + d_2)$. $P_r$ is the assumed point on the principal surface (green), and $d_r$ is the corresponding real depth. $O$ is the camera center.
Fig. 4 Edge depth map and the closest-surface depth map obtained by our method, together with the 3D points reconstructed by the isotropic and the proposed perspective diffusion methods. The floor and ceiling are enlarged for a clearer comparison. Isotropic interpolation creates distorted surfaces, whereas ours produces smooth, flat results.
Fig. 5 Comparison of the smoothest-surface, densest-surface, and optimal depth maps of the same view, together with their reconstructed 3D points. The point clouds in (d), (f), and (i) are rotated for better visualization.
Fig. 6 Trajectory of the video camera (gray) and visualization of the 15 × 15 virtual camera array (green) used for each tested scene in our experiments. All virtual cameras have the same intrinsic parameters as the real camera.
Fig. 7 Comparison of depth maps against the ground truth. We present (a) the depth maps, (b) the relative error maps, and (c) the distribution showing what fraction of the estimated depth has a relative error below a given threshold. In (b), blue pixels have no depth estimate, red pixels have a relative error larger than 0.01, green pixels have no ground-truth data, and errors between 0 and 0.01 are mapped to gray values from 0 to 255.
Fig. 8 Edge depth maps and our estimated dense depth maps for the real-world scenes.
Fig. 9 Synthesized images for a real view of the real-world scenes using (a) the smoothest-surface, (b) densest-surface, and (c) optimal depth maps. (c) improves on (a) and (b) by 1.67 dB and 1.09 dB in PSNR for Building, and by 4.08 dB and 1.81 dB for Boxes. The parts in red circles are enlarged for easier comparison. The original images are given in (d).
Fig. 10 Synthesized images for all virtual views (left) using our estimated depth maps, among which four images (highlighted in yellow) of each scene are enlarged (right).
Fig. 11 Slice images at three distances produced by our SAII approach. The focused regions labeled by red circles are enlarged for better visualization.
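
Once the virtual perspective images of Fig. 10 are available, the slice images of Fig. 11 follow the principle shown in Fig. 1: every perspective image is shifted in proportion to its camera offset and the chosen depth plane, and the shifted images are accumulated. Below is a minimal NumPy sketch under assumed conventions (integer-pixel shifts, grid coordinates centered on the array, a per-image disparity of focal × pitch / depth pixels); `slice_image` and its parameters are hypothetical, not the authors' implementation.

```python
import numpy as np

def slice_image(persp_images, pitch, focal, depth):
    """Shift-and-accumulate reconstruction of one depth plane (cf. Figs. 1 and 11).

    persp_images -- dict mapping centered grid coordinates (r, c) to H x W x 3 images
    pitch        -- camera spacing of the virtual array (same unit as depth)
    focal        -- focal length of the virtual cameras in pixels (assumption)
    depth        -- distance of the reconstructed plane (same unit as pitch)
    """
    H, W, _ = next(iter(persp_images.values())).shape
    acc = np.zeros((H, W, 3), dtype=np.float64)
    cnt = np.zeros((H, W, 1), dtype=np.float64)

    for (r, c), img in persp_images.items():
        # pixel disparity of the plane at `depth` for a camera offset by (c, r) * pitch
        dy = int(round(r * focal * pitch / depth))
        dx = int(round(c * focal * pitch / depth))
        ys, yt = (slice(dy, H), slice(0, H - dy)) if dy >= 0 else (slice(0, H + dy), slice(-dy, H))
        xs, xt = (slice(dx, W), slice(0, W - dx)) if dx >= 0 else (slice(0, W + dx), slice(-dx, W))
        acc[yt, xt] += img[ys, xs]
        cnt[yt, xt] += 1.0

    # average the overlapping projections; pixels covered by no image stay black
    return acc / np.maximum(cnt, 1.0)
```

Regions that stay in register across all shifted images appear sharp (the focused regions circled in Fig. 11), while content off the chosen plane is averaged away.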

Equations (8)

$d_x = \delta_1 \delta_2\, d_r + (1 - \delta_1 \delta_2)\bigl(\delta_1 D_t^i(p_1) + \delta_2 D_t^i(p_2)\bigr)$   (1)
$D_{t+1}^i(p) = \frac{\delta(d_x)\, d_x + \delta(d_y)\, d_y}{\sum \delta}$   (2)
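
Eqs. (1)–(2) amount to a Jacobi-style diffusion pass that fills the homogeneous areas between the fixed edge depths. The following is a minimal NumPy sketch of one such pass, not the authors' implementation: it assumes that δ(·) is a validity indicator (1 where a depth value exists, 0 otherwise), that empty pixels are stored as 0, and that the flat-surface depth d_r of the mid-pixel is the harmonic interpolant of its two neighbors (inverse depth is affine on a planar surface under perspective projection); `diffuse_depth_once` and `edge_mask` are hypothetical names.

```python
import numpy as np

def diffuse_depth_once(D, edge_mask, eps=1e-6):
    """One sketched pass of the perspective depth diffusion of Eqs. (1)-(2).

    D         -- H x W depth map; 0 marks pixels without a depth yet (assumption)
    edge_mask -- boolean H x W map of edge pixels whose depths stay fixed (assumption)
    """
    def axis_update(d1, d2):
        delta1 = (d1 > 0).astype(np.float64)
        delta2 = (d2 > 0).astype(np.float64)
        # flat-surface interpolant d_r: harmonic mean of the two neighbor depths (assumption)
        d_r = np.where(delta1 * delta2 > 0,
                       2.0 / (1.0 / (d1 + eps) + 1.0 / (d2 + eps)),
                       0.0)
        # Eq. (1): use d_r when both neighbors exist, otherwise fall back to the valid one
        d_axis = delta1 * delta2 * d_r + (1.0 - delta1 * delta2) * (delta1 * d1 + delta2 * d2)
        return d_axis, (d_axis > 0).astype(np.float64)   # delta(d_axis)

    # horizontal and vertical neighbor pairs; np.roll wraps at the image borders,
    # which a real implementation would handle explicitly
    d_x, w_x = axis_update(np.roll(D, 1, axis=1), np.roll(D, -1, axis=1))
    d_y, w_y = axis_update(np.roll(D, 1, axis=0), np.roll(D, -1, axis=0))

    # Eq. (2): validity-weighted average of the two axis estimates
    denom = w_x + w_y
    update = np.where(denom > 0, (w_x * d_x + w_y * d_y) / np.maximum(denom, eps), D)
    return np.where(edge_mask, D, update)
```

Iterating such a pass until the map stops changing would yield a complete depth map whose homogeneous regions follow the locally flat-surface assumption rather than the distorted isotropic interpolation illustrated in Fig. 4.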
$C_{\mathrm{smoothest}}^i(p) = \begin{cases} 0, & \text{if } s(p) < s_{\mathrm{left}}(p) \text{ or } s(p) < s_{\mathrm{right}}(p) \\ 1 - \exp\!\left(-\frac{\min\bigl(|s(p) - s_{\mathrm{left}}(p)|^2,\; |s(p) - s_{\mathrm{right}}(p)|^2\bigr)}{2\sigma_s^2}\right), & \text{otherwise} \end{cases}$   (3)
$(j^*, q^*) = \arg\min_{(j,q)} S^j(q) \quad \text{s.t. } P_j^i\bigl(D_{\mathrm{smoothest}}^j(q)\bigr) = p, \qquad D_{\mathrm{densest}}^i(p) = T_{j^*}^i\bigl(D_{\mathrm{smoothest}}^{j^*}(q^*)\bigr)$   (4)
$C_{\mathrm{densest}}^i(p) = C_{\mathrm{smoothest}}^{j^*}(q^*)\, c_{\mathrm{color}}(p, q^*), \qquad c_{\mathrm{color}}(p, q^*) = \exp\!\left(-\frac{\lVert I^i(p) - I^{j^*}(q^*) \rVert^2}{2\sigma_c^2}\right)$   (5)
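
The color term in Eq. (5) simply damps the propagated confidence when the matched pixels disagree in appearance. A small hedged sketch, assuming RGB vectors and a Euclidean color distance; `densest_confidence` and the value of `sigma_c` are hypothetical.

```python
import numpy as np

def densest_confidence(c_smoothest_at_q, color_i_p, color_j_q, sigma_c=10.0):
    """Eq. (5): propagate the source confidence C_smoothest^{j*}(q*) to pixel p of
    view i, damped by a Gaussian of the color difference (sketch; the color space
    and sigma_c are assumptions)."""
    diff = np.asarray(color_i_p, dtype=np.float64) - np.asarray(color_j_q, dtype=np.float64)
    c_color = np.exp(-float(diff @ diff) / (2.0 * sigma_c ** 2))
    return c_smoothest_at_q * c_color

# e.g. densest_confidence(0.8, frame_i[py, px], frame_j[qy, qx]) for RGB frames
```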
$D_{\mathrm{optimal}}^i = \arg\min_{Z}\ \lambda_{\mathrm{source}} \sum_{j} \sum_{(x,y)} W_{\mathrm{source}}^j \bigl|Z - Z_{\mathrm{source}}^j\bigr| + \lambda_{\mathrm{flat}} \sum_{(x,y)} \left( \left|\frac{\partial Z}{\partial x}\right|_{(x,y)} + \left|\frac{\partial Z}{\partial y}\right|_{(x,y)} \right) + \lambda_{\mathrm{smooth}} \sum_{(x,y)} \bigl|\Delta Z\bigr|_{(x,y)}$   (6)
$\{Z_{\mathrm{source}}^1, Z_{\mathrm{source}}^2\} = \{D_{\mathrm{smoothest}}^i, D_{\mathrm{densest}}^i\}, \qquad \{W_{\mathrm{source}}^1, W_{\mathrm{source}}^2\} = \{C_{\mathrm{smoothest}}^i, C_{\mathrm{densest}}^i\}$   (7)
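
Eqs. (6)–(7) merge the two complete depth maps of a view into one, trading fidelity to the confident sources against flatness and smoothness. The sketch below only evaluates that objective for a candidate map Z using forward differences and a 5-point Laplacian; the actual global minimization is left to a solver of choice, and the λ values, the stencils, and the name `merge_energy` are assumptions.

```python
import numpy as np

def merge_energy(Z, Z_sources, W_sources, lam_source=1.0, lam_flat=0.1, lam_smooth=0.1):
    """Evaluate the merging objective of Eqs. (6)-(7) for a candidate depth map Z.

    Z_sources -- [D_smoothest, D_densest] for the view (Eq. (7))
    W_sources -- [C_smoothest, C_densest], the per-pixel confidences (Eq. (7))
    """
    Z = np.asarray(Z, dtype=np.float64)

    # data term: confidence-weighted L1 distance to each source depth map
    data = sum(np.sum(W * np.abs(Z - Zs)) for Zs, W in zip(Z_sources, W_sources))

    # flatness term: L1 norm of the first derivatives (forward differences)
    flat = np.sum(np.abs(np.diff(Z, axis=1))) + np.sum(np.abs(np.diff(Z, axis=0)))

    # smoothness term: L1 norm of the discrete Laplacian (5-point stencil)
    lap = (-4.0 * Z[1:-1, 1:-1] + Z[:-2, 1:-1] + Z[2:, 1:-1]
           + Z[1:-1, :-2] + Z[1:-1, 2:])
    smooth = np.sum(np.abs(lap))

    return lam_source * data + lam_flat * flat + lam_smooth * smooth
```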
$(j^*, q^*) = \arg\min_{(j,q)} T_j^{\nu}\bigl(D_{\mathrm{optimal}}^j(q)\bigr) \quad \text{s.t. } P_j^{\nu}\bigl(D_{\mathrm{optimal}}^j(q)\bigr) = p, \qquad I^{\nu}(p) = I_{\mathrm{frame}}^{j^*}(q^*)$   (8)
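
Eq. (8) states that a pixel p of the virtual view ν takes the color of the closest reconstructed surface point, over all frames j and pixels q, whose projection lands on p. One way to realize this is z-buffered forward warping of every frame; the sketch below assumes pinhole cameras with world-to-camera extrinsics (x_cam = R X + t), ignores sub-pixel splatting and hole filling, and uses hypothetical names and parameters throughout.

```python
import numpy as np

def render_virtual_view(frames, depths, Ks, Rs, ts, K_v, R_v, t_v, height, width):
    """Sketch of Eq. (8): warp every real frame into the virtual camera and keep,
    per target pixel, the sample with the smallest depth (closest surface)."""
    image = np.zeros((height, width, 3), dtype=np.float64)
    zbuf = np.full((height, width), np.inf)

    for I, D, K, R, t in zip(frames, depths, Ks, Rs, ts):
        h, w = D.shape
        xs, ys = np.meshgrid(np.arange(w), np.arange(h))
        valid = D > 0
        pix = np.stack([xs[valid], ys[valid], np.ones(int(valid.sum()))])  # 3 x N homogeneous pixels
        # back-project q with its optimal depth into world space: X = R^T (d K^{-1} q - t)
        cam_pts = (np.linalg.inv(K) @ pix) * D[valid]
        world = R.T @ (cam_pts - t.reshape(3, 1))
        # transform into the virtual view nu and project: p ~ K_v (R_v X + t_v)
        cam_v = R_v @ world + t_v.reshape(3, 1)
        z = cam_v[2]
        proj = K_v @ cam_v
        u = np.round(proj[0] / z).astype(int)
        v = np.round(proj[1] / z).astype(int)
        colors = I[valid]

        inside = (z > 0) & (u >= 0) & (u < width) & (v >= 0) & (v < height)
        for ui, vi, zi, ci in zip(u[inside], v[inside], z[inside], colors[inside]):
            if zi < zbuf[vi, ui]:          # the arg min of Eq. (8): keep the closest surface
                zbuf[vi, ui] = zi
                image[vi, ui] = ci
    return image
```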
