Occlusion is one of the most important issues in light-field depth estimation. In this paper, we propose a light-field multi-occlusion model based on an analysis of light transmission. With this model, occlusions in different views are treated separately. An adaptive anti-occlusion algorithm for the central view is proposed to obtain more precise consistency regions (unoccluded views) in the angular domain, and a subpatch anti-occlusion approach for the other views is presented to optimize the initial depth maps so that depth boundaries are better preserved. We then propose a curvature-based confidence analysis that makes depth evaluation more accurate, and incorporate it into an energy model to regularize the depth maps. Experimental results demonstrate that the proposed algorithm achieves better subjective and objective quality in depth maps than state-of-the-art algorithms.
© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
With 3D and virtual reality (VR) technology growing quickly, depth estimation has become one of the most popular research topics in recent years. Because a light-field camera captures rich information from multiple viewpoints in a single shot [1–4], various approaches [5–9] (e.g. the epipolar plane image, the focal stack, and the angular patch) have been proposed for light-field depth estimation. Wanner et al. [10] utilized the structure tensor to obtain dominant directions on epipolar plane images. Tao et al. [11, 12] first combined the defocus cue and the correspondence cue, and later added the shading cue to make the depth map more precise. Jeon et al. [13] made use of the phase-shift theorem in the Fourier domain to estimate sub-pixel shifts between sub-aperture images. These algorithms fail around occlusions because angular pixels may then be observed from different objects, so the photo-consistency assumption no longer holds.
Recently, Wang et al. [14, 15] explicitly modeled occlusions by using edge orientation to separate the angular patch into two equal regions with a straight line. The method improved depth results for single-occluder situations, but a straight line cannot handle multi-occlusion situations. Zhu et al. [16] used k-means clustering to separate the patch and select unoccluded views, which improves performance in single-occlusion situations. However, the algorithm depends on the clustering quality of the existing k-means strategy and cannot handle more complex clusters. Williem et al. [17, 18] proposed a novel angular entropy and an adaptive defocus response to estimate depth, but some regions are still over-smoothed because the angular entropy is too noisy for complex details. There are also other occlusion-handling algorithms [19–21], each with its own advantages and disadvantages.
In this paper, we propose a novel algorithm to solve the multi-occlusion problem. Our depth estimation algorithm is developed from the analysis of occlusion in different views. For occlusion in the central view, an adaptive connected-domain selection algorithm is proposed to accurately separate the spatial patch. Through the light-field multi-occlusion model relating the spatial domain and the angular domain, the divided spatial patch can be mapped to the angular patch, and the consistency regions (unoccluded views) can be obtained more precisely. The first-moment and second-moment cues are then applied to the consistency region to estimate depth. For occlusion in the other views, we present a subpatch-based estimation algorithm, which obtains a more accurate initial depth than methods that ignore such occlusions. To evaluate the initial depth more precisely, we present a novel confidence analysis that, for the first time, takes curvature into account. Finally, a Markov random field (MRF) energy model is built to regularize the depth maps. As shown in Fig. 1, the proposed algorithm provides much sharper boundaries and more precise details. Experimental results demonstrate that, compared with state-of-the-art light-field algorithms, our method obtains more accurate depth. The main contributions of this paper are as follows:
- We analyze the light transmission model and derive the corresponding relationship on different occlusion situations. Then we extend it to multi-occlusion so that it can handle more complex situations.
- We take different occlusions into consideration and propose corresponding algorithms to obtain unoccluded views precisely. A novel confidence analysis is designed in energy model to regularize depth maps better.
2. Light-field multi-occlusion model
In this section, we first analyze different occlusion cases before introducing the proposed light-field multi-occlusion model. It is a common assumption that edge pixels and edges serve as candidate occlusion pixels and candidate occlusion boundaries respectively, and many algorithms start from it. However, some points near edges are not occluded in the central view but are occluded in some other views, as Fig. 2 shows. So, unlike previous work, we analyze different multi-occlusion situations from the perspective of light transmission. The analysis can also be extended to occluders of any shape and to different occlusion situations.
One occluder analysis:
For analyzing multi-occlusion better, we first take one occluder as an example, as shown in Fig. 3. A point (x0, y0, F) is located on the focal plane at depth F, and an occluder at the Z1 plane has a straight edge containing the point (X1, Y1, Z1). The point (x0, y0, F) is projected onto the Z1 plane at (X0, Y0, Z1), and the occluder point (X1, Y1, Z1) is projected onto the F plane at (x1, y1, F). Assume that the line connecting the two points is perpendicular to the edge, so its length gives the distance between the center point and the edge.
In Fig. 3(a), for the central view (u0, v0), the normal vector of the edge at the Z1 plane can be expressed as (X1 − X0, Y1 − Y0), and the actual distance Dgt is the modulus of this normal vector. When the light at the Z1 plane is projected onto the F plane, the corresponding relationships can be obtained by the projection principle. In Fig. 3(b), the main lens is refocused on the depth F. Similarly, the corresponding relationships can be obtained from the reversed light model.
Now consider the distances. First, assume that (X0, Y0) is an edge pixel in the central view, so Dgt, Dspatial, and Dangular are all zero. From the analysis above, the vector (x0 − x1, y0 − y1) of the edge in the spatial domain is consistent with the corresponding vector of the boundary between occluded and unoccluded views in the angular domain, as shown in Fig. 4(a). We call this boundary-consistency. Second, assume that (X0, Y0) is near an edge pixel in the central view. Dgt is then non-zero, so Dspatial differs from Dangular because of this distance, as shown on the right of Fig. 4(b).
We then extend the model to the multi-occlusion case, as shown in Fig. 5. Taking two occluders as an example, a point (x0, y0, F) is located on the focal plane at depth F. One occluder at the Z1 plane has a straight edge containing the point (X1, Y1, Z1), and another occluder at the Z2 plane has an oblique edge containing the point (X2, Y2, Z2). Let the point (X0, Y0) be the origin; any point (X, Y) then forms a vector (X − X0, Y − Y0) with respect to the origin.
For the central view (u0, v0), the two edges can be expressed by
As shown in Fig. 5(a), if the point is not occluded in the spatial domain, its vector must lie between the vectors of the two edges. Assuming that (X1, Y1, Z1) lies on the upper boundary and (X2, Y2, Z2) on the lower boundary, the vector of an unoccluded point must satisfy the following constraints.
Considering Fig. 5(b) in the angular domain, the main lens is refocused on depth F. Similarly, an unoccluded view (uuo, vuo) forms a corresponding vector; if the point (x0, y0, F) can be observed by that view, the vector obeys the same rule.
From the analysis above, when D1gt and D2gt are zero, Eq. (4) and Eq. (5) impose the same constraints with a one-to-one correspondence, so (u0, v0) and (x0, y0) share the same separation line. Thus, for occlusion in the central view, boundary-consistency holds for any occluders, which helps obtain the consistency region (unoccluded views) in the angular domain. When D1gt and D2gt are non-zero, the situation becomes much more complex: occluders at different depths have different offsets from the central point. Unfortunately, since the depth estimate is not yet available, the offset distance cannot be computed directly. Therefore, for occlusion in the other views, an alternative approach is designed in our algorithm to obtain the consistency region approximately.
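The sector constraint above — a point (or view) is unoccluded exactly when its vector lies between the two edge vectors — can be checked with 2D cross products. The following sketch is illustrative; the function and vector names are ours, not the paper's, and the counter-clockwise convention from the lower to the upper boundary is an assumption.

```python
def cross(a, b):
    """2D cross product a x b (scalar z-component)."""
    return a[0] * b[1] - a[1] * b[0]

def is_unoccluded(upper, lower, v):
    """Return True if vector v (relative to the central point) lies in
    the angular sector swept counter-clockwise from `lower` to `upper`,
    i.e. the corresponding point/view is blocked by neither edge.
    This mirrors the pair of constraints in Eqs. (4)-(5)."""
    return cross(lower, v) >= 0 and cross(v, upper) >= 0
```

The same test applies unchanged in the angular domain, replacing the spatial vector with the vector formed by an angular view (u − u0, v − v0), which is exactly the boundary-consistency property used later.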
3. Initial depth estimation
The angular patch of an unoccluded pixel exhibits photo-consistency over the whole patch, while the angular patch of an occluded pixel exhibits photo-consistency only over part of it. The key issue is selecting the consistency region (unoccluded views versus occluded views) in the angular patch for each occluded pixel. In this section, we present how to select the consistency region in the angular domain and how to obtain an initial depth map from the consistency region, based on the different properties of the central view and the other views.
3.1. Anti-occlusion in the central view
3.1.1. Adaptive connected-domain selection
Edge detection is applied to the central (pinhole) view image to obtain an edge map. These edge pixels and edges are considered candidate occlusion pixels and candidate occlusion boundaries respectively. For each edge pixel p we extract an edge patch centered at p, with the same size as the angular resolution of the light field. The four-connected components labeling algorithm [22, 23] is applied to the edge patch to label the spatial patch, and pixels with the same label compose a region, so the patch is divided into several regions according to the different labels.
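The labeling step can be sketched as a plain breadth-first 4-connected flood fill over the binary edge patch. This is a minimal stand-in for the cited labeling algorithms [22, 23]; the convention that edge pixels carry value 1 and remain unlabeled is our assumption.

```python
from collections import deque

def label_regions(edge_patch):
    """Four-connected components labeling of a binary edge patch.
    Non-edge pixels (value 0) are grouped into regions separated by
    edge pixels (value 1); edge pixels themselves stay unlabeled (0),
    matching the behaviour described in the text."""
    h, w = len(edge_patch), len(edge_patch[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for sy in range(h):
        for sx in range(w):
            if edge_patch[sy][sx] == 1 or labels[sy][sx]:
                continue
            current += 1                      # start a new region
            labels[sy][sx] = current
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and edge_patch[ny][nx] == 0
                            and labels[ny][nx] == 0):
                        labels[ny][nx] = current
                        queue.append((ny, nx))
    return labels
```

For a patch split by a vertical edge, the pixels on each side receive distinct labels while the edge line itself stays at 0, which is exactly the case the next paragraph handles.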
In addition, the connected components labeling algorithm cannot label the edge-line pixels themselves, so in order to label them we design a method that fuses the color distance in Eq. (6) and the spatial distance in Eq. (7), as follows.
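A possible form of this fusion is sketched below. The Euclidean distance forms and the mixing weight `lam` are assumptions on our part, since Eqs. (6) and (7) are not reproduced in the text; the idea is only that each unlabeled edge pixel takes the label of the candidate pixel closest in the fused metric.

```python
import math

def fused_distance(c1, c2, p1, p2, lam=0.5):
    """Fuse a colour distance (cf. Eq. 6) and a spatial distance
    (cf. Eq. 7).  Euclidean forms and the weight `lam` are assumed."""
    d_color = math.dist(c1, c2)
    d_space = math.dist(p1, p2)
    return lam * d_color + (1.0 - lam) * d_space

def assign_edge_pixel(colors, positions, region_labels,
                      edge_color, edge_pos, lam=0.5):
    """Give an unlabeled edge pixel the label of the candidate pixel
    minimising the fused colour + spatial distance."""
    best = min(range(len(region_labels)),
               key=lambda i: fused_distance(colors[i], edge_color,
                                            positions[i], edge_pos, lam))
    return region_labels[best]
```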
According to the boundary-consistency described in Section 2, angular patches share the same labels as edge patches. The region containing the center pixel p is selected as the consistency region Ωp. Unlike state-of-the-art algorithms, our method can divide the patch into several regions adaptively and correctly under multi-occlusion. As an example, Fig. 6 shows the processing result for a multi-occlusion point; the consistency region selected by our method is more accurate than those of other algorithms.
3.1.2. Depth estimation
We refocus the light-field data to various depths; the 4D shearing of the light field is performed as follows.
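The standard 4D shear, L_alpha(x, y, u, v) = L(x + u(1 − 1/alpha), y + v(1 − 1/alpha), u, v), can be sketched as below. The array layout `lf[u, v, y, x]`, the relative-depth parameter `alpha`, and the nearest-neighbour resampling are our assumptions; the paper's shearing equation is not reproduced in the text.

```python
import numpy as np

def shear_light_field(lf, alpha):
    """Refocus a 4D light field lf[u, v, y, x] to relative depth alpha
    using the shear L_a(x,y,u,v) = L(x + u(1-1/a), y + v(1-1/a), u, v)
    with nearest-neighbour resampling.  Angular coordinates are taken
    relative to the central view, so the central view is unchanged."""
    U, V, H, W = lf.shape
    uc, vc = (U - 1) / 2.0, (V - 1) / 2.0
    shift = 1.0 - 1.0 / alpha
    out = np.empty_like(lf)
    ys, xs = np.arange(H), np.arange(W)
    for u in range(U):
        for v in range(V):
            dy = (u - uc) * shift
            dx = (v - vc) * shift
            yy = np.clip(np.rint(ys + dy).astype(int), 0, H - 1)
            xx = np.clip(np.rint(xs + dx).astype(int), 0, W - 1)
            out[u, v] = lf[u, v][np.ix_(yy, xx)]
    return out
```

At alpha = 1 the shift vanishes and the light field is returned unchanged, which is a convenient sanity check.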
To analyze regional pixel consistency, we calculate the first and second moments over the region Ωp. For a pixel p, Ωp is selected by the adaptive connected-domain selection algorithm if p is an edge point; otherwise Ωp is the whole patch. The specific calculations are as follows. The first moment of the error is computed by
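A sketch of the two cues is given below. Taking the first moment as the mean absolute deviation of the region from the central-view pixel, the second moment as the region variance, and combining them with a weight `w` are our assumptions standing in for the paper's equations, which are not reproduced in the text.

```python
import numpy as np

def region_cost(angular_patch, region_mask, center_value, w=0.5):
    """Combine a first-moment cue (mean absolute deviation from the
    central-view pixel value) and a second-moment cue (variance) over
    the consistency region selected by `region_mask`.  A perfectly
    photo-consistent region yields zero cost at the correct depth."""
    region = angular_patch[region_mask]
    first = np.mean(np.abs(region - center_value))   # first moment
    second = np.var(region)                          # second moment
    return w * first + (1.0 - w) * second
```

Evaluating this cost across the sheared depth hypotheses and taking the minimum gives the initial depth for the pixel.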
3.2. Anti-occlusion in other views
For pixels around an occlusion edge, previous work [15, 16] simply dilated the edges by a rough size, so depth edges are blurry because of the uncertain dilation size. In our method, we design a filter to identify these points. Since the consistency regions of points occluded in other views are imprecise, these points have larger costs than other points. We therefore calculate the mean and variance of all points' costs in the scene, and use them as the parameters of a filter to identify and mark these points.
For the marked points, instead of measuring the cost over the consistency region, which is affected by occlusion in other views, we search for a subregion that avoids the influence of occluders in other views. In our method, the angular patch (9 × 9) is divided into 9 subpatches (3 × 3), and the total cost Cα(p) of each subpatch is computed. Fig. 7 shows the processing results. The initial depth map in Fig. 7(b) is obtained by Eq. (12), and areas occluded in other views (close to the edge) receive imprecise depths in the initial map. In Fig. 7(c) these areas are detected by our method, and in Fig. 7(d) most of them are corrected to accurate depths while sharper boundaries are well preserved.
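The subpatch step can be sketched as follows. Given a 9 × 9 patch of per-view costs, we split it into nine 3 × 3 blocks and keep the cheapest block total; taking the minimum rests on the assumption (ours, consistent with the text's intent) that the least-occluded subpatch best approximates the true consistency region.

```python
import numpy as np

def best_subpatch_cost(cost_patch):
    """Split a 9x9 per-view cost patch into nine 3x3 subpatches and
    return the smallest subpatch total.  reshape(3,3,3,3) splits rows
    and columns into blocks; swapaxes gives a (block_row, block_col,
    row, col) layout so each [i, j] slice is one 3x3 subpatch."""
    blocks = cost_patch.reshape(3, 3, 3, 3).swapaxes(1, 2)
    totals = blocks.sum(axis=(2, 3))   # 3x3 grid of subpatch totals
    return totals.min()
```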
4. Depth regularization
Given the initial depth obtained in Section 3, we present in this section a depth refinement approach with global regularization. More specifically, we incorporate a curvature-based confidence analysis into the data cost, which makes the depth evaluation more accurate.
4.1. Curvature based confidence analysis
To analyze the confidence, we select two pixels as an example, marked in red and yellow in the initial depth map in Fig. 8(a). Their depth-cost (D-C) curves, which plot the total cost against depth, are shown in Fig. 8(b). The D-C curve of the red point is very different from that of the yellow point: near its minimum, the red curve changes much more sharply, and its minimum value is smaller.
For each pixel, if the consistency region is precisely selected, the pixel exhibits photo-consistency, and at the depth of lowest cost the total cost is very small and sharply focused at its minimum. Such pixels therefore have higher confidence and more accurate initial depth; for example, the depth of the red point is more reliable than that of the yellow point. Based on this observation, we propose a method to estimate the confidence of the initial depth, given in Eq. (17). Finally, the confidence of pixel (x, y) is obtained. Figure 9 shows the result: Fig. 9(c) shows the difference between the ground truth and the initial depth map, where brighter pixels have larger differences, and Fig. 9(d) is the corresponding confidence map, where darker pixels have larger confidence. The confidence map is mostly consistent with the difference map.
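The intuition — a sharp, deep minimum means a reliable depth — can be sketched with a discrete curvature measure. The ratio form below is an illustrative stand-in for the paper's Eq. (17), which is not reproduced in the text; the second difference approximates the curvature at the minimum, and dividing by the minimum cost rewards small minima.

```python
import numpy as np

def curvature_confidence(costs, eps=1e-6):
    """Confidence of an initial depth from its depth-cost (D-C) curve:
    a large second difference (sharp minimum) and a small minimum cost
    both increase confidence.  `costs` is the total cost sampled over
    the candidate depths."""
    d = int(np.argmin(costs))
    d = min(max(d, 1), len(costs) - 2)      # keep both neighbours in range
    curvature = costs[d - 1] - 2.0 * costs[d] + costs[d + 1]
    return curvature / (costs[d] + eps)
```

Under this measure the sharp red-point curve of Fig. 8(b) would score far higher than the shallow yellow-point curve.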
4.2. Final depth regularization
Finally, given the initial depth and the confidence cue, we refine the result with global regularization using a data cost and a smoothness cost. More specifically, the initial depth map is regularized with a Markov random field (MRF) model, and the problem is cast as minimizing the following energy.
Based on the confidence, data cost is defined as a Gaussian function, as follows.
The smoothness cost controls the smoothness constraint between two neighboring pixels and is defined following [24].
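The overall energy can be sketched as below. The confidence-weighted Gaussian data term follows the idea in the text, but the exact Gaussian form, the absolute-difference smoothness term, and the parameters `sigma` and `lam` are our assumptions, since the paper's equations are not reproduced here; actual minimisation would be performed with graph cuts [24, 25] rather than by direct evaluation.

```python
import numpy as np

def mrf_energy(depth, init_depth, confidence, sigma=1.0, lam=0.5):
    """Evaluate E = E_data + lam * E_smooth for a candidate depth map.
    Data term: penalises deviation from the initial depth, weighted
    more strongly where the curvature confidence is high.
    Smoothness term: absolute depth differences between 4-connected
    neighbours."""
    diff = depth - init_depth
    data = np.sum(confidence * (1.0 - np.exp(-diff ** 2 / (2.0 * sigma ** 2))))
    smooth = (np.abs(np.diff(depth, axis=0)).sum()
              + np.abs(np.diff(depth, axis=1)).sum())
    return data + lam * smooth
```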
5. Experimental results
To evaluate the performance of the proposed method, we test it on both synthetic and real light-field datasets. The synthetic data come from the public 4D Light Field Dataset [27], which contains light-field images and ground truth for comparison. Using the stratified, test, and training images of the 4D Light Field Dataset, we compare all results with recent algorithms (including [14,15] and [17,18]) in various aspects. Details of the resulting disparity maps of these existing methods can be found in the benchmark [27]. The real light-field database was created by Stanford University [31]. In the experiments, the depth range of all methods is set to [0,100] for fair comparison, and the disparity range for each image is the one provided by the synthetic dataset, or [−1,1] for the real dataset. Our algorithm is implemented in MATLAB and VS2015 on a PC with a 3.2 GHz CPU.
5.1. Evaluation on synthetic datasets
Table 1 shows the averaged evaluation metrics of the general, stratified, and photorealistic performance evaluations on the synthetic images from the 4D Light Field Dataset [27]. Lower scores indicate better performance for all metrics. Our method outperforms these state-of-the-art algorithms in many respects and shows better overall performance.
Among these metrics, mean squared error (MSE) is the most informative. For a closer evaluation, Table 2 reports per-scene results for each synthetic dataset, where our algorithm achieves the lowest scores among all algorithms and gives more reliable depth results. In addition, the proposed algorithm consumes less time than most of these methods.
Figure 10 shows detailed results on the dino scene of the 4D Light Field Dataset. As shown in the figures, especially for the tiny hole highlighted with red boxes, only our method obtains an accurate result in this complex occlusion case, while the other algorithms produce blur. As for the blue and green boxes covering object edges, several of the compared methods over-smooth and miss them; others lose line boundaries, producing many blurs on object boundaries because of pixels occluded in other views; one obtains rough boundaries and misses complex occlusions. Our results give sharper boundaries and better details.
Figure 11 shows results on the 4D Light Field Dataset [27]. For the net in the first row, the proposed method not only provides good detail but also avoids being over-sharp. The region around the statue in the second row is occluded in other views: some of the compared methods simply smooth it without handling the occlusion explicitly, others are blurry in these parts, and methods that adopt a coarse strategy produce many blurs and incorrect depths on the statue boundaries; the best of them still over-smooths some parts. Our algorithm handles the occlusion, so the boundary around the statue is clear and precise. For the last two rows, our results also show better performance. In short, our algorithm outperforms the others on multi-occluder scenes, complex details, and occlusions in other views, from both the objective and the subjective perspective.
5.2. Evaluation on real datasets
Figure 12 compares results on the real-scene database created by Stanford University; these images were captured with a Lytro Illum camera [31]. Our results still preserve scene details well and avoid being over-sharp, consistent with the results on the synthetic datasets. Only our method preserves the structure of the near plant and the net (first row) and the precise details of the wooden net (third row), and only our method captures small objects; for example, the little plant in the bottom-right corner is well recovered (second row). Moreover, for complex scenes (fourth row), our method gives a more detailed depth map than the other methods, reproduces the thin structure of the branch without burrs, and keeps the details without expanding the boundaries (final row).
In this paper, a depth estimation algorithm is proposed that is robust not only in single-occluder scenes but also in multi-occluder scenarios. We build a light-field model of the multi-occlusion situation and prove that boundaries between occluded and unoccluded views in the angular domain correspond to edges of the occluders in the spatial domain. Based on this fact, an adaptive connected-domain selection algorithm is proposed to obtain more accurate consistency regions in the angular domain for occlusion in the central view. Considering occlusion in different views, we develop a subpatch approach for anti-occlusion in the other views to keep boundaries sharp. A novel confidence analysis that considers the curvature factor is proposed to obtain more precise confidence values for better depth evaluation. The final depth is optimized with an MRF framework that fuses the confidence analysis and the initial depth map. Our algorithm outperforms other algorithms on synthetic datasets and real-world scenes, and can be used in a range of applications such as 3D reconstruction, VR, and AR.
National Natural Science Foundation of China (61871437, 61702384).
1. T. Georgiev, Z. Yu, and A. Lumsdaine, “Lytro camera technology: theory, algorithms, performance analysis,” Int. Soc. Opt. Eng. 8667, 1–10 (2013).
2. R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan, “Light field photography with a hand-held plenoptic camera,” Comput. Sci. Tech. Rep. 2, 1–11 (2005).
3. M. Harris, “Focusing on everything,” IEEE Spectr. 49(5), 44–50 (2012). [CrossRef]
6. Y. Qin, X. Jin, Y. Chen, and Q. Dai, “Enhanced depth estimation for hand-held light field cameras,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, (IEEE, 2017), pp. 2032–2036.
7. C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. Gross, “Scene reconstruction from high spatio-angular resolution light fields,” ACM Trans. Graph. 32, 1–12 (2013). [CrossRef]
10. S. Wanner and B. Goldluecke, “Globally consistent depth labeling of 4d light fields,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2012), pp. 41–48.
11. M. W. Tao, S. Hadap, J. Malik, and R. Ramamoorthi, “Depth from combining defocus and correspondence using light-field cameras,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2013), pp. 673–680.
12. M. W. Tao, P. P. Srinivasan, J. Malik, and R. Ramamoorthi, “Depth from shading, defocus, and correspondence using light-field angular coherence,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2015), pp. 1940–1948.
13. H.-G. Jeon, J. Park, G. Choe, J. Park, Y. Bok, Y.-W. Tai, and I. S. Kweon, “Accurate depth map estimation from a lenslet light field camera,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2015), pp. 1547–1555.
14. T. C. Wang, A. A. Efros, and R. Ramamoorthi, “Occlusion-aware depth estimation using light-field cameras,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2015), pp. 3487–3495.
15. T. C. Wang, A. A. Efros, and R. Ramamoorthi, “Depth estimation with occlusion modeling using light-field cameras,” IEEE Trans. Pattern Anal. Mach. Intell. 38, 2170–2181 (2016). [CrossRef] [PubMed]
16. H. Zhu, Q. Wang, and J. Yu, “Occlusion-model guided anti-occlusion depth estimation in light field,” IEEE J. Sel. Top. Signal Process. 11, 965–978 (2017). [CrossRef]
17. W. Williem and I. K. Park, “Robust light field depth estimation for noisy scene with occlusion,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2016), pp. 4396–4404.
18. W. Williem, I. K. Park, and K. M. Lee, “Robust light field depth estimation using occlusion-noise aware data costs,” IEEE Trans. Pattern Anal. Mach. Intell. 40, 2484–2497 (2018). [CrossRef]
19. T. Ryu, B. Lee, and S. Lee, “Mutual constraint using partial occlusion artifact removal for computational integral imaging reconstruction,” Appl. Opt. 54, 4147–4153 (2015). [CrossRef]
21. S. Xie, P. Wang, X. Sang, Z. Chen, N. Guo, B. Yan, K. Wang, and C. Yu, “Profile preferentially partial occlusion removal for three-dimensional integral imaging,” Opt. Express 24, 23519–23530 (2016). [CrossRef] [PubMed]
22. A. L. Dulmage and N. S. Mendelsohn, “Coverings of bipartite graphs,” Can. J. Math. 10, 516–534 (1958). [CrossRef]
23. A. Pothen and C.-J. Fan, “Computing the block triangular form of a sparse matrix,” ACM Trans. Math. Softw. 16, 303–324 (1990). [CrossRef]
24. Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” IEEE Trans. Pattern Anal. Mach. Intell. 26, 1124–1137 (2004). [CrossRef]
25. Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell. 23, 1222–1239 (2001). [CrossRef]
27. K. Honauer, O. Johannsen, D. Kondermann, and B. Goldluecke, “A dataset and evaluation methodology for depth estimation on 4d light fields,” in Proceedings of Asian Conference on Computer Vision, (Springer, 2016), pp. 19–34.
28. O. Johannsen, A. Sulc, and B. Goldluecke, “What sparse light field coding reveals about scene structure,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2016), pp. 3262–3270.
29. L. Si and Q. Wang, “Dense depth-map estimation and geometry inference from light fields via global optimization,” in Proceedings of Asian Conference on Computer Vision, (Springer, 2016), pp. 83–98.
30. S. Zhang, H. Sheng, C. Li, J. Zhang, and X. Zhang, “Robust depth estimation for light field via spinning parallelogram operator,” Comput. Vis. Image Underst. 145, 148–159 (2016). [CrossRef]
31. A. S. Raj, M. Lowney, and R. Shah, “Light-field database creation and depth estimation,” Tech. Rep., Department of Computer Science, Stanford University (2016).