Passive light field imaging generally uses depth cues that depend on the image structure to perform depth estimation, causing robustness and accuracy problems in complex scenes. In this study, the commonly used depth cues, defocus and correspondence, were analyzed by using phase encoding instead of the image structure. The defocus cue obtained by spatial variance is insensitive to the global spatial monotonicity of the phase-encoded field. In contrast, the correspondence cue is sensitive to the angular variance of the phase-encoded field, and the correspondence responses across the depth range have single-peak distributions. Based on this analysis, a novel active light field depth estimation method is proposed by directly using the correspondence cue in the structured light field to search for non-ambiguous depths, and thus no optimization is required. Furthermore, the angular variance can be weighted to reduce the depth estimation uncertainty according to the phase encoding information. The depth estimation of an experimental scene with rich colors demonstrated that the proposed method could distinguish different depth regions in each color segment more clearly, and was substantially improved in terms of phase consistency compared to the passive method, thus verifying its robustness and accuracy.
© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
Light field imaging is an advanced technology for detecting four-dimensional (4D) optical signals that simultaneously contain position and direction information of light rays [1–3]. Various techniques and devices, for example, camera arrays [4] and plenoptic cameras with coded apertures [5], built-in masks [6], or microlens arrays [7], have been used to detect light fields. By using computational approaches to post-process the recorded light field data, numerous imaging applications, such as multiview imaging, digital refocusing, and full focus imaging, can be implemented. In addition, the strong coupling of the angular information with the scene depth information enables depth sensing and 3D reconstruction.
Disparity and blur are commonly used for light field depth estimation [8–15]. The former extracts at least two images from various viewpoints from the light field data and then calculates disparity maps. The latter uses the light field data to focus on different depths and obtain focal stacks, and then estimates the blurring kernel (similar to depth from defocus) or the focusing degree (similar to depth from focus). The disparity and blur in a light field can be regarded as two manifestations of scene depth variation. Recently, Tao et al. combined correspondence and defocus cues obtained by computing a light field epipolar image along different directions to estimate the scene depth [10]. Hahne et al. traced and intersected a pair of light rays to predict the distance to a refocused object plane. Chen et al. derived a geometric optical model based on on-axis point light sources to measure the distances of object planes. These techniques, which depend on the image structure, such as color, texture, and shading, to provide matching features for depth cues or ray tracing models, can be classified as passive methods. However, they may suffer from lack of robustness and accuracy in complex scenes with occlusion, discontinuous depth, repeating texture, and diverse illumination. Recently, Tao et al. presented an active depth estimation method to analyze the relationship between the defocus degree and the modulation of the captured fringe images for a camera array.
In this study, the method by Tao et al. [10] was used to analyze both the defocus and correspondence depth cues from a different perspective by means of structured illumination for plenoptic cameras. In a recorded light field under structured illumination [17,18], i.e., structured light field (SLF), light rays contain additional information on phase encoding. Thus, a phase-encoded field (PEF) can be retrieved to construct matching features independent of the image structure. Through experimental analysis, it was demonstrated that the defocus depth cue obtained by spatial variance is insensitive to the global spatial monotonicity of the PEF. In contrast, the correspondence depth cue becomes sensitive to the angular variance of the PEF. Accordingly, a novel active light field depth estimation method was proposed by using only the correspondence depth cue. The angular variance of the PEF across the depth range exhibits a single-peak distributional trend that can be used to search for a non-ambiguous depth, so that the global propagation for optimal depth estimation is no longer required. Furthermore, the angular variance of the PEF was weighted according to the phase encoding information to reduce the uncertainty of the depth estimation. As a result, the proposed method achieved robust and accurate light field depth estimation.
2.1. Light field representation
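The term light field describes the radiometric properties of light rays in space [19]. In 1991, Adelson and Bergen introduced a seven-dimensional plenoptic function to represent spatial light rays [20]. In 1996, Levoy and Hanrahan [21] and Gortler et al. [22] reduced the dimension of the plenoptic function to 4D light fields. Using the intersection of a ray with two parallel planes, a 4D light field can be parameterized as $L(\mathbf{s}, \mathbf{u})$, where $L$ denotes the recorded radiance intensity, and $\mathbf{s}$ and $\mathbf{u}$ denote spatial and angular coordinates, respectively. The recorded light field can be digitally resampled on a desired image plane by performing a 4D shear

$$L_\alpha(\mathbf{s}, \mathbf{u}) = L\left(\mathbf{s} + \mathbf{u}\left(1 - \frac{1}{\alpha}\right), \mathbf{u}\right), \qquad (1)$$

where $\alpha$ is the relative depth of the desired image plane; for $\alpha = 1$ the light field is unchanged.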
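As an illustration of the resampling step, the following Python sketch shears a two-dimensional slice of a light field (one angular and one spatial dimension). It assumes the shear form $L_\alpha(\mathbf{s}, \mathbf{u}) = L(\mathbf{s} + \mathbf{u}(1 - 1/\alpha), \mathbf{u})$ used by Tao et al. [10]. The function name `shear_slice`, the array layout `lf[u, s]`, and the use of linear interpolation are choices made for this example only and are not taken from the original implementation.

```python
import numpy as np

def shear_slice(lf, u_coords, alpha):
    """Shear a 2D light field slice lf[u, s] for a candidate depth alpha.

    Assumed shear: L_alpha(s, u) = L(s + u * (1 - 1/alpha), u).
    Linear interpolation is used along s; samples that fall outside
    the slice take the nearest edge value (np.interp behaviour).
    """
    lf = np.asarray(lf, dtype=float)
    s = np.arange(lf.shape[1], dtype=float)
    shift_per_u = 1.0 - 1.0 / alpha
    sheared = np.empty_like(lf)
    for i, u in enumerate(u_coords):
        # Each angular sample is resampled with its own spatial shift.
        sheared[i] = np.interp(s + u * shift_per_u, s, lf[i])
    return sheared
```

For a full 4D light field the same shift would be applied along both spatial axes, for example with bilinear interpolation.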
2.2. Passive light field depth estimation
The passive methods use matching features provided by the image structure to obtain depth cues for light field depth estimation. In this section, the method by Tao et al. is used to briefly introduce two generally used depth cues: defocus and correspondence, which will be further analyzed from a different perspective of phase encoding.
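A recorded light field contains sufficient angular information to perform refocusing. By shearing and then integrating the light field across the angular dimensions, a refocused image is obtained as

$$\bar{L}_\alpha(\mathbf{s}) = \frac{1}{N_{\mathbf{u}}} \sum_{\mathbf{u}} L_\alpha(\mathbf{s}, \mathbf{u}), \qquad (2)$$

where $L_\alpha$ is the light field sheared to the depth $\alpha$ and $N_{\mathbf{u}}$ is the number of angular samples. The defocus response is then computed from the spatial variation of the refocused image within a local spatial window.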
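Furthermore, the angular information can be used to render perspective images from different viewpoints. There are correspondences among the multiview images, which are present in the angular variance of the sheared light field. The angular variance relative to the refocused image is

$$\sigma_\alpha^2(\mathbf{s}) = \frac{1}{N_{\mathbf{u}}} \sum_{\mathbf{u}} \left[ L_\alpha(\mathbf{s}, \mathbf{u}) - \bar{L}_\alpha(\mathbf{s}) \right]^2, \qquad (4)$$

where $L_\alpha$ is the sheared light field, $\bar{L}_\alpha$ is the refocused image, and $N_{\mathbf{u}}$ is the number of angular samples. The correspondence response is then computed from this angular variance.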
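The two cues can be sketched in Python for an already sheared two-dimensional slice `sheared[u, s]`. This is a minimal sketch under stated assumptions: the defocus response is taken as a windowed Laplacian-style contrast of the refocused signal, in the spirit of Tao et al. [10], and the correspondence response as a windowed mean of the angular variance. The function names, the window handling, and the edge padding are hypothetical choices for this example.

```python
import numpy as np

def refocus(sheared):
    """Refocused signal: mean of the sheared slice over the angular axis."""
    return np.asarray(sheared, dtype=float).mean(axis=0)

def angular_variance(sheared):
    """Variance over the angular axis relative to the refocused signal."""
    sheared = np.asarray(sheared, dtype=float)
    return ((sheared - refocus(sheared)) ** 2).mean(axis=0)

def box_mean(x, win=3):
    """Moving average over a spatial window with edge padding."""
    padded = np.pad(x, win // 2, mode="edge")
    return np.convolve(padded, np.ones(win) / win, mode="valid")

def defocus_response(sheared, win=3):
    """Assumed defocus measure: windowed mean of |second difference|."""
    padded = np.pad(refocus(sheared), 1, mode="edge")
    laplacian = np.abs(np.convolve(padded, [1.0, -2.0, 1.0], mode="valid"))
    return box_mean(laplacian, win)

def correspondence_response(sheared, win=3):
    """Assumed correspondence measure: windowed mean angular variance."""
    return box_mean(angular_variance(sheared), win)
```

With these definitions, a well-focused textured region gives a large defocus response and a small angular variance, which is the behaviour the passive method relies on.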
The defocus and correspondence cues may not be used to search for a non-ambiguous depth owing to the influence of the image structure. As mentioned in [10], the defocus cue performs better in repeating texture and noise, whereas the correspondence cue is robust in bright points and features. The two types of depth cues as well as their confidences are combined to perform a global propagation for optimal depth estimation. However, the optimal estimation still cannot ensure correct depths when the scene is complex with occlusion, discontinuous depth, repeating texture, and diverse illumination. To this end, in this study, phase encoding instead of the image structure is used to construct matching features for robust and accurate light field depth estimation.
Structured illumination is introduced into light field imaging to record an SLF. In the SLF, each light ray carries additional phase information that can be extracted by fringe analysis [23,24]. The extracted phase reflects a spatial distribution of the encoding information in the SLF, which is called the PEF. The PEF can be used to construct matching features for depth cues. In this section, the defocus and correspondence cues in the PEF are analyzed, and a novel active light field depth estimation method is accordingly proposed.
To demonstrate and analyze the defocus and correspondence cues in the PEF, an experimental setup consisting of a plenoptic camera (first-generation Lytro camera) and a DLP projector (Acer K132) was established. The resolution of the projector is 1280 × 800 pixels. The recorded light field images can be decoded with spatial and angular resolutions of 378 × 379 pixels and 11 × 11 pixels, respectively. Because depth estimation is performed by using the depth cues associated with the light field structure, the projector and plenoptic camera do not need to be calibrated. Two scrambled magic cubes with rich colors on each side were used as an experimental scene. Figure 1(a) shows a light field image of the scene under uniform illumination. An image part marked by a red box is enlarged and shown on the upper right of the figure to demonstrate the light field image structure arranged by multiple microlens subimages. The color of the microlens subimages in any grid of the magic cubes is uniform.
In the experiment, vertical phase-shifting fringe patterns were projected onto the measured scene. Figure 1(b) shows a light field image of the scene under structured illumination. The PEF was retrieved from the recorded light field images by using the phase-shifting algorithm, as shown in Fig. 1(c). From the enlarged part on the upper right of Fig. 1(c), it can be observed that the phase changes not only with the positions of the microlenses but also with the directions inside each microlens. Specifically, the phase is globally monotonic in terms of the spatial coordinates of the microlenses and locally monotonic in terms of the angular coordinates inside each microlens. In other words, the PEF is globally monotonic in the spatial dimensions and locally monotonic in the angular dimensions. Compared with the image structure under uniform illumination, the phase encoding in structured illumination can be used to construct more robust and accurate matching features for depth cues.
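The phase retrieval step can be illustrated with the standard N-step phase-shifting formula. The sketch below assumes N equally spaced phase shifts with intensities of the form $I_n = A + B\cos(\phi + 2\pi n/N)$; the number of steps and any phase unwrapping used in the experiment are not stated here, so the function name `wrapped_phase` and its interface are hypothetical. It returns the wrapped phase only, together with the fringe modulation $B$.

```python
import numpy as np

def wrapped_phase(images):
    """Wrapped phase and modulation from N >= 3 phase-shifted images.

    images: array of shape (N, ...) where image n is assumed to follow
    I_n = A + B * cos(phi + 2*pi*n/N).
    """
    images = np.asarray(images, dtype=float)
    n = images.shape[0]
    delta = 2.0 * np.pi * np.arange(n) / n
    shape = (n,) + (1,) * (images.ndim - 1)
    sin_sum = (images * np.sin(delta).reshape(shape)).sum(axis=0)
    cos_sum = (images * np.cos(delta).reshape(shape)).sum(axis=0)
    # sin_sum = -(N/2) * B * sin(phi), cos_sum = (N/2) * B * cos(phi)
    phase = np.arctan2(-sin_sum, cos_sum)          # in (-pi, pi]
    modulation = 2.0 / n * np.hypot(sin_sum, cos_sum)
    return phase, modulation
```

Applied to every pixel of the raw light field images, this yields one phase value per ray. Pixels with low modulation carry unreliable phase, which matters for dark regions of a scene.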
3.1. Defocus cue in PEF
Because the PEF shares the same light field structure as the recorded light field, Eqs. (1)–(5) can also be applied to the PEF by replacing the radiance with the phase. To explore the defocus cue in the PEF, three phase maps refocused on image planes at three different depths were obtained, as shown in Figs. 2(a)–2(c), respectively. After normalization, the refocused phase maps were similar even though the corresponding depths were quite different. Figure 2(d) shows the absolute difference between Figs. 2(a) and 2(b), and its histogram is shown in Fig. 2(e). The difference between the two refocused phase maps in the object area is close to zero. This implies that the defocus cue in the PEF cannot sense depth variance.
The detailed computation of the defocus response is as follows. Figure 3 shows the sheared PEF at three adjacent spatial coordinates. It can be observed that the distribution of the sheared PEF is monotonic in spatial dimensions, regardless of the depth. The spatial window size was set as 3 × 3 pixels in the experiment. Then the defocus response in the sheared PEF was computed, and the relevant data are listed in Table 1, which shows that the spatial variances of the sheared PEF at different depths are all quite small. This implies that the defocus cue is insensitive to the global spatial monotonicity of the PEF. In addition, the distributions of the normalized defocus response across the depth range are plotted in Fig. 4(a). Although the spatial coordinates are adjacent, their distribution curves are quite different, resulting in significantly discrepant estimated depths, as indicated by the arrows in Fig. 4(a). Consequently, in the PEF, the defocus cue is not suitable for light field depth estimation.
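The insensitivity of the defocus cue can be reproduced on an idealized example. The sketch below builds a synthetic, noise-free PEF slice whose phase is a linear ramp in space, refocuses it at many candidate depths, and evaluates a Laplacian-style contrast. The shear form, the contrast measure, the ramp slope of 0.3, and the true shear of 0.2 are all assumptions of this example, not values from the experiment.

```python
import numpy as np

def shear_slice(pef, u_coords, alpha):
    """Assumed shear: phi_alpha(s, u) = phi(s + u * (1 - 1/alpha), u)."""
    s = np.arange(pef.shape[1], dtype=float)
    return np.stack([np.interp(s + u * (1.0 - 1.0 / alpha), s, row)
                     for u, row in zip(u_coords, pef)])

def defocus_curve(pef, u_coords, alphas, margin=3):
    """Mean |second difference| of the refocused phase for each depth."""
    curve = []
    for alpha in alphas:
        refocused = shear_slice(pef, u_coords, alpha).mean(axis=0)
        contrast = np.abs(np.diff(refocused, n=2))
        curve.append(contrast[margin:-margin].mean())  # skip clamped borders
    return np.array(curve)

# Synthetic PEF: phase is linear in s and shifted linearly with u.
u = np.arange(-5, 6, dtype=float)
s = np.arange(60, dtype=float)
pef = 0.3 * (s[None, :] - 0.2 * u[:, None])
curve = defocus_curve(pef, u, np.linspace(0.8, 1.6, 41))
```

In this idealized case `curve` is numerically zero for every candidate depth, because averaging a monotonic ramp over symmetric angular shifts returns the same ramp. In measured data the residual response is dominated by noise, which is consistent with the discrepant curves at adjacent coordinates.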
3.2. Correspondence cue in PEF
In Fig. 3, it can be seen that the sheared PEF changes along the angular dimension and that the angular variance varies markedly with depth. This implies that the PEF contains rich correspondence cues for light field depth estimation. To illustrate this further, two sub-aperture phase maps with different angular coordinates were extracted from the PEF, as shown in Figs. 5(a) and 5(b), respectively. Two segments related to the same cross section of the two sub-aperture phase maps, marked by red and blue lines in Figs. 5(a) and 5(b), respectively, are plotted in Fig. 5(c), where the enlarged diagram clearly shows that the two segments have an obvious separation. This is due to the disparity between the two viewpoints associated with the two different angular coordinates.
The correspondence response in the PEF was computed, and Fig. 4(b) shows the distributions of the normalized correspondence response across the depth range at the three adjacent spatial coordinates. All the distributions contain a single peak value, and the estimated depths at the three adjacent spatial coordinates are close, as shown by the arrows in the enlarged diagram in Fig. 4(b). This indicates that the correspondence cue is more sensitive to the angular variance of the PEF and thus is suitable for light field depth estimation. However, the estimated depth at one of the three coordinates, which should lie between those at the other two, was slightly inconsistent with the actual scene. The uncertainty of depth estimation using the correspondence cue in the PEF is non-negligible and will be discussed in the next section.
3.3. Active light field depth estimation
According to the above analysis, the defocus cue in the PEF is less sensitive to the spatial variance, and this may lead to depth estimation failure; by contrast, the correspondence cue is more sensitive to the angular variance. Here, a novel active light field depth estimation method is proposed by using only the correspondence cue in the PEF. Furthermore, as the correspondence response across the depth range exhibits an essentially single-peak distribution (and hence a non-ambiguous depth may be obtained), the proposed method no longer requires estimated depth optimization.
The proposed method involves several steps, as summarized by the flow chart in Fig. 6. First, an SLF under structured illumination is recorded to retrieve a PEF. Subsequently, the correspondence response in the PEF across the depth range is computed. Finally, the depths are obtained from the correspondence response.
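The last two steps of this procedure can be sketched as a direct search over candidate depths. The example assumes that a PEF slice `pef[u, s]` has already been retrieved, that the shear has the form $\mathbf{s} + \mathbf{u}(1 - 1/\alpha)$, and that the best depth is the one minimizing the angular variance, which is taken here to coincide with the single peak of the normalized response. The function names and the pixel-wise search without spatial windowing are simplifications for illustration.

```python
import numpy as np

def shear_slice(pef, u_coords, alpha):
    """Assumed shear: phi_alpha(s, u) = phi(s + u * (1 - 1/alpha), u)."""
    s = np.arange(pef.shape[1], dtype=float)
    return np.stack([np.interp(s + u * (1.0 - 1.0 / alpha), s, row)
                     for u, row in zip(u_coords, pef)])

def estimate_depth(pef, u_coords, alphas):
    """Pick, per spatial sample, the depth with minimal angular variance.

    Returns the depth estimate of shape (S,) and the full cost volume
    of shape (len(alphas), S). No global optimization is applied.
    """
    alphas = np.asarray(alphas, dtype=float)
    cost = np.stack([shear_slice(pef, u_coords, a).var(axis=0)
                     for a in alphas])
    return alphas[cost.argmin(axis=0)], cost
```

Because each spatial sample is searched independently, the result relies entirely on the cost curve having a single extremum, which is the property reported for the PEF.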
Thereby, active light field depth estimation can be achieved, as demonstrated by the result in Fig. 4(b). However, it should be noted that although the correspondence response in the PEF has a peak value, the curvature of the area near the peak value is relatively small, which may lead to large uncertainty in depth estimation. The maximum difference of the estimated depths among the three adjacent spatial coordinates is 0.0070, which is large compared with the depth range.
To address this problem, the local angular monotonicity of the PEF was used to weight the angular variance. It can be observed from Fig. 3 that, inside a microlens, the phases at other angular coordinates deviate from that at the central angular coordinate when the PEF is sheared with an estimated depth far from the correct one. The phase varies horizontally because the projected fringes were vertical in the experiment, and the phase difference increases at larger horizontal angular coordinates. Thus, in the PEF, the horizontal angular coordinate can be used to weight the angular variance relative to the refocused phase map, which gives the weighted angular variance of Eq. (6).
Using Eqs. (6) and (5), the correspondence response based on the weighted angular variance can be computed. Figure 7 shows a comparison of the normalized correspondence responses based on the weighted and unweighted angular variances, both at the same spatial coordinate. The curvatures of the two distributions are different. The depth range corresponding to 1/1000 of the correspondence response is approximately regarded as the uncertainty of depth estimation, as demonstrated in the enlarged diagram on the right side of Fig. 7. The uncertainty of depth estimation with the weighted angular variance is 0.0079, whereas that with the unweighted angular variance is 0.0208. This indicates that depth estimation with the weighted angular variance has better performance than that with the unweighted angular variance. Therefore, the weighted angular variance is adopted to compute the correspondence response in the proposed method, as shown by the dotted line box in Fig. 7.
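One possible form of such a weighting is sketched below. The weight is assumed to be the absolute value of the horizontal angular coordinate, normalized by the sum of the weights; this specific choice is hypothetical and is only meant to show how angular samples far from the center can be emphasized. The input `sheared[u, s]` is a PEF slice that has already been sheared to a candidate depth.

```python
import numpy as np

def weighted_angular_variance(sheared, u_coords, eps=1e-12):
    """Angular variance with a hypothetical weight w(u) = |u|.

    Deviations are measured relative to the refocused phase (the plain
    mean over u); samples with larger |u| contribute more.
    """
    sheared = np.asarray(sheared, dtype=float)
    weights = np.abs(np.asarray(u_coords, dtype=float))
    refocused = sheared.mean(axis=0)
    squared_dev = (sheared - refocused) ** 2
    return (weights[:, None] * squared_dev).sum(axis=0) / (weights.sum() + eps)
```

With this weight the central view contributes nothing, and the outer views, where the phase deviation at a wrong depth is largest, dominate the cost.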
4.1. Passive light field depth estimation
For comparison, the passive method described in Section 2 was used to compute the defocus and correspondence responses, with the light field image under uniform illumination (Fig. 1(a)) as an input. The distributions of the normalized defocus and correspondence responses across the depth range at the three adjacent spatial coordinates are shown in Figs. 4(c) and 4(d), respectively. At two of the three coordinates, the depths obtained from each response are the same, but the results of the defocus response are quite different from those of the correspondence response. In addition, the depths at the third coordinate obtained from the two responses are both far from those at the other two. The distribution curves in Figs. 4(c) and 4(d) have peak values near a depth of 1. When the depth is close to 1, the shear becomes negligible and the sheared light field is approximately the recorded one. In this situation, the peak values near a depth of 1 may yield ambiguous results, as demonstrated by the green arrow in Fig. 4(c). By comparing Figs. 4(a) and 4(b) with 4(c) and 4(d), respectively, one can see that the active method can suppress the peak values near a depth of 1, although the defocus response in Fig. 4(a) is not suitable for light field depth estimation.
The depth maps obtained by using the defocus and correspondence cues in the passive method are shown in Figs. 8(a) and 8(b), respectively. It can be seen that the depth map related to the correspondence cue exhibits random distribution, not only in the background but also in the object areas. This is because the correspondence cue is sensitive to noise in the background and repeating textures on the surface of the measured objects. By contrast, the defocus cue can suppress noise; however, the estimated depths in the background and the object areas are essentially the same and cannot be distinguished. As the color in any grid of the magic cubes is uniform, these object areas may be regarded as the background, resulting in incorrect estimated depths.
As discussed above, the passive method may result in depth estimation inconsistencies between not only depth cues but also adjacent spatial coordinates. Accordingly, propagation of the depth estimation is required (see [10] for the detailed procedure). Figure 8(c) shows the optimization result by combining the two types of depth cues. The cross sections of the depth maps associated with Figs. 8(a)–8(c) are shown in Fig. 8(d). The optimized depth map appears better than the two others; however, it still cannot reflect the depth variation of the measured scene. The passive method, which depends on the image structure, may suffer from robustness and accuracy problems, although it combines different depth cues.
4.2. Active light field depth estimation
As a comparison with the result of passive light field depth estimation, the light field image under structured illumination (Fig. 1(b)), i.e., the SLF, was taken as an input to perform light field depth estimation again. Since fringe projection introduced fringe-pattern texture to the object areas, the depth maps estimated from the two depth cues, as shown in Figs. 9(a) and 9(b), were better than those shown in Figs. 8(a) and 8(b), especially for the depth map related to the correspondence cue. However, the estimated depths were still somewhat random, as shown by the blue-marked points in Fig. 9(d). A global propagation was thus performed to obtain an optimized depth map, as shown in Fig. 9(c). The optimized depth map can reflect the depth variation of the measured scene; however, it was quite smoothed, especially at edges and depth discontinuities.
Then, the proposed method was used to estimate the scene depth, using the correspondence cue with weighted angular variance. Figure 10 shows the distributions of the normalized correspondence response across the depth range at the three adjacent spatial coordinates. The depth search result indicates that the three adjacent spatial points have the same depth. This is more consistent than the result obtained with the unweighted angular variance. Finally, the scene depth was estimated, as shown in Fig. 11(a). The depths in the object areas encoded by the phase information could be correctly obtained, although the estimated depths in the black background were still randomly distributed owing to weak phase modulation. A comparison of the proposed method and the passive method is shown in Fig. 11(b) with the cross sections of the depth maps associated with Figs. 11(a), 9(c), and 8(c). The depth map obtained by active light field depth estimation reflects the spatial structure of the measured scene more clearly than the other two.
4.3. Comparison and analysis
The difference between the active and passive methods was further analyzed. The measured scene was divided into different segments of the same color, as shown in Fig. 12(a). Furthermore, the depth range was divided into four regions, as indicated by the sequence number in Figs. 11(b) and 12(a). If the depth map is correctly estimated, it can clearly distinguish the depth regions in each color segment. The histograms of the depth map of each color segment, corresponding to the active and passive methods, are shown in Figs. 12(b) and 12(c), respectively. It can be observed that the active method using phase encoding could estimate the correct depth map and clearly distinguished different depth regions in each color segment, whereas the passive method failed.
Consequently, the difference between the active and passive light field depth estimation can be quantified by referring to the phase encoding information. That is, the depth maps obtained by the active and passive methods can be used to resample the PEF, and the phase consistency in the resampled PEF can then be checked to evaluate the depth estimation accuracy. Note that the passive method itself has nothing to do with the phase encoding. First, a single viewpoint was considered. A sub-aperture phase map at a non-central angular coordinate was extracted from the resampled PEF and compared with the phase map from the central viewpoint. Figures 13(a) and 13(b) show the absolute phase differences between the two maps, corresponding to the active and passive methods, respectively. It should be noted that some overly large values are not shown for clarity. It can be seen that the two phase maps corresponding to the active method are very close, although there are still large phase differences. By contrast, the difference between the two phase maps corresponding to the passive method is significant. Then, the angular variance of the resampled PEF, which considers all viewpoints, was computed by using Eq. (4). Figures 13(c) and 13(d) show the angular variances corresponding to the active and passive methods, respectively. Obviously, the angular variance corresponding to the passive method is larger than that corresponding to the active method. The relevant data related to the phase difference and angular variance are listed in Table 2, where it can be seen that the active method yielded a sevenfold improvement in phase consistency compared to the passive method. The above experimental demonstration and analysis thus verified that the proposed active method is suitable for robust and accurate light field depth estimation.
Finally, we used another plenoptic camera (Lytro Illum) and other measured scenes to compare the active and passive methods. The experimental results are shown in Fig. 14. The use of the Lytro Illum benefits the passive light field depth estimation. However, the passively estimated depths were still rippled, as shown by the cross sections of the depth maps in the third column of Fig. 14. In comparison, the active method obtained more accurate depth maps.
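The phase consistency check can be sketched as follows for a two-dimensional PEF slice. The PEF is resampled with a per-sample depth map, after which the mean absolute phase difference between one sub-aperture view and the central view and the mean angular variance over all views are reported. The shear form, the function names, the choice of view, and the border margin are assumptions of this example.

```python
import numpy as np

def resample_with_depth(pef, u_coords, depth):
    """Shear each spatial sample with its own estimated depth."""
    s = np.arange(pef.shape[1], dtype=float)
    shift_per_u = 1.0 - 1.0 / np.asarray(depth, dtype=float)
    return np.stack([np.interp(s + u * shift_per_u, s, row)
                     for u, row in zip(u_coords, pef)])

def phase_consistency(pef, u_coords, depth, view=0, margin=3):
    """Return (mean |phase difference| to the central view, mean angular variance).

    Border samples within `margin` are skipped because the resampling
    clamps coordinates that fall outside the slice.
    """
    resampled = resample_with_depth(pef, u_coords, depth)
    center = int(np.argmin(np.abs(np.asarray(u_coords, dtype=float))))
    inner = slice(margin, resampled.shape[1] - margin)
    diff = np.abs(resampled[view, inner] - resampled[center, inner]).mean()
    variance = resampled[:, inner].var(axis=0).mean()
    return diff, variance
```

Both numbers approach zero when the depth map is correct, so a ratio of the values obtained with two depth maps gives a simple relative measure of their accuracy.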
Passive light field depth estimation based on the image structure may result in inconsistencies between not only the defocus and correspondence cues but also the adjacent spatial coordinates, and thus ambiguous depths may be obtained. Accordingly, phase encoding instead of the image structure was employed to construct matching features for the depth cues. However, the defocus cue obtained by spatial variance is insensitive to the global spatial monotonicity of the PEF. Consequently, the correspondence cue, which is sensitive to the angular variance of the PEF, was directly used to obtain non-ambiguous depths from the single-peak distribution of the correspondence response across the depth range, without requiring global propagation of the depth estimation. In addition, the angular variance was weighted to reduce the depth estimation uncertainty. The proposed method was verified to be suitable for robust and accurate light field depth estimation through experimental analysis and evaluation.
Sino-German Cooperation Group (GZ1391); National Natural Science Foundation of China (11804231, 61875137); Natural Science Foundation of Guangdong Province (2018A030313831); China Postdoctoral Science Foundation (2017M622767).
1. M. Levoy, “Light fields and computational imaging,” Computer 39(8), 46–55 (2006).
2. I. Ihrke, J. Restrepo, and L. Mignard-Debise, “Principles of light field imaging: briefly revisiting 25 years of research,” IEEE Signal Process. Mag. 33(5), 59–69 (2016).
3. G. Wu, B. Masia, A. Jarabo, Y. Zhang, L. Wang, Q. Dai, T. Chai, and Y. Liu, “Light field image processing: an overview,” IEEE J. Sel. Top. Signal Process. 11(7), 926–954 (2017).
4. B. Wilburn, N. Joshi, V. Vaish, E. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy, “High performance imaging using large camera arrays,” ACM Trans. Graph. 24(3), 765–776 (2005).
5. A. Levin, R. Fergus, F. Durand, and W. Freeman, “Image and depth from a conventional camera with a coded aperture,” ACM Trans. Graph. 26(3), 70 (2007).
6. A. Veeraraghavan, R. Raskar, A. Agrawal, A. Mohan, and J. Tumblin, “Dappled photography: mask enhanced cameras for heterodyned light fields and coded aperture refocusing,” ACM Trans. Graph. 26(3), 69 (2007).
7. R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan, “Light field photography with a hand-held plenoptic camera,” Stanford Technical Report CSTR (2005), pp. 1–11.
8. S. Wanner and B. Goldluecke, “Globally consistent depth labeling of 4D light fields,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2012), pp. 41–48.
9. C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. Gross, “Scene reconstruction from high spatio-angular resolution light fields,” ACM Trans. Graph. 32(4), 73 (2013).
10. M. W. Tao, S. Hadap, J. Malik, and R. Ramamoorthi, “Depth from combining defocus and correspondence using light-field cameras,” in Proceedings of IEEE International Conference on Computer Vision (IEEE, 2013), pp. 673–680.
11. H. Lin, C. Chen, S. B. Kang, and J. Yu, “Depth recovery from light field using focal stack symmetry,” in Proceedings of IEEE International Conference on Computer Vision (IEEE, 2015), pp. 3451–3459.
14. Y. Zhang, H. Lv, Y. Liu, H. Wang, X. Wang, Q. Huang, X. Xiang, and Q. Dai, “Light-field depth estimation via epipolar plane image analysis and locally linear embedding,” IEEE Trans. Circ. Syst. Video Tech. 27(4), 739–747 (2017).
15. Williem, I. K. Park, and K. M. Lee, “Robust light field depth estimation using occlusion-noise aware data costs,” IEEE Trans. Pattern Anal. Mach. Intell. 40(10), 2484–2497 (2018).
17. J. Salvi, S. Fernandez, T. Pribanic, and X. Llado, “A state of the art in structured light patterns for surface profilometry,” Pattern Recognit. 43(8), 2666–2680 (2010).
18. J. Geng, “Structured-light 3D surface imaging: a tutorial,” Adv. Opt. Photonics 3(2), 128–160 (2011).
19. A. Gershun, “The light field,” translated by P. Moon and G. Timoshenko, J. Math. Phys. 18(1–4), 51–151 (1936).
20. E. H. Adelson and J. R. Bergen, “The plenoptic function and the elements of early vision,” in Computational Models of Visual Processing (MIT, 1991), pp. 3–20.
21. M. Levoy and P. Hanrahan, “Light field rendering,” in Proceedings of ACM SIGGRAPH (ACM, 1996), pp. 31–42.
22. S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen, “The lumigraph,” in Proceedings of ACM SIGGRAPH (ACM, 1996), pp. 43–54.
23. X. Peng, Z. Yang, and H. Niu, “Multi-resolution reconstruction of 3-D image with modified temporal unwrapping algorithm,” Opt. Commun. 224(1–3), 35–44 (2003).
24. Z. Cai, X. Liu, H. Jiang, D. He, X. Peng, S. Huang, and Z. Zhang, “Flexible phase error compensation based on Hilbert transform in phase shifting profilometry,” Opt. Express 23(19), 25171–25181 (2015).