
Fast depth estimation with cost minimization for structured light field

Open Access

Abstract

Depth estimation is a fundamental task in light field (LF) related applications. However, the conventional light field suffers from a lack of features, which introduces depth ambiguity and a heavy computation load into depth estimation. In this paper, we introduce the phase light field (PLF), which uses sinusoidal fringes as patterns and the latent phases as codes. With the PLF and the re-formatted phase epipolar plane images (phase EPIs), a global cost minimization framework is proposed to estimate depth. In general, EPI-based depth estimation tests a set of candidate lines to find the optimal one with the most similar intensities, and the slope of the optimal line is converted to disparity and depth. Based on this principle, for the phase EPI, we propose a cost defined as the weighted phase variance along the candidate line, and we prove that the cost is a convex function. After that, the beetle antennae search (BAS) optimization algorithm is utilized to find the optimal line, from which the depth is obtained. Finally, a bilateral filter is incorporated to further improve the depth quality. Simulation and real experimental results demonstrate that the proposed method produces accurate depth maps, especially at boundary regions. Moreover, the proposed method achieves an acceleration of about 5.9 times over the state-of-the-art refocus method with comparable depth quality, and thus can facilitate practical applications.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

The 4D light field (LF), a novel representation of scenes, records light rays not only with their intensities but also with their directions. Benefiting from this high-dimensional data, the light field facilitates a range of applications such as view rendering [1,2], super-resolution [3,4], digital refocusing [5,6] and 3D reconstruction [7,8]. Among these applications, depth estimation is a fundamental and critical task.

With 4D LF data, depth can be estimated with epipolar plane images (EPIs) [9–11] or digital refocusing images [12–14]. Nevertheless, both approaches face two challenges. On the one hand, the lack of unique features, such as in texture-less and texture-repeated regions, yields ambiguities in the estimated depth. On the other hand, a large number of candidates must be tested, which requires quite heavy computation.

To cope with these two challenges, the structured light field, which incorporates light coding into the light field, has attracted increasing attention in recent years. In a structured light field, a projector illuminates the scene with coded patterns to enrich the scene features, and thus the depth quality can be improved. For example, Cai et al. [15] proposed the first structured light field system with a projector and a Lytro light field camera, where every light ray is calibrated independently for depth measurement. After that, they combined structured light with light field refocusing [16,17] to estimate depth. However, these methods still test the depth candidates one by one and are therefore inefficient. In particular, refocusing requires shearing all viewpoints, which introduces a heavy computation burden.

In this paper, we propose an efficient and accurate disparity/depth estimation scheme with the phase-coding light field (PLF). Different from the conventional light field, the PLF introduces fringe patterns and phase codes that benefit accurate depth estimation. With PLF data, phase EPIs are first generated. After that, candidate lines in the phase EPI are tested with a cost function. More specifically, defined with the pixel-wise phase code, the cost is convex with respect to the slope of the candidate line. Finally, a global optimization algorithm, the beetle antennae search (BAS) algorithm [18], is utilized to find the optimal candidate line, from which the depth is obtained. In such a manner, depth values are obtained accurately and efficiently. The contribution of this paper is a novel depth estimation framework with cost minimization for the phase-coding light field, with the following technical novelties. First, a cost function defined by the weighted phase variance along candidate lines is proposed, and the convexity of the cost is proved. Second, a global optimization method, the beetle antennae search (BAS) algorithm, is incorporated to locate the optimal depth candidate, which greatly improves the efficiency of depth estimation. Last but not least, a re-sampling and local phase refinement method is proposed to generate sub-pixel phase samples and eliminate the phase wrapping effect, which improves depth accuracy.

2. Related work

Light field depth estimation is a fundamental task in a range of applications. However, conventional light field data contain texture-less or texture-repeated regions, making it difficult to produce accurate and reliable depth. In recent years, structured light, which provides rich features, has been introduced to the light field and has facilitated new research. Based on the utilized depth cues, these new methods can be categorized into traditional matching-based approaches and LF-based approaches.

The matching-based approach follows the pipeline of stereo matching or conventional structured light. Cai et al. [15,19] calibrated each ray in the structured light field system and obtained a phase-depth matching model, with which phase values can be converted directly to world coordinates. Zhang et al. [20] followed the traditional structured light depth measuring pipeline, where the wrapped phase light field is utilized to derive fringe orders and achieve phase unwrapping. Zhou et al. [21,22] proposed a coarse-to-fine framework to estimate depth with the phase light field: a rough estimation is first made based on the phase consistency of the EPI, and then each scene point is refined with the consistency between the projector and the plenoptic camera. Wang et al. [23] incorporated an industrial camera into a structured light field system, where the angular patch of the light field helps the phase unwrapping of the industrial camera. In summary, these methods utilize the rich features of coded light to help depth estimation, but the properties of the 4D phase LF are not exploited.

In contrast, the LF-based approach works within the framework of conventional LF depth estimation, where EPI and refocusing depth cues are often utilized. Cai et al. [16] proposed to refocus the PLF to estimate depth, where phase unwrapping is performed in the angular domain. After that, in [24], they further proposed to estimate depth with the original wrapped phase light field. In [17], refocusing is performed on a single sinusoidal pattern to predict depth for each pixel. In our preliminary work [25], we proposed to find the optimal line slope in the phase EPI with a newly proposed ‘slope descent’ algorithm, from which depth can be obtained. However, these LF-based depth estimation methods suffer from either heavy computation load or the local minimum problem, and cannot produce accurate depth with high efficiency.

3. Principle

A phase-coding light field (PLF) system consists of a projector and a light field camera. As shown in Fig. 1(a), a set of fringe patterns is projected onto the scene and recorded by the light field camera. After that, the pixel-wise phase is extracted with phase-shifting [26,27] or Fourier transform analysis [28,29] for each viewpoint, and thus a 4D PLF is formed. As shown in Fig. 1(b), a PLF is defined in the 4D space $(u, v, s, t)$, where $(s,t)$ and $(u,v)$ are the spatial and angular coordinates, respectively. By fixing one spatial axis and one angular axis, the PLF is reduced to 2D, which forms the phase epipolar plane image (phase EPI). One 4D PLF produces two types of phase EPIs: a horizontal one in $u$-$s$ space and a vertical one in $v$-$t$ space. Projections of the same scene point form a line in the EPI whose slope encodes disparity; therefore, the crucial task is to find the optimal line in the phase EPI. Without loss of generality, we take the horizontal EPI as an example to illustrate the proposed method; the vertical EPI can be processed in a similar way. Please also note that depth and disparity can be easily converted to each other, and thus we do not specifically distinguish them in this paper.

Fig. 1. Sketch of the PLF system. (a) System schematic. (b) Phase EPI from the phase-coding light field.

As shown in Fig. 2(a), given a pixel $p$, a candidate line $L$ depends only on the angle $\theta$. Pixels on the optimal line are projections of the same scene point, so they should have similar or even identical phase codes. Therefore, the phase consistency of a candidate line can be utilized to find the optimal line. In Fig. 2(a), the cost of a candidate line $L(p,\theta)$ is defined as

$$E_{L(p,\theta)}=\sum_{u={-}U}^{U} \left(\varphi_{u}-\bar{\varphi}\right)^{2}.$$

Here, $u \in [-U, U]$ spans the angular axis of the phase EPI, $\varphi_u$ is the phase at the intersection between $L(p,\theta)$ and the $u^{th}$ row of the phase EPI, and $\bar{\varphi}$ is the average of $\varphi_{u}$. For the optimal line with angle $\theta^*$, all samples $\varphi_u$ are similar or even identical, so $E_{L(p,\theta)}$ reaches its minimum:

$$E_{L(p,\theta^*)} = \textrm{min}\left(E_{L(p,\theta)} \right).$$

Thanks to the pixel-wise phase code, the global minimum of the cost is unique and can be found.
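As a concrete illustration, the following minimal Python/NumPy sketch (not the authors' MATLAB implementation; the sample values are invented for illustration) evaluates the cost of Eq. (1) for the phase samples collected along a candidate line:

```python
import numpy as np

def line_cost(phi_u):
    """Eq. (1): sum of squared deviations of the phase samples on a
    candidate line from their mean, i.e. an unnormalized variance."""
    return np.sum((phi_u - phi_u.mean()) ** 2)

# Samples on the optimal line are nearly identical, so the cost is ~0;
# a wrong line crosses the fringe and collects varying phases.
optimal = np.array([1.570, 1.571, 1.569, 1.570, 1.571])
wrong = np.array([1.2, 1.4, 1.6, 1.8, 2.0])
print(line_cost(optimal))  # ~2.8e-6, near the global minimum
print(line_cost(wrong))    # 0.4, clearly larger
```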

Fig. 2. (a) Phase EPI and candidate lines. (b) $\varphi$-$u$ curves with different $\theta$. (c)(d) $k(\theta)$ and $k^2(\theta)$ curves.

Without loss of generality, we take $L_1$, $L_2$ and $L_3$ in Fig. 2(a) as examples to analyze the property of the phase EPI; their $\varphi$-$u$ curves are shown in Fig. 2(b). It can be noticed that $\varphi$ is a linear function of $u$ and that the slope varies with the angle $\theta$. Therefore, the $\varphi$-$u$ curve is formulated as

$$\varphi_u = k(\theta)(u - u_0) + \varphi_0,$$
where $k(\theta)$ is the slope and $(u_0, \varphi_0)$ is the coordinate of the central pixel $p$ in Fig. 2(a). With this linear model, the average phase on $L(p,\theta)$ can be derived as
$$\overline{\varphi}= \frac{1}{2U+1}\sum_{u=-U}^U \left[k(\theta)(u - u_0) + \varphi_0\right].$$

Similarly, the cost $E_{L(p,\theta )}$ can be rewritten as:

$$E_{L(p,\theta)} = \sum_{u=-U}^U \left[k(\theta)(u-u_0) + \varphi_0 - \overline{\varphi}\right]^2.$$

Without loss of generality, $u$ is symmetrically defined in the range $[-U, U]$, and $u_0$, the angular coordinate of the center pixel, equals 0. Therefore, $\overline{\varphi}$ simplifies to

$$\overline{\varphi }= {\varphi _0},$$
and ${E_{L(p,\theta )}}$ can be further derived as
$$\begin{aligned}E_{L(p,\theta)} &= \sum_{u=-U}^U \left[k(\theta)u + \varphi_0 - \varphi_0\right]^2\\ &= k^2(\theta)\sum_{u=-U}^U u^2. \end{aligned}$$

For a given phase EPI, the summation term is fixed and positive. Therefore, $E_{L(p,\theta)}$ depends on $k^2(\theta)$ and is a convex function of $\theta$. In Figs. 2(c) and 2(d), we plot the curves of $k(\theta)$ and $k^2(\theta)$. It can be observed that $k(\theta)$ monotonically decreases, and its zero point corresponds to the minimum of $k^2(\theta)$. This indicates that when all samples on the candidate line have identical phases, the cost reaches zero, which is the global minimum. In practice, although it is difficult to reach zero cost due to noise and discrete sampling, the convexity of $E_{L(p,\theta)}$ still holds, and optimization algorithms can be utilized to find the minimum.
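The convexity can also be checked numerically. The sketch below uses a toy model, which is an assumption of ours rather than the paper's simulation setup: a continuous linear phase field observed at constant disparity $d_{true}$, so the view at angular coordinate $u$ sees the center-view phase shifted by $d_{true}u$. Sweeping $\theta$ recovers the angle whose cotangent equals the true disparity, in agreement with Eq. (7) and Eq. (18):

```python
import numpy as np

d_true, T, U = 0.6, 12.0, 7      # toy disparity, fringe period, half view count
u = np.arange(-U, U + 1)

def cost(theta_deg, s0=0.0):
    # Candidate line through (s0, u0 = 0): s = s0 + cot(theta) * u.
    s = s0 + u / np.tan(np.radians(theta_deg))
    phi = 2 * np.pi * (s - d_true * u) / T   # linear phase model
    return np.sum((phi - phi.mean()) ** 2)

thetas = np.linspace(30.0, 150.0, 241)       # 0.5-degree grid
E = np.array([cost(t) for t in thetas])
print(thetas[E.argmin()])                    # ~59.0 degrees, the single minimum
print(np.degrees(np.arctan(1.0 / d_true)))   # arccot(d_true) = 59.04 degrees
```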

4. Proposed method

4.1 Acquisition of PLF and phase EPI

To acquire accurate depth, the fundamental task is to acquire the phase light field. We project a set of sinusoidal fringe patterns, and then the phases are extracted with phase-shifting. According to the basic principle of 3D vision, disparity must exist along the baseline direction; otherwise, 3D information cannot be obtained. In the PLF, both horizontal and vertical EPIs can be utilized, so the phase must be modulated both horizontally and vertically. To be specific, $N$ sinusoidal patterns are projected, and the $i^{th}$ pattern is defined as

$$I_{i}(x, y)=A+B \cos \left[\varphi(x, y)-i \frac{2 \pi}{N}\right],$$
where the phase
$$\varphi(x, y)=2 \pi\left(\frac{x+y}{T}\right)$$
is modulated by both the $x$ and $y$ coordinates. These patterns are captured by a light field camera, and for each view the wrapped phase is obtained with the phase-shifting algorithm
$$\varphi (x,y) = \arctan \left[ {\frac{{\sum\limits_{i = 1}^N {{I_i}(x,y)\sin (i\frac{{2\pi }}{N})} }}{{\sum\limits_{i = 1}^N {{I_i}(x,y)\cos (i\frac{{2\pi }}{N})} }}} \right].$$

Each view of the captured light field produces a phase map, so a 4D phase light field (PLF) and two phase EPIs are formed, as illustrated in Fig. 1(b).
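Under the formulas above, the acquisition step can be sketched as follows (a minimal NumPy version; the pattern size, period $T$, step count $N$ and the per-view capture itself are abstracted away):

```python
import numpy as np

def make_patterns(width, height, N=6, T=32, A=0.5, B=0.5):
    """Eqs. (8)-(9): N phase-shifted fringes whose phase is modulated
    along both x and y, so horizontal and vertical EPIs carry disparity."""
    x, y = np.meshgrid(np.arange(width), np.arange(height))
    phi = 2 * np.pi * (x + y) / T
    return [A + B * np.cos(phi - i * 2 * np.pi / N) for i in range(1, N + 1)]

def wrapped_phase(images):
    """Eq. (10): N-step phase-shifting; returns the wrapped phase in (-pi, pi]."""
    N = len(images)
    num = sum(I * np.sin(i * 2 * np.pi / N) for i, I in enumerate(images, 1))
    den = sum(I * np.cos(i * 2 * np.pi / N) for i, I in enumerate(images, 1))
    return np.arctan2(num, den)

# Applying wrapped_phase to the N images captured at each view (u, v)
# yields one phase map per view, which together form the 4D PLF.
```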

4.2 Phase re-sampling and refinement

As aforementioned, with an EPI, disparity is produced by testing a set of candidate lines with varying slopes. However, limited by the low spatial resolution of the light field camera, the sampled pixels may remain unchanged while the candidate line rotates in the EPI. Take Fig. 3 as an example: the blue dots are pixels in the EPI, and two candidate lines $L(p,\theta_1)$ and $L(p,\theta_2)$ are investigated. Their samples in the $u_i^{th}$ row are $(s_1, u_i)$ and $(s_2, u_i)$, and both fall at sub-pixel locations without valid phase codes. Nevertheless, rounding the coordinates to the nearest integer pixel, $p_1$ in this example, is not a good choice, because $L(p,\theta_1)$ and $L(p,\theta_2)$ would then share the same phase code of $p_1$. As a result, the disparity accuracy would be limited to pixel level.

Fig. 3. Sub-pixel sampling method for the linear operator.

To solve this problem, we propose to re-sample the phase with interpolation. As shown in Fig. 3, instead of rounding the sample to the nearest integer pixel, we re-sample the phase with Eq. (11)

$$\varphi(s_i, u_i) = \alpha \varphi_l + (1-\alpha)\varphi_r,$$
where $\varphi_l$ and $\varphi_r$ are the phase values of the nearest valid pixels on the left and right, respectively. The weight $\alpha$ is defined as the normalized distance between the sample and its right neighbor. In this way, the phase $\varphi(s_i, u_i)$ varies smoothly with the angle $\theta$, which leads to more accurate depth.
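A minimal sketch of this re-sampling (with the fringe-boundary special case, discussed later in this section, handled by the commented branch):

```python
import numpy as np

def resample_phase(epi_row, s):
    """Eq. (11): linearly interpolate the wrapped phase of one EPI row
    at the sub-pixel position s from its two nearest valid pixels."""
    sl = int(np.floor(s))                  # nearest valid pixel on the left
    sr = min(sl + 1, len(epi_row) - 1)     # nearest valid pixel on the right
    alpha = sr - s                         # normalized distance to the right neighbor
    phi_l, phi_r = epi_row[sl], epi_row[sr]
    # Fringe-boundary case (see L(p3, theta) below): phi_l is close to pi
    # and phi_r close to -pi, so lift phi_r by 2*pi before blending.
    if phi_l - phi_r > np.pi:
        phi_r = phi_r + 2 * np.pi
    return alpha * phi_l + (1 - alpha) * phi_r
```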

Another problem in the phase EPI is the wrapping effect. The phase extracted from sinusoidal signals is periodically folded into the principal range $(-\pi, \pi]$, which causes an incorrect cost. As shown in Fig. 4(a), there exist several cases of candidate lines. For a pixel inside a period, like $p_{1}$, the optimal line always stays within a single period and the phase samples are continuous. For a boundary pixel, like $p_{2}$, the optimal line may cover two periods and the phase samples have a sharp discontinuity. Usually, this wrapping effect is eliminated by unwrapping the whole phase map, which may introduce errors and is also time consuming. We notice that only the samples on the candidate line are involved in calculating the cost, so we propose a local phase refinement method. To be specific, a phase jump can be detected from the difference of neighboring phase samples. In Fig. 4(c), the phase jumps from $\pi$ to $-\pi$, i.e., the phase drop is larger than $\pi$. In this case, phase refinement is performed for all samples as shown in Eq. (12)

$${\varphi_u} = \bmod ({\varphi_u},2\pi ) - \pi,$$
where ‘mod’ denotes the modulo operation. In this way, a continuous phase is obtained, as shown in Fig. 4(c).
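In code, the refinement amounts to a jump test followed by the remapping of Eq. (12); a sketch consistent with the description above:

```python
import numpy as np

def refine_phases(phi_u):
    """Eq. (12): if neighboring samples on the candidate line jump by more
    than pi (a wrap from pi to -pi), remap every sample with
    mod(phi, 2*pi) - pi so that the sequence becomes continuous again."""
    if np.any(np.abs(np.diff(phi_u)) > np.pi):
        return np.mod(phi_u, 2 * np.pi) - np.pi
    return phi_u
```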

Fig. 4. Proposed local phase refinement method. (a) Wrapped phase EPI. (b)-(e) Re-sampled and refined phases of $L(p_{1},\theta)$-$L(p_{4},\theta)$ in (a), respectively.

A very special case is $L(p_3,\theta)$ in Fig. 4, where the candidate line is quite close to, or even overlaps with, the fringe boundary. It may happen that $\varphi_l$ is close to $\pi$ and $\varphi_r$ is close to $-\pi$; in this case, an additional $2\pi$ must be added to $\varphi_r$ before applying Eq. (11). The re-sampled phase values are then close to $\pi$ or $-\pi$ but do not vary regularly. After refinement, the continuous phase values are close to zero, as shown in Fig. 4(d).

Compared with global phase unwrapping, the proposed phase refinement needs little computation and also avoids error propagation. It refines the phase sample distribution for boundary pixels, but it does not always produce continuous phases. If $\theta$ is too small or too large, the candidate line covers multiple periods, like $p_4$ in Fig. 4(a), and the refined phase curve, shown in Fig. 4(e), still has many jumps. As a result, the cost-$\theta$ curve is not monotonic, and many local minima appear when rotating the candidate line. Fortunately, this effect does not greatly impair the detection of the optimal line. On the one hand, the costs of the local minima are much larger than the global minimum, and a global optimization algorithm can handle this case. On the other hand, this case only happens for very large or small $\theta$ values, while the optimal $\theta$ usually lies in the middle range. Take $p_4$ as an example: by rotating the candidate line anti-clockwise, the phase samples become smoothly varying, as in the case of $p_1$. In Section 5.5, this effect is further discussed with patterns of varying frequencies.

4.3 Weighted phase variance cost

As presented in Section 3, with the refined phase, the cost defined as the variance in Eq. (5) can be used to find the optimal candidate line. Nevertheless, in practice there exists distortion in image acquisition: the border views are more severely distorted, and thus their phase values are of lower quality than those of the central view. Therefore, we propose to modify the cost by assigning weights to the phase variance, i.e.

$$E_{L(p,\theta)} = \sum_{u=-U}^U w_u\left(\varphi_u - \overline{\varphi}\right)^2,$$
where $w_{u}$ is the weight defined as
$$w_u = \exp\left(-\frac{|u|}{\delta_w^2}\right).$$

Here, $\delta_w$ is an attenuation parameter. With the weight $w_u$, the high-quality central views contribute more to the cost than the low-quality border views, which improves the robustness.
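Combining Eq. (13) and Eq. (14) yields the per-line cost actually minimized in the next subsection; a direct transcription, with $\delta_w = 5.5$ as used in Section 5:

```python
import numpy as np

def weighted_cost(phi_u, delta_w=5.5):
    """Eqs. (13)-(14): phase variance along the candidate line, weighted so
    that high-quality central views dominate the distorted border views.
    phi_u: refined phase samples for u = -U..U."""
    U = (len(phi_u) - 1) // 2
    u = np.arange(-U, U + 1)
    w = np.exp(-np.abs(u) / delta_w ** 2)
    return np.sum(w * (phi_u - phi_u.mean()) ** 2)
```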

4.4 Depth estimation with cost optimization

Once the cost is defined, the crucial task is to find its global minimum. Ideally, the cost is strictly convex with a single minimum, as formulated in Eq. (13). Unfortunately, in practice the cost curve has many local minima due to occlusion, noise and the wrapped phase. In Fig. 5, we take four pixels as examples, where $P_1$ and $P_2$ lie on smooth surfaces while $P_3$ and $P_4$ are boundary pixels between the foreground and the background. It can be noticed that the cost curves of $P_1$ and $P_2$ are convex. In contrast, for $P_3$ and $P_4$, the cost curves are quite complex, with many local minima. In this case, simple optimization methods such as gradient descent are highly likely to converge to a local minimum and fail to report the correct result. To solve this problem, we propose a global optimization framework based on the beetle antennae search (BAS) algorithm [18].

Fig. 5. A captured phase SAI and some cost curves. (a) The phase SAI of the center view. (b)(c) Cost curves of $P_{1}$ and $P_{2}$ on smooth surfaces. (d)(e) Cost curves of $P_{3}$ and $P_{4}$ on the occlusion edge.

BAS mimics the detecting and searching behavior of beetles. A beetle always checks the environment with its two antennae and makes its move based on the received signals. The BAS algorithm works in a similar way to find the global minimum in the solution space of $\theta$ within a limited number of iterations. In the $t^{th}$ iteration, given the current position $\theta_{t}$, BAS checks the left and right neighboring positions $\theta_{l}$ and $\theta_{r}$ with its ‘antennae’,

$$\left\{ \begin{array}{l} \theta_{l} = \theta_{t} - \beta_{t}\\ \theta_{r} = \theta_{t} +\beta_{t} \end{array} \right.,$$
where $\beta_{t}$ is the antenna length. After that, $\theta$ is updated toward the side with the lower cost: if $E_{L(p,\theta_{l})}$, the cost at the left position, is smaller than the cost at the right position, $E_{L(p,\theta_{r})}$, the search marches to the left; otherwise it moves to the right, as shown in Eq. (16).
$$\theta_{t+1}=\begin{cases} \theta_{t} - \Delta \theta_{t}, & \textrm{if} \quad E_{L(p,\theta_{l})} \le E_{L(p,\theta_{r})} \\ \theta_{t} + \Delta \theta_{t}, & \textrm{otherwise} \\ \end{cases},$$
where $\Delta \theta_{t}$ is the moving step. In such a manner, $\theta_{t}$ always marches along the direction of smaller cost and approaches the minimum. $\beta_{t}$ and $\Delta \theta_{t}$ also attenuate after each iteration
$$\begin{cases} {\beta_{t+1}} = 0.80{\beta_{t}}\\ {\Delta \theta_{t+1}} = 0.85{\Delta \theta_{t}}\end{cases},$$
which accelerates the convergence.

In principle, BAS can robustly find the global minimum. It tests a pair of left and right candidates and uses a large moving step $\Delta \theta$, so positions in different convex intervals are compared and the optimal convex interval is found first. After that, as the moving step $\Delta \theta$ decreases, the global minimum, which is also the local minimum of the optimal interval, is reached. In the implementation of this work, $\theta$ always lies in the range $(0^{\circ}, 180^{\circ}]$, and thus $\theta$, $\beta$ and $\Delta \theta$ are initialized to $90^{\circ}$, $89^{\circ}$ and $30^{\circ}$, respectively. Figure 6 presents the optimization process of BAS, where the green arrows with indexes illustrate the first seven steps. The figures show that the tested position first varies among different intervals, and finally the global minimum is reached.

Fig. 6. Optimization process of BAS. (a) Scatter plot of the cost $E$ against the angle $\theta$ during cost minimization. (b) The curve of $\theta$ over the iterations $t$.

Finally, with the detected optimal angle $\theta ^{*}$, the disparity can be derived as

$$d(p) = \cot(\theta^*).$$

In Algorithm 1, we summarize in detail the procedure of finding the global minimum with the BAS algorithm.
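Since Algorithm 1 is not reproduced here, the following is only a minimal sketch of the search of Eqs. (15)-(17); the `cost` callable, e.g. the weighted cost of Section 4.3 evaluated along the line at a given angle, is assumed:

```python
import numpy as np

def bas_search(cost, theta0=90.0, beta0=89.0, dtheta0=30.0, t_max=40):
    """Beetle antennae search over theta in (0, 180] degrees.
    cost: callable mapping an angle theta to E_{L(p,theta)}."""
    theta, beta, dtheta = theta0, beta0, dtheta0
    for _ in range(t_max):
        # Eq. (15): probe the two antennae; Eq. (16): march toward the
        # side with the smaller cost.
        if cost(theta - beta) <= cost(theta + beta):
            theta -= dtheta
        else:
            theta += dtheta
        theta = float(np.clip(theta, 1e-6, 180.0))  # keep inside the valid range
        beta, dtheta = 0.80 * beta, 0.85 * dtheta   # Eq. (17): attenuate
    return theta

# Eq. (18): convert the optimal angle to disparity, e.g.
# d = 1.0 / np.tan(np.radians(bas_search(cost)))
```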

Furthermore, to remove noise while preserving sharp boundaries between objects, a bilateral filter [30] is finally applied to the depth maps.
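As one possible realization of this step (not necessarily the authors' implementation), OpenCV's bilateral filter can be applied to the floating-point depth map; the parameters below and the `depth_map` array are illustrative assumptions:

```python
import cv2
import numpy as np

depth = depth_map.astype(np.float32)  # depth_map: an H x W disparity/depth array
# Edge-preserving smoothing: sigmaColor controls similarity in depth values,
# sigmaSpace the spatial neighborhood size.
depth_smooth = cv2.bilateralFilter(depth, d=9, sigmaColor=0.1, sigmaSpace=7.0)
```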

5. Experiments and analysis

The proposed method is evaluated with extensive experiments. First, it is tested on both a conventional LF and a PLF to verify the advantage of the PLF. Second, the proposed algorithm is compared with state-of-the-art PLF depth estimation methods, slope descent (SD) [25], SPO [31], Refocus [16], CAE [14] and OCC [7], in extensive simulation and real experiments. After that, the complexity is evaluated to demonstrate the efficiency of the proposed algorithm. Finally, the influence of the parameters is addressed, and the limitations of the proposed method are discussed.

In the validation, the light field consists of 15$\times$15 views, each with a spatial resolution of 512$\times$512 or 434$\times$625, the latter being the resolution of the Lytro ILLUM. The projected patterns have a resolution of 1920$\times$1080. For the simulation data, three-step phase shifting is used and each pattern has 47 sinusoidal periods. For the captured data, to better resist noise, six-step phase shifting is used and each pattern has 60 sinusoidal periods. The parameters for BAS are set as $\theta_0 = 90^{\circ}$, $\beta_0 = 89^{\circ}$ and $\Delta \theta_0 = 30^{\circ}$. The weighting parameter $\delta_w$ in Eq. (14) is 5.5.

5.1 Comparison between PLF and conventional LF

To validate the advantage of the PLF, a conventional LF and a PLF of the scene ‘head’ are simulated with 3ds Max. After that, the proposed depth estimation method is performed on both types of data, and the results are shown in Fig. 7.

Fig. 7. Comparison of the conventional LF (upper row) and the PLF (lower row). (a) Image and code of the center view. (b) Cost curves of a pixel on a planar surface. (c) Depth maps derived by the proposed method. (d) The $400^{th}$ row of (c).

In the conventional light field, the images are homogeneous without unique features, so the cost curve is not convex. In Fig. 7, the cost remains unchanged when $\theta$ is in the range $[50^{\circ}, 60^{\circ}]$. Therefore, it is difficult to report the best disparity, and serious noise is introduced into the final results. For the PLF, the projector provides pixel-wise phases as features, so the cost curve is convex, which guarantees that a unique best depth can be reported.

5.2 Results on simulation data

Three simulated scenes, ‘leaves’, ‘head’ and ‘teapot’, are used to test the proposed algorithm. The disparity maps are shown in Fig. 8, and the objective quality metrics MAD and BP1 [32] are presented in Table 1 and Table 2.

Fig. 8. Depth maps of synthetic data: ‘leaves’, ‘head’ and ‘teapot’.

Table 1. MAD of simulated data (with label ‘S’) and captured data (with label ‘C’)

Table 2. BP1 of simulated data (with label ‘S’) and captured data (with label ‘C’)

It can be observed in Fig. 8 that there exist many mistakes at object boundaries in the results of SD [25]. The reason is that SD [25] is trapped in local minima and thus yields incorrect results. SPO [31], Refocus [16], CAE [14] and OCC [7] test a series of depth candidates one by one with tiny steps, so they can find the global minimum and produce accurate results. However, OCC [7] cannot handle the phase wrapping effect, so diagonal stripes appear in its results. As for the proposed method, thanks to the BAS optimization, the global minimum can be found, and the results are comparable with those of the Refocus method [16], which achieves the best quality.

Table 1 and Table 2 present the quantitative assessment results. For both MAD and BP1, smaller values indicate less distortion and thus better quality. In general, SD [25] suffers from the severest distortions, while Refocus [16] performs the best. The other methods achieve slightly worse but similar performance to [16].

In some cases, the proposed method does not perform the best; the reason lies in the difference in the information used for depth estimation. The proposed method only utilizes candidate lines in the EPI. Refocus [16], CAE [14] and OCC [7] are based on light field refocusing, where all sub-aperture images (SAIs) are used. SPO [31] operates on parallelogram regions in the EPI. These state-of-the-art methods utilize more information than the proposed method and therefore sometimes achieve higher accuracy. Please note that the abundant information also introduces a great computation burden and causes low efficiency. The proposed method in fact produces depth maps of comparable quality with much higher efficiency, as presented in Section 5.4.

5.3 Results on captured data

In addition to the simulation results, we also built a structured light field system with a Lytro ILLUM camera and a BenQ MW612 projector. Two scenes, ‘toys’ and ‘geo’, are used for the test: ‘geo’ contains plaster statues with Lambertian surfaces, while ‘toys’ is a more challenging scene with three objects: a plastic toy with a reflective surface, a normal plaster statue and a paper cup with a black-stripe texture. Different from the simulated data, the patterns in real scenes are impaired by factors such as noise, and thus it is more challenging to generate depth.

The obtained depth maps are shown in Fig. 9, where the occluded background regions are identified with [33] and marked black. Similar to the simulation, SD [25] yields many outliers caused by local minima. CAE [14] and OCC [7] show obvious stripes caused by the phase wrapping effect. SPO [31], Refocus [16] and the proposed method achieve better quality. The quantitative metrics MAD and BP1 in Table 1 and Table 2 also confirm the performance of the proposed method. We further measure the shape of a ping-pong ball, which has the standard radius of 20 mm, and the results are presented in Fig. 10. The four figures show the captured pattern, the phase map, the depth map, and the height of the central column, in millimeters, against a virtual background plane $height=0$. Since the camera view is limited, less than half of the ball can be captured: in Fig. 10(d), only samples above the horizontal red line are valid, and the vertical solid red line corresponds to the radius, whose height spans 15 mm to 35 mm. Nevertheless, there still exist distortions between a standard ball and the measured shape, especially at the boundaries, for the following reasons. On the one hand, there exist occlusions between the foreground and the background, so the EPIs are incomplete, which finally leads to distorted depth values; this boundary occlusion issue is a limitation of our method and will be discussed in detail in Section 5.6. On the other hand, boundary regions suffer severer light attenuation, so the phase and depth are of lower quality. Last but not least, the intrinsic limitations of the Lytro ILLUM camera, such as the very narrow baseline and the low spatial resolution, also impair the depth accuracy.

Fig. 9. Depth maps of captured data: ‘geo’, ‘toys’.

Fig. 10. Results of a ping-pong ball in the central view. (a) Captured fringe pattern. (b) Phase map. (c) Depth map. (d) Height in millimeters of the central column against a virtual background plane $height=0$.

5.4 Comparison of time consumption

In addition to quality, efficiency is also an important issue in depth estimation. Therefore, the process of searching for the global minimum with the ‘one-by-one’ testing methods, including Refocus [16], CAE [14], OCC [7] and SPO [31], and with the proposed method is shown in Fig. 11. The ‘one-by-one’ testing methods must test numerous candidates, for instance 80 in this paper, which corresponds to the dense samples in Fig. 11(a); otherwise, the global minimum cannot be found. In contrast, the proposed method locates the global minimum with BAS, and only 40 samples are tested in Fig. 11(b). In this way, the computation load is greatly reduced and depth can be produced efficiently.

In addition to the cost curves, the time consumption for producing a depth map is shown in Table 3. In the platform row, ‘M’ means pure MATLAB and ‘M+C’ means mixed programming of MATLAB and C for acceleration. It can be noticed that even with mixed programming, the ‘one-by-one’ methods need more than 200, and even up to 1100, seconds to produce a depth map. SD [25] is much faster, but as analyzed before, it only applies to cases where the cost curve is strictly convex; otherwise it is trapped in local minima. The proposed method needs only about 40 seconds on average for each image, which is about 5.9 times faster than the classic Refocus method [16].

Fig. 11. Sketch of the energy minimization. (a) ‘One-by-one’ searching methods [16] [14] [7] [31]. (b) The proposed method. (c) Curve of $\theta$ versus iteration for (b).

Table 3. Time consumed (in seconds) for producing a depth map.

In fact, the time can be further shortened at the cost of a small accuracy drop by reducing the number of iterations, i.e., $t_{max}$ in Algorithm 1. As shown in Fig. 11(c), the BAS algorithm runs 40 iterations to ensure convergence, but the figure shows that the angle $\theta$ almost converges after 25 iterations. So if we slightly relax the convergence condition and reduce the iterations, the efficiency can be further improved.

5.5 Influence of parameters

In addition to the results, we further analyze the influence of the parameters of the BAS algorithm and of the projected patterns.

BAS has three parameters: the initial angle $\theta_0$, the antennae length $\beta_0$ and the moving step $\Delta \theta_0$. $\theta_0$ is fixed to $90^{\circ}$, the middle point of the valid range of $\theta$, so only $\beta_0$ and $\Delta \theta_0$ are investigated. As shown in Figs. 12 and 13, the cost curves have multiple local minima, so the search range should be large enough. With a small $\beta_0$ or a small $\Delta \theta_0$, the search range is quite limited, and the search sometimes fails to find the global minimum. In contrast, if the parameters are too large, the optimization is less stable and takes longer to converge.

Fig. 12. Influence of $\beta_0$. (a)-(c) Cost curves and tested angles with different $\beta_0$. (d) Curve of the angle $\theta$ over the iteration $t$.

Fig. 13. Influence of $\Delta\theta_0$. (a)-(c) Cost curves and tested angles with different $\Delta\theta_0$. (d) Curve of the angle $\theta$ over the iteration $t$.

For the pattern, the angular and spatial resolutions are fixed, and the influence of the pattern frequency is studied. As shown in Fig. 14, patterns modulated with three frequencies are used for the test, and the EPIs and cost-$\theta$ curves are presented. It can be noticed that, as the frequency increases, the stripes in the phase EPI become narrower, and the cost curve becomes steeper, which benefits convergence to the minimum with higher accuracy. On the other hand, as shown in Fig. 14(d), when the stripes become narrower, a candidate line with the same slope covers more periods, and the refined phase samples still have discontinuities, as presented in Section 4.2. As a result, as $\theta$ varies over a wide range, the cost does not change monotonically, and several local minima appear. More specifically, with a higher pattern frequency, the cost curve becomes more complex, with an increasing number of local minima. Although the costs of these local minima are still much larger than that of the global minimum, and BAS is a global optimization algorithm, the risk of being trapped in local minima still increases. Therefore, a good pattern frequency should balance accuracy and robustness.

Fig. 14. Influence of pattern frequency. (a)-(c) Phase maps with pattern frequencies 10, 30 and 50, respectively. (d)(e) The corresponding EPIs and cost curves. The candidate lines in the three phase EPIs in (d) have equal slopes but cover different numbers of periods.

5.6 Limitation

Although the proposed method achieves high accuracy and efficiency, it has failure cases at background boundary points. As shown in Fig. 15, the boundary fringe in the background is incomplete due to occlusion. For point $p$, the real optimal line, the solid one, covers some foreground pixels, which increases its cost. Instead, the detected optimal line rotates to the dashed line, which has a smaller cost than the solid line. As a result, the depth value shifts slightly from the original background toward the foreground. This shift leads to blurring of background boundaries in the depth map, as shown in Fig. 15(c). To solve this problem, in the near future we will develop a more refined framework that splits the candidate line into foreground and background segments and estimates depth only with the background part.

Another limitation is that the dense views of the light field are not fully utilized, due to the trade-off between accuracy and efficiency. Compared with the refocusing-based methods that utilize all views, the EPIs only consist of the central horizontal and vertical views. One of our future works is to incorporate more views into the optimization framework, which can further improve the accuracy.

Fig. 15. The failure case at object boundaries. (a) Sketch of the optimal line in the phase EPI; the solid line is the real optimal line, and the dashed line is the detected one. (b)(c) Example of the phase map and the corresponding depth.

6. Conclusion

In this paper, we proposed a novel framework to estimate depth from phase-coding light field data. The PLF brings phase-coding structured light to the light field, and thus enjoys both dense features and high-dimensional data. With the PLF and phase EPIs, a cost defined as the weighted phase variance is proposed, which is a convex function of the inclination angle of the candidate line. After that, a BAS-based cost minimization method is incorporated to find the global minimum of the cost, from which the depth is derived. Extensive experiments have been conducted to verify the performance of the proposed method. The results show that our method produces accurate depth maps, comparable with those of the state-of-the-art refocus-based depth estimation method. In addition, the proposed method accelerates depth estimation by about 5.9 times over the refocus-based method, which can facilitate practical applications.

Funding

Wuhan University of Science and Technology (2017xz008); National Natural Science Foundation of China (61702384).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. M. Levoy and P. Hanrahan, “Light field rendering,” in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, (1996), pp. 31–42.

2. M. Magnor and B. Girod, “Data compression for light-field rendering,” IEEE Transactions on Circuits and Systems for Video Technology 10(3), 338–343 (2000). [CrossRef]

3. S. Vagharshakyan, R. Bregovic, and A. Gotchev, “Light field reconstruction using shearlet transform,” IEEE Transactions on Pattern Analysis and Machine Intelligence 40(1), 133–147 (2018). [CrossRef]

4. H. Zhang, C.-J. Zhu, X. Tang, N. He, Y. Zeng, Q. Liu, and S. Xiang, “A light field sparse and reconstruction framework for improving rendering quality,” IEEE Access 8, 209308–209319 (2020). [CrossRef]  

5. C.-C. Chen, Y.-C. Lu, and M.-S. Su, “Light field based digital refocusing using a dslr camera with a pinhole array mask,” in 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (IEEE, 2010), pp. 754–757.

6. C. Zhang, G. Hou, Z. Zhang, Z. Sun, and T. Tan, “Efficient auto-refocusing for light field camera,” Pattern Recognit. 81, 176–189 (2018). [CrossRef]  

7. T.-C. Wang, A. A. Efros, and R. Ramamoorthi, “Depth estimation with occlusion modeling using light-field cameras,” IEEE Transactions on Pattern Analysis and Machine Intelligence 38(11), 2170–2181 (2016). [CrossRef]

8. T.-C. Wang, A. A. Efros, and R. Ramamoorthi, “Occlusion-aware depth estimation using light-field cameras,” in Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 3487–3495.

9. S. Wanner and B. Goldluecke, “Globally consistent depth labeling of 4d light fields,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2012), pp. 41–48.

10. H. Lv, K. Gu, Y. Zhang, and Q. Dai, “Light field depth estimation exploiting linear structure in epi,” in 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) (IEEE, 2015), pp. 1–6.

11. Y. Zhang, H. Lv, Y. Liu, H. Wang, X. Wang, Q. Huang, X. Xiang, and Q. Dai, “Light-field depth estimation via epipolar plane image analysis and locally linear embedding,” IEEE Transactions on Circuits and Systems for Video Technology 27(4), 739–747 (2017). [CrossRef]

12. M. W. Tao, S. Hadap, J. Malik, and R. Ramamoorthi, “Depth from combining defocus and correspondence using light-field cameras,” in Proceedings of the IEEE International Conference on Computer Vision (2013), pp. 673–680.

13. T. Tao, Q. Chen, S. Feng, Y. Hu, and C. Zuo, “Active depth estimation from defocus using a camera array,” Appl. Opt. 57(18), 4960–4967 (2018). [CrossRef]  

14. Williem, I. K. Park, and K. M. Lee, “Robust light field depth estimation using occlusion-noise aware data costs,” IEEE Transactions on Pattern Analysis and Machine Intelligence 40(10), 2484–2497 (2018). [CrossRef]

15. Z. Cai, X. Liu, X. Peng, Y. Yin, A. Li, J. Wu, and B. Z. Gao, “Structured light field 3d imaging,” Opt. Express 24(18), 20324–20334 (2016). [CrossRef]  

16. Z. Cai, X. Liu, G. Pedrini, W. Osten, and X. Peng, “Accurate depth estimation in structured light fields,” Opt. Express 27(9), 13532–13546 (2019). [CrossRef]  

17. Z. Cai, G. Pedrini, W. Osten, X. Liu, and X. Peng, “Single-shot structured-light-field three-dimensional imaging,” Opt. Lett. 45(12), 3256–3259 (2020). [CrossRef]  

18. X. Jiang and S. Li, “Bas: Beetle antennae search algorithm for optimization problems,” Int. J. Robotics Control. 1(1), 1 (2018). [CrossRef]  

19. Z. Cai, X. Liu, X. Peng, and B. Z. Gao, “Ray calibration and phase mapping for structured-light-field 3d reconstruction,” Opt. Express 26(6), 7598–7613 (2018). [CrossRef]  

20. X. Zhang, Z. Cai, X. Liu, and X. Peng, “Improved 3d imaging and measurement with fringe projection structured light field,” 2019 Int. Conf. on Opt. Instruments Technol. Optoelectronic Imaging/Spectroscopy Signal Process. Technol. 11438, 73 (2020). [CrossRef]  

21. P. Zhou, Y. Zhang, Y. Yu, W. Cai, and G. Zhou, “3d reconstruction from structured light field by fourier transformation profilometry,” in AOPC 2019: Optical Sensing and Imaging Technology, vol. 11338 (2019), p. 113381K.

22. P. Zhou, Y. Zhang, Y. Yu, W. Cai, and G. Zhou, “3d shape measurement based on structured light field imaging,” Math. Biosci. Eng 17(1), 654–668 (2020). [CrossRef]  

23. Z. Wang, Y. Yang, X. Liu, Y. Miao, Q. Hou, Y. Yin, Z. Cai, Q. Tang, and X. Peng, “Light-field-assisted phase unwrapping of fringe projection profilometry,” IEEE Access 9, 49890–49900 (2021).

24. Z. Cai, X. li Liu, G. Pedrini, W. Osten, and X. Peng, “Structured-light-field 3d imaging without phase unwrapping,” Opt. Lasers Eng. 129, 106047 (2020). [CrossRef]  

25. H. D. L. Liu, S. Xiang, and J. Wu, “Fast geometry estimation for phase-coding structured light field,” in 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP) (2020), pp. 124–127.

26. P. S. Huang and S. Zhang, “Fast three-step phase-shifting algorithm,” Appl. Opt. 45(21), 5086–5091 (2006). [CrossRef]  

27. B. Pan, Q. Kemao, L. Huang, and A. Asundi, “Phase error analysis and compensation for nonsinusoidal waveforms in phase-shifting digital fringe projection profilometry,” Opt. Lett. 34(4), 416–418 (2009). [CrossRef]  

28. X. Su and W. Chen, “Fourier transform profilometry: a review,” Opt. Lasers Eng. 35(5), 263–284 (2001). [CrossRef]

29. M. Takeda and K. Mutoh, “Fourier transform profilometry for the automatic measurement of 3-d object shapes,” Appl. Opt. 22(24), 3977–3982 (1983). [CrossRef]  

30. B. Zhang and J. P. Allebach, “Adaptive bilateral filter for sharpness enhancement and noise removal,” IEEE Transactions on Image Processing 17(5), 664–678 (2008). [CrossRef]

31. S. Zhang, H. Sheng, C. Li, J. Zhang, and Z. Xiong, “Robust depth estimation for light field via spinning parallelogram operator,” Comput. Vis. Image Underst. 145, 148–159 (2016). [CrossRef]  

32. K. Honauer, O. Johannsen, D. Kondermann, and B. Goldluecke, “A dataset and evaluation methodology for depth estimation on 4d light fields,” in Asian Conference on Computer Vision (Springer, 2016), pp. 19–34.

33. S. Xiang, Y. Yang, H. Deng, J. Wu, and L. Yu, “Multi-anchor spatial phase unwrapping for fringe projection profilometry,” Opt. Express 27(23), 33488–33503 (2019). [CrossRef]  
