## Abstract

In this paper, we propose a high dynamic range 3D imaging method based on light field imaging under structured illumination. Fringe patterns are projected onto a scene and modulated by the scene depth; a structured light field is then detected using a light field recording device. The structured light field contains information about ray direction and phase-encoded depth, via which the scene depth can be estimated from different directions. This multidirectional depth estimation can effectively achieve high dynamic range 3D imaging. We analyzed and derived the phase-depth mapping in the structured light field and then proposed a flexible ray-based calibration approach to determine the independent mapping coefficients for each ray. Experimental results demonstrated the validity of the proposed method for performing high-quality 3D imaging of surfaces with high and low reflectivity.

© 2016 Optical Society of America

## 1. Introduction

Optical 3D imaging based on fringe projection has the advantages of being non-contact, high-quality, and flexible, and has applications in industry, biomedicine, entertainment and so on [1]. In most situations it performs well for objects with a Lambertian reflective surface. However, for surfaces that are highly or lowly reflective, the fringe modulation is low, which leads to low-reliability phase computation and finally low-quality 3D imaging [2]. For this reason, high dynamic range (HDR) 3D imaging has been studied, and various HDR methods, such as multi-viewpoint imaging, parameter adjustment, light filtering, etc., have been developed [3–11]. The multi-viewpoint imaging method transforms the direction of the camera [3] or projector [4], which ensures that surfaces with high and low reflectivity can each be reconstructed from the best viewpoint and finally merged together for HDR 3D imaging. In the parameter adjustment method, the camera’s exposure time [5,6], the projector’s illuminating brightness [7], or both [8] are adjusted to integrate high-quality images suitable for HDR 3D imaging. In addition, the light filtering method achieves HDR 3D imaging by using polarization or color filters [9–11] to acquire valid data on surfaces with high and low reflectivity. Although these methods achieve success for HDR 3D imaging, they require capturing multiple groups of fringe images, which makes data acquisition time-consuming and data processing complicated. In this paper, we make use of the light-field property of multidirectional ray recording to overcome this drawback.

Light field imaging, a technology that simultaneously records the position and direction of light rays, has recently been widely studied [12–17]. With the recorded light field data, scene depth can be estimated through two commonly used techniques, i.e. disparity and blur [18–23], which are regarded as two complementary manifestations of scene depth variation. The disparity technique extracts at least two images from different viewpoints from the light field data and then calculates the disparity map with matched features among these multiview images. The blur technique utilizes the light field data to focus on different depths to obtain focal stacks and then estimates the blurring kernel (similar to depth from defocus) or the focusing degree (similar to depth from focus). For example, Atanassov et al. estimated the pixel shift of corresponding points in adjacent microimages to extract depth with different spatial resolutions [18]. Wanner and Goldluecke employed the structure tensor of an epipolar image to calculate the pixel matching direction to obtain depth estimation [20]. Frese and Gheta estimated depth information by minimizing energy functionals fusing the stereo and focus series acquired by a camera array with different focal lengths [22]. Tao et al. obtained disparity and defocus cues by computing the light-field epipolar image along different directions, and then combined the two cues to estimate the scene depth [23]. However, both techniques depend on scene-dependent features, such as color, texture and illumination, so both suffer in 3D imaging quality and robustness for complex scenes with occlusion, discontinuous depth, repeated texture and diverse illumination.

To address the aforementioned issues, this paper presents a novel method, structured light field 3D imaging (SLF-3DI), in which the light field under structured illumination is exploited for HDR 3D imaging. Standard fringe patterns are projected onto the scene, and their phases are modulated by the scene depth. The modulated patterns can then be recorded in light field format, namely, the SLF. In an SLF, we can trace rays in different directions and extract the modulated fringe phase from each of them. Subsequently, the phase-depth mapping related to the rays in an SLF can be derived to estimate the scene depth. The SLF-3DI method has two advantages. First, because the encoded phase information is essentially independent of intensity and modulated only by the scene depth, it is adaptive to the illuminating environment and the material of the object surface; thus the SLF-3DI method is robust to scene features for high-quality 3D imaging. Second, the scene depth can be estimated from each ray in the SLF independently, giving the SLF-3DI method the ability of multidirectional depth estimation and selection for HDR 3D imaging. Therefore, it is possible to reconstruct the 3D profile of highly and lowly reflective surfaces simultaneously without multiple groups of fringe images.

## 2. Principle

The light field is parameterized through the intersection of a ray with two parallel planes, i.e. the 4D light field $L\left(u,s\right)$, where $L$ is the radiance intensity along the ray ${l}_{us}$, and $u=\left(u,v\right)$ and $s=\left(s,t\right)$ represent the coordinates of the two intersection points in the light field coordinate system [12]. In this paper we adopt this 4D light field to parameterize the SLF.
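For concreteness, a decoded 4D light field can be held as an array indexed by the directional coordinates $\left(u,v\right)$ and the spatial coordinates $\left(s,t\right)$: fixing $\left(u,v\right)$ extracts a single-direction (sub-aperture) view, and fixing all four coordinates addresses one ray ${l}_{us}$. The following is a minimal sketch in Python with NumPy; the array layout is an illustrative assumption (the 11×11 and 378×379 resolutions match the decoding used in Section 4):

```python
import numpy as np

# Illustrative 4D light field L(u, v, s, t): 11x11 directional samples,
# each observing a 378x379 spatial grid (resolutions as in Section 4).
n_u, n_v, n_s, n_t = 11, 11, 378, 379
L = np.zeros((n_u, n_v, n_s, n_t))

def sub_aperture(L, u, v):
    """Extract the single-direction (sub-aperture) image for direction (u, v)."""
    return L[u, v]  # a 2D image of shape (n_s, n_t)

def ray(L, u, v, s, t):
    """Radiance recorded along the single ray l_us with coordinates (u, v, s, t)."""
    return L[u, v, s, t]

view = sub_aperture(L, 6, 6)  # the central viewpoint
```

Because each ray is a separate array element, rays from different directions can be processed independently, which is the property the SLF-3DI method relies on.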

The SLF-3DI system consisting of a projector and a light field recording device, e.g. a plenoptic camera with a microlens array in front of the image sensor [24], is shown in Fig. 1(a). A beam reflected from an object point is not integrated on a pixel area as is done by an ordinary camera, but is split by the microlenses and recorded on different pixels. This gives the plenoptic camera the ability to record the direction of a ray so that rays reflected from different directions can be distinguished and processed independently. Additionally, under structured illumination, the reflected rays carry the modulated phase information associated with the scene depth. Consequently each ray in the SLF contains a mapping relationship between the phase and the depth. Once the phase-depth mapping in the SLF is determined, the scene depth can be estimated for 3D imaging.

A 2D diagram of an SLF-3DI system within a world coordinate system $XOZ$ is illustrated in Fig. 1(b), where the parallel plane pair $u-s$ denotes the parameterized light field in a light field coordinate system, the point P denotes the projection center of the projector, and the reference plane is set at $Z=0$. Thus, the world coordinates of a point on a recorded ray can be described as a function of $u$ and $s$. For instance, on the ray $\overline{\text{BC}}$, the world coordinates of the points B and C can be denoted as $\text{B}=\left({X}_{\text{B}}\left(u\right),{Z}_{\text{B}}\left(u\right)\right)$ and $\text{C}=\left({X}_{\text{C}}\left(s\right),{Z}_{\text{C}}\left(s\right)\right)$ respectively. It should be noted that the parallel plane pair $u-s$ here consists of artificially selected planes used to parameterize the 4D light field; they are not the planes of the photosensor and microlens array in Fig. 1(a). The 2D diagram suffices for the derivation in the next section, since the scene depth $d\left(u,s\right)$ can be represented using the *X-Z* coordinates of the points B, C and P and the phase difference associated with the *X*-coordinates of the points A and E.

Dashed lines in Fig. 1(b) show that rays in different directions encode the same scene depth with different phase values. Thus the scene depth can be estimated from various directions for multidirectional 3D imaging. Even so, the depth values mapped from these phases generally have different reliabilities because the reflectivity is direction-dependent: the difference in reflectivity among directions causes discrepant modulations, and the higher the modulation, the higher the phase reliability and hence the quality of depth estimation. Consequently, high-quality depth values can be selected in terms of the modulation intensity in the SLF. This ability of multidirectional depth estimation and selection makes the SLF-3DI method robust to the reflection property of the object’s surface, which is suitable for HDR 3D imaging. Next we explore the phase-depth mapping to estimate the scene depth in an SLF.

## 3. Method

#### 3.1. Phase-depth mapping in SLF

A sinusoidal fringe pattern is projected onto the reference plane with the intensity

$$I\left(X\right)=a+b\mathrm{cos}\left(2\pi fX\right),\tag{1}$$

where $a$ is the background intensity, $b$ is the modulation intensity, and $f$ is the spatial frequency. In an SLF, the recorded radiance intensity is proportional to the projected intensity and the surface reflectivity. Let $R\left(u,s\right)$ represent the surface reflectivity dependent on the direction of a ray ${l}_{us}$, so the SLF can be represented as

$$L\left(u,s\right)=R\left(u,s\right)\left\{a+b\mathrm{cos}\left[\varphi \left(u,s\right)\right]\right\},\tag{2}$$

where $\varphi \left(u,s\right)$ is the modulated phase carried by the ray ${l}_{us}$.

Observing the geometric optical structure in Fig. 1(b), when there is no object, the projected ray $\overline{\text{PA}}$ is reflected from the point A on the reference plane, and the corresponding modulated phase is ${\varphi}_{\text{ref}}=2\pi f{X}_{\text{A}}$. When an object is present, another projected ray $\overline{\text{PE}}$ is reflected from the object point D at depth $d$ relative to the reference plane, and the modulated phase changes to ${\varphi}_{\text{obj}}=2\pi f{X}_{\text{E}}$. Their phase difference is therefore

$$\Delta \varphi ={\varphi}_{\text{obj}}-{\varphi}_{\text{ref}}=2\pi f\left({X}_{\text{E}}-{X}_{\text{A}}\right).\tag{3}$$
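In practice the modulated phase and the modulation intensity along each ray are recovered by phase shifting. As a minimal sketch (not necessarily the exact processing pipeline used in the paper), the standard four-step algorithm with shifts of $\pi /2$ recovers both quantities per pixel; the sample values below are made up:

```python
import math

def four_step_phase(I0, I1, I2, I3):
    """Recover the wrapped phase and modulation intensity from four frames
    shifted by pi/2: I_n = a + b*cos(phi + n*pi/2)."""
    phase = math.atan2(I3 - I1, I0 - I2)             # wrapped phase in (-pi, pi]
    modulation = 0.5 * math.hypot(I3 - I1, I0 - I2)  # modulation intensity b
    return phase, modulation

# Illustrative pixel with a = 120, b = 80, phi = 1.0 rad (made-up values).
a, b, phi = 120.0, 80.0, 1.0
frames = [a + b * math.cos(phi + n * math.pi / 2) for n in range(4)]
ph, mod = four_step_phase(*frames)
```

Note that the recovered phase is insensitive to the background term $a$ and to any common scaling of the four frames, which is why the phase-encoded depth is robust to surface reflectivity, while the modulation output directly measures fringe contrast and hence phase reliability.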

According to the relations among the phase difference $\Delta \varphi $ and the locations of the points $\text{B}=\left({X}_{\text{B}}\left(u\right),{Z}_{\text{B}}\left(u\right)\right)$, $\text{C}=\left({X}_{\text{C}}\left(s\right),{Z}_{\text{C}}\left(s\right)\right)$ and $\text{P}=\left({X}_{\text{P}},{Z}_{\text{P}}\right)$, the depth $d$ can be derived from Eq. (3) (refer to the Appendix) to be

$$d=\frac{{Z}_{\text{P}}\Delta \varphi }{\Delta \varphi -2\pi f\left[{X}_{\text{P}}-{X}_{\text{B}}-\left({Z}_{\text{P}}-{Z}_{\text{B}}\right)\frac{{X}_{\text{C}}-{X}_{\text{B}}}{{Z}_{\text{C}}-{Z}_{\text{B}}}\right]}.\tag{4}$$
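To make the mapping concrete, the short sketch below (pure Python; all coordinates and the fringe frequency are made-up example values, not calibrated system parameters) simulates the phase difference produced by a point at depth $d$ on a recorded ray $\overline{\text{BC}}$ via the intersections A, D and E of Fig. 1(b), and then inverts the mapping to recover the depth:

```python
import math

f = 0.05                      # fringe spatial frequency (illustrative)
XB, ZB = 20.0, 400.0          # point B on the u-plane (made-up coordinates)
XC, ZC = 10.0, 300.0          # point C on the s-plane
XP, ZP = 100.0, 500.0         # projection center P
d_true = 30.0                 # object depth relative to the reference plane

g = (XC - XB) / (ZC - ZB)                 # X-slope of the recorded ray BC
XA = XB - ZB * g                          # A: ray BC meets reference plane Z=0
XD = XB + (d_true - ZB) * g               # D: ray BC at depth Z = d
XE = XP - ZP * (XD - XP) / (d_true - ZP)  # E: ray PD meets reference plane
dphi = 2 * math.pi * f * (XE - XA)        # phase difference along the ray

# Invert the phase-depth mapping: d = ZP*dphi / (dphi - c), where
# c = 2*pi*f*[XP - XB - (ZP - ZB)*g] is constant for this fixed ray.
c = 2 * math.pi * f * (XP - XB - (ZP - ZB) * g)
d_rec = ZP * dphi / (dphi - c)
```

The recovered depth matches the simulated one, confirming that the mapping for one ray is determined entirely by constants of the ray geometry.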

#### 3.2. Ray-based phase-depth mapping calibration

Instead of determining the complicated transformation between the coordinate systems in the SLF, we calibrate every ray independently. Rays in the SLF can be recorded and processed separately, which makes this ray-based calibration feasible. Furthermore, the phase-depth mapping relationship in Eq. (4) can be transformed into a more concise form for flexible mapping calibration.

For a fixed SLF-3DI system, the coordinates of the point P are fixed; and for a recorded ray $\overline{\text{BC}}$ in the SLF, as mentioned above, the coordinates $\left({X}_{i},{Z}_{i}\right),i=\text{B},\text{C}$ are also stationary. In this case, Eq. (4) can be simplified as

$$d\left(u,s\right)=\frac{\Delta \varphi \left(u,s\right)}{{m}_{us}+{n}_{us}\Delta \varphi \left(u,s\right)},\tag{5}$$

where ${m}_{us}=-\frac{2\pi f}{{Z}_{\text{P}}}\left[{X}_{\text{P}}-{X}_{\text{B}}-\left({Z}_{\text{P}}-{Z}_{\text{B}}\right)\frac{{X}_{\text{C}}-{X}_{\text{B}}}{{Z}_{\text{C}}-{Z}_{\text{B}}}\right]$ and ${n}_{us}=1/{Z}_{\text{P}}$ are two mapping coefficients that remain constant for the ray ${l}_{us}$ and can therefore be calibrated directly, without determining the system geometry explicitly.

## 4. Experimental results

We set up an SLF-3DI system for experiments, consisting of a plenoptic camera (Lytro 1.0) with 11-Megaray (about $1.1\times {10}^{7}$ rays) resolution and a DLP projector (Dell M110) with 800×1280 resolution. In our experiments, the 2D Lytro image was decoded into a 4D light field with a directional resolution of 11×11 and a spatial resolution of 378×379 [27]; this choice is a trade-off between the directional and spatial resolutions. The system setup is shown in Fig. 3, together with a displacement platform and a plane target.

The SLF-3DI system was calibrated first. The plane target was moved within a range of 100 mm in steps of 5 mm, reaching a total of 21 positions. We chose the position furthest from the system as the reference plane, so that 20 depth values ${d}_{i}=5i$ mm, $i=1,2,\cdots ,20$ relative to the reference plane were provided. At each position, the fringe patterns were projected onto the plane target, and the phase-shifting and phase-unwrapping techniques were applied to calculate the phase map. Correspondingly, 20 phase-difference maps $\Delta {\varphi}_{i},i=1,2,\cdots ,20$ relative to the reference plane were obtained. In the experiment, 10,852,041 rays were calibrated. For every ray ${l}_{us}$, there was an independent data set $\left({d}_{i|us},\Delta {\varphi}_{i|us}\right),i=1,2,\cdots ,20$ for fitting its mapping coefficients $\left({m}_{us},{n}_{us}\right)$. Figure 4 shows one set of measured data of the ray ${l}_{us},\left(u,s\right)=\left(6,6,300,300\right)$, marked by red points. We can observe that the mapping relationship between the depth and the phase difference is monotonic and approximately linear.
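The per-ray fit can be posed as linear least squares: writing the concise mapping in the form $d=\Delta \varphi /\left(m+n\Delta \varphi \right)$ (one plausible parameterization of the simplified mapping) and dividing through gives $\Delta \varphi /d=m+n\Delta \varphi $, which is linear in $\left(m,n\right)$. A pure-Python sketch on synthetic calibration data for one ray (the coefficient values are made up):

```python
# Assumed concise mapping d = dphi / (m + n*dphi); m_true, n_true are made up.
m_true, n_true = 2.0, 0.003

# Synthetic calibration set: 20 target positions, d_i = 5*i mm (as in the
# experiment), with phase differences from the inverse mapping.
depths = [5.0 * i for i in range(1, 21)]
dphis = [m_true * d / (1.0 - n_true * d) for d in depths]

# Linearize: dphi/d = m + n*dphi, then solve the 2x2 normal equations.
xs = dphis
ys = [p / d for p, d in zip(dphis, depths)]
N = len(xs)
Sx, Sy = sum(xs), sum(ys)
Sxx = sum(x * x for x in xs)
Sxy = sum(x * y for x, y in zip(xs, ys))
n_fit = (N * Sxy - Sx * Sy) / (N * Sxx - Sx * Sx)
m_fit = (Sy - n_fit * Sx) / N
```

Because the model is linear in the two unknowns after the change of variables, each of the roughly ten million rays can be fitted with a closed-form solve rather than an iterative optimization.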

To further analyze the mapping relationship, the derivative of the depth with respect to the phase difference is derived from Eq. (5) to be

$$\frac{\partial d}{\partial \left(\Delta \varphi \right)}=\frac{{m}_{us}}{{\left({m}_{us}+{n}_{us}\Delta \varphi \right)}^{2}}.\tag{6}$$

When ${n}_{us}\Delta \varphi \ll {m}_{us}$, the derivative is approximately the constant $1/{m}_{us}$, so the nonlinear mapping of Eq. (5) reduces to the linear model $d={k}_{us}\Delta \varphi $ with ${k}_{us}=1/{m}_{us}$.

The linear model is very simple, so that only one coefficient ${k}_{us}$ must be determined for each ray in the SLF. We compared the linear model with the nonlinear model within measured ranges of 100 mm and 40 mm, respectively. The corresponding fitted characteristic curves are also drawn in Fig. 4, and Table 1 lists the corresponding calibration results, including the fitted coefficients and the maximum (MAX) and root mean square (RMS) of the fitting error. It can be seen that the accuracy of the linear model is close to that of the nonlinear model within the small measured range, where the linear model can be adopted in view of its concise form. However, the nonlinear model outperforms the linear model within the large measured range. In our experiments, we therefore employed the nonlinear model within the measured range of 100 mm.
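The range dependence of the linear model can be reproduced numerically: fitting the one-coefficient model $d={k}_{us}\Delta \varphi $ to data generated from a nonlinear mapping gives a small residual over a narrow range and a noticeably larger one over a wide range. A pure-Python sketch (the mapping coefficients are made up, not the calibrated values of Table 1):

```python
import math

# Made-up nonlinear mapping coefficients, d = dphi / (m + n*dphi).
m, n = 2.0, 0.003

def rms_linear_fit_error(d_max):
    """Fit the linear model d = k*dphi over depths [5, d_max] mm and
    return the RMS fitting error in depth."""
    depths = list(range(5, int(d_max) + 1, 5))
    dphis = [m * d / (1.0 - n * d) for d in depths]
    # Least-squares slope through the origin: k = sum(d*dphi) / sum(dphi^2).
    k = sum(d * p for d, p in zip(depths, dphis)) / sum(p * p for p in dphis)
    residuals = [k * p - d for d, p in zip(depths, dphis)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

rms_small = rms_linear_fit_error(40)    # narrow range: small fitting error
rms_large = rms_linear_fit_error(100)   # wide range: noticeably larger error
```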

The calibrated system could then be applied to estimate the scene depth for 3D imaging. We chose a plaster model for the experiment, which was reconstructed from eight directions: $u=\left(2,2\right),\left(2,6\right),\left(2,10\right),\left(6,2\right),\left(6,10\right),\left(10,2\right),\left(10,6\right),\left(10,10\right)$, as shown in Fig. 5. The central subfigure is a 2D image, and the upper-left drawing beside each of the other subfigures represents a microlens with 11×11 directional pixels, in which the red pixel denotes a specific ray direction. The reconstruction results lost some details due to the low spatial resolution, and the reconstruction qualities differed slightly among directions because the direction-dependent reflectivity led to discrepant qualities of depth estimation. This suggests that the SLF-3DI method can deal with 3D imaging of highly and lowly reflective surfaces.

To demonstrate the validity of the proposed method for HDR 3D imaging, we set up a scene containing both a silver metal object with a highly reflective surface and a black board with a lowly reflective surface. A focused view of the scene, as would be captured by an ordinary camera, is shown in Fig. 6(a): the area of the silver metal was overexposed, while that of the black board was underexposed. We calculated the modulation intensity of the focused view, as shown in Fig. 6(c); both the overexposed and underexposed areas led to low modulation intensity. By contrast, the SLF included multidirectional modulation intensities, as shown in Fig. 6(b). The two heat-maps on the right side are detailed segments covered by two microlenses respectively, which indicate that the modulation intensity varied with the reflected direction. So in an SLF, the best direction with the maximum modulation intensity can be selected for high-quality 3D imaging. Figure 6(d) shows the maximum modulation-intensity image in the SLF, which presents higher values overall than Fig. 6(c), as illustrated by their statistical histograms in Fig. 6(e) and 6(f), respectively. Applying the same threshold value (labeled by red arrows), two modulation masks were generated, as shown in Fig. 6(g) and 6(h), respectively. In the focused view, the areas with low modulation intensity (the overexposed and underexposed areas) were removed, whereas the same areas in the SLF were retained.
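The selection step can be sketched as follows: for each spatial pixel, take the direction with the maximum modulation intensity, keep the corresponding depth value, and mask out pixels whose best modulation still falls below the threshold. A toy pure-Python example with two directions and three pixels (all values made up):

```python
# modulation[k][j]: modulation intensity of direction k at pixel j;
# depth[k][j]: depth estimated from direction k at pixel j (made-up values).
modulation = [
    [0.9, 0.1, 0.02],   # direction u1: reliable on pixel 0, poor elsewhere
    [0.1, 0.8, 0.03],   # direction u2: reliable on pixel 1
]
depth = [
    [10.0, 99.0, 99.0],
    [99.0, 20.0, 99.0],
]
threshold = 0.05

fused_depth, mask = [], []
for j in range(len(modulation[0])):
    # Best direction for this pixel = the one with maximum modulation.
    best = max(range(len(modulation)), key=lambda k: modulation[k][j])
    fused_depth.append(depth[best][j])
    mask.append(modulation[best][j] >= threshold)  # keep only reliable pixels

# fused_depth -> [10.0, 20.0, 99.0]; mask -> [True, True, False]
```

Pixels 0 and 1 take their depths from different directions, while pixel 2 remains masked because no direction yields sufficient modulation, mirroring the mask construction of Fig. 6(g) and 6(h).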

The modulation intensity is closely related to the reflected direction and has a significant impact on the quality of 3D imaging. We selected two directions, ${u}_{1}=\left(2,6\right)$ and ${u}_{2}=\left(10,6\right)$, for experiments. From these two directions, the scene was reconstructed using the calibrated coefficients; the resulting depth maps are shown in Fig. 7(a) and 7(b). The upper-left drawing in each subfigure represents a microlens with 11×11 directional pixels, and the red pixel denotes a specific ray direction. In the ${u}_{1}$ direction labeled in Fig. 7(a), some areas of the silver metal could not be reconstructed, while most areas of the black board were reconstructed; in the ${u}_{2}$ direction labeled in Fig. 7(b), the situation was reversed. Next, we selected the best direction in terms of the maximum modulation intensity to reconstruct the scene; the resulting depth map and the corresponding 3D view are shown in Fig. 7(c) and 7(d) respectively. Compared with the single-direction 3D imaging, not only could the whole scene be reconstructed, but the 3D imaging quality was also significantly better. This demonstrates that the SLF-3DI method is valid for performing high-quality 3D imaging of highly and lowly reflective surfaces (Visualization 1).

The plenoptic camera used in our experiments adopts spatial multiplexing to record the light field, which entails a trade-off between the spatial and directional resolutions. The trade-off also appears in the SLF-3DI method: the depth can be estimated from different directions for high-quality 3D imaging, but the spatial resolution of the 3D imaging is reduced. The more directions, the higher the quality, but the lower the spatial resolution. Low spatial resolution may cause loss of detail in a reconstructed 3D scene; however, it does not compromise the final HDR 3D imaging.

## 5. Conclusion

We successfully developed a novel SLF-3DI method with multidirectional depth estimation and selection suitable for HDR 3D imaging. We implemented a flexible ray-based phase-depth mapping calibration in the experiments to achieve high-quality 3D imaging, which was robust to the reflection property of the object surface. However, the SLF-3DI method reduces the spatial resolution of 3D imaging because of the trade-off between the spatial and directional sampling of the plenoptic camera.

## Appendix

The following is the derivation of the phase-depth mapping from Eq. (3). The key is to work out ${X}_{\text{A}}$ and ${X}_{\text{E}}$. With the point coordinates $\text{B}=\left({X}_{\text{B}},{Z}_{\text{B}}\right)$ and $\text{C}=\left({X}_{\text{C}},{Z}_{\text{C}}\right)$, the equation of the line $\overline{\text{BC}}$ can be represented as

$$\frac{X-{X}_{\text{B}}}{{X}_{\text{C}}-{X}_{\text{B}}}=\frac{Z-{Z}_{\text{B}}}{{Z}_{\text{C}}-{Z}_{\text{B}}}.\tag{10}$$

Because the point A lies on both the line $\overline{\text{BC}}$ and the reference plane $Z=0$, we can obtain its *X*-coordinate from Eq. (10) that

$${X}_{\text{A}}={X}_{\text{B}}-{Z}_{\text{B}}\frac{{X}_{\text{C}}-{X}_{\text{B}}}{{Z}_{\text{C}}-{Z}_{\text{B}}}.\tag{11}$$

Similarly, with the point coordinates $\text{P}=\left({X}_{\text{P}},{Z}_{\text{P}}\right)$ and $\text{D}=\left({X}_{\text{D}},d\right)$, the equation of the line $\overline{\text{PD}}$ can be represented as

$$\frac{X-{X}_{\text{P}}}{{X}_{\text{D}}-{X}_{\text{P}}}=\frac{Z-{Z}_{\text{P}}}{d-{Z}_{\text{P}}}.\tag{12}$$

Because the point E lies on both the line $\overline{\text{PD}}$ and the reference plane $Z=0$, we can obtain its *X*-coordinate from Eq. (12) that

$${X}_{\text{E}}={X}_{\text{P}}-{Z}_{\text{P}}\frac{{X}_{\text{D}}-{X}_{\text{P}}}{d-{Z}_{\text{P}}}.\tag{13}$$

In addition, because the point D is also on $\overline{\text{BC}}$ at depth $Z=d$, its *X*-coordinate can be calculated from Eq. (10) that

$${X}_{\text{D}}={X}_{\text{B}}+\left(d-{Z}_{\text{B}}\right)\frac{{X}_{\text{C}}-{X}_{\text{B}}}{{Z}_{\text{C}}-{Z}_{\text{B}}}.\tag{14}$$

Substituting Eqs. (11), (13) and (14) into Eq. (3), the depth can then be deduced to be

$$d=\frac{{Z}_{\text{P}}\Delta \varphi }{\Delta \varphi -2\pi f\left[{X}_{\text{P}}-{X}_{\text{B}}-\left({Z}_{\text{P}}-{Z}_{\text{B}}\right)\frac{{X}_{\text{C}}-{X}_{\text{B}}}{{Z}_{\text{C}}-{Z}_{\text{B}}}\right]},\tag{15}$$

which is the phase-depth mapping of Eq. (4).

## Funding

National Natural Science Foundation of China (NSFC) (61377017, 61405122, 61201355); Sino-German Center for Research Promotion (SGCRP) (GZ 760); Scientific and Technological Project of the Shenzhen Government (JCYJ20140828163633999, JCYJ20140509172709158).

## References and links

**1. **S. S. Gorthi and P. Rastogi, “Fringe projection techniques: Whither we are?” Opt. Lasers Eng. **48**(2), 133–140 (2010). [CrossRef]

**2. **X. Su and W. Chen, “Reliability-guided phase unwrapping algorithm: a review,” Opt. Lasers Eng. **42**(3), 245–261 (2004). [CrossRef]

**3. **G. H. Liu, X. Y. Liu, and Q. Y. Feng, “3D shape measurement of objects with high dynamic range of surface reflectivity,” Appl. Opt. **50**(23), 4557–4565 (2011). [CrossRef] [PubMed]

**4. **R. Kowarschik, P. Kuhmstedt, J. Gerber, W. Schreiber, and G. Notni, “Adaptive optical three-dimensional measurement with structured light,” Opt. Eng. **39**(1), 150–158 (2000). [CrossRef]

**5. **S. Zhang and S. Yau, “High dynamic range scanning technique,” Opt. Eng. **48**(3), 033604 (2009).

**6. **S. Fang, Y. Zhang, Q. Chen, C. Zuo, R. Li, and G. Shen, “General solution for high dynamic range three-dimensional shape measurement using the fringe projection technique,” Opt. Lasers Eng. **59**, 56–71 (2014). [CrossRef]

**7. **C. Waddington and J. Kofman, “Saturation avoidance by adaptive fringe projection in phase-shifting 3D surface-shape measurement,” in 2010 International Symposium on Optomechatronic Technologies, (IEEE, 2010), pp. 1–4.

**8. **H. Jiang, H. Zhao, and X. Li, “High dynamic range fringe acquisition: A novel 3-D scanning technique for high-reflective surfaces,” Opt. Lasers Eng. **50**(10), 1484–1493 (2012). [CrossRef]

**9. **L. Wolff, “Using polarization to separate reflection components,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 1989), pp. 363–369. [CrossRef]

**10. **S. Umeyama and G. Godin, “Separation of diffuse and specular components of surface reflection by use of polarization and statistical analysis of images,” IEEE Trans. Pattern Anal. Mach. Intell. **26**(5), 639–647 (2004). [CrossRef] [PubMed]

**11. **Q. Hu, K. G. Harding, X. Du, and D. Hamilton, “Shiny parts measurement using color separation,” Proc. SPIE **6000**, 60000D (2005). [CrossRef]

**12. **M. Levoy and P. Hanrahan, “Light field rendering,” in Proceedings of ACM SIGGRAPH, (1996), pp. 31–42.

**13. **A. Orth and K. B. Crozier, “Light field moment imaging,” Opt. Lett. **38**(15), 2666–2668 (2013). [CrossRef] [PubMed]

**14. **C. Hahne, A. Aggoun, S. Haxha, V. Velisavljevic, and J. C. J. Fernández, “Light field geometry of a standard plenoptic camera,” Opt. Express **22**(22), 26659–26673 (2014). [CrossRef] [PubMed]

**15. **E. Y. Lam, “Computational photography with plenoptic camera and light field capture: tutorial,” J. Opt. Soc. Am. A **32**(11), 2021–2032 (2015). [CrossRef] [PubMed]

**16. **J. Liu, T. Xu, W. Yue, J. Sun, and G. Situ, “Light-field moment microscopy with noise reduction,” Opt. Express **23**(22), 29154–29162 (2015). [CrossRef] [PubMed]

**17. **X. Lin, J. Wu, G. Zheng, and Q. Dai, “Camera array based light field microscopy,” Biomed. Opt. Express **6**(9), 3179–3189 (2015). [CrossRef] [PubMed]

**18. **K. Atanassov, S. Goma, V. Ramachandra, and T. Georgiev, “Content-based depth estimation in focused plenoptic camera,” Proc. SPIE **7864**, 78640G (2011). [CrossRef]

**19. **T. E. Bishop and P. Favaro, “The light field camera: extended depth of field, aliasing, and superresolution,” IEEE Trans. Pattern Anal. Mach. Intell. **34**(5), 972–986 (2012). [CrossRef] [PubMed]

**20. **S. Wanner and B. Goldluecke, “Globally consistent depth labeling of 4D light fields,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2012), pp. 41–48. [CrossRef]

**21. **C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. Gross, “Scene Reconstruction from high spatio-angular resolution light fields,” ACM Trans. Graph. **32**(4), 73 (2013). [CrossRef]

**22. **C. Frese and I. Gheta, “Robust depth estimation by fusion of stereo and focus series acquired with a camera array,” in Proceedings of IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (IEEE, 2006), pp. 243–248. [CrossRef]

**23. **M. W. Tao, S. Hadap, J. Malik, and R. Ramamoorthi, “Depth from combining defocus and correspondence using light-field cameras,” in Proceedings of IEEE International Conference on Computer Vision (IEEE, 2013), pp. 673–680. [CrossRef]

**24. **R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan, “Light field photography with a hand-held plenoptic camera,” Stanford Technical Report CTSR (2005), pp. 1–11.

**25. **V. Srinivasan, H. C. Liu, and M. Halioua, “Automated phase-measuring profilometry of 3-D diffuse objects,” Appl. Opt. **23**(18), 3105–3108 (1984). [CrossRef] [PubMed]

**26. **X. Peng, Z. Yang, and H. Niu, “Multi-resolution reconstruction of 3-D image with modified temporal unwrapping algorithm,” Opt. Commun. **224**(1–3), 35–44 (2003). [CrossRef]

**27. **D. G. Dansereau, O. Pizarro, and S. B. Williams, “Decoding, calibration and rectification for lenselet-based plenoptic cameras,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2013), pp. 1027–1034.