## Abstract

Typical single-pixel imaging techniques inherently require a large number of measurements to reconstruct a high-quality, high-resolution image. Three-dimensional (3-D) single-pixel imaging with both high sampling efficiency and high depth accuracy remains a challenge. We implement fringe projection virtually by exploiting Helmholtz reciprocity. Depth information is modulated into a deformed fringe pattern whose Fourier spectrum is sampled by using sinusoidal intensity pattern illumination and single-pixel detection. The fringe pattern has a highly focused first-order component in its Fourier spectrum, which allows us to efficiently acquire the depth information from far fewer measurements than there are illumination pattern pixels. The 3-D information is retrieved through Fourier analysis. We experimentally obtained a 3-D reconstruction of a complex object with $599\times 599$ effective pixels, achieving a measurement-to-pixel ratio of 5.78%. The depth accuracy is evaluated at the sub-millimetric level by using a test object.

© 2016 Optical Society of America

Single-pixel imaging techniques allow spatially resolved images to be reconstructed with a detector that has no spatial resolution (a photodiode, for example). They provide an approach for imaging at wavelengths where pixelated silicon detectors are blind. Hence, single-pixel imaging techniques have attracted considerable attention in recent years [1–13].

Two-dimensional (2-D) imaging commonly refers to a process of intensity information acquisition. To acquire the intensity information from an object, a single-pixel detector performs temporal measurements, either by point-by-point (raster) scanning over the object or by measuring the inner products between a sequence of illumination patterns and the object. The number of measurements required equals the number of pixels in the illumination patterns. If differential measurement is employed to enhance the signal-to-noise ratio (SNR), the number of measurements is doubled. Thus, the inherent limitation of single-pixel imaging techniques is that high-resolution, high-quality reconstruction requires a large number of measurements relative to the number of effective image pixels. Here, the number of effective image pixels means the spatial resolution of the reconstructed image, which is determined by the resolution of the illumination patterns. We use the ratio of the number of temporal measurements to the number of effective pixels, termed the measurement-to-pixel ratio (MPR), to quantify the sampling efficiency of single-pixel imaging techniques. There are two typical solutions to reduce the MPR. One is to use compressive sampling [4], which attempts to recover more effective image pixels from fewer measurements by computational means, at the expense of computation time. The other is to use multiplexing techniques, such as polarization multiplexing [8] and wavelength multiplexing [9], which perform sampling in multiple channels simultaneously, at the expense of additional devices.
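As a small worked illustration of the MPR definition, the following sketch computes the ratio for a $599\times 599$ image. The function name `measurement_to_pixel_ratio` is ours, introduced only for illustration; the measurement counts are those quoted in this Letter.

```python
def measurement_to_pixel_ratio(n_measurements: int, width: int, height: int) -> float:
    """Return the number of temporal measurements divided by the effective pixels."""
    return n_measurements / (width * height)


# Full differential sampling of a 599 x 599 image uses two measurements per pixel.
full_sampling = measurement_to_pixel_ratio(2 * 599 * 599, 599, 599)

# The complex-object experiment reported in this Letter used 20,740 measurements.
narrow_sampling = measurement_to_pixel_ratio(20740, 599, 599)

print(f"full sampling MPR:   {full_sampling:.2%}")
print(f"narrow sampling MPR: {narrow_sampling:.2%}")
```

An MPR above 100% means that more measurements were taken than there are effective pixels, which is the case for full differential sampling.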

Three-dimensional (3-D) imaging commonly refers to a process of depth information (or height distribution) acquisition. In the context of 3-D imaging, achievable depth accuracy is an important factor in quantifying the quality of a 3-D reconstruction. Depth accuracy relies on the scheme of depth information modulation and demodulation. Howland *et al.* [10] and Sun *et al.* [11] reported 3-D single-pixel imaging systems based on the time-of-flight scheme. Yu *et al.* [12] and Salvador-Balaguer *et al.* [13] proposed approaches based on binocular vision. Sun *et al.* [5] presented an approach based on the shape-from-shading scheme. They used four single-pixel detectors to reconstruct four 2-D images at the same time, and each high-quality 2-D image was reconstructed at the expense of ${10}^{6}$ measurements. In short, it remains a challenge to achieve 3-D single-pixel imaging with high depth accuracy at a low MPR.

Here, we demonstrate a single-pixel imaging technique that can well address the challenge by adapting the conventional cross-optical axis fringe projection geometry into a single-pixel imaging scheme and exploiting Helmholtz reciprocity [14]. Our experimental configuration is shown in Fig. 1(a). It consisted of an illumination party, a detection party, and a target object. The illumination party and the detection party were in a cross-optical axis geometry. The normal of the reference plane was parallel to the optical axis of the illumination party. In the illumination party, a commercial digital projector (Toshiba Tp-T90) projected a sequence of 2-D sinusoidal intensity patterns onto the target object surface. The projector operated in the 24-bit mode and switched patterns every 0.2 s. In the detection party, a photodiode (HAMAMATSU S1227-1010BR) was used as a single-pixel detector. It was driven by an amplifier circuit and collected the intensity of the reflected light field through a grating. The grating ($3\,\mathrm{line}\cdot\mathrm{mm}^{-1}$) was inserted in the optical path of the detection party, between the object and the detector. The reflected light field was modulated by the grating before falling onto the single-pixel detector. The lines of the grating were normal to the plane of the figure. Lens 1 with focal length ${f}_{1}$ formed an image of the object onto the grating plane. The parameters ${L}_{1}$, ${L}_{2}$, and ${f}_{1}$ are subject to the thin-lens equation. Lens 2 with focal length ${f}_{2}$ was a collecting lens with which the single-pixel detector could only detect the light through the grating. The responses of the detector were collected by the data acquisition panel (National Instruments USB-6343). The target object was placed on the reference plane. The optical axes of both parties intersected on the reference plane at an angle of $\theta =17^{\circ}$.

The experimental configuration was derived from the configuration shown in Fig. 1(b), which is a typical setup used in conventional fringe projection profilometry [15–17]. The reconfiguration relies on Helmholtz reciprocity. The source without spatial resolution (i.e., the bulb) was replaced by a detector without spatial resolution (i.e., the photodiode). The spatially resolving detector (i.e., the pixelated camera) was replaced by a spatially resolved source (i.e., the projector). As such, we are able to reconstruct an image by using a single-pixel detector with the experimental configuration. More importantly, if we captured an image with the reciprocal configuration by using the pixelated camera, the reconstructed image and the captured image would be exactly the same.

As our previous work demonstrates, the object image can be reconstructed by using 2-D sinusoidal intensity patterns for illumination and a single-pixel detector for collecting the intensity of the reflected light field [6]. Each sinusoidal intensity pattern $P$ is an equivalent Fourier basis pattern. Therefore, the inner product between the pattern and the object image is an equivalent real Fourier coefficient. Each sinusoidal intensity pattern is characterized by its spatial frequency pair $({f}_{x},{f}_{y})$ and its initial phase $\varphi $: ${P}_{\varphi}(x,y)=a+b\cdot\cos(2\pi {f}_{x}x+2\pi {f}_{y}y+\varphi )$, where $(x,y)$ is the 2-D Cartesian coordinate on the reference plane, $a$ is the average intensity of the pattern, and $b$ is the contrast. The intensity of the reflected light field is the inner product between the object image $R$ and the illumination pattern $P$. By illuminating the object with four patterns that have the same spatial frequency $({f}_{x},{f}_{y})$ but four different phase shifts ($\varphi =0$, $\pi /2$, $\pi $, and $3\pi /2\,\mathrm{rad}$), each complex Fourier coefficient $\tilde{I}$ is obtained from the four corresponding responses by means of differential measurement [6], $\tilde{I}=({D}_{0}-{D}_{\pi})+j\cdot({D}_{\pi /2}-{D}_{3\pi /2})$, where $j$ denotes the imaginary unit and ${D}_{\varphi}$ is the detector's response corresponding to illumination pattern ${P}_{\varphi}$. The object image $I$ is obtained by applying a 2-D inverse discrete Fourier transform to $\tilde{I}$.
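The four-step acquisition described above can be simulated numerically. The following is a minimal sketch under stated assumptions, not our acquisition software: the scene is a small random array standing in for the object image, the detector is modeled as an ideal noiseless sum, and the function name `acquire_spectrum` and the values of `a` and `b` are illustrative choices. For simplicity it samples every coefficient with four patterns and does not exploit the conjugate symmetry of the spectrum of a real-valued image.

```python
import numpy as np


def acquire_spectrum(scene, a=0.5, b=0.5):
    """Simulate four-step phase-shifting Fourier single-pixel acquisition."""
    ny, nx = scene.shape
    y, x = np.mgrid[0:ny, 0:nx]
    spectrum = np.zeros((ny, nx), dtype=complex)
    phases = (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)
    for v in range(ny):
        for u in range(nx):
            fx, fy = u / nx, v / ny  # spatial frequency in cycles per pixel
            responses = []
            for phi in phases:
                pattern = a + b * np.cos(2 * np.pi * (fx * x + fy * y) + phi)
                # Ideal single-pixel detector: inner product of scene and pattern.
                responses.append(np.sum(scene * pattern))
            d0, d90, d180, d270 = responses
            # Differential measurement cancels the constant term a.
            spectrum[v, u] = (d0 - d180) + 1j * (d90 - d270)
    return spectrum


rng = np.random.default_rng(0)
scene = rng.random((8, 8))           # stand-in for the object image
b = 0.5
spectrum = acquire_spectrum(scene, a=0.5, b=b)

# Each coefficient equals 2b times the DFT of the scene, so dividing the
# inverse transform by 2b returns the scene.
recovered = np.fft.ifft2(spectrum).real / (2 * b)
print(np.allclose(recovered, scene))
```

In this noiseless model the differential combination equals $2b$ times the discrete Fourier transform of the scene, which is why the inverse transform recovers the image up to that known scale factor.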

In our experiment, the resolution of the illumination patterns was $599\times 599$ pixels and, therefore, the Fourier spectrum to be acquired was represented by (or discretized into) $599\times 599$ complex Fourier coefficients. The discrete Fourier spectrum shown in Fig. 2(a) was fully sampled from 717,602 ($599\times 599\times 2$) measurements. This number is twice the number of effective image pixels. Such a differential measurement strategy has proven beneficial for improving the SNR [6]. The reconstructed image, shown in Fig. 2(b), has $599\times 599$ effective pixels. An apparent fringe pattern can be observed on the reconstructed image. The fringe pattern is deformed according to the height distribution of the target object surface. By reciprocity, the target object in the reconstructed image is viewed from the perspective of the digital projector, and the shading profile is set by the location of the detection party. Therefore, the fringe pattern is essentially the image of the grating “projected” onto the object surface through Lens 1. This fringe pattern cannot be observed in reality with our experimental configuration, but it is exactly the same as the real fringe pattern that can be observed in the reciprocal configuration. Therefore, we term such a strategy “virtual fringe projection.”

The intensity distribution of the deformed fringe pattern is expressed as $I(x,y)=A+B(x,y)\cos[2\pi {f}_{0}y+\Delta\phi (x,y)]$, where $I$ denotes the intensity distribution of the fringe pattern, $A$ is the mean intensity, $B$ is the contrast, which depends on the object reflectance $R$, ${f}_{0}$ is the carrier frequency determined by both the utilized grating and the magnification factor of the lens system in the detection party, and $\Delta\phi $ is the modulated phase. Importantly, the fringe phase $\Delta\phi (x,y)$ varies with the object surface height distribution $h(x,y)$ according to the triangulation principle [15,16], which allows the retrieval of depth information through fringe phase estimation. With the virtual fringe projection strategy, we modulate the depth information into a fringe phase. The fringe phase is represented in the form of the intensity distribution of a 2-D image. The image can be reconstructed by using a single-pixel detector, and the fringe phase can be estimated by employing algorithms such as the Fourier transform method [15] and the wavelet transform ridge method [17]. Thus, the depth information can ultimately be retrieved via phase-to-height conversion.
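To illustrate how the fringe phase can be estimated with the Fourier transform method [15], the following sketch builds a synthetic deformed fringe of the form given above and recovers its phase. All numerical values are assumptions chosen for illustration (a $128\times 128$ grid, a carrier of 16 cycles across the image, a Gaussian phase bump, and a circular window of radius 0.06 cycles per pixel); none of them are experimental parameters.

```python
import numpy as np

n = 128
f0 = 16 / n                                   # assumed carrier frequency, cycles/pixel
y, x = np.mgrid[0:n, 0:n]

# Assumed smooth phase modulation (stays below pi, so no unwrapping is needed).
true_phase = 2.0 * np.exp(-((x - 64) ** 2 + (y - 64) ** 2) / (2 * 20.0 ** 2))
fringe = 0.5 + 0.4 * np.cos(2 * np.pi * f0 * y + true_phase)

# Step 1: Fourier transform of the fringe image.
spectrum = np.fft.fft2(fringe)

# Step 2: keep only a window around the first-order component at (fx, fy) = (0, f0).
fy = np.fft.fftfreq(n)[:, None]
fx = np.fft.fftfreq(n)[None, :]
window = (fx ** 2 + (fy - f0) ** 2) < 0.06 ** 2

# Step 3: the inverse transform yields a complex fringe; remove the carrier
# and take the argument to obtain the wrapped phase.
complex_fringe = np.fft.ifft2(spectrum * window)
wrapped_phase = np.angle(complex_fringe * np.exp(-2j * np.pi * f0 * y))

max_error = np.max(np.abs(wrapped_phase - true_phase))
print(f"maximum phase error: {max_error:.4f} rad")
```

The window plays the same role as sampling only the neighborhood of the first-order component: the zero-order term and the conjugate term lie outside it and do not contribute to the recovered phase.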

The reconstructed 2-D image shown in Fig. 2(b) is clear enough for the succeeding 3-D reconstruction, but it consumes too many measurements ($\mathrm{MPR}=200\%$). Fortunately, the sparsity of the depth information in the Fourier space allows us to reduce the number of measurements. As Fig. 2(a) shows, the deformed fringe pattern gives a highly focused representation in the Fourier space. Most of the energy is concentrated in three components, namely, the zero-order component ${\tilde{I}}_{0}$, the first-order component ${\tilde{I}}_{+1}$, and the conjugate of the first-order component ${\tilde{I}}_{-1}$. If we preserve only the zero-order component in the spectrum [Fig. 2(c)], the inverse transform [Fig. 2(d)] is the reflectance pattern, with the deformed fringe pattern muted. Similarly, if we preserve only the first-order component in the spectrum [Fig. 2(e)], the inverse transform [Fig. 2(f)] is the deformed fringe pattern, with the reflectance pattern muted. This demonstrates that the zero-order component is mainly contributed by the reflectance pattern, while the first-order component is mainly contributed by the deformed fringe pattern. Because the depth information is highly focused at the first-order component in the Fourier space, we propose to narrowly sample the Fourier spectrum to acquire the desired first-order component with a reduced number of measurements.

We experimentally demonstrate the proposed technique with the following four stages. The first stage was to determine the carrier frequency ${f}_{0}$, the center of the first-order component. We placed a flat plane on the reference plane, fully sampled the Fourier spectrum, and found the carrier frequency ${f}_{0}=0.131\,\mathrm{pixel}^{-1}$ by looking for the spatial frequency with the maximum modulus. Such a prior allows us to know the location of the first-order component in the Fourier space and to acquire it by selecting patterns whose spatial frequency is around the carrier frequency for illumination. As the carrier frequency is fixed for a given experimental configuration, such a process needs to be performed once only.
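The peak search of the first stage can be sketched as follows. This is an illustrative sketch rather than the code used in the experiment: the function name `estimate_carrier`, the exclusion radius `dc_radius` around the zero-order component, and the synthetic flat-plane fringe (200 pixels, 26 cycles) are all assumptions. The search is restricted to one half-plane so that the conjugate component is not picked up.

```python
import numpy as np


def estimate_carrier(fringe, dc_radius=0.02):
    """Return (fx, fy) in cycles/pixel of the strongest non-DC spectral peak."""
    ny, nx = fringe.shape
    fy = np.fft.fftfreq(ny)[:, None]
    fx = np.fft.fftfreq(nx)[None, :]
    modulus = np.abs(np.fft.fft2(fringe))
    # Suppress the zero-order component around the origin.
    modulus[np.hypot(fx, fy) < dc_radius] = 0.0
    # Drop the conjugate half-plane so only the +1 order remains.
    modulus[(fy < 0) | ((fy == 0) & (fx < 0))] = 0.0
    iy, ix = np.unravel_index(np.argmax(modulus), modulus.shape)
    return float(fx[0, ix]), float(fy[iy, 0])


# Synthetic fringe on a flat plane: no phase modulation, carrier along y.
n = 200
y, x = np.mgrid[0:n, 0:n]
flat_fringe = 0.5 + 0.4 * np.cos(2 * np.pi * (26 / n) * y)

fx0, fy0 = estimate_carrier(flat_fringe)
print(fx0, fy0)
```

With real data the peak is located only to the nearest sampled frequency, so the resolution of the estimate is one frequency bin, i.e., the reciprocal of the pattern size in pixels.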

The second stage was to perform a calibration to obtain the conversion factor between phase and height. For our experimental configuration, the modulated phase is proportional to the object surface height. A white rubber triangular prism shown in Fig. 3(a) was used as a standard object for the calibration. The conversion factor obtained was $1.1407\,\mathrm{rad}\cdot\mathrm{mm}^{-1}$. The calibration process also needs to be performed once only for a given configuration.
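Given the proportionality between phase and height, the conversion itself is a single division once the phase has been unwrapped. The sketch below uses the calibrated factor quoted above; everything else is an assumption for illustration, including the function name `phase_to_height`, the synthetic 10 mm ramp, and the use of NumPy's one-dimensional `np.unwrap`, which is only adequate for clean profiles whose phase changes by less than $\pi$ between neighboring samples.

```python
import numpy as np

PHASE_PER_MM = 1.1407  # rad/mm, conversion factor obtained in the calibration


def phase_to_height(wrapped_phase, axis=-1):
    """Unwrap a phase profile along one axis and convert it to height in mm."""
    unwrapped = np.unwrap(wrapped_phase, axis=axis)
    return unwrapped / PHASE_PER_MM


# Synthetic check: a ramp rising from 0 mm to 10 mm (assumed test profile).
height_mm = np.linspace(0.0, 10.0, 200)
phase = PHASE_PER_MM * height_mm
wrapped = np.angle(np.exp(1j * phase))        # wrap into (-pi, pi]

recovered_mm = phase_to_height(wrapped)
print(np.allclose(recovered_mm, height_mm))
```

The recovered height is relative to the first sample of the profile, which here lies on the reference plane.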

The third stage was to quantify the achievable depth accuracy of our technique by using a test object. We used the wooden hemisphere shown in Fig. 3(a) as the test object. The hemisphere has a radius of 25 mm. We illuminated the test object by using a sequence of $599\times 599$-pixel sinusoidal intensity patterns. As the object’s surface height distribution is rather smooth and without rapid variations, the resultant first-order component is highly focused. We sampled the first-order component from 13,316 measurements ($\mathrm{MPR}=3.71\%$) around the carrier frequency along a round spiral path. The reconstructed 2-D deformed fringe image is shown in Fig. 3(b). We used the Fourier transform method [15] for fringe phase estimation. The 2-D phase map shown in Fig. 3(c) is wrapped into $[-\pi ,\pi ]$. The final 3-D reconstruction [Fig. 3(d)] is derived by unwrapping the phase and converting the phase to height. Note that the height is relative, with respect to the reference plane. We compare the row that crosses the center of the hemisphere with the true values [Fig. 3(e)]. The height error [Fig. 3(f)] is less than 1 mm, except at the side locations, where the surface normals are so close to being perpendicular to the line of sight that the reflectance is rather weak. Therefore, we state that the proposed technique can achieve sub-millimetric depth accuracy.
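Since each complex Fourier coefficient takes four phase-shifted measurements, 13,316 measurements correspond to 3,329 coefficients. The sketch below shows one hypothetical way to choose that many coefficients around the carrier: it ranks the frequency bins by distance from the carrier bin, which visits them in growing rings and only approximates the round spiral path used in the experiment. The function name `coefficients_near_carrier` and the carrier bin (78, 0), obtained by rounding $0.131\times 599$ and assuming the carrier lies along $y$, are assumptions.

```python
import numpy as np


def coefficients_near_carrier(n_coeffs, carrier, shape):
    """Return (row, col) indices of the n_coeffs DFT bins closest to the carrier."""
    ny, nx = shape
    # Signed bin numbers (0, 1, ..., -2, -1) so that distances wrap correctly.
    ky = np.fft.fftfreq(ny, d=1.0 / ny)[:, None]
    kx = np.fft.fftfreq(nx, d=1.0 / nx)[None, :]
    distance = np.hypot(kx - carrier[1], ky - carrier[0])
    order = np.argsort(distance, axis=None, kind="stable")[:n_coeffs]
    rows, cols = np.unravel_index(order, shape)
    return np.column_stack((rows, cols))


n_measurements = 13316
n_steps = 4                                   # four phase shifts per coefficient
n_coeffs = n_measurements // n_steps          # 3329 coefficients

carrier_bin = (78, 0)                         # assumed: round(0.131 * 599) along y
selected = coefficients_near_carrier(n_coeffs, carrier_bin, (599, 599))
print(selected.shape, selected[0])
```

The selected bins form a disc of radius of roughly 33 bins around the carrier, so the zero-order component, which sits 78 bins away, is not sampled.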

The fourth stage was to demonstrate the efficiency of the proposed technique by using a target object with a complex surface. The target object was the white plaster head shown in Fig. 4(a), where the illumination area is marked by a red box. We followed the same procedure as in the third stage to acquire the first-order component [Fig. 4(b)]. The component was narrowly sampled from 20,740 measurements ($\mathrm{MPR}=5.78\%$). The estimated phase map is shown in Fig. 4(c), and the final 3-D reconstruction is shown in Fig. 4(d). No post-processing has been applied to the reconstruction. The number of measurements was far less than the number of effective image pixels. Detailed facial textures, especially the eyes and the lips, are well presented. A few abrupt changes off the face are unwrapping failures, probably caused by shadow or defocusing. The slightly jagged appearance of the reconstruction is probably caused by the gamma distortion (nonlinear intensity) of the commercial digital projector, which can be reduced by gamma correction [18].

The lateral accuracy of the proposed technique is fundamentally determined by the resolution of the illumination patterns, and the depth accuracy by the carrier frequency. A higher carrier frequency is equivalent to a shorter fringe wavelength and, therefore, potentially gives higher depth accuracy. If the carrier frequency is too low, the zero-order and the first-order components would overlap. The carrier frequency is mainly determined by the spatial frequency of the grating and the magnification of the lens system (i.e., Lens 1 in Fig. 1). In addition, advanced fringe phase estimation algorithms [17] and calibration strategies [19,20] might improve the accuracy.

The number of measurements is determined by the number of effective image pixels and the geometric complexity of the object surface (or how sparse the depth information is in Fourier space). Complex structures (e.g., sharp edges) on the object surface would broaden the first-order component, so that more measurements are required. In principle, the number of projected patterns is optimal when the first-order component is marginally sampled. When the first-order component is undersampled, more measurements can improve the reconstruction quality (see Visualization 1). When the first-order component is oversampled, the quality no longer improves with more measurements and will probably degrade if some undesirable components (e.g., the zero-order component) are sampled.

We acknowledge that the illumination rate in our experiment is too low for fast 3-D imaging, but this limitation might be overcome by using high-speed programmable digital micromirror devices (DMDs). In addition, the sinusoidal intensity illumination patterns can be flexibly generated by laser interference or by defocusing stripe patterns.

In summary, we propose a single-pixel imaging technique that allows high-quality 3-D reconstruction from far fewer measurements than effective image pixels. The measurement reduction is achieved by utilizing the sparsity of depth information in the Fourier space. We experimentally demonstrate that the proposed technique can achieve sub-millimetric depth accuracy and a low MPR for a complex 3-D object. The technique is potentially suitable for fast and accurate 3-D measurement over broad wavebands.

## Funding

National Natural Science Foundation of China (NSFC) (61475064).

## Acknowledgment

The authors thank Xiao Ma for making Visualization 1, Shiping Li for preparation of the experimental equipment, and Qinqiu Fang for linguistic assistance.

## REFERENCES

**1.** T. Pittman, Phys. Rev. A **52**, R3429 (1995).

**2.** S. Bennink, S. Bentley, and R. Boyd, Phys. Rev. Lett. **89**, 113601 (2002).

**3.** J. Shapiro, Phys. Rev. A **78**, 061802 (2008).

**4.** M. Duarte, M. Davenport, D. Takhar, J. Laska, T. Sun, K. Kelly, and R. Baraniuk, IEEE Signal Process. Mag. **25**(2), 83 (2008).

**5.** B. Sun, M. Edgar, R. Bowman, L. Vittert, S. Welsh, A. Bowman, and M. Padgett, Science **340**, 844 (2013).

**6.** Z. Zhang, X. Ma, and J. Zhong, Nat. Commun. **6**, 6225 (2015).

**7.** P. Sen, B. Chen, G. Garg, S. Marschner, M. Horowitz, M. Levoy, and H. Lensch, ACM Trans. Graph. **24**, 745 (2005).

**8.** S. Welsh, M. P. Edgar, R. Bowman, B. Sun, and M. J. Padgett, J. Opt. **17**, 025705 (2015).

**9.** M. P. Edgar, G. M. Gibson, R. Bowman, B. Sun, N. Radwell, K. J. Mitchell, S. Welsh, and M. J. Padgett, Sci. Rep. **5**, 10669 (2015).

**10.** G. Howland, D. Lum, M. Ware, and J. Howell, Opt. Express **21**, 23822 (2013).

**11.** M. Sun, M. Edgar, G. Gibson, B. Sun, N. Radwell, R. Lamb, and M. Padgett, “Single-pixel 3D imaging with time-based depth resolution,” (2016), http://arxiv.org/abs/1603.00726.

**12.** W. Yu, X. Yao, X. Liu, L. Li, and G. Zhai, Appl. Opt. **54**, 363 (2015).

**13.** E. Salvador-Balaguer, P. Clemente, E. Tajahuerce, F. Pla, and J. Lancis, J. Display Technol. **12**, 417 (2016).

**14.** M. Born and E. Wolf, *Principles of Optics* (Pergamon, 1959).

**15.** M. Takeda and K. Mutoh, Appl. Opt. **22**, 3977 (1983).

**16.** J. Geng, Adv. Opt. Photon. **3**, 128 (2011).

**17.** J. Zhong and J. Weng, Opt. Lett. **30**, 2560 (2005).

**18.** H. Guo, H. He, and M. Chen, Appl. Opt. **43**, 2906 (2004).

**19.** H. Du and Z. Wang, Opt. Lett. **32**, 2438 (2007).

**20.** M. Vo, Z. Wang, B. Pan, and T. Pan, Opt. Express **20**, 16926 (2012).