## Abstract

A system approach to acquire a three-dimensional object distribution is presented using a compact and cost efficient camera system with an engineered point spread function. The corresponding monocular setup incorporates a phase-only computer-generated hologram in combination with a conventional imaging objective in order to optically encode the axial information within a single two-dimensional image. The object’s depth map is calculated using a novel approach based on the power cepstrum of the image. The in-plane RGB image information is restored with an extended depth of focus by applying an adapted Wiener filter. The presented approach is tested experimentally by estimating the three-dimensional distribution of an extended passively illuminated scene.

© 2016 Optical Society of America

## 1. Introduction

The ability to acquire depth information in a single shot in addition to the conventional two-dimensional image of an object scene is of increased interest in modern applications for consumer electronics, bio-medical imaging, machine vision and automotive engineering. Depending on the particular application, optical system solutions rely on active or passive illumination. The former approach incorporates a tailored, artificial light source in addition to an image acquisition module to extract the depth information of an object. Existing technologies include structured light [1], time-of-flight (Lidar) [2] as well as interferometry [3]. Passive illumination methods purely rely on ambient light and thus generally benefit from reduced energy consumption and system complexity, as well as robustness with respect to stray light. Most common solutions are based on multi-aperture approaches, i.e. stereo setups. The major disadvantage of these setups is the necessity for multiple optical systems and image sensors that result in increased costs, higher complexity and the need for an elaborate calibration [4]. In contrast, conventional single aperture approaches based on depth from focus (DFF) or depth from defocus (DFD) extract depth information by searching for the in-focus state of the imaging system or by analyzing the axially dependent image blur, respectively [5]. These configurations provide less complexity, but commonly suffer from low axial precision or require multiple acquisitions [6]. An approach that enables combining the advantages of monocular and stereo systems is based on the integration of a diffraction grating in front of a single imaging configuration [7, 8]. However, the utilization of higher diffraction orders in order to generate a stereo pair results in a significant spectral dependence of the image disparity.
Accordingly, the method requires a quasi-monochromatic illumination or prior knowledge on the object spectrum in order to retrieve well-defined depth information. In the past decade, plenoptic cameras have been of increased interest due to their rather simple, cost efficient setup. However, the inherent loss in lateral object resolution due to the optical demagnification by the microlens array represents a severe drawback [9, 10].

An alternative method for acquiring three-dimensional object information utilizes temporally or spatially engineered point spread functions (PSFs). Temporal PSF engineering techniques exploit a tailored focus sweep to generate a depth dependent PSF distribution with an extended depth of focus, which requires complex and costly opto-mechanical components such as piezo-electric actuators or deformable lenses [11, 12]. Various spatially tailored PSFs have been proposed in order to enhance the depth discrimination capabilities of depth-from-defocus systems. In [13], adapted aperture masks are utilized to extract depth information, but severely reduce the light efficiency of an optical system. In order to overcome this constraint, complex segmented optical elements within the pupil can be employed to achieve an extended depth of focus, but only provide low depth discrimination [14]. Moreover, the respective PSF engineering approaches commonly require extensive computational effort due to the incorporated iterative error minimization methods [12–14].

A novel PSF engineering approach has been demonstrated by Piestun and coworkers, which utilizes a rotating double helix PSF [15, 16]. The corresponding system has been applied successfully in the area of microscopy, demonstrating an extended depth of focus and a high depth resolution for 3D single-molecule localization [17, 18]. Moreover, the general feasibility for broadband, passive cameras has been verified [19]. However, the applicability to commercial camera systems, e.g. in the area of consumer electronics or machine vision, is strongly limited. The necessity of multiple image acquisitions in order to retrieve the axial and lateral object information represents a major drawback of the system and restricts its application to (near) static object scenes. Additional minor drawbacks of these systems include the complex and costly setup, as well as the low light efficiency due to the incorporated spatial light modulator, which requires polarization filtering. A similar system based on four rotating PSF peaks has been developed by Niihara et al. [20], which enables single shot depth acquisition. However, in addition to the costly numerical reconstruction approach, the respective pupil elements are not optimized for an extended rotation range, which significantly limits the retrievable depth range.

Here, we present a closed system approach based on the combination of a compact cost efficient optical setup and customized image processing that enables obtaining three-dimensional, broadband (RGB) object information from a single image. In particular, we show how the image’s power cepstrum can be used to retrieve the axially dependent PSF parameters, which encode the object’s depth information, with low computational effort. Based on the obtained parameters, the lateral scene can be reconstructed by a tailored Wiener filter, which, in contrast to the filter proposed in [16, 19], does not require an additional reference frame.

Initially, the concept of the proposed image acquisition approach is presented and a simplified imaging model to describe the hybrid optical system is established. The work flow of the applied image processing steps, including the depth map retrieval and the object reconstruction, is subsequently described. In the final section, experimental results of a developed demonstration system are presented, which verify the applied system approach.

## 2. System approach

### 2.1. Imaging setup

The general image acquisition setup is schematically shown in Fig. 1. A passively illuminated, three-dimensional object is imaged on a conventional image sensor. The hybrid optical system is based on a conventional camera objective in combination with a computer-generated hologram (CGH). The thin holographic element is located directly at the objective’s aperture stop position in order to ensure a field independent transmission. The particular design of the CGH is based on the approach presented in [15]. An initial estimate is obtained analytically based on a tailored superposition of Gauss-Laguerre modes with respective indices (2,2), (4,6), (6,10), (8,14) and (10, 18). Subsequently, a phase-only element is retrieved by further iterative optimization. The element modifies the phase of the transmitted light, which results in characteristic spiral exit pupil phase distributions that are exemplarily shown in Fig. 2(a) for an in- and out-of-focus object point. The corresponding double-helix shaped PSF distribution features a depth dependent rotation with an extended depth of focus. When an extended object distribution is imaged, the depth dependence is encoded within the recorded two-dimensional image. By decoding this raw image, both the depth map and the lateral object information can be extracted.

In contrast to the spatial light modulator used in [19], the presented setup incorporates a thin glass substrate with a structured surface profile, which provides a more compact and robust system solution. The glass element can be used within a broader temperature range and does not require a polarization filter, which would decrease the light efficiency. The profile is generated in two steps utilizing cost efficient, state-of-the-art wafer level technology that enables the processing of multiple elements in a single iteration. Initially, a master sample is fabricated inside a photo resist layer using a novel grayscale, LED writing lithography system [21]. In particular, the utilized system provides a high accuracy, characterized by a lateral resolution below 1 *μm*, a low wave front error of manufactured CGHs, and a highly dynamic dosage control. In comparison to the system applied in [22], the increased lateral processing area (11×) and the improved positioning accuracy (2×) enable highly parallelized, more cost efficient manufacturing of the CGH master samples. Using reactive-ion-etching or mask imprinting technology, the obtained profile is subsequently transferred onto the targeted substrate, which is diced in order to obtain the final elements. Ultimately, they are directly implemented inside a commercial camera objective. Note that the optical parameters (e.g. focal length, F-number) can be tailored to particular application needs. The optical setup is similar to the coded aperture configuration proposed in [13], which incorporates an adapted aperture mask. However, the system’s light efficiency is significantly increased due to the utilization of a phase-only element. In addition, the more confined double-helix PSF distribution inherently provides a higher lateral resolution.

### 2.2. Image acquisition

The proposed setup is modeled as an incoherent imaging system, described by [19]

$$\mathbf{i} = \mathbf{o}(z) * \mathbf{h}(z) + \mathbf{n}, \tag{1}$$

where $\mathbf{i}$ is the discretely sampled coded image distribution, $\mathbf{o}(z)$ is the object’s discrete surface brightness, $\mathbf{h}(z)$ is the engineered point spread function and $\mathbf{n}$ describes an additive noise term. Note that $*$ denotes the discrete lateral convolution integral with the laterally shift-invariant and axially shift-variant PSF. In the following, the indices $(k, l)$ denote the pixel indexing within the discretely sampled, two-dimensional distributions. According to the design of the CGH, the axial dependence of the PSF can be described by a combination of a rotation and a lateral scaling of the double peak separation. At the same time, the engineered PSF inherently extends the system’s depth of focus by minimizing the spreading of the individual peaks within the axial range of interest. If we assume that the two peaks of the double-helix PSF are well confined with negligible side-lobes over the entire axial range of interest, the PSF can be approximated by

$$\mathbf{h}(z) \approx h_0 * \left[\delta^{+}(z) + \delta^{-}(z)\right], \tag{2}$$

where $h_0$ represents the nominal, shift-invariant distribution of a single PSF peak. The Delta-distributions $\delta^{\pm}(z)$ can be expressed by

$$\delta^{\pm}(z) = \delta\!\left(k \mp \frac{p(z)}{2}\cos\theta(z),\; l \mp \frac{p(z)}{2}\sin\theta(z)\right), \tag{3}$$

with a peak separation $p(z)$ and an azimuth orientation angle $\theta(z)$ that linearly depend on $z$. Accordingly, Eq. (1) can be rewritten as

$$\mathbf{i} = o_0(z) * \left[\delta^{+}(z) + \delta^{-}(z)\right] + \mathbf{n}, \tag{4}$$

with $o_0 = \mathbf{o} * h_0$. The coded image thus consists of two twin images of $o_0$, which are shifted according to their axial position. Note that $o_0$ describes the blurred object distribution, analogous to a conventional imaging system with an extended depth of focus.
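The twin-image model described above can be illustrated with a short numerical sketch. The function name `twin_image`, the rounding of the shifts to whole pixels, the circular boundary handling and the assignment of the cosine component to the first array index are assumptions made for this illustration only and are not taken from the presented system.

```python
import numpy as np

def twin_image(o0, p, theta):
    """Superpose two copies of o0 shifted by +/- p/2 along the direction theta (radians).

    Shifts are rounded to whole pixels and applied circularly (np.roll),
    which simplifies the lateral convolution with the double Delta-distribution.
    """
    dk = int(round(0.5 * p * np.cos(theta)))  # shift along the first index k
    dl = int(round(0.5 * p * np.sin(theta)))  # shift along the second index l
    plus = np.roll(o0, shift=(dk, dl), axis=(0, 1))
    minus = np.roll(o0, shift=(-dk, -dl), axis=(0, 1))
    return plus + minus

# A single bright pixel stands in for one blurred object point o0.
o0 = np.zeros((64, 64))
o0[32, 32] = 1.0
coded = twin_image(o0, p=14.0, theta=0.0)
peaks = np.argwhere(coded > 0)
print(peaks.tolist())  # [[25, 32], [39, 32]]: two peaks separated by p = 14 pixels
```

In this toy case, the object point is replaced by two peaks whose separation and orientation carry the depth information.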

### 2.3. Image processing

The work flow of the proposed image processing procedure, based on the previously described image acquisition approach, is schematically shown in Fig. 3. First, the depth map of the encoded image is retrieved as described in the following section. In the second step, the object distribution is reconstructed by applying the decoding approach explained in the subsequent section.

#### 2.3.1. Depth map retrieval

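The key to retrieving the depth distribution of the object from the raw image $\mathbf{i}$ is to determine the lateral distribution of the rotation angle $\theta_{kl}$ of the twin images. This is done by analyzing the object features in an $M \times N$ pixels large neighborhood of each image location $(k,l)$, which is valid under the assumption that the neighborhood corresponds to a part of the object distribution located at the same distance $z_{kl}$. Thus, a sliding window function is applied to $\mathbf{i}$, which results in the subimage distribution $\mathbf{I}_{kl}$ defined by

$$\mathbf{I}_{kl}(m,n) = \mathbf{i}(k+m,\, l+n), \qquad -M/2 \le m < M/2, \quad -N/2 \le n < N/2.$$

In order to reduce the numerical effort of the depth map retrieval, the subimage distribution may be sampled at a reduced rate given by the window size divided by a sampling factor $q$, which is typically on the order of 2 to 4. According to Eq. (4), each windowed subimage distribution $\mathbf{I}_{kl}$ is given by the convolution

$$\mathbf{I}_{kl} = \mathbf{O}_{kl} * H_0 * \mathbf{D}_{kl} + \mathbf{N}_{kl}, \tag{7}$$

where $\mathbf{O}_{kl}$ and $\mathbf{N}_{kl}$ denote the windowed subobject and noise distributions, respectively, and $\mathbf{D}_{kl} = \delta^{+} + \delta^{-}$ is the double Delta-distribution for the local parameters $\theta_{kl}$ and $p_{kl}$. The windowed nominal distribution of a single PSF peak is described by $H_0$. In our approach, we apply the cepstrum concept to extract the corresponding, discretely sampled PSF parameters $\theta_{kl}$ and $p_{kl}$. The concept originates from pitch detection in human speech [25, 26] and can also be utilized to detect motion blur or stereo correspondence in imaging applications [23, 24], which in fact represent similar image processing problems. The power cepstrum distribution of $\mathbf{I}_{kl}$ is defined by

$$\mathbf{C}_{kl} = \left|\mathcal{F}^{-1}\left\{\log\left|\mathcal{F}\{\mathbf{I}_{kl}\}\right|^2\right\}\right|^2 = \left|\mathcal{F}^{-1}\left\{\log\left|\mathcal{F}\{I_{0,kl}\}\right|^2\right\} + \mathcal{F}^{-1}\left\{\log\left|1 + \frac{\mathcal{F}\{\mathbf{N}_{kl}\}}{\mathcal{F}\{I_{0,kl}\}}\right|^2\right\}\right|^2, \tag{11}$$

where $\mathcal{F}$ denotes the discrete Fourier transformation and $I_{0,kl}$ denotes the subimage without noise. The obtained cepstrum is thus a superposition of the cepstrum of the encoded object distribution and a second contribution that depends on the spatial frequency content of the noise as well as the encoded object. The key property of the cepstrum calculation is that it maps a convolution into an addition. Thus, the first term in Eq. (11) can be written as

$$\mathcal{F}^{-1}\left\{\log\left|\mathcal{F}\{I_{0,kl}\}\right|^2\right\} = \mathcal{F}^{-1}\left\{\log\left|\mathcal{F}\{\mathbf{O}_{kl} * H_0\}\right|^2\right\} + \mathcal{F}^{-1}\left\{\log\left|\mathcal{F}\{\mathbf{D}_{kl}\}\right|^2\right\}.$$

The second contribution, i.e. the cepstrum of the double Delta-distribution, consists of a set of impulses that are characterized by an orientation $\theta_{kl}$ and a separation $2p_{kl}$ and thus directly provide the engineered PSF parameters.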
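The behavior of the power cepstrum for a twin image can be checked with a small synthetic example. The sketch below is an illustration under simplifying assumptions (noise-free random texture, circular shifts, no Hann window); the function name `power_cepstrum` and the regularization constant `eps` are hypothetical choices that are not part of the described method.

```python
import numpy as np

def power_cepstrum(sub, eps=1e-12):
    """Power cepstrum |F^-1{ log |F{sub}|^2 }|^2 with the origin shifted to the array center."""
    spectrum = np.abs(np.fft.fft2(sub)) ** 2
    ceps = np.abs(np.fft.ifft2(np.log(spectrum + eps))) ** 2
    return np.fft.fftshift(ceps)

rng = np.random.default_rng(0)
obj = rng.random((100, 100))      # synthetic broadband texture as a stand-in object
p = 20                            # twin-image separation in pixels
sub = np.roll(obj, p // 2, axis=1) + np.roll(obj, -(p // 2), axis=1)

c = power_cepstrum(sub)
c[50, 50] = 0.0                   # suppress the dominant value at the cepstrum origin
m, n = np.unravel_index(np.argmax(c), c.shape)
offset = (int(m) - 50, abs(int(n) - 50))
print(offset)                     # strongest impulse at a lag of p pixels along the shift direction
```

The two symmetric impulses at a lag of plus and minus `p` correspond to the separation of 2 *p* between the cepstrum peaks, so locating one of them is enough to read off the separation and orientation.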

The main challenge of the proposed approach is to accurately identify these impulse peaks within each subimage’s cepstrum $\mathbf{C}_{kl}$. The first main limitation arises from the spatial frequency content of the object distribution. Fig. 2(b) shows the Modulation Transfer Function (MTF) of the proposed hybrid imaging system for two representative object distances $z_1$ and $z_2$. It can be seen that the introduced phase element leads to a modulation of the conventional MTF shown in Fig. 2(c), which is characterized by a period $p$ and an orientation $\theta$. This modulation ultimately leads to the set of impulses within the cepstrum domain. Note that in comparison to performing an autocorrelation, the cepstrum analysis provides increased contrast of the impulse peak in case of a weak modulation. In fact, the separation between the source signal (blurred object distribution) and the carrier (double Delta-distribution) is improved due to the logarithmic enhancement of the modulation in the spatial frequency domain. However, the modulation becomes invisible in case of a lack of sufficiently small spatial object features and the set of impulses in the cepstrum domain vanishes completely. Accordingly, the object’s spatial frequency spectrum must span an area beyond the first modulation minimum at $1/(V \cdot p)$, considering the magnification $V$ of the optical system. In other words, the object scene must contain spatial features that are comparable to or smaller than the double helix PSF extension $V \cdot p$ in object space. In addition, the peak identification can be ambiguous in case of periodic object features, which lead to an equivalent modulation of the image spectrum. The corresponding, additional peaks in the cepstrum domain may corrupt the peak identification and result in false depth information. The second major influence is given by the noise level. If the object’s spatial frequency content in the range of interest is insufficient, peaks that originate from the noise contributions (second term in Eq. (11)) dominate the cepstrum and the impulse identification becomes unreliable [24]. Accordingly, the window size needs to be increased at the expense of lateral depth resolution in order to include more object features. It should be noted that both limitations are (in a slightly modified manner) inherent to all passive optical systems.

In fact, the size of the considered neighborhood is crucial for a robust depth estimation of each object point. But making a reasoned choice for *M* and *N* is difficult because the noise in the cepstrum depends on the spatial frequency content in each window according to the second term in Eq. (11). The window size and the corresponding degree of noise averaging needed must then be set according to the particular object that is imaged.

In order to increase the reliability of the peak identification, a two-dimensional Hann window is initially applied to the individual subimages $\mathbf{I}_{kl}$, which results in the windowed distributions $\mathbf{I}'_{kl}$. In addition, prior knowledge on $\theta$ and $p$ is used. First, it is assumed that the axial extension of the object distribution is limited to a total PSF rotation range of 180 degrees, which ensures a unique relationship between $\theta$ and $z$. Under this condition, the detection range of the impulses in the cepstrum domain can be truncated according to

$$\mathbf{C}'_{kl}(m,n) = \begin{cases} \mathbf{C}_{kl}(m,n) & \text{for } p_{min} \le \sqrt{m^2+n^2} \le p_{max} \\ 0 & \text{otherwise,} \end{cases}$$

where $(m,n)$ denote the pixel coordinates relative to the cepstrum origin and the detection range is bounded by the limits $(p_{min}, p_{max})$. Both should typically be in the order of 0.8 to 0.9 and 1.1 to 1.2 times the peak separation at the nominal (in-focus) object distance, respectively, which can be extracted from the optical system design. Second, the truncated cepstrum $\mathbf{C}'_{kl}$ is convolved with a Gaussian kernel of size $s$ to mitigate the impact of noise on the peak detection. The kernel width $s$ should be selected in the range of 1–2 times the diffraction limited PSF peak size, which determines the minimum size of features in the cepstrum domain that do not originate from noise. In practice, $s$ as well as $(p_{min}, p_{max})$ may be obtained by experimentally analyzing the PSF peak width $\sigma$ and separation $p(z)$, respectively.
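The truncation and smoothing steps can be sketched as follows. This is an illustrative version only: the function names are hypothetical, the kernel size `s` is interpreted as the standard deviation of the Gaussian, and the convolution is carried out circularly via the FFT, none of which is prescribed by the text.

```python
import numpy as np

def truncate_cepstrum(c, p_min, p_max):
    """Keep centered cepstrum values whose radial distance from the origin lies in [p_min, p_max]."""
    rows, cols = c.shape
    m = np.arange(rows)[:, None] - rows // 2
    n = np.arange(cols)[None, :] - cols // 2
    r = np.hypot(m, n)
    return np.where((r >= p_min) & (r <= p_max), c, 0.0)

def gaussian_smooth(c, s):
    """Circular convolution with a normalized Gaussian kernel of standard deviation s (pixels)."""
    rows, cols = c.shape
    m = np.fft.fftfreq(rows) * rows          # signed pixel offsets 0, 1, ..., -1
    n = np.fft.fftfreq(cols) * cols
    kernel = np.exp(-(m[:, None] ** 2 + n[None, :] ** 2) / (2.0 * s ** 2))
    kernel /= kernel.sum()
    return np.real(np.fft.ifft2(np.fft.fft2(c) * np.fft.fft2(kernel)))

# Toy cepstrum: the wanted impulse at a lag of 20 pixels and a stronger spurious one at 5 pixels.
c = np.zeros((101, 101))
c[50, 70] = 1.0
c[50, 55] = 5.0
p_nominal = 20.0                              # assumed nominal peak separation in pixels
c1 = truncate_cepstrum(c, 0.85 * p_nominal, 1.15 * p_nominal)
c2 = gaussian_smooth(c1, s=2.0)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(c2), c2.shape))
print(peak)                                   # (50, 70): the spurious impulse is rejected
```

The radial limits remove impulses that cannot originate from the engineered PSF, so a stronger but implausible peak no longer wins the maximum search.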

Finally, the pixel location $(m_{max}, n_{max})$ of the maximum in each convolved cepstrum $\mathbf{C}''_{kl}$ is located and the cepstrum values in an $s$ pixel wide neighborhood are extracted. The weighted position of the peak within this subset of $\mathbf{C}''_{kl}$ can be calculated using a standard center of gravity detection algorithm and the rotation parameters $\theta_{kl}$ and $p_{kl}$ are extracted for each subimage. In fact, the identification of a single peak in the cepstrum is sufficient, due to the symmetry of $\mathbf{C}_{kl}$. The angle $\theta_{kl}$ is finally related to the object distance $z_{kl}$ based on a look-up table of the calibrated relationship $z(\theta)$. We emphasize that the described peak identification approach focuses on a high computational efficiency. More advanced methods, e.g. based on maximum likelihood estimators, can provide a higher accuracy and robustness, but require computationally expensive iterative algorithms.
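A possible form of the peak localization step is sketched below. The function name `locate_peak`, the choice of the neighborhood half width, the angle convention (angle measured from the first array axis) and the folding of the angle into a 180 degree range are assumptions made for this illustration.

```python
import numpy as np

def locate_peak(c2, s):
    """Return (p, theta_deg) from the strongest peak of a centered, smoothed cepstrum c2."""
    rows, cols = c2.shape
    m0, n0 = np.unravel_index(np.argmax(c2), c2.shape)
    h = max(int(s) // 2, 1)                           # half width of the s pixel wide neighborhood
    r0, r1 = max(m0 - h, 0), min(m0 + h + 1, rows)
    q0, q1 = max(n0 - h, 0), min(n0 + h + 1, cols)
    patch = c2[r0:r1, q0:q1]
    mm, nn = np.mgrid[r0:r1, q0:q1]
    weight = patch.sum()
    m_c = (mm * patch).sum() / weight - rows // 2     # center of gravity relative to the origin
    n_c = (nn * patch).sum() / weight - cols // 2
    p = float(np.hypot(m_c, n_c))
    theta = float(np.degrees(np.arctan2(n_c, m_c)) % 180.0)  # fold the symmetric peak pair
    return p, theta

# Synthetic smoothed cepstrum with one Gaussian-shaped peak at the offset (12, 16) from the center.
gm, gn = np.mgrid[0:101, 0:101]
c2 = np.exp(-((gm - 62) ** 2 + (gn - 66) ** 2) / (2.0 * 1.5 ** 2))
p, theta = locate_peak(c2, s=7)
print(round(p, 3), round(theta, 3))                   # 20.0 53.13
```

The center of gravity yields a sub-pixel estimate, which matters because the later twin-image removal requires the parameters to be accurate to about one pixel.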

#### 2.3.2. Object reconstruction

In order to reconstruct the original object information from a single acquisition, the twin images in $\mathbf{i}$ need to be merged. This can be done by means of a deconvolution operation using the double helix PSF. The shape of the PSF can, however, be distorted in comparison to the original design due to geometrical and chromatic aberrations, as well as mechanical system tolerances. A direct deconvolution may thus result in severe artifacts depending on these shape deviations. In general, it is possible to determine the exact PSF distribution experimentally. However, this may require measuring the two-dimensional PSF shape within the entire three-dimensional region of interest due to the lateral and axial dependency of potential PSF distortions, i.e. in case of significant off-axis aberrations. In addition to the extensive calibration efforts, either a comprehensive look-up table or complex interpolation schemes based on analytic or numerical approximations need to be incorporated. Alternatively, blind deconvolution algorithms can be applied, which are, however, numerically demanding due to the necessity for iterative optimization procedures.

In order to facilitate a fast and reliable image decoding, the proposed object reconstruction focuses on removing the twin image within $\mathbf{i}$ and partially recovering sharp object features. We retrieve the windowed subobject distributions $\mathbf{O}'_{kl}$ with a linear Wiener-type (deconvolution) filter. The Fourier transform of Eq. (7) can be expressed as

$$\hat{\mathbf{I}}'_{kl} = \hat{\mathbf{O}}'_{kl} \cdot \hat{H}_0 \cdot \hat{\mathbf{D}}_{kl} + \hat{\mathbf{N}}'_{kl},$$

where $\hat{\mathbf{I}}'_{kl}$ and $\hat{\mathbf{O}}'_{kl}$ denote the Fourier transformation of the Hann windowed distributions $\mathbf{I}_{kl}$ and $\mathbf{O}_{kl}$, respectively, and $\hat{\mathbf{N}}'_{kl}$ is the corresponding noise spectrum. The Fourier transform $\hat{\mathbf{D}}_{kl}$ of the delta-distributions is

$$\hat{\mathbf{D}}_{kl}(\mu,\nu) = 2\cos\left[\pi p_{kl}\left(\frac{\mu\cos\theta_{kl}}{M} + \frac{\nu\sin\theta_{kl}}{N}\right)\right],$$

where $(\mu,\nu)$ denote the discrete spatial frequency indices and $\theta_{kl}$ and $p_{kl}$ are already obtained during the depth map retrieval. The Fourier transform $\hat{H}_0$ of a single PSF peak with neglected side-lobes is approximated by a Gaussian function

$$\hat{H}_0(\mu,\nu) = \exp\left(-\frac{\mu^2+\nu^2}{2\hat{\sigma}^2}\right)$$

with a width $\hat{\sigma}$. The sharpened object spectrum $\hat{\mathbf{O}}'_{kl}$ is reconstructed by a Wiener filter

$$\hat{\mathbf{O}}'_{kl} = \hat{\mathbf{I}}'_{kl}\,\frac{\hat{H}_0\,\hat{\mathbf{D}}_{kl}}{\hat{H}_0^2\,\hat{\mathbf{D}}_{kl}^2 + \mathbf{SNR}_{kl}^{-1}}, \tag{18}$$

where $\mathbf{SNR}_{kl}$ is the signal-to-noise ratio of each subwindow. In addition to limiting the amplification of noise, $\mathbf{SNR}_{kl}$ is essential in order to compensate for zero values of $\hat{\mathbf{D}}_{kl}$ within the denominator of Eq. (18). On the one hand, $\hat{\mathbf{D}}_{kl}$ removes the modulation of the spectrum, which eliminates the twin image in $\mathbf{i}$. On the other hand, $\hat{H}_0$ recovers high spatial frequency contributions. Therefore, a respective Gaussian width $\hat{\sigma} > \max\{N, M\}/(2\pi\sigma)$ should be selected according to the PSF peak width $\sigma$ in order to avoid ringing artifacts. Note that a proper removal of the twin images necessitates an estimation of the local PSF parameters $\theta_{kl}$ and $p_{kl}$ with an accuracy on the order of the pixel size of the image sensor. A false estimation, e.g. due to a high noise level or due to an oversized sliding window that spans over a significant depth range, can result in severe artifacts within the reconstructed object distribution. In contrast, estimation errors based on a lack of small object features in certain object regions only lead to minor reconstruction artifacts due to the absence of high spatial frequencies.
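The twin-image removal can be illustrated with a minimal Wiener-type filter. The sketch assumes the standard Wiener form with a regularization term 1/SNR, a cosine transfer function for the double Delta-distribution and a Gaussian in units of discrete frequency indices; the function name `wiener_decode` and all parameter names are hypothetical and may differ from the filter in Eq. (18).

```python
import numpy as np

def wiener_decode(sub, p, theta_deg, sigma_hat, snr):
    """Remove the twin image from a subimage with a Wiener-type filter (illustrative sketch)."""
    rows, cols = sub.shape
    mu = np.fft.fftfreq(rows)[:, None] * rows     # discrete frequency indices along k
    nu = np.fft.fftfreq(cols)[None, :] * cols     # discrete frequency indices along l
    t = np.radians(theta_deg)
    d_hat = 2.0 * np.cos(np.pi * p * (mu * np.cos(t) / rows + nu * np.sin(t) / cols))
    h_hat = np.exp(-(mu ** 2 + nu ** 2) / (2.0 * sigma_hat ** 2))
    g = h_hat * d_hat                              # real-valued transfer function
    o_hat = np.fft.fft2(sub) * g / (g ** 2 + 1.0 / snr)
    return np.real(np.fft.ifft2(o_hat))

# Toy case: a point object, encoded as a twin image with p = 14 pixels along the first axis.
obj = np.zeros((64, 64))
obj[32, 32] = 1.0
coded = np.roll(obj, 7, axis=0) + np.roll(obj, -7, axis=0)

# A very wide Gaussian effectively disables the sharpening, so only the twin image is removed.
decoded = wiener_decode(coded, p=14.0, theta_deg=0.0, sigma_hat=1e6, snr=33.0)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(decoded), decoded.shape))
print(peak)                                        # (32, 32): the two copies are merged again
```

The 1/SNR term keeps the filter finite at the zeros of the cosine modulation, at the price of weak residual echoes at multiples of the peak separation.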

Finally, an inverse Fourier transformation of Eq. (18) leads to the recovered distribution $\mathbf{O}'_{kl}$. Adding these windowed subobjects according to

$$\mathbf{o} = \sum_{k,l} \mathbf{O}'_{kl}$$

yields the reconstructed object distribution $\mathbf{o}$. The Hann window, which is maintained within each subobject, leads to a smooth overlap of the individual $\mathbf{O}_{kl}$, which mitigates stitching artifacts within $\mathbf{o}$ even in case of a small sampling factor $q$.
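The smooth overlap of Hann windowed subobjects can be verified numerically. The sketch below assumes a periodic Hann window and a window step of the window size divided by the sampling factor; in that case the overlapping windows sum to the constant q/2 per dimension, i.e. to one for q = 2. The helper names are hypothetical, and the sketch does not normalize the sum for larger q.

```python
import numpy as np

def periodic_hann(n):
    """Periodic Hann window; copies shifted by n/q (q >= 2) sum to the constant q/2."""
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n)

def overlap_add(subobjects, positions, shape):
    """Add windowed subobjects into the full frame; positions are top-left corners."""
    out = np.zeros(shape)
    for sub, (r, c) in zip(subobjects, positions):
        out[r:r + sub.shape[0], c:c + sub.shape[1]] += sub
    return out

M = N = 32
q = 2
step = M // q
w2d = np.outer(periodic_hann(M), periodic_hann(N))
positions = [(r, c) for r in range(0, 96 - M + 1, step) for c in range(0, 96 - N + 1, step)]

# Stitching a constant object: each subobject is simply the window itself.
flat = overlap_add([w2d] * len(positions), positions, (96, 96))
interior = flat[M:-M, N:-N]                   # region covered by the full set of overlaps
print(float(interior.min()), float(interior.max()))
```

Away from the image border, the stitched weights are uniform, which is why no visible seams appear between neighboring windows.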

It should be pointed out that the object reconstruction significantly benefits from the extended depth of focus of the hybrid optical system. As can be seen in Fig. 2(b), the presented out-of-focus MTF is generally increased (i.e. for spatial frequencies $|\xi|, |\eta| > 0.1/(\lambda F\#)$) in comparison to the conventional MTF in Fig. 2(c). Hence, it enables an improved reconstruction of these frequencies, which results in an enhanced image resolution for object areas that are significantly out-of-focus. In addition, we emphasize that the Fourier transform $\mathcal{F}\{\mathbf{I}'_{kl}\} = \hat{\mathbf{I}}'_{kl}$, which is required to determine $\hat{\mathbf{O}}'_{kl}$ in Eq. (18), is already calculated during the prior cepstrum analysis. The total numerical effort for the depth estimation in combination with the object retrieval is thus mainly determined by only three Fourier transformations. Hence, it provides significantly reduced numerical costs in comparison to regularized iterative error minimization methods used in [12–14, 20]. It facilitates a fast approach that performs up to 1–2 fps for a megapixel image using our current software implementation in MATLAB on a conventional desktop PC. The frame rate can be increased further by employing state-of-the-art hardware and using a dedicated computation on a GPU, which can potentially allow for a real time implementation on the order of 10–20 fps.

## 3. Proof-of-principle experiment

### 3.1. Setup implementation

A demonstration system according to the proposed imaging setup shown in Fig. 1 is realized in order to verify the presented approach. For demonstration purposes, the developed photo resist master, which is obtained in the first step of the CGH fabrication process, is directly utilized without the subsequent transfer of the surface profile onto the final substrate. The surface profile of the corresponding element, which is measured using a white light interferometer, is shown in Fig. 4(a). Note that its lateral extension is slightly oversized with respect to the aperture stop diameter of 10 *mm* (indicated by a white circle in Fig. 4(a)) in order to accommodate alignment tolerances. The maximum profile height of 885 ± 10 *nm* complies with the maximum required phase shift of 2*π*, considering a design wavelength of 550 *nm* and the photo resist refractive index of 1.62. The major difference in comparison to the phase element manufacturing in [22] lies in the applied exposure scheme. In contrast to a single shot exposure, the element shown in Fig. 4(a) is manufactured by optimized lateral stitching of multiple substructure exposures in order to achieve a more than 4 times increased diameter of 12 *mm*. The advanced fabrication thus enables versatile lateral scaling of the designed CGHs in order to match the apertures of commercially available objective lenses. Additional minor advantages include an improved surface smoothness with minimum imperfections, which results in reduced stray light, as well as an enhanced height discretization of 10 bit in comparison to 30 levels in [22]. The phase element is placed at the aperture location of a compact optical demonstration setup, which consists of an achromatic doublet pair. In particular, we utilize two conventional achromats from Thorlabs with a focal length of 100 *mm* (AC254-100-A) and 250 *mm* (AC254-250-A), respectively.
The diffraction limited optical system is optimized for a nominal object distance of 1 *m* and features a focal length of 83 *mm* and an F-number of 8.4. The correction of axial color aberrations is essential in order to minimize the spectral dependence of the rotation angle *θ*, which limits the axial resolution. However, it should be pointed out that a tailored spectral dependence can also provide an additional degree of freedom that can potentially increase the reliability of the depth measurement. A 1/2.3-inch CMOS image sensor (Aptina MT9F002) with a total pixel count of 4384 × 3290 (14MP) and a standard Bayer pattern for RGB imaging is placed at the nominal image position. The pixel size of 1.4 *μm* × 1.4 *μm* with a Nyquist frequency of 357 *lp/mm* leads to a minor oversampling of the image considering the optical cut-off frequency of 216 *lp/mm* of the nominal system, but provides sufficiently high sampling of the engineered image distribution. The final system covers a lateral object field extension of 75 × 53 *mm*² at the nominal distance.
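The quoted profile height and sampling figures can be cross-checked with a few lines of arithmetic. The sketch assumes a thin phase element operated in air, for which a phase shift of 2π corresponds to a height of λ/(n − 1), and the incoherent cut-off frequency 1/(λ F#); the variable names are illustrative.

```python
wavelength_nm = 550.0      # design wavelength
n_resist = 1.62            # photo resist refractive index
pixel_um = 1.4             # pixel pitch of the image sensor
f_number = 8.4

# Height of a thin element in air that delays the wave by one full period (2*pi).
height_2pi_nm = wavelength_nm / (n_resist - 1.0)

# Sensor Nyquist frequency and incoherent optical cut-off frequency in line pairs per mm.
nyquist_lp_mm = 1000.0 / (2.0 * pixel_um)
cutoff_lp_mm = 1.0e6 / (wavelength_nm * f_number)

print(round(height_2pi_nm, 1), round(nyquist_lp_mm, 1), round(cutoff_lp_mm, 1))
# 887.1 357.1 216.5
```

The computed height of about 887 *nm* lies within the measured 885 ± 10 *nm*, and the two frequencies reproduce the values of 357 *lp/mm* and 216 *lp/mm* stated above.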

### 3.2. Depth estimation

First, the relationship between the rotation angle *θ* and the object distance *z* is calibrated by successively imaging three LED point sources with peak irradiances at 465 *nm*, 540 *nm* and 625 *nm*, respectively. The noise is reduced by averaging over 10 image acquisitions. Note that the calibration can be limited to on-axis points due to the diffraction limited performance of the demonstration system over the entire three-dimensional field of interest. The corresponding PSF distribution, which is illustrated in the two insets of Fig. 4(b), clearly shows two distinct peaks (separated by approximately 20 *μm*), which can be analyzed in order to obtain the axial dependence of *θ*(*z*) shown in Fig. 4(b). It can be seen that a linear relationship is maintained over a large range of approximately 170°. The effective depth range that can be utilized is 160 *mm*. Beyond this range, the rotation rate begins to decrease drastically and the distorted shape of the PSF prohibits reliable depth estimation. In comparison to conventional multi-aperture approaches that utilize multiple optical systems, no further calibration routines, such as the determination of the relative positions of the subsystems, need to be performed.
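A linear calibration of this kind can be fitted and inverted as sketched below. The calibration samples are hypothetical: they merely assume that the angle increases linearly by 170° over a 160 *mm* range centered at the nominal distance of 1 *m*, and do not reproduce the measured data of Fig. 4(b). The function name `distance_from_angle` is likewise illustrative.

```python
import numpy as np

# Hypothetical calibration samples: object distance (mm) and rotation angle (degrees).
z_cal = np.linspace(920.0, 1080.0, 9)
theta_cal = 170.0 / 160.0 * (z_cal - 920.0)

# Least-squares line theta = slope * z + intercept.
slope, intercept = np.polyfit(z_cal, theta_cal, 1)

def distance_from_angle(theta_deg):
    """Invert the linear calibration fit to obtain the object distance in mm."""
    return (theta_deg - intercept) / slope

z_mid = float(distance_from_angle(85.0))
print(round(float(slope), 4), round(z_mid, 3))   # slope in degrees per mm and the distance for 85 degrees
```

Under these assumptions, the rotation rate is roughly 1.06° per millimeter, so an angular error of one degree corresponds to a depth error of slightly less than one millimeter.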

An extended, three-dimensional scene is imaged at the nominal object distance of 1 *m* using the calibrated system. The setup includes multiple objects located at different distances within the calibrated range between 960 *mm* and 1060 *mm*. The scene is illuminated by a conventional, broad-band halogen desk lamp without any spectral or polarization filtering. The left part of Fig. 5(a) shows the imaged nominal object scene, which is initially obtained without the CGH inside the optical system. The enlarged image sections on the right side of Fig. 5(a) exemplarily highlight two distinct object parts that are located at an in- and out-of-focus location. After the CGH is implemented into the system, the encoded image shown in Fig. 5(b) is obtained. As can be seen in the two enlarged image sections, the engineered PSF results in twin images of the captured features, which are laterally shifted in a direction according to their axial position. The depth map of the object scene is obtained by applying the proposed cepstrum approach to the captured image. The sampling of the depth map is selected based on a compromise between maximizing the lateral resolution on the one hand and ensuring sufficient spatial object features to increase the signal to noise ratio for the peak identification in the cepstrum domain on the other hand. In particular, a windowing size of $M = N = 256$ and a sampling factor $q = 4$ are applied based on an empirical selection. After the determination of the angle distribution $\theta_{kl}$ and a subsequent evaluation of the corresponding distance $z_{kl}$ using the linear calibration fit (Fig. 4(b)), the final depth map is calculated after applying a 3 × 3 pixel median filter [28] in order to reduce outliers. The resulting depth distribution shown in Fig. 6 covers the entire field of view over a depth range of approximately 100 *mm*. It clearly exhibits distinct objects of the captured scene and provides a spatially resolved visualization of their axial position.
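The window sampling and the outlier suppression can be sketched as follows. The sketch assumes that the windows are placed on the full sensor grid with a step of the window size divided by *q* and that the median filter replicates the border values; both are illustrative choices, and the helper names are hypothetical.

```python
import numpy as np

def window_origins(length, size, q):
    """Top-left coordinates of sliding windows of a given size with a step of size // q."""
    step = size // q
    return list(range(0, length - size + 1, step))

def median3x3(a):
    """3 x 3 median filter with edge replication, written with NumPy only."""
    padded = np.pad(a, 1, mode="edge")
    stack = [padded[i:i + a.shape[0], j:j + a.shape[1]] for i in range(3) for j in range(3)]
    return np.median(np.stack(stack), axis=0)

# Number of depth samples for a 4384 x 3290 pixel image with M = N = 256 and q = 4.
rows = window_origins(3290, 256, 4)
cols = window_origins(4384, 256, 4)
print(len(rows), len(cols))          # 48 65

# A single outlier in an otherwise constant depth map (values in mm) is removed.
depth = np.full((5, 5), 1000.0)
depth[2, 2] = 1500.0
filtered = median3x3(depth)
```

Under these assumptions, the depth map is sampled every 64 pixels, which illustrates the trade-off between lateral depth resolution and the number of object features per window.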

### 3.3. Image decoding

Finally, the angle distribution $\theta_{kl}$, obtained during the depth retrieval, is combined with the information on the peak separation $p_{kl}$ to reconstruct the object distribution. The deconvolved image distribution, shown in Fig. 5(c), is obtained after applying the proposed filter function according to Eq. (18) using a width $\hat{\sigma} = 18$, which effectively avoids ringing artifacts. For demonstration purposes, a simplified, constant signal-to-noise ratio of 33 is applied for all sub-windows, which provides a compromise between minimum noise amplification and an effective twin image removal. The image shows the uncoded RGB color information of the object and is only subject to minor reconstruction artifacts. A comparison between the decoded and the uncoded object distributions in Fig. 5 demonstrates the successful removal of the twin-image and an increased image contrast. A residual background, in particular in the direction of the double-helix orientation, remains after the deconvolution due to the elongated side-lobes of the PSF, which are not accounted for in the approximated PSF in Eq. (2) and the corresponding filter in Eq. (18). These side lobes are a residue of the CGH design approach as well as fabrication tolerances, which can potentially be minimized further by optimizing the design, as well as the fabrication method. Alternatively, the experimental PSF can directly be used in the deconvolution approach, which necessitates a comprehensive three-dimensional characterization of the PSF distribution in order to minimize reconstruction artifacts. Furthermore, a comparison between the highlighted object parts of the nominal and the decoded image in Fig. 5 demonstrates the extended depth of focus property of the proposed hybrid system. Whereas the features of the out-of-focus part (highlighted in blue) of the nominal image are significantly blurred in comparison to the in-focus part (highlighted in red), the deconvolved image provides a comparable resolution throughout the entire axial range of interest. In fact, the out-of-focus part in Fig. 5(c) features an increased contrast in comparison to the same part in Fig. 5(a).
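The selected filter width can be related to the criterion for avoiding ringing artifacts, i.e. a Gaussian width larger than max{*N*, *M*}/(2*πσ*). The short calculation below solves this condition for the PSF peak width, interpreting the width in pixels and using the window size and pixel pitch of the demonstration system; this interpretation of the units is an assumption.

```python
import math

window = 256          # window size M = N in pixels
sigma_hat = 18.0      # Gaussian width used in the filter
pixel_um = 1.4        # pixel pitch in micrometers

# The condition sigma_hat > window / (2 * pi * sigma) holds for sigma above this value.
sigma_min_px = window / (2.0 * math.pi * sigma_hat)
sigma_min_um = sigma_min_px * pixel_um
print(round(sigma_min_px, 2), round(sigma_min_um, 2))   # 2.26 3.17
```

Under this reading, the chosen width satisfies the criterion for PSF peak widths above roughly 2.3 pixels, or about 3.2 *μm* on the sensor.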

## 4. Conclusion

A system approach combining a passive optical setup with a tailored image processing concept is presented, which enables the acquisition of three-dimensional object information using a monocular camera system. The method is based on integrating a computer-generated hologram, fabricated on a thin glass substrate, into a conventional camera setup, which facilitates a compact, robust and cost-efficient system with an extended depth of focus. Moreover, the optical setup does not require additional wavelength or polarization filters, which enables a light-efficient image acquisition that maintains the RGB color information of the object. An efficient image processing approach has been developed that analyzes the cepstrum distribution of the image and incorporates a Wiener filter in order to provide a fast calculation of the axial and lateral object distribution based on a single image. Because it avoids the extensive iterative optimization procedures of common image deconvolution algorithms, the system potentially allows for three-dimensional video acquisition.
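The cepstrum analysis summarized above can be sketched for a single sub-window as follows. This is a simplified illustration, not the processing chain of the paper: the coded image is modeled as the sum of two shifted copies of a random texture, and the functions `power_cepstrum` and `twin_image_offset`, the relative floor inside the logarithm and the exclusion radius around the origin are assumptions. The conversion from lobe orientation to depth requires a system calibration and is not attempted here.

```python
import numpy as np

def power_cepstrum(image, rel_floor=1e-6):
    """Power cepstrum: squared magnitude of the inverse FFT of the log power spectrum."""
    power = np.abs(np.fft.fft2(image)) ** 2
    # A small relative floor (assumption) keeps the logarithm finite at spectral zeros.
    log_power = np.log(power + rel_floor * power.mean())
    return np.abs(np.fft.ifft2(log_power)) ** 2

def twin_image_offset(image, exclude_radius=2):
    """Signed (dy, dx) lag of the strongest cepstral peak away from the origin."""
    cep = np.fft.fftshift(power_cepstrum(image))
    rows, cols = cep.shape
    cy, cx = rows // 2, cols // 2
    y, x = np.ogrid[:rows, :cols]
    # Suppress the origin, which is dominated by the object texture itself.
    cep[(y - cy) ** 2 + (x - cx) ** 2 <= exclude_radius ** 2] = 0.0
    py, px = np.unravel_index(np.argmax(cep), cep.shape)
    return int(py - cy), int(px - cx)

# Synthetic sub-window: random texture plus a copy shifted by a known lag.
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
true_offset = (3, 8)  # (dy, dx) in pixels, hypothetical lobe separation
coded = 0.5 * (scene + np.roll(scene, true_offset, axis=(0, 1)))

dy, dx = twin_image_offset(coded)
angle_deg = np.degrees(np.arctan2(dy, dx))  # lobe orientation, defined up to 180 degrees
print("estimated lag:", (dy, dx), "orientation [deg]:", angle_deg)
```

Since the cepstrum of a real image is point-symmetric, the peak is found at either the true lag or its negative, so the orientation is only defined modulo 180°. The estimate needs two Fourier transforms per sub-window and no iterative optimization, which is consistent with the fast calculation mentioned above.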

An experimental system has been implemented, demonstrating the capabilities of the proposed system approach. The depth map as well as the lateral (RGB) information of an extended scene have been obtained based on a single acquisition using a compact, light-efficient optical system with an engineered point spread function.

In addition to the qualitative demonstration presented here, future work will include a quantitative assessment of the system's imaging performance. In particular, we aim to address scaling laws of the axial and lateral resolution limits, which potentially allow for a system optimization according to a particular application and enable a proper comparison to other three-dimensional imaging approaches, e.g. those based on stereo or plenoptic configurations.

## Acknowledgments

The authors would like to thank Marko Stumpf for manufacturing the CGH and Lucas van Vliet for a critical reading of the manuscript. This work was performed in the frame of the Photonics Research Germany funding program by the German Federal Ministry of Education and Research under contract 13N13667.

## References and links

**1.** J. Salvi, J. Pagès, and J. Batlle, “Pattern codification strategies in structured light systems,” Pattern Recogn. **37**(4), 827–849 (2004).

**2.** M. Amann, T. Bosch, M. Lescure, R. Myllyla, and M. Rioux, “Laser ranging: a critical review of usual techniques for distance measurement,” Opt. Eng. **40**(1), 10–19 (2001).

**3.** D. Huang, E. Swanson, and C. Lin, “Optical coherence tomography,” Science **254**(5035), 1178–1181 (1991).

**4.** M. Z. Brown, D. Burschka, and G. D. Hager, “Advances in computational stereo,” IEEE Trans. Pattern Anal. Mach. Intell. **25**(8), 993–1008 (2003).

**5.** Y. Schechner and N. Kiryati, “Depth from defocus vs. stereo: How different really are they?” Int. J. Comput. Vision **39**(2), 141–162 (2000).

**6.** M. Subbarao and G. Surya, “Depth from defocus: a spatial domain approach,” Int. J. Comput. Vision **13**(3), 271–294 (1994).

**7.** R. Horisaki and J. Tanida, “Multi-channel data acquisition using multiplexed imaging with spatial encoding,” Opt. Express **18**(22), 429–432 (2010).

**8.** R. Horisaki and J. Tanida, “Preconditioning for multiplexed imaging with spatially coded PSFs,” Opt. Express **19**(13), 573–583 (2011).

**9.** T. Georgiev, K. C. Zheng, B. Curless, D. Salesin, S. Nayar, and C. Intwala, “Spatio-angular resolution tradeoffs in integral photography,” in *Proceedings of the Eurographics Symposium on Rendering* (2006), pp. 263–272.

**10.** A. Lumsdaine and T. Georgiev, “The focused plenoptic camera,” in *Proceedings of IEEE Conference on Computer Vision and Pattern Recognition* (IEEE, 2009), pp. 1–8.

**11.** D. Miau, O. Cossairt, and S. Nayar, “Focal sweep videography with deformable optics,” in *Proceedings of IEEE International Conference on Computational Photography* (IEEE, 2013), pp. 1–8.

**12.** P. Llull, X. Yuan, L. Carin, and D. Brady, “Image translation for single-shot focal tomography,” Optica **2**, 822–825 (2015).

**13.** A. Levin, R. Fergus, F. Durand, and W. T. Freeman, “Image and depth from a conventional camera with a coded aperture,” ACM Transactions on Graphics **26**(3), 70 (2007).

**14.** A. Levin, S. Hasinoff, and P. Green, “4D frequency analysis of computational cameras for depth of field extension,” ACM Transactions on Graphics **28**(3), 97 (2009).

**15.** S. R. P. Pavani and R. Piestun, “High-efficiency rotating point spread functions,” Opt. Express **16**(5), 3484–3489 (2008).

**16.** A. Greengard, Y. Y. Schechner, and R. Piestun, “Depth from diffracted rotation,” Opt. Lett. **31**(2), 181–183 (2006).

**17.** S. Quirin, S. R. P. Pavani, and R. Piestun, “Optimal 3D single-molecule localization for superresolution microscopy with aberrations and engineered point spread functions,” Proc. Natl. Acad. Sci. **109**(3), 675–679 (2012).

**18.** S. R. P. Pavani, M. A. Thompson, J. S. Biteen, S. J. Lord, N. Liu, R. J. Twieg, R. Piestun, and W. E. Moerner, “Three-dimensional, single-molecule fluorescence imaging beyond the diffraction limit by using a double-helix point spread function,” Proc. Natl. Acad. Sci. **106**(9), 2995–2999 (2009).

**19.** S. Quirin and R. Piestun, “Depth estimation and image recovery using broadband, incoherent illumination with engineered point spread functions [Invited],” Appl. Opt. **52**(1), 367–376 (2013).

**20.** T. Niihara, R. Horisaki, and M. Kiyono, “Diffraction-limited depth-from-defocus imaging with a pixel-limited camera using pupil phase modulation and compressive sensing,” Appl. Phys. Express **8**, 012501 (2014).

**21.** H.-C. Eckstein, M. Stumpf, P. Schleicher, S. Kleinle, A. Matthes, U. D. Zeitner, and A. Bräuer, “Direct write grayscale lithography for arbitrary shaped micro-optical surfaces,” presented at the 20th Microoptics Conference, Fukuoka, Japan, 25–28 Oct. 2015.

**22.** G. Grover, S. Quirin, C. Fiedler, and R. Piestun, “Photon efficient double-helix PSF microscopy with application to 3D photo-activation localization imaging,” Biomed. Opt. Express **2**(11), 3010–3020 (2011).

**23.** M. Cannon, “Blind deconvolution of spatially invariant image blurs with phase,” IEEE Trans. Acoust. Speech Signal Process. **24**, 230–235 (1976).

**24.** P. W. Smith and N. Nandhakumar, “An improved power cepstrum based stereo correspondence method for textured scenes,” IEEE Trans. Pattern Anal. Mach. Intell. **18**(3), 338–348 (1996).

**25.** A. M. Noll, “Short-time spectrum and ‘cepstrum’ techniques for vocal-pitch detection,” J. Acoust. Soc. Am. **36**(2), 296–302 (1964).

**26.** A. M. Noll, “Cepstrum pitch determination,” J. Acoust. Soc. Am. **41**(2), 293–309 (1967).

**27.** R. Rom, “On the cepstrum of two-dimensional functions (Corresp.),” IEEE Trans. Inf. Theory **21**(2), 214–217 (1975).

**28.** W. K. Pratt, *Digital Image Processing* (John Wiley & Sons, 2007).