## Abstract

In this paper we present a high-resolution technique to passively sense, detect, and recognize a 3D object using computational integral imaging. We show that the use of a non-stationary microlens array reduces the quantization error in longitudinal distance estimation. The proposed method overcomes the Nyquist upper limit on the resolution. We use 3D non-linear correlation to recognize the 3D coordinates and shape of the desired object.

©2003 Optical Society of America

## 1. Introduction

A number of methods have been proposed for 3D object sensing and visualization [1–5]. The use of 3D object information may improve the recognition of an object in certain circumstances. For 3D recognition, some techniques use digital holography [6]; the disadvantage of this approach is its need for coherent illumination. Other techniques use different 2D perspectives of the object to visualize and recognize a 3D object [7,8]. Integral imaging is a method to collect several 2D images of the same object from different perspectives using an array of microlenses. The elemental images generated by the microlenses are stored digitally using a CCD camera. A technique that uses integral imaging to visualize and detect a 3D object is presented in [9]. The main drawbacks of this technique are the low resolution of the reconstructed object and the large quantization error in the estimation of the longitudinal depth. The low resolution of the reconstructed images is due to the limited number of microlenses. Another drawback of integral imaging is the blur in the reconstructed image caused by the small aperture of each microlens. An all-optical technique is presented in [10] that improves the resolution of the image reconstructed by integral imaging with a non-stationary microlens array. This approach has been implemented computationally using super-resolution techniques to improve the resolution of 2D integral images [11]. Several methods can be used to reconstruct a high-resolution image from a set of low-resolution images. Frequency-domain methods use the Fourier-transform shifting property and the sampling theorem to obtain a high-resolution image [12–14]. Spatial-domain methods have also been presented for reconstructing a high-resolution image from a set of noisy low-resolution images [15–17]. Other techniques have been used for the same task [18].
The proposed technique increases the computational time as a result of the super-resolution process and the large size of the reconstructed high-resolution image. The increase in computational time depends on the number of multiplexed elemental images used.

In this paper, we present improved techniques to sense, reconstruct, and detect 3D objects with high resolution using integral imaging. We use non-stationary micro-optics to improve the resolution of the reconstructed 3D object and the longitudinal accuracy of the detected 3D object. We use the iterative back-projection algorithm [15] to reconstruct a high-resolution image from the set of low-resolution, noisy, and blurred integral images. In Section 2, we describe the non-stationary integral imaging system and the process of reconstructing the high-resolution image. In Section 3 we describe the 3D reconstruction process, and in Section 4 we illustrate the detection algorithm. Section 5 presents optical experiments and computer simulations to illustrate the system performance, followed by conclusions in Section 6.

## 2. High resolution 3D image reconstruction using time multiplexed integral imaging

In integral imaging, several parameters control the resolution of the reconstructed 3D image, including the microlens size and pitch and the CCD pixel size. To improve the resolution of the reconstructed 3D image we use a non-stationary microlens array [10], as shown in the clip in Fig. 1. We assume that the microlenses have a sufficient depth of focus that all the elemental images are obtained at the same plane. We also assume that the elemental images do not overlap. These conditions are met by placing the object far enough from the microlens array. On the other hand, the farther the object is from the microlens array, the lower the resolution of the elemental images and the accuracy of the depth estimation. In the optical experimental setup we use an imaging lens (not shown) before the CCD to image the elemental images onto the CCD.

The resultant time multiplexed integral images are a set of low-resolution elemental images that will be processed to generate a high-resolution image. The clip in Fig. 2 shows the sequence of low-resolution images corresponding to one of the elemental images.

#### 2.1 Maximum possible depth estimation resolution

In our analysis we assume that the microlens array has a 100% fill factor, that the elemental images are formed at a plane a distance *d* from the microlens array, and that *ϕ* is the microlens diameter. Using one-dimensional notation, let *n* be the microlens number with reference to the center microlens. Using Fig. 3, the projection of an object pixel at the CCD plane for two different longitudinal depths can be represented by:

and,

where Δ*z*=(*z _{2}*-*z _{1}*) and Δ*x*=(*x _{1}*-*x _{2}*).

It is important to quantify the ability of a microlens to detect a change in the depth of a given pixel of the 3D object. This ability depends on the optical setup and on the size of the CCD pixel. In other words, we need to know the minimum change in an object pixel's depth [*Δz* in Eq. (2)] needed to produce a shift of one pixel in an elemental image. While calculating the best possible depth resolution, we have to set a minimum *z* distance such that all the elemental images are focused at the same plane and do not overlap. We use a microlens array in which each microlens has a diameter *ϕ*=1 mm and a focal length of 5.2 mm. Figure 4 shows the depth change as a function of the longitudinal distance of the 3D object, *z _{1}*, that produces a one-pixel shift at the CCD plane for *n*=6 and *n*=10. It is worth mentioning that as the microlens number gets lower (that is, as the lenslet becomes closer to the optical axis), a larger depth change is needed to produce a one-pixel shift in the projected image plane. It is clear that a stationary microlens array cannot estimate the depth of a 3D object with high accuracy.

Assume that the CCD has a pixel size of *c*×*c*, and that the minimum acceptable distance for the system is z_{min}. We can show that the depth change from z_{min} that produces a one-pixel shift on the CCD is given by:
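The one-pixel depth increment just described can be sketched numerically. The snippet below is an illustration, not the authors' code: it assumes a pinhole projection *x*=*nϕ*+*d*(*nϕ*−*u*)/*z* for microlens *n* (our reading of the geometry in Fig. 3), with the paper's values *ϕ*=1 mm, *d*≈*f*=5.2 mm, and 12 µm pixels; `depth_step` is a hypothetical helper name.

```python
# Illustrative sketch only: assumes the pinhole projection
# x = n*phi + d*(n*phi - u)/z, so a depth change dz from z1 shifts the
# projection by dx = d*(n*phi - u)*dz / (z1*(z1 + dz)).

def depth_step(z1, n, phi=1.0, d=5.2, c=0.012, u=0.0):
    """Smallest depth change from z1 (mm) that moves the projection of
    object pixel u by one CCD pixel of size c (mm) in microlens n."""
    lever = d * (n * phi - u)      # projection lever arm d*(n*phi - u)
    if lever <= c * z1:
        return float("inf")        # the shift never reaches one pixel
    return c * z1 ** 2 / (lever - c * z1)

# Lenslets farther from the optical axis resolve depth more finely:
for n in (6, 10):
    print(n, round(depth_step(z1=75.0, n=n), 2))
```

Consistent with Fig. 4, the required depth change grows as the lenslet index *n* decreases toward the optical axis.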

#### 2.2 High resolution integral image generation

According to the previous discussion, enhancing the spatial resolution of the elemental integral images improves the depth estimation accuracy Δz. For example, enhancing the spatial resolution by a factor of *N* is equivalent to reducing the pixel size by a factor of *N*, which theoretically improves the depth estimation accuracy by as much as *N* times.

Super resolution is the process of reconstructing a single high-resolution image from a set of low-resolution images. We use Iterative Back Projection (IBP) to reconstruct the high-resolution image. In [11], this technique was used to improve the resolution of elemental images and of 2D perspectives of the 3D object. Let *f _{i}*(*x*) be the *i*-th low-resolution image and *g*(*x*) be the reconstructed high-resolution image. The reconstructed high-resolution image at iteration *j* is given by:

where *h _{psf}* is the optical system point spread function, ↓^{D} denotes down-sampling, ↑^{U} denotes up-sampling, ⊗ stands for correlation, *N* is the number of low-resolution images, and *i* takes values from 1 to *N*. *h _{psf}*, the microlens point spread function, is obtained using a point source placed at a far distance, and *h _{bp}*, the impulse response of the back-projection function, is obtained by a pseudo-inverse of the point spread function. Figure 5 shows the high-resolution elemental image reconstructed from a set of four low-resolution images. In Fig. 5, the resolution enhancement in the horizontal direction is achieved through interpolation.

Figure 6 shows the *Δz* [using Eq. (3)] that will produce a one-pixel shift at the microlens image plane for different microlens numbers (*n*). A stationary microlens array is compared with a moving microlens array (time-multiplexed images) with a sequence of *N*=5 integral images.
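The iterative back-projection procedure described above can be sketched in a few lines of NumPy. This is a minimal 1-D illustration under assumed conditions (a symmetric 3-tap blur, *N*=4 sub-pixel shifts, decimation by 4), not the paper's calibrated PSF or pseudo-inverse kernel; for a symmetric PSF the back-projection kernel used here is simply the PSF itself.

```python
import numpy as np

# 1-D iterative back projection (IBP): each low-resolution frame f_i is
# modeled as blur + sub-pixel shift + decimation of the high-resolution
# image g; the simulation error is upsampled and back-projected.

PSF = np.array([0.25, 0.5, 0.25])   # assumed symmetric blur kernel

def simulate(g, shift, D):
    """Forward model: blur g, then keep every D-th sample from `shift`."""
    return np.convolve(g, PSF, mode="same")[shift::D]

def ibp(frames, D, iters=30):
    g = np.repeat(frames[0], D).astype(float)     # crude initial guess
    for _ in range(iters):
        correction = np.zeros_like(g)
        for i, f in enumerate(frames):
            err = f - simulate(g, i, D)           # low-resolution error
            up = np.zeros_like(g)
            up[i::D] = err                        # upsample onto HR grid
            correction += np.convolve(up, PSF, mode="same")  # back-project
        g += correction / len(frames)
    return g

D = 4
truth = np.sin(np.linspace(0, 4 * np.pi, 128))
frames = [simulate(truth, i, D) for i in range(D)]   # N = 4 exposures
rec = ibp(frames, D)
```

After a few tens of iterations the reconstruction error falls well below that of simply replicating one low-resolution frame.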

## 3. High resolution 3D reconstruction and depth estimation

The system starts by storing a set of elemental images. Then the microlens array is moved in steps of *ϕ*/*N* to store another array of elemental integral images. We repeat this step until we have stored *N* arrays of elemental integral images. Figure 7 shows an example of a sequence of low-resolution time-multiplexed elemental integral images.

We reconstruct a high-resolution image for each elemental image of the microlens array using the stored sequence of low-resolution elemental images. The image projected by the center lens is used as the reference image. Then, the depth of each pixel in the center elemental image is estimated by locating it in the other elemental images to determine *Δx*.

We start by choosing the desired depth estimation resolution *Δz _{min}*. To locate a given object pixel in the other elemental images, we use a window centered on that pixel with a size of *W _{x}*×*W _{y}*. We choose the window size as small as possible to improve the accuracy of the longitudinal distance estimation.

As the longitudinal distance of an object pixel increases, the ability to detect its depth with the desired resolution decreases. It therefore makes sense to set an upper limit on the acceptable object depth that can be detected with the optical setup such that the longitudinal resolution is preserved. Using Fig. 3, we can show that the maximum detectable depth for a pixel *u* in the 3D object such that the longitudinal resolution Δ*z* is preserved is:

Figure 8 shows the maximum detectable longitudinal distance, *z _{max}*, as a function of the object pixel coordinate *u* for different microlenses. Both cases of a stationary and a moving microlens array are illustrated. The total number of microlenses is *n*.
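The bound plotted in Fig. 8 can be sketched numerically by inverting the one-pixel condition for *z*. As before, this is our reading of the geometry in Fig. 3 with the paper's parameter values, not the authors' code; `z_max` is a hypothetical helper name.

```python
import math

# Illustrative sketch: assuming the projection shift between depths z
# and z + dz is dx = d*(n*phi - u)*dz / (z*(z + dz)), the largest z that
# still yields a one-pixel (c) shift for a depth step dz solves
# c = d*(n*phi - u)*dz / (z*(z + dz)), a quadratic in z.

def z_max(dz, n, phi=1.0, d=5.2, c=0.012, u=0.0):
    lever = d * (n * phi - u)
    return (-dz + math.sqrt(dz ** 2 + 4.0 * lever * dz / c)) / 2.0

# For a 1 mm depth resolution, an on-axis pixel seen through lens n = 10:
print(round(z_max(dz=1.0, n=10), 1))   # maximum usable distance in mm
```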

We use the sum of squared differences as a metric to determine the location of a specific object pixel in the elemental images. The squared difference *d _{n,m}*(*x _{p}*,*y _{q}*;*x _{i}*,*y _{j}*) between a window centered at coordinates *x _{p}* and *y _{q}* in the center elemental image and a moving window, centered at *x _{i}*, *y _{j}*, over the integral image corresponding to lens (*n,m*) is given by:

$${d}_{n,m}({x}_{p},{y}_{q};{x}_{i},{y}_{j})=\sum_{(x,y)\in {W}_{x}\times {W}_{y}}{[{I}_{0,0}(x-{x}_{p},y-{y}_{q})-{I}_{n,m}(x-{x}_{i},y-{y}_{j})]}^{2},\qquad p,i=1,\ldots,{N}_{x},\quad q,j=1,\ldots,{N}_{y},$$

where *I _{n,m}*(*x,y*) is the elemental image projected by lens (*n,m*), *I _{0,0}* denotes the center elemental image, (*x,y*) are the spatial coordinates, *N _{x}*, *N _{y}* are the number of rows and columns in the elemental image, and *W _{x}*, *W _{y}* are the correlation window width and height. Then, $[{x}_{p}^{(n,m)},{y}_{q}^{(n,m)}]\equiv {\mathrm{arg}}_{{x}_{i},{y}_{j}}\mathrm{min}\left({d}_{n,m}({x}_{p},{y}_{q};{x}_{i},{y}_{j})\right)$ is determined. Figure 9 presents a movie illustrating the process of detecting the projected object coordinates in the elemental images. Using Eq. (1) or Eq. (2) we can estimate the depth *ẑ _{n,m}*(*x,y*) from the elemental image corresponding to lens number (*n,m*). The actual depth of the 3D object coordinates (*u,v*) can be represented as:
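The window search described above amounts to standard block matching with a sum-of-squared-differences cost. The following sketch (illustrative names, not the paper's code) finds, for one pixel of the center elemental image, the minimizing window position in another elemental image:

```python
import numpy as np

# Slide a Wx-by-Wy window over `other_img` and keep the offset with the
# smallest sum of squared differences against the reference window of
# `center_img` centered at (p, q).

def best_match(center_img, other_img, p, q, wx=5, wy=5):
    hx, hy = wx // 2, wy // 2
    ref = center_img[p - hx:p + hx + 1, q - hy:q + hy + 1]
    best, best_ij = np.inf, None
    H, W = other_img.shape
    for i in range(hx, H - hx):
        for j in range(hy, W - hy):
            win = other_img[i - hx:i + hx + 1, j - hy:j + hy + 1]
            d = np.sum((ref - win) ** 2)   # squared-difference metric
            if d < best:
                best, best_ij = d, (i, j)
    return best_ij

rng = np.random.default_rng(0)
center = rng.random((40, 40))
shifted = np.roll(center, (3, -2), axis=(0, 1))  # disparity of (3, -2)
print(best_match(center, shifted, 20, 20))       # → (23, 18)
```

The recovered offset is the disparity Δ*x* (and Δ*y*) from which the depth of that pixel is then estimated.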

where *n _{z}*(*u,v;n,m,ẑ*) is the longitudinal depth estimation error and *ẑ*(*u,v;n,m*) is the depth estimate. The estimation error can be represented as a uniform random variable with zero mean and a variance equal to Δ*z*^{2}(*u,v;n,m,ẑ*)/12, where *Δz* is computed using Eq. (3). The final depth estimate for object coordinates (*u,v*) can be given by:

where *N _{u}* and *N _{v}* are the number of active rows and columns in the microlens array used in computing *ẑ*(*u,v;n,m*). Assuming that the quantization errors in depth estimation in the different elemental images are statistically independent and that the number of elemental images is large enough, the final depth is:

where *n _{z}*(*u,v*) is a Gaussian random variable with zero mean and a variance equal to:

It is evident that estimating the depth of the 3D object with Eq. (8) changes the distribution of the estimation error from uniform to Gaussian and reduces the error variance.
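The error-averaging argument above can be checked numerically: averaging *K* independent uniform quantization errors (each of variance Δ*z*²/12) yields a near-Gaussian error whose variance is reduced by a factor of *K*. The values of `K` and `dz` below are arbitrary illustrative choices.

```python
import numpy as np

# Per-lens depth estimates carry independent uniform quantization errors
# of width dz (variance dz**2 / 12); averaging K of them shrinks the
# variance by K and, by the central limit theorem, makes it near-Gaussian.

rng = np.random.default_rng(1)
dz, K, trials = 2.0, 25, 200_000
errors = rng.uniform(-dz / 2, dz / 2, size=(trials, K))
avg = errors.mean(axis=1)

single_var = dz ** 2 / 12                     # variance of one estimate
print(round(avg.var() * K / single_var, 2))   # ≈ 1.0: variance cut K-fold
```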

## 4. 3D object detection

In the previous section we illustrated the process of reconstructing the 3D coordinates of the input scene. In this section we detect a 3D object in the scene. We use non-linear correlation to detect the 3D coordinates of the reference target in the scene. Non-linear correlation has the advantage of controlling the balance between distortion tolerance and discrimination by changing the non-linearity coefficient [19].

The detection process starts by obtaining the depth estimates of the reference object and of the candidate scene to be inspected. Then, we perform a non-linear 3D correlation between the reference object and the estimated scene. Let the reference object be denoted *r*(*x,y,z*) and the reconstructed input scene *s*(*x,y,z*). The correlation output is given by:

where |*S*| and |*R*| are the absolute values of the Fourier transforms of *s*(*x,y,z*) and *r*(*x,y,z*), respectively, and *ϕ _{S}* and *ϕ _{R}* are their phases. *IFT* stands for the inverse Fourier transform, and *k* is the non-linearity coefficient, which lies between 0 and 1. The reference object coordinates (*x,y,z*) are determined from the maximum of *C*.
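The *k*-th law non-linearity described above can be sketched as follows; a 1-D example is shown for brevity (the paper applies the same expression in 3-D with `fftn`/`ifftn`), and the scene, reference, and function name are illustrative.

```python
import numpy as np

# k-th law nonlinear correlation: both spectral magnitudes are raised
# to the power k (0 < k <= 1) while only the phase difference is kept.
# k = 1 gives classical matched filtering; smaller k sharpens the peak.

def nonlinear_correlation(s, r, k=0.3):
    S, R = np.fft.fft(s), np.fft.fft(r)
    spectrum = (np.abs(S) ** k) * (np.abs(R) ** k) \
        * np.exp(1j * (np.angle(S) - np.angle(R)))
    return np.abs(np.fft.ifft(spectrum))

scene = np.zeros(64)
scene[40:44] = 1.0                  # target placed at sample 40
ref = np.zeros(64)
ref[0:4] = 1.0                      # reference at the origin
peak = int(np.argmax(nonlinear_correlation(scene, ref)))
print(peak)                         # → 40
```

The correlation peak lands at the target's coordinate, which is how the maximum of *C* localizes the reference object in (*x,y,z*).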

## 5. 3D recognition experimental results and simulations

In this section we present the process of reconstructing the 3D scene using a non-stationary microlens array. We also illustrate the 3D recognition of a 3D object using non-linear correlation. The input scene contains two circular objects of the same size. The object on the right is located 75 mm from the microlens array and the second object is located 79 mm from the microlens array plane. Each microlens has a diameter of 1 mm and a focal length of 5.2 mm. The microlens array is moved using a stage with a micrometer scale. We store the set of integral images using a high-resolution CCD and a digital computer. We move the stage in steps of 0.2 mm to take 5 exposures of the integral images. Figure 10 shows a 2D image of the input scene.

We use IBP to generate high-resolution integral images from the 5 low-resolution integral images stored in the digital computer. The CCD has a pixel size of 12 µm×12 µm. Figure 11 shows the stored sequence of low-resolution integral images. In Fig. 12 we show the low-resolution elemental images and the reconstructed high-resolution elemental image.

To reconstruct the 3D scene, the position of each coordinate of the center elemental image is located in the other elemental images and the depth is estimated using Eq. (8). When applying Eq. (8), we take the angle of view of the microlens array into consideration. The elemental images are thresholded before processing to remove the background.

Figure 13 shows the calculated 2D depth map of the two 3D objects in the scene. The colors in the figure represent the depth information. As shown in Fig. 13, there is some noise in the estimated depth. In Fig. 14, we show the reconstructed 3D scene.

To recognize the 3D object, a number of correlation algorithms may be used [19–22]. We apply a non-linear filter to the reconstructed 3D object. A high degree of non-linearity (low *k*) produces a correlation filter with higher discrimination. We find, heuristically, that a non-linearity index of 0.3 provides a good balance between the filter's discrimination ability and its distortion tolerance. The recognition filter is generated by reconstructing the reference object alone at a known location. Because the reference object coordinates are known, we can verify the locations of the 3D objects in the scene. In the following experiment we detect the first 3D object in Fig. 10, that is, the object on the right, by applying a 3D non-linear filter to the 3D scene. Figure 15 shows different layers of the 3D correlation output. Figures 15(a) and 15(b) illustrate the ability of the correlator to detect the *z* distance of the 3D object properly. Figure 15(c) shows the correlation output at the plane 79 mm from the lenslet array, which is the location of the other object. As can be seen from the figure, there is a sharp peak at the location of the desired 3D object at the plane 75 mm from the lenslet array.

We repeat the same experiment for the other object in Fig. 10, that is, the object on the left side. Figure 16 shows different planes of the 3D correlation output, showing that the filter is able to detect the (*x,y,z*) coordinates of the 3D object and to discriminate between the reference object and the other object.

## 6. Conclusion

In this paper we presented a method to reconstruct and detect 3D objects with improved resolution using time-multiplexed computational integral imaging. We use a non-stationary microlens array to overcome the low-resolution limit set by the Nyquist upper bound. The proposed algorithm improves the longitudinal depth estimation accuracy, and the improvement depends on the number of time-multiplexed integral images used in the process. We illustrated the improvements in depth estimation due to the use of a moving lenslet array. We use non-linear filtering to detect the 3D coordinates of the 3D object. Experiments illustrate the ability of the system to reconstruct the 3D scene and to detect the 3D coordinates of an object within the scene.

## References and links

**1. **H. E. Ives, “Optical properties of a Lippmann lenticulated sheet,” J. Opt. Soc. Am. **21**, 171–176 (1931). [CrossRef]

**2. **H. J. Caulfield, ed., *Handbook of Optical Holography*, (Academic, London, 1979).

**3. **F. Okano, J. Arai, H. Hoshino, and I. Yuyama, “Three dimensional video system based on integral photography,” Opt. Eng. **38**, 1072–1077 (1999). [CrossRef]

**4. **H. Arimoto and B. Javidi, “Integral three-dimensional imaging with digital reconstruction,” Opt. Lett. **26**, 157–159 (2001). [CrossRef]

**5. **T. Okoshi, *Three-dimensional Imaging Techniques*, (Academic, NY, 1971).

**6. **B. Javidi and E. Tajahuerce, “Three-dimensional object recognition by use of digital holography,” Opt. Lett. **25**, 610–612 (2000). [CrossRef]

**7. **J. Rosen, “Three-dimensional electro-optical correlation,” J. Opt. Soc. Am. A **15**, 430–436 (1998). [CrossRef]

**8. **J. Rosen, “Electrooptical correlators for three-dimensional pattern recognition,” in *Image Recognition and Classification: Algorithms, Systems, and Applications*, B. Javidi, ed. (Marcel Dekker, NY, 2002).

**9. **Y. Frauel and B. Javidi, “Digital three-dimensional image correlation by use of computer reconstructed integral imaging,” Appl. Opt. **41**, 5488–5496 (2002). [CrossRef] [PubMed]

**10. **J. Jang and B. Javidi, “Improved viewing resolution of three dimensional integral imaging by use of non-stationary micro-optics,” Opt. Lett. **27**, 324–326 (2002). [CrossRef]

**11. **A. Stern and B. Javidi, “3D Image Sensing and Reconstruction with Time-Division Multiplexed Computational Integral Imaging (CII),” Applied Optics-IP **42**, 7036–7042 (2003). [CrossRef]

**12. **R. Y. Tsai and T. S. Huang, “Multi-frame image restoration and registration,” in *Advances in Computer Vision and Image Processing* **1**, 317–339 (JAI Press, 1984).

**13. **A. M. Tekalp, *Digital Video Processing*, (Prentice Hall, NJ, 1995).

**14. **S. Kim, N. Bose, and H. Valenzuela, “Recursive reconstruction of high resolution image from noisy undersampled multiframes,” IEEE Trans. Acoustics, Speech, and Signal Processing **38**, 1013–1027 (1990). [CrossRef]

**15. **D. Keren, S. Peleg, and R. Brada, “Image sequence enhancement using subpixel displacement,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 742–746 (1988).

**16. **B. Frieden and H. Aumann, “Image reconstruction from multiple 1-D scans using filtered localized projection,” Appl. Opt. **26**, 3615–3621 (1987). [CrossRef]

**17. **N. Shah and A. Zakhor, “Multiframe spatial resolution enhancement of color video,” in Proceedings of the IEEE International Conference on Image Processing, Lausanne, Switzerland, 985–988 (1996).

**18. **R. Schultz and R. Stevenson, “Improved definition video frame enhancement,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Detroit, MI, 2169–2172 (1995).

**19. **B. Javidi, “Nonlinear joint power spectrum based optical correlators,” Appl. Opt. **28**, 2358–2367 (1989). [CrossRef] [PubMed]

**20. **A. Mahalanobis, “On the optimality of the MACH filter for detection of targets in noise” Opt. Eng. **36**, 2642–2648 (1997). [CrossRef]

**21. **C. Chesnaud, Ph. Réfrégier, and V. Boulet, “Statistical region snake based segmentation adapted to different physical noise models,” IEEE Transactions on Pattern Analysis and Machine Intelligence **21**, 1145–1157 (1999). [CrossRef]

**22. **B. Javidi and J. L. Horner, *Real-time Optical Information Processing*, (Academic Press, NY, 1994).