## Abstract

In a computational three-dimensional (3D) volumetric reconstruction integral imaging (II) system, volume pixels (voxels) of the scene are reconstructed plane by plane. Therefore, foreground occluding objects and background occluded objects can be reconstructed separately when there is enough spatial separation between the occluding object and the occluded object. Using volumetric computational II reconstruction, we are able to recognize distorted and occluded objects with correlation-based recognition algorithms. We present experimental results that show recognition of 3D rotated and occluded targets in a reconstructed scene. We also show the ability of the proposed technique to recognize distorted and occluded 3D non-training targets.

©2006 Optical Society of America

## 1. Introduction

Three-dimensional (3D) imaging and visualization techniques have been the subject of great interest [1–18]. Integral imaging (II) [3–16] is a promising technology among 3D imaging techniques. II systems use a microlens array to capture light rays emanating from 3D objects in such a way that the light rays that pass through each pickup microlens are recorded on a two-dimensional (2D) image sensor. The captured 2D image arrays are referred to as elemental images. The elemental images are 2D images, flipped in both the x and y directions, each with a different perspective of the 3D scene. To reconstruct the 3D scene optically from the captured 2D elemental images, the rays are reversely propagated from the elemental images through a display microlens array that is similar to the pickup microlens array.

In order to overcome image quality degradation introduced by optical devices used in an optical II reconstruction process, and also to obtain arbitrary perspective within the total viewing angle, computational II reconstruction techniques [7, 9, 12–14] have been proposed. The reconstructed high resolution image that could be obtained with resolution improvement techniques [9] is an image reconstructed from a single viewpoint. Recently, a volumetric computational II reconstruction method has been proposed [13], which uses all of the information of the elemental images to reconstruct the full 3D volume of a scene. It allows us to reconstruct 3D voxel values at any arbitrary distance from the display microlens array.

In a complex scene, some of the foreground objects may occlude the background objects, which prevents us from fully observing the background objects. To reconstruct the image of the occluded background objects with the minimum interference of the occluding objects, multiple images with various perspectives are required. To achieve this goal, a volumetric II reconstruction technique with inverse projection of the elemental images has been applied to the occluded scene problem [16].

Many pattern recognition problems can be solved with the correlation approach [17–30]. To be distortion tolerant [19, 28, 29], the correlation filter should be designed with a training data set of reference targets to recognize the target viewed from various rotated angles, perspectives, scales and illuminations. Many composite filters have been proposed according to their optimization criteria. An optimum nonlinear distortion tolerant filter is obtained by optimizing the filter’s discrimination capability and noise robustness to detect targets placed in a nonoverlapping (disjoint) background noise. The filter is designed to maintain fixed output peaks for the members of the true class training target set. Because the nonlinear filter is derived to minimize the mean square error of the output energy in the presence of disjoint background noise and additive overlapping noise, the output energy is minimized in response to the input scene, which may include the false class objects.

One of the challenging problems in pattern recognition is the partial occlusion of objects [31], which can seriously degrade system performance. Most approaches to this problem have been addressed by the development of specific algorithms, such as statistical techniques or contour analysis, applied to the partially occluded 2D image. In some approaches it is assumed that the objects are planar and represented by binary values. Scenes involving occluded objects have been studied recently by using 3D integral imaging (II) systems with computational reconstruction [16, 18]. The reconstructed 3D object in the occluded scene can be correlated with the original 3D object.

In this paper we have applied an optimum nonlinear filter technique to detect distorted and occluded 3D objects using volumetric computational II reconstruction. Overviews of 3D volumetric computational II reconstruction of the occluded objects and distortion-tolerant optimum nonlinear filter technique are discussed in section 2. The experiments and computer simulation for detecting distorted and occluded background objects are provided in section 3, and the conclusion follows in section 4.

## 2. Overviews of related techniques

#### 2.1 3D volumetric reconstructions of occluded objects using computational II

Each voxel of the 3D object is mapped into the imaging plane of the pickup microlens array and contributes to form the elemental images in the pickup process of the II system within the viewing angle range of the system. Each recorded elemental image conveys a different perspective and different distance information of the 3D object. The 3D volumetric computational II reconstruction method extracts pixels from the elemental images by an inverse mapping through a computer synthesized (virtual) pinhole array, and displays the corresponding voxels [13,14] on a desired display plane. The elemental images inversely mapped through the synthesized pinhole array may overlap each other at any depth level from the virtual pinhole array for *M*>1, where *M* is the magnification factor. It is the ratio of the distance, *z*, between the synthesized pinhole array and the reconstruction image plane to the distance, *g*, between the synthesized pinhole array and the elemental image plane, that is *M*=*z*/*g*. The intensity at the reconstruction plane is inversely proportional to the square of the distance between the elemental image and the reconstruction plane. The inverse mappings of all the elemental images corresponding to the magnification factor *M* form a single image at any reconstruction image plane. To form the 3D volume information, this process is repeated for all reconstruction planes of interest with different distance information. In this manner, we use all of the information of the recorded elemental images to reconstruct a full 3D scene, which requires simple inverse mapping and superposition operations.
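The inverse mapping and superposition described above can be sketched in a few lines. The following is a minimal grayscale sketch with our own variable names and conventions (pinhole pitch expressed in reconstruction-plane pixels, nearest-neighbor inverse mapping), not the authors' implementation:

```python
import numpy as np

def reconstruct_plane(elemental, pitch_px, g, z):
    """Back-project elemental images through a virtual pinhole array
    onto the reconstruction plane at distance z.

    elemental: array of shape (ny, nx, s, s), one s x s image per pinhole.
    pitch_px:  pinhole pitch in reconstruction-plane pixels.
    g:         distance from pinhole array to elemental image plane.
    z:         distance from pinhole array to reconstruction plane.
    """
    ny, nx, s, _ = elemental.shape
    M = z / g                              # magnification factor M = z/g
    H, W = ny * pitch_px, nx * pitch_px
    plane = np.zeros((H, W))
    count = np.zeros((H, W))               # overlap counter for normalization
    ys, xs = np.mgrid[0:H, 0:W]
    for j in range(ny):
        for i in range(nx):
            cx = (i + 0.5) * pitch_px      # pinhole center (pixels)
            cy = (j + 0.5) * pitch_px
            # inverse mapping: reconstruction pixel -> elemental-image pixel
            u = np.floor((cx - xs) / M + s / 2).astype(int)
            v = np.floor((cy - ys) / M + s / 2).astype(int)
            ok = (u >= 0) & (u < s) & (v >= 0) & (v < s)
            plane[ok] += elemental[j, i, v[ok], u[ok]]
            count[ok] += 1
    return plane / np.maximum(count, 1)    # average overlapping contributions
```

For *M*>1 the back-projected elemental images overlap, and dividing by the overlap count implements the superposition/normalization step; repeating the call for a range of *z* values yields the 3D volume.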

Since it is possible to reconstruct image planes of interest with volumetric computational II reconstruction, we are able to separate the background objects from the foreground objects, that is, it is possible to reconstruct the image of the background object with a reduced effect of the foreground occluding objects. However, there is a constraint on the distance between the foreground objects and background objects [16]. The minimum distance between the occluding object and a pixel on the background object is *d*_{0}×*l*_{c}/[(*n*-1)*p*], where *d*_{0} is the distance between the virtual pinhole array and the pixel of the background object, *l*_{c} is the length of the occluding foreground object, *p* is the pitch of the virtual pinhole, and *n* is the rhombus index number, which defines a volume in the reconstructed volume.
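As a quick numerical check of this constraint (function and argument names are ours):

```python
def min_occlusion_distance(d0, lc, p, n):
    """Minimum distance between the occluding object and a background
    pixel, per the constraint d0 * lc / ((n - 1) * p) of Ref. [16].

    d0: distance from the virtual pinhole array to the background pixel
    lc: length of the occluding foreground object
    p:  pitch of the virtual pinholes
    n:  rhombus index number
    """
    if n < 2:
        raise ValueError("rhombus index n must be at least 2")
    return d0 * lc / ((n - 1) * p)
```

For example, with *d*_{0}=60, *l*_{c}=1.09, *p*=1.09 (all in mm) and *n*=7, the minimum separation evaluates to 10 mm.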

#### 2.2 Distortion-tolerant optimum nonlinear filter

In this section, we overview a distortion-tolerant optimum nonlinear filter for recognizing rotated and/or scaled targets [19]. Let *r*_{i}(*t*) denote one of the distorted reference targets, where *i*=1, 2, …, *T*, and *T* is the size of the reference target set. The input image *s*(*t*), which may include distorted targets, is

$$s\left(t\right)=\sum _{i=1}^{T}{v}_{i}{r}_{i}\left(t-{\tau }_{i}\right)+{n}_{b}\left(t\right)\left\{w\left(t\right)-\sum _{i=1}^{T}{v}_{i}{w}_{ri}\left(t-{\tau }_{i}\right)\right\}+{n}_{a}\left(t\right)w\left(t\right),$$

where *v*_{i} is a binary random variable which takes a value of 0 or 1, with probability mass functions *p*(*v*_{i}=1)=1/*T* and *p*(*v*_{i}=0)=1-1/*T*. In Eq. (1), *v*_{i} indicates whether the target *r*_{i}(*t*) is present in the scene, and *r*_{i}(*t*) is one of the reference targets. *n*_{b}(*t*) is the nonoverlapping background noise with mean *m*_{b}, *n*_{a}(*t*) is the overlapping additive noise with mean *m*_{a}, *w*(*t*) is the window function for the entire input scene, *w*_{ri}(*t*) is the window function for the reference target *r*_{i}(*t*), and *τ*_{i} is a uniformly distributed random location of the target in the input scene, whose probability density function is *f*(*τ*_{i})=*w*(*τ*_{i})/*d*, where *d* is the area of the support region of the input scene. *n*_{b}(*t*) and *n*_{a}(*t*) are assumed to be wide-sense stationary random processes, statistically independent of each other.
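To make the scene model of Eq. (1) concrete, a 1D input containing one reference target on disjoint background noise plus additive noise can be synthesized as follows; all names and window shapes here are our own simplifications:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256                                    # support of the input scene
r = np.hanning(32)                         # stand-in reference target r_i(t)
w = np.ones(d)                             # window of the entire input scene
tau = 100                                  # target location tau_i
w_ri = np.zeros(d); w_ri[tau:tau + 32] = 1 # target window shifted to tau
target = np.zeros(d); target[tau:tau + 32] = r

nb = rng.normal(0.5, 0.1, d)               # nonoverlapping background, mean m_b
na = rng.normal(0.0, 0.05, d)              # overlapping additive noise, mean m_a
# Eq. (1): target + background outside the target support + additive noise
s = target + nb * (w - w_ri) + na * w
```

The factor `w - w_ri` is what makes the background noise disjoint: it vanishes on the target support, so the background never overlaps the target itself.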

The filter is designed so that when the input to the filter is one of the reference targets, the output of the filter in the Fourier domain becomes

$$\sum _{k=0}^{M-1}H\left(k\right){R}_{i}^{*}\left(k\right)=M{C}_{i},$$

where *H*(*k*) and *R*_{i}(*k*) are the discrete Fourier transforms of *h*(*t*) (the impulse response of the distortion tolerant filter) and *r*_{i}(*t*), respectively, * denotes the complex conjugate, *M* is the number of sample pixels, and *C*_{i} is a positive real desired constant. Equation (2) is the constraint imposed on the filter. To obtain noise robustness, we minimize the output energy due to the disjoint background noise and the additive noise. Both the disjoint background noise and the additive noise can be represented in one noise term as *n*(*t*)=*n*_{b}(*t*){*w*(*t*)-${\mathrm{\Sigma}}_{i=1}^{T}$*v*_{i}*w*_{ri}(*t*-*τ*_{i})}+*n*_{a}(*t*)*w*(*t*). We minimize a linear combination of the output energy due to the input noise and the output energy due to the input scene under the filter constraint in Eq. (2).

Let *a*_{k}+*jb*_{k} be the *k*-th element of *H*(*k*), *c*_{ik}+*jd*_{ik} be the *k*-th element of *R*_{i}(*k*), and *D*(*k*)=(*w*_{n}*E*|*N*(*k*)|^{2}+*w*_{d}|*S*(*k*)|^{2})/*M*, in which *E* is the expectation operator, *N*(*k*) is the Fourier transform of *n*(*t*), *S*(*k*) is the Fourier transform of *s*(*t*), and *w*_{n} and *w*_{d} are the positive weights of the noise robustness capability and the discrimination capability, respectively. Now, the problem is to minimize

$$\sum _{k=0}^{M-1}\left({a}_{k}^{2}+{b}_{k}^{2}\right)D\left(k\right)$$

with both the real and imaginary parts constrained, because *MC*_{i} is a real constant in Eq. (2). We use the Lagrange multiplier method to solve this minimization problem. Let the function *J* to be minimized with the Lagrange multipliers *λ*_{1i}, *λ*_{2i} be

We need to find *a*_{k}, *b*_{k} and *λ*_{1i}, *λ*_{2i} that satisfy the filter constraints. We can obtain values for *a*_{k} and *b*_{k} that minimize *J* and satisfy the required constraints,

The following additional notations are used to complete the derivation,

$${A}_{x,y}\equiv \sum _{k=0}^{M-1}\frac{\mathrm{Re}\left[{R}_{x}\left(k\right)\right]\mathrm{Re}\left[{R}_{y}\left(k\right)\right]+\mathrm{Im}\left[{R}_{x}\left(k\right)\right]\mathrm{Im}\left[{R}_{y}\left(k\right)\right]}{2D\left(k\right)}=\sum _{k=0}^{M-1}\frac{{c}_{xk}{c}_{yk}+{d}_{xk}{d}_{yk}}{2D\left(k\right)},$$

$${B}_{x,y}\equiv \sum _{k=0}^{M-1}\frac{\mathrm{Im}\left[{R}_{x}\left(k\right)\right]\mathrm{Re}\left[{R}_{y}\left(k\right)\right]-\mathrm{Re}\left[{R}_{x}\left(k\right)\right]\mathrm{Im}\left[{R}_{y}\left(k\right)\right]}{2D\left(k\right)}=\sum _{k=0}^{M-1}\frac{{d}_{xk}{c}_{yk}-{c}_{xk}{d}_{yk}}{2D\left(k\right)},$$

where superscript *t* denotes the matrix transpose, and Re(•), Im(•) denote the real and imaginary parts, respectively. Let **A** and **B** be *T*×*T* matrices whose elements at (*x*, *y*) are *A*_{x,y} and *B*_{x,y}, respectively. We substitute *a*_{k} and *b*_{k} into the filter constraints and solve for *λ*_{1i}, *λ*_{2i},

From Eqs. (5) and (6), we obtain the *k*-th element of the distortion tolerant filter *H*(*k*),

$$H\left(k\right)=\frac{\sum _{i=1}^{T}\left({\lambda }_{1i}+j{\lambda }_{2i}\right){R}_{i}\left(k\right)}{2D\left(k\right)}.$$

We have chosen both *w*_{n} and *w*_{d} in *D*(*k*) as *M*/2. Therefore, the optimum nonlinear distortion tolerant filter *H*(*k*) is

where ${\mathrm{\Phi}}_{b}^{0}$(*k*) is the power spectrum of the zero-mean stationary random process ${n}_{b}^{0}$(*t*), and ${\mathrm{\Phi}}_{a}^{0}$(*k*) is the power spectrum of the zero-mean stationary random process ${n}_{a}^{0}$(*t*). *W*(*k*) and *W*_{ri}(*k*) are the discrete Fourier transforms of *w*(*t*) and *w*_{ri}(*t*), respectively. ⊗ denotes the convolution operator. *λ*_{1i} and *λ*_{2i} are obtained from Eq. (6).
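The constrained minimization can be verified numerically. The sketch below is our own: it assumes the sign conventions implied by the definitions of **A** and **B** above, forms *H*(*k*)=Σ_{i}(*λ*_{1i}+*jλ*_{2i})*R*_{i}(*k*)/2*D*(*k*), and checks that the resulting filter satisfies the constraint of Eq. (2) for every training target:

```python
import numpy as np

rng = np.random.default_rng(1)
M, T = 64, 3                                # sample pixels, training targets
R = rng.normal(size=(T, M)) + 1j * rng.normal(size=(T, M))  # DFTs R_i(k)
D = np.abs(rng.normal(size=M)) + 0.1        # D(k) > 0 (noise + scene energy)
C = np.ones(T)                              # desired output constants C_i

# A[x, y] and B[x, y] as defined in the text: sums of R_x R_y* / (2 D)
cross = (R[:, None, :] * R[None, :, :].conj()) / (2 * D)
A = cross.real.sum(axis=2)
B = cross.imag.sum(axis=2)

# Real part of the constraint: A^t lam1 - B^t lam2 = M C
# Imaginary part:              B^t lam1 + A^t lam2 = 0
block = np.block([[A.T, -B.T], [B.T, A.T]])
lam = np.linalg.solve(block, np.concatenate([M * C, np.zeros(T)]))
lam1, lam2 = lam[:T], lam[T:]

# Filter H(k) = sum_i (lam1_i + j lam2_i) R_i(k) / (2 D(k))
H = ((lam1 + 1j * lam2) @ R) / (2 * D)

# Check the constraint sum_k H(k) R_i*(k) = M C_i for every target
out = (H * R.conj()).sum(axis=1)
assert np.allclose(out, M * C)
```

Because **A** is symmetric and **B** is antisymmetric, the 2*T*×2*T* block system above is the real-valued form of the complex constraint equations, and solving it recovers the multipliers that make each training target produce the same output peak.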

## 3. Distortion-tolerant 3D occluded object recognition

Figure 1 depicts the system setup to capture the occluded 3D scene. Volumetric computational II reconstruction is performed in a computer with a virtual pinhole array using ray optics.

Figure 2 shows the two toy cars and foreground vegetation illuminated by incoherent light used in the experiments. The pickup microlens array is placed in front of the object to form the elemental image array. The distance between the microlens array and the closest part of the occluding vegetation is around 30 *mm*, the distance between the microlens array and the front part of the green car is 42 *mm*, and the distance between the microlens array and the front part of the blue car is 52 *mm*. The minimum distance between the occluding object and a pixel on the closest background object should be equal to or greater than 9.6 *mm* according to Eq. (7) in reference 16, where the rhombus index number in our experiments is 7 for the green car. This satisfies the constraint of the experimental setup to reconstruct the background objects. The background objects are partially occluded by foreground vegetation; thus, it is difficult to recognize the occluded objects from the 2D scene in Fig. 2. The elemental images of the object are captured with a digital camera and the pickup microlens array. The microlens array used in the experiments has 53×53 square refractive lenses in a 55 *mm*×55 *mm* square area. The size of each lenslet is 1.09 *mm*×1.09 *mm*, with less than 7.6 *µm* separation. The focal length of each microlens is 3.3 *mm*. The size of each captured elemental image is 73 pixels×73 pixels.

In our experiments, we have used the blue car as the true class target and the green car as a false object. In other words, we aim to detect only the blue car in a scene that contains both the green and blue cars. Because of the similarity of the shapes of the cars used in the experiments, it is difficult to detect the target object with linear filters. We have obtained 7 different elemental image sets by rotating the reference target from 30° to 60° in 5° increments. One of the captured elemental image sets used to reconstruct the 3D training targets is shown in Fig. 3. The reconstructed images from the elemental image sets are shown in Fig. 4 at various distance levels. From each elemental image set with rotated targets, we have reconstructed the images from *z*=60 *mm* to *z*=72 *mm* in 1 *mm* increments. Therefore, for each rotation angle (from 30° to 60° in 5° increments), 13 reconstructed images are used as a 3D training reference target. As the rotation angle increases, we observe more of the side view of the object and less of the frontal view. The input elemental images contain either a true class training target or a true class non-training target, together with the false object (green car). The true class training target is a set of 13 reconstructed images of the blue car rotated at 45°. The true class non-training target is a set of 13 reconstructed images of the blue car rotated at 32.5°, which is not among the training reference targets. The true class training and non-training targets are located on the right side of the input scene, and the false object is located on the left side of the scene. The true class non-training target used in the test is distorted in terms of out-of-plane rotation, which is challenging to detect. Figure 5 shows the reconstructed 3D scene from the elemental images of the occluded true class training target scene with the false object taken at an angle of 45°, at various longitudinal distances.
Similarly, Figure 6 shows the reconstructed 3D scene from the elemental images of the occluded true class non-training target scene with the false object taken at an angle of 32.5° with various longitudinal distances. With volumetric computational II reconstruction, it is possible to separate the foreground occluding object and background occluded objects with the reduced interference of the foreground objects. The movie files of the reconstructed images in Figs. 5 and 6 are shown in Figs. 7 and 8, respectively.

The distortion tolerant optimum nonlinear filter has been constructed in a 4D structure, that is, *x*, *y*, *z* coordinates and 3 color components. Figures 9 and 10 are visualizations of the 4D optimum nonlinear filter at different longitudinal depth levels. We set all of the desired correlation values of the training targets, *C*_{i}, to 1 [see Eq. (2)]. Figures 9(a)–9(d) are the normalized outputs of the 4D optimum nonlinear distortion tolerant filter in Eq. (8) at the longitudinal depth levels of the occluding foreground vegetation, the true class training target, and the false object, respectively. A dominant peak appears only at the true class target distance, as shown in Fig. 9(d). Figures 10(a)–10(d) are the normalized outputs of the 4D optimum nonlinear distortion tolerant filter at the longitudinal depth levels of the occluding foreground vegetation, the true class non-training target, and the false object, respectively.
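A simplified illustration of this plane-by-plane detection is sketched below. It is our own toy example: it uses a plain frequency-domain matched correlation rather than the optimum nonlinear filter of Eq. (8), and synthetic depth planes rather than reconstructed II data:

```python
import numpy as np

rng = np.random.default_rng(2)

def correlate2d_fft(scene, template):
    """Circular cross-correlation of scene with template via FFT."""
    S = np.fft.fft2(scene)
    Rf = np.fft.fft2(template, s=scene.shape)   # zero-padded template
    return np.real(np.fft.ifft2(S * Rf.conj()))

# Toy stack of reconstructed depth planes; the target lives in plane 2
planes = [rng.normal(0, 0.1, (64, 64)) for _ in range(4)]
target = np.outer(np.hanning(9), np.hanning(9))
loc = (40, 21)
planes[2][loc[0]:loc[0] + 9, loc[1]:loc[1] + 9] += target

# Correlate every depth plane and keep the strongest peak
peaks = [correlate2d_fft(p, target).max() for p in planes]
best_plane = int(np.argmax(peaks))
```

The dominant correlation peak selects both the depth plane and the lateral location of the target, mirroring how the filter output in Figs. 9 and 10 peaks only at the true class target's 3D coordinates.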

Figure 10(d) shows a dominant peak at the location of the true class non-training target. The peak value of the true class training target is higher than that of the true class non-training target; the ratio of the non-training target peak value to the training target peak value is 0.9175. The ratio of the peak value to the maximum side lobe, which occurs at the 3D coordinates of the false object, is 2.8886. Thus, it is possible to distinguish the true class targets from the false object and the occluding foreground objects.

Because of the constraint of the minimum distance between the occluding object and a pixel on the background object, the experimental setup is very important to reconstruct the background image with a reduced effect of the foreground occluding objects. One of the parameters to determine the minimum distance is the density of the occluding foreground object. If the density of the foreground objects is high, the background object should be farther from the image pickup system. If not, the background objects may not be fully reconstructed, which can result in poor recognition performance. Nevertheless, even in this case, the proposed approach gives us better performance than that of the 2D recognition systems [18].

## 4. Conclusion

Using a 3D computational volumetric II reconstruction system and a 3D distortion tolerant optimum nonlinear filtering technique, we have presented experiments to recognize partially occluded and distorted 3D objects in a 3D scene. The experimental results show that we can reconstruct the background objects with a reduced effect of the occluding foreground objects. With the distortion tolerant 4D optimum nonlinear filter (3D coordinates plus color), we have demonstrated the recognition capability for rotated 3D targets when the input scene contains false objects and is partially occluded by foreground objects such as vegetation.

## Acknowledgments

We wish to thank Dr. A. Shortt, Dr. S. Yeom, and M. Daneshpanah for their assistance with this manuscript.

## References and Links

**1. **S. A. Benton, ed., *Selected Papers on Three-Dimensional Displays* (SPIE Optical Engineering Press, Bellingham, WA, 2001).

**2. **P. Ambs, L. Bigue, R. Binet, J. Colineau, J.-C. Lehureau, and J.-P. Huignard, “Image reconstruction using electro-optic holography,” *Proc. of the 16th Annual Meeting of the IEEE Lasers and Electro-Optics Society, LEOS 2003*, vol. 1 (IEEE, Piscataway, NJ, 2003) pp. 172–173.

**3. **B. Javidi and F. Okano, eds., *Three Dimensional Television, Video, and Display Technologies* (Springer, Berlin, 2002).

**4. **T. Okoshi, *Three-dimensional Imaging Techniques* (Academic Press, New York, 1976).

**5. **G. Lippmann, “La photographie intégrale,” C. R. Acad. Sci. **146**, 446–451 (1908).

**6. **H. E. Ives, “Optical properties of a Lippmann lenticulated sheet,” J. Opt. Soc. Am. **21**, 171–176 (1931). [CrossRef]

**7. **H. Arimoto and B. Javidi, “Integral three-dimensional imaging with digital reconstruction,” Opt. Lett. **26**, 157–159 (2001) [CrossRef]

**8. **J.-S. Jang and B. Javidi, “Formation of orthoscopic three-dimensional real images in direct pickup one-step integral imaging,” Opt. Eng. **42**, 1869–1870 (2003). [CrossRef]

**9. **A. Stern and B. Javidi, “Three-dimensional image sensing and reconstruction with time-division multiplexed computational integral imaging,” Appl. Opt. **42**, 7036–7042 (2003). [CrossRef] [PubMed]

**10. **H. Hoshino, F. Okano, H. Isono, and I. Yuyama, “Analysis of resolution limitation of integral photography,” J. Opt. Soc. Am. A **15**, 2059–2065 (1998). [CrossRef]

**11. **J.-S. Jang and B. Javidi, “Improved viewing resolution of three-dimensional integral imaging by use of nonstationary micro-optics,” Opt. Lett. **27**, 324–326 (2002). [CrossRef]

**12. **M. Martínez-Corral, B. Javidi, R. Martínez-Cuenca, and G. Saavedra, “Integral imaging with improved depth of field by use of amplitude modulated microlens array,” Appl. Opt. **43**, 5806–5813 (2004). [CrossRef] [PubMed]

**13. **S.-H. Hong, J.-S. Jang, and B. Javidi, “Three-dimensional volumetric object reconstruction using computational integral imaging,” Opt. Express **12**, 483–491 (2004), http://www.opticsexpress.org/abstract.cfm?URI=OPEX-12-3-483. [CrossRef] [PubMed]

**14. **S. Yeom, B. Javidi, and E. Watson, “Photon counting passive 3D image sensing for automatic target recognition,” Opt. Express **13**, 9310–9330 (2005), http://www.opticsinfobase.org/abstract.cfm?URI=oe-13-23-9310. [CrossRef] [PubMed]

**15. **B. Javidi, S.-H. Hong, and O. Matoba, “Multi dimensional optical sensors and imaging systems,” Appl. Opt. **45**, 2986–2994 (2006). [CrossRef] [PubMed]

**16. **S.-H. Hong and B. Javidi, “Three-dimensional visualization of partially occluded objects using integral imaging,” IEEE J. Display Technol. **1**, 354–359 (2005). [CrossRef]

**17. **Y. Frauel and B. Javidi, “Digital three-dimensional image correlation by use of computer-reconstructed integral imaging,” Appl. Opt. **41**, 5488–5496 (2002). [CrossRef] [PubMed]

**18. **B. Javidi, R. Ponce-Diaz, and S.-H. Hong, “Three-dimensional recognition of occluded objects using volumetric reconstruction,” Opt. Lett. **31**, 1106–1108 (2006). [CrossRef] [PubMed]

**19. **S.-H. Hong and B. Javidi, “Optimum nonlinear composite filter for distortion-tolerant pattern recognition,” Appl. Opt. **41**, 2172–2178 (2002). [CrossRef] [PubMed]

**20. **O. Matoba, T. J. Naughton, Y. Frauel, N. Bertaux, and B. Javidi, “Real-time three-dimensional object reconstruction by use of a phase-encoded digital hologram,” Appl. Opt. **41**, 6187–6192 (2002). [CrossRef] [PubMed]

**21. **B. Javidi, *Image Recognition and Classification, Algorithms, Systems, and Applications* (Marcel Dekker, Inc., New York, 2002). [CrossRef]

**22. **J. W. Goodman, *Introduction to Fourier Optics*, 2nd ed. (McGraw-Hill, New York, 1996).

**23. **H. Kwon and N. M. Nasrabadi, “Kernel RX-algorithm: a nonlinear anomaly detector for hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens. **43**, 388–397 (2005) [CrossRef]

**24. ***Selected Papers on Automatic Target Recognition*, F. Sadjadi, Editor, SPIE- CDROM (1999).

**25. **G. L. Turin, “An introduction to matched filters,” IRE Trans. Inf. Theory **IT-6**, 311–329 (1960). [CrossRef]

**26. **Ph. Réfrégier, V. Laude, and B. Javidi, “Nonlinear joint-transform correlation: an optimal solution for adaptive image discrimination and input noise robustness,” Opt. Lett. **19**, 405–407 (1994).

**27. **A. Mahalanobis, “Review of correlation filters and their application for scene matching,” Optoelectronic Devices and Systems for Processing, Critical Reviews of Optical Science Technology , B. Javidi and K. Johnson eds., **CR 65**, SPIE Press, 240–260 (1996).

**28. **B. Javidi and J. Wang, “Optimum distortion-invariant filter for detecting a noisy distorted target in nonoverlapping background noise,” J. Opt. Soc. Am. A **12**, 2604–2614 (1995). [CrossRef]

**29. **F. Goudail and P. Réfrégier, “Statistical algorithms for target detection in coherent active polarimetric images,” J. Opt. Soc. Am. A **18**, 3049–3060 (2001). [CrossRef]

**30. **M. T. Prona, A. Mahalanobis, and K. N. Zachery, “LADAR automatic target recognition using correlation filters,” Proc. SPIE, Automatic Target Recognition IX , **3718**, 388–396 (1999).

**31. **J. Maycock, T. Naughton, B. Hennely, J. McDonald, and B. Javidi, “Three-dimensional scene reconstruction of partially occluded objects using digital holograms,” Appl. Opt. **45**, 2975–2985 (2006). [CrossRef] [PubMed]