## Abstract

Three-dimensional (3D) imaging systems have recently been suggested for passive sensing and recognition of objects in photon-starved environments where only a few photons are emitted or reflected from the object. In this paradigm, it is important to make optimal use of the limited information carried by photons. We present a statistical framework for 3D passive object recognition in the presence of noise. Since detector dark noise is present in the quantum-limited regime, our approach takes into account the effect of noise on information-bearing photons. The model is tested for identifying a target in a 3D scene when both background noise and dark noise sources are present. It is shown that reliable object recognition is possible in the photon-counting domain. The results suggest that with proper translation of the physical characteristics of the imaging system into the information processing algorithms, photon-counting imagery can be used for object classification.

© 2010 Optical Society of America

## 1. Introduction

Photons are considered the basic carriers of optical information in the context of imaging systems. However, a photon’s behavior is governed by the principles of quantum physics [1]. This makes it difficult to rely on an individual photon, or even a small number of them, for reliable information transfer. A second nondeterministic process occurs in the detection stage, where the photons are converted into electrons and counted using electronic circuitry [2]. Fortunately, there is an abundance of photons in most scenarios, which has allowed sensors, imaging systems and image processing algorithms to operate on the statistical properties of information-bearing photons. However, there are a number of benefits to systems that can perform various high-level tasks such as visualization, object recognition and classification with limited photons.

Many classical object recognition algorithms operate on images that are formed using a tremendous number of photons [3–7]. These algorithms have also been explicitly adapted for three-dimensional imaging systems [8, 9]. The optimality of such algorithms, however, may not carry over if these methods are extended directly to the photon-counting regime, due to the quantum-limited nature of the imagery. Thus, a new class of automatic object recognition problems arises within the context of photon-counting image sensing [10, 11]. In fact, three-dimensional, multi-perspective imaging systems along with conventional linear and nonlinear matched filters have been applied to photon-counting object recognition [12, 13]. The methods of statistical sampling theory have also been investigated for such problems [14]. Photon-counting image fidelity has also been studied from an information theory point of view [15]. To the best of our knowledge, there has been no study of the effects of background and dark noise on object recognition performance in the photon-counting domain.

In this paper, maximum likelihood decision theory is used for object recognition in 3D photon-counting imagery where the ratio of object photons to dark counts is less than one. Unlike conventional intensity images, photon-counting images with very few (50 or fewer) photons contain many pixels that register no counts at all. A typical matched filter [16], for example, would not consider such pixels as information bearing. Nevertheless, the absence of photon counts, by itself, conveys information about the object, which is exploited in the present framework. This is the key difference between the proposed method and prior art, and it makes the method more robust to background and dark noise sources.

The rest of the paper is organized as follows: in Section 2, a brief review of multi-view image sensing and reconstruction is presented. In Section 2.1, the disjoint object and background model is combined with quantum-limited photo-detection principles to model realistic photon-counting imagery including dark noise. Section 3 presents the maximum likelihood based pattern recognition algorithm, while Section 4 contains experimental results and a performance evaluation of the proposed method. The paper concludes in Section 5.

## 2. Multi-View Photon-Counting 3D Sensing

Three-dimensional (3D) passive imaging and display systems using multiple sensors have been extensively studied [17–24]. The image registered by each sensor is commonly referred to as an elemental image. Multiple image sensors can be used in a grid, or a single sensor can sequentially scan and collect the images while moving on a platform (also known as synthetic aperture integral imaging [25]) [see Fig. 1]. In either method, the angular information of the rays is encoded in the relative lateral shift of the ray-sensor intercept between multiple sensors. Having both direction and intensity information of the rays emanating from the object, one can computationally reconstruct the scene at a desired distance from the sensor array using back-projection [26].

Let the one-dimensional notation of the *k*-th photon-counted elemental image be ${\mathbf{R}}_{k}=\left\{{\text{r}}_{k}^{i}:i=1\dots M\right\}$, where *M* is the total number of pixels in each elemental image. We denote the back-propagation of such an elemental image to distance *z* by ${\mathbf{R}}_{k}{\uparrow}^{z}$, which according to back-propagation based reconstruction can be written as:

$${\mathbf{R}}_{k}{\uparrow}^{z}=\left\{{\text{r}}_{k}^{i-k\tilde{p}}:i=1\dots M\right\},\qquad (1)$$

where $\tilde{p}=pf/\mu z$ is the pickup grid pitch (*p*) normalized by the sensor pixel pitch (*μ*) and the magnification (*z*/*f*) [see Fig. 1].

A point in the object space at distance *z* can be reconstructed by integrating its associated image pixels on all sensors; that is, for the *i*-th object space point at distance *z* one has ${\text{R}}^{i}{\uparrow}^{z}={\Sigma}_{k=1}^{K}{\text{r}}_{k}^{i-k\tilde{p}}$. The collection of all points on the plane *z* = *z*_{0} represents the light field distribution in the object space at that particular plane, i.e. $\mathbf{R}{\uparrow}^{z}=\left\{{\text{R}}^{i}{\uparrow}^{z}:i=1\dots M\right\}$. Similarly, the light field distribution in 3D object space can be reconstructed at *Q* intermittent planes, such that:

$$\mathbf{R}=\left\{\mathbf{R}{\uparrow}^{{z}_{q}}:q=1\dots Q\right\}.\qquad (2)$$
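The shift-and-sum reconstruction described in this section can be sketched in a few lines of Python. This is a minimal one-dimensional illustration under stated assumptions, not the authors' implementation: the function names `normalized_pitch`, `back_project` and `reconstruct_volume` are hypothetical, the normalized pitch is rounded to a whole number of pixels, and pixels that shift outside the sensor are simply discarded.

```python
import numpy as np


def normalized_pitch(p, f, mu, z):
    """Pickup pitch normalized by pixel pitch and magnification: p*f / (mu*z).

    All lengths must share one unit (e.g. millimetres).
    """
    return p * f / (mu * z)


def back_project(elemental, shift):
    """Shift-and-sum reconstruction of a single depth plane.

    elemental : (K, M) array of photon counts, row k-1 holding view k.
    shift     : integer pixel shift between adjacent views (rounded p-tilde).
    Returns a length-M array with
    plane[i] = sum over k = 1..K of elemental[k-1, i - k*shift].
    """
    K, M = elemental.shape
    plane = np.zeros(M, dtype=elemental.dtype)
    idx = np.arange(M)
    for k in range(1, K + 1):            # views are numbered 1..K as in the text
        src = idx - k * shift            # pixel of view k that maps to point i
        valid = (src >= 0) & (src < M)   # assumption: drop out-of-sensor pixels
        plane[idx[valid]] += elemental[k - 1, src[valid]]
    return plane


def reconstruct_volume(elemental, p, f, mu, depths):
    """Stack the reconstructed planes for every depth in `depths` (Q planes)."""
    planes = []
    for z in depths:
        shift = int(round(normalized_pitch(p, f, mu, z)))
        planes.append(back_project(elemental, shift))
    return np.stack(planes)              # shape (Q, M)
```

As a usage example, with *p* = 16 mm, *f* = 24 mm and *μ* = 0.01 mm (the values reported in Section 4.1) and a hypothetical depth of *z* = 384 mm, `normalized_pitch` gives a shift of 100 pixels between adjacent views.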

#### 2.1. Photon Counting Imagery Model

When an object is present in non-obstructing clutter, the clutter can be considered as spatially disjoint background noise. Such models appear frequently in image-based pattern recognition problems when an object is to be detected or recognized in the presence of spatially disjoint background noise [27]. The advantage of this model for recognition purposes is that it allows the object and background pixels to be treated independently, based on their respective available a priori knowledge.

We extend this model to 3D imaging systems and particularly apply it to quantum-limited (photon-counting) imaging scenarios. In addition to the disjoint background noise, in quantum-limited imaging conditions the number of thermally excited (dark) electrons in detector arrays can be comparable to, or exceed, the number of photo-electrons [28]. In such cases, it is essential to model dark noise and take it into account in pattern recognition problems using quantum-limited imagery.

Dark electrons are predominantly generated by thermal excitation within defective regions in the silicon crystal [2]. We assume a uniform defect distribution among the sensors’ pixels [29]. For ease of presentation, we associate an equivalent irradiance, *n*_{d}, with each pixel such that the statistics of the dark counts are preserved. Therefore, the resulting irradiance (*r*) incident on the *i*-th pixel of the *k*-th sensor in a multi-view imaging system can be modeled as a combination of object (*s*), background (*n*_{B}) and dark-count equivalent (*n*_{d}) irradiances as follows:

$${r}_{k}^{i}=\alpha \,{w}_{k}^{i}\,{s}_{k}^{i}+\left(1-{w}_{k}^{i}\right){n}_{B,k}^{i}+{n}_{d},\qquad (3)$$

where *w*_{k} is the support function of the object in the *k*-th elemental image, so that ${w}_{k}^{i}$ is unity within the object boundaries and zero elsewhere. Note that *w*_{k} is different for each elemental image and is known a priori as part of the reference object information. Likewise, background noise can be different in each elemental image due to varying sensor viewpoints. In addition, *α* accounts for the potential difference between unknown object and reference irradiances. Throughout this paper, the pre-superscript, post-superscript and post-subscript of each symbol denote object class, pixel index and signal source, respectively. For example, ${}^{j}{w}_{k}^{i}$ denotes the *i*-th pixel of the *j*-th class object support function as seen from the *k*-th elemental image.

In general, inherent stochastic fluctuations of irradiance can influence the statistical properties of photo-counts, which can result in non-Poissonian photo-count distributions. However, it can be shown that for polarized thermal radiation, when the count degeneracy parameter approaches zero, the probability distribution of photo-counts approaches a Poisson distribution [1]. In this case, the number of detected photons, r_{i}, in the *i*-th pixel is a discrete random variable whose mean is related to the irradiance, *r*_{i}, of the light impinging on that pixel, and which follows a Poisson distribution as [30]:

$$P\left({\text{r}}_{i}\,|\,{r}_{i}\right)=\frac{{r}_{i}^{\,{\text{r}}_{i}}\,{e}^{-{r}_{i}}}{{\text{r}}_{i}!}.\qquad (4)$$

Intensity images can be used to simulate photon-counting imagery [13], since the recorded intensity on a pixel is related to the mean number of photons impinging on that pixel. In simulating photon-counting imagery, one can control the total number of photo-counts in all elemental images combined, *N*_{ph}, by using the normalized irradiance as [13]:

$${r}^{i}=\frac{{N}_{ph}\,{I}^{i}}{{\sum }_{i'}{I}^{i'}},\qquad (5)$$

where *I*^{i} is the recorded intensity at the *i*-th image pixel and the sum runs over all pixels of all elemental images.
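The simulation procedure described above (normalize the recorded intensity, draw Poisson counts, then add dark counts) can be sketched as follows. This is an illustrative sketch rather than the authors' code; the function names `normalized_irradiance` and `simulate_photon_counts` are hypothetical, and dark counts are assumed to be independent Poisson draws with the same mean `n_d` at every pixel.

```python
import numpy as np


def normalized_irradiance(intensity, n_ph):
    """Scale intensities so that the expected photo-counts sum to n_ph."""
    intensity = np.asarray(intensity, dtype=float)
    return n_ph * intensity / intensity.sum()


def simulate_photon_counts(intensity, n_ph, n_d=0.0, rng=None):
    """Draw one photon-counting realization of a stack of elemental images.

    intensity : array of any shape, e.g. (K, M) for K elemental images.
    n_ph      : expected total number of photo-counts over the whole stack.
    n_d       : dark-count equivalent irradiance per pixel (expected counts).
    """
    rng = np.random.default_rng() if rng is None else rng
    rate = normalized_irradiance(intensity, n_ph)
    photo_counts = rng.poisson(rate)                  # scene photons
    dark_counts = rng.poisson(n_d, size=rate.shape)   # detector dark counts
    return photo_counts + dark_counts
```

Because the sum of independent Poisson variables is itself Poisson, drawing the two contributions separately is equivalent to a single draw with rate `rate + n_d`; they are kept apart here only to mirror the description in the text.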

## 3. Pattern Recognition with 3D Photon-Counting Imagery

In the realm of statistical decision theory [31], each possible state (or class) is represented by a hypothesis *H*_{j}. Given the photon-counting imagery, the maximum-likelihood (ML) decision criterion is to choose among the hypotheses such that an objective function (e.g. probability of error) is minimized. In a binary classification problem, *H*_{1} is selected if object 1 is more likely to have produced the observed photon-counting data. For mathematical brevity, we assume that all object classes are equally likely and that the cost of error is the same for all misclassifications. Given a photon-counting dataset, **R**, a convenient way to use ML decision theory is to calculate the likelihood ratio, *ℓ*(.), between the two classes and make decisions based on the outcome [30]:

$$\ell \left(\mathbf{R}\right)=\frac{\mathcal{L}\left(\mathbf{R}|{H}_{1}\right)}{\mathcal{L}\left(\mathbf{R}|{H}_{2}\right)}\underset{{H}_{2}}{\overset{{H}_{1}}{\gtrless }}1.\qquad (6)$$

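As described in Section 2, the multi-view photon-counting imagery can be used to reconstruct the object space in 3D. The same methodology can be extended to quantum-limited imaging conditions. The likelihood function of the reconstruction space under hypothesis *H*_{j} can be expanded as:

$$\mathcal{L}\left(\mathbf{R}|{H}_{j}\right)=\prod _{q=1}^{Q}\prod _{k=1}^{K}\prod _{i=1}^{M}P\left({\text{r}}_{k}^{i-k\tilde{p}}\,|\,{H}_{j}\right),\qquad (7)$$

where ${\text{r}}_{k}^{i}$ denotes the number of counts at pixel *i* of the *k*-th elemental image and *p̃* is evaluated at the depth of the *q*-th plane. The innermost product in Eq. (7) is over the *M* pixels of each elemental image, the second product is over the set of *K* elemental images and the outermost product is over the *Q* reconstruction planes.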

Note that the disjoint object and background model in Eq. (3) can be used to rewrite the probability density of each point in space as:

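From Eqs. (3) and (4), it is evident that within the object support of the *k*-th elemental image, the conditional density of the number of counts for the *i*-th pixel, ${\text{r}}_{k}^{i}$, follows a Poisson distribution with the rate ${r}_{k}^{i}=\alpha \,{}^{j}{s}_{k}^{i}+{n}_{d}$, with *j* denoting the class hypothesis, i.e.

$$P\left({\text{r}}_{k}^{i}\,|\,{H}_{j}\right)=\frac{{\left(\alpha \,{}^{j}{s}_{k}^{i}+{n}_{d}\right)}^{{\text{r}}_{k}^{i}}\exp \left[-\left(\alpha \,{}^{j}{s}_{k}^{i}+{n}_{d}\right)\right]}{{\text{r}}_{k}^{i}!}.\qquad (9)$$

Combining Eqs. (7)–(9), the log likelihood under hypothesis *H*_{j} can be written as:

$$\log \mathcal{L}\left(\mathbf{R}|{H}_{j}\right)=\sum _{q=1}^{Q}\sum _{k=1}^{K}\sum _{i=1}^{M}{}^{j}{w}_{k}^{\tilde{i}}\left[{\text{r}}_{k}^{\tilde{i}}\log \left(\alpha \,{}^{j}{s}_{k}^{\tilde{i}}+{n}_{d}\right)-\left(\alpha \,{}^{j}{s}_{k}^{\tilde{i}}+{n}_{d}\right)-\log \left({\text{r}}_{k}^{\tilde{i}}!\right)\right],\qquad (10)$$

where we used *ĩ* = *i* – *p̃k* for conciseness. The log likelihood in Eq. (10) can be calculated based on the a priori reference object normalized irradiance (*s* and *w*), the sensor’s characteristic dark count rate (*n*_{d}) and the total counts registered at each pixel of the sensor (r). Note that unlike correlation based techniques, the likelihood in Eq. (10) penalizes a high energy background where the object is not present or expected in the scene, i.e. where *s*^{i} ≪ 1 but *r*^{i} > 0. This reduces the false positive rate and improves the recognition performance.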

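Since the object irradiance is only known to be a scalar multiple of the reference object irradiance, we set to zero the partial derivative of Eq. (10) with respect to *α* to find its estimate *α̂*:

$$\sum _{q=1}^{Q}\sum _{k=1}^{K}\sum _{i=1}^{M}{}^{j}{w}_{k}^{\tilde{i}}\left[\frac{{\text{r}}_{k}^{\tilde{i}}\,{}^{j}{s}_{k}^{\tilde{i}}}{\hat{\alpha }\,{}^{j}{s}_{k}^{\tilde{i}}+{n}_{d}}-{}^{j}{s}_{k}^{\tilde{i}}\right]=0.\qquad (11)$$

The solution for *α̂* is that of a high-order polynomial, which does not yield a closed form expression. However, for small enough dark noise, *n*_{d} ≪ *s*, *α̂* can be simply found as:

$$\hat{\alpha }=\frac{\sum _{q}\sum _{k}\sum _{i}{}^{j}{w}_{k}^{\tilde{i}}\,{\text{r}}_{k}^{\tilde{i}}}{\sum _{q}\sum _{k}\sum _{i}{}^{j}{w}_{k}^{\tilde{i}}\,{}^{j}{s}_{k}^{\tilde{i}}},\qquad (12)$$

which can then be substituted for *α* in Eq. (10). If *n*_{d} ≥ *s*, one can calculate *α̂* by applying numerical non-linear solvers, such as Newton’s method, to Eq. (11). Note that only pixels with nonzero counts need to be taken into account to find a solution for *α̂* in Eq. (11). In the photon-counting domain, only a small number of pixels are expected to report counts, which in turn simplifies the calculation of *α̂*.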
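Both routes to the scale estimate can be sketched in code. The example below assumes the Poisson model with rate *α s* + *n*_{d} inside the object support, so that the stationarity condition reads Σ r s / (α s + *n*_{d}) = Σ s. The function names `alpha_closed_form` and `alpha_newton` are hypothetical, and starting Newton's method from α = 0 is a design choice made here for robustness, not something prescribed by the text.

```python
import numpy as np


def alpha_closed_form(counts, s_ref, w_ref):
    """Small-dark-noise estimate: total counts over total reference irradiance."""
    support = np.asarray(w_ref).astype(bool)
    r = np.asarray(counts, dtype=float)[support]
    s = np.asarray(s_ref, dtype=float)[support]
    return float(r.sum() / s.sum())


def alpha_newton(counts, s_ref, w_ref, n_d, max_iter=200, tol=1e-10):
    """Solve sum(r*s / (alpha*s + n_d)) - sum(s) = 0 for alpha >= 0.

    Requires n_d > 0; for n_d = 0 use alpha_closed_form instead.
    """
    support = np.asarray(w_ref).astype(bool)
    r = np.asarray(counts, dtype=float)[support]
    s = np.asarray(s_ref, dtype=float)[support]
    total_s = s.sum()                 # constant term, computed once
    hit = r > 0                       # only pixels with counts enter the sum
    r, s = r[hit], s[hit]

    alpha = 0.0                       # start to the left of the root
    for _ in range(max_iter):
        denom = alpha * s + n_d
        g = np.sum(r * s / denom) - total_s
        if alpha == 0.0 and g <= 0.0:
            return 0.0                # maximum lies on the boundary alpha = 0
        dg = -np.sum(r * s**2 / denom**2)
        step = g / dg
        alpha -= step
        if abs(step) <= tol * abs(alpha):
            break
    return float(alpha)
```

The left-hand side of the stationarity condition is convex and decreasing in α, so Newton iterates started at α = 0 increase monotonically toward the root without overshooting it, which keeps every denominator positive.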

Given a set of photon-counting elemental images which include both photon-counts and dark-counts, one can calculate the log likelihood in Eq. (10) for all object class hypotheses *j* = 1, 2,...,*J*. In the case of binary classification between two distinct objects, the labeling strategy based on the likelihood ratio in Eq. (6) can be rewritten with log-likelihood values. The resulting decision rule is:

$$\log \mathcal{L}\left(\mathbf{R}|{H}_{1}\right)-\log \mathcal{L}\left(\mathbf{R}|{H}_{2}\right)\underset{{H}_{2}}{\overset{{H}_{1}}{\gtrless }}0.\qquad (13)$$

The computational complexity of evaluating Eq. (10) is *O*(*n*), where *n* represents the total number of pixels that belong to the object in all images. Note that in contrast to conventional intensity images, which have a large number of 8 or 12 bit pixels, photon-counting images are the outcome of a Poisson process [Eq. (4)] with a very low number of incident photons. Such images typically require only 1 or 2 bit pixel elements for detection. This results in a substantially reduced number of non-zero pixels in the image and directly translates into reduced storage and computational requirements.

## 4. Experimental Results

In order to evaluate the performance of the proposed method, a multi-view 3D imaging system is used to capture toy models. The resulting images are normalized [see Eq. (5)] and transformed to quantum limited imagery through Poisson transformation [see Eq. (4)]. Dark noise is simulated and added to the photon-counting images per Section 4.1. The algorithms presented in Section 3 are applied to determine the performance of classification through Monte-Carlo simulations. The results are demonstrated in terms of Fisher ratio.

#### 4.1. Multi-View 3D Imaging

Two similar toy truck models are chosen as reference objects (see Fig. 2). Both objects fit in a rectangular box of approximately 3″ × 1″ × 1″ and have similar features and shape. The blue truck in Fig. 2(a) is taken to be the true class while the white truck in Fig. 2(b) is assigned to be the false class object.

Using a multi-view imaging system, as shown in Fig. 1, a single sensor scans the pickup plane in an 11×11 grid, and 121 elemental images of both reference objects are recorded.

The horizontal and vertical sensor pitches are *p*_{x} = 16 mm and *p*_{y} = 10 mm, respectively, and the focal plane size, i.e. sensor size, is 24×36 mm with a pixel size of *μ* = 10 μm. The imaging optics has a fixed focal length of 24 mm with *f*/# = 5.4. For each elemental image the object support **w**_{k} is extracted by thresholding. The reference objects are imaged under controlled illumination against a dark background. Note that the unknown input scenes need not be imaged in the same illumination condition and can include background noise of arbitrary pattern and brightness [see Eq. (11)]. As unknown input objects, the same two objects are presented to the imaging system in a different pose (compared to the reference objects) with an additional pine-tree foliage background. Figure 3 illustrates 16 (out of 121) views of one of the objects.

Intensity images of the unknown scene (Fig. 3) are used to generate photon-counting elemental images according to the Poisson detection model described in Section 2.1. Dark counts are also simulated and added to the photon-counts according to Eqs. (3) and (4). Figure 4 illustrates how a single view of the scene is generated from its corresponding intensity image. This process is repeated for all elemental images to create the multi-view photon-counting image set.

The photon-counting elemental images can be used to reconstruct the object space, **R**, as described in Section 2. The volumetric reconstructions of both the photon-counting imagery and the reference objects’ 3D images are generated using Eq. (2), based on which Eq. (10) is used to find the log likelihood with respect to both object hypotheses.

#### 4.2. Recognition Performance

The performance of the proposed photon-counting 3D object recognition is tested under two different scenarios. In both, background noise is present in the scene containing the unknown object. In the first scenario, dark noise is disregarded, i.e. *n*_{d} = 0, and the reconstructed photon-counting 3D image of the object only contains the photo-counts. In the second scenario, sensors are assumed to have a fixed dark count rate, as a result of which the total dark counts (*N*_{dc}) increase in proportion to the total photon counts (*N*_{ph}). In both cases, the illumination conditions are similar, thus *α̂* is set to 1.

To quantify the recognition performance in ideal conditions, let us consider the case where background noise is present but no dark counts are generated at the detector. Photon-counting images of both true and false class objects in background noise are generated 500 times through Monte Carlo simulation based on experimentally captured elemental images. At each step, the likelihood of the photon-limited 3D reconstruction is computed according to Eq. (10) with respect to both the true, *H*_{1}, and false, *H*_{2}, class reference objects.

The log likelihood ratio in Eq. (13) is then calculated, and the difference between the log likelihoods with respect to the known reference objects, i.e. log ℒ(**R**|*H*_{1}) – log ℒ(**R**|*H*_{2}), is used for classification. This quantity, along with its standard deviation, is plotted in Fig. 5(a) for various values of the total photon count, *N*_{ph}.
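The Monte Carlo procedure above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the two "objects" are synthetic one-dimensional ramps rather than the captured elemental images, both share a single support, α is fixed to 1, and the names `log_likelihood` and `monte_carlo_difference` are hypothetical.

```python
import math
import numpy as np


def log_likelihood(counts, s_ref, support, n_d):
    """Poisson log likelihood over the object support (alpha fixed to 1)."""
    r = counts[support].astype(float)
    rate = s_ref[support] + n_d
    hit = r > 0
    log_term = np.zeros_like(rate)
    with np.errstate(divide="ignore"):
        log_term[hit] = r[hit] * np.log(rate[hit])
    log_factorial = np.array([math.lgamma(c + 1.0) for c in r])
    return float(np.sum(log_term - rate - log_factorial))


def monte_carlo_difference(scene, ref_1, ref_2, support, n_ph, n_d=0.0,
                           trials=500, seed=0):
    """Mean and standard deviation of log L(R|H1) - log L(R|H2)."""
    rng = np.random.default_rng(seed)
    scene_rate = n_ph * scene / scene.sum() + n_d   # photo-counts plus dark counts
    s_1 = n_ph * ref_1 / ref_1.sum()                # normalized references
    s_2 = n_ph * ref_2 / ref_2.sum()
    diffs = np.empty(trials)
    for t in range(trials):
        counts = rng.poisson(scene_rate)
        diffs[t] = (log_likelihood(counts, s_1, support, n_d)
                    - log_likelihood(counts, s_2, support, n_d))
    return diffs.mean(), diffs.std()


# Synthetic 1D stand-ins for the two reference objects (assumed, not measured)
M = 64
support = np.zeros(M, dtype=bool)
support[16:48] = True
ref_true = np.zeros(M)
ref_true[support] = np.arange(1, 33)          # rising ramp
ref_false = np.zeros(M)
ref_false[support] = np.arange(32, 0, -1)     # falling ramp
background = np.where(support, 0.0, 5.0)      # spatially disjoint clutter

mean_diff, std_diff = monte_carlo_difference(ref_true + background, ref_true,
                                             ref_false, support, n_ph=50)
```

A positive mean difference indicates that the true-class hypothesis is favored on average; repeating the call with `ref_false + background` as the scene gives the corresponding statistic for false-class inputs.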

In the second scenario, the total number of dark counts increases with the number of photons detected from the scene. The dark count rate, *n*_{d}, is chosen such that the expected number of dark counts combined for all elemental images, *N*_{dc}, is always 27 times that of the photon-counts, i.e. *N*_{dc} = 27*N*_{ph}. This results in a constant ratio of object photons to dark counts equal to 1/27 ≈ 0.037, which is preserved in all experiments. The resulting log likelihood difference and its standard deviation are plotted in Fig. 5(b).
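The dark-count budget implied by this scenario is simple arithmetic, sketched below. The helper name `dark_count_budget` is hypothetical, and the default pixel count assumes that the full 24×36 mm sensor at a 10 μm pixel pitch (2400×3600 pixels) is used for each of the 121 elemental images; the text does not state the resolution actually processed, so the per-pixel rate should be read as an order-of-magnitude illustration.

```python
def dark_count_budget(n_ph, dark_to_photon=27.0, n_images=121,
                      pixels_per_image=2400 * 3600):
    """Return (N_dc, per-pixel dark rate n_d, object-to-dark count ratio).

    n_ph             : expected total photo-counts over all elemental images.
    dark_to_photon   : N_dc / N_ph, fixed to 27 in the second scenario.
    n_images         : number of elemental images (11 x 11 grid).
    pixels_per_image : assumed pixel count per elemental image.
    """
    n_dc = dark_to_photon * n_ph                  # expected total dark counts
    n_d = n_dc / (n_images * pixels_per_image)    # expected dark counts per pixel
    return n_dc, n_d, n_ph / n_dc
```

For example, `dark_count_budget(50)` corresponds to 1350 expected dark counts spread uniformly over all pixels, and the returned ratio is 1/27 regardless of `n_ph`.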

The Fisher ratio is used as the performance metric. Table 1 and Fig. 6 show the associated Fisher ratio for each *N*_{ph} in both scenarios.
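The Fisher ratio formula is not written out in the text. A common two-class definition, assumed here, is the squared difference of the class means divided by the sum of the class variances. The sketch below applies it to two sets of decision statistics, for example the log likelihood differences obtained for true-class and false-class scenes; the function name `fisher_ratio` is hypothetical.

```python
import numpy as np


def fisher_ratio(class_a, class_b):
    """Two-class Fisher ratio: (mean_a - mean_b)^2 / (var_a + var_b).

    Uses population variances (ddof=0); this definition is an assumption,
    since the text does not state which variant is used.
    """
    a = np.asarray(class_a, dtype=float)
    b = np.asarray(class_b, dtype=float)
    return float((a.mean() - b.mean()) ** 2 / (a.var() + b.var()))
```

A larger value means the two distributions of the decision statistic are better separated relative to their spread.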

Although sparse, the information captured from an object using a 3D photon counting imaging system provides one with means for object recognition. The likelihood ratio formulation can be used to process the photon-counting information in a binary classification problem.

As expected, more object photons result in better discrimination, i.e. a higher Fisher ratio. In our experiments, in the absence of dark counts, an acceptable Fisher ratio of 7.1 can be achieved even at 10 photons per scene, while with more than 20 photons the binary classification is virtually perfect. In the realistic case of quantum-limited imaging where dark noise is present, the required number of photons increases to about 50, assuming that the spurious dark counts are 27 times as numerous as the photon-counts, i.e. an object photons to dark counts ratio of 0.037.

## 5. Conclusion

In this paper, maximum likelihood decision theory is presented for object recognition in photon-counting imagery containing sparse, quantum-limited information about the object. Background and dark noise sources present in realistic scenes are also considered. The imaging system used for capturing both reference object and photon-counting imagery is a multi-view 3D imaging system which can capture 3D structure of the object. Experimental results were demonstrated for binary object recognition at a ratio of 0.037 between object photons and dark counts. The proposed method makes use of the fact that pixels with zero counts also convey object information when it comes to deciding between multiple object hypotheses. This method can be extended to multiple-class recognition problems.

## Acknowledgments

The authors wish to acknowledge the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) for support of this work.

## References and links

**1.** J. W. Goodman, *Statistical Optics* (Wiley-Interscience, 1985), Wiley classics ed.

**2.** J. R. Janesick, *Scientific Charge-Coupled Devices (SPIE Press Monograph Vol. PM83)* (SPIE Publications, 2001), 1st ed.

**3.** F. Dubois, “Automatic spatial frequency selection algorithm for pattern recognition by correlation,” Appl. Opt. **32**, 4365–4371 (1993).

**4.** F. Sadjadi, ed., *Selected Papers on Automatic Target Recognition* (SPIE-CDROM, 1999).

**5.** V. Page, F. Goudail, and P. Refregier, “Improved robustness of target location in nonhomogeneous backgrounds by use of the maximum-likelihood ratio test location algorithm,” Opt. Lett. **24**, 1383–1385 (1999).

**6.** A. Mahalanobis, R. R. Muise, and S. R. Stanfill, “Quadratic correlation filter design methodology for target detection and surveillance applications,” Appl. Opt. **43**, 5198–5205 (2004).

**7.** H. Kwon and N. M. Nasrabadi, “Kernel matched subspace detectors for hyperspectral target detection,” IEEE Trans. Pattern Anal. Mach. Intell. **28**, 178–194 (2006).

**8.** B. Javidi, R. Ponce-Díaz, and S.-H. Hong, “Three-dimensional recognition of occluded objects by using computational integral imaging,” Opt. Lett. **31**, 1106–1108 (2006).

**9.** O. Matoba, E. Tajahuerce, and B. Javidi, “Real-time three-dimensional object recognition with multiple perspectives imaging,” Appl. Opt. **40**, 3318–3325 (2001).

**10.** G. M. Morris, “Scene matching using photon-limited images,” J. Opt. Soc. Am. A **1**, 482–488 (1984).

**11.** E. A. Watson and G. M. Morris, “Imaging thermal objects with photon-counting detectors,” Appl. Opt. **31**, 4751–4757 (1992).

**12.** S. Yeom, B. Javidi, and E. Watson, “Three-dimensional distortion-tolerant object recognition using photon-counting integral imaging,” Opt. Express **15**, 1513–1533 (2007).

**13.** S. Yeom, B. Javidi, and E. Watson, “Photon counting passive 3D image sensing for automatic target recognition,” Opt. Express **13**, 9310–9330 (2005).

**14.** I. Moon and B. Javidi, “Three dimensional imaging and recognition using truncated photon counting model and parametric maximum likelihood estimator,” Opt. Express **17**, 15709–15715 (2009).

**15.** S. R. Narravula, M. M. Hayat, and B. Javidi, “Information theoretic approach for assessing image fidelity in photon-counting arrays,” Opt. Express **18**, 2449–2466 (2010).

**16.** S. Yeom, B. Javidi, C. wook Lee, and E. Watson, “Photon-counting passive 3D image sensing for reconstruction and recognition of partially occluded objects,” Opt. Express **15**, 16189–16195 (2007).

**17.** M. G. Lippmann, “La photographie intégrale,” Comptes-rendus de l’Académie des Sciences **146**, 446–451 (1908).

**18.** C. B. Burckhardt, “Optimum parameters and resolution limitation of integral photography,” J. Opt. Soc. Am. **58**, 71–76 (1968).

**19.** T. Okoshi, “Three-dimensional displays,” Proceedings of the IEEE **68**, 548–564 (1980).

**20.** M. C. Forman, N. Davies, and M. McCormick, “Continuous parallax in discrete pixelated integral three-dimensional displays,” J. Opt. Soc. Am. A **20**, 411–420 (2003).

**21.** A. Stern and B. Javidi, “Three-dimensional image sensing, visualization, and processing using integral imaging,” Proceedings of the IEEE **94**, 591–607 (2006).

**22.** F. Okano, J. Arai, K. Mitani, and M. Okui, “Real-time integral imaging based on extremely high resolution video system,” Proceedings of the IEEE **94**, 490–501 (2006).

**23.** B. Javidi, F. Okano, and J.-Y. Son, eds., *Three-Dimensional Imaging, Visualization, and Display (Signals and Communication Technology)* (Springer, 2008), 1st ed.

**24.** R. Martinez-Cuenca, G. Saavedra, M. Martinez-Corral, and B. Javidi, “Progress in 3-D multiperspective display by integral imaging,” Proceedings of the IEEE **97**, 1067–1077 (2009).

**25.** J.-S. Jang and B. Javidi, “Three-dimensional synthetic aperture integral imaging,” Opt. Lett. **27**, 1144–1146 (2002).

**26.** S.-H. Hong, J.-S. Jang, and B. Javidi, “Three-dimensional volumetric object reconstruction using computational integral imaging,” Opt. Express **12**, 483–491 (2004).

**27.** B. Javidi, P. Refregier, and P. Willett, “Optimum receiver design for pattern recognition with nonoverlapping target and scene noise,” Opt. Lett. **18**, 1660 (1993).

**28.** E. A. Richards, “Limitations in optical imaging devices at low light levels,” Appl. Opt. **8**, 1999–2005 (1969).

**29.** P. Réfrégier, *Noise Theory and Application to Physics* (Springer, 2004), 1st ed.

**30.** A. Papoulis and S. Pillai, *Probability, Random Variables and Stochastic Processes* (McGraw Hill Higher Education, 2002), 4th ed.

**31.** R. J. Schalkoff, *Pattern Recognition: Statistical, Structural and Neural Approaches* (Wiley, 1991), 1st ed.