
Three-dimensional distortion-tolerant object recognition using photon-counting integral imaging


Abstract

This paper addresses three-dimensional distortion-tolerant object recognition using photon-counting integral imaging (II). A photon-counting linear discriminant analysis (LDA) is proposed for the classification of photon-limited images. In the photon-counting LDA, classical irradiance images are used to train the classifier, while the unknown objects used to test the classifier are represented by the numbers of photons detected. The optimal solution of the Fisher’s LDA for photon-limited images is found to differ from the case in which irradiance values are used. This difference yields one of the merits of the photon-counting LDA, namely that the high dimensionality of the image can be handled without preprocessing. Thus, the singularity problem of the Fisher’s LDA encountered with irradiance images can be avoided. Using photon-counting II, we build a compact distortion-tolerant recognition system that exploits the multiple-perspective imaging of II to enhance the recognition performance. Experimental and simulation results are presented for classifying out-of-plane rotated objects. The performance is analyzed in terms of the mean-squared distance (MSD) between the irradiance images. It is shown that a low level of photons is sufficient for the proposed technique.

©2007 Optical Society of America

1. Introduction

Automatic object recognition has long been a subject of interest in military and industrial applications [1–16]. The aim of an automatic target recognition (ATR) system is to detect and identify objects in input scenes and label them as one of the hypothesized classes. The discrimination capability of the recognition system is often challenged by arbitrary distortion of objects or by external noise in the sensing environment. Various imaging and signal processing techniques have been developed to improve recognition performance. Numerous techniques address two-dimensional (2D) object recognition, while recently there has been growing interest in three-dimensional (3D) imaging and recognition systems. For example, 3D complex images reconstructed at arbitrary depths and perspectives can be utilized by means of digital holography [4,6].

Integral imaging (II) is a 3D sensing and visualization technique [17–20]. The directional information of rays is recorded using a micro-lenslet array; each elemental image has its own perspective of the objects according to the corresponding micro-lenslet. 3D scenes can be reconstructed optically or numerically by reversing the recording process [17,18]. The applications of the II system have been extended to object recognition and depth estimation [12–16].

There are various applications of photon-counting imaging such as night vision, laser radar imaging, radiological imaging, stellar imaging, and medical imaging [21–35]. Photon-counting imaging systems in general require less power than conventional imaging systems that generate irradiance images. Image recognition with photon counting has been proposed in [23,24], and recognition techniques with photon counting have been applied to infrared and thermal imaging [25,26]. The classification of photon-limited images has been researched in [27,28], and 3D active sensing by LADAR in [29–32]. In [33], a maximum likelihood estimator was constructed to match photon-limited scenes for astronomical image reconstruction. Estimation of irradiance images has been researched in [34]. In [35], the Bhattacharyya distance is employed as an efficient figure of merit for target detection in low-flux coherent active imagery. Recently, nonlinear matched filtering using photon-counting II has been developed in [16].

In this paper, we address a classification scheme for distorted objects by means of photon-counting II. A photon-counting linear discriminant analysis (LDA) is proposed to classify distorted objects. In the photon-counting LDA, the irradiance images of the objects are used to train the classifier, and pattern recognition is performed using the photon counts detected from unknown objects. The Fisher’s LDA has appeared in numerous studies on the classification of irradiance images [10,11,15]. The Fisher’s LDA maximizes the ratio of the between-class scatter to the within-class scatter [7–9]. The photon-counting LDA maximizes the Fisher’s criterion as applied to photon-limited images. It is shown that the Fisher’s criterion is solved in a slightly different way in the case of photon-limited images. In contrast to the Fisher’s LDA, the photon-counting LDA does not suffer from the singularity problem, which is inevitable when the number of data available for training is small. The photon-counting LDA determines linear hyperplanes during training on the feature space. The feature space is composed of the number of photon-counts per pixel; thus, the dimension of the feature space is the same as the number of pixels in the image. It will be shown that an advantage of the photon-counting LDA is that it can handle such a high-dimensional feature space without any dimension-reduction process. Although nonlinear dimension reduction and nonlinear kernel methods have been investigated in the literature [36,37], the photon-counting LDA in this paper is shown to be robust using only linear boundaries in the high-dimensional feature space. One may note the difficulty of statistical learning in high-dimensional spaces known as the curse of dimensionality (the required number of training data sets increases exponentially with the dimension). However, this issue arises in non-parametric learning systems, not in the photon-counting LDA, which is a parametric learning system [7,8].

The Euclidean distance between the unknown input vector and the trained class-conditional mean vectors is adopted for decision making. In the experiments, 3D distortion is simulated by out-of-plane rotation as illustrated in Fig. 1, and photon-limited images are simulated from experimentally captured irradiance images. The discrimination capability is evaluated by the correct and false classification rates as functions of the number of photons and of the average irradiance. In addition, we consider an approximated within-class covariance estimator that is computationally more efficient. The performance is analyzed in terms of the mean-squared distance (MSD) between the irradiance images of the distorted objects. We may anticipate that the smaller the distance between images of the same class and the larger the distance between images of different classes, the better the performance of the classifier. The experimental and simulation results show that the photon-counting LDA can classify the distorted objects with a low level of photons.

The organization of the paper is as follows. In Section 2, we briefly review the advantages of II. The photon-counting model is described in Section 3. The photon-counting LDA with the overview of the Fisher’s LDA is presented in Section 4. The decision rule and performance analysis are discussed in Section 5. Experimental and simulation results are presented in Section 6. Conclusions follow in Section 7.

Fig. 1. A schematic diagram of the photon-counting II system for the distortion-tolerant object recognition.

2. Overview of integral imaging

Integral imaging (II) or integral photography (IP) is a traditional 3D sensing and display technique [17–20]. A micro-lenslet or pinhole array is placed between the 3D objects and an imaging sensor to sense the directional irradiance information of rays from the objects, as illustrated in Fig. 2(a). 3D display or reconstruction is the reverse of the recording process: the sensed image is illuminated from behind to form a 3D real image [see Fig. 2(b)]. The reconstruction can be performed optically or numerically. Figure 3 shows the multiple-perspective imaging in ray optics. Each micro-lens in the lenslet array generates an elemental image on an imaging sensor placed on the image plane of the micro-lenses. An elemental image is a 2D image with its own perspective and field of view. Therefore, consecutive perspective scenes (a set of elemental images) are obtained in a single shot.

Fig. 2. (a) II sensing process, (b) II display process.

Fig. 3. Multiple perspective imaging in ray optics: each elemental image records the point A at pixels located differently with respect to its own optical axis (O1–O3) by rays 2, 3, and 4; however, the same directional (parallel) information (rays 1, 3, and 5) is recorded at pixels of the same relative location.

The scope of II has been extended to object recognition and depth estimation [12–16]. One advantage of II for 3D image recognition lies in the compactness of its multiple-perspective imaging. Multi-view scenes of 3D objects are recorded in a single shot without using multiple sensors or changing the position of the sensor. Therefore, a compact system for capturing 3D information can be built. In this paper, we assume that photon-limited scenes are generated by a photon-counting detector. Photon-limited scenes with multiple perspectives are recorded according to the corresponding lenslets as shown in Fig. 1.

3. Photon-counting detection model

The probability of counting y photons in a time interval τ can be shown to be Poisson distributed [21,22]. Strictly, such a distribution assumes that the irradiance at the detector is perfectly uniform in time and space, so the statistics of the irradiance fluctuations should be considered if the irradiance is not uniform. However, for many cases of interest (e.g., blackbody radiation in the visible, including blackbody radiation from sources as hot as the sun), the fluctuations in irradiance are small compared with the fluctuations produced by the quantized nature of the radiation [22]. Therefore, we can assume that the photon-counting probability follows the Poisson distribution:

$$P_d(y) = \frac{(a\tau)^y e^{-a\tau}}{y!}, \qquad y = 0, 1, 2, \ldots, \tag{1}$$

where y is the number of photon-counts produced by a detector during a time interval τ, and a is the rate parameter, which is given by

$$a = \frac{\eta P_o}{h\bar{\nu}}, \tag{2}$$

where η is the quantum efficiency of the detection process; Po is the optical power incident on the detector; h is Planck’s constant; and ν̄ is the mean frequency of the quasi-monochromatic light source. The mean number of photon-counts np is given by [38]:

$$n_p = E_y(y) = a\tau = \frac{\eta P_o \tau}{h\bar{\nu}}, \tag{3}$$

where E(·) denotes the expectation operator.

Without loss of generality, we can assume that the Poisson parameter for the photon-counts at each pixel is proportional to the irradiance of that pixel in the detector [16]. Therefore, the probability of a photon event at pixel i is given by

$$P_d\bigl(y_i; n_p(i)\bigr) = \frac{n_p(i)^{\,y_i}\, e^{-n_p(i)}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots, \tag{4}$$

where yi is the number of photons detected at pixel i. The parameter np(i) is given by

$$n_p(i) = N_p x_i, \tag{5}$$

where Np is the expected number of photon-counts in the scene; xi is the normalized irradiance at pixel i such that $\sum_{i=1}^{N_T} x_i = 1$; NT is the total number of pixels in the scene, that is, NT = Nx×Ny; and Nx and Ny are the sizes of the image in the x and y directions, respectively.

We can define a random vector y whose components are the photon-counts at all pixels:

$$y = [\,y_1 \ \cdots \ y_{N_T}\,]^t, \tag{6}$$

where each component of y follows an independent Poisson distribution as in Eq. (4), and the superscript t denotes transpose. The conditional mean vector and conditional covariance matrix of y given x are, respectively,

$$\mu_{y|x} = E_{y|x}(y \mid x) = N_p x = N_p [\,x_1 \ \cdots \ x_{N_T}\,]^t, \tag{7}$$

and

$$\Sigma_{yy|x} = E_{y|x}\bigl[(y - \mu_{y|x})(y - \mu_{y|x})^t \mid x\bigr] = \mathrm{diag}(\mu_{y|x}) = N_p\,\mathrm{diag}(x), \tag{8}$$

where diag(·) denotes the diagonal matrix operator.
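As a concrete illustration, the following minimal Python sketch (not from the paper; the function name and the synthetic test image are assumptions for illustration) simulates a photon-limited image from an irradiance image according to Eqs. (4) and (5).

```python
# A minimal sketch, assuming NumPy, of the photon-counting model in
# Eqs. (4)-(5): each pixel count y_i is an independent Poisson draw with
# rate n_p(i) = N_p * x_i, where x is the normalized irradiance image.
import numpy as np

def simulate_photon_counts(irradiance, n_p, rng=None):
    """Return a photon-limited image y simulated from an irradiance image.

    irradiance : array of non-negative irradiance values (any shape).
    n_p        : expected total number of photon-counts N_p in the scene.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = irradiance / irradiance.sum()   # normalization of Eq. (5): sum_i x_i = 1
    return rng.poisson(n_p * x)         # independent Poisson counts, Eq. (4)

# Usage on a synthetic 60x125 irradiance patch with N_p = 10:
rng = np.random.default_rng(0)
y = simulate_photon_counts(rng.random((60, 125)), n_p=10, rng=rng)
print(y.sum())  # fluctuates around N_p = 10 from realization to realization
```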

4. Classification of photon-limited images

In this section, the Fisher’s LDA is first overviewed; then, the new approach of the photon-counting LDA is discussed. Lastly, the parameter estimation is presented.

4.1 Overview of Fisher’s LDA

In conventional image classification, we often deal with irradiance images. Let a column vector composed of the irradiance values of the pixels be one realization of a random vector $x \in R^{d \times 1}$, where $R^{d \times 1}$ is the d-dimensional Euclidean space and d equals the number of pixels (NT). The Fisher’s LDA maximizes the ratio of the determinant of the between-class covariance matrix to the determinant of the within-class covariance matrix. The within-class covariance is a measure of the concentration of each class and is defined as

$$\Sigma_{xx}^W = E_{w_j}\bigl\{E_{x|w_j}\bigl[(x - \mu_{x|w_j})(x - \mu_{x|w_j})^t \mid w_j\bigr]\bigr\}, \tag{9}$$

where $w_j$ represents the event that the random vector x is a member of class j, and $\mu_{x|w_j}$ is the class-conditional mean vector of x:

$$\mu_{x|w_j} = E_{x|w_j}(x \mid w_j). \tag{10}$$

The between-class covariance is a measure of the separation of the classes. The between-class covariance matrix is defined as

$$\Sigma_{xx}^B = E_{w_j}\bigl[(\mu_{x|w_j} - \mu_x)(\mu_{x|w_j} - \mu_x)^t\bigr], \tag{11}$$

where $\mu_x$ is the mean vector of x:

$$\mu_x = E_x(x). \tag{12}$$

It is noted that $\Sigma_{xx} = \Sigma_{xx}^W + \Sigma_{xx}^B$, where $\Sigma_{xx}$ is the covariance matrix of x:

$$\Sigma_{xx} = E_x\bigl[(x - \mu_x)(x - \mu_x)^t\bigr]. \tag{13}$$

The Fisher’s LDA can be implemented as

$$z = W_F^t x, \tag{14}$$

where the linear projector $W_F \in R^{d \times r}$ satisfies the following Fisher’s criterion:

$$W_F = \arg\max_{W \in R^{d \times r}} \frac{\bigl|W^t \Sigma_{xx}^B W\bigr|}{\bigl|W^t \Sigma_{xx}^W W\bigr|}. \tag{15}$$

It is well known that Eq. (15) is equivalent to a generalized eigenvalue problem. The column vectors of $W_F$ are the eigenvectors of $(\Sigma_{xx}^W)^{-1}\Sigma_{xx}^B$ corresponding to its non-zero eigenvalues. It is noted that $\mathrm{rank}(\Sigma_{xx}^B) \le \min(n_c - 1, d)$; therefore, $W_F$ is composed of at most $n_c - 1$ orthogonal vectors when d is larger than $n_c - 1$. The maximum value of Eq. (15), $|W_F^t \Sigma_{xx}^B W_F| / |W_F^t \Sigma_{xx}^W W_F|$, is equal to the sum of the non-zero eigenvalues of $(\Sigma_{xx}^W)^{-1}\Sigma_{xx}^B$.
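As a minimal numerical sketch of this solution (assuming NumPy; the function name is illustrative, not from the paper), the projector can be computed from estimated scatter matrices as follows.

```python
# A minimal sketch of solving Eq. (15) as an eigenvalue problem: the columns
# of W_F are eigenvectors of (Sigma_xx^W)^{-1} Sigma_xx^B associated with its
# largest eigenvalues; at most n_c - 1 of these eigenvalues are non-zero.
import numpy as np

def fisher_lda(sigma_w, sigma_b, r):
    """Return a d x r projector maximizing Eq. (15); sigma_w must be nonsingular."""
    # Solve sigma_w^{-1} sigma_b v = lambda v without forming the inverse explicitly.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(sigma_w, sigma_b))
    order = np.argsort(eigvals.real)[::-1]   # sort eigenvalues in descending order
    return np.real(eigvecs[:, order[:r]])    # keep the top-r eigenvectors
```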

4.2 Photon-counting LDA

For photon-limited images, a random vector y is composed of the numbers of photons detected. Let one realization of the random vector y be the photon events at all pixels as in Eq. (6). Each component of y follows an independent Poisson distribution. We find the following relationships between the first and second moments of the irradiance random vector x and the photon event random vector y [see Appendix A]:

$$\mu_y = N_p \mu_x, \tag{16}$$

and

$$\Sigma_{yy} = N_p\,\mathrm{diag}(\mu_x) + N_p^2\,\Sigma_{xx}. \tag{17}$$

The within-class and between-class covariance matrices of y are derived, respectively, as [see Appendix A]:

$$\Sigma_{yy}^W = N_p\,\mathrm{diag}(\mu_x) + N_p^2\,\Sigma_{xx}^W, \tag{18}$$

and

$$\Sigma_{yy}^B = N_p^2\,\Sigma_{xx}^B. \tag{19}$$

As in the Fisher’s LDA, the photon-counting linear discriminant function can be defined as

$$z = W_p^t y, \tag{20}$$

where $W_p$ maximizes the following criterion:

$$W_p = \arg\max_{W \in R^{d \times r}} \frac{\bigl|W^t \Sigma_{yy}^B W\bigr|}{\bigl|W^t \Sigma_{yy}^W W\bigr|} = \arg\max_{W \in R^{d \times r}} \frac{\bigl|W^t \Sigma_{xx}^B W\bigr|}{\Bigl|W^t \bigl[\mathrm{diag}(\mu_x)/N_p + \Sigma_{xx}^W\bigr] W\Bigr|}. \tag{21}$$

It is noted that Eq. (21) is solved in the same way as Eq. (15). The column vectors of $W_p$ are the eigenvectors of $(\Sigma_{yy}^W)^{-1}\Sigma_{yy}^B$ corresponding to its non-zero eigenvalues. The rank of $\Sigma_{yy}^B$ is the same as that of $\Sigma_{xx}^B$. The maximum value of Eq. (21), $|W_p^t \Sigma_{yy}^B W_p| / |W_p^t \Sigma_{yy}^W W_p|$, is equal to the sum of the non-zero eigenvalues of $[\mathrm{diag}(\mu_x)/N_p + \Sigma_{xx}^W]^{-1}\Sigma_{xx}^B$. Also note that as $N_p$ becomes large, Eq. (21) approaches Eq. (15).
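Since Eq. (21) is the same eigenvalue problem with the within-class matrix replaced by $\mathrm{diag}(\mu_x)/N_p + \Sigma_{xx}^W$, a sketch that reuses the illustrative `fisher_lda` above is immediate.

```python
# A minimal sketch of Eq. (21): identical to Fisher's LDA except that the
# within-class term gains diag(mu_x)/N_p, which keeps it nonsingular as long
# as every pixel has a positive mean irradiance.
import numpy as np

def photon_counting_lda(mu_x, sigma_xx_w, sigma_xx_b, n_p, r):
    """Return the d x r projector W_p of Eq. (21) from irradiance statistics."""
    sigma_w_eff = np.diag(mu_x) / n_p + sigma_xx_w  # effective within-class matrix
    return fisher_lda(sigma_w_eff, sigma_xx_b, r)   # same eigenproblem as Eq. (15)
```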

4.3 Parameter estimation

When no probability information about x is available (true for many practical classification problems), the needed parameters must be estimated in a proper way. In the following, the conventional estimators of the within-class and between-class covariance matrices for the Fisher’s LDA are described, and then they are applied to the photon-counting LDA.

The within-class covariance matrix can be estimated by $\hat{\Sigma}_{xx}^W$, which is derived in Appendix B:

$$\hat{\Sigma}_{xx}^W = \frac{1}{n_t}\sum_{j=1}^{n_c}\sum_{n=1}^{n_j}\bigl(x_j(n) - \hat{\mu}_{x|w_j}\bigr)\bigl(x_j(n) - \hat{\mu}_{x|w_j}\bigr)^t, \tag{22}$$

where $x_j(n)$ is the n-th realization of the object that belongs to class j; $n_c$ is the number of classes; $n_j$ is the number of realizations in class j; $n_t$ is the total number of realizations used to train the classifier, i.e., $n_t = \sum_{j=1}^{n_c} n_j$; and $\hat{\mu}_{x|w_j}$ is the class-conditional sample mean vector for class j:

$$\hat{\mu}_{x|w_j} = \frac{1}{n_j}\sum_{n=1}^{n_j} x_j(n). \tag{23}$$

The between-class covariance matrix can be estimated by $\hat{\Sigma}_{xx}^B$ [see Appendix B]:

$$\hat{\Sigma}_{xx}^B = \frac{1}{n_t}\sum_{j=1}^{n_c} n_j\bigl(\hat{\mu}_{x|w_j} - \hat{\mu}_x\bigr)\bigl(\hat{\mu}_{x|w_j} - \hat{\mu}_x\bigr)^t, \tag{24}$$

where $\hat{\mu}_x$ is the sample mean vector of x:

$$\hat{\mu}_x = \frac{1}{n_t}\sum_{j=1}^{n_c}\sum_{n=1}^{n_j} x_j(n). \tag{25}$$

Similarly, the between-class and within-class covariance matrices for the photon-counting LDA can be estimated by [see Appendix B]:

$$\hat{\Sigma}_{yy}^B = N_p^2\,\hat{\Sigma}_{xx}^B, \tag{26}$$

and

$$\hat{\Sigma}_{yy}^W = N_p\,\mathrm{diag}(\hat{\mu}_x) + N_p^2\,\hat{\Sigma}_{xx}^W. \tag{27}$$

For the Fisher’s LDA of irradiance images, the number of images used in the training process, $n_t$, is usually smaller than the dimension d of the image. Therefore, $\hat{\Sigma}_{xx}^W$ is a singular matrix, since $\mathrm{rank}(\hat{\Sigma}_{xx}^W) \le \min(n_t - n_c, d)$. To overcome the singularity of $\hat{\Sigma}_{xx}^W$, dimension-reduction techniques such as principal component analysis (PCA) can be applied prior to the Fisher’s LDA [10,11,15], while direct implementation methods have been developed in [39]. However, as shown in Eq. (27), $\hat{\Sigma}_{yy}^W$ is nonsingular whenever the components of $\hat{\mu}_x$ are non-zero; therefore, no additional process is required to avoid the singularity problem.
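A sketch of the estimators in Eqs. (22)–(25), assuming the training vectors are supplied as one array per class (names and data layout are illustrative):

```python
# A minimal sketch of the sample estimators in Eqs. (22)-(25); `classes` is
# a list of (n_j, d) arrays, one array of training irradiance vectors per class.
import numpy as np

def estimate_scatter(classes):
    """Return (mu_hat, class_means, sigma_w_hat, sigma_b_hat)."""
    n_t = sum(len(c) for c in classes)
    d = classes[0].shape[1]
    class_means = [c.mean(axis=0) for c in classes]        # Eq. (23)
    mu_hat = sum(c.sum(axis=0) for c in classes) / n_t     # Eq. (25)
    sigma_w = np.zeros((d, d))
    sigma_b = np.zeros((d, d))
    for c, m in zip(classes, class_means):
        centered = c - m
        sigma_w += centered.T @ centered                   # inner sums of Eq. (22)
        diff = (m - mu_hat)[:, None]
        sigma_b += len(c) * (diff @ diff.T)                # one term of Eq. (24)
    return mu_hat, class_means, sigma_w / n_t, sigma_b / n_t
```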

It is also noted that the sample covariance $\hat{\Sigma}_{xx|w_j}$ in Appendix B may not be a proper estimator when $n_j < d$, since $\hat{\Sigma}_{xx|w_j}$ is then not a positive definite matrix, which the covariance matrix of any nonsingular random vector x must be. In the literature, there have been efforts to estimate the covariance matrix with a small sample (available training data) size and a large scale (dimension) [40]. For the photon-counting LDA, the within-class covariance estimator $\hat{\Sigma}_{yy}^W$ in Eq. (27) is dominated by its diagonal components, since the covariance between the irradiance values of two different components (pixels) is anticipated to be much smaller than the mean values and closer to zero the further apart they are. Therefore, one may postulate that the following approximation holds:

$$\hat{\Sigma}_{yy}^W \approx N_p\,\mathrm{diag}(\hat{\mu}_x) + N_p^2\,\mathrm{diag}\bigl([\,\hat{\sigma}_{W,1}^2 \ \cdots \ \hat{\sigma}_{W,d}^2\,]^t\bigr), \tag{28}$$

and

$$\hat{\sigma}_{W,i}^2 = \frac{1}{n_t}\sum_{j=1}^{n_c}\sum_{n=1}^{n_j}\bigl[x_{j,i}(n) - \hat{\mu}_{x|w_j,i}\bigr]^2, \tag{29}$$

where $x_{j,i}(n)$ and $\hat{\mu}_{x|w_j,i}$ are the i-th components of the n-th training vector $x_j(n)$ and of the class-conditional sample mean vector $\hat{\mu}_{x|w_j}$, respectively; that is, $\hat{\Sigma}_{yy}^W$ in Eq. (28) is approximated by retaining only the diagonal components of $\hat{\Sigma}_{yy}^W$ in Eq. (27). The computational complexity of inverting Eq. (28) is significantly lower than that of inverting Eq. (27) when d is large. In the experiments described in Section 6, we compare the performance of the photon-counting LDA using Eq. (28) with that using Eq. (27).
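A sketch of the diagonal approximation (again with illustrative names), which makes the inversion an O(d) element-wise operation:

```python
# A minimal sketch of Eqs. (28)-(29): only the per-pixel within-class
# variances are kept, so the approximated matrix is diagonal and "inverting"
# it is an O(d) element-wise division instead of a full O(d^3) inversion.
import numpy as np

def approx_within_class_diag(classes, mu_hat, n_p):
    """Return the d diagonal entries of the approximated Sigma_yy^W of Eq. (28)."""
    n_t = sum(len(c) for c in classes)
    var_w = sum(((c - c.mean(axis=0)) ** 2).sum(axis=0)
                for c in classes) / n_t          # per-pixel variances, Eq. (29)
    return n_p * mu_hat + n_p ** 2 * var_w       # diagonal of Eq. (28)
```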

5. Decision rule and performance evaluation

In this section, the decision rule and the measures used for performance evaluation and analysis are discussed. Our decision rule adopts the Euclidean distance between the unknown input vector and the trained class-conditional mean vectors. In the experiments, the photon-limited integral image (a set of elemental images from one lenslet array) is obtained at different object orientations. The random vectors x and y are, respectively, the irradiance and photon event vectors corresponding to one elemental image. Therefore, multiple-perspective training data for a specific orientation can be gathered in a single exposure. During the test, multiple photon event vectors are used to take advantage of the multiple-perspective imaging; thus, the test vector for an unknown input scene is

$$z_{\mathrm{test}} = W_p^t \sum_{n=1}^{n_{\mathrm{test}}} y_{\mathrm{test}}(n), \tag{30}$$

where $y_{\mathrm{test}}(n)$ is the photon event vector corresponding to the n-th photon-limited elemental image and $n_{\mathrm{test}}$ is the number of elemental images tested. The images used for testing are gathered from unknown rotations of the object.

We classify a vector $z_{\mathrm{test}}$ as a member of class $\hat{j}$ if

$$\hat{j} = \arg\min_{j = 1, \ldots, n_c} \bigl\| z_{\mathrm{test}} - \hat{\mu}_{z|w_j} \bigr\|, \tag{31}$$

where ∥·∥ stands for the Euclidean norm, and $\hat{\mu}_{z|w_j}$ is the estimate of the class-conditional mean vector. Assuming that the distribution of $y_{\mathrm{test}}$ is the same as that of the images y used for training, we can show that

$$\hat{\mu}_{z|w_j} = n_{\mathrm{test}}\, W_p^t \hat{\mu}_{y|w_j} = n_{\mathrm{test}} N_p W_p^t \hat{\mu}_{x|w_j}. \tag{32}$$
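A minimal sketch of this decision rule (assuming NumPy and the illustrative helpers above):

```python
# A minimal sketch of Eqs. (30)-(32): sum the photon event vectors of the
# tested elemental images, project with W_p, and select the class whose
# projected mean is nearest in Euclidean distance.
import numpy as np

def classify(w_p, y_test, class_means_x, n_p):
    """w_p: (d, r) projector; y_test: (n_test, d) photon event vectors;
    class_means_x: class-conditional sample mean irradiance vectors, Eq. (23)."""
    n_test = y_test.shape[0]
    z_test = w_p.T @ y_test.sum(axis=0)                           # Eq. (30)
    dists = [np.linalg.norm(z_test - n_test * n_p * (w_p.T @ m))  # Eqs. (31)-(32)
             for m in class_means_x]
    return int(np.argmin(dists))                                  # the index j-hat
```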

To evaluate the performance, two measures are calculated: the correct classification rate and the false classification rate, defined, respectively, as

$$r_c(j) = \frac{\text{number of decisions for class } j}{\text{number of test images in class } j}, \tag{33}$$

and

$$r_f(j) = \frac{\text{number of decisions for class } j \text{ from images not in class } j}{\text{number of test images in all classes except class } j}. \tag{34}$$

The performance is analyzed in terms of the MSD between the elemental images used for training and those used for testing. The MSD between two random vectors is defined as

$$\mathrm{MSD}(j; s, i_s) = E\Bigl[\bigl\| x_j - x_{\mathrm{test}}(s, i_s) \bigr\|^2\Bigr], \tag{35}$$

and estimated as

$$\widehat{\mathrm{MSD}}(j; s, i_s) = \frac{1}{n_j n_{\mathrm{test}}}\sum_{n=1}^{n_j}\sum_{m=1}^{n_{\mathrm{test}}}\bigl\| x_j(n) - x_{\mathrm{test}}(m; s, i_s) \bigr\|^2, \tag{36}$$

where $x_j$ is the irradiance random vector for training representing class j; $x_j(n)$ is the n-th realization of $x_j$; $x_{\mathrm{test}}(s, i_s)$ is the irradiance random vector for testing labeled as class s and rotation angle $i_s$; and $x_{\mathrm{test}}(m; s, i_s)$ is the m-th realization of $x_{\mathrm{test}}(s, i_s)$. In the experiments, the rotation angle for training is fixed. It is noted that the irradiance vectors in Eq. (36) are the normalized elemental images used for training and testing, owing to the normalization condition on the irradiance image in Eq. (5).
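For reference, a minimal sketch of the MSD estimate in Eq. (36) (illustrative names; the inputs are the normalized training and test elemental images):

```python
# A minimal sketch of Eq. (36): the squared Euclidean distance averaged over
# all pairs of training and test elemental images.
import numpy as np

def msd_hat(x_train, x_test):
    """x_train: (n_j, d) and x_test: (n_test, d) normalized irradiance vectors."""
    diffs = x_train[:, None, :] - x_test[None, :, :]  # all n_j * n_test pairs
    return float((diffs ** 2).sum(axis=2).mean())     # averaged squared distance
```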

6. Experimental and simulation results

In this section, we present the experimental and simulation results. The experiments and simulations consist of (1) capturing the irradiance of the integral images of objects with varying orientations, (2) aligning the elemental images and training the classifier with the photon-counting LDA, and (3) simulating photon-counts and testing them with the minimum-distance decision rule of Eq. (31). The validity of Eq. (28) and the effects of average irradiance variation are also examined experimentally. The performance is analyzed in terms of the MSD in Eq. (36).

6.1 Integral imaging acquisition and preprocessing

The optical set-up is composed of a micro-lenslet array, an imaging lens, and a CCD camera. The focal length of each micro-lenslet is about 3 mm, the focal length of the imaging lens is 50 mm, and the f-number of the imaging lens is 2.5. The imaging lens is placed between the lenslet array and the CCD camera because of the short focal length of the lenslets. Three classes of toy cars are used in the experiments, as shown in Fig. 4. The size of the three toy cars is about 2.5 cm×2.5 cm×4.5 cm. The distance between the CCD camera and the imaging lens is about 7.2 cm, and the distance between the micro-lenslet array and the imaging lens is about 2.9 cm. Integral images of the toy cars are gathered at rotation angles of 30°, 33°, 36°, 39°, 42°, and 45°; rotation is about the axis perpendicular to the optical axis of the micro-lenslet array. Thus, six integral images are obtained for each toy car, one at each of the six out-of-plane rotation angles. The captured irradiance images are the same as those in [15], except that the 30 (5 by 6) elemental images located in the center are used in this paper.

Before simulating photon-limited images, each elemental image is aligned and cropped. The alignment is a pre-process to enhance the system performance; through it, we aim to minimize the MSD within the same class. All elemental images are shifted so as to maximize the cross-correlation coefficients, as in [15]. The reference elemental image for the alignment is the central elemental image of the integral image of the object rotated at 36° for each class. After the alignment, each elemental image is cropped to 60×125 pixels, considering the computational load and the accuracy of computing the linear discriminant function. Therefore, the dimension d of the vectors x and y is 7500 (=60×125). The sizes of the integral image in the row and column directions are 300 (=60×5) and 750 (=125×6), respectively. Figure 5 shows movies of the six integral images (frames) for each toy car under out-of-plane rotation. In a practical situation, photon-limited elemental images can be aligned and cropped by a pre-process such as nonlinear correlation filtering of photon events [16].

Fig. 4. Three toy cars used in the experiments; they represent classes 1, 2, and 3 from right to left.

Fig. 5. Movies of integral images of three objects with the out-of-plane rotation: (a) class 1 (movie file size: 3.64 MB) [Media 1], (b) class 2 (movie file size: 3.64 MB) [Media 2], (c) class 3 (movie file size: 3.64 MB) [Media 3].

6.2 Classification results and performance analysis

Our goal is to categorize each integral image into the correct class of toy car. For training, only one integral image is used for each class (object): the one associated with a rotation angle of 36°, corresponding to the third frame of each movie in Fig. 5. The other five integral images of each class are used only for testing. The integral image used for training in each class is composed of 30 elemental images; thus, the number of training vectors per class, nj, is 30. The number of classes, nc, is 3, so the total number of vectors, nt, used in training is 90 (=30×3).

For the test, all 18 integral images are used, including the three used in training. Each integral image is considered an unknown input scene. 1000 photon-counting scenes are generated for each integral image, and the correct and false classification rates in Eqs. (33) and (34) are obtained from these 1000 realizations. The photon counts are simulated by the Poisson random number generator in MATLAB with Np = 3 for each normalized elemental image. The number of test elemental images, ntest in Eqs. (30) and (36), is 30, since each integral image is composed of 30 elemental images. Therefore, the mean number of photon-counts (Np×ntest) in the entire scene (integral image) is 90 (=3×30). The averaged classification results are illustrated in Fig. 6.
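For orientation, the following is a hedged end-to-end sketch of one such Monte Carlo test, reusing the illustrative helpers above; the arrays `train_classes` (three arrays of 30 normalized 7500-pixel elemental images) and `test_scene` (the 30 normalized elemental images of one test integral image, here assumed to belong to class 0) are assumed inputs, not data from the paper.

```python
# A hedged sketch of the Monte Carlo test in this subsection; `train_classes`
# and `test_scene` are assumed to be prepared as in Subsection 6.1 (aligned,
# cropped, normalized elemental images). For d = 7500 the eigen-decomposition
# is heavy but feasible; this is illustrative, not an optimized implementation.
import numpy as np

rng = np.random.default_rng(0)
n_p = 3   # expected number of photon-counts per normalized elemental image

mu_hat, class_means, sig_w, sig_b = estimate_scatter(train_classes)
w_p = photon_counting_lda(mu_hat, sig_w, sig_b, n_p, r=2)  # n_c - 1 = 2 features

runs, hits = 1000, 0
for _ in range(runs):
    y = rng.poisson(n_p * test_scene)                # (30, 7500) photon event vectors
    hits += classify(w_p, y, class_means, n_p) == 0  # true class of test_scene is 0
print(hits / runs)   # estimate of the correct classification rate, Eq. (33)
```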

The training and test are repeated with Np = 5 and 10, for which the corresponding mean numbers of photon-counts are 150 and 300, respectively. The averaged classification results are presented in Figs. 7 and 8, respectively. As illustrated in Figs. 6 to 8, a low level of photons suffices to classify the distorted objects. The averaged correct classification rates increase, and the averaged false classification rates decrease, when a larger number of photons is used. Figure 9 shows an example of a test input scene (photon-limited integral image) with Np = 10. It is generated from the integral image of class 1 at the rotation angle of 30°, which corresponds to the first frame of the movie in Fig. 5(a). The actual number of photons in Fig. 9 is 289.

Fig. 6. Classification results when the mean photon number in the test scene is 90. (a) averaged correct classification rate for each class over 1000 runs, (b) averaged false classification rate for each class over 1000 runs. ‘Avg’ denotes the average value of 6 rotation angles.

Fig. 7. Classification results when the mean photon number in the test scene is 150. (a) averaged correct classification rate for each class over 1000 runs, (b) averaged false classification rate for each class over 1000 runs. ‘Avg’ denotes the average value of 6 rotation angles.

Fig. 8. Classification results when the mean photon number in the test scene is 300. (a) averaged correct classification rate for each class over 1000 runs, (b) averaged false classification rate for each class over 1000 runs. ‘Avg’ denotes the average value of 6 rotation angles.

Fig. 9. An example of the test input scene corresponding to class 1 with the rotation angle 30°; the mean photon number in the test scene is 300, and the actual number of photon-counts in this scene is 289.

The performance is analyzed in terms of the MSD in Eq. (36). The different classification results between the classes and the rotation angles may be correlated with the MSD between the irradiance images. Figure 10 shows the MSD between the elemental images used for training and those used for testing; the MSD is computed for images within the same class and between different classes. The overall performance for class 3 is the best, since the MSDs between class 3 and the other classes are larger than the other MSDs, as illustrated in Figs. 10(a) and 10(b). For the correct classification rates, the performance for class 1 is worse than for the others, since the MSDs between class 1 and the other classes are smaller, as illustrated in Figs. 10(b) and 10(c).

Fig. 10. MSD in Eq. (36) when (a) j = 1, (b) j = 2, (c) j = 3.

Figure 11 shows the averaged classification results when the approximated within-class covariance estimator in Eq. (28) is used. The mean number of photon-counts in the test scene is 150 (Np = 5), the same as in Fig. 7. Figure 11 shows performance very similar to that of Fig. 7.

Fig. 11. Classification results when the mean photon number in the test scene is 150 and Eq. (28) is used for the within-class covariance estimation. (a) averaged correct classification rate for each class over 1000 runs, (b) averaged false classification rate for each class over 1000 runs. ‘Avg’ denotes the average value of 6 rotation angles.

6.3 Classification results with the average irradiance variation

In a practical situation, the average irradiance might vary. Even though the irradiances are normalized, the resulting histogram of the normalized images may narrow or widen as a result of the change in average irradiance. Therefore, we performed experiments in which the average irradiance is increased. In the first experiment, it is assumed that the increased average irradiance is known, and in the second experiment, the average irradiance is unknown.

The irradiance values of the integral images in Fig. 5 are gray-scaled between 0 and 255. The average irradiance values of the three trained integral images are 129.86, 127.26, and 129.84, respectively. In the first experiment, the irradiance values of all pixels are increased by 50 before normalizing the elemental images, and the classification performance is tested with Np = 10 over 1000 runs; that is, the mean photon number in the test input scene is 300. Figure 12 shows the averaged classification results, and Fig. 13 shows the MSD between the elemental images used for training and those used for testing. As shown in Fig. 12, the performance is degraded compared with Fig. 8. The smaller MSDs in Fig. 13 relative to Fig. 10 may explain this degradation. Figure 14 shows the classification results with the approximated within-class covariance estimator in Eq. (28); the results are very similar to those of Fig. 12.

Fig. 12. Classification results when the mean photon number in the test scene is 300. (a) averaged correct classification rate for each class over 1000 runs, (b) averaged false classification rate for each class over 1000 runs. ‘Avg’ denotes the average value of 6 rotation angles.

Fig. 13. MSD in Eq. (36) when (a) j = 1, (b) j = 2, (c) j = 3.

Fig. 14. Classification results when the mean photon number in the test scene is 300 and Eq. (28) is used for the within-class covariance estimation. (a) averaged correct classification rate for each class over 1000 runs, (b) averaged false classification rate for each class over 1000 runs. ‘Avg’ denotes the average value of 6 rotation angles.

The average irradiance in the scene may also change unexpectedly. In the following experiments, the performance of the photon-counting LDA is tested empirically under an unknown average irradiance variation. The average irradiance is increased by 50 only for the test input integral images, before the normalization of the elemental images and the photon-count simulation, while the trained images are the same as in Subsection 6.2. Therefore, there is a significant average irradiance variation between the trained scenes and the test scenes. Figure 15 shows the averaged classification results over all rotation angles and 1000 runs. The mean photon number in the test scene takes the values 90, 450, 900, 4500, and 9000, corresponding to Np = 3, 15, 30, 150, and 300, respectively. As illustrated in Fig. 15, when the mean photon number is 900, reliable classification results (more than about 0.95 for the correct classification rate and less than about 0.05 for the false classification rate) are obtained for all the classes. It is noted that a mean photon number of 900 in the entire scene can be considered a very low level of photons (900 photons at an average wavelength of 4000 nm represent an energy of less than $10^{-16}$ J incident on the focal plane). The experimental results with the approximated within-class covariance estimator in Eq. (28) are shown in Fig. 16 and are very similar to those of Fig. 15. From Figs. 15 and 16, the photon-counting LDA is shown to overcome the unknown average irradiance problem with a moderate number of photon-counts. With a considerably large expected photon number Np, the photon-counting LDA criterion in Eq. (21) becomes closer to the Fisher’s criterion in Eq. (15), and the effect of the average irradiance variation can be approximately compensated by the ratio of the between-class covariance matrix to the within-class covariance matrix in Eq. (15). However, because the class-conditional mean vectors in the decision rule of Eq. (31) are estimated without the average irradiance variation, the performance can be degraded, as shown in Figs. 15 and 16.

Fig. 15. Classification results when the average irradiance is increased by 50 only for the test images. (a) averaged correct classification rates versus the mean photon numbers, (b) averaged false classification rates versus the mean photon numbers.

Fig. 16. Classification results when the average irradiance is increased by 50 only for the test images and Eq. (28) is used for the within-class covariance estimation. (a) averaged correct classification rates versus the mean photon numbers, (b) averaged false classification rates versus the mean photon numbers.

7. Conclusions

In this paper, we propose a distortion-tolerant automatic recognition system using photon-counting integral imaging. The photon-counting detector combined with the micro-lenslet array generates photon-limited multi-view scenes, and photon events are modeled by the Poisson distribution. A new approach, the photon-counting LDA, is proposed for the classification of photon-limited images: the irradiance values are used for training, while photon-limited images are tested to classify unknown input objects. Compact 3D information processing is possible, and the performance can be enhanced by means of the multiple-perspective photon-limited scenes of II.

It may be possible to extend the proposed photon-counting 3D image recognition system to broader applications, including ordinary irradiance imaging, since photon-limited images can be extracted from irradiance values either by reducing the input light or by using the Poisson model for photon counting. In that case, we may take advantage of the low number of photons generated by the photon-counting detector to improve the computational efficiency of the recognition process. Further experiments are necessary to investigate the potential benefit of the proposed method for a broad class of image recognition systems.

Appendix A

We derive the mean vector of y in Eq. (16) as follows:

$$\mu_y = E_x\bigl[E_{y|x}(y \mid x)\bigr] = E_x(N_p x) = N_p \mu_x, \tag{A1}$$

since $\mu_{y|x} = N_p x$ in Eq. (7). Similarly, for the class-conditional mean vector of y, it can be shown that

$$\mu_{y|w_j} = N_p\,\mu_{x|w_j}. \tag{A2}$$

The covariance matrix of y in Eq. (17) can be derived as follows:

$$\begin{aligned}
\Sigma_{yy} &= E_y\bigl[(y - \mu_y)(y - \mu_y)^t\bigr] \\
&= E_x\bigl\{E_{y|x}\bigl[(y - \mu_y)(y - \mu_y)^t \mid x\bigr]\bigr\} \\
&= E_x\bigl\{E_{y|x}\bigl[y y^t - \mu_y \mu_y^t \mid x\bigr]\bigr\} \\
&= E_x\bigl[E_{y|x}(y y^t \mid x)\bigr] - N_p^2 \mu_x \mu_x^t \\
&= N_p\,\mathrm{diag}(\mu_x) + N_p^2 E_x(x x^t) - N_p^2 \mu_x \mu_x^t \\
&= N_p\,\mathrm{diag}(\mu_x) + N_p^2\,\Sigma_{xx}.
\end{aligned} \tag{A3}$$

Eqs. (A1) and (A3) differ from Eqs. (7) and (8), which are derived for the case that x is given. It is noted that, for a deterministic x, Eqs. (7) and (8) are the same as Eqs. (A1) and (A3), respectively, since then $\mu_x = x$ and $\Sigma_{xx} = 0$. Similarly, for the class-conditional covariance matrix of y, we can show that

$$\Sigma_{yy|w_j} = E_{y|w_j}\bigl[(y - \mu_{y|w_j})(y - \mu_{y|w_j})^t \mid w_j\bigr] = N_p\,\mathrm{diag}(\mu_{x|w_j}) + N_p^2\,\Sigma_{xx|w_j}. \tag{A4}$$

Therefore, the within-class covariance of y in Eq. (18) can be derived as follows:

$$\begin{aligned}
\Sigma_{yy}^W &= E_{w_j}\bigl(E_{y|w_j}\bigl[(y - \mu_{y|w_j})(y - \mu_{y|w_j})^t \mid w_j\bigr]\bigr) \\
&= E_{w_j}\bigl[N_p\,\mathrm{diag}(\mu_{x|w_j}) + N_p^2\,\Sigma_{xx|w_j}\bigr] \\
&= N_p\,\mathrm{diag}(\mu_x) + N_p^2\,\Sigma_{xx}^W,
\end{aligned} \tag{A5}$$

since $\mu_x = E_{w_j}(\mu_{x|w_j})$ and $\Sigma_{xx}^W = E_{w_j}(\Sigma_{xx|w_j})$.

For the between-class covariance matrix of y in Eq. (19), we can show that

$$\begin{aligned}
\Sigma_{yy}^B &= E_{w_j}\bigl[(\mu_{y|w_j} - \mu_y)(\mu_{y|w_j} - \mu_y)^t\bigr] \\
&= N_p^2\, E_{w_j}\bigl[(\mu_{x|w_j} - \mu_x)(\mu_{x|w_j} - \mu_x)^t\bigr] \\
&= N_p^2\,\Sigma_{xx}^B,
\end{aligned} \tag{A6}$$

since $\mu_y = N_p \mu_x$ and $\mu_{y|w_j} = N_p \mu_{x|w_j}$ in Eqs. (A1) and (A2), respectively. It is manifest that $\Sigma_{yy} = \Sigma_{yy}^W + \Sigma_{yy}^B$.
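As a quick numerical sanity check (not part of the paper), the moment relations (A1) and (A3) can be verified by Monte Carlo simulation:

```python
# A Monte Carlo check of Eqs. (A1) and (A3): draw random normalized
# irradiance vectors x, then Poisson counts y given each x, and compare the
# sample moments of y with N_p mu_x and N_p diag(mu_x) + N_p^2 Sigma_xx.
import numpy as np

rng = np.random.default_rng(1)
n_p, d, trials = 50.0, 4, 200_000
x = rng.dirichlet(np.ones(d), size=trials)   # random x with sum_i x_i = 1
y = rng.poisson(n_p * x)                     # counts given each realization of x

mu_x = x.mean(axis=0)
sigma_xx = np.cov(x.T, bias=True)
print(np.abs(y.mean(axis=0) - n_p * mu_x).max())                    # ~0, Eq. (A1)
print(np.abs(np.cov(y.T, bias=True)
             - (n_p * np.diag(mu_x) + n_p ** 2 * sigma_xx)).max())  # ~0, Eq. (A3)
```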

Appendix B

First, we estimate the class-conditional mean vector of x by the sample mean vector as

$$\hat{\mu}_{x|w_j} = \frac{1}{n_j}\sum_{n=1}^{n_j} x_j(n), \tag{B1}$$

and the class-conditional covariance matrix of x can be estimated by the class-conditional sample covariance matrix as

$$\hat{\Sigma}_{xx|w_j} = \frac{1}{n_j}\sum_{n=1}^{n_j}\bigl(x_j(n) - \hat{\mu}_{x|w_j}\bigr)\bigl(x_j(n) - \hat{\mu}_{x|w_j}\bigr)^t. \tag{B2}$$

Eqs. (B1) and (B2) are the maximum likelihood estimates (MLEs) of the parameters of the Gaussian distribution assumed for class j [41–43].

For the within-class covariance matrix of x, we can show that

$$\Sigma_{xx}^W = E_{w_j}\bigl\{E_{x|w_j}\bigl[(x - \mu_{x|w_j})(x - \mu_{x|w_j})^t \mid w_j\bigr]\bigr\} = \sum_{j=1}^{n_c} \pi_j\,\Sigma_{xx|w_j}, \tag{B3}$$

where $\pi_j$ is the probability mass of the event $w_j$, i.e., the proportion of observations from class j:

$$\pi_j = \frac{n_j}{n_t}. \tag{B4}$$

Therefore, by the invariance property of the MLE [41–43], the MLE of $\Sigma_{xx}^W$ is

$$\hat{\Sigma}_{xx}^W = \sum_{j=1}^{n_c} \pi_j\,\hat{\Sigma}_{xx|w_j} = \frac{1}{n_t}\sum_{j=1}^{n_c}\sum_{n=1}^{n_j}\bigl(x_j(n) - \hat{\mu}_{x|w_j}\bigr)\bigl(x_j(n) - \hat{\mu}_{x|w_j}\bigr)^t. \tag{B5}$$

For the mean vector and the between-class covariance matrix of x, the following relationships hold, respectively:

$$\mu_x = E_{w_j}\bigl[E_{x|w_j}(x \mid w_j)\bigr] = \sum_{j=1}^{n_c} \pi_j\,\mu_{x|w_j}, \tag{B6}$$

and

$$\Sigma_{xx}^B = E_{w_j}\bigl[(\mu_{x|w_j} - \mu_x)(\mu_{x|w_j} - \mu_x)^t\bigr] = \sum_{j=1}^{n_c} \pi_j\,(\mu_{x|w_j} - \mu_x)(\mu_{x|w_j} - \mu_x)^t. \tag{B7}$$

The MLEs of $\mu_x$ and $\Sigma_{xx}^B$ can be derived, respectively, from Eqs. (B1), (B4), (B6), and (B7) and the invariance property of the MLE as

$$\hat{\mu}_x = \sum_{j=1}^{n_c} \pi_j\,\hat{\mu}_{x|w_j} = \frac{1}{n_t}\sum_{j=1}^{n_c}\sum_{n=1}^{n_j} x_j(n), \tag{B8}$$

and

$$\hat{\Sigma}_{xx}^B = \sum_{j=1}^{n_c} \pi_j\,(\hat{\mu}_{x|w_j} - \hat{\mu}_x)(\hat{\mu}_{x|w_j} - \hat{\mu}_x)^t = \frac{1}{n_t}\sum_{j=1}^{n_c} n_j\,(\hat{\mu}_{x|w_j} - \hat{\mu}_x)(\hat{\mu}_{x|w_j} - \hat{\mu}_x)^t. \tag{B9}$$

For the photon-counting LDA, the MLEs of the between-class and within-class covariance matrices in Eqs. (26) and (27) can be derived from Eqs. (B5), (B8), and (B9) and the invariance property of the MLE.

Acknowledgments

The authors thank the anonymous reviewers for their valuable comments.

References and links

1. A. Mahalanobis and F. Goudail, “Methods for automatic target recognition by use of electro-optic sensors: introduction to the feature issue,” Appl. Opt. 43, 207–209 (2004).

2. P. Refregier, Noise Theory and Applications to Physics (Springer, 2004).

3. F. A. Sadjadi, ed., Selected Papers on Automatic Target Recognition (SPIE-CDROM, 1999).

4. B. Javidi, ed., Image Recognition and Classification: Algorithms, Systems, and Applications (Marcel Dekker, New York, 2002).

5. H. Kwon and N. M. Nasrabadi, “Kernel RX-algorithm: a nonlinear anomaly detector for hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens. 43, 388–397 (2005).

6. B. Javidi, ed., Optical Imaging Sensors and Systems for Homeland Security Applications (Springer, New York, 2005).

7. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. (Wiley Interscience, New York, 2001).

8. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. (Academic Press, Boston, 1990).

9. C. M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, New York, 1995).

10. D. L. Swets and J. Weng, “Using discriminant eigenfeatures for image retrieval,” IEEE Trans. Pattern Anal. Mach. Intell. 18, 831–836 (1996).

11. P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. Fisherfaces: recognition using class specific linear projection,” IEEE Trans. Pattern Anal. Mach. Intell. 19, 711–720 (1997).

12. O. Matoba, E. Tajahuerce, and B. Javidi, “Real-time three-dimensional object recognition with multiple perspectives imaging,” Appl. Opt. 40, 3318–3325 (2001).

13. Y. Frauel and B. Javidi, “Digital three-dimensional image correlation by use of computer-reconstructed integral imaging,” Appl. Opt. 41, 5488–5496 (2002).

14. S. Kishk and B. Javidi, “Improved resolution 3D object sensing and recognition using time multiplexed computational integral imaging,” Opt. Express 11, 3528–3541 (2003).

15. S. Yeom and B. Javidi, “Three-dimensional distortion tolerant object recognition using integral imaging,” Opt. Express 12, 5795–5809 (2004).

16. S. Yeom, B. Javidi, and E. Watson, “Photon counting passive 3D image sensing for automatic target recognition,” Opt. Express 13, 9310–9330 (2005).

17. B. Javidi and F. Okano, eds., Three-dimensional Television, Video, and Display Technologies (Springer, New York, 2002).

18. J.-S. Jang and B. Javidi, “Time-multiplexed integral imaging for 3D sensing and display,” Opt. Photon. News 15, 36–43 (2004). http://www.osa-opn.org/abstract.cfm?URI=OPN-15-4-36.

19. F. Okano, H. Hoshino, J. Arai, and I. Yuyama, “Real-time pickup method for a three-dimensional image based on integral photography,” Appl. Opt. 36, 1598–1603 (1997).

20. M. Martínez-Corral, B. Javidi, R. Martínez-Cuenca, and G. Saavedra, “Multifacet structure of observed reconstructed integral images,” J. Opt. Soc. Am. A 22, 597–603 (2005).

21. E. Hecht, Optics, 4th ed. (Addison Wesley, 2001).

22. J. W. Goodman, Statistical Optics (John Wiley & Sons, 1985), Chap. 9.

23. G. M. Morris, “Scene matching using photon-limited images,” J. Opt. Soc. Am. A 1, 482–488 (1984).

24. G. M. Morris, “Image correlation at low light levels: a computer simulation,” Appl. Opt. 23, 3152–3159 (1984).

25. E. A. Watson and G. M. Morris, “Comparison of infrared upconversion methods for photon-limited imaging,” J. Appl. Phys. 67, 6075–6084 (1990).

26. E. A. Watson and G. M. Morris, “Imaging thermal objects with photon-counting detector,” Appl. Opt. 31, 4751–4757 (1992).

27. M. N. Wernick and G. M. Morris, “Image classification at low light levels,” J. Opt. Soc. Am. A 3, 2179–2187 (1986).

28. L. A. Saaf and G. M. Morris, “Photon-limited image classification with a feedforward neural network,” Appl. Opt. 34, 3963–3970 (1995).

29. D. Stucki, G. Ribordy, A. Stefanov, H. Zbinden, J. G. Rarity, and T. Wall, “Photon counting for quantum key distribution with Peltier cooled InGaAs/InP APDs,” J. Mod. Opt. 48, 1967–1981 (2001).

30. P. A. Hiskett, G. S. Buller, A. Y. Loudon, J. M. Smith, I. Gontijo, A. C. Walker, P. D. Townsend, and M. J. Robertson, “Performance and design of InGaAs/InP photodiodes for single-photon counting at 1.55 μm,” Appl. Opt. 39, 6818–6829 (2000).

31. L. Duraffourg, J.-M. Merolla, J.-P. Goedgebuer, N. Butterlin, and W. Rhods, “Photon counting in the 1540-nm wavelength region with a Germanium avalanche photodiode,” IEEE J. Quantum Electron. 37, 75–79 (2001).

32. J. G. Rarity, T. E. Wall, K. D. Ridley, P. C. M. Owens, and P. R. Tapster, “Single-photon counting for the 1300–1600-nm range by use of Peltier-cooled and passively quenched InGaAs avalanche photodiodes,” Appl. Opt. 39, 6746–6753 (2000).

33. M. Guillaume, P. Melon, and P. Refregier, “Maximum-likelihood estimation of an astronomical image from a sequence at low photon levels,” J. Opt. Soc. Am. A 15, 2841–2848 (1998).

34. K. E. Timmermann and R. D. Nowak, “Multiscale modeling and estimation of Poisson processes with application to photon-limited imaging,” IEEE Trans. Inf. Theory 45, 846–862 (1999).

35. Ph. Refregier, F. Goudail, and G. Delyon, “Photon noise effect on detection in coherent active images,” Opt. Lett. 29, 162–164 (2004).

36. K.-R. Muller, S. Mika, G. Ratsch, K. Tsuda, and B. Scholkopf, “An introduction to kernel-based learning algorithms,” IEEE Trans. Neural Netw. 12, 181–201 (2001).

37. A. Ruiz and P. E. Lopez-de-Teruel, “Nonlinear kernel-based statistical pattern analysis,” IEEE Trans. Neural Netw. 12, 16–32 (2001).

38. A. Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd ed. (McGraw-Hill, 1991).

39. Y. Cheng, Y. Zhuang, and J. Yang, “Optimal Fisher discriminant analysis using the rank decomposition,” Pattern Recogn. 25, 101–111 (1992).

40. J. Schäfer and K. Strimmer, “A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics,” Stat. Appl. Genet. Mol. Biol. 4, Article 32 (2005).

41. S. M. Kay, Fundamentals of Statistical Signal Processing (Prentice Hall, New Jersey, 1993).

42. N. Mukhopadhyay, Probability and Statistical Inference (Marcel Dekker, New York, 2000).

43. N. Ravishanker and D. K. Dey, A First Course in Linear Model Theory (Chapman & Hall/CRC, 2002).

Supplementary Material (3)

Media 1: AVI (3678 KB)     
Media 2: AVI (3678 KB)     
Media 3: AVI (3678 KB)     



