
Learning in the dark: 3D integral imaging object recognition in very low illumination conditions using convolutional neural networks

Open Access

Abstract

We propose a framework for three-dimensional (3D) object recognition and classification in very low illumination environments using convolutional neural networks (CNNs). 3D images are reconstructed using 3D integral imaging (InIm) with conventional visible spectrum image sensors. After imaging the low light scene using 3D InIm, the 3D reconstructed image has a higher signal-to-noise ratio than a single 2D image, because 3D InIm is optimal in the maximum likelihood sense for read-noise dominant images. Once 3D reconstruction has been performed, the 3D image is denoised and regions of interest are extracted to detect 3D objects in the scene. The extracted regions are then input to a CNN, trained under low illumination conditions using 3D InIm reconstructed images, to perform object recognition. To the best of our knowledge, this is the first report of utilizing 3D InIm and convolutional neural networks for 3D training and 3D object classification under very low illumination conditions.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Imaging a scene in low illumination conditions using conventional image sensors that operate in the visible spectrum is difficult because the captured images become read-noise dominant. Thus, the signal-to-noise ratio (SNR) suffers, resulting in poor scene visualization and making object recognition difficult. There is broad interest in low-light imaging across fields such as remote sensing [1], underwater imaging [2], and night vision [3,4]. Image sensors designed for low-light imaging include electron-multiplying CCD (EM-CCD) cameras [3,4], scientific CMOS (sCMOS) cameras [2], and night vision cameras. However, both EM-CCD and sCMOS cameras are expensive and bulky. In particular, the EM-CCD must be cooled to around −55 °C prior to operation. Night vision devices operate by amplifying the number of photons in the scene; if too few photons are available, an active near-infrared source is required to illuminate the scene. Infrared cameras are effective in low-light conditions, but they have lower resolution than visible-range cameras and may require bulkier and more expensive optics.

Passive cameras for three-dimensional (3D) imaging using 3D integral imaging (InIm) [5] have been reported [6–10]. In 3D InIm, an array of cameras or a single moving camera captures the scene, with each camera obtaining a unique perspective known as an elemental image (EI). Using the acquired EIs, a 3D image can be reconstructed computationally or optically. Integral imaging has been investigated under low illumination conditions. In [11], a photon-counting model was used to simulate photon-limited images from EIs that captured a 3D scene under sufficient illumination, and computational 3D InIm reconstruction was performed using the photon-limited EIs. It was shown that the 3D InIm reconstruction produces the maximum likelihood estimate of objects that lie on the corresponding 3D reconstructed depth plane; thus, the 3D reconstructed image has higher SNR than a single 2D image. In [12], a 16-bit cooled camera was used to obtain EIs of objects under photon-starved conditions. After 3D reconstruction and total-variation denoising, object visualization was achieved, whereas it was not possible using a single 2D image. In [13], 3D InIm was used to obtain EIs of an outdoor scene containing an object behind occlusion under low illumination conditions. With a single 2D image, face detection was not possible in the experiments; after computational 3D InIm reconstruction, object detection was successful. However, object classification at low light levels was not possible with that approach.

In this paper, we show for the first time that it is possible not only to detect, but also to classify, 3D objects in a scene under very low illumination conditions using 3D InIm. The novelty of the manuscript stems from the unique approach to 3D training of the CNN classifier. We train the CNN using denoised 3D reconstructed images computed from elemental images obtained under various low illumination conditions. By training on 3D data acquired under these illumination conditions, the CNN is able to perform face recognition because it has learned to recognize faces under non-optimal illumination. Thus, the contribution lies in enabling 3D object recognition at low light levels, which may not be possible using conventional 2D approaches.

We use low-cost passive image sensors that operate in the visible spectrum. The EIs are read-noise dominant. 3D InIm is naturally optimal in the maximum likelihood sense for read-noise dominant images because read noise follows a Gaussian distribution. Upon 3D InIm reconstruction, the SNR increases, resulting in improved image visualization. The scene is then denoised with total-variation regularization using an augmented Lagrangian approach (TV-denoising) [14]. Regions of interest are extracted to detect faces in the scene [15] and are then input to a pre-trained convolutional neural network [16,17] for facial recognition. We demonstrate experimentally that the 3D InIm system trained in the dark with a CNN successfully performs face detection and classification at low light levels.

2. Three-dimensional integral imaging in low illumination conditions

3D InIm is a 3D imaging technique that uses a lenslet array, an array of cameras, or a moving camera to capture different perspectives of a scene, known as elemental images. 3D InIm captures both intensity and angular information. Figure 1(a)

Fig. 1 Integral imaging. (a) Pickup and (b) 3D reconstruction stages. c = sensor size, p = pitch, f = focal length, z = distance. (c) Parameter details for (a). Ri = chief ray through the i-th lens; θi = azimuth angle; ϕi = zenith angle.

depicts the integral imaging pickup stage. Once the EIs have been acquired, the scene can be reconstructed, as shown in Fig. 1(b), by back-propagating the captured light rays through a virtual pinhole to a particular depth plane at a distance z. Figure 1(c) depicts the chief ray Ri from the object surface in 3D space (x, y, z) at location (x0, y0, z0), with azimuth angle θi and zenith angle ϕi, being imaged by the i-th lens located at (x1, y1, z1) and arriving at the sensor plane at (τ, ψ). Using the acquired elemental images, 3D InIm reconstruction can be performed optically or computationally. Figure 2
Fig. 2 Synthetic aperture integral imaging (SAII) pick-up and reconstruction stages.

depicts the synthetic aperture integral imaging (SAII) pick-up and reconstruction stage. Computational 3D InIm reconstruction is implemented as follows [6]:
$$I(x,y;z)=\frac{1}{O(x,y)}\sum_{k=0}^{K-1}\sum_{b=0}^{B-1}E_{k,b}\!\left(x-k\,\frac{L_{x}\times p_{x}}{c_{x}\times M},\; y-b\,\frac{L_{y}\times p_{y}}{c_{y}\times M}\right),\tag{1}$$
where (x, y) is the pixel index, z is the reconstruction distance, O(x, y) is the number of overlapping elemental-image pixels at (x, y), K and B are the total numbers of elemental images in each column and row, respectively, $E_{k,b}$ is the elemental image in the kth column and bth row, $L_x$ and $L_y$ are the total numbers of pixels in each column and row of each $E_{k,b}$, $M = z/g$ is the magnification factor, g is the focal length, $p_x$ and $p_y$ are the pitches between image sensors, and $c_x$ and $c_y$ are the image sensor sizes.
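As a concrete illustration of Eq. (1), the following minimal NumPy sketch performs shift-and-average computational reconstruction at a single depth plane. The function name, the integer-pixel rounding of the shifts, and the array layout are our assumptions; the reported implementation is not shown in the text.

```python
import numpy as np

def reconstruct_plane(elemental, z, g, px, py, cx, cy):
    """Computational 3D InIm reconstruction at depth z, following Eq. (1).

    elemental : array of shape (K, B, Ly, Lx) holding the elemental images
                E_{k,b} as grayscale float arrays.
    z, g      : reconstruction distance and focal length (same units).
    px, py    : camera pitch in x and y (same units as cx, cy).
    cx, cy    : physical sensor size in x and y.
    Returns the reconstructed depth plane I(x, y; z).
    """
    K, B, Ly, Lx = elemental.shape
    M = z / g                              # magnification factor, M = z/g
    sx = (Lx * px) / (cx * M)              # per-camera shift in x (pixels)
    sy = (Ly * py) / (cy * M)              # per-camera shift in y (pixels)

    accum = np.zeros((Ly, Lx))
    overlap = np.zeros((Ly, Lx))           # O(x, y): number of overlaps
    for k in range(K):
        for b in range(B):
            dx = int(round(k * sx))        # rounded to whole pixels (assumption)
            dy = int(round(b * sy))
            # Accumulate E_{k,b} shifted by (dx, dy), i.e. I(x) += E_{k,b}(x - dx)
            accum[dy:, dx:] += elemental[k, b, :Ly - dy, :Lx - dx]
            overlap[dy:, dx:] += 1
    return accum / np.maximum(overlap, 1)  # divide by O(x, y)
```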

A captured image can be defined as E(x,y) = I(x,y)r(x,y), where I(x,y) > 0 is the illumination factor and r(x,y) is the reflection coefficient between 0 and 1 [18]. As the scene illumination decreases, the illumination factor diminishes and read noise becomes greater than the scene signal, hindering adequate scene visualization; the image becomes read-noise dominant. Read noise results from on-chip sensor noise, is additive, and can be modeled as a zero-mean Gaussian distribution. Using Eq. (1), the 3D InIm reconstruction with read noise is:

$$I(x,y;z)=\frac{1}{O(x,y)}\sum_{k=0}^{K-1}\sum_{b=0}^{B-1}\left(E_{k,b}(x',y')+\varepsilon_{r_{k,b}}(x',y')\right)=\frac{1}{O(x,y)}\sum_{k=0}^{K-1}\sum_{b=0}^{B-1}E_{k,b}(x',y')+\frac{1}{O(x,y)}\sum_{k=0}^{K-1}\sum_{b=0}^{B-1}\varepsilon_{r_{k,b}}(x',y'),\tag{2}$$
where $\varepsilon_{r_{k,b}}(x',y')$ is zero-mean additive white Gaussian noise (i.e., read noise) for the elemental image in the kth column and bth row at location (x', y'), with $x'=x-k(L_x\times p_x)/(c_x\times M)$ and $y'=y-b(L_y\times p_y)/(c_y\times M)$.

Taking the variance of Eq. (2) and assuming the noise is wide-sense stationary, the variance of the noise component for a fixed z is:

$$\mathrm{var}\!\left(\frac{1}{O(x,y)}\sum_{k=0}^{K-1}\sum_{b=0}^{B-1}\varepsilon_{r_{k,b}}(x',y')\right)=\frac{1}{O(x,y)}\,\sigma^{2},\tag{3}$$
where var(·) denotes variance and σ² is the variance of the read noise.
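Equation (3) can be checked with a short Monte Carlo simulation: averaging O independent zero-mean Gaussian read-noise realizations reduces the noise variance by a factor of O. The noise level and overlap counts below are illustrative values, not the experimental ones.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 20.0            # read-noise standard deviation (illustrative value)
n_pixels = 100_000      # number of pixels simulated per elemental image

for O in (1, 8, 72):    # number of overlapping elemental images
    # Average of O independent zero-mean Gaussian read-noise realizations,
    # i.e. the noise term of Eq. (2) at a pixel with overlap count O.
    noise_avg = rng.normal(0.0, sigma, size=(O, n_pixels)).mean(axis=0)
    print(f"O = {O:3d}: measured var = {noise_avg.var():8.2f}, "
          f"predicted sigma^2 / O = {sigma**2 / O:8.2f}")
```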

As the number of overlapping images increases, the variance (noise power) of the read noise decreases. It has been shown that integral imaging reconstruction is naturally optimal in the maximum-likelihood sense for read-noise-limited images, as the noise distribution is approximately Gaussian [13]. Without photon-counting devices to measure the flux density, the SNR of the image is estimated as [12]:

$$\mathrm{SNR}=\left(\langle g_{o}^{2}\rangle-\langle N^{2}\rangle\right)/\langle N^{2}\rangle,\tag{4}$$
where $\langle g_{o}^{2}\rangle$ is the average power of the object region in the EI and $\langle N^{2}\rangle$ is the average noise power, defined as $\langle N^{2}\rangle=(\Phi_{o}+\Phi_{b})Q_{e}t+Dt+n_{r}^{2}$, where $\Phi_{o}$ and $\Phi_{b}$ are the photon fluxes of the object and background (photons/pixel/second), D is the dark current (electrons/pixel/second), $Q_{e}$ is the quantum efficiency (electrons/photon), t is the exposure time (seconds), $n_{r}$ is the read noise (RMS electrons/pixel), and $\langle\cdot\rangle$ denotes the ensemble mean.

Assuming dark-current noise is negligible and the exposure time is sufficiently short, the number of photons per pixel ($N_{\mathrm{photons}}$) can be estimated as:

$$\Phi_{o}t=N_{\mathrm{photons}}\approx \mathrm{SNR}\times n_{r}/Q_{e},\tag{5}$$
where Nphotons is the estimated number of photons.
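A sketch of how Eqs. (4) and (5) might be evaluated on an elemental image is given below. The object and dark-region masks, the assumption that pixel values are proportional to electrons, and the use of the linear SNR of Eq. (4) in Eq. (5) are our assumptions; the read-noise and quantum-efficiency defaults are the camera values quoted in Section 3.

```python
import numpy as np

def estimate_snr_and_photons(ei, obj_mask, noise_mask, n_r=20.47, Q_e=0.44):
    """Estimate SNR (Eq. (4), in dB) and photons/pixel on the object (Eq. (5)).

    ei         : elemental image with values proportional to electrons (assumed).
    obj_mask   : boolean mask of the object region (e.g. the person's face).
    noise_mask : boolean mask of a completely dark region of the scene.
    n_r        : RMS read noise (electrons), camera value from Section 3.
    Q_e        : quantum efficiency (electrons/photon) at 525 nm.
    """
    go2 = np.mean(ei[obj_mask].astype(float) ** 2)    # <g_o^2>
    N2 = np.mean(ei[noise_mask].astype(float) ** 2)   # <N^2>
    if go2 <= N2:
        return None, None        # SNR undefined when noise power exceeds signal
    snr = (go2 - N2) / N2        # linear SNR, Eq. (4)
    snr_db = 10.0 * np.log10(snr)
    n_photons = snr * n_r / Q_e  # photons/pixel on the object, Eq. (5)
    return snr_db, n_photons
```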

3. Experimental results

A synthetic aperture integral imaging experiment was conducted using Allied Vision Mako-192 cameras with dimensions of 86.4 mm × 44 mm × 29 mm. The sensor is an e2v EV76C570 CMOS sensor. The lens has an F-number of F/1.8 and a focal length of 50 mm; the pixel size is 4.5 μm × 4.5 μm, the sensor size is 7.2 mm (H) × 5.4 mm (V), and the image size is 1600 (H) × 1200 (V) pixels. The camera read noise is 20.47 electrons/pixel and the quantum efficiency at 525 nm is 0.44 electrons/photon. A gain of 0 dB was used in the experiments. The InIm setup consisted of 72 elemental images in a 3 × 24 array with a pitch of 10 mm (H) × 80 mm (V) and an exposure time of 0.015 s.
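As a quick numerical check, the magnification and per-camera pixel shifts appearing in Eq. (1) follow directly from the setup parameters above; this is only a sketch, since any rounding to whole pixels during reconstruction is an implementation detail not specified in the text.

```python
# Shift terms of Eq. (1) for the reported setup (all lengths in mm).
z, g = 4500.0, 50.0        # reconstruction distance and focal length
px, py = 10.0, 80.0        # horizontal / vertical camera pitch
cx, cy = 7.2, 5.4          # sensor size
Lx, Ly = 1600, 1200        # image size in pixels

M = z / g                  # magnification factor: 90x
shift_x = Lx * px / (cx * M)   # ~24.7 pixels per camera horizontally
shift_y = Ly * py / (cy * M)   # ~197.5 pixels per camera vertically
print(M, shift_x, shift_y)
```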

The experimental setup for low illumination conditions consisted of a 3D integral imaging setup with 6 subjects located 4.5 m from the camera array. Experiments were conducted for each subject under different illumination conditions, resulting in different SNR levels. The illumination conditions were altered by adjusting the intensity of the light source. Figure 3(a)

Fig. 3 Sample elemental images with corresponding SNR and photons/pixel. The SNR of the images shown in Figs. 3(d)–3(f) cannot be computed as the average power of the object regions is less than that of the background. N/A = not applicable.

depicts the elemental image [reference image] with an SNR of 10.41 dB (i.e. good illumination) and Fig. 4(a)
Fig. 4 Three-dimensional (3D) reconstructed images at z = 4.5 m corresponding to the elemental images shown in Fig. 3. The SNR and photons/pixel can be computed for low light levels.

shows the 3D reconstructed image at z = 4.5 m with an SNR of 12.39 dB. Prior to 3D reconstruction, the elemental images were registered and aligned to account for the experimental conditions (e.g., unbalanced camera array). Fifty bias frames were taken and averaged for each camera and subtracted from the elemental images. The SNR was computed by taking ⟨g_o²⟩ over the object region (i.e., the person's face) and ⟨N²⟩ over an area of the scene that is completely dark. The elemental images acquired using 3D InIm under low illumination conditions are shown in Figs. 3(b)–3(f), in order of decreasing illumination; they are read-noise dominant. Measuring the number of photons in the scene under these conditions requires sophisticated instruments, which are not easily field portable. In Fig. 3(b), the SNR was −1.20 dB with approximately 40.53 photons/pixel on the object; the person captured is still visible. In Fig. 3(c), the SNR decreases to −9.13 dB with 16.26 photons/pixel. The average power of the object is lower than the noise power for the images shown in Figs. 3(d)–3(f); as a result, the SNR cannot be computed, because ⟨N²⟩ > ⟨g_o²⟩ makes Eq. (4) negative and leaves no real value in dB.

3D reconstructed images at z = 4.5 m are shown in Fig. 4, corresponding to the elemental images shown in Fig. 3. In Figs. 4(b)–4(f), the SNR increases to 8.93 dB, 0.96 dB, −5.28 dB, −9.67 dB, and −12.38 dB, respectively, with corresponding photon counts of 130.02, 51.95, 25.34, 15.27, and 11.18 photons/pixel. Figure 5(a)

Fig. 5 SNR vs. illumination. (a) SNR (in dB) as a function of decreasing illumination for both 3D reconstructed (Recon.) images taken at z = 4.5 m and 2D elemental images. (b) SNR (dB) as photons/pixel on the object increases, for the 3D reconstructed images and elemental images. Illumination levels 1 to 17 correspond to the scene light levels used in the experiments, with 1 corresponding to the highest illumination level. The photons/pixel for each computed SNR is shown in (b).

depicts the SNR [see Eq. (4)] of the elemental images and the corresponding 3D reconstructed images at z = 4.5 m as a function of illumination. Illumination levels 1 to 17 correspond to the scene light levels used in the experiments, with 1 corresponding to the highest illumination level. The SNR of the 3D reconstructed images is higher than that of the 2D EIs. We note that the SNR could not be computed for EIs with SNRs below −21.35 dB, as the noise became greater than the signal. Figure 5(b) shows the SNR (in dB) as a function of the number of photons/pixel. Overall, the 3D reconstructed images have a higher number of photons/pixel than their corresponding 2D EIs.

Additional experiments were carried out to evaluate the advantages of 3D InIm in low-light conditions compared with increasing the exposure time of a single camera and with recording multiple 2D images from a single camera perspective and averaging them. To evaluate image quality, we define the following metric:

$$\mathrm{SNR}_{\mathrm{contrast}}=\mu_{\mathrm{obj}}/\sigma_{\mathrm{noise}},\tag{6}$$
where μ_obj is the mean of the object region and σ_noise is the standard deviation of the background noise, measured over an area of the image containing very low pixel values.
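A minimal sketch of Eq. (6) is shown below; the choice of masks and the 20·log10 dB convention for this amplitude ratio are our assumptions, as the text does not state them.

```python
import numpy as np

def snr_contrast_db(image, obj_mask, noise_mask):
    """SNR_contrast = mu_obj / sigma_noise (Eq. (6)), reported here in dB."""
    mu_obj = image[obj_mask].astype(float).mean()        # mean of the object region
    sigma_noise = image[noise_mask].astype(float).std()  # std of a dark background area
    return 20.0 * np.log10(mu_obj / sigma_noise)         # dB convention assumed
```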

A 3D InIm experiment was conducted using the experimental parameters described above. Figure 6(a)

Fig. 6 The (a) elemental image and (b) 3D reconstructed image at z = 4500 mm. The SNR_contrast values are 31.5 dB and 33.76 dB, respectively.

depicts the reference image while Fig. 6(b) depicts the 3D reconstructed image at z = 4.5 m, with corresponding SNR_contrast values of 31.5 dB and 33.76 dB, respectively. An experiment under low-light conditions was then conducted. First, the scene was captured as a single image at three different exposure times. Figures 7(a)–7(c)
Fig. 7 2D captured images with exposure times of (a) 0.010 s, (b) 0.015 s, and (c) 0.203 s under low-light conditions. The SNR_contrast of the images shown in (a) and (b) cannot be computed as the object region intensity is less than that of the background. The SNR_contrast of (c) is 8.213 dB.

depict the captured images with exposure times of 0.010 s, 0.015 s, and 0.203 s under low-light conditions, respectively. The SNR_contrast of the images shown in Figs. 7(a) and 7(b) cannot be computed as the object region intensity is less than that of the background. The SNR_contrast of Fig. 7(c) is 8.213 dB. Another set of experiments captured 72 images from a single perspective along with a 3D InIm experiment. As shown in Fig. 8(a)
Fig. 8 (a) Average of 72 2D images captured from a single perspective, and (b) the 3D InIm reconstructed image at z = 4.5 m, each using an exposure time of 0.015 s per image; the SNR_contrast is 6.38 dB in (a) and 16.702 dB in (b). (c) Average of 72 2D images captured from a single perspective and (d) the corresponding 3D InIm reconstructed image at z = 4.5 m, each using an exposure time of 0.010 s per image; the SNR_contrast is 2.152 dB in (c) and 15.94 dB in (d).

72 images from a single perspective were taken using an exposure time of 0.015 s and averaged, while Fig. 8(b) shows the 3D reconstructed image at z = 4.5 m; the SNR_contrast values are 6.38 dB and 16.702 dB, respectively. The experiment was then repeated using an exposure time of 0.010 s. Figure 8(c) depicts the average of 72 images obtained from a single perspective while Fig. 8(d) depicts the 3D reconstructed image at z = 4.5 m; the SNR_contrast values are 2.152 dB and 15.94 dB, respectively. Thus, by capturing both intensity and angular information, 3D InIm reconstruction improves image contrast and visualization compared with averaging multiple images captured from a single perspective or increasing the exposure time of a single image. One reason is that 3D InIm segments out the object of interest at the reconstruction depth from the background.

4. Object classification using convolutional neural networks

A convolutional neural network (CNN) [16,17] was then trained on low illumination data for face recognition. An advantage of deep learning over other machine learning algorithms (e.g., support vector machines or random forest classifiers) is that hand-crafted feature extraction is not needed. However, deep learning increases the computational complexity and requires a sufficiently large training set. The training images were 3D reconstructed images of faces after TV-denoising, obtained under different illumination conditions. The customized CNN employed in the experiments used larger filters in the convolutional layers, as these performed well on images obtained under low-light conditions. Training in photon-starved environments improves the classifier's ability to discriminate between subjects in low illumination conditions. To illustrate the need for learning in the dark, normalized correlation was used to demonstrate the difficulty of discriminating faces under low illumination conditions. Figure 9(a)

Fig. 9 Correlation between the 3D reconstructed image at z = 4.5 m after TV-denoising using EIs (a) obtained under an SNR of 10.41 dB, used as a reference, and 3D reconstructed images after TV-denoising whose EIs were obtained under an SNR of −12.75 dB for the (b) true class object and (c) false class object; the correlation values are 0.58 and 0.48, respectively. Classification is difficult.

shows a 3D reconstructed reference image at z = 4.5 m after TV-denoising obtained using EIs with an SNR of 10.41 dB. This image was correlated with 3D reconstructed images after TV-denoising whose EIs were obtained under an SNR of −12.75 dB, shown in Fig. 9(b) (true class object) and Fig. 9(c) (false class object), yielding correlation values of 0.58 and 0.48, respectively. Note that 1 indicates the images are perfectly correlated and 0 indicates no correlation. Thus, it is difficult to discriminate objects under low-light conditions without training the classifier on information about how the objects appear in low light.
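The correlation values quoted above can be reproduced with a zero-mean normalized correlation of the kind sketched below; whether the paper also searches over spatial shifts is not stated, so the zero-lag form is an assumption.

```python
import numpy as np

def normalized_correlation(a, b):
    """Zero-lag, zero-mean normalized correlation between two equally sized
    images: 1 means identical up to gain and offset, 0 means uncorrelated."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```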

A CNN was trained to perform facial recognition using the 3D InIm reconstructed data. A data set was collected consisting of 6 different subjects and 17 different illumination conditions acquired using 3D InIm. The images were then computationally reconstructed over distances of 4 m to 5 m with a step size of 50 mm, where the true object distance was 4.5 m. Figure 10

Fig. 10 Example denoised 3D reconstructed training images acquired at various SNRs, reconstruction depths, additive noise levels, and rotations.

below depicts examples of the training images used. The data set was then split into training and testing sets: 4 randomly chosen illumination conditions, with SNRs of approximately −1.41 dB, −8.322 dB, −8.971 dB, and −12.75 dB, were withheld from training (test set), and the other 13 illumination conditions were used for training. Thus, there were 24 test scenes. The training images were grayscale images of size 256 × 256 pixels and were perturbed by adding Gaussian noise with mean 0 and standard deviations of 0.01, 0.05, and 0.1, rotating by −1, −0.5, 0.5, and 1 degrees, and translating by −5, −3, 3, and 5 pixels in both the x- and y-directions, generating a total of 29,232 images. The data were then denoised using total-variation regularization with an augmented Lagrangian approach and a regularization parameter of 20000 [14]. Figure 10 depicts examples of denoised 3D reconstructed training images acquired at various SNRs, reconstruction depths, additive noise levels, and rotations. The CNN consisted of: a convolution layer [13 × 13, 20 filters], a rectified linear unit (ReLU) layer, 2 × 2 max pooling, a convolution layer [11 × 11, 20 filters], ReLU, 2 × 2 max pooling, a fully connected layer, and a softmax layer [6 outputs]. For training, stochastic gradient descent was used with a learning rate of 0.0001 and a maximum of 10 epochs, with the cross-entropy loss used to evaluate model performance. In total, the model took approximately 4 hours to train on a high-performance computer with a Tesla K40m GPU running CUDA 8.0, implemented using MATLAB.
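The reported network and training settings correspond roughly to the following PyTorch sketch (the original was implemented in MATLAB); padding, strides, and the fully connected layer size are not specified in the text and are assumed here.

```python
import torch
import torch.nn as nn

class LowLightFaceCNN(nn.Module):
    """Two conv/ReLU/max-pool stages followed by a fully connected layer with
    6 outputs, as described in the text; the softmax is folded into the
    cross-entropy loss, as is usual in PyTorch."""
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=13),   # 256 -> 244 (no padding assumed)
            nn.ReLU(),
            nn.MaxPool2d(2),                    # 244 -> 122
            nn.Conv2d(20, 20, kernel_size=11),  # 122 -> 112
            nn.ReLU(),
            nn.MaxPool2d(2),                    # 112 -> 56
        )
        self.classifier = nn.Linear(20 * 56 * 56, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

model = LowLightFaceCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)  # learning rate from the text
criterion = nn.CrossEntropyLoss()                         # cross-entropy loss
# A training loop over 10 epochs of the 29,232 denoised 3D training images would
# iterate: optimizer.zero_grad(); loss = criterion(model(batch), labels);
# loss.backward(); optimizer.step()
```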

For classification, only regions of the 3D reconstructed image containing information from all 72 elemental images were considered, to reduce the size of the input image. The image was then denoised using total-variation regularization with an augmented Lagrangian approach and a regularization parameter of 20000 [14]. Afterwards, the Viola-Jones face detector [15] was used to find regions of interest, which were then input to the CNN classifier. This process was repeated over all reconstruction depths z. If the same face appeared in the same region and was detected over multiple depths, the estimated object reconstruction depth was taken to be the one at which the detected face had the highest mean intensity value. The rationale is that this reconstruction depth contains the most object information (i.e., the strongest signal); more specifically, the object region can be modeled as signal plus additive noise, whereas incorrect depths can be considered as noise. This is not the only possible approach, and other approaches may be considered in future work. Note that the Viola-Jones detector did not detect faces for EIs with SNRs below approximately −21.36 dB. A sketch of this detection-and-classification loop is given at the end of this section. Table 1

Table 1. Confusion matrix for face recognition using the CNN trained under low illumination conditions with 3D reconstructed images, using 4 test scenes for each of the 6 subjects.

summarizes the results. The proposed 3D system achieved 100% accuracy. Figure 11
Fig. 11 Overview of training the CNN classifier and acquiring test data.

below depicts an overview of the classification scheme.
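The classification procedure described above can be summarized by the following sketch. It uses OpenCV's Haar-cascade face detector and Chambolle TV denoising as stand-ins for the Viola-Jones detector of [15] and the augmented-Lagrangian TV denoiser of [14]; the function names, parameters, and depth-selection details are our assumptions.

```python
import cv2
import numpy as np
import torch
from skimage.restoration import denoise_tv_chambolle

# Haar-cascade face detector shipped with OpenCV (stand-in for Ref. [15]).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def classify_scene(reconstructions, depths, model):
    """reconstructions: list of 3D InIm depth planes (float arrays in [0, 1]),
    depths: matching reconstruction distances z, model: trained CNN.
    Returns (predicted_class, estimated_depth) for the brightest detected face."""
    best = None   # (mean face intensity, predicted class, depth)
    for img, z in zip(reconstructions, depths):
        # TV denoising (Chambolle's method here, as a stand-in for the
        # augmented-Lagrangian TV denoiser of Ref. [14]).
        den = denoise_tv_chambolle(img, weight=0.1)
        den8 = (np.clip(den, 0, 1) * 255).astype(np.uint8)
        faces = face_cascade.detectMultiScale(den8)
        for (x, y, w, h) in faces:
            roi = cv2.resize(den[y:y + h, x:x + w].astype(np.float32), (256, 256))
            with torch.no_grad():
                logits = model(torch.from_numpy(roi)[None, None])
            score = roi.mean()   # keep the depth with the strongest signal
            if best is None or score > best[0]:
                best = (score, int(logits.argmax()), z)
    return (best[1], best[2]) if best else (None, None)
```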

5. Conclusion

In conclusion, we have presented a 3D InIm system trained in the dark to classify 3D objects captured under low illumination conditions. Regions of interest are obtained by denoising the 3D reconstructed image using total-variation regularization with an augmented Lagrangian approach, followed by face detection. The regions of interest are then input to a pre-trained convolutional neural network (CNN). The CNN was trained using 3D InIm reconstructions obtained under low illumination and TV-denoised; the EIs were acquired under various low illumination conditions with different SNRs. The CNN recognized the TV-denoised 3D reconstructed faces with 100% accuracy, whereas, using a single 2D elemental image, regions of interest could not even be extracted under low illumination conditions. Future work includes more dynamic scenes, utilizing different algorithms to improve image quality and classification in different scene conditions [19], and increasing the data set size to create a more robust classifier.

Funding

Night Vision and Electronic Sensors Directorate, Communications-Electronics Research, Development and Engineering Center, US Army (W909MY-12-D-0008).

References

1. N. Levin and Q. Zhang, “A global analysis of factors controlling VIIRS nighttime light levels from densely populated areas,” Remote Sens. Environ. 190, 366–382 (2017). [CrossRef]

2. B. Phillips, D. Gruber, G. Vasan, C. Roman, V. Pieribone, and J. Sparks, “Observations of in situ deep-sea marine bioluminescence with a high-speed, high-resolution sCMOS camera,” Deep Sea Res. Part I Oceanogr. Res. Pap. 111, 102–109 (2016). [CrossRef]  

3. D. Boening, T. W. Groemer, and J. Klingauf, “Applicability of an EM-CCD for spatially resolved TIR-ICS,” Opt. Express 18(13), 13516–13528 (2010). [CrossRef]   [PubMed]  

4. Z. Petrášek and K. Suhling, “Photon arrival timing with sub-camera exposure time resolution in wide-field time-resolved photon counting imaging,” Opt. Express 18(24), 24888–24901 (2010). [CrossRef]   [PubMed]  

5. G. Lippmann, “Épreuves réversibles donnant la sensation du relief,” J. Phys. Theory Appl. 7(1), 821–825 (1908). [CrossRef]  

6. J. S. Jang and B. Javidi, “Three-dimensional synthetic aperture integral imaging,” Opt. Lett. 27(13), 1144–1146 (2002). [CrossRef]   [PubMed]  

7. A. Llavador, E. Sánchez-Ortiga, G. Saavedra, B. Javidi, and M. Martínez-Corral, “Free-depths reconstruction with synthetic impulse response in integral imaging,” Opt. Express 23(23), 30127–30135 (2015). [CrossRef]   [PubMed]  

8. F. Okano, H. Hoshino, J. Arai, and I. Yuyama, “Real-time pickup method for a three-dimensional image based on integral photography,” Appl. Opt. 36(7), 1598–1603 (1997). [CrossRef]   [PubMed]  

9. H. Hoshino, F. Okano, H. Isono, and I. Yuyama, “Analysis of resolution limitation of integral photography,” J. Opt. Soc. Am. A 15(8), 2059–2065 (1998). [CrossRef]  

10. M. Yamaguchi and R. Higashida, “3D touchable holographic light-field display,” Appl. Opt. 55(3), A178–A183 (2016). [CrossRef]   [PubMed]  

11. B. Tavakoli, B. Javidi, and E. Watson, “Three dimensional visualization by photon counting computational Integral Imaging,” Opt. Express 16(7), 4426–4436 (2008). [CrossRef]   [PubMed]  

12. A. Stern, D. Aloni, and B. Javidi, “Experiments with three-dimensional integral imaging under low light levels,” IEEE Photon. J. 4(4), 1188–1195 (2012). [CrossRef]

13. A. Markman, X. Shen, and B. Javidi, “Three-dimensional object visualization and detection in low light illumination using integral imaging,” Opt. Lett. 42(16), 3068–3071 (2017). [CrossRef]   [PubMed]  

14. S. H. Chan, R. Khoshabeh, K. B. Gibson, P. E. Gill, and T. Q. Nguyen, “An augmented Lagrangian method for total variation video restoration,” IEEE Trans. Image Process. 20(11), 3097–3111 (2011). [CrossRef]   [PubMed]  

15. P. Viola, M. Jones, and D. Snow, “Detecting pedestrians using patterns of motion and appearance,” Int. J. Comput. Vis. 63(2), 153–161 (2005). [CrossRef]  

16. A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” in the Neural Information Processing Systems Conference (2012), pp. 1097–1105.

17. S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, “Face recognition: a convolutional neural-network approach,” IEEE Trans. Neural Netw. 8(1), 98–113 (1997). [CrossRef]   [PubMed]  

18. R. Gonzalez and R. Woods, Digital Image Processing (Pearson, 2008).

19. F. Sadjadi and A. Mahalanobis, “Automatic target recognition,” Proc. SPIE 10648, 106480I (2018).
