
Learning in the dark: 3D integral imaging object recognition in very low illumination conditions using convolutional neural networks

Open Access

Abstract

We propose a framework for three-dimensional (3D) object recognition and classification in very low illumination environments using convolutional neural networks (CNNs). 3D images are reconstructed using 3D integral imaging (InIm) with conventional visible spectrum image sensors. After imaging the low light scene using 3D InIm, the 3D reconstructed image has a higher signal-to-noise ratio than a single 2D image, because 3D InIm is optimal in the maximum likelihood sense for read-noise dominant images. Once 3D reconstruction has been performed, the 3D image is denoised and regions of interest are extracted to detect 3D objects in the scene. The extracted regions are then input to a CNN, trained under low illumination conditions using 3D InIm reconstructed images, to perform object recognition. To the best of our knowledge, this is the first report of utilizing 3D InIm and convolutional neural networks for 3D training and 3D object classification under very low illumination conditions.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Imaging a scene in low illumination conditions using conventional image sensors that operate in the visible spectrum is difficult because the captured images become read-noise dominant. Thus, the signal-to-noise ratio (SNR) suffers, resulting in poor scene visualization and making object recognition difficult. There is broad interest in low-light imaging across fields such as remote sensing [1], underwater imaging [2], and night vision [3,4]. Image sensors designed for low-light imaging include electron-multiplying CCD (EM-CCD) cameras [3,4], scientific CMOS (sCMOS) cameras [2], and night vision cameras. However, both EM-CCD and sCMOS cameras are expensive and bulky. In particular, the EM-CCD must be cooled to around −55 °C prior to operation. Night vision devices operate by amplifying the number of photons in the scene; if too few photons are available, an active near-infrared source is required to illuminate the scene. Infrared cameras are effective in low-light conditions, but they have lower resolution than visible-range cameras and may require bulkier and more expensive optics.

Passive cameras for three-dimensional (3D) imaging using 3D integral imaging (InIm) [5] have been reported [6–10]. In 3D InIm, an array of cameras or a single moving camera captures the scene, with each camera obtaining a unique perspective known as an elemental image (EI). Using the acquired EIs, a 3D image can be reconstructed computationally or optically. Integral imaging has been investigated under low illumination conditions. In [11], a photon-counting model was used to simulate photon-limited images from EIs that captured a 3D scene under sufficient illumination, and computational 3D InIm reconstruction was performed using the photon-limited EIs. It was shown that the 3D InIm reconstruction produces the maximum likelihood estimate of objects that lie on the corresponding 3D reconstructed depth plane; thus, the 3D reconstructed image has higher SNR than a single 2D image. In [12], a 16-bit cooled camera was used to obtain EIs of objects under photon-starved conditions. After 3D reconstruction and total-variation denoising, object visualization was achieved, whereas it was not possible using a single 2D image. In [13], 3D InIm was used to obtain EIs of an outdoor scene containing an object behind occlusion under low illumination conditions. With a single 2D image, face detection was not possible in the experiments; after computational 3D InIm reconstruction, object detection was successful. However, object classification at low light levels was not possible with that approach.

In this paper, we show for the first time that it is possible not only to detect, but also to classify, 3D objects in a scene under very low illumination conditions using 3D InIm. The novelty of the manuscript stems from the unique approach to 3D training of the CNN classifier. We train the CNN using denoised 3D reconstructed images computed from elemental images obtained under various low illumination conditions. By training on 3D data acquired under these illumination conditions, the CNN is able to perform face recognition because it has learned to recognize faces under non-optimal illumination. Thus, the contribution lies in enabling 3D object recognition at low light levels, which may not be possible using conventional 2D approaches.

We use low-cost passive image sensors that operate in the visible spectrum. The EIs are read-noise dominant. 3D InIm is naturally optimal in the maximum likelihood sense for read-noise dominant images because read noise follows a Gaussian distribution. Upon 3D InIm reconstruction, the SNR increases, resulting in improved image visualization. The scene is then denoised with total-variation regularization using an augmented Lagrangian approach (TV-denoising) [14]. Regions of interest are extracted to detect faces in the scene [15] and are then input to a pre-trained convolutional neural network [16,17] for facial recognition. We demonstrate experimentally that the 3D InIm system trained in the dark with a CNN successfully performs face detection and classification at low light levels.

2. Three-dimensional integral imaging in low illumination conditions

3D InIm is a 3D imaging technique that uses a lenslet array, an array of cameras, or a moving camera to capture different perspectives of a scene, known as elemental images. 3D InIm captures both intensity and angular information. Figure 1(a)

Fig. 1 Integral imaging. (a) Pickup and (b) 3D reconstruction stages. c = sensor size, p = pitch, f = focal length, z = distance. (c) Parameter details for (a). Ri = chief ray through the i-th lens; θi = azimuth angle; ϕi = zenith angle.

depicts the integral imaging pickup stage. Once the EIs have been acquired, the scene can be reconstructed, as shown in Fig. 1(b), by back-propagating the captured light rays through a virtual pinhole to a particular depth plane at a distance z. Figure 1(c) depicts the chief ray Ri from the object surface in 3D space (x, y, z) at location (x0, y0, z0), with azimuth angle θi and zenith angle ϕi, being imaged by the i-th lens located at (x1, y1, z1) and arriving at the sensor plane at (τ, ψ). Using the acquired elemental images, 3D InIm reconstruction can be performed optically or computationally. Figure 2
Fig. 2 Synthetic aperture integral imaging (SAII) pick-up and reconstruction stages.

depicts the synthetic aperture integral imaging (SAII) pick-up and reconstruction stage. Computational 3D InIm reconstruction is implemented as follows [6]:
$$I(x,y;z)=\frac{1}{O(x,y)}\sum_{k=0}^{K-1}\sum_{b=0}^{B-1}E_{k,b}\!\left(x-k\,\frac{L_{x}\times p_{x}}{c_{x}\times M},\; y-b\,\frac{L_{y}\times p_{y}}{c_{y}\times M}\right),\tag{1}$$
where (x, y) is the pixel index, z is the reconstruction distance, O(x, y) is the number of overlapping elemental-image pixels at (x, y), K and B are the total numbers of elemental images in each column and row, respectively, $E_{k,b}$ is the elemental image in the kth column and bth row, $L_x$ and $L_y$ are the total numbers of pixels in each column and row of each $E_{k,b}$, $M = z/g$ is the magnification factor, g is the focal length, $p_x$ and $p_y$ are the pitches between image sensors, and $c_x$ and $c_y$ are the image sensor sizes.
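As a concrete illustration of Eq. (1), the following minimal NumPy sketch performs shift-and-average computational reconstruction at a single depth plane. The function name, the integer-pixel rounding of the shifts, and the array layout are our assumptions; the reported implementation is not shown in the text.

```python
import numpy as np

def reconstruct_plane(elemental, z, g, px, py, cx, cy):
    """Computational 3D InIm reconstruction at depth z, following Eq. (1).

    elemental : array of shape (K, B, Ly, Lx) holding the elemental images
                E_{k,b} as grayscale float arrays.
    z, g      : reconstruction distance and focal length (same units).
    px, py    : camera pitch in x and y (same units as cx, cy).
    cx, cy    : physical sensor size in x and y.
    Returns the reconstructed depth plane I(x, y; z).
    """
    K, B, Ly, Lx = elemental.shape
    M = z / g                              # magnification factor, M = z/g
    sx = (Lx * px) / (cx * M)              # per-camera shift in x (pixels)
    sy = (Ly * py) / (cy * M)              # per-camera shift in y (pixels)

    accum = np.zeros((Ly, Lx))
    overlap = np.zeros((Ly, Lx))           # O(x, y): number of overlaps
    for k in range(K):
        for b in range(B):
            dx = int(round(k * sx))        # rounded to whole pixels (assumption)
            dy = int(round(b * sy))
            # Accumulate E_{k,b} shifted by (dx, dy), i.e. I(x) += E_{k,b}(x - dx)
            accum[dy:, dx:] += elemental[k, b, :Ly - dy, :Lx - dx]
            overlap[dy:, dx:] += 1
    return accum / np.maximum(overlap, 1)  # divide by O(x, y)
```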

A captured image can be defined as E(x,y) = I(x,y)r(x,y), where I(x,y) > 0 is the illumination factor and r(x,y) is the reflection coefficient between 0 and 1 [18]. As the scene illumination decreases, the illumination factor diminishes and read noise becomes greater than the scene signal, hindering adequate scene visualization; the image becomes read-noise dominant. Read noise results from on-chip sensor noise, is additive, and can be modeled as a zero-mean Gaussian distribution. Using Eq. (1), the 3D InIm reconstruction with read noise is:

$$I(x,y;z)=\frac{1}{O(x,y)}\sum_{k=0}^{K-1}\sum_{b=0}^{B-1}\left(E_{k,b}(x',y')+\varepsilon_{r_{k,b}}(x',y')\right)=\frac{1}{O(x,y)}\sum_{k=0}^{K-1}\sum_{b=0}^{B-1}E_{k,b}(x',y')+\frac{1}{O(x,y)}\sum_{k=0}^{K-1}\sum_{b=0}^{B-1}\varepsilon_{r_{k,b}}(x',y'),\tag{2}$$
where $\varepsilon_{r_{k,b}}(x',y')$ is zero-mean additive white Gaussian noise (i.e., read noise) for the elemental image in the kth column and bth row at location (x', y'), with $x'=x-k(L_x\times p_x)/(c_x\times M)$ and $y'=y-b(L_y\times p_y)/(c_y\times M)$.

Taking the variance of Eq. (2) and assuming the noise is wide-sense stationary, the variance of the noise component for a fixed z is:

$$\mathrm{var}\!\left(\frac{1}{O(x,y)}\sum_{k=0}^{K-1}\sum_{b=0}^{B-1}\varepsilon_{r_{k,b}}(x',y')\right)=\frac{1}{O(x,y)}\,\sigma^{2},\tag{3}$$
where var(·) denotes variance and σ² is the variance of the read noise.
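Equation (3) can be checked with a short Monte Carlo simulation: averaging O independent zero-mean Gaussian read-noise realizations reduces the noise variance by a factor of O. The noise level and overlap counts below are illustrative values, not the experimental ones.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 20.0            # read-noise standard deviation (illustrative value)
n_pixels = 100_000      # number of pixels simulated per elemental image

for O in (1, 8, 72):    # number of overlapping elemental images
    # Average of O independent zero-mean Gaussian read-noise realizations,
    # i.e. the noise term of Eq. (2) at a pixel with overlap count O.
    noise_avg = rng.normal(0.0, sigma, size=(O, n_pixels)).mean(axis=0)
    print(f"O = {O:3d}: measured var = {noise_avg.var():8.2f}, "
          f"predicted sigma^2 / O = {sigma**2 / O:8.2f}")
```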

As the number of overlapping images increases, the variance (noise power) of the read noise decreases. It has been shown that integral imaging reconstruction is naturally optimal in the maximum-likelihood sense for read-noise-limited images, as the noise distribution is approximately Gaussian [13]. Without photon-counting devices to measure the flux density, the SNR of the image is estimated as [12]:

$$\mathrm{SNR}=\left(\langle g_{o}^{2}\rangle-\langle N^{2}\rangle\right)/\langle N^{2}\rangle,\tag{4}$$
where $\langle g_{o}^{2}\rangle$ is the average power of the object region in the EI and $\langle N^{2}\rangle$ is the average noise power, defined as $\langle N^{2}\rangle=(\Phi_{o}+\Phi_{b})Q_{e}t+Dt+n_{r}^{2}$, where $\Phi_{o}$ and $\Phi_{b}$ are the photon fluxes of the object and background (photons/pixel/second), D is the dark current (electrons/pixel/second), $Q_{e}$ is the quantum efficiency (electrons/photon), t is the exposure time (seconds), $n_{r}$ is the read noise (RMS electrons/pixel), and $\langle\cdot\rangle$ denotes the ensemble mean.

Assuming dark-current noise is negligible and the exposure time is sufficiently short, the number of photons per pixel ($N_{\mathrm{photons}}$) can be estimated as:

$$\Phi_{o}t=N_{\mathrm{photons}}\approx \mathrm{SNR}\times n_{r}/Q_{e},\tag{5}$$
where Nphotons is the estimated number of photons.
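A sketch of how Eqs. (4) and (5) might be evaluated on an elemental image is given below. The object and dark-region masks, the assumption that pixel values are proportional to electrons, and the use of the linear SNR of Eq. (4) in Eq. (5) are our assumptions; the read-noise and quantum-efficiency defaults are the camera values quoted in Section 3.

```python
import numpy as np

def estimate_snr_and_photons(ei, obj_mask, noise_mask, n_r=20.47, Q_e=0.44):
    """Estimate SNR (Eq. (4), in dB) and photons/pixel on the object (Eq. (5)).

    ei         : elemental image with values proportional to electrons (assumed).
    obj_mask   : boolean mask of the object region (e.g. the person's face).
    noise_mask : boolean mask of a completely dark region of the scene.
    n_r        : RMS read noise (electrons), camera value from Section 3.
    Q_e        : quantum efficiency (electrons/photon) at 525 nm.
    """
    go2 = np.mean(ei[obj_mask].astype(float) ** 2)    # <g_o^2>
    N2 = np.mean(ei[noise_mask].astype(float) ** 2)   # <N^2>
    if go2 <= N2:
        return None, None        # SNR undefined when noise power exceeds signal
    snr = (go2 - N2) / N2        # linear SNR, Eq. (4)
    snr_db = 10.0 * np.log10(snr)
    n_photons = snr * n_r / Q_e  # photons/pixel on the object, Eq. (5)
    return snr_db, n_photons
```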

3. Experimental results

A synthetic aperture integral imaging experiment was conducted using Allied Vision Mako-192 cameras with dimensions of 86.4 mm × 44 mm × 29 mm. The sensor is an e2v EV76C570 CMOS sensor. The lens has an F-number of F/1.8 and a focal length of 50 mm; the pixel size is 4.5 μm × 4.5 μm, the sensor size is 7.2 mm (H) × 5.4 mm (V), and the image size is 1600 (H) × 1200 (V) pixels. The camera read noise is 20.47 electrons/pixel and the quantum efficiency at 525 nm is 0.44 electrons/photon. A gain of 0 dB was used in the experiments. The InIm setup consisted of 72 elemental images in a 3 × 24 array with a pitch of 10 mm (H) × 80 mm (V) and an exposure time of 0.015 s.
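As a quick numerical check, the magnification and per-camera pixel shifts appearing in Eq. (1) follow directly from the setup parameters above; this is only a sketch, since any rounding to whole pixels during reconstruction is an implementation detail not specified in the text.

```python
# Shift terms of Eq. (1) for the reported setup (all lengths in mm).
z, g = 4500.0, 50.0        # reconstruction distance and focal length
px, py = 10.0, 80.0        # horizontal / vertical camera pitch
cx, cy = 7.2, 5.4          # sensor size
Lx, Ly = 1600, 1200        # image size in pixels

M = z / g                  # magnification factor: 90x
shift_x = Lx * px / (cx * M)   # ~24.7 pixels per camera horizontally
shift_y = Ly * py / (cy * M)   # ~197.5 pixels per camera vertically
print(M, shift_x, shift_y)
```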

The experimental setup for low illumination conditions consisted of a 3D integral imaging setup with 6 subjects located 4.5 m from the camera array. Experiments were conducted for each subject under different illumination conditions, resulting in different SNR levels. The illumination conditions were altered by adjusting the intensity of the light source. Figure 3(a)

Fig. 3 Sample elemental images with corresponding SNR and photons/pixel. The SNR of the images shown in Figs. 3(d)–3(f) cannot be computed as the average power of the object regions is less than that of the background. N/A = not applicable.

depicts the elemental image [reference image] with an SNR of 10.41 dB (i.e. good illumination) and Fig. 4(a)
Fig. 4 Three-dimensional (3D) reconstructed images at z = 4.5 m corresponding to the elemental images shown in Fig. 3. The SNR and photons/pixel can be computed for low light levels.

shows the 3D reconstructed image at z = 4.5 m with an SNR of 12.39 dB. Prior to 3D reconstruction, the elemental images were registered and aligned to account for the experimental conditions (e.g., unbalanced camera array). Fifty bias frames were taken and averaged for each camera and subtracted from the elemental images. The SNR was computed by taking ⟨g_o²⟩ over the object region (i.e., the person's face) and ⟨N²⟩ over an area of the scene that is completely dark. The elemental images acquired using 3D InIm under low illumination conditions are shown in Figs. 3(b)–3(f), in order of decreasing illumination; they are read-noise dominant. Measuring the number of photons in the scene under these conditions requires sophisticated instruments, which are not easily field portable. In Fig. 3(b), the SNR was −1.20 dB with approximately 40.53 photons/pixel on the object; the person captured is still visible. In Fig. 3(c), the SNR decreases to −9.13 dB with 16.26 photons/pixel. The average power of the object is lower than the noise power for the images shown in Figs. 3(d)–3(f); as a result, the SNR cannot be computed, because ⟨N²⟩ > ⟨g_o²⟩ makes Eq. (4) negative and leaves no real value in dB.

3D reconstructed images at z = 4.5 m are shown in Fig. 4, corresponding to the elemental images shown in Fig. 3. In Figs. 4(b)–4(f), the SNR increases to 8.93 dB, 0.96 dB, −5.28 dB, −9.67 dB, and −12.38 dB, respectively, with corresponding photon counts of 130.02, 51.95, 25.34, 15.27, and 11.18 photons/pixel. Figure 5(a)

Fig. 5 SNR vs. illumination. (a) SNR (in dB) as a function of decreasing illumination for both 3D reconstructed (Recon.) images taken at z = 4.5 m and 2D elemental images. (b) SNR (dB) as photons/pixel on the object increases, for the 3D reconstructed images and elemental images. Illumination levels 1 to 17 correspond to the scene light levels used in the experiments, with 1 corresponding to the highest illumination level. The photons/pixel for each computed SNR is shown in (b).

depicts the SNR [see Eq. (4)] of the elemental images and the corresponding 3D reconstructed images at z = 4.5 m as a function of illumination. Illumination levels 1 to 17 correspond to the scene light levels used in the experiments, with 1 corresponding to the highest illumination level. The SNR of the 3D reconstructed images is higher than that of the 2D EIs. We note that the SNR could not be computed for EIs with SNRs below −21.35 dB, as the noise became greater than the signal. Figure 5(b) shows the SNR (in dB) as a function of the number of photons/pixel. Overall, the 3D reconstructed images have a higher number of photons/pixel than their corresponding 2D EIs.

Additional experiments were carried out to evaluate the advantages of 3D InIm in low-light conditions compared with increasing the exposure time of a single camera and with recording multiple 2D images from a single camera perspective and averaging them. To evaluate image quality, we define the following metric:

$$\mathrm{SNR}_{\mathrm{contrast}}=\mu_{\mathrm{obj}}/\sigma_{\mathrm{noise}},\tag{6}$$
where μ_obj is the mean of the object region and σ_noise is the standard deviation of the background noise, measured over an area of the image containing very low pixel values.
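A minimal sketch of Eq. (6) is shown below; the choice of masks and the 20·log10 dB convention for this amplitude ratio are our assumptions, as the text does not state them.

```python
import numpy as np

def snr_contrast_db(image, obj_mask, noise_mask):
    """SNR_contrast = mu_obj / sigma_noise (Eq. (6)), reported here in dB."""
    mu_obj = image[obj_mask].astype(float).mean()        # mean of the object region
    sigma_noise = image[noise_mask].astype(float).std()  # std of a dark background area
    return 20.0 * np.log10(mu_obj / sigma_noise)         # dB convention assumed
```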

A 3D InIm experiment was conducted using the experimental parameters described above. Figure 6(a)

Fig. 6 The (a) elemental image and (b) 3D reconstructed image at z = 4500 mm. The SNR_contrast values are 31.5 dB and 33.76 dB, respectively.

depicts the reference image while Fig. 6(b) depicts the 3D reconstructed image at z = 4.5 m, with corresponding SNR_contrast values of 31.5 dB and 33.76 dB, respectively. An experiment under low-light conditions was then conducted. First, the scene was captured as a single image at three different exposure times. Figures 7(a)–7(c)
Fig. 7 2D captured images with exposure times of (a) 0.010 s, (b) 0.015 s, and (c) 0.203 s under low-light conditions. The SNR_contrast of the images shown in (a) and (b) cannot be computed as the object region intensity is less than that of the background. The SNR_contrast of (c) is 8.213 dB.

depict the captured images with exposure times of 0.010 s, 0.015 s, and 0.203 s under low-light conditions, respectively. The SNR_contrast of the images shown in Figs. 7(a) and 7(b) cannot be computed as the object region intensity is less than that of the background. The SNR_contrast of Fig. 7(c) is 8.213 dB. Another set of experiments captured 72 images from a single perspective along with a 3D InIm experiment. As shown in Fig. 8(a)
Fig. 8 (a) Average of 72 2D images captured from a single perspective, and (b) the 3D InIm reconstructed image at z = 4.5 m, each using an exposure time of 0.015 s per image; the SNR_contrast is 6.38 dB in (a) and 16.702 dB in (b). (c) Average of 72 2D images captured from a single perspective and (d) the corresponding 3D InIm reconstructed image at z = 4.5 m, each using an exposure time of 0.010 s per image; the SNR_contrast is 2.152 dB in (c) and 15.94 dB in (d).

72 images from a single perspective were taken using an exposure time of 0.015 s and averaged, while Fig. 8(b) shows the 3D reconstructed image at z = 4.5 m; the SNR_contrast values are 6.38 dB and 16.702 dB, respectively. The experiment was then repeated using an exposure time of 0.010 s. Figure 8(c) depicts the average of 72 images obtained from a single perspective while Fig. 8(d) depicts the 3D reconstructed image at z = 4.5 m; the SNR_contrast values are 2.152 dB and 15.94 dB, respectively. Thus, by capturing both intensity and angular information, 3D InIm reconstruction improves image contrast and visualization compared with averaging multiple images captured from a single perspective or increasing the exposure time of a single image. One reason is that 3D InIm segments out the object of interest at the reconstruction depth from the background.

4. Object classification using convolutional neural networks

A convolutional neural network (CNN) [16,17] was then trained on low illumination data for face recognition. An advantage of deep learning over other machine learning algorithms (e.g., support vector machines or random forest classifiers) is that hand-crafted feature extraction is not needed. However, deep learning increases the computational complexity and requires a sufficiently large training set. The training images were 3D reconstructed images of faces after TV-denoising, obtained under different illumination conditions. The customized CNN employed in the experiments used larger filters in the convolutional layers, as these performed well on images obtained under low-light conditions. Training in photon-starved environments improves the classifier's ability to discriminate between subjects in low illumination conditions. To illustrate the need for learning in the dark, normalized correlation was used to demonstrate the difficulty of discriminating faces under low illumination conditions. Figure 9(a)

Fig. 9 Correlation between the 3D reconstructed image at z = 4.5 m after TV-denoising using EIs (a) obtained under an SNR of 10.41 dB, used as a reference, and 3D reconstructed images after TV-denoising whose EIs were obtained under an SNR of −12.75 dB for the (b) true class object and (c) false class object; the correlation values are 0.58 and 0.48, respectively. Classification is difficult.

shows a 3D reconstructed reference image at z = 4.5 m after TV-denoising obtained using EIs with an SNR of 10.41 dB. This image was correlated with 3D reconstructed images after TV-denoising whose EIs were obtained under an SNR of −12.75 dB, shown in Fig. 9(b) (true class object) and Fig. 9(c) (false class object), yielding correlation values of 0.58 and 0.48, respectively. Note that 1 indicates the images are perfectly correlated and 0 indicates no correlation. Thus, it is difficult to discriminate objects under low-light conditions without training the classifier on information about how the objects appear in low light.
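The correlation values quoted above can be reproduced with a zero-mean normalized correlation of the kind sketched below; whether the paper also searches over spatial shifts is not stated, so the zero-lag form is an assumption.

```python
import numpy as np

def normalized_correlation(a, b):
    """Zero-lag, zero-mean normalized correlation between two equally sized
    images: 1 means identical up to gain and offset, 0 means uncorrelated."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```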

A CNN was trained to perform facial recognition using the 3D InIm reconstructed data. A data set was collected consisting of 6 different subjects and 17 different illumination conditions acquired using 3D InIm. The images were then computationally reconstructed over distances of 4 m to 5 m with a step size of 50 mm, where the true object distance was 4.5 m. Figure 10

Fig. 10 Example denoised 3D reconstructed training images acquired at various SNRs, reconstruction depths, additive noise levels, and rotations.

below depicts examples of the training images used. The data set was then split into training and testing sets: 4 randomly chosen illumination conditions, with SNRs of approximately −1.41 dB, −8.322 dB, −8.971 dB, and −12.75 dB, were withheld from training (test set), and the other 13 illumination conditions were used for training. Thus, there were 24 test scenes. The training images were grayscale images of size 256 × 256 pixels and were perturbed by adding Gaussian noise with mean 0 and standard deviations of 0.01, 0.05, and 0.1, rotating by −1, −0.5, 0.5, and 1 degrees, and translating by −5, −3, 3, and 5 pixels in both the x- and y-directions, generating a total of 29,232 images. The data were then denoised using total-variation regularization with an augmented Lagrangian approach and a regularization parameter of 20000 [14]. Figure 10 depicts examples of denoised 3D reconstructed training images acquired at various SNRs, reconstruction depths, additive noise levels, and rotations. The CNN consisted of: a convolution layer [13 × 13, 20 filters], a rectified linear unit (ReLU) layer, 2 × 2 max pooling, a convolution layer [11 × 11, 20 filters], ReLU, 2 × 2 max pooling, a fully connected layer, and a softmax layer [6 outputs]. For training, stochastic gradient descent was used with a learning rate of 0.0001 and a maximum of 10 epochs, with the cross-entropy loss used to evaluate model performance. In total, the model took approximately 4 hours to train on a high-performance computer with a Tesla K40m GPU running CUDA 8.0, implemented using MATLAB.
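The reported network and training settings correspond roughly to the following PyTorch sketch (the original was implemented in MATLAB); padding, strides, and the fully connected layer size are not specified in the text and are assumed here.

```python
import torch
import torch.nn as nn

class LowLightFaceCNN(nn.Module):
    """Two conv/ReLU/max-pool stages followed by a fully connected layer with
    6 outputs, as described in the text; the softmax is folded into the
    cross-entropy loss, as is usual in PyTorch."""
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=13),   # 256 -> 244 (no padding assumed)
            nn.ReLU(),
            nn.MaxPool2d(2),                    # 244 -> 122
            nn.Conv2d(20, 20, kernel_size=11),  # 122 -> 112
            nn.ReLU(),
            nn.MaxPool2d(2),                    # 112 -> 56
        )
        self.classifier = nn.Linear(20 * 56 * 56, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

model = LowLightFaceCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)  # learning rate from the text
criterion = nn.CrossEntropyLoss()                         # cross-entropy loss
# A training loop over 10 epochs of the 29,232 denoised 3D training images would
# iterate: optimizer.zero_grad(); loss = criterion(model(batch), labels);
# loss.backward(); optimizer.step()
```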

For classification, only regions of the 3D reconstructed image containing information from all 72 elemental images were considered, to reduce the size of the input image. The image was then denoised using total-variation regularization with an augmented Lagrangian approach and a regularization parameter of 20000 [14]. Afterwards, the Viola-Jones face detector [15] was used to find regions of interest, which were then input to the CNN classifier. This process was repeated over all reconstruction depths z. If the same face appeared in the same region and was detected over multiple depths, the estimated object reconstruction depth was taken to be the one at which the detected face had the highest mean intensity value. The rationale is that this reconstruction depth contains the most object information (i.e., the strongest signal); more specifically, the object region can be modeled as signal plus additive noise, whereas incorrect depths can be considered as noise. This is not the only possible approach, and other approaches may be considered in future work. Note that the Viola-Jones detector did not detect faces for EIs with SNRs below approximately −21.36 dB. A sketch of this detection-and-classification loop is given at the end of this section. Table 1

Table 1. Confusion matrix for face recognition using the CNN trained under low illumination conditions with 3D reconstructed images, using 4 test scenes for each of the 6 subjects.

summarizes the results. The proposed 3D system achieved 100% accuracy. Figure 11
Fig. 11 Overview of training the CNN classifier and acquiring test data.

below depicts an overview of the classification scheme.
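The classification procedure described above can be summarized by the following sketch. It uses OpenCV's Haar-cascade face detector and Chambolle TV denoising as stand-ins for the Viola-Jones detector of [15] and the augmented-Lagrangian TV denoiser of [14]; the function names, parameters, and depth-selection details are our assumptions.

```python
import cv2
import numpy as np
import torch
from skimage.restoration import denoise_tv_chambolle

# Haar-cascade face detector shipped with OpenCV (stand-in for Ref. [15]).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def classify_scene(reconstructions, depths, model):
    """reconstructions: list of 3D InIm depth planes (float arrays in [0, 1]),
    depths: matching reconstruction distances z, model: trained CNN.
    Returns (predicted_class, estimated_depth) for the brightest detected face."""
    best = None   # (mean face intensity, predicted class, depth)
    for img, z in zip(reconstructions, depths):
        # TV denoising (Chambolle's method here, as a stand-in for the
        # augmented-Lagrangian TV denoiser of Ref. [14]).
        den = denoise_tv_chambolle(img, weight=0.1)
        den8 = (np.clip(den, 0, 1) * 255).astype(np.uint8)
        faces = face_cascade.detectMultiScale(den8)
        for (x, y, w, h) in faces:
            roi = cv2.resize(den[y:y + h, x:x + w].astype(np.float32), (256, 256))
            with torch.no_grad():
                logits = model(torch.from_numpy(roi)[None, None])
            score = roi.mean()   # keep the depth with the strongest signal
            if best is None or score > best[0]:
                best = (score, int(logits.argmax()), z)
    return (best[1], best[2]) if best else (None, None)
```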

5. Conclusion

In conclusion, we have presented a 3D InIm system trained in the dark to classify 3D objects captured under low illumination conditions. Regions of interest are obtained by denoising the 3D reconstructed image using total-variation regularization with an augmented Lagrangian approach, followed by face detection. The regions of interest are then input to a pre-trained convolutional neural network (CNN). The CNN was trained using 3D InIm reconstructions obtained under low illumination and TV-denoised; the EIs were acquired under various low illumination conditions with different SNRs. The CNN recognized the TV-denoised 3D reconstructed faces with 100% accuracy, whereas, using a single 2D elemental image, regions of interest could not even be extracted under low illumination conditions. Future work includes more dynamic scenes, utilizing different algorithms to improve image quality and classification in different scene conditions [19], and increasing the data set size to create a more robust classifier.

Funding

Night Vision and Electronic Sensors Directorate, Communications-Electronics Research, Development and Engineering Center, US Army (W909MY-12-D-0008).

References

1. N. Levin and Q. Zhang, “A global analysis of factors controlling VIIRS nighttime light levels from densely populated areas,” Remote Sens. Environ. 190, 366–382 (2017). [CrossRef]

2. B. Phillips, D. Gruber, G. Vasan, C. Roman, V. Pieribone, and J. Sparks, “Observations of in situ deep-sea marine bioluminescence with a high-speed, high-resolution sCMOS camera,” Deep Sea Res. Part I Oceanogr. Res. Pap. 111, 102–109 (2016). [CrossRef]  

3. D. Boening, T. W. Groemer, and J. Klingauf, “Applicability of an EM-CCD for spatially resolved TIR-ICS,” Opt. Express 18(13), 13516–13528 (2010). [CrossRef]   [PubMed]  

4. Z. Petrášek and K. Suhling, “Photon arrival timing with sub-camera exposure time resolution in wide-field time-resolved photon counting imaging,” Opt. Express 18(24), 24888–24901 (2010). [CrossRef]   [PubMed]  

5. G. Lippmann, “Épreuves réversibles donnant la sensation du relief,” J. Phys. Theory Appl. 7(1), 821–825 (1908). [CrossRef]  

6. J. S. Jang and B. Javidi, “Three-dimensional synthetic aperture integral imaging,” Opt. Lett. 27(13), 1144–1146 (2002). [CrossRef]   [PubMed]  

7. A. Llavador, E. Sánchez-Ortiga, G. Saavedra, B. Javidi, and M. Martínez-Corral, “Free-depths reconstruction with synthetic impulse response in integral imaging,” Opt. Express 23(23), 30127–30135 (2015). [CrossRef]   [PubMed]  

8. F. Okano, H. Hoshino, J. Arai, and I. Yuyama, “Real-time pickup method for a three-dimensional image based on integral photography,” Appl. Opt. 36(7), 1598–1603 (1997). [CrossRef]   [PubMed]  

9. H. Hoshino, F. Okano, H. Isono, and I. Yuyama, “Analysis of resolution limitation of integral photography,” J. Opt. Soc. Am. A 15(8), 2059–2065 (1998). [CrossRef]  

10. M. Yamaguchi and R. Higashida, “3D touchable holographic light-field display,” Appl. Opt. 55(3), A178–A183 (2016). [CrossRef]   [PubMed]  

11. B. Tavakoli, B. Javidi, and E. Watson, “Three dimensional visualization by photon counting computational Integral Imaging,” Opt. Express 16(7), 4426–4436 (2008). [CrossRef]   [PubMed]  

12. A. Stern, D. Aloni, and B. Javidi, “Experiments with three-dimensional integral imaging under low light levels,” IEEE Photon. J. 4(4), 1188–1195 (2012). [CrossRef]

13. A. Markman, X. Shen, and B. Javidi, “Three-dimensional object visualization and detection in low light illumination using integral imaging,” Opt. Lett. 42(16), 3068–3071 (2017). [CrossRef]   [PubMed]  

14. S. H. Chan, R. Khoshabeh, K. B. Gibson, P. E. Gill, and T. Q. Nguyen, “An augmented Lagrangian method for total variation video restoration,” IEEE Trans. Image Process. 20(11), 3097–3111 (2011). [CrossRef]   [PubMed]  

15. P. Viola, M. Jones, and D. Snow, “Detecting pedestrians using patterns of motion and appearance,” Int. J. Comput. Vis. 63(2), 153–161 (2005). [CrossRef]  

16. A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” in the Neural Information Processing Systems Conference (2012), pp. 1097–1105.

17. S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, “Face recognition: a convolutional neural-network approach,” IEEE Trans. Neural Netw. 8(1), 98–113 (1997). [CrossRef]   [PubMed]  

18. R. Gonzalez and R. Woods, Digital Image Processing (Pearson, 2008).

19. F. Sadjadi and A. Mahalanobis, “Automatic target recognition,” Proc. SPIE 10648, 106480I (2018).
