Abstract

We propose a framework for three-dimensional (3D) object recognition and classification in very low illumination environments using convolutional neural networks (CNNs). 3D images are reconstructed using 3D integral imaging (InIm) with conventional visible-spectrum image sensors. After imaging the low-light scene, the 3D InIm reconstructed image has a higher signal-to-noise ratio than a single 2D image because 3D InIm reconstruction is optimal in the maximum-likelihood sense for read-noise-dominant images. Once 3D reconstruction has been performed, the 3D image is denoised and regions of interest are extracted to detect 3D objects in the scene. The extracted regions are then input to a CNN, trained under low illumination conditions on 3D InIm reconstructed images, to perform object recognition. To the best of our knowledge, this is the first report of using 3D InIm and convolutional neural networks for 3D training and 3D object classification under very low illumination conditions.
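
As a high-level illustration of the pipeline described above, the sketch below chains the main steps: integral-imaging reconstruction, denoising, region-of-interest extraction, and CNN classification. It is only a sketch under assumptions, not the authors' implementation: reconstruct_3d is a hypothetical helper (one possible version is given with Eq. (1) below), total-variation denoising from scikit-image stands in for the denoising step, the threshold-based segmentation is illustrative, and cnn_model is assumed to be a trained Keras-style classifier.

import numpy as np
from skimage.restoration import denoise_tv_chambolle  # total-variation denoising
from skimage.measure import label, regionprops        # connected-component regions

def detect_and_classify(elemental_images, z, cnn_model, rel_thresh=0.5):
    """Reconstruct the low-light scene at depth z, denoise it, extract candidate
    object regions, and classify each region with a CNN trained on 3D InIm data."""
    recon = reconstruct_3d(elemental_images, z)        # hypothetical helper; see the Eq. (1) sketch below
    recon = denoise_tv_chambolle(recon, weight=0.1)    # one possible denoiser for residual read noise
    mask = recon > rel_thresh * recon.max()            # illustrative coarse segmentation of bright regions
    predictions = []
    for region in regionprops(label(mask)):            # one candidate region of interest per component
        r0, c0, r1, c1 = region.bbox
        roi = recon[r0:r1, c0:c1]
        predictions.append(cnn_model.predict(roi[np.newaxis, ..., np.newaxis]))
    return predictions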

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

Figures (11)

Fig. 1 Integral imaging. (a) Pickup and (b) 3D reconstruction stages, where c = sensor size, p = pitch, f = focal length, and z = distance. (c) Details of the parameters in (a): Ri = chief ray of the i-th lens; θi = azimuth angle; ϕi = zenith angle.
Fig. 2 Synthetic aperture integral imaging (SAII) pickup and reconstruction stages.
Fig. 3 Sample elemental images with their corresponding SNR and photons/pixel. The SNR of the images shown in Figs. 3(d)–3(f) cannot be computed because the average power of the object regions is less than that of the background. N/A = not applicable.
Fig. 4 Three-dimensional (3D) reconstructed images at z = 4.5 m corresponding to the elemental images shown in Fig. 3. The SNR and photons/pixel can now be computed even at low light levels.
Fig. 5 SNR vs. illumination. (a) SNR (in dB) as a function of decreasing illumination for both 3D reconstructed (Recon.) images at z = 4.5 m and 2D elemental images. (b) SNR (dB) as the photons/pixel on the object increases, for the 3D reconstructed image and the elemental images. Illumination levels 1 to 17 correspond to the scene light levels used in the experiments, with 1 being the highest illumination level. The photons/pixel for each computed SNR is shown in (b).
Fig. 6 (a) Elemental image and (b) 3D reconstructed image at z = 4500 mm. The SNRcontrast values are 31.5 dB and 33.76 dB, respectively.
Fig. 7 2D captured images with exposure times of (a) 0.010 s, (b) 0.015 s, and (c) 0.203 s under low light conditions. The SNRcontrast of the images in (a) and (b) cannot be computed because the object region intensity is less than that of the background. The SNRcontrast of (c) is 8.213 dB.
Fig. 8 (a) Average of 72 elemental 2D images and (b) the 3D InIm reconstructed image at z = 4.5 m, each with an exposure time of 0.015 s per elemental image; the SNRcontrast is 6.38 dB in (a) and 16.702 dB in (b). (c) Average of 72 elemental 2D images and (d) the corresponding 3D InIm reconstructed image at z = 4.5 m, each with an exposure time of 0.010 s per elemental image; the SNRcontrast is 2.152 dB in (c) and 15.94 dB in (d).
Fig. 9 Correlation-based recognition. (a) Reference: TV-denoised 3D reconstructed image at z = 4.5 m whose elemental images (EIs) were captured at an SNR of 10.41 dB. The reference was correlated with TV-denoised 3D reconstructed images whose EIs were captured at an SNR of −12.75 dB for (b) a true-class object and (c) a false-class object, giving correlation values of 0.58 and 0.48, respectively; the small separation makes classification difficult.
Fig. 10 Example denoised 3D reconstructed training images acquired at various SNRs, reconstruction depths, additive noise levels, and rotations.
Fig. 11 Overview of training the CNN classifier and acquiring the test data.
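
Complementing Fig. 11, the snippet below sketches the kind of small CNN that could be trained on denoised 3D reconstructed regions such as those in Fig. 10. The framework (Keras), layer sizes, input resolution, and training settings are illustrative assumptions rather than the architecture reported in the paper; only the number of classes (6 subjects, as in Table 1) is taken from the text.

from tensorflow.keras import layers, models

def build_classifier(num_classes=6, input_shape=(64, 64, 1)):
    """Small illustrative CNN for grayscale regions cropped from denoised 3D reconstructions."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 5, activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Illustrative usage: x_train would hold denoised 3D reconstructed ROIs resized to 64x64,
# and y_train integer class labels (e.g., subject identities).
# model = build_classifier()
# model.fit(x_train, y_train, epochs=20, validation_split=0.1)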

Tables (1)


Table 1 Confusion matrix for face recognition using the CNN trained on 3D reconstructed images under low illumination conditions, with 4 test scenes for each of the 6 subjects.

Equations (6)


$$I(x,y;z)=\frac{1}{O(x,y)}\sum_{k=0}^{K-1}\sum_{b=0}^{B-1}E_{k,b}\left(x-k\,\frac{L_{x}\times p_{x}}{c_{x}\times M},\; y-b\,\frac{L_{y}\times p_{y}}{c_{y}\times M}\right), \tag{1}$$

$$I(x,y;z)=\frac{1}{O(x,y)}\sum_{k=0}^{K-1}\sum_{b=0}^{B-1}\Big(E_{k,b}(x',y')+\varepsilon_{r}^{\,k,b}(x',y')\Big)=\frac{1}{O(x,y)}\sum_{k=0}^{K-1}\sum_{b=0}^{B-1}E_{k,b}(x',y')+\frac{1}{O(x,y)}\sum_{k=0}^{K-1}\sum_{b=0}^{B-1}\varepsilon_{r}^{\,k,b}(x',y'), \tag{2}$$

$$\operatorname{var}\left(\frac{1}{O(x,y)}\sum_{k=0}^{K-1}\sum_{b=0}^{B-1}\varepsilon_{r}^{\,k,b}(x',y')\right)=\frac{\sigma^{2}}{O(x,y)}, \tag{3}$$
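
A minimal numerical sketch of Eq. (1) is given below, assuming a rectangular K × B synthetic aperture of elemental images. It rounds the shift to the nearest integer pixel, uses wrap-around shifts (np.roll) instead of a zero-padded translation, and approximates the overlap map O(x, y) as uniform; it is a simplification, not the authors' reconstruction code.

import numpy as np

def reconstruct_3d(E, z, f, p_x, p_y, c_x, c_y):
    """Shift-and-average reconstruction at depth z (Eq. (1)).
    E: elemental images with shape (K, B, H, W); f: focal length;
    p_x, p_y: camera pitch; c_x, c_y: sensor size."""
    K, B, H, W = E.shape
    M = z / f                                   # magnification at the reconstruction depth
    recon = np.zeros((H, W), dtype=float)
    for k in range(K):
        for b in range(B):
            # Pixel shift of Eq. (1), rounded to the nearest integer pixel (L_x = W, L_y = H).
            dx = int(round(k * W * p_x / (c_x * M)))
            dy = int(round(b * H * p_y / (c_y * M)))
            # np.roll wraps around; a zero-padded translation would be more faithful.
            recon += np.roll(E[k, b], shift=(-dy, -dx), axis=(0, 1))
    return recon / (K * B)                      # uniform approximation of the overlap map O(x, y)
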
$$\mathrm{SNR}=\big(g_{o}^{2}-N^{2}\big)/N^{2}, \tag{4}$$
$$\Phi_{o}\,t=N_{\mathrm{photons}}=\sqrt{\mathrm{SNR}}\times n_{r}/Q_{e}, \tag{5}$$
$$\mathrm{SNR}_{\mathrm{contrast}}=\mu_{\mathrm{obj}}/\sigma_{\mathrm{noise}}. \tag{6}$$
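
The helpers below, offered as a sketch rather than the authors' code, evaluate the metrics of Eqs. (4)-(6) on an image given a boolean object mask; the remaining pixels are treated as background noise. They return linear ratios, whereas the figures quote logarithmic values (e.g., 10 log10 of the power ratio of Eq. (4)). The square root in photons_per_pixel follows Eq. (5) above under the assumption that the SNR of Eq. (4) is a power ratio.

import numpy as np

def snr_linear(img, obj_mask):
    """Eq. (4): object signal power above the noise power, as a linear ratio."""
    img = img.astype(float)
    signal_power = np.mean(img[obj_mask] ** 2)
    noise_power = np.mean(img[~obj_mask] ** 2)
    return (signal_power - noise_power) / noise_power

def photons_per_pixel(snr, n_r, q_e):
    """Eq. (5): photons per pixel on the object for a measured linear SNR,
    read noise n_r (electrons, rms) and quantum efficiency q_e."""
    return np.sqrt(snr) * n_r / q_e

def snr_contrast(img, obj_mask):
    """Eq. (6): mean object intensity over the standard deviation of the background."""
    img = img.astype(float)
    return np.mean(img[obj_mask]) / np.std(img[~obj_mask])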
