Abstract

Single-photon light detection and ranging (LiDAR) techniques use emerging single-photon avalanche diode (SPAD) detectors to push 3D imaging capabilities to unprecedented ranges. However, it remains challenging to robustly estimate scene depth from the noisy and otherwise corrupted measurements recorded by a SPAD. Here, we propose a deep sensor fusion strategy that combines corrupted SPAD data and a conventional 2D image to estimate the depth of a scene. Our primary contribution is a neural network architecture—SPADnet—that uses a monocular depth estimation algorithm together with a SPAD denoising and sensor fusion strategy. This architecture, together with several techniques in network training, achieves state-of-the-art results for RGB-SPAD fusion with simulated and captured data. Moreover, SPADnet is more computationally efficient than previous RGB-SPAD fusion networks.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
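
To make the fusion idea concrete, the sketch below shows a minimal RGB-SPAD fusion network in PyTorch. It is an illustrative toy under stated assumptions, not the published SPADnet architecture: the `ToyRGBSPADFusion` name, the layer sizes, the single-stage fusion by channel concatenation, and the soft-argmax readout are all choices made here for brevity. It only illustrates the general recipe the abstract describes: denoise the per-pixel photon-count histogram volume, combine it with a monocular depth cue derived from the 2D image, and regress a depth value per pixel.

```python
import torch
import torch.nn as nn

class ToyRGBSPADFusion(nn.Module):
    """Toy RGB-SPAD fusion model (illustrative only, not the published SPADnet)."""

    def __init__(self, num_bins=128):
        super().__init__()
        self.num_bins = num_bins
        # 3D convolutions denoise the photon-count histogram volume (B, 1, T, H, W).
        self.spad_denoiser = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(8, 1, kernel_size=3, padding=1),
        )
        # 2D convolutions turn a monocular depth estimate (B, 1, H, W) into features.
        self.mono_encoder = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
        )
        # 1x1 convolution fuses both cues into per-bin logits at every pixel.
        self.fusion = nn.Conv2d(num_bins + 8, num_bins, kernel_size=1)

    def forward(self, spad_hist, mono_depth):
        denoised = self.spad_denoiser(spad_hist).squeeze(1)    # (B, T, H, W)
        mono_feat = self.mono_encoder(mono_depth)              # (B, 8, H, W)
        logits = self.fusion(torch.cat([denoised, mono_feat], dim=1))
        prob = torch.softmax(logits, dim=1)                    # per-pixel bin distribution
        bins = torch.arange(self.num_bins, dtype=prob.dtype).view(1, -1, 1, 1)
        return (prob * bins).sum(dim=1)                        # soft-argmax depth bin index

# Quick shape check with random inputs.
net = ToyRGBSPADFusion(num_bins=128)
spad_hist = torch.rand(1, 1, 128, 32, 32)   # simulated photon-count histograms
mono_depth = torch.rand(1, 1, 32, 32)       # placeholder monocular depth estimate
print(net(spad_hist, mono_depth).shape)     # torch.Size([1, 32, 32])
```

In practice the monocular depth input would come from an off-the-shelf monocular estimator applied to the RGB image, and the per-bin distribution would be converted to metric depth using the histogram bin width.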

References

  1. C. Hazirbas, L. Ma, C. Domokos, and D. Cremers, “Fusenet: Incorporating depth into semantic segmentation via fusion-based cnn architecture,” in Computer Vision – ACCV 2016, S.-H. Lai, V. Lepetit, K. Nishino, and Y. Sato, eds. (Springer International Publishing, Cham, 2017), pp. 213–228.
  2. A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The kitti dataset,” The Int. J. Robotics Res. 32(11), 1231–1237 (2013).
  3. J. P. S. do Monte Lima, F. P. M. Simões, H. Uchiyama, V. Teichrieb, and E. Marchand, “Depth-assisted rectification for real-time object detection and pose estimation,” Mach. Vis. Appl. 27(2), 193–219 (2016).
  4. P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, “Rgb-d mapping: Using kinect-style depth cameras for dense 3d modeling of indoor environments,” The Int. J. Robotics Res. 31(5), 647–663 (2012).
  5. C. Cadena, A. R. Dick, and I. D. Reid, “Multi-modal auto-encoders as joint estimators for robotics scene understanding,” in Robotics: Science and Systems, vol. 5 (2016), p. 1.
  6. H. Fu, M. Gong, C. Wang, K. Batmanghelich, and D. Tao, “Deep ordinal regression network for monocular depth estimation,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2018).
  7. D. Eigen, C. Puhrsch, and R. Fergus, “Depth map prediction from a single image using a multi-scale deep network,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, eds. (Curran Associates, Inc., 2014), pp. 2366–2374.
  8. A. Gordon, H. Li, R. Jonschkowski, and A. Angelova, “Depth from videos in the wild: Unsupervised monocular depth learning from unknown cameras,” CoRR abs/1904.04998 (2019).
  9. D. Scharstein, H. Hirschmüller, Y. Kitajima, G. Krathwohl, N. Nešić, X. Wang, and P. Westling, “High-resolution stereo datasets with subpixel-accurate ground truth,” in Pattern Recognition, X. Jiang, J. Hornegger, and R. Koch, eds. (Springer International Publishing, Cham, 2014), pp. 31–42.
  10. C. Godard, O. Mac Aodha, and G. J. Brostow, “Unsupervised monocular depth estimation with left-right consistency,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017).
  11. A. McCarthy, N. J. Krichel, N. R. Gemmell, X. Ren, M. G. Tanner, S. N. Dorenbos, V. Zwiller, R. H. Hadfield, and G. S. Buller, “Kilometer-range, high resolution depth imaging via 1560 nm wavelength single-photon detection,” Opt. Express 21(7), 8904–8915 (2013).
  12. A. M. Pawlikowska, A. Halimi, R. A. Lamb, and G. S. Buller, “Single-photon three-dimensional imaging at up to 10 kilometers range,” Opt. Express 25(10), 11919–11931 (2017).
  13. Z.-P. Li, X. Huang, Y. Cao, B. Wang, Y.-H. Li, W. Jin, C. Yu, J. Zhang, Q. Zhang, and C.-Z. Peng, “Single-photon computational 3d imaging at 45 km,” arXiv preprint arXiv:1904.10341 (2019).
  14. A. Kirmani, D. Venkatraman, D. Shin, A. Colaço, F. N. Wong, J. H. Shapiro, and V. K. Goyal, “First-photon imaging,” Science 343(6166), 58–61 (2014).
  15. J. Rapp and V. K. Goyal, “A few photons among many: Unmixing signal and noise for photon-efficient active imaging,” IEEE Trans. Comput. Imaging 3(3), 445–459 (2017).
  16. Y. Altmann, R. Aspden, M. Padgett, and S. McLaughlin, “A Bayesian approach to denoising of single-photon binary images,” IEEE Trans. Comput. Imaging 3(3), 460–471 (2017).
  17. F. Heide, S. Diamond, D. B. Lindell, and G. Wetzstein, “Sub-picosecond photon-efficient 3d imaging using single-photon sensors,” Sci. Rep. 8(1), 17726 (2018).
  18. A. Gupta, A. Ingle, A. Velten, and M. Gupta, “Photon-flooded single-photon 3d cameras,” in Proc. CVPR, (2019).
  19. M. O’Toole, F. Heide, D. B. Lindell, K. Zang, S. Diamond, and G. Wetzstein, “Reconstructing transient images from single-photon sensors,” in Proc. CVPR, (2019).
  20. S. Xin, S. Nousias, K. N. Kutulakos, A. C. Sankaranarayanan, S. G. Narasimhan, and I. Gkioulekas, “A theory of fermat paths for non-line-of-sight shape reconstruction,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2019), pp. 6800–6809.
  21. D. Shin, F. Xu, D. Venkatraman, R. Lussana, F. Villa, F. Zappa, V. K. Goyal, F. N. Wong, and J. H. Shapiro, “Photon-efficient imaging with a single-photon camera,” Nat. Commun. 7(1), 12046 (2016).
  22. D. Shin, A. Kirmani, V. K. Goyal, and J. H. Shapiro, “Photon-efficient computational 3-d and reflectivity imaging with single-photon detectors,” IEEE Trans. Comput. Imaging 1(2), 112–125 (2015).
  23. D. B. Lindell, M. O’Toole, and G. Wetzstein, “Single-photon 3d imaging with deep sensor fusion,” ACM Trans. Graph. 37(4), 1–12 (2018).
  24. C. Ti, R. Yang, J. Davis, and Z. Pan, “Simultaneous time-of-flight sensing and photometric stereo with a single tof sensor,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2015).
  25. J. Diebel and S. Thrun, “An application of markov random fields to range sensing,” in Advances in Neural Information Processing Systems, (2006), pp. 291–298.
  26. D. Ferstl, C. Reinbacher, R. Ranftl, M. Rüther, and H. Bischof, “Image guided depth upsampling using anisotropic total generalized variation,” in Proceedings of the IEEE International Conference on Computer Vision, (2013), pp. 993–1000.
  27. J. Kopf, M. F. Cohen, D. Lischinski, and M. Uyttendaele, “Joint bilateral upsampling,” ACM Trans. Graph. 26(3), 96 (2007).
  28. P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, “Rgb-d mapping: Using kinect-style depth cameras for dense 3d modeling of indoor environments,” The Int. J. Robotics Res. 31(5), 647–663 (2012).
  29. J. Qiu, Z. Cui, Y. Zhang, X. Zhang, S. Liu, B. Zeng, and M. Pollefeys, “Deeplidar: Deep surface normal guided depth prediction for outdoor scene from sparse lidar data and single color image,” (2018).
  30. Y. Zhang and T. Funkhouser, “Deep depth completion of a single rgb-d image,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2018).
  31. F. Ma, G. V. Cavalheiro, and S. Karaman, “Self-supervised sparse-to-dense: Self-supervised depth completion from lidar and monocular camera,” (2018).
  32. J. Uhrig, N. Schneider, L. Schneider, U. Franke, T. Brox, and A. Geiger, “Sparsity invariant cnns,” in 2017 International Conference on 3D Vision (3DV), (2017), pp. 11–20.
  33. A. Eldesokey, M. Felsberg, and F. S. Khan, “Confidence propagation through cnns for guided sparse depth regression,” IEEE Trans. Pattern Anal. Mach. Intell. (2019).
  34. M. Jaritz, R. D. Charette, E. Wirbel, X. Perrotton, and F. Nashashibi, “Sparse and dense data with cnns: Depth completion and semantic segmentation,” in 2018 International Conference on 3D Vision (3DV), (2018), pp. 52–60.
  35. T.-W. Hui, C. C. Loy, and X. Tang, “Depth map super-resolution by deep multi-scale guidance,” in Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, eds. (Springer International Publishing, Cham, 2016), pp. 353–369.
  36. I. Eichhardt, D. Chetverikov, and Z. Jankó, “Image-guided tof depth upsampling: a survey,” Mach. Vis. Appl. 28(3-4), 267–282 (2017).
  37. Y. Wen, B. Sheng, P. Li, W. Lin, and D. D. Feng, “Deep color guided coarse-to-fine convolutional network cascade for depth image super-resolution,” IEEE Trans. Image Process. 28(2), 994–1006 (2019).
  38. Y. Li, J.-B. Huang, N. Ahuja, and M.-H. Yang, “Deep joint image filtering,” in Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, eds. (Springer International Publishing, Cham, 2016), pp. 154–169.
  39. I. Alhashim and P. Wonka, “High quality monocular depth estimation via transfer learning,” (2018).
  40. A. Gupta, A. Ingle, A. Velten, and M. Gupta, “Photon-flooded single-photon 3d cameras,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2019).
  41. A. Ingle, A. Velten, and M. Gupta, “High flux passive imaging with single-photon sensors,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2019).
  42. E. Charbon, “Single-photon imaging in complementary metal oxide semiconductor processes,” Phil. Trans. R. Soc. A 372(2012), 20130100 (2014).
  43. A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” (2017).
  44. F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5 MB model size,” arXiv preprint arXiv:1602.07360 (2016).
  45. N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from rgbd images,” in Computer Vision – ECCV 2012, A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, and C. Schmid, eds. (Springer Berlin Heidelberg, Berlin, Heidelberg, 2012), pp. 746–760.
  46. S. Burri, C. Bruschini, and E. Charbon, “Linospad: a compact linear spad camera system with 64 fpga-based tdc modules for versatile 50 ps resolution time-resolved imaging,” Instruments 1(1), 6 (2017).
  47. S. Burri, H. Homulle, C. Bruschini, and E. Charbon, “Linospad: a time-resolved 256x1 cmos spad line sensor system featuring 64 fpga-based tdc channels running at up to 8.5 giga-events per second,” in Optical Sensing and Detection IV, vol. 9899 (International Society for Optics and Photonics, 2016), p. 98990D.
  48. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), pp. 234–241.
  49. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” (2014).
  50. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2018).
  51. K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in European conference on computer vision, (Springer, 2016), pp. 630–645.

[Crossref]

Y. Altmann, R. Aspden, M. Padgett, and S. McLaughlin, “A bayesian approach to denoising of single-photon binary images,” IEEE Trans. Comput. Imaging 3(3), 460–471 (2017).
[Crossref]

D. Shin, A. Kirmani, V. K. Goyal, and J. H. Shapiro, “Photon-efficient computational 3-d and reflectivity imaging with single-photon detectors,” IEEE Trans. Comput. Imaging 1(2), 112–125 (2015).
[Crossref]

IEEE Trans. on Image Process. (1)

Y. Wen, B. Sheng, P. Li, W. Lin, and D. D. Feng, “Deep color guided coarse-to-fine convolutional network cascade for depth image super-resolution,” IEEE Trans. on Image Process. 28(2), 994–1006 (2019).
[Crossref]

IEEE Trans. Pattern Anal. Mach. Intell. (1)

L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2018).
[Crossref]

Instruments (1)

S. Burri, C. Bruschini, and E. Charbon, “Linospad: a compact linear spad camera system with 64 fpga-based tdc modules for versatile 50 ps resolution time-resolved imaging,” Instruments 1(1), 6 (2017).
[Crossref]

Mach. Vis. Appl. (2)

I. Eichhardt, D. Chetverikov, and Z. Jankó, “Image-guided tof depth upsampling: a survey,” Mach. Vis. Appl. 28(3-4), 267–282 (2017).
[Crossref]

J. P. S. do Monte Lima, F. P. M. Simões, H. Uchiyama, V. Teichrieb, and E. Marchand, “Depth-assisted rectification for real-time object detection and pose estimation,” Mach. Vis. Appl. 27(2), 193–219 (2016).
[Crossref]

Nat. Commun. (1)

D. Shin, F. Xu, D. Venkatraman, R. Lussana, F. Villa, F. Zappa, V. K. Goyal, F. N. Wong, and J. H. Shapiro, “Photon-efficient imaging with a single-photon camera,” Nat. Commun. 7(1), 12046 (2016).
[Crossref]

Opt. Express (2)

Phil. Trans. R. Soc. A (1)

E. Charbon, “Single-photon imaging in complementary metal oxide semiconductor processes,” Phil. Trans. R. Soc. A 372(2012), 20130100 (2014).
[Crossref]

Sci. Rep. (1)

F. Heide, S. Diamond, D. B. Lindell, and G. Wetzstein, “Sub-picosecond photon-efficient 3d imaging using single-photon sensors,” Sci. Rep. 8(1), 17726 (2018).
[Crossref]

Science (1)

A. Kirmani, D. Venkatraman, D. Shin, A. Colaço, F. N. Wong, J. H. Shapiro, and V. K. Goyal, “First-photon imaging,” Science 343(6166), 58–61 (2014).
[Crossref]

The Int. J. Robotics Res. (3)

P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, “Rgb-d mapping: Using kinect-style depth cameras for dense 3d modeling of indoor environments,” The Int. J. Robotics Res. 31(5), 647–663 (2012).
[Crossref]

A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The kitti dataset,” The Int. J. Robotics Res. 32(11), 1231–1237 (2013).
[Crossref]

P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, “Rgb-d mapping: Using kinect-style depth cameras for dense 3d modeling of indoor environments,” The Int. J. Robotics Res. 31(5), 647–663 (2012).
[Crossref]

Other (32)

C. Cadena, A. R. Dick, and I. D. Reid, “Multi-modal auto-encoders as joint estimators for robotics scene understanding,” in Robotics: Science and Systems, vol. 5 (2016), p. 1.

H. Fu, M. Gong, C. Wang, K. Batmanghelich, and D. Tao, “Deep ordinal regression network for monocular depth estimation,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2018).

D. Eigen, C. Puhrsch, and R. Fergus, “Depth map prediction from a single image using a multi-scale deep network,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, eds. (Curran Associates, Inc., 2014), pp. 2366–2374.

A. Gordon, H. Li, R. Jonschkowski, and A. Angelova, “Depth from videos in the wild: Unsupervised monocular depth learning from unknown cameras,” CoRR abs/1904.04998 (2019).

D. Scharstein, H. Hirschmüller, Y. Kitajima, G. Krathwohl, N. Nešić, X. Wang, and P. Westling, “High-resolution stereo datasets with subpixel-accurate ground truth,” in Pattern Recognition, X. Jiang, J. Hornegger, and R. Koch, eds. (Springer International Publishing, Cham, 2014), pp. 31–42.

C. Godard, O. Mac Aodha, and G. J. Brostow, “Unsupervised monocular depth estimation with left-right consistency,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017).

K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in European conference on computer vision, (Springer, 2016), 630–645.

C. Hazirbas, L. Ma, C. Domokos, and D. Cremers, “Fusenet: Incorporating depth into semantic segmentation via fusion-based cnn architecture,” in Computer Vision – ACCV 2016, S.-H. Lai, V. Lepetit, K. Nishino, and Y. Sato, eds. (Springer International Publishing, Cham, 2017), pp. 213–228

J. Qiu, Z. Cui, Y. Zhang, X. Zhang, S. Liu, B. Zeng, and M. Pollefeys, “Deeplidar: Deep surface normal guided depth prediction for outdoor scene from sparse lidar data and single color image,” (2018).

Y. Zhang and T. Funkhouser, “Deep depth completion of a single rgb-d image,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2018).

F. Ma, G. V. Cavalheiro, and S. Karaman, “Self-supervised sparse-to-dense: Self-supervised depth completion from lidar and monocular camera,” (2018).

J. Uhrig, N. Schneider, L. Schneider, U. Franke, T. Brox, and A. Geiger, “Sparsity invariant cnns,” in 2017 International Conference on 3D Vision (3DV), (2017), pp. 11–20.

A. Eldesokey, M. Felsberg, and F. S. Khan, “Confidence propagation through cnns for guided sparse depth regression,” IEEE Trans. Pattern Anal. Mach. Intell.1 (2019).

M. Jaritz, R. D. Charette, E. Wirbel, X. Perrotton, and F. Nashashibi, “Sparse and dense data with cnns: Depth completion and semantic segmentation,” in 2018 International Conference on 3D Vision (3DV), (2018), pp. 52–60.

T.-W. Hui, C. C. Loy, and X. Tang, “Depth map super-resolution by deep multi-scale guidance,” in Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, eds. (Springer International Publishing, Cham, 2016), pp. 353–369.

Z.-P. Li, X. Huang, Y. Cao, B. Wang, Y.-H. Li, W. Jin, C. Yu, J. Zhang, Q. Zhang, and C.-Z. Peng, “Single-photon computational 3d imaging at 45 km,” arXiv preprint arXiv:1904.10341 (2019).

A. Gupta, A. Ingle, A. Velten, and M. Gupta, “Photon-flooded single-photon 3d cameras,” in Proc. CVPR, (2019).

M. O’Toole, F. Heide, D. B. Lindell, K. Zang, S. Diamond, and G. Wetzstein, “Reconstructing transient images from single-photon sensors,” in Proc. CVPR, (2019).

S. Xin, S. Nousias, K. N. Kutulakos, A. C. Sankaranarayanan, S. G. Narasimhan, and I. Gkioulekas, “A theory of fermat paths for non-line-of-sight shape reconstruction,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2019), pp. 6800–6809.

A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” (2017).

F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “Squeezenet: Alexnet-level accuracy with 50x fewer parameters and< 0.5 mb model size,” arXiv preprint arXiv:1602.07360 (2016).

N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from rgbd images,” in Computer Vision – ECCV 2012, A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, and C. Schmid, eds. (Springer Berlin Heidelberg, Berlin, Heidelberg, 2012) pp. 746–760.

S. Burri, H. Homulle, C. Bruschini, and E. Charbon, “Linospad: a time-resolved 256x1 cmos spad line sensor system featuring 64 fpga-based tdc channels running at up to 8.5 giga-events per second,” in Optical Sensing and Detection IV, vol. 9899 (International Society for Optics and Photonics, 2016), p. 98990D.

O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), 234–241.

D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” (2014).

Y. Li, J.-B. Huang, N. Ahuja, and M.-H. Yang, “Deep joint image filtering,” in Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, eds. (Springer International Publishing, Cham, 2016), pp. 154–169.

I. Alhashim and P. Wonka, “High quality monocular depth estimation via transfer learning,” (2018).

A. Gupta, A. Ingle, A. Velten, and M. Gupta, “Photon-flooded single-photon 3d cameras,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2019).

A. Ingle, A. Velten, and M. Gupta, “High flux passive imaging with single-photon sensors,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2019).

C. Ti, R. Yang, J. Davis, and Z. Pan, “Simultaneous time-of-flight sensing and photometric stereo with a single tof sensor,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR),, (2015).

J. Diebel and S. Thrun, “An application of markov random fields to range sensing,” in Advances in neural information processing systems, (2006), pp. 291–298.

D. Ferstl, C. Reinbacher, R. Ranftl, M. Rüther, and H. Bischof, “Image guided depth upsampling using anisotropic total generalized variation,” in Proceedings of the IEEE International Conference on Computer Vision, (2013), pp. 993–1000.

Supplementary Material (1)

Visualization 1: Visualization of the 3D reconstruction algorithm.



Figures (8)

Fig. 1. (a) A SPAD array captures a datacube with time-resolved photon counts, whereas a conventional intensity camera records the time-integrated photon flux of a scene. (b) Monocular depth estimators allow the depth of the scene to be recovered directly from the 2D image. While the ordinal (i.e., relative) depth information of such an estimate is often good, there is a scale ambiguity that results in a large error (inset). (c) RGB-SPAD fusion approaches use neural networks to fuse the SPAD data with the 2D image to improve depth estimation. (d) We introduce SPADnet, a neural network architecture that achieves state-of-the-art results for RGB-SPAD sensor fusion.
Fig. 2. SPADnet uses a monocular depth estimator to convert the 2D image into a rough depth map and then conducts 2D-3D up-projection to fuse it with 3D features extracted from the SPAD measurement (a schematic code sketch of this fusion step follows the figure captions).
Fig. 3. Qualitative results comparing various approaches with the SBR fixed at 0.04 (2 signal photons vs. 50 background photons). SPADnet achieves the lowest RMSE. White boxes mark regions with extremely weak signal return; SPADnet significantly improves the predictions in these regions.
Fig. 4. Evaluation of different algorithms on captured data. The “stuff” scene has a higher SBR, and all methods perform comparably well under this condition. In the “kitchen” and “hallway” scenes, the SBR is much lower, and these scenes also contain regions with low reflectivity or large depth. SPADnet is significantly more robust than the other methods in these cases. Video and 3D visualizations of the comparison are shown in Visualization 1 (Supplementary Material).
Fig. 5. Failure case for monocular depth estimation. In this case, SPADnet cannot effectively use the information provided by the estimated depth map and performs about as well as previous neural network approaches. Both Lindell et al.’s method and SPADnet perform better than Rapp and Goyal’s method in this case.
Fig. 6. Depth prediction error under different noise levels. The denoising task is sensitive to SBR; the proposed model is able to operate under extremely low SBR conditions without sacrificing much accuracy.
Fig. 7. Influence of different patch sizes on computational cost. Log-scale rebinning significantly reduces the cost of model inference (a dashed line indicates that memory consumption exceeds the computational capacity available to us).
Fig. 8. Additional qualitative comparisons on the simulated dataset with a signal-to-background ratio (SBR) of 0.04. SPADnet achieves state-of-the-art performance.
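
The fusion step summarized in the Fig. 2 caption lifts the monocular depth estimate into the time-bin dimension and combines it with 3D features extracted from the SPAD histogram. The PyTorch-style sketch below is a schematic illustration of that idea; the module names, layer sizes, and the one-hot lifting are assumptions made for exposition, not the authors' actual SPADnet architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionSketch(nn.Module):
    """Schematic RGB-SPAD fusion: lift the monocular depth map into a one-hot
    volume along the time-bin axis, concatenate it with 3D SPAD features, and
    predict a denoised, normalized histogram per pixel."""

    def __init__(self, K=128, feat=8):
        super().__init__()
        self.K = K
        self.spad_enc = nn.Conv3d(1, feat, kernel_size=3, padding=1)
        self.fuse = nn.Conv3d(feat + 1, feat, kernel_size=3, padding=1)
        self.head = nn.Conv3d(feat, 1, kernel_size=3, padding=1)

    def forward(self, spad, mono_bin):
        # spad:     (B, 1, K, H, W) photon-count histogram volume
        # mono_bin: (B, H, W) integer time-bin index from the monocular depth map
        depth_vol = F.one_hot(mono_bin, self.K)                          # (B, H, W, K)
        depth_vol = depth_vol.permute(0, 3, 1, 2).unsqueeze(1).float()   # (B, 1, K, H, W)
        feats = F.relu(self.spad_enc(spad))
        fused = F.relu(self.fuse(torch.cat([feats, depth_vol], dim=1)))
        h_hat = torch.softmax(self.head(fused).squeeze(1), dim=1)        # (B, K, H, W)
        bins = torch.arange(self.K, device=h_hat.device).view(1, self.K, 1, 1)
        n_hat = (bins * h_hat).sum(dim=1)                                # soft-argmax bin index
        return h_hat, n_hat

# Example with random inputs: K = 128 time bins, a 32 x 32 patch.
model = FusionSketch(K=128)
spad = torch.rand(1, 1, 128, 32, 32)
mono_bin = torch.randint(0, 128, (1, 32, 32))
h_hat, n_hat = model(spad, mono_bin)          # (1, 128, 32, 32), (1, 32, 32)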

Tables (3)

Table 1. Results of different methods and linear/log-scale discretizations (an up-arrow indicates that higher is better; a down-arrow indicates that lower is better). SPADnet with a log-scale discretization reduces the RMSE by 80%. The best result is shown in bold and the second best is underlined. Note that we use different patch sizes for the linear and log-scale models; each is the largest patch size within our devices' computational capacity. See the supplementary material for a detailed discussion of the influence of patch size on model performance.

Table 2. Ablation experiments on different loss functions and monocular depth estimators. As shown in the left panel, the model trained with the ordinal regression loss gives the lowest error. As shown in the right panel, SPADnet's denoising results are robust to inaccurate monocular depth estimates.

Table 3. Influence of different patch sizes on SPADnet performance. Although a smaller patch size degrades network performance, SPADnet trained on 64 × 64 patches still performs better than the linear-scale model with repetition fusion.

Equations (7)


$$ s[n] = \int_{n\Delta t}^{(n+1)\Delta t} (g \ast f)\left(t - 2d/c\right)\, dt, $$
$$ h[n] \sim \mathcal{P}\!\left( N\left( \eta \gamma\, s[n] + \eta a + d_c \right) \right), $$
$$ \hat{n}_{ij} = \sum_n n\, \hat{h}_{ij}[n], \qquad \hat{d}_{ij} = \frac{\hat{n}_{ij}}{2}\, c\, \Delta t, $$
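
To make the measurement model and the soft-argmax depth estimate above concrete, the following minimal NumPy sketch simulates a single-pixel SPAD histogram and recovers depth from it. All parameter values (bin width, number of pulses, efficiency, reflectivity, ambient level, dark counts) are illustrative assumptions, not values taken from the paper.

import numpy as np

# Illustrative constants (assumed, not from the paper).
c = 3e8                # speed of light [m/s]
dt = 80e-12            # time-bin width, Delta t [s]
K = 1024               # number of time bins
N = 1000               # number of laser pulses
eta, gamma = 0.3, 0.5  # detection efficiency, scene reflectivity
a, d_c = 3e-3, 1e-4    # ambient photon rate and dark counts per bin

def ideal_signal(depth, sigma=100e-12):
    """Per-bin integral of the system response (g*f), here a Gaussian,
    shifted by the round-trip time 2d/c and normalized to sum to 1
    (first equation above)."""
    t = (np.arange(K) + 0.5) * dt
    s = np.exp(-0.5 * ((t - 2.0 * depth / c) / sigma) ** 2)
    return s / s.sum()

def simulate_histogram(depth, rng=np.random.default_rng(0)):
    """h[n] ~ Poisson(N * (eta*gamma*s[n] + eta*a + d_c)) (second equation above)."""
    rate = N * (eta * gamma * ideal_signal(depth) + eta * a + d_c)
    return rng.poisson(rate)

def soft_argmax_depth(h_hat):
    """Expected bin index of a normalized histogram, converted to depth
    (third equation above)."""
    n_hat = np.sum(np.arange(K) * h_hat)
    return 0.5 * n_hat * c * dt

h = simulate_histogram(depth=3.0)
print(soft_argmax_depth(h / h.sum()))        # biased toward mid-range by background counts
print(soft_argmax_depth(ideal_signal(3.0)))  # ~3.0 m from a clean (denoised) histogram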
$$ \mathcal{L}_{OR}(h, \hat{h}) = -\frac{1}{M_1 \times M_2} \sum_{ij} \left( \sum_{n=1}^{l} \log\!\left(1 - P_{ij}[n]\right) + \sum_{n=l+1}^{K} \log\!\left(P_{ij}[n]\right) \right), \qquad P_{ij}[n] = \operatorname{cumsum}\!\left(\hat{h}_{ij}[n]\right), $$
$$ \mathcal{L}_{total} = \mathcal{L}_{OR}(h, \hat{h}) + \lambda_{TV}\, \big\|\hat{d}\big\|_{TV}. $$
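
A hedged NumPy sketch of the ordinal regression loss and the total loss above follows. The exact index convention for the ground-truth bin l, the numerical-stability epsilon, and the TV weight are assumptions made for illustration.

import numpy as np

def ordinal_regression_loss(true_bins, h_hat, eps=1e-8):
    """Ordinal regression on the cumulative distribution P_ij = cumsum(h_hat_ij).
    true_bins: (M1, M2) ground-truth bin index l per pixel (0-indexed, assumed).
    h_hat:     (M1, M2, K) predicted, normalized histograms."""
    M1, M2, K = h_hat.shape
    P = np.cumsum(h_hat, axis=-1)
    n = np.arange(K)[None, None, :]
    l = true_bins[..., None]
    before = n < l                              # bins before the true bin
    per_bin = np.where(before,
                       np.log(1.0 - P + eps),   # CDF should stay near 0 here
                       np.log(P + eps))         # CDF should be near 1 here
    return -per_bin.sum(axis=-1).mean()         # average over the M1 x M2 pixels

def total_variation(d_hat):
    """Anisotropic total-variation norm ||d_hat||_TV of the depth map."""
    return np.abs(np.diff(d_hat, axis=0)).sum() + np.abs(np.diff(d_hat, axis=1)).sum()

def total_loss(true_bins, h_hat, d_hat, lam_tv=1e-5):
    """L_total = L_OR + lambda_TV * ||d_hat||_TV (lambda_TV value is illustrative)."""
    return ordinal_regression_loss(true_bins, h_hat) + lam_tv * total_variation(d_hat)

# Tiny example: a 2 x 2 "image" with K = 8 bins.
rng = np.random.default_rng(0)
h_hat = rng.random((2, 2, 8)); h_hat /= h_hat.sum(-1, keepdims=True)
print(total_loss(np.array([[2, 5], [1, 7]]), h_hat, d_hat=rng.random((2, 2))))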
$$ n_1(k) = B \times \frac{q^{k} - 1}{q - 1}, \qquad n_2(k) = B \times \frac{q^{k+1} - 1}{q - 1}, $$
$$ \hat{n}_{ij}^{\,log} = \sum_k \frac{n_1(k) + n_2(k)}{2}\, \hat{h}_{ij}^{\,log}[k]. $$
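
The last two equations describe log-scale rebinning: linear time bins are accumulated into bins whose widths grow geometrically (first-bin width B, ratio q), and the expected linear-bin index is recovered from the bin centers. The sketch below illustrates this with assumed values of B, q, and the number of log-scale bins.

import numpy as np

def log_scale_bin_edges(k_log, B, q):
    """Linear-bin boundaries n1(k), n2(k) of the k-th log-scale bin."""
    k = np.arange(k_log)
    n1 = B * (q ** k - 1) / (q - 1)
    n2 = B * (q ** (k + 1) - 1) / (q - 1)
    return n1, n2

def rebin_log(h, k_log=128, B=2.0, q=1.03):
    """Accumulate a linear-scale histogram h[n] into k_log log-scale bins."""
    n1, n2 = log_scale_bin_edges(k_log, B, q)
    edges = np.concatenate([n1, n2[-1:]])          # k_log + 1 monotone boundaries
    idx = np.searchsorted(edges, np.arange(len(h)), side="right") - 1
    idx = np.clip(idx, 0, k_log - 1)
    return np.bincount(idx, weights=h, minlength=k_log)

def soft_argmax_log(h_hat_log, B=2.0, q=1.03):
    """Expected linear-bin index from a normalized log-scale histogram,
    using the bin centers (n1 + n2) / 2 (last equation above)."""
    n1, n2 = log_scale_bin_edges(len(h_hat_log), B, q)
    return np.sum(0.5 * (n1 + n2) * h_hat_log)

# Example: compress a 1024-bin linear histogram into 128 log-scale bins.
h_log = rebin_log(np.ones(1024))
print(soft_argmax_log(h_log / h_log.sum()))        # expected linear-scale bin index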
