Abstract

Three-dimensional (3D) light-field display, as a potential future display method, has attracted considerable attention. However, certain issues remain to be addressed, especially the capture of dense views of real 3D scenes. Using sparse cameras together with a view synthesis algorithm has become a practical approach. Supervised convolutional neural networks (CNNs) have been used to synthesize virtual views. However, the large number of target views required for training is often difficult to obtain, and the training positions are relatively fixed. Novel views can also be synthesized by the unsupervised network MPVN, but that method requires multiple uniformly spaced horizontal viewpoints, which is impractical to capture. Here, a method of dense-view synthesis based on unsupervised learning is presented, which can synthesize arbitrary virtual views from multiple free-posed views captured in a real 3D scene. Multiple posed views are reprojected to the target position and input into the neural network. The network outputs a color tower and a selection tower indicating the scene distribution along the depth direction. A single image is produced by the weighted summation of the two towers. The proposed network is trained end-to-end with unsupervised learning by minimizing the errors in reconstructing the posed views. A virtual view can be predicted in high quality by reprojecting the posed views to the desired position, and a sequence of dense virtual views can be generated for 3D light-field display by repeated predictions. Experimental results demonstrate the validity of the proposed network: the PSNR of the synthesized views is around 30 dB, and the SSIM is over 0.90. Since the cameras can be placed at free poses, there are no strict physical requirements, and the proposed method can be flexibly used for real-scene capture. We believe this approach will contribute to wide applications of 3D light-field display in the future.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement


References


  1. X. Sang, X. Gao, X. Yu, S. Xing, Y. Li, and Y. Wu, “Interactive floating full-parallax digital three-dimensional light-field display based on wavefront recomposing,” Opt. Express 26(7), 8883–8889 (2018).
  2. X. Yu, X. Sang, X. Gao, Z. Chen, D. Chen, W. Duan, B. Yan, C. Yu, and D. Xu, “Large viewing angle three-dimensional display with smooth motion parallax and accurate depth cues,” Opt. Express 23(20), 25950–25958 (2015).
  3. R. Ng, M. Levoy, and M. Brédif, “Light field photography with a hand-held plenoptic camera,” Stanford Tech. Report 2(11), 1–11 (2005).
  4. B. Wilburn, N. Joshi, V. Vaish, E.-V. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy, “High performance imaging using large camera arrays,” ACM T. Graphic 24(3), 765–776 (2005).
  5. H. Deng, Q.-H. Wang, and D. Li, “Method of generating orthoscopic elemental image array from sparse camera array,” Chin. Opt. Lett. 10(6), 31–33 (2012).
  6. K. Oh, S. Yea, A. Vetro, and Y.-S. Ho, “Virtual view synthesis method and self-evaluation metrics for free viewpoint television and 3D video,” Int. J. Imaging Syst. Technol. 20(4), 378–390 (2010).
  7. S. Chan, H. Shum, and K. Ng, “Image-based rendering and synthesis,” IEEE Signal Process. Mag. 24(6), 22–33 (2007).
  8. J. Xiao and M. Shah, “Tri-view morphing,” Comput. Vis. Image Underst. 96(3), 345–366 (2004).
  9. S. Chan, Z. Gan, and H. Shum, “An object-based approach to plenoptic video processing,” in Proceedings of IEEE International Symposium on Circuits and Systems (IEEE, 2007), pp. 985–988.
  10. D. Ji, J. Kwon, and M. Mcfarland, “Deep view morphing,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 7092–7100.
  11. T. Zhou, S. Tulsiani, and W. Sun, “View synthesis by appearance flow,” in Proceedings of European Conference on Computer Vision (Springer, 2016), pp. 286–301.
  12. N. Kalantari, T. Wang, and R. Ramamoorthi, “Learning-based view synthesis for light field cameras,” ACM T. Graphic 35(6), 193 (2016).
  13. J. Flynn, I. Neulander, and J. Philbin, “Deep stereo: learning to predict new views from the world’s imagery,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2016), pp. 5515–5524.
  14. G. Wu, M. Zhao, and L. Wang, “Light field reconstruction using deep convolutional network on EPI,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 6319–6327.
  15. D. Chen, X. Sang, W. Peng, X. Yu, and H. C. Wang, “Multi-parallax views synthesis for three-dimensional light-field display using unsupervised CNN,” Opt. Express 26(21), 27585–27598 (2018).
  16. R. Szeliski, Computer Vision: Algorithms and Applications (Springer, 2011).
  17. P. Huang, K. Matzen, and J. Kopf, “DeepMVS: Learning multi-view stereopsis,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2018), pp. 2821–2830.
  18. Y. Yao, Z. Luo, and S. Li, “MVSNet: Depth inference for unstructured multi-view stereo,” in Proceedings of European Conference on Computer Vision (Springer, 2018), pp. 767–783.
  19. R. Garg, V. BG, and G. Carneiro, “Unsupervised CNN for single view depth estimation: Geometry to the rescue,” in Proceedings of European Conference on Computer Vision (Springer, 2016), pp. 740–756.
  20. C. Godard, O. Aodha, and G. Brostow, “Unsupervised monocular depth estimation with left-right consistency,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 6602–6611.
  21. G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science 313(5786), 504–507 (2006).
  22. V. Vaish, M. Levoy, and R. Szeliski, “Reconstructing occluded surfaces using synthetic apertures: stereo, focus and robust measures,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2006), pp. 2331–2338.
  23. Stanford Computer Graphics Lab, “The (New) Stanford Light Field Archive,” http://lightfield.stanford.edu/index.html.

Supplementary Material (1)

Visualization 1: Results of synthesized virtual views, and experiments on the 3D light-field display.


Figures (14)

Fig. 1 The schematic comparison of MPVN and the proposed network. MPVN can only utilize an image array with uniform horizontal parallax to synthesize a novel view by sweeping planes. The proposed method is able to handle free-posed views by reprojection after taking camera postures into consideration.
Fig. 2 The overall approach of the proposed method. (a) Multiple views are captured by a sparse camera array. (b) Camera postures are estimated by multi-view calibration. (c) Posed views are reprojected to a target position at different depths. (d) The warped views are input into the network, which is trained by unsupervised learning. (e) By changing the reprojection pose, a virtual view can be predicted. (f) A sequence of dense virtual views can be obtained by repeated predictions.
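A minimal Python sketch of the prediction stage in Fig. 2(e)–(f) follows, assuming placeholder callables `model` (the trained CNN) and `reproject_fn` (the reprojection of step (c)); only the control flow of repeated predictions is illustrated, not the paper's implementation.

```python
from typing import Any, Callable, List, Sequence


def synthesize_dense_sequence(model: Callable, reproject_fn: Callable,
                              views: Sequence[Any], poses: Sequence[Any],
                              path_poses: Sequence[Any],
                              depth_planes: Sequence[float]) -> List[Any]:
    """Repeat reprojection + prediction for each target pose (Fig. 2(e)-(f))."""
    sequence = []
    for target_pose in path_poses:          # e.g., poses sampled along a horizontal path
        # warp every posed view onto the depth planes of the target pose (Fig. 2(c))
        warped = reproject_fn(views, poses, target_pose, depth_planes)
        # the trained network predicts one virtual view per target pose (Fig. 2(e))
        sequence.append(model(warped))
    return sequence                          # dense view sequence for the light-field display
```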
Fig. 3 The comparison of posture rectification for refocus results. The power adapter (a) and the ceiling line (b) become much clearer after posture rectification.
Fig. 4 The homography transformation of view reprojection. 2D points $p_i(u_i, v_i, 1)$ are back-projected onto different depth planes as 3D points $P_s(x_s, y_s, z_m)$. These 3D points are projected onto the image plane of the target camera as 2D points $p_t(u_t, v_t, 1)$. The homography matrix $H_m$ can be computed from the transformation between $p_i(u_i, v_i, 1)$ and $p_t(u_t, v_t, 1)$.
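To make the geometry of Fig. 4 concrete, the following numpy/OpenCV sketch (an illustration, not code from the paper) back-projects the four image corners of a posed view onto the plane $z = z_m$, projects them into the target camera, and fits $H_m$ from the four correspondences; the variable names and the world-to-camera convention $x_\text{cam} = R x_\text{world} + T$ are assumptions.

```python
import numpy as np
import cv2


def plane_homography(K_i, R_i, T_i, K_t, R_t, T_t, z_m, w, h):
    """Homography induced by the depth plane z = z_m between a posed view and the target view."""
    corners = np.array([[0, 0, 1], [w, 0, 1], [0, h, 1], [w, h, 1]], dtype=np.float64)
    C_i = -R_i.T @ T_i                              # camera center of the posed view
    src, dst = [], []
    for p in corners:
        ray = R_i.T @ (np.linalg.inv(K_i) @ p)      # back-projected ray direction in world coordinates
        s = (z_m - C_i[2]) / ray[2]                 # intersect the ray with the plane z = z_m
        P = C_i + s * ray                           # 3D point on the depth plane
        p_t = K_t @ (R_t @ P + T_t)                 # project into the target camera
        src.append(p[:2])
        dst.append(p_t[:2] / p_t[2])                # normalize homogeneous pixel coordinates
    # fit the 3x3 homography from the four corner correspondences
    return cv2.getPerspectiveTransform(np.float32(src), np.float32(dst))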
Fig. 5 The schematic diagram of the proposed unsupervised learning algorithm. In the training stage, $v_1, \dots, v_N$ are reprojected to one of them, which is regarded as the target view $v_t$. The CNN is trained with unsupervised learning by minimizing the error between the output view $v_c$ and the target view $v_t$. In the prediction stage, $v_1', \dots, v_N'$ are reprojected to the desired position, and the CNN outputs the virtual view $v_x'$ with acceptable quality.
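A PyTorch-style sketch of one training step under the scheme of Fig. 5, assuming placeholder names (`model`, `reproject_fn`, `views`, `poses`): one captured posed view is chosen as the target, the others are warped toward it, and the L1 error against that captured view is the only supervision, so no ground-truth novel views are required.

```python
import torch
import torch.nn.functional as F


def unsupervised_train_step(model, optimizer, reproject_fn, views, poses, depth_planes):
    """One optimization step that reconstructs a captured posed view (no ground truth needed)."""
    t = torch.randint(len(views), (1,)).item()      # pick one posed view as the target v_t
    warped = [reproject_fn(v, p, poses[t], depth_planes) for v, p in zip(views, poses)]
    v_c = model(torch.cat(warped, dim=1))           # network output v_c at the target pose
    loss = F.l1_loss(v_c, views[t])                 # photometric error against the real view v_t
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```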
Fig. 6 The schematic diagram of the view synthesis procedure. Each plane of the view towers is concatenated and input into the network. The 2D color network outputs a multi-scale color result. The 2D + 3D selection network outputs a multi-scale selection tower. Each plane of the two towers is shown in the figure.
Fig. 7 The final multi-scale view is the weighted summation of the multi-scale color tower and the multi-scale selection tower.
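As a concrete reading of Figs. 6–7, the numpy sketch below composes one view as a per-pixel weighted sum of the color tower over the depth planes; the array names are illustrative, and normalizing the raw selection scores with a softmax is an assumption about how the weights are made to sum to one (cf. the Equations section).

```python
import numpy as np


def compose_view(color_tower: np.ndarray, selection_tower: np.ndarray) -> np.ndarray:
    """color_tower: (M, H, W, 3); selection_tower: (M, H, W) raw scores per depth plane."""
    # softmax over the depth dimension so the selection weights sum to 1 per pixel
    s = np.exp(selection_tower - selection_tower.max(axis=0, keepdims=True))
    s = s / s.sum(axis=0, keepdims=True)
    # per-pixel weighted sum of the color planes along the depth direction
    return (color_tower * s[..., None]).sum(axis=0)
```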
Fig. 8 The architecture of the proposed 2D network. The structures of the 2D color network and the 2D selection network are the same except for their outputs. The 2D color network outputs a 3-channel RGB image. The 2D selection network outputs a 16-channel selection feature, which is stacked into the selection feature tower.
Fig. 9 The architecture of the 3D selection network. The multi-scale selection feature tower is input into the 3D selection network, which refines the selection towers by improving the correlation of features between different planes. The 3D selection network outputs a multi-scale selection tower.
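A heavily simplified PyTorch sketch of the idea behind Figs. 8–9: a 2D head produces a 16-channel feature per depth plane, and 3D convolutions then correlate those features across planes. The layer counts, channel widths, and kernel sizes here are placeholders, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn


class SelectionNet(nn.Module):
    def __init__(self, in_ch: int):
        super().__init__()
        self.head2d = nn.Sequential(                  # per-plane 2D features (16 channels, as in Fig. 8)
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 16, 3, padding=1))
        self.refine3d = nn.Sequential(                # 3D convolutions link features across depth planes (Fig. 9)
            nn.Conv3d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 1, 3, padding=1))

    def forward(self, planes: torch.Tensor) -> torch.Tensor:
        # planes: (B, M, C, H, W), one slice per depth plane
        feats = torch.stack([self.head2d(planes[:, m]) for m in range(planes.shape[1])], dim=2)
        return self.refine3d(feats).squeeze(1)        # (B, M, H, W) selection tower
```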
Fig. 10 Simulation assessment of horizontally synthesized dense views under different parallax situations. 26 views are generated by the proposed method from sparse input views. (a) Input views are placed at different horizontal poses. (b) Input views are placed at different vertical poses. Each row of the view array with a different parallax is input into the network. Circles represent different viewpoints: the empty circles ○ represent views at fixed poses, and the solid black circles ● represent views at changed poses. The arrows indicate the direction in which the poses change. The crosses × indicate that there are no input views at those poses.
Fig. 11 Three types of image data sets. (a) CD Cases is an image array from the Stanford light field archive. (b) Lotus Pool is an image array of a virtual scene. (c) Indoor Scene is an image array of a real scene. The red lines indicate the vertical parallax in the image arrays.
Fig. 12 The results of synthesized posed views. The leftmost column is the result of view #3 synthesized by the 2D + 3D + S network. PSNR and SSIM are calculated on the red rectangle areas, except for the scene of CD Cases. The middle column shows the PSNR of the synthesized views under different network structures. The rightmost column shows the SSIM of the synthesized views under different network structures.
Fig. 13 The virtual view results synthesized by different networks. The leftmost column is the original view prepared in advance. The second through rightmost columns are the virtual views synthesized by the 2D network, 2D + S network, 2D + 3D network, 2D + 3D + S network, and optical flow, respectively. Image details are shown in the red rectangles. PSNR and SSIM are calculated on the blue rectangle areas, except for the scene of CD Cases. The row below the virtual views is the residual error map of the blue rectangle area, indicating the absolute errors between the original view and the synthesized view.
Fig. 14 The input views and the synthesized output views of the network. Sequences of dense views can be synthesized: dense synthesized views of (a) CD Cases, (b) Lotus Pool, and (c) Indoor Scene. The top right shows the EPI of 50 synthesized views. The bottom shows the display results of the dense-view sequences presented on a 3D light-field display (see Visualization 1).

Tables (1)

Table 1 Configurations of the different methods

Equations (11)

Equations on this page are rendered with MathJax.

(1) $P_i = R_i^{-1}\left( K_i^{-1} p_i^{T} - T_i \right), \quad p_i \in \left\{ (0,0,1),\, (w,0,1),\, (0,h,1),\, (w,h,1) \right\}$
(2) $C_i = -R_i^{-1} T_i$
(3) $x_s = \dfrac{z_m - z_i}{z_c - z_i} \times (x_c - x_i) + x_i, \qquad y_s = \dfrac{z_m - z_i}{z_c - z_i} \times (y_c - y_i) + y_i$
(4) $p_t^{T} = K_t \left( R_t P_i + T_t \right)$
(5) $\left( p_t^{T} \right)_{3\times 4} = H_m^{3\times 3} \left( p_i^{T} \right)_{3\times 4}$
(6) $v_c^{r} = \sum_{i=0}^{M\times r} CT_i^{r} \times ST_i^{r}$
(7) $\sum_{i=0}^{M\times r} ST_i^{r} = 1$
(8) $l_c^{r} = \left\| v_t^{r} - v_c^{r} \right\|_1$
(9) $s_c^{r} = \left\| \nabla v_c^{r} \right\|_1$
(10) $s_d^{r} = \left\| \nabla v_d^{r} \times \exp\left( -\nabla v_t^{r} \right) \right\|_1$
(11) $L = \sum_r \left( a^{r} l_c^{r} + b^{r} s_c^{r} + c^{r} s_d^{r} \right), \quad \text{with} \ \sum_r \left( a^{r} + b^{r} + c^{r} \right) = 1$
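To show how these terms fit together, here is a small numpy sketch of the multi-scale loss. The finite-difference gradients and the exact form of the edge-aware term are one reading of the reconstructed equations above (single-channel arrays are used for brevity), so treat the details as assumptions rather than the paper's exact implementation.

```python
import numpy as np


def grad(img: np.ndarray) -> np.ndarray:
    """Absolute finite-difference gradient magnitude of a 2-D array."""
    gx = np.abs(np.diff(img, axis=1))[:-1, :]   # horizontal differences, cropped to a common shape
    gy = np.abs(np.diff(img, axis=0))[:, :-1]   # vertical differences, cropped to a common shape
    return gx + gy


def multiscale_loss(v_c, v_t, v_d, a, b, c) -> float:
    """v_c, v_t: per-scale output/target views; v_d: per-scale depth maps (2-D arrays keyed by scale r)."""
    total = 0.0
    for r in v_c:
        l_c = np.abs(v_t[r] - v_c[r]).mean()                    # photometric L1 term, Eq. (8)
        s_c = grad(v_c[r]).mean()                               # smoothness of the output view, Eq. (9)
        s_d = (grad(v_d[r]) * np.exp(-grad(v_t[r]))).mean()     # edge-aware depth smoothness, Eq. (10)
        total += a[r] * l_c + b[r] * s_c + c[r] * s_d           # weighted sum over scales, Eq. (11)
    return total
```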
