Abstract

Precise analysis of the vocal fold vibratory pattern in a stroboscopic video plays a key role in the evaluation of voice disorders. Automatic glottis segmentation is one of the preliminary steps in such an analysis. In this work, it is divided into two subproblems, namely glottis localization and glottis segmentation. A two-step convolutional neural network (CNN) approach is proposed for automatic glottis segmentation. Data augmentation is carried out using two techniques: 1) blind rotation (WB) and 2) rotation with respect to the glottis orientation (WO). The dataset used in this study contains stroboscopic videos of 18 subjects with sulcus vocalis, in which the glottis region is annotated by three speech language pathologists (SLPs). The proposed two-step CNN approach achieves an average localization accuracy of 90.08% and a mean dice score of 0.65.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
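The abstract describes a two-step pipeline: a localization CNN (CNN1) that finds a bounding box around the glottis, followed by a segmentation CNN (CNN2) applied to the cropped region (see Fig. 4). The following is a minimal sketch of that inference flow, not the paper's actual implementation; the box size, the 0.5 threshold, and the `cnn1_predict`/`cnn2_predict` interfaces are illustrative assumptions.

```python
import numpy as np

def two_step_glottis_segmentation(frame, cnn1_predict, cnn2_predict, box=128):
    """Sketch of the two-step inference flow (hypothetical interfaces).

    cnn1_predict(frame) -> (row, col) estimate of the glottis centre  [assumed]
    cnn2_predict(crop)  -> per-pixel glottis probability map          [assumed]
    """
    h, w = frame.shape[:2]

    # Step 1: localization -- place a fixed-size bounding box around the
    # predicted glottis centre, clipped to the frame boundaries.
    r, c = cnn1_predict(frame)
    r0 = int(np.clip(r - box // 2, 0, h - box))
    c0 = int(np.clip(c - box // 2, 0, w - box))
    crop = frame[r0:r0 + box, c0:c0 + box]

    # Step 2: segmentation on the cropped region. (Per Fig. 4, the paper
    # resizes with bicubic interpolation (BI) before CNN2; omitted here.)
    prob = cnn2_predict(crop)
    crop_mask = (prob > 0.5).astype(np.uint8)  # 0.5 threshold is an assumption

    # Paste the cropped mask back into a full-frame mask.
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[r0:r0 + box, c0:c0 + box] = crop_mask
    return mask
```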





Figures (14)

Fig. 1. Sample videostroboscopy images from the 18 subjects with SV, showing the variation in glottis shape, illumination, and camera position during recording.
Fig. 2. Frames in which the supraglottic structures block the glottis.
Fig. 3. Screenshot of the MATLAB-based GUI used for annotation by the SLPs (top left); histogram of the dice scores computed over all pairs of annotators, showing the inter-annotator agreement (bottom left); and three exemplary images, each annotated by the three SLPs $\mathbf {(a_{1},a_{2},a_{3})}$, with the mean dice score, averaged over the three annotator pairs ($\mathbf {(a_{1},a_{2})}$, $\mathbf {(a_{2},a_{3})}$, $\mathbf {(a_{1},a_{3})}$), shown at the right of each row.
Fig. 4. Block diagram of the proposed glottis segmentation approach, in which an image passes through two steps: 1) the localization step, which uses the CNN1 architecture, and 2) the segmentation step, which uses the CNN2 architecture. BI denotes bicubic interpolation.
Fig. 5. Orientation estimation involved in the WO-based augmentation process (one possible implementation is sketched after this figure list).
Fig. 6. Histogram of the orientation angle ($\phi$) between the major axis of the glottis and the horizontal axis, measured in degrees, for the NO, WB, and WO augmentation schemes.
Fig. 7. Boxplot of the dice scores achieved by the 11 combinations of network architectures on the validation data. $M_{1100}$ achieves the highest median dice score, indicated by the dashed horizontal line. The labels on the boxplot indicate quartiles and medians.
Fig. 8. $B_{a}$ using CNN1 of the two-step CNN approach with the three data augmentation schemes NO, WB, and WO, evaluated against the three SLPs a1, a2, and a3. The WO data augmentation technique clearly achieves the best $B_{a}$.
Fig. 9. Boxplot of the dice scores obtained using the CNN_C and two-step CNN approaches without augmentation (NO), with WB-based augmentation, and with WO-based augmentation. The dashed horizontal line indicates the average of the three median dice scores obtained using the two-step CNN with WO, evaluated against the three SLPs.
Fig. 10. Bar graph of the localization accuracy of the various approaches, evaluated across the three annotators.
Fig. 11. Illustration of the glottis segmentation on sample images, one from each of the 18 subjects with SV. Column (a): subject number with the dice score achieved by the proposed approach (blue) and the average inter-annotator agreement (red) on the sample image, obtained by averaging the dice scores over every pair of annotators. Column (b): 720 $\times$ 576 original image with the bounding box obtained in the localization step. Column (c): cropped image obtained from the localization step. Column (d): segment predicted by the two-step CNN approach. Column (e): ground-truth segment corresponding to the cropped image, labeled by annotator a1.
Fig. 12. Foldwise normalized count (Nc) histogram of the mean dice scores from every fold in the test set, evaluated against the three SLPs, with three histogram curves corresponding to evaluation against each annotator separately.
Fig. 13. Mean dice score averaged across the three SLPs vs. the mean dice score obtained using the proposed algorithm evaluated against the three SLPs, with the $l_{1}$ line fit to the plot (red).
Fig. 14. Mean dice score of the proposed method evaluated across the three SLPs vs. the glottis opening area normalized by the image size.
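The WO augmentation of Figs. 5 and 6 rotates each frame relative to the estimated orientation of the glottis major axis. The sketch below shows one way this could be implemented, assuming a binary annotation mask is available; the PCA-based angle estimate and the helper names (`glottis_orientation_deg`, `wo_augment`) are illustrative stand-ins, since the page does not give the paper's actual estimator.

```python
import numpy as np
from scipy import ndimage

def glottis_orientation_deg(mask):
    """Angle (degrees) between the glottis major axis and the horizontal axis,
    estimated from a binary mask via PCA of the foreground pixel coordinates.
    (A moment-based stand-in; the paper's estimator is not given on this page.)"""
    ys, xs = np.nonzero(mask)
    coords = np.column_stack([xs, ys]).astype(float)
    coords -= coords.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(coords.T))
    major = vecs[:, -1]  # eigenvalues ascend, so the last column is the major axis
    return np.degrees(np.arctan2(major[1], major[0]))

def wo_augment(image, mask, target_deg):
    """WO-style augmentation (assumed form): rotate the frame so the glottis
    major axis lands near a chosen orientation. Sign conventions depend on
    the image coordinate system."""
    phi = glottis_orientation_deg(mask)
    return ndimage.rotate(image, target_deg - phi, reshape=False, order=3)
```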

Tables (2)

Table 1. p-values of the Student's t-test performed on the dice scores between all pairs of segmentation methods. A bold entry in a cell indicates that the method in the corresponding column has a higher mean dice score than the method in the corresponding row. A blue entry in a cell indicates that the methods in the corresponding row and column do not yield significantly different average dice scores.

Table 2. Localization accuracy and corresponding dice score achieved by the baseline and the proposed approach

Equations (2)

$$P_{i,j} = \begin{cases} 1, & Pr_{i,j} > \max\limits_{i,j}\left(Pr_{i,j}\right) - \epsilon \\ 0, & \text{otherwise} \end{cases}$$

$$D(P, G) = \frac{2 \times N(P \cap G)}{N(P) + N(G)}$$
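Read together, the first equation binarizes the probability map $Pr$ by keeping the pixels whose probability lies within $\epsilon$ of the map's maximum (the minus sign is reconstructed from context), and the second is the dice overlap between the predicted segment $P$ and the ground truth $G$, where $N(\cdot)$ counts pixels. A minimal NumPy rendering follows; the $\epsilon$ value is an arbitrary placeholder, as the page does not specify it.

```python
import numpy as np

def threshold_mask(prob, eps=0.3):
    """First equation: keep pixels whose probability is within eps of the
    maximum of the probability map (eps = 0.3 is an arbitrary placeholder)."""
    return (prob > prob.max() - eps).astype(np.uint8)

def dice_score(pred, gt):
    """Second equation: dice overlap between predicted and ground-truth masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0

# Usage on a synthetic 3x3 probability map.
prob = np.array([[0.10, 0.80, 0.90],
                 [0.20, 0.85, 0.30],
                 [0.10, 0.20, 0.10]])
gt = np.array([[0, 1, 1],
               [0, 1, 0],
               [0, 0, 0]], dtype=np.uint8)
print(dice_score(threshold_mask(prob), gt))  # 1.0: perfect overlap here
```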