Abstract

Fourier domain optical coherence tomography (FD-OCT) provides faster line rates, better resolution, and higher sensitivity for noninvasive, in vivo biomedical imaging compared to traditional time domain OCT (TD-OCT). However, because the signal processing for FD-OCT is computationally intensive, real-time FD-OCT applications demand powerful computing platforms to deliver acceptable performance. Graphics processing units (GPUs) have been used as coprocessors to accelerate FD-OCT by leveraging their relatively simple programming model to exploit thread-level parallelism. Unfortunately, GPUs do not “share” memory with their host processors, so additional data transfers between the GPU and CPU are required. In this paper, we implement a complete FD-OCT accelerator on a consumer-grade GPU/CPU platform. Our data acquisition system uses spectrometer-based detection and a dual-arm interferometer topology with numerical dispersion compensation for retinal imaging. We demonstrate that, because of the GPU platform’s memory model, the maximum line rate is dictated by the memory transfer time rather than the processing time. Finally, we discuss how the performance trends of GPU-based accelerators compare to the expected future requirements of FD-OCT data rates.

© 2011 Optical Society of America
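
The abstract's central claim, that host/device memory transfers rather than GPU processing bound the achievable line rate, can be illustrated with a simple timing model. The sketch below is not the authors' implementation. The class name `PipelineModel`, its fields, and every numeric value are hypothetical, and it assumes that copies and processing run back to back (no overlap) and that each batch pays a fixed start-up overhead for one copy in each direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineModel:
    """Toy timing model of a batched CPU -> GPU -> CPU processing pipeline.

    All fields are illustrative assumptions, not measurements from the paper.
    """
    bytes_in_per_line: int        # raw spectrum size per A-scan (host to device)
    bytes_out_per_line: int       # processed A-scan size (device to host)
    bandwidth_bytes_per_s: float  # sustained host/device copy bandwidth
    copy_overhead_s: float        # fixed cost of initiating one copy
    process_s_per_line: float     # GPU processing time per A-scan

    def transfer_time(self, batch_lines: int) -> float:
        """Time to copy one batch in and one batch out, including overhead."""
        payload = (self.bytes_in_per_line + self.bytes_out_per_line) * batch_lines
        return 2 * self.copy_overhead_s + payload / self.bandwidth_bytes_per_s

    def processing_time(self, batch_lines: int) -> float:
        """GPU processing time for one batch."""
        return self.process_s_per_line * batch_lines

    def line_rate(self, batch_lines: int) -> float:
        """Sustained A-scans per second when copies and processing do not overlap."""
        total = self.transfer_time(batch_lines) + self.processing_time(batch_lines)
        return batch_lines / total


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the trend.
    model = PipelineModel(
        bytes_in_per_line=2048,
        bytes_out_per_line=2048,
        bandwidth_bytes_per_s=1e9,
        copy_overhead_s=1e-5,
        process_s_per_line=1e-6,
    )
    for batch in (1, 64, 1024, 8192):
        print(batch,
              round(model.line_rate(batch)),
              model.transfer_time(batch) > model.processing_time(batch))
```

With these assumed values the per-line copy time exceeds the per-line processing time, so the model is transfer-bound at every batch size, and larger batches raise the line rate only by amortizing the fixed copy overhead. Different hardware parameters could reverse that balance.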


References

  1. T. Schmoll, C. Kolbitsch, and R. A. Leitgeb, “Ultra-high-speed volumetric tomography of human retinal blood flow,” Opt. Express 17, 4166–4176 (2009).
  2. M. Wojtkowski, V. Srinivasan, T. Ko, J. Fujimoto, A. Kowalczyk, and J. Duker, “Ultrahigh-resolution, high-speed, Fourier domain optical coherence tomography and methods for dispersion compensation,” Opt. Express 12, 2404–2422 (2004).
  3. M. K. K. Leung, A. Mariampillai, B. A. Standish, K. K. C. Lee, N. R. Munce, I. A. Vitkin, and V. X. D. Yang, “High-power wavelength-swept laser in Littman telescope-less polygon filter and dual-amplifier configuration for multichannel optical coherence tomography,” Opt. Lett. 34, 2814–2816 (2009).
  4. Y. Watanabe and T. Itagaki, “Real-time display on Fourier domain optical coherence tomography system using a graphics processing unit,” J. Biomed. Opt. 14, 060506 (2009).
  5. K. Zhang and J. U. Kang, “Real-time 4D signal processing and visualization using graphics processing unit on a regular nonlinear-k Fourier-domain OCT system,” Opt. Express 18, 11772–11784 (2010).
  6. Q. Fang and D. A. Boas, “Monte Carlo simulation of photon migration in 3D turbid media accelerated by graphics processing units,” Opt. Express 17, 20178–20190 (2009).
  7. N. Ren, J. Liang, X. Qu, J. Li, B. Lu, and J. Tian, “GPU-based Monte Carlo simulation for light propagation in complex heterogeneous tissues,” Opt. Express 18, 6811–6823 (2010).
  8. E. Alerstam, W. C. Y. Lo, T. D. Han, J. Rose, S. Andersson-Engels, and L. Lilge, “Next-generation acceleration and code optimization for light transport in turbid media using GPUs,” Biomed. Opt. Express 1, 658–675 (2010).
  9. Y. Watanabe and T. Itagaki, “Real-time display on SD-OCT using a linear-in-wavenumber spectrometer and a graphics processing unit,” Proc. SPIE 7554, 75542S (2010).
  10. Y. Watanabe, S. Maeno, K. Aoshima, H. Hasegawa, and H. Koseki, “Real-time processing for full-range Fourier-domain optical-coherence tomography with zero-filling interpolation using multiple graphic processing units,” Appl. Opt. 49, 4756–4762 (2010).
  11. J. Xu, L. Molday, R. Molday, and M. Sarunic, “In vivo imaging of the mouse model of X-linked juvenile retinoschisis with Fourier domain optical coherence tomography,” Invest. Ophthalmol. Visual Sci. 50, 2989 (2009).
  12. J. Goodman, Statistical Optics (Wiley, 2000).
  13. “NVIDIA CUDA Programming Guide,” (2009), http://developer.nvidia.com/object/cuda_2_3_downloads.html.
  14. “CUDA CUFFT Library,” (2009), http://developer.nvidia.com/object/cuda_2_3_downloads.html.
  15. Ideally, the postprocessed data should be directly copied into the frame buffer on the GPU for display without additional copies between host and device.
  16. “CUDA Visual Profiler,” (2009), http://developer.nvidia.com/object/cuda_2_3_downloads.html.
  17. Because of the overhead incurred from initiating data transfers between device and host memory, data copies need to be “batched” (i.e., multiple individual copies grouped together into a single multiword copy) to amortize this cost. In Fig. , a batch size of 8192 was used.
  18. Integrated GPUs are packaged on the same chip as the system memory controller and system I/O controller to provide a compact, low-cost, low-power solution. To meet these design constraints, they provide less processing power and fewer processing cores.
  19. G. Moore, “Cramming more components onto integrated circuits,” Proc. IEEE 86, 82–85 (1998).
  20. W. Wieser, B. R. Biedermann, T. Klein, C. M. Eigenwillig, and R. Huber, “Multi-megahertz OCT: High quality 3D imaging at 20 million A-scans and 4.5 GVoxels per second,” Opt. Express 18, 14685–14704 (2010).

Figures (6)

Fig. 1. FD-OCT processing flow.

Fig. 2. The complete FD-OCT system using GPU as coprocessor.

Fig. 3. Algorithm flowchart for GPU and sample retina image.

Fig. 4. Percentage of GPU functions’ runtime.

Fig. 5. Performance against A-scan batch sizes.

Fig. 6. Processing-only line rate over different A-scan on two GPUs.

Tables (1)

Table 1. Specifications for GPU in Discussion
