Abstract

The European Southern Observatory (ESO) is studying the next generation of giant telescope, the European Extremely Large Telescope (E-ELT). With a primary mirror 42 m in diameter, it is a significant step up from existing telescopes. The E-ELT and its instruments therefore pose new challenges in terms of cost and computational complexity for the control system, including its adaptive optics (AO). The conventional matrix-vector multiplication (MVM) method, used successfully so far for AO wavefront reconstruction, cannot be scaled efficiently to the size of the AO systems on the E-ELT, so faster algorithms are needed. Among recently developed wavefront reconstruction algorithms, three are studied in this paper from the point of view of design, implementation, and absolute speed on three multicore multi-CPU platforms. We focus on a single-conjugate AO system for the E-ELT. The algorithms are the MVM, the Fourier transform reconstructor (FTR), and the fractal iterative method (FRiM). This study highlights how these algorithms scale as the number of CPUs involved in the computation increases. We discuss implementation strategies, which depend on various CPU architecture constraints, and we present the first quantitative execution times at the E-ELT scale. MVM suffers from a large computational burden, which leaves the current computing platforms undersized to reach timings short enough for AO wavefront reconstruction. In our study, the FTR currently provides the fastest reconstruction. FRiM is a recently developed algorithm, and several strategies are investigated and presented here to implement it for real-time AO wavefront reconstruction and to optimize its execution time. The difficulty of parallelizing the algorithm on such architectures is highlighted. We also show that FRiM can provide interesting scalability using a sparse matrix approach.

© 2012 Optical Society of America

Figures (22)

Fig. 1.

Cache configuration for Intel Nehalem (a) and AMD Magny–Cours (b). Both have a similar structure at the core level, with two cache levels for each core, and a shared third level of cache. Communications architecture and technology [QPI for Quickpath Interconnect in (a) and HT for HyperTransport in (b)] are responsible for the impact of affinity setting on performance improvement.

Fig. 2.

Jitter measurement for 10⁴ consecutive calls of the POSIX system function gettimeofday(). Diamond symbols, Xeon 5520 platform; square symbols, AMD Magny–Cours platform; cross symbols, Xeon 5560 platform. For clarity, values are shifted up by 0.2 for the squares and by 0.4 for the crosses. Dash-dot line, maximum value on Xeon 5520; dashed line, maximum value on AMD Magny–Cours; dotted line, maximum value on Xeon 5560. The Xeon 5520 gives a mean value of 0.23 μs with a standard deviation (STD) of 0.3169, the AMD Magny–Cours a mean value of 0.24 μs with an STD of 0.3051, and the Xeon 5560 with a real-time patched kernel a mean value of 0.22 μs with an STD of 0.2187.
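A measurement of this kind can be sketched as follows. This is a minimal illustration, not the authors' benchmark: Python's `time.perf_counter_ns()` is swapped in for `gettimeofday()`, and the function name `measure_timer_jitter` and the parameter `n_calls` are hypothetical. The absolute values obtained this way include interpreter overhead and are not comparable with the figures quoted in the caption.

```python
import statistics
import time


def measure_timer_jitter(n_calls=10_000):
    """Return the intervals (in microseconds) between consecutive timer reads."""
    # Assumption: time.perf_counter_ns() stands in for gettimeofday().
    stamps = [time.perf_counter_ns() for _ in range(n_calls + 1)]
    # The difference between successive timestamps is the cost of one call
    # plus any delay injected by the operating system (the jitter).
    return [(b - a) / 1000.0 for a, b in zip(stamps, stamps[1:])]


intervals = measure_timer_jitter()
print(f"mean = {statistics.mean(intervals):.3f} us")
print(f"std  = {statistics.pstdev(intervals):.3f} us")
print(f"max  = {max(intervals):.3f} us")
```

The maximum interval is reported alongside the mean and standard deviation because, for a real-time loop, the worst case matters more than the average.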

Fig. 3.

Two schemes for FT reconstruction implementation.

Fig. 4.

Illustration of the matrix equivalent to Sᵀ·W·S for a 9×9 AO system with a circular aperture mask. This highlights how sparse this operator is, with a maximum of five nonzero elements per row, even in the 129×129 AO system of the E-ELT.

Fig. 5.

Aperture mask used for the equivalent operator of Sᵀ·W·S in the aperture-masked FRiM implementation. The white part contains the useful elements of the 129×129 vector.

Fig. 6.

Mask used for the implementation of the K and Kᵀ operators in the aperture-masked FRiM. White parts show the elements effectively involved in the computation.

Fig. 7.

Illustration of the computation scheme for the aperture-masked FRiM. Only computations related to elements in the white area are executed.

Fig. 8.

Parallel-masked FRiM scheme for four threads.

Fig. 9.

Scheme (a) uses a mutex vector of the same length as the result vector to protect each value of the global variable shared by the four threads while the sum is accumulated. Scheme (b) reduces the number of mutex locks by protecting only the values that are actually shared between threads, as can be seen in Fig. 6. This cuts the number of locks to only 3% of that in scheme (a). Scheme (c) uses only four mutexes, which protect a status flag array (4 bytes, accessible as one integer by the main thread) acting as a barrier, and the main thread then completes the sum.
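The idea behind scheme (c) can be sketched as follows. This is a simplified illustration in the spirit of the caption, not the authors' pthread or OpenMP code: each worker writes into a private buffer, raises its own flag under its own mutex, and the main thread sums the buffers once every flag is set. The workload, the vector length `N`, and all names are hypothetical, and Python threads give no indication of the timings discussed in the paper.

```python
import threading
import time

N_THREADS = 4
N = 16  # length of the result vector (illustrative value)

partials = [[0.0] * N for _ in range(N_THREADS)]  # one private buffer per thread
flags = [False] * N_THREADS                       # status flag array
flag_locks = [threading.Lock() for _ in range(N_THREADS)]  # one mutex per flag


def worker(tid):
    # Hypothetical workload: thread `tid` contributes (tid + 1) to every element.
    for k in range(N):
        partials[tid][k] = float(tid + 1)
    # Signal completion; only the flag is protected, not the data.
    with flag_locks[tid]:
        flags[tid] = True


def all_done():
    # The main thread inspects every flag under its mutex.
    for tid in range(N_THREADS):
        with flag_locks[tid]:
            if not flags[tid]:
                return False
    return True


threads = [threading.Thread(target=worker, args=(t,)) for t in range(N_THREADS)]
for t in threads:
    t.start()

while not all_done():
    time.sleep(0)  # yield to the workers while waiting at the barrier

# The final sum is done by the main thread alone, so no lock is needed here.
result = [sum(partials[tid][k] for tid in range(N_THREADS)) for k in range(N)]

for t in threads:
    t.join()
print(result[0])
```

Because no worker ever writes to shared result elements, the number of locks no longer depends on the vector length, only on the number of threads.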

Fig. 10.

Nonzero values of the matrix equivalent to the A operator, for the example of a 129×129 AO system.

Fig. 11.

Execution time (ms, diamond symbols) and speed up (circle symbols) versus number of cores, on the AMD platform for the row-wise parallel MVM algorithm, using the ACML library.
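Row-wise parallelization of the MVM can be sketched as follows: the rows of the reconstruction matrix are split into contiguous blocks, each block is multiplied by the measurement vector independently, and the partial results are concatenated. This is a pure-Python illustration of the partitioning only; the paper relies on optimized libraries (ACML, MKL), and the names `mvm_rows`, `parallel_mvm`, and `n_workers` are hypothetical. The symbols `R`, `d`, and `delta_a` follow the relation δa = R·d listed among the equations.

```python
from concurrent.futures import ThreadPoolExecutor


def mvm_rows(R, d, start, stop):
    """Multiply rows start..stop-1 of R by the vector d."""
    return [sum(r_ij * d_j for r_ij, d_j in zip(R[i], d))
            for i in range(start, stop)]


def parallel_mvm(R, d, n_workers=4):
    """Row-wise partitioned matrix-vector product, one row block per task."""
    n_rows = len(R)
    chunk = max(1, (n_rows + n_workers - 1) // n_workers)
    bounds = [(s, min(s + chunk, n_rows)) for s in range(0, n_rows, chunk)]
    out = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns the blocks in submission order, so plain
        # concatenation rebuilds the full result vector.
        for block in pool.map(lambda b: mvm_rows(R, d, *b), bounds):
            out.extend(block)
    return out


R = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
d = [1.0, -1.0]
delta_a = parallel_mvm(R, d, n_workers=2)
print(delta_a)
```

Each output element depends on a single row, so the blocks need no synchronization beyond the final gather.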

Fig. 12.

Execution time (ms, diamond symbols) and speed up (circle symbols) versus number of cores, on the Intel Xeon 5560 platform for the row-wise parallel MVM algorithm, using MKL. Only values obtained with the best affinity settings are shown.

Fig. 13.

Speed up factors (diamond symbols) for different affinity settings, when using MKL for the row-wise parallel MVM on the Intel Xeon 5560 platform. The envelope (solid line), i.e., the best values, is reproduced in Fig. 12. Linear scaling (dashed line) is plotted for reference. The labels next to each point are the core identifiers described in Table 3 and Subsection 4.A.

Fig. 14.

Execution times in μs (rectangles) and speed up factors (circle symbols) of the full FTR algorithm, and speed up factors (diamond symbols) of the FTR algorithm excluding the (inverse) FT, on the Intel Xeon 5560 platform.

Fig. 15.

Execution time in μs (rectangles) and speed up (diamond symbols) for the application of the K operator with different implementations. Left: full-square K. Middle: aperture-masked K. Right: parallel-masked K. For the parallel-masked K (right), the final synchronization and summation are not included, since they are deferred until the application of A is complete.

Fig. 16.

Execution time in ms (rectangles) of various steps of the FRiM algorithm. White rectangles: full-square FRiM implementation. Black rectangles: aperture-masked FRiM implementation. Left: execution time for the A operator only. Middle: execution time for the residual norm computation and first PCG iteration only. Right: execution time for the complete FRiM algorithm. The speed up curve (diamond symbols) shows the acceleration of each part when going from the full-square implementation to the aperture-masked one.

Fig. 17.

Execution time (μs) of the parallel A operator, including the synchronization and the final sum over all threads, for the three synchronization schemes presented in Fig. 9 and described in Subsection 3.C.4. Left: scheme (a), with full mutex vector. Middle: scheme (b), with reduced-size mutex vector. Right: scheme (c), with the sum done by the main thread. The test is run on the Intel Xeon 5560 (platform 3), with both OpenMP (white) and pthread (black).

Fig. 18.

Smoothed histograms representing the pure synchronization time as a function of the number of threads, for two methods as presented in Fig. 9 and described in Subsection 3.C.4: similar to scheme (b) on the top, and to scheme (c) on the lower plot. Intel Xeon 5560 platform is used.

Fig. 19.

Top: execution time (μs) measured over 4000 successive runs of the A operator with the parallel-masked implementation of FRiM, using the reduced mutex vector synchronization [scheme (b) in Fig. 9]. Contention occurs frequently, producing two main peaks in the histogram of execution times (bottom).

Fig. 20.

Execution times (rectangle) in ms and speed up factors (diamond symbols) of the complete FRiM algorithm for various implementations, on the Intel Xeon 5560 platform (3). First column: full-square FRiM. Second column: aperture-masked FRiM. Third column: parallel-masked FRiM with two threads. Fourth column: parallel-masked FRiM with four threads.

Fig. 21.

Speed up factor (left column) and execution time in ms (right column) for application of the sparse matrix A. Top panel: Intel Xeon 5560 platform. Bottom panel: AMD Magny–Cours platform. On the left, both sparse description formats (CSR and COO) are used. On the right, speed up is plotted only for the CSR format, the best performing one.
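For readers unfamiliar with the compressed sparse row (CSR) format named in this caption, the sketch below shows how a sparse matrix-vector product works on it. CSR stores the nonzero values row by row, their column indices, and a pointer array marking where each row starts, whereas the coordinate (COO) format stores one (row, column, value) triplet per nonzero. The code is a pure-Python illustration with hypothetical names and a toy 3×3 matrix; it is not the sparse library used for the measurements.

```python
def dense_to_csr(M):
    """Convert a dense matrix (list of rows) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in M:
        for j, x in enumerate(row):
            if x != 0.0:
                values.append(x)
                col_idx.append(j)
        # row_ptr[i + 1] marks the end of row i in `values`.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = M x, touching only the stored nonzeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


M = [[4.0, 0.0, 1.0],
     [0.0, 3.0, 0.0],
     [1.0, 0.0, 2.0]]
values, col_idx, row_ptr = dense_to_csr(M)
y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(y)
```

Since each row of the product reads a contiguous slice of `values`, the rows can be distributed over threads without write conflicts, which is one plausible reason a row-oriented format parallelizes well.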

Fig. 22.

Execution times (diamond symbols) (ms) and speed up factors (circle symbols) of the full algorithm using the sparse equivalent matrix for the FRiM, on AMD Magny–Cours platform (1).

Tables (5)

Table 1. Description of the Three Test Platforms

Table 2. Software Packages Tested in Benchmark, and the Ones Eventually Selected for the Results Presented Here

Table 3. Affinity Settings Tested on Intel Nehalem Platform

Table 4. 2D Real-to-Complex FT Execution Time (μs), with MKL, for an 85×85 Array, on Intel Platforms

Table 5. Best Performance Summary

Equations (9)

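$$\delta a = R \cdot d.$$
$$A = K^{T} \cdot S^{T} \cdot W \cdot S \cdot K + \mu I,$$
$$D = D_{\mathrm{in}} \cup D_{\mathrm{out}},$$
$$w = \sum_{i \in D_{\mathrm{in}}} \upsilon_i \cdot u_i + \sum_{i \in D_{\mathrm{out}}} \upsilon_i \cdot u_i,$$
$$K^{T}\Bigl(\sum_{i \in D} \upsilon_i \cdot u_i\Bigr) = \sum_{i \in D_{\mathrm{in}}} \upsilon_i \cdot K^{T}(u_i) + \sum_{i \in D_{\mathrm{out}}} \upsilon_i \cdot K^{T}(u_i).$$
$$K^{T}\Bigl(\sum_{i \in D} \upsilon_i \cdot u_i\Bigr) = \sum_{i \in D_{\mathrm{in}}} \upsilon_i \cdot K^{T}(u_i).$$
$$D_{\mathrm{in}} = \bigcup_{j=1}^{N} D_j,$$
$$K^{T}\Bigl(\sum_{i \in D} \upsilon_i \cdot u_i\Bigr) = \sum_{j=1}^{N} \sum_{i \in D_j} \upsilon_i \cdot K^{T}(u_i)$$
$$\mathrm{speedup} = \frac{\text{calculation time with 1 thread}}{\text{calculation time with } N \text{ threads}},$$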
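The last relations can be checked numerically on a toy case. The sketch below assumes they express the linearity of the operator Kᵀ over a partition of the index set into subsets D_j, so that each subset can be handled by a different thread and the partial results summed. The matrix `Kt`, the vector `w`, the subsets, and the timings passed to `speedup` are all made-up illustrative values, and a small dense matrix stands in for the actual operator.

```python
def apply_linear(Kt, w):
    """Apply a linear operator given as a dense matrix (list of rows)."""
    return [sum(k_ij * w_j for k_ij, w_j in zip(row, w)) for row in Kt]


def restrict(w, index_set):
    """Keep the components of w whose index is in index_set, zero the rest."""
    return [w_i if i in index_set else 0.0 for i, w_i in enumerate(w)]


def speedup(t_one_thread, t_n_threads):
    """Ratio of the single-thread time to the N-thread time."""
    return t_one_thread / t_n_threads


# Hypothetical stand-in for K^T and for the input vector w.
Kt = [[1.0, 2.0, 0.0, 0.0],
      [0.0, 1.0, 3.0, 0.0],
      [0.0, 0.0, 1.0, 4.0]]
w = [1.0, 2.0, 3.0, 4.0]
subsets = [{0, 1}, {2, 3}]  # D_1 and D_2, a partition of the indices

full = apply_linear(Kt, w)
# One partial application per subset, as one thread per D_j would do.
pieces = [apply_linear(Kt, restrict(w, D_j)) for D_j in subsets]
summed = [sum(column) for column in zip(*pieces)]

print(full, summed)
print(speedup(8.0, 2.0))
```

The partial results overlap wherever a row of the operator spans two subsets (the middle row here), which is why the parallel implementation needs a final synchronized sum.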
