Abstract

The optical-network-integrated computing environment is regarded as a promising technology for supporting large-scale, data-intensive distributed computing applications. Because such an environment involves many heterogeneous resources, such as high-performance processors and optical links, faults seem inevitable. Faults can cause applications to fail or substantially delay their finish times. It is therefore necessary to analyze the fault probability of the resources and then schedule the tasks of an application onto appropriate resources so as to minimize the fault probability of the application. We address the task-scheduling problem, based on fault probability analysis, for distributed computing applications over an optical network. We quantitatively analyze the fault probability of the processors and optical links in a given interval and propose a minimal fault probability (MFP) task-scheduling algorithm to minimize the fault probability of the application. We develop a simulator to evaluate the performance of the MFP algorithm, and the simulation results demonstrate its efficiency.
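The idea of scheduling by fault probability can be illustrated with a minimal sketch. It is not the MFP algorithm itself, whose details the abstract does not give. The sketch assumes a constant fault rate per resource (an exponential model), independent faults across resources, and a greedy per-task choice; the function names, the `processors` mapping, and all numeric values are hypothetical.

```python
import math


def fault_probability(rate, duration):
    """Probability of at least one fault during `duration`.

    Assumes a constant fault rate (exponential model); this model is an
    illustrative assumption, not taken from the paper.
    """
    return 1.0 - math.exp(-rate * duration)


def application_fault_probability(probabilities):
    """Probability that at least one resource in use faults.

    Assumes faults on different resources are independent.
    """
    survive = 1.0
    for p in probabilities:
        survive *= 1.0 - p
    return 1.0 - survive


def pick_min_fault_processor(task_work, processors):
    """Greedy choice of the processor with the lowest fault probability
    while it runs the task.

    `processors` maps a hypothetical name to (speed, fault_rate); the
    time the task occupies a processor is task_work / speed.
    """
    def prob(name):
        speed, rate = processors[name]
        return fault_probability(rate, task_work / speed)

    return min(processors, key=prob)


if __name__ == "__main__":
    # Hypothetical resources: one fast but fault-prone, one slow but stable.
    processors = {"fast_flaky": (10.0, 0.05), "slow_stable": (2.0, 0.001)}
    print(pick_min_fault_processor(20.0, processors))
```

In this sketch a faster processor is not automatically preferred: a shorter execution interval lowers exposure to faults, but a sufficiently high fault rate can outweigh that gain. A full scheduler would also have to account for the optical links that carry data between dependent tasks.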

© 2008 Optical Society of America



