Abstract

Optical grid systems have been viewed as a promising virtual computing environment to support distributed real-time directed acyclic graph (DAG) applications. For such a system involving many heterogeneous computing and network resources, faults seem to be inevitable. Therefore, a fault-tolerant DAG scheduling scheme is necessary to improve the performance of the optical grid system. However, existing joint task scheduling schemes for real-time DAG applications generally do not consider the availability issues when making scheduling decisions. We develop an availability-driven scheduling scheme that improves the DAG availability iteratively by allocating two copies of one communication task to two disjoint lightpaths for data transfer while satisfying application deadline requirements. Extensive simulation results demonstrate the effectiveness and the feasibility of the proposed scheduling scheme.

© 2010 Optical Society of America

Full Article  |  PDF Article

References

  • View by:
  • |
  • |
  • |

  1. D. Simeonidou, R. Nejabati, G. Zervas, D. Klonidis, A. Tzanakaki, M. J. O’Mahony, “Dynamic optical-network architectures and technologies for existing and emerging grid services,” J. Lightwave Technol., vol. 23, pp. 3347–3357, 2005.
    [CrossRef]
  2. W. Guo, Y. Jin, W. Sun, W. Hu, X. Lin, M. Wu, H. Liu, S. Fu, J. Yuan, “Distributed computing over optical networks,” in Optical Fiber Communications Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., San Diego, CA, 2008, paper OWF1.
  3. T. Lehman, J. Sobieski, B. Jabbari, “DRAGON: a framework for service provisioning in heterogeneous grid networks,” IEEE Commun. Mag., vol. 44, no. 3, pp. 84–90, Mar. 2006.
    [CrossRef]
  4. R. Medeiros, W. Cirne, F. Brasileiro, J. Sauvé, “Faults in grids: why are they so bad and what can be done about it?,” in Proc. of the 4th Int. Workshop on Grid Computing, Tokyo, Japan, 2003, pp. 18–24.
  5. Y. Wang, Y. H. Jin, W. Guo, W. Q. Sun, W. S. Hu, M. Y. Wu, “Joint scheduling for optical grid applications,” J. Opt. Netw., vol. 6, pp. 304–318, 2007.
    [CrossRef]
  6. Z. Sun, W. Guo, Z. Wang, Y. Jin, W. Sun, W. Hu, C. Qiao, “Scheduling algorithm for workflow-based applications in optical grid,” J. Lightwave Technol., vol. 26, pp. 3011–3020, 2008.
    [CrossRef]
  7. X. Liu, W. Wei, X. Yu, C. Qiao, T. Wang, “Distributed computing task assignment and lightpath establishment (TALE),” presented at the IEEE High Speed Networks Workshop, Anchorage, AK, May 11, 2007.
  8. X. Liu, C. Qiao, T. Wang, “Survivable optical grids,” in Optical Fiber Communications Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., San Diego, CA, 2008, paper OWN1.
  9. Z. Sun, W. Guo, Y. Jin, W. Sun, W. Hu, “Fault-tolerant policy for optical network based distributed computing system,” presented at the IEEE Int. Symp. on Cluster Computing and the Grid (CCGrid), Lyon, France, May 2008.
  10. W. Guo, Z. Liang, Z. Sun, S. Xiao, Y. Jin, W. Sun, W. Hu, “Task scheduling considering fault probability for distributed computing applications over an optical network,” J. Opt. Netw., vol. 7, pp. 947–957, 2008.
    [CrossRef]
  11. L. Song, B. Mukherjee, “Impacts of multiple backups and multi-link sharing among primary and backups for dynamic service provisioning in survivable mesh networks,” in Optical Fiber Communication Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., Anaheim, CA, 2007, paper OThJ3.
  12. J. Zhang, B. Mukherjee, “A review of fault management in WDM mesh networks: basic concepts and research challenges,” IEEE Networking, vol. 18, no. 2, pp. 41–48, 2004.
    [CrossRef]
  13. O. Sinnen, L. Sousa, “Communication contention in task scheduling,” IEEE Trans. Parallel Distrib. Syst., vol. 16, pp. 503–515, 2005.
    [CrossRef]
  14. O. Sinnen, L. Sousa, “List scheduling: extension for contention awareness and evaluation of node priorities for heterogeneous cluster architectures,” Parallel Comput., vol. 30, pp. 81–101, 2004.
    [CrossRef]
  15. T. H. Cormen, C. E. Leiserson, R. L. Rivest, Introduction to Algorithms. Cambridge, MA: MIT, 1990.

2008 (2)

2007 (1)

2006 (1)

T. Lehman, J. Sobieski, B. Jabbari, “DRAGON: a framework for service provisioning in heterogeneous grid networks,” IEEE Commun. Mag., vol. 44, no. 3, pp. 84–90, Mar. 2006.
[CrossRef]

2005 (2)

2004 (2)

O. Sinnen, L. Sousa, “List scheduling: extension for contention awareness and evaluation of node priorities for heterogeneous cluster architectures,” Parallel Comput., vol. 30, pp. 81–101, 2004.
[CrossRef]

J. Zhang, B. Mukherjee, “A review of fault management in WDM mesh networks: basic concepts and research challenges,” IEEE Networking, vol. 18, no. 2, pp. 41–48, 2004.
[CrossRef]

Brasileiro, F.

R. Medeiros, W. Cirne, F. Brasileiro, J. Sauvé, “Faults in grids: why are they so bad and what can be done about it?,” in Proc. of the 4th Int. Workshop on Grid Computing, Tokyo, Japan, 2003, pp. 18–24.

Cirne, W.

R. Medeiros, W. Cirne, F. Brasileiro, J. Sauvé, “Faults in grids: why are they so bad and what can be done about it?,” in Proc. of the 4th Int. Workshop on Grid Computing, Tokyo, Japan, 2003, pp. 18–24.

Cormen, T. H.

T. H. Cormen, C. E. Leiserson, R. L. Rivest, Introduction to Algorithms. Cambridge, MA: MIT, 1990.

Fu, S.

W. Guo, Y. Jin, W. Sun, W. Hu, X. Lin, M. Wu, H. Liu, S. Fu, J. Yuan, “Distributed computing over optical networks,” in Optical Fiber Communications Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., San Diego, CA, 2008, paper OWF1.

Guo, W.

Z. Sun, W. Guo, Z. Wang, Y. Jin, W. Sun, W. Hu, C. Qiao, “Scheduling algorithm for workflow-based applications in optical grid,” J. Lightwave Technol., vol. 26, pp. 3011–3020, 2008.
[CrossRef]

W. Guo, Z. Liang, Z. Sun, S. Xiao, Y. Jin, W. Sun, W. Hu, “Task scheduling considering fault probability for distributed computing applications over an optical network,” J. Opt. Netw., vol. 7, pp. 947–957, 2008.
[CrossRef]

Y. Wang, Y. H. Jin, W. Guo, W. Q. Sun, W. S. Hu, M. Y. Wu, “Joint scheduling for optical grid applications,” J. Opt. Netw., vol. 6, pp. 304–318, 2007.
[CrossRef]

Z. Sun, W. Guo, Y. Jin, W. Sun, W. Hu, “Fault-tolerant policy for optical network based distributed computing system,” presented at the IEEE Int. Symp. on Cluster Computing and the Grid (CCGrid), Lyon, France, May 2008.

W. Guo, Y. Jin, W. Sun, W. Hu, X. Lin, M. Wu, H. Liu, S. Fu, J. Yuan, “Distributed computing over optical networks,” in Optical Fiber Communications Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., San Diego, CA, 2008, paper OWF1.

Hu, W.

W. Guo, Z. Liang, Z. Sun, S. Xiao, Y. Jin, W. Sun, W. Hu, “Task scheduling considering fault probability for distributed computing applications over an optical network,” J. Opt. Netw., vol. 7, pp. 947–957, 2008.
[CrossRef]

Z. Sun, W. Guo, Z. Wang, Y. Jin, W. Sun, W. Hu, C. Qiao, “Scheduling algorithm for workflow-based applications in optical grid,” J. Lightwave Technol., vol. 26, pp. 3011–3020, 2008.
[CrossRef]

Z. Sun, W. Guo, Y. Jin, W. Sun, W. Hu, “Fault-tolerant policy for optical network based distributed computing system,” presented at the IEEE Int. Symp. on Cluster Computing and the Grid (CCGrid), Lyon, France, May 2008.

W. Guo, Y. Jin, W. Sun, W. Hu, X. Lin, M. Wu, H. Liu, S. Fu, J. Yuan, “Distributed computing over optical networks,” in Optical Fiber Communications Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., San Diego, CA, 2008, paper OWF1.

Hu, W. S.

Jabbari, B.

T. Lehman, J. Sobieski, B. Jabbari, “DRAGON: a framework for service provisioning in heterogeneous grid networks,” IEEE Commun. Mag., vol. 44, no. 3, pp. 84–90, Mar. 2006.
[CrossRef]

Jin, Y.

Z. Sun, W. Guo, Z. Wang, Y. Jin, W. Sun, W. Hu, C. Qiao, “Scheduling algorithm for workflow-based applications in optical grid,” J. Lightwave Technol., vol. 26, pp. 3011–3020, 2008.
[CrossRef]

W. Guo, Z. Liang, Z. Sun, S. Xiao, Y. Jin, W. Sun, W. Hu, “Task scheduling considering fault probability for distributed computing applications over an optical network,” J. Opt. Netw., vol. 7, pp. 947–957, 2008.
[CrossRef]

Z. Sun, W. Guo, Y. Jin, W. Sun, W. Hu, “Fault-tolerant policy for optical network based distributed computing system,” presented at the IEEE Int. Symp. on Cluster Computing and the Grid (CCGrid), Lyon, France, May 2008.

W. Guo, Y. Jin, W. Sun, W. Hu, X. Lin, M. Wu, H. Liu, S. Fu, J. Yuan, “Distributed computing over optical networks,” in Optical Fiber Communications Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., San Diego, CA, 2008, paper OWF1.

Jin, Y. H.

Klonidis, D.

Lehman, T.

T. Lehman, J. Sobieski, B. Jabbari, “DRAGON: a framework for service provisioning in heterogeneous grid networks,” IEEE Commun. Mag., vol. 44, no. 3, pp. 84–90, Mar. 2006.
[CrossRef]

Leiserson, C. E.

T. H. Cormen, C. E. Leiserson, R. L. Rivest, Introduction to Algorithms. Cambridge, MA: MIT, 1990.

Liang, Z.

Lin, X.

W. Guo, Y. Jin, W. Sun, W. Hu, X. Lin, M. Wu, H. Liu, S. Fu, J. Yuan, “Distributed computing over optical networks,” in Optical Fiber Communications Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., San Diego, CA, 2008, paper OWF1.

Liu, H.

W. Guo, Y. Jin, W. Sun, W. Hu, X. Lin, M. Wu, H. Liu, S. Fu, J. Yuan, “Distributed computing over optical networks,” in Optical Fiber Communications Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., San Diego, CA, 2008, paper OWF1.

Liu, X.

X. Liu, C. Qiao, T. Wang, “Survivable optical grids,” in Optical Fiber Communications Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., San Diego, CA, 2008, paper OWN1.

X. Liu, W. Wei, X. Yu, C. Qiao, T. Wang, “Distributed computing task assignment and lightpath establishment (TALE),” presented at the IEEE High Speed Networks Workshop, Anchorage, AK, May 11, 2007.

Medeiros, R.

R. Medeiros, W. Cirne, F. Brasileiro, J. Sauvé, “Faults in grids: why are they so bad and what can be done about it?,” in Proc. of the 4th Int. Workshop on Grid Computing, Tokyo, Japan, 2003, pp. 18–24.

Mukherjee, B.

J. Zhang, B. Mukherjee, “A review of fault management in WDM mesh networks: basic concepts and research challenges,” IEEE Networking, vol. 18, no. 2, pp. 41–48, 2004.
[CrossRef]

L. Song, B. Mukherjee, “Impacts of multiple backups and multi-link sharing among primary and backups for dynamic service provisioning in survivable mesh networks,” in Optical Fiber Communication Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., Anaheim, CA, 2007, paper OThJ3.

Nejabati, R.

O’Mahony, M. J.

Qiao, C.

Z. Sun, W. Guo, Z. Wang, Y. Jin, W. Sun, W. Hu, C. Qiao, “Scheduling algorithm for workflow-based applications in optical grid,” J. Lightwave Technol., vol. 26, pp. 3011–3020, 2008.
[CrossRef]

X. Liu, W. Wei, X. Yu, C. Qiao, T. Wang, “Distributed computing task assignment and lightpath establishment (TALE),” presented at the IEEE High Speed Networks Workshop, Anchorage, AK, May 11, 2007.

X. Liu, C. Qiao, T. Wang, “Survivable optical grids,” in Optical Fiber Communications Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., San Diego, CA, 2008, paper OWN1.

Rivest, R. L.

T. H. Cormen, C. E. Leiserson, R. L. Rivest, Introduction to Algorithms. Cambridge, MA: MIT, 1990.

Sauvé, J.

R. Medeiros, W. Cirne, F. Brasileiro, J. Sauvé, “Faults in grids: why are they so bad and what can be done about it?,” in Proc. of the 4th Int. Workshop on Grid Computing, Tokyo, Japan, 2003, pp. 18–24.

Simeonidou, D.

Sinnen, O.

O. Sinnen, L. Sousa, “Communication contention in task scheduling,” IEEE Trans. Parallel Distrib. Syst., vol. 16, pp. 503–515, 2005.
[CrossRef]

O. Sinnen, L. Sousa, “List scheduling: extension for contention awareness and evaluation of node priorities for heterogeneous cluster architectures,” Parallel Comput., vol. 30, pp. 81–101, 2004.
[CrossRef]

Sobieski, J.

T. Lehman, J. Sobieski, B. Jabbari, “DRAGON: a framework for service provisioning in heterogeneous grid networks,” IEEE Commun. Mag., vol. 44, no. 3, pp. 84–90, Mar. 2006.
[CrossRef]

Song, L.

L. Song, B. Mukherjee, “Impacts of multiple backups and multi-link sharing among primary and backups for dynamic service provisioning in survivable mesh networks,” in Optical Fiber Communication Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., Anaheim, CA, 2007, paper OThJ3.

Sousa, L.

O. Sinnen, L. Sousa, “Communication contention in task scheduling,” IEEE Trans. Parallel Distrib. Syst., vol. 16, pp. 503–515, 2005.
[CrossRef]

O. Sinnen, L. Sousa, “List scheduling: extension for contention awareness and evaluation of node priorities for heterogeneous cluster architectures,” Parallel Comput., vol. 30, pp. 81–101, 2004.
[CrossRef]

Sun, W.

W. Guo, Z. Liang, Z. Sun, S. Xiao, Y. Jin, W. Sun, W. Hu, “Task scheduling considering fault probability for distributed computing applications over an optical network,” J. Opt. Netw., vol. 7, pp. 947–957, 2008.
[CrossRef]

Z. Sun, W. Guo, Z. Wang, Y. Jin, W. Sun, W. Hu, C. Qiao, “Scheduling algorithm for workflow-based applications in optical grid,” J. Lightwave Technol., vol. 26, pp. 3011–3020, 2008.
[CrossRef]

Z. Sun, W. Guo, Y. Jin, W. Sun, W. Hu, “Fault-tolerant policy for optical network based distributed computing system,” presented at the IEEE Int. Symp. on Cluster Computing and the Grid (CCGrid), Lyon, France, May 2008.

W. Guo, Y. Jin, W. Sun, W. Hu, X. Lin, M. Wu, H. Liu, S. Fu, J. Yuan, “Distributed computing over optical networks,” in Optical Fiber Communications Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., San Diego, CA, 2008, paper OWF1.

Sun, W. Q.

Sun, Z.

Tzanakaki, A.

Wang, T.

X. Liu, C. Qiao, T. Wang, “Survivable optical grids,” in Optical Fiber Communications Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., San Diego, CA, 2008, paper OWN1.

X. Liu, W. Wei, X. Yu, C. Qiao, T. Wang, “Distributed computing task assignment and lightpath establishment (TALE),” presented at the IEEE High Speed Networks Workshop, Anchorage, AK, May 11, 2007.

Wang, Y.

Wang, Z.

Wei, W.

X. Liu, W. Wei, X. Yu, C. Qiao, T. Wang, “Distributed computing task assignment and lightpath establishment (TALE),” presented at the IEEE High Speed Networks Workshop, Anchorage, AK, May 11, 2007.

Wu, M.

W. Guo, Y. Jin, W. Sun, W. Hu, X. Lin, M. Wu, H. Liu, S. Fu, J. Yuan, “Distributed computing over optical networks,” in Optical Fiber Communications Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., San Diego, CA, 2008, paper OWF1.

Wu, M. Y.

Xiao, S.

Yu, X.

X. Liu, W. Wei, X. Yu, C. Qiao, T. Wang, “Distributed computing task assignment and lightpath establishment (TALE),” presented at the IEEE High Speed Networks Workshop, Anchorage, AK, May 11, 2007.

Yuan, J.

W. Guo, Y. Jin, W. Sun, W. Hu, X. Lin, M. Wu, H. Liu, S. Fu, J. Yuan, “Distributed computing over optical networks,” in Optical Fiber Communications Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., San Diego, CA, 2008, paper OWF1.

Zervas, G.

Zhang, J.

J. Zhang, B. Mukherjee, “A review of fault management in WDM mesh networks: basic concepts and research challenges,” IEEE Networking, vol. 18, no. 2, pp. 41–48, 2004.
[CrossRef]

IEEE Commun. Mag. (1)

T. Lehman, J. Sobieski, B. Jabbari, “DRAGON: a framework for service provisioning in heterogeneous grid networks,” IEEE Commun. Mag., vol. 44, no. 3, pp. 84–90, Mar. 2006.
[CrossRef]

IEEE Networking (1)

J. Zhang, B. Mukherjee, “A review of fault management in WDM mesh networks: basic concepts and research challenges,” IEEE Networking, vol. 18, no. 2, pp. 41–48, 2004.
[CrossRef]

IEEE Trans. Parallel Distrib. Syst. (1)

O. Sinnen, L. Sousa, “Communication contention in task scheduling,” IEEE Trans. Parallel Distrib. Syst., vol. 16, pp. 503–515, 2005.
[CrossRef]

J. Lightwave Technol. (2)

J. Opt. Netw. (2)

Parallel Comput. (1)

O. Sinnen, L. Sousa, “List scheduling: extension for contention awareness and evaluation of node priorities for heterogeneous cluster architectures,” Parallel Comput., vol. 30, pp. 81–101, 2004.
[CrossRef]

Other (7)

T. H. Cormen, C. E. Leiserson, R. L. Rivest, Introduction to Algorithms. Cambridge, MA: MIT, 1990.

L. Song, B. Mukherjee, “Impacts of multiple backups and multi-link sharing among primary and backups for dynamic service provisioning in survivable mesh networks,” in Optical Fiber Communication Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., Anaheim, CA, 2007, paper OThJ3.

X. Liu, W. Wei, X. Yu, C. Qiao, T. Wang, “Distributed computing task assignment and lightpath establishment (TALE),” presented at the IEEE High Speed Networks Workshop, Anchorage, AK, May 11, 2007.

X. Liu, C. Qiao, T. Wang, “Survivable optical grids,” in Optical Fiber Communications Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., San Diego, CA, 2008, paper OWN1.

Z. Sun, W. Guo, Y. Jin, W. Sun, W. Hu, “Fault-tolerant policy for optical network based distributed computing system,” presented at the IEEE Int. Symp. on Cluster Computing and the Grid (CCGrid), Lyon, France, May 2008.

W. Guo, Y. Jin, W. Sun, W. Hu, X. Lin, M. Wu, H. Liu, S. Fu, J. Yuan, “Distributed computing over optical networks,” in Optical Fiber Communications Conf. and Expo. and the Nat. Fiber Optic Engineers Conf., San Diego, CA, 2008, paper OWF1.

R. Medeiros, W. Cirne, F. Brasileiro, J. Sauvé, “Faults in grids: why are they so bad and what can be done about it?,” in Proc. of the 4th Int. Workshop on Grid Computing, Tokyo, Japan, 2003, pp. 18–24.

Cited By

OSA participates in CrossRef's Cited-By Linking service. Citing articles from OSA journals and other participating publishers are listed here.

Alert me when this article is cited.


Figures (8)

Fig. 1
Fig. 1

Illustration of a DAG application.

Fig. 2
Fig. 2

Optical grid system model.

Fig. 3
Fig. 3

ADS architecture.

Fig. 4
Fig. 4

ADS algorithm.

Fig. 5
Fig. 5

16-node NSFNET network topology.

Fig. 6
Fig. 6

Performance effects of different network topologies and available wavelength resources.

Fig. 7
Fig. 7

Performance impact of the CCR on (a) job completion time, (b) availability and CTPR, and (c) network resource utilization.

Fig. 8
Fig. 8

Performance impact of the deadline on (a) availability and CTPR and (b) network resource utilization

Tables (1)

Tables Icon

Table 1 Reference System Parameters

Equations (17)

Equations on this page are rendered with MathJax. Learn more.

t ea k , v j ( v i ) = { f ( v j ) if g ( v i ) = g ( v j ) t es ( e j i ) + c ( e j i ) BW otherwise } ,
Case 1 : t es ( e j i ) = max { t ea ( lp ( e j i ) ) , f ( v j ) } .
Case 2 : t es ( e j i ) = max { t ea ( lp ( e j i P ) ) , t ea ( lp ( e j i B ) ) , f ( v j ) } .
t ea k ( v i ) = max e j i E { t ea k , v j ( v i ) } .
t es ( v i ) = min g k r G { t es k ( v i ) } .
f ( v i ) = t es ( v i ) + c ( v i ) p ,
a j = MTTF MTTF + MTTR .
A e = { j P a j NPP A p + ( 1 A p ) × A b DPP } ,
AL ( DAG ) = e E w e A e .
Maximize AL ( DAG ) = e E w e A e
Subject to : SL ( DAG ) d .
BL ( v m ) = c ( v m ) + Max v n SUC ( v m ) [ BL ( v n ) + c ( e m n ) ] .
θ e = w e Δ A e Δ H ( p ) = w e × ( A e DPP A e NPP ) ( H ( p e DPP ) + H ( b e DPP ) H ( p e NPP ) ) .
O ( | V | log | V | + | E | ) + O { k | G | ( | V | + | E | O ( routing ) ) } .
CTPR ( DAG ) = | E ADS | | E | .
NRU = e i E l P ( e i ) BW ( λ l ( e i ) ) × [ ET ( e i ) ST ( e i ) ] all invovled l and λ BW ( λ l ) × SL .
D = ( F NPP + F DPP ) 2 .