Experimental assessment of dynamic integrated restoration in GMPLS multi-layer (MPLS-TP/WSON) networks

Ricardo Martínez; Ramon Casellas; Raül Muñoz

doi:10.1364/OE.21.005481

1. Introduction

The integration of packet switching (Multi-Protocol Label Switching – Transport Profile, MPLS-TP) and optical circuit switching (Wavelength Switched Optical Networks, WSON) in a multi-layer network (MLN) is a candidate solution for a more cost- and energy-efficient, scalable transport network infrastructure that offloads the IP convergence layer. To fully leverage the benefits of combining both layers (i.e., MPLS-TP bandwidth flexibility/granularity and WSON transport capacity), a GMPLS unified control plane (UCP) is adopted. The UCP allows deploying MLN Traffic Engineering (TE) strategies [1]. TE functions are the processes that optimize the performance of a telecommunications network by dynamically analyzing, predicting and regulating the traffic flows transported throughout the network. In general, these functions are conceived to make the most efficient use of the global network resources. In an MLN context, the purpose of TE strategies (grooming [1]) is to merge/group low data rate, flexible higher-layer connections (e.g., Packet Switch Capable, PSC, connections) with small bandwidth requirements into new or already established high data rate lower-layer connections with coarse bandwidth (i.e., Lambda Switch Capable, LSC, tunnels). In other words, the network resources of the lower-layer Label Switched Paths (LSPs), such as bandwidth and ports, can be efficiently reused when dynamically provisioning and restoring higher-layer LSPs. To this end, each node has a unified view of the topology and resources in all the layers, gathered in the TE Database (TED).

An MLN must be designed to be fault-tolerant [2]. Survivability schemes such as protection and restoration are crucial to recover disrupted LSPs rapidly and efficiently. In this work, we focus on dynamic, coordinated MLN GMPLS-based restoration. In the literature, restoration techniques are generally classified as uncoordinated, sequential, and integrated [2,3]. The uncoordinated scheme is the simplest one in terms of control complexity: each layer deploys its own recovery strategy with no cooperation with the other switching layers. Its drawback is that the amount of network resources used for restoring the affected connection services may be very large. In the sequential scheme, the restoration is handed over to the next layer once the current layer is unable to restore. Two approaches exist: bottom-up, starting at the lowest layer and escalating upwards, and top-down, starting at the highest layer and going downwards. Finally, integrated restoration uses the global UCP view of all the involved layers, i.e., without performing independent restoration on each layer. This attains the most efficient use of the network resources at the expense of increased complexity.

Although several works have proposed efficient MLN restoration strategies [2,3], their restoration performance has not yet been evaluated within an experimental GMPLS UCP. Thus, the objective of this work is twofold: firstly, to present the GMPLS UCP and path computation algorithm for integrated restoration in MLN; secondly, to experimentally evaluate its performance within the CTTC ADRENALINE testbed in terms of blocking, restoration time and restorability.

2. GMPLS UCP for integrated restoration

Figure 1 shows the sequence of GMPLS control functions and exchanged messages to support dynamic and automatic MLN restoration. The aim is to restore MPLS-TP (PSC) LSPs set up over optical (LSC) tunnels after an optical link failure occurs.

Fig. 1 Multi-layer control plane restoration sequence.

Let us assume that there is a working PSC LSP (w_PSC_LSP) connecting nodes 1 and 12. This packet LSP was set up through the path formed by the nodes 1-11-8-7-12. Consequently, this is an MLN path with a change of layer on links 1-11 (from PSC to LSC) and 7-12 (from LSC to PSC). Accordingly, before setting up the targeted PSC LSP, a lower-layer LSC LSP (w_LSC_LSP) was established between the MPLS-TP end-nodes. This, in turn, induced a logical, higher-layer TE link (Forwarding Adjacency, FA [4]), which is used to route the demanded w_PSC_LSP.

The induced FA TE link is a logical link that inherits most of its TE attributes from its associated (lower-layer) optical FA LSP. For example, the unreserved bandwidth of a PSC FA TE link is set to the data rate supported by the allocated DWDM channel. Additionally, the shared risk link group (SRLG) attribute, used for ensuring SRLG-disjointness between working and restoration paths, is generally set to the union of all the SRLGs tied to the underlying optical links traversed by the working LSC LSP. Hence, a restoration path that is disjoint from the working path can be computed by applying SRLG-diversity during the path computation.
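The attribute inheritance described above can be illustrated with a minimal Python sketch. The class and function names (`OpticalLink`, `FaTeLink`, `induce_fa_link`) are hypothetical, and the assignment of one SRLG per optical link is an assumption made only for illustration; the 10 Gbps channel rate is the one used later in the experimental setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpticalLink:
    """A lower-layer TE link together with the SRLGs it belongs to."""
    src: int
    dst: int
    srlgs: frozenset


@dataclass(frozen=True)
class FaTeLink:
    """Logical PSC TE link induced by an optical (LSC) FA LSP."""
    src: int
    dst: int
    unreserved_gbps: float
    srlgs: frozenset


def induce_fa_link(hops, channel_rate_gbps):
    """Derive the FA TE link attributes from the links of its FA LSP."""
    # SRLG attribute: union of the SRLGs of every traversed optical link.
    srlgs = frozenset().union(*(link.srlgs for link in hops))
    # Unreserved bandwidth: data rate of the allocated DWDM channel.
    return FaTeLink(hops[0].src, hops[-1].dst, channel_rate_gbps, srlgs)


# Hypothetical SRLG-to-link mapping for the working tunnel 1-11-8-7-12.
w_lsc_lsp = [
    OpticalLink(1, 11, frozenset({"SRLG_A"})),
    OpticalLink(11, 8, frozenset({"SRLG_C"})),
    OpticalLink(8, 7, frozenset({"SRLG_E"})),
    OpticalLink(7, 12, frozenset({"SRLG_H"})),
]
fa_link = induce_fa_link(w_lsc_lsp, channel_rate_gbps=10.0)
```

Under these assumptions, `fa_link` connects nodes 1 and 12, advertises 10 Gbps of unreserved bandwidth, and carries the four SRLGs of the working optical tunnel.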

Assuming the scenario shown in Fig. 1, the following control plane functions enable the automatic restoration of the affected w_PSC_LSP after an optical/physical failure occurs in the link connecting nodes 8 and 7. Figure 2 shows the OSPF-TE routing and RSVP-TE signaling messages captured at node 1 (Label Switched Router 1, LSR1). For clarity, as depicted in Fig. 2, the RSVP-TE messages are routed according to the node ID IP addresses (i.e., 10.0.50.X). This node ID information is conveyed in the corresponding protocol objects (e.g., OSPF-TE Router Address TLV, RSVP-TE HOP, etc.). Additionally, the OSPF-TE protocol uses the IP addresses associated with the control channels to route the control protocol messages. At the control plane level, each pair of LSRs is connected through an IP control channel identified by a pair of IP addresses; 10.0.5.13 and 10.0.5.31 are the IP addresses of the control channel between LSR1 and LSR11. Finally, as described in [5], 224.0.0.5 is the OSPF multicast IP address that allows each LSR running OSPF to receive packets sent to this address.

Fig. 2 Exchanged OSPF-TE / RSVP-TE packets for MLN restoration at LSR1; time (seconds) is the elapsed time with respect to the reference time at Notify message.

a. After the failure is detected, the upstream node (LSR8) adjacent to the failed link starts the notification process: for each disrupted optical connection (LSC LSP), an RSVP-TE Notify message is sent to the ingress node of that optical tunnel, carrying specific link failure information (failed link 8-7). This step is crucial because it allows the restoration process to compute a path disjoint from the failed link. Furthermore, an OSPF-TE link state update (LS_Upd1) is flooded to update the nodes' TEDs about the failed link.
b. Once the Notify message reaches the LSC LSP ingress node (LSR1), a break-before-make strategy is applied: the failed optical tunnel (w_LSC_LSP) is first torn down using the RSVP-TE Path Tear message. This releases the wavelength channels occupied on the links traversed by the failed connection. Consequently, the TE link attributes (e.g., bandwidth and wavelength channel status) associated with those links are updated (i.e., LS_Upd2, LS_Upd3 and LS_Upd4).
At this point, the logical (FA) link associated with the removed optical tunnel is no longer usable. In other words, the path computation process cannot use this link for subsequent connection requests, since its underlying optical connection has been removed. Therefore, the TE information (OSPF-TE Link State Advertisement, TE LSA) associated with the blocked link needs to be immediately flushed from the nodes' TEDs. This is achieved by flooding the LS_Upd3 message, whose TE LSA explicitly indicates that it must be removed from the TED when it is processed at every control plane instance.
Once the blocked logical FA TE link has been removed, the PSC LSPs routed over it need to be restored. Therefore, additional RSVP-TE Notify messages are sent to the ingress node of each interrupted PSC LSP, identifying the affected PSC (FA) link. In the example, LSR1 acts as both the ingress of the PSC LSP and the origin of the affected PSC (FA) link. Next, the existing w_PSC_LSPs routed over the blocked FA TE link are torn down.
c. At this step, the integrated restoration path computation is triggered at the ingress LSR1 to set up the restoration path of the interrupted PSC LSPs (r_PSC_LSP). The input for the restoration path computation is the topology graph constructed from the gathered TED, together with the information carried in the received Notify message (i.e., the blocked FA TE link).
For the blocked logical link, the SRLGs of the associated optical tunnel (i.e., SRLG_A, SRLG_C, SRLG_E and SRLG_H) are first retrieved and then used to ensure SRLG-disjointness between the working and restoration paths. Observe that, with this method, not only the failed optical link (i.e., link 8-7) is excluded, but also the remaining (not failed) optical links of the disrupted optical tunnel that induced the failed PSC FA TE link. Systematically discarding usable links (i.e., links not affected by the failure) may lead to suboptimal restoration (i.e., poor use of the network resources and lower restorability). The advantage, however, is that the restoration delay, which is critical for network operators, is minimized, since there is no need to localize the failure before starting the restoration process. It is worth noting that failure localization is complex and time-consuming in transparent WSON due to the propagation of errors.
In the example, the restoration path is formed by the nodes 1-11-9-7-12 (i.e., SRLG_B, SRLG_D, SRLG_F and SRLG_G), which fulfills the SRLG-diversity constraint. Observe that this restoration path again involves a change of layers. Consequently, it is necessary, first, to set up the optical tunnel (r_LSC_LSP) and update the link state (i.e., LS_Upd6, LS_Upd7, LS_Upd8 and LS_Upd9) of each link constituting the route; and second, to create and disseminate the newly induced PSC FA TE link (LS_Upd10) between LSR1 and LSR12.
d. Finally, the r_PSC_LSP is established through the new PSC FA TE link, whose available bandwidth is then updated (LS_Upd11).
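Steps a and b can be summarized as a cascade from the failed optical link to the PSC LSPs that must be restored. The Python sketch below is only an illustration of that cascade under simplifying assumptions: the data layout, the function name `handle_optical_link_failure`, the FA link name `FA_1_12` and the event labels are hypothetical, each LSC LSP is assumed to induce exactly one FA TE link, and the OSPF-TE flooding of link state updates is not modeled.

```python
def handle_optical_link_failure(failed_link, lsc_lsps, fa_of_lsc, psc_lsps):
    """Return (events, to_restore) for one failed optical link.

    lsc_lsps:  {name: {"ingress": node, "links": [(u, v), ...]}}
    fa_of_lsc: {LSC LSP name: name of the FA TE link it induced}
    psc_lsps:  {name: {"ingress": node, "fa_links": [FA link names]}}
    """
    events, to_restore = [], []
    failed = [n for n, lsp in lsc_lsps.items() if failed_link in lsp["links"]]
    for name in failed:
        lsp = lsc_lsps.pop(name)
        # Step a: Notify the ingress of each disrupted optical tunnel.
        events.append(("Notify", lsp["ingress"], name))
        # Step b: break-before-make, tear down the failed tunnel first.
        events.append(("PathTear", lsp["ingress"], name))
        # The FA TE link induced by the tunnel is flushed from the TED.
        fa = fa_of_lsc.pop(name)
        events.append(("FlushFaTeLink", lsp["ingress"], fa))
        # Notify and tear down every PSC LSP routed over that FA TE link.
        affected = [n for n, p in psc_lsps.items() if fa in p["fa_links"]]
        for psc_name in affected:
            psc = psc_lsps.pop(psc_name)
            events.append(("Notify", psc["ingress"], psc_name))
            events.append(("PathTear", psc["ingress"], psc_name))
            # Step c input: the PSC LSP to restore and the blocked FA link.
            to_restore.append((psc_name, fa))
    return events, to_restore


lsc_lsps = {"w_LSC_LSP": {"ingress": 1,
                          "links": [(1, 11), (11, 8), (8, 7), (7, 12)]}}
fa_of_lsc = {"w_LSC_LSP": "FA_1_12"}
psc_lsps = {"w_PSC_LSP": {"ingress": 1, "fa_links": ["FA_1_12"]}}
events, to_restore = handle_optical_link_failure(
    (8, 7), lsc_lsps, fa_of_lsc, psc_lsps)
```

For the failure of link 8-7 in the example, the sketch yields a single PSC LSP to restore (w_PSC_LSP) together with the blocked FA TE link whose SRLGs must be excluded in step c.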

3. MLN working / restoration path computation

We consider an on-line MLN path computation algorithm for both working and restoration PSC LSPs. It relies on a constrained shortest path computation [6] that considers all the (physical / logical) optical and packet links in an integrated way. It aims at fully leveraging the grooming opportunities between the source and destination nodes while satisfying the following set of constraints:

• An LSP must be initiated and terminated on the same switching capability [4].
• A path may traverse one or more lower switching layers but the correct adaptation among them must be guaranteed following the GMPLS hierarchy [7].
• The candidate TE links eligible to form the selected path must have unreserved bandwidth equal to or larger than the bandwidth demanded by the LSP request.
• For the restoration path, once the SRLGs of the excluded link are resolved, the computed path excludes any link tied to these SRLGs.

For both working and restoration paths, all the optical sub-paths constituting the route are subject to the wavelength continuity constraint, which is addressed by RSVP-TE signaling.
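The bandwidth and SRLG constraints above can be sketched as a pruning step followed by a hop-count shortest path search. This is a simplified illustration, not the algorithm of [6]: the function name, the link tuple layout and the SRLG-to-link mapping of the example topology are assumptions, links are treated as unidirectional, and neither the switching-capability adaptation rules nor the wavelength continuity constraint is modeled.

```python
import heapq
from itertools import count


def constrained_shortest_path(links, src, dst, demand_gbps,
                              excluded_srlgs=frozenset()):
    """Min-hop path over links that satisfy bandwidth and SRLG constraints.

    links: iterable of (u, v, unreserved_gbps, srlgs) directed TE links.
    Returns (node_list, srlgs_used) or None when no feasible path exists.
    """
    excluded = set(excluded_srlgs)
    adjacency = {}
    for u, v, unreserved, srlgs in links:
        if unreserved < demand_gbps:   # bandwidth constraint
            continue
        if excluded & set(srlgs):      # SRLG-diversity constraint
            continue
        adjacency.setdefault(u, []).append((v, frozenset(srlgs)))

    tie = count()  # tie-breaker so the heap never compares paths
    heap = [(0, next(tie), src, [src], frozenset())]
    visited = set()
    while heap:
        hops, _, node, path, used = heapq.heappop(heap)
        if node == dst:
            return path, used
        if node in visited:
            continue
        visited.add(node)
        for nxt, srlgs in adjacency.get(node, []):
            if nxt not in visited:
                heapq.heappush(
                    heap, (hops + 1, next(tie), nxt, path + [nxt], used | srlgs))
    return None


# Hypothetical topology: parallel links model the PSC-LSC ports.
links = [
    (1, 11, 10, {"SRLG_A"}), (1, 11, 10, {"SRLG_B"}),
    (11, 8, 10, {"SRLG_C"}), (11, 9, 10, {"SRLG_D"}),
    (8, 7, 10, {"SRLG_E"}), (9, 7, 10, {"SRLG_F"}),
    (7, 12, 10, {"SRLG_G"}), (7, 12, 10, {"SRLG_H"}),
]
restoration = constrained_shortest_path(
    links, 1, 12, demand_gbps=1,
    excluded_srlgs={"SRLG_A", "SRLG_C", "SRLG_E", "SRLG_H"})
```

With the SRLGs of the working tunnel excluded, the only feasible route in this toy topology is 1-11-9-7-12 over SRLG_B, SRLG_D, SRLG_F and SRLG_G, matching the restoration path of the example in Section 2.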

4. Experimental performance assessment

The GMPLS/PCE control plane platform of the CTTC ADRENALINE testbed is used for the experimental performance assessment. Figure 3 depicts the MLN topology formed by 7 PSC and 7 LSC LSRs. Each PSC LSR is physically connected to its associated LSC LSR through p bidirectional ports. The bidirectional optical links support 8 WDM channels per direction operating at 10 Gbps. We consider a dynamic stochastic model for both PSC LSP requests and link failure events. The arrival processes of the packet connection (i.e., PSC LSP) requests and of the failure events are Poisson, and both the holding times of the LSPs and the durations of the failures follow a negative exponential distribution. The total offered traffic load is fixed to 25 Erlangs (Er), with an average inter-arrival time of 10 s and an average holding time of 250 s. This traffic is uniformly distributed among the PSC LSRs, and each PSC LSP requests 1 Gbps. The average failure inter-arrival time (fiat) is set to 135 s and 365 s, and the mean repair time (mrp) is set to 50 s. Link failures are uniformly distributed exclusively among the optical (LSC-LSC) TE links. Each performance value is obtained by requesting 5k PSC LSPs. For the sake of completeness, the propagation delay of the control infrastructure (i.e., channels) used for exchanging the OSPF-TE and RSVP-TE messages is set to a value ranging from 0 to 6 ms.
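The offered load follows directly from the stated parameters: a mean holding time of 250 s divided by a mean inter-arrival time of 10 s gives 25 Erlangs. The Python sketch below illustrates one way such a request trace could be generated; the function names, the node numbering 1 to 7 for the PSC LSRs, the fixed seed and the uniform choice of distinct source/destination pairs are assumptions, not details of the testbed traffic generator.

```python
import random


def offered_load_erlangs(mean_interarrival_s, mean_holding_s):
    """Offered load = mean holding time / mean inter-arrival time."""
    return mean_holding_s / mean_interarrival_s


def generate_requests(n, mean_interarrival_s, mean_holding_s, nodes, seed=0):
    """Poisson arrivals with exponentially distributed holding times.

    Returns a list of (arrival_time_s, holding_time_s, src, dst) tuples.
    """
    rng = random.Random(seed)
    now = 0.0
    requests = []
    for _ in range(n):
        # Exponential inter-arrival times yield a Poisson arrival process.
        now += rng.expovariate(1.0 / mean_interarrival_s)
        holding = rng.expovariate(1.0 / mean_holding_s)
        # Uniformly pick two distinct PSC LSRs as source and destination.
        src, dst = rng.sample(nodes, 2)
        requests.append((now, holding, src, dst))
    return requests


psc_lsrs = list(range(1, 8))  # 7 PSC LSRs; the node IDs are hypothetical
load = offered_load_erlangs(10.0, 250.0)
requests = generate_requests(5000, 10.0, 250.0, psc_lsrs)
```

The same generator could be reused for the failure events by substituting the fiat and mrp values for the inter-arrival and holding times.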

Fig. 3 Physical multi-layer network topology.

The performance metrics of interest are: for the working path, the path computation time, the setup delay (contribution from both path computation time and end-to-end signaling process), and the connection blocking probability (BP); for the restoration path, the restoration path computation time, the restoration time and the restorability. The latter is defined as the ratio between the successfully restored PSC LSPs and the total failed PSC LSPs.

Table 1 gathers the performance metric results for different numbers of ports (p: 2..4). We observe that, for the working paths, the BP generally decreases as the number of PSC-LSC ports / transceivers increases (e.g., 0.12% for p: 2, 0.02% for p: 3 and 0% for p: 4). Indeed, a higher number of ports increases the likelihood of finding a feasible route and allows better exploitation of the grooming opportunities appearing at the intermediate nodes. This leads to a more efficient use of the network resources, at the expense, however, of considering more physical and logical links when computing the path. Consequently, the path computation time increases.

Table 1. Performance metrics.

A similar behavior is observed when restoring failed PSC LSPs: more available ports improve the success of the overall path computation (grooming), which, in turn, improves the overall restorability (e.g., 42.1% with p: 2 and 60.3% with p: 4 at fiat: 135s). Note that this is a critical aspect since, as mentioned above, the adopted restoration path computation systematically excludes all the underlying SRLGs of the failed PSC LSPs regardless of the actual failed physical link. Consequently, the larger the routing solution space, the easier the restoration path computation. In other words, with more physical and logical links in the constructed graph, the restoration path computation is more likely to find a path that satisfies the required SRLG-disjointness restriction. In this context, two implications are observed: firstly, the restoration path computation time increases; secondly, the restoration time decreases slightly as the number of ports grows. Indeed, a larger number of ports promotes the reuse of established logical links (grooming). This, besides favoring the restorability, reduces the occupation of new (optical) resources when serving new connection requests. Consequently, more direct logical links are used, which lowers both the signaling processing and the restoration time.

Last but not least, it is worth observing that the restoration time is around 25-34 ms higher than the working setup delay. This significant increase is due to three factors: the notification process, the restoration path computation and the signaling process. First, notifying each source node of the failed optical connection adds to the overall restoration time. Second, from Table 1, we observe that the restoration path computation is, on average, more time consuming than the computation of the working path. The reason is the application of the SRLG-disjointness constraint, which increases the time needed to find a feasible path. Finally, the computed restoration paths tend to traverse longer routes (in hops) than the working paths in order to exclude a failed link, which lengthens the signaling process.
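The 25-34 ms figure can be checked against Table 1 by subtracting the working setup delay from the restoration time in each row. The short Python sketch below does this with the values transcribed from the table; the variable and function names are illustrative only.

```python
# (p, fiat, working setup delay in ms, restoration time in ms) from Table 1
TABLE_1 = [
    (2, "135s", 60.4, 94.1),
    (2, "365s", 59.3, 89.7),
    (3, "135s", 54.9, 84.1),
    (3, "365s", 54.1, 82.2),
    (4, "135s", 54.2, 81.3),
    (4, "365s", 54.2, 79.9),
]


def restoration_overhead_ms(rows):
    """Restoration time minus working setup delay, per table row."""
    return [round(rest - setup, 1) for _, _, setup, rest in rows]


overheads = restoration_overhead_ms(TABLE_1)
smallest, largest = min(overheads), max(overheads)
```

The differences range from 25.7 ms (p: 4, fiat: 365s) to 33.7 ms (p: 2, fiat: 135s), consistent with the range quoted above.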

5. Conclusions

This work summarizes the integrated restoration (GMPLS UCP and path computation) deployed within an MLN. It is experimentally assessed under dynamic traffic and failure generation, for different numbers of PSC-LSC ports, in terms of the blocking probability, path computation time, restorability and restoration time. As expected, the number of ports significantly affects how well grooming is exploited: more ports reduce the overall BP of working paths and increase the restorability when such working LSPs are disrupted.

The applicability of the presented GMPLS-based integrated restoration scheme to nationwide networks (e.g., comprising hundreds of nodes and serving a large number of concurrent connections) depends mainly on the dynamicity of both the connection demands (i.e., LSP inter-arrival and holding times) and the failure occurrences (fiat and mrp). It is expected that, even in large network scenarios, the path computation time, working setup delay and restoration time are not significantly impacted as long as connection arrivals and failure events occur on the same time scale (i.e., seconds) and the network has been correctly dimensioned.

Acknowledgments

This work was supported by the Spanish MINECO through the project DORADO (TEC2009-07995), and by the EC’s FP7 through the IP STRONGEST (247674).

References and links

1. E. Oki, K. Shiomoto, D. Shimazaki, N. Yamanaka, W. Imajuku, and Y. Takigawa, “Dynamic multilayer routing schemes in GMPLS-based IP+optical networks,” IEEE Commun. Mag. 43(1), 108–114 (2005).

2. P. Chołda and A. Jajszczyk, “Recovery and its quality in multilayer networks,” J. Lightwave Technol. 28(4), 372–389 (2010).

3. X. Cui, J. Wang, X. Yao, W. Liu, H. Xie, and Y. Li, “Optimization of multilayer restoration and routing in IP-over-WDM networks,” in National Fiber Optic Engineers Conference (NFOEC), NWD3 (2008).

4. K. Shiomoto, D. Papadimitriou, J. L. Le Roux, M. Vigoureux, and D. Brungard, “Requirements for GMPLS-based multi-region and multi-layer networks (MLN/MRN),” IETF RFC 5212 (2008), http://tools.ietf.org/html/rfc5212

5. J. Moy, “OSPF Version 2,” IETF RFC 2328 (1998), http://www.ietf.org/rfc/rfc2328.txt

6. R. Martinez, R. Casellas, and R. Muñoz, “Experimental validation / evaluation of a GMPLS unified control plane in multi-Layer (MPLS-TP/WSON) networks,” in National Fiber Optic Engineers Conference (NFOEC), NTu2J (2012).

7. K. Kompella and Y. Rekhter, “Label switched paths (LSP) hierarchy with generalized multi-protocol label switching (GMPLS) traffic engineering (TE),” IETF RFC 4206 (2005), http://tools.ietf.org/html/rfc4206

Table 1 data (failure generation with mrp: 50 s in all rows)

Traffic; p	Failure gen.	Working: Path Comp. (ms)	Working: Setup Delay (ms)	Working: BP (%)	Restoration: Path Comp. (ms)	Restoration: Rest. Time (ms)	Restoration: Restorability (%)
25 Er; p: 2	fiat: 135s	1.2	60.4	0.12	3.6	94.1	42.1
25 Er; p: 2	fiat: 365s	1.1	59.3	0.12	3.3	89.7	48.1
25 Er; p: 3	fiat: 135s	2.3	54.9	0.02	5.1	84.1	56.1
25 Er; p: 3	fiat: 365s	2.2	54.1	0.02	5.1	82.2	60.8
25 Er; p: 4	fiat: 135s	3.4	54.2	0	6.8	81.3	60.3
25 Er; p: 4	fiat: 365s	3.3	54.2	0	6.9	79.9	62.4

Optics Express