Inter-data center interconnect with IP over elastic optical network (EON) is a promising scenario for meeting the high-burstiness and high-bandwidth requirements of data center services. In our previous work, we implemented multi-stratum resources integration among IP network, optical network and application stratum resources to accommodate data center services. Building on that work, this study considers service resilience in case of edge optical node failure. We propose a novel multi-stratum resources integrated resilience (MSRIR) architecture for services in software defined inter-data center interconnect based on IP over EON. A global resources integrated resilience (GRIR) algorithm is introduced based on the proposed architecture. MSRIR enables cross stratum optimization, provides resilience using the resources of multiple stratums, and enhances the responsiveness of data center service resilience to dynamic end-to-end service demands. The overall feasibility and efficiency of the proposed architecture are experimentally verified on the control plane of our OpenFlow-based enhanced SDN (eSDN) testbed. The performance of the GRIR algorithm under a heavy traffic load scenario is also quantitatively evaluated on the MSRIR architecture in terms of path blocking probability, resilience latency and resource utilization, and compared with other resilience algorithms.
© 2015 Optical Society of America
Nowadays, data center-supported services such as cloud computing have attracted much attention from service providers and network operators due to their rapid evolution in recent years. Since data center services are typically diverse in terms of required bandwidths and usage patterns, the network traffic shows high-burstiness and high-bandwidth characteristics, which poses a significant challenge to data center networking: it calls for more efficient interconnect with reduced latency and high bandwidth. The data center interconnect consists of two main networking scenarios, i.e., intra-data center and inter-data center. Intra-data center interconnect offers fully connected all-to-all parallel interconnection among thousands of servers and racks in a flattened network topology. In contrast, inter-data center interconnect is designed to adaptively accommodate bursty and high-capacity traffic between data centers after the traffic has been converged into the core optical switch. To provide these applications with highly available and energy-efficient flexible connectivity in the inter-data center scenario, the architecture of IP over elastic optical network (EON) has been widely studied as a way to construct economical networks, since it reduces the pressure on and cost of power-hungry core routers by bypassing them with elastic optical paths. This architecture can be realized with the multi-flow transponder, which is proposed to offload IP traffic onto elastic optical paths at arbitrary bandwidth granularity: by slicing the transponder into virtual sub-transponders, a single transponder can generate multiple optical paths of various kinds. To support operational flexibility, IP over EON using multi-flow transponders can provide the required interaction for interconnecting geographically distributed data centers and supply the required bandwidth in a highly dynamic and efficient manner.
On the other hand, many delay-sensitive data center services require high-level end-to-end quality of service (QoS) guarantees. If a node failure occurs in the IP over EON interconnecting data centers, network restoration provided at multiple stratums (e.g., the IP stratum and the optical stratum) for dynamic traffic becomes much more complex because of the multi-stratum resources integration (MSRI) involved. How to ensure high-performance QoS for user demands after a failure therefore deserves research attention. So far, optical network survivability in the IP over optical network scenario has been well studied for failure events [8–10]. Traditionally, optical network restoration or protection algorithms and mappings provide backup or new paths by considering only optical stratum resources. Once the edge optical node (i.e., the first optical node where the flow is offloaded into the optical network) breaks down, traditional algorithms that use only optical stratum resources can hardly provide protection and resilience. In such a scenario, the traffic flow cannot be shifted from the IP router to the EON, which means that the service is difficult to restore by considering the optical stratum resources alone. However, network resilience provided at multiple stratums for dynamic traffic in inter-data center interconnect based on IP over EON has not been addressed.
Moreover, the control of IP networks, optical networks and data centers is usually deployed separately in the inter-data center interconnect architecture. For example, IP networks run the IP/MPLS control plane and optical networks employ the control plane offered by generalized multi-protocol label switching (GMPLS), while cloud service providers operate the OpenStack platform to orchestrate data center resources. Service requests can only be exchanged across the user-network interface (UNI) between different networks. The availability of comprehensive resource information, together with a flexible and multi-stratum integrated control system, becomes a key issue for enabling end-to-end dynamic resilience and meeting high-level performance requirements in case of a failure. Recent works [14, 15] demonstrated that a centralized software control architecture, software defined networking (SDN) enabled by the OpenFlow protocol, can provide maximum flexibility for operators and unified control over various resources for the joint optimization of functions and services with a global view [16–18]. Therefore, operators are now trying to apply the SDN/OpenFlow technique to globally control multi-stratum resources and realize resilience in such converged IP, optical and data center networks.
Multi-stratum resources integration among IP network, optical network and application stratum resources, which accommodates data center services, has already been discussed in our previous works [6, 7]. Building on that work, this paper proposes a novel multi-stratum resources integrated resilience (MSRIR) architecture for data center services in software defined inter-data center interconnect based on IP over EON. Additionally, a global resources integrated resilience (GRIR) algorithm for MSRIR is introduced based on the proposed architecture. Unlike previous works, which restore services by exploiting the optical stratum alone, the proposed scheme provides resilience using the resources of multiple stratums in case of an edge optical node failure. MSRIR enables joint optimization of IP network, EON and data center application stratum resources, and enhances the responsiveness of data center service resilience to dynamic end-to-end service demands. The overall feasibility and efficiency of the proposed architecture are experimentally verified on the control plane of our OpenFlow-based enhanced SDN (eSDN) testbed. The performance of the GRIR algorithm under a heavy traffic load scenario is also quantitatively evaluated on the MSRIR architecture in terms of path blocking probability, resilience latency and resource utilization, and compared with other resilience algorithms.
The rest of this paper is organized as follows. Section 2 introduces the MSRIR architecture. The global resources integrated resilience algorithm under this network architecture is proposed in Section 3. The interworking procedure for MSRIR with the GRIR algorithm is described in Section 4. We then describe the testbed and present the numerical results and analysis in Section 5. Section 6 concludes the paper by summarizing our contribution and discussing our future work in this area.
2. MSRIR architecture for software defined inter-data center interconnect
The multi-stratum resources integrated resilience (MSRIR) architecture can be implemented on software defined inter-data center interconnect with IP over EON, and is designed to interwork with and gather the resources of multiple stratums under open-system control. In this section, the core idea and structure of the novel architecture are briefly outlined. After that, the functional building blocks of the controllers and the coupling relationships between them in the control plane are presented in detail.
2.1 MSRIR architecture for inter-data center interconnect based on IP over EON
The MSRIR architecture for software defined inter-data center interconnect based on IP over EON is shown in Fig. 1. In the proposed architecture, IP over EON interconnects geographically distributed data center networks and mainly consists of three stratums: the IP network resources stratum, the EON resources stratum (e.g., spectral sub-carriers) and the application resources stratum (e.g., CPU and storage). Each resource stratum is software-defined with the OpenFlow protocol and controlled in a unified manner by an IP controller (IPC), an optical controller (OC) and an application controller (AC), respectively. To control the heterogeneous networks with the extended OpenFlow protocol (OFP), OpenFlow-enabled IP routers and elastic optical device nodes with OFP agent software are required; these are referred to as OF-router and software defined OTN (SD-OTN), respectively, and have been demonstrated previously. In standardization bodies (e.g., ONF and IETF), two types of multi-domain or multi-layer policy control architecture are under discussion for connecting multiple controllers: hierarchical control and peering control. In the former, an orchestrator coordinates multiple controllers in a hierarchical manner. Compared with the hierarchical architecture, the peering architecture is more scalable, since the controllers only need to interact with their neighbors, and it is more convenient to realize in a real environment. This paper therefore adopts the peering control architecture in the experiment for simplicity. The proposed architecture emphasizes the cooperation among the IP, optical and application controllers, and it realizes MSRIR by using the mixed IP and spectral path to achieve joint and global optimization of cross stratum resources in case of a node failure.
Note that the path used for such service provisioning consumes both IP and optical resources across the IP network and EON stratums, and is called a mixed path (MP). Once it has received optical switch node failure information from the OC, the IPC is responsible for analyzing it together with the flow resource status maintained and monitored in the IP stratum for MSRIR. The OC exploits the optical network stratum resources abstracted from the physical network and performs resilient lightpath provisioning in the EON accordingly. Meanwhile, the AC is responsible for maintaining the application stratum resources in data center servers for cross stratum optimization (CSO). In case of an edge optical node failure, MSRIR, through the interaction among the three controllers, can provide recovery connectivity for the user to guarantee end-to-end QoS.
2.2 Functional models of MSRIR for software defined inter-data center interconnect
To achieve the architecture described above, the IP, optical and application controllers have to be extended to support the MSRIR functions. The functional building blocks of the three controllers and the basic interactions among them are shown in Fig. 2. In the OC, the network abstraction module abstracts the required elastic optical resources, while the failure detection module exchanges information with the SD-OTNs periodically through the extended OFP to monitor the EON. In case of an optical switch node failure, the failure detection module discovers it and delivers the failure information to the MSRIR control module. When the failure occurs at the edge optical node on the mixed IP and spectral path, the MSRIR control module decides to apply the resilience algorithm in association with the IP stratum resources via the IP-optical interface (IOI). In the IPC, the MSRIR agent receives the request and forwards it to the flow monitor and estimation module, which compiles the status of the OF-routers via an OpenFlow module in the IP stratum. The MSRIR agent then performs the global resources integrated resilience algorithm (discussed in the next section), decides the IP routing so that the flow is partly offloaded into the optical stratum over a mixed path, and sends an MSRIR request to the OC through the IOI. The MSRIR control module forwards the request to the PCE module and eventually returns an MSRIR success reply containing the information of the provisioned lightpath and the abstracted optical resources. To conveniently perform the resilient path computation with cross stratum optimization of optical and application stratum resources, the optical controller interacts with the application controller through the optical-application interface (OAI).
After receiving the application resource information from the AC via the OAI, the PCE module completes the end-to-end resilient path computation using a CSO strategy over network and application resources, where various computation strategies can be plugged in as alternatives. Spectrum assignment for the computed path is then performed to provide the resilient lightpath by using the extended OFP. Note that the OFP agent software embedded in the SD-OTN maintains the optical flow table, models node information in software and maps the content to control the physical hardware. After the lightpath is set up successfully, the path information is stored in the database management module, and the results, including an application utilization message, are sent to the AC via the OAI. Moreover, the AC obtains data center resource information periodically or on an event-based trigger through an application resources monitor module. Note that VMware software is deployed in the data center in our experiment. We extend the OpenFlow protocol to invoke the VMware API to collect data center resource information and to control resource migration. We also assume that the network resources in the data center are maintained and controlled by the optical and IP controllers for architectural simplicity in the inter-data center scenario, while the application controller monitors the servers through VMware software in this work.
3. Global resources integrated resilience
3.1 Problem statement
In the proposed multi-stratum architecture, node failures can generally be divided into three types, i.e., IP router node failure, optical switch node failure and data center server node failure. The routing tables at the routers are updated by flooding, so that services can be recovered directly through the IP stratum network after an IP router failure occurs. The restoration issue in case of the failure of one data center or of multiple data center domains, which is called disaster resilience, has been discussed in our previous work. These two failure types are outside the scope of this study. For simplicity, this study considers the case of a single edge optical node failure in inter-data center networks, while multiple node failures and mixed failures are left for future work. If a failure occurs in an edge optical node (i.e., the first optical node where the flow is offloaded into the optical network), the traffic cannot be shifted from the IP router to the EON, which means the service is difficult to restore by considering the optical stratum resources alone. If the service is instead carried by an IP path (IP), its restoration consumes extra resources in the IP stratum, so that some IP router nodes become blocked due to queue overflow when the network is heavily loaded. Note that the IP network and the EON have different advantages as service carriers in different traffic scenarios. Compared with the EON, the IP network is more suitable for low-bandwidth flows due to the flexibility and convenience of packet switching. Under a heavy traffic load, the EON can offer highly available, cost-effective and energy-efficient connectivity by provisioning a spectral path at the sub-wavelength or super-wavelength level. In particular, when some IP network nodes are busy processing traffic flows in their queues, services can be provided partially through optical bypass to take advantage of the large bandwidth of the EON.
Thus, we study a novel global resources integrated resilience (GRIR) algorithm, which is essential for the proposed architecture to support restoration over a mixed path that uses both IP and EON stratum resources. Such a path, which consumes IP and optical resources across the IP network and EON stratums, is called a mixed path (MP). It provides more effective data center service provisioning, i.e., it uses fewer resources and enhances network performance. Figures 3(a)-3(b) show an example of the GRIR scheme in IP over EON with a 4-node network. A path from source AI to destination DI carries the service and offloads the flow at edge node AO, so that optical nodes AO and DO are the source and destination of the related spectral path, respectively. When the edge node AO breaks down, the traffic flow cannot be shifted from the IP router to the EON in the traditional architecture. The new path AI-BI-CI-DI is rerouted only in the IP stratum according to the routing tables of the IP routers, as shown in Fig. 3(a). As a result, the queue at CI becomes very busy and other requests from CI are blocked, which degrades the network performance. To solve this problem, the GRIR algorithm uses the MP AI-BI-BO-CO-DO-DI, with both IP and optical resources, to support the resilience, as shown in Fig. 3(b). Node BI, which has low resource usage in the IP stratum, is selected to offload the traffic, while the corresponding edge node BO transfers the flow onto the new lightpath. In the proposed algorithm, the new MP effectively enhances the network resource utilization with lower blocking.
3.2 Network modeling
We represent the software defined inter-data center network based on IP over EON as a weighted graph G (V, V’, L, L’, F, A), where V and V’ denote the sets of OpenFlow-enabled optical and electrical switch nodes, respectively. L and L’ indicate the sets of bi-directional fiber or cable links between nodes in V and V’, respectively. F is the set of spectrum sub-carriers on each fiber link and A denotes the set of data center servers, while N, N’, L, L’, F and A represent the numbers of network nodes, links, spectral sub-carriers and data center nodes, respectively. For simplicity in the analysis of the network model, each resilience request from source s to destination d is translated into the needed network bandwidth b and the needed application resources ar. We denote the ith resilience request described above as Ri, where Ri + 1 chronologically follows Ri. According to the request and the status of resources, a suitable MP can be provided for restoration by the algorithm. In addition, the notations used in this study and their definitions are listed in Table 1.
3.3 Global resources integrated resilience algorithm
In this study, we propose an auxiliary graph to implement GRIR algorithm according to its edge weights. An auxiliary graph, illustrated in Fig. 4(a), is constructed each time a new resilience request arrives. The nodes in each stratum of the auxiliary graph correspond to the nodes in the physical topology. The auxiliary graph is composed of the IP and optical stratums and three kinds of undirected edges, i.e., IP edges, transponder edges and spectrum edges.
There is an IP edge between two nodes in the IP stratum if there is an existing link between the two nodes in the physical network. To measure the processing capacity of an IP router, we use the congestion degree of the existing queue at the entrance port of the router to assess the router workload. Therefore, for dynamic time-varying requests, the IP edge weight between IP nodes i and j over the recent time t0 assesses the average recent occupation of the IP router, and is given by Eq. (1). Here, tc indicates the current time, the weight depends on the probability of occurrence of each congestion degree, and the adjustable parameter k is used to normalize the IP edge weight.
A transponder edge between a node in the IP stratum and its corresponding node in the optical stratum represents an electrical-to-optical (E/O) or optical-to-electrical (O/E) conversion. Its edge weight evaluates the cost of the E/O or O/E conversion. The optical stratum represents the spectrum resources that can be used by new lightpaths carrying the new resilience request. Unlike in a WDM network, to accommodate a new request successfully in the EON, there must be at least b + B consecutive available sub-carriers in each fiber along the new lightpath, covering the requested resilience bandwidth b as well as a guard band B. Also, under the spectrum continuity constraint, the spectrum of the new lightpath must be continuous across all the fibers along the path. From the perspective of an optical link, the number of possible spectrum allocation statuses that use the sth sub-carrier, over all service bandwidth demands, is denoted ms on link li,j, while the connecting bandwidth of each possible status is denoted bk. The average bandwidth over all possible allocation statuses using the sth sub-carrier on li,j is then computed, and its value indicates the consecutiveness degree of the sth sub-carrier with respect to the adjacent available spectrum.
In addition, the number of occupation state changes between neighboring sub-carriers on li,j is useful for estimating the degree of spectrum fragmentation on a link. In particular, a higher degree of fragmentation means that it is more difficult to find consecutive spectrum on the link. In order to assess the spectrum utilization, the spectrum edge weight takes into account both the consecutiveness degree and the fragmentation degree of the spectrum. Here, the adjustable parameter μ normalizes the spectrum edge weight.
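The two spectrum ingredients described above can be illustrated with a small sketch. It is not the paper's spectrum edge weight, whose equation is not reproduced here: it only shows (i) a check for b + B consecutive sub-carriers that are free on every fiber of a candidate lightpath, and (ii) a count of occupation state changes between neighboring sub-carriers on one link. The function names, the boolean occupancy representation and the first-fit search order are assumptions made for illustration.

```python
from typing import Optional, Sequence


def find_common_block(link_free: Sequence[Sequence[bool]], b: int, guard: int) -> Optional[int]:
    """Return the lowest start index of b + guard consecutive sub-carriers
    that are free on every link of the path, or None if no such block exists.

    link_free[k][s] is True when sub-carrier s is free on the k-th link
    (hypothetical representation). First-fit ordering is an assumption.
    """
    need = b + guard
    if not link_free or need <= 0:
        return None
    n = len(link_free[0])
    # Continuity: a sub-carrier is usable only if it is free on all links.
    usable = [all(link[s] for link in link_free) for s in range(n)]
    run = 0
    for s, ok in enumerate(usable):
        # Contiguity: track the length of the current run of usable slots.
        run = run + 1 if ok else 0
        if run == need:
            return s - need + 1
    return None


def state_changes(occupied: Sequence[bool]) -> int:
    """Count occupation state changes between neighboring sub-carriers.
    A higher count suggests a more fragmented link."""
    return sum(1 for a, c in zip(occupied, occupied[1:]) if a != c)


if __name__ == "__main__":
    link1 = [False, True, True, True, True, True, False, True]
    link2 = [True, True, False, True, True, True, True, True]
    print(find_common_block([link1, link2], b=2, guard=1))
    print(state_changes([False, False, True, True, False, True]))
```

Two links with the same number of free sub-carriers can differ widely in `state_changes`, which is why a fragmentation term is useful alongside the contiguity check.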
The GRIR algorithm for software defined inter-data center interconnect in IP over EON, which employs the auxiliary graph, is described in Algorithm 1. When a resilience request arrives at the network following an edge optical node failure, a corresponding auxiliary graph is established according to the current network state over the recent time t0. Note that the edge weights are calculated with the above equations to reflect the utilization of network resources. On the auxiliary graph, Dijkstra’s shortest path algorithm is run from the source node to the destination node across the multi-stratum network to select the path candidate. If the chosen path is an MP (i.e., it traverses both IP and optical paths), the algorithm determines which optical node should be the new edge optical node for offloading the traffic, and whether and how new lightpaths should be set up. The new lightpath is established according to the selected spectrum edges and their weights, and traverses the fiber links under the spectrum continuity constraint. The time complexity of the algorithm is analyzed in another work due to space limitations. An example auxiliary graph for a new resilience request from node AI to node DI is illustrated in Fig. 4(b). The weights of the IP, transponder and spectrum edges are marked in the figure. When edge optical node AO fails, the example resilient MP is derived on the auxiliary graph by Dijkstra’s shortest path algorithm from node AI to node DI. This MP specifies that the new request is carried over IP from AI to CI in the IP stratum, and is then accommodated on a new lightpath from node CO to DO, which is established with spectrum resources in the EON. Note that the MP uses two transponder edges, CI-CO and DO-DI, i.e., two additional transponders when setting up the new lightpath.
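As a rough illustration of the path selection step, the sketch below builds a two-stratum auxiliary graph for a 4-node chain, removes the failed edge optical node, and runs Dijkstra's algorithm over IP, transponder and spectrum edges. All edge weights, the topology and the function names are hypothetical; in the paper the weights come from the IP and spectrum weight equations and from Fig. 4(b), which are not reproduced here.

```python
import heapq
from collections import defaultdict


def build_auxiliary_graph(ip_edges, spectrum_edges, transponder_cost, nodes, failed_optical=()):
    """Build an undirected two-stratum graph.

    Vertices are (name, "IP") or (name, "O"). Failed optical nodes lose
    their spectrum edges and their transponder edge.
    """
    graph = defaultdict(dict)

    def add(u, v, w):
        graph[u][v] = w
        graph[v][u] = w

    for (i, j), w in ip_edges.items():
        add((i, "IP"), (j, "IP"), w)
    for (i, j), w in spectrum_edges.items():
        if i not in failed_optical and j not in failed_optical:
            add((i, "O"), (j, "O"), w)
    for n in nodes:
        if n not in failed_optical:
            add((n, "IP"), (n, "O"), transponder_cost)
    return graph


def dijkstra(graph, src, dst):
    """Return (path, cost) of the minimum-weight path, or (None, inf)."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


if __name__ == "__main__":
    # Hypothetical weights: the IP edges toward C and D are congested.
    ip_edges = {("A", "B"): 1.0, ("B", "C"): 4.0, ("C", "D"): 4.0}
    spectrum_edges = {("A", "B"): 1.0, ("B", "C"): 1.0, ("C", "D"): 1.0}
    g = build_auxiliary_graph(ip_edges, spectrum_edges, 0.5, "ABCD", failed_optical={"A"})
    path, cost = dijkstra(g, ("A", "IP"), ("D", "IP"))
    print(path, cost)
```

With these assumed weights the selected route enters the optical stratum at B, so BO becomes the new edge optical node; the first stratum change along the returned path identifies where the flow is offloaded.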
4. Interworking procedure for MSRIR
In this section, we illustrate the interworking procedure of MSRIR based on the GRIR algorithm described above. The resilient MP computed by the GRIR algorithm provides data center service restoration for MSRIR. A variety of complex failure cases can occur, such as intra-domain/inter-domain link failure, network node failure, data center failure, mixed failure and so on. Here, for simplicity, we discuss the case of a single EON node failure in the analysis of the interworking procedure. Figure 5 shows the detailed interworking procedure of MSRIR for dynamic data center service based on the proposed architecture in the typical failure scenario.
As shown in Fig. 5, for real-time monitoring and failure detection in the EON, the OC periodically sends an optical flow monitor request to each SD-OTN as an extended features request message through the OFP, and obtains status information from them in extended features reply messages. Optical performance monitors are deployed to detect optical network node failures correctly. If a failure occurs in an edge optical switch node, the service from the data center server can no longer be offered to the user. From the monitor messages related to the malfunctioning node, the OC receives the information report and determines that this node has failed. After analysis in the optical network stratum, the OC escalates the resilience to the IP stratum for a possible change of MP and forwards the request to the IPC. After the session is established, the IPC receives the MSRIR request, chooses the new IP routing with the GRIR algorithm to provide resilience, and then returns the MSRIR reply to the OC with the backup node information. Having obtained the MSRIR reply and exchanged data center application resource information with the AC, the OC computes a resilient lightpath considering CSO of optical and application resources in cooperation with the AC, and then proceeds to set up an end-to-end elastic spectral path by controlling all corresponding SD-OTNs along the computed path through the OFP. When the OC receives the setup success reply from the last SD-OTN, it sends the success reply to the IPC with the provisioned lightpath and abstracted optical resource information. After that, the IPC sends an MSRIR setup message to the router with the buffered packet so that the flow is offloaded to the optical stratum and the multi-stratum resources are used effectively, while the application usage in the AC is updated by an update message from the IPC to keep the two synchronized.
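The ordering of the recovery exchange can be summarized as a simple message trace. The sketch below only paraphrases the sequence described in the text after the OC has detected the failure; the message labels, the `Msg` type and the function name are illustrative assumptions and do not represent actual OpenFlow or UDP message formats.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Msg:
    sender: str
    receiver: str
    name: str


def msrir_signaling(lightpath_nodes: List[str], router: str) -> List[Msg]:
    """Return the ordered controller messages for one recovery,
    starting after the OC has detected the edge optical node failure."""
    if not lightpath_nodes:
        raise ValueError("the resilient lightpath needs at least one SD-OTN")
    steps = [
        Msg("OC", "IPC", "MSRIR request"),
        # The IPC runs the resilience algorithm before replying.
        Msg("IPC", "OC", "MSRIR reply (backup node)"),
        Msg("OC", "AC", "application resource request"),
        Msg("AC", "OC", "application resource reply"),
    ]
    # The OC configures every SD-OTN along the computed lightpath.
    steps += [Msg("OC", node, "lightpath setup") for node in lightpath_nodes]
    steps.append(Msg(lightpath_nodes[-1], "OC", "setup success reply"))
    steps.append(Msg("OC", "IPC", "MSRIR success reply"))
    steps.append(Msg("IPC", router, "MSRIR setup (offload flow)"))
    steps.append(Msg("IPC", "AC", "application usage update"))
    return steps


if __name__ == "__main__":
    for m in msrir_signaling(["BO", "CO", "DO"], router="BI"):
        print(f"{m.sender:>4} -> {m.receiver:<4} {m.name}")
```

Keeping the trace as data makes it easy to compare against a packet capture, for instance by checking that the number of setup messages equals the number of SD-OTNs on the lightpath.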
5. Feasibility verification and performance evaluation
To evaluate the feasibility and efficiency of the proposed architecture, we set up an IP over EON with data centers on our OpenFlow-based enhanced SDN (eSDN) testbed, as shown in Fig. 6. We deploy only the control plane in the testbed due to the lack of hardware (e.g., multi-flow transponders). Experiments demonstrating GMPLS/PCE-controlled multi-flow optical transponders in flexi-grid networks have been reported previously. We develop a software OFP agent to emulate the data plane and support MSRIR with the OFP. The software agent is also used to emulate optical flow monitoring due to the lack of optical equipment (e.g., optical performance monitors). Data center servers and the OFP agents are realized on an array of virtual machines created by VMware ESXi V5.1 running on IBM X3650 servers. The virtual OS technology makes it easy to set up the experimental topology based on the backbone topology of the continental US (100 nodes, 171 links). For the OpenFlow-based MSRIR control plane, the OC server supports the proposed architecture and is deployed as three virtual machines for MSRIR control, CSO & PCE strategy and network abstraction, while flow monitoring and the MSRIR agent are deployed in the IPC server. The AC server is used as the CSO agent to monitor the application resources of the data center networks. Each controller server controls the corresponding resources, while database servers maintain the traffic engineering database and related configuration. We deploy a service information generator associated with the AC, which generates batches of data center services for the experiments.
Based on the testbed described above, we have designed and experimentally verified MSRIR for data center services in IP over EON under a typical node failure scenario. The experimental results are shown in Figs. 7(a) and 7(b), which present the whole signaling procedure for MSRIR using the OFP through Wireshark captures taken at the OC and IPC, respectively. As shown in Fig. 7(a), 10.108.50.175, 10.108.49.178 and 10.108.51.2 denote the IP addresses of the OC, IPC and AC, respectively, while 10.108.48.182, 10.108.49.180 and 10.108.48.177 represent the IP addresses of the related SD-OTNs. 10.108.49.175 and 10.108.49.176 indicate the IP addresses of the corresponding OF-routers, as shown in Fig. 7(b). Note that existing OpenFlow messages, which retain their original functions, are reused to simplify the implementation in this paper. New message types will be defined to support new functionalities in future work. The features request message is responsible for monitoring by regularly querying the related SD-OTNs about their current status. We assume that features request messages are sent at an interval of 5 ms in the experiment. The OC receives the responses and obtains the information from the SD-OTNs via features reply messages. Upon the optical node failure, the OC performs its analysis in the optical stratum and sends the request for MSRIR via a UDP message. Here, we use UDP messages to simplify the procedure and reduce the performance pressure on the controllers. After receiving the request, the IPC performs the GRIR algorithm to provide the resilience, and then returns the MSRIR reply via UDP. The OC then obtains the service usage of application resources through interworking between the OC and AC via UDP messages. After that, the OC computes a path considering CSO of optical and application resources and then provisions a resilient spectral path by controlling all corresponding SD-OTNs along the computed path through flow mod messages.
On receiving the setup success reply via packet in, the OC sends the MSRIR success reply to the IPC via UDP with the provisioned spectral path information. The IPC then sends a flow mod message that updates the flow entries of the routers, so that the flow is offloaded to the optical stratum and the multi-stratum resources are used effectively. After that, the IPC updates the application usage via UDP to keep it synchronized. The experimental results correspond to the procedure depicted in Fig. 5.
We also adopt the backbone topology of the continental US (shown in Fig. 8) to evaluate, through virtual machines, the performance and scalability of MSRIR based on the GRIR algorithm under a heavy traffic load scenario, and compare it with the traditional CSO, SA-RIR and FES algorithms, which have been proposed previously, for various failure rates. The traditional CSO algorithm carries services with small bandwidth in the IP stratum, while the optical network is used to accommodate services with large bandwidth (i.e., more than 10 Gbps). The SA-RIR algorithm recovers the service by considering the congestion degree of the router, and classifies the congestion degree into ranks according to preset thresholds. The FES algorithm analyzes the previous flow status, estimates the network condition and decides that a new flow needs to be offloaded into the optical network when the estimate exceeds the preset threshold. The flow requests to data center nodes are set up with bandwidth randomly distributed between 500 Mbps and 100 Gbps, while the needed application resource usage in the data center is selected randomly from 0.1% to 1% for each application demand. The requests and node failures arrive at the network following a Poisson process, and results have been obtained by generating 1 × 10^5 demands per execution. We assume that the bandwidth of a sub-carrier is 12.5 GHz, which is a typical value in EON. The congestion degree of a router port is set randomly from 1 to 5, with a probability of occurrence of 20% for each congestion degree. The parameters k and μ are adjustable weights for the IP and spectrum edges, and can affect the routing results obtained on the auxiliary graph. For simplicity, we set k, μ and t0 to 0.5, 0.5 and 50 ms, respectively, in the simulation settings. To analyze the proposed algorithm in depth, we consider the GRIR algorithm with different arrival rates of optical node failure events.
When the arrival rate of optical node failure events is 100 Erlang, the GRIR algorithm is referred to as GRIR-100. GRIR-200 and GRIR-300 are defined analogously. For simplicity, we use these three typical values (i.e., GRIR-100, 200, 300) to compare the performance in the emulation.
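The workload settings described above can be sketched as follows. This is a minimal illustration, not the simulator used in the experiments: the names `Demand`, `generate_demands`, `draw_congestion_degree` and `cso_stratum` are hypothetical, the time unit of `arrival_rate` is an assumption, and the GRIR, SA-RIR and FES decision logic is not reproduced. Only the numeric ranges and the 10 Gbps rule of the traditional CSO baseline are taken from the text.

```python
import random
from dataclasses import dataclass

# From the text: traditional CSO sends flows above 10 Gbps to the optical network.
CSO_OPTICAL_THRESHOLD_GBPS = 10.0


@dataclass
class Demand:
    arrival_time: float    # in arbitrary time units (assumption)
    bandwidth_gbps: float  # uniform in [0.5, 100] Gbps, as in the text
    app_usage: float       # fraction of application resources, uniform in [0.001, 0.01]


def generate_demands(n, arrival_rate, seed=None):
    """Draw n demands whose arrivals follow a Poisson process of the given rate."""
    rng = random.Random(seed)
    t = 0.0
    demands = []
    for _ in range(n):
        # Exponential inter-arrival gaps yield Poisson arrivals.
        t += rng.expovariate(arrival_rate)
        demands.append(
            Demand(
                arrival_time=t,
                bandwidth_gbps=rng.uniform(0.5, 100.0),
                app_usage=rng.uniform(0.001, 0.01),
            )
        )
    return demands


def draw_congestion_degree(rng):
    """Congestion degree of a router port: 1..5, each with probability 20%."""
    return rng.randint(1, 5)


def cso_stratum(bandwidth_gbps):
    """Stratum chosen by the traditional CSO baseline for a given bandwidth."""
    return "optical" if bandwidth_gbps > CSO_OPTICAL_THRESHOLD_GBPS else "ip"


if __name__ == "__main__":
    demands = generate_demands(5, arrival_rate=2.0, seed=42)
    for d in demands:
        print(round(d.arrival_time, 3), round(d.bandwidth_gbps, 2), cso_stratum(d.bandwidth_gbps))
```

A run with `n = 100000` would match the number of demands per execution stated above. Node failure events could be drawn from a second, independent call with its own rate.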
Figure 9(a) compares the path blocking probability of the traditional CSO, SA-RIR, FES and GRIR algorithms in the backbone topology of the continental US. The GRIR algorithm clearly achieves a lower path blocking probability than the other algorithms, especially when the network is heavily loaded. The reason is that the GRIR algorithm avoids transferring much traffic into heavily loaded routers, where many services may be blocked or lost due to queue overflow. In addition, the proposed algorithm also considers the consecutiveness and fragmentation degree of the spectrum by using the auxiliary graph, so the selected path is more likely to be set up successfully during the spectrum allocation phase. The path blocking probability of the GRIR algorithm also increases as the arrival rate of optical node failure events rises (i.e., GRIR-100, 200, 300 Erlang). This is because resilience at a low failure rate can be provided with more available IP and optical stratum resources than in a high failure rate scenario. The resource utilization of the algorithms is compared in Fig. 9(b). Resource utilization reflects the percentage of occupied resources relative to the entire IP network, EON and application resources. As shown in the figure, the GRIR algorithm enhances the resource utilization remarkably compared to the other algorithms, because it can optimize the multi-stratum resources at a finer granularity and thus yields higher resource efficiency. Figure 10(a) shows that GRIR outperforms the other algorithms in network cost, which is calculated with a cost proportion of 1:2 between the optical switches and IP routers along the mixed path. The reason is that much of the cost is saved by bypassing routers with elastic optical paths. The path resilience latency of the algorithms is compared in Fig. 10(b).
Apart from queuing, the resilience latency comprises three main parts: the strategy processing time of the controller, the OFP propagation latency, and the IP router forwarding and optical transmission time. For ease of observation and analysis, we consider only the IP router forwarding and optical transmission time in this work. In the simulations, the latency reflects the average resilience delay, including the IP router forwarding and optical transmission time. The GRIR algorithm significantly reduces the path resilience latency compared to the other schemes, because it relieves congestion in the IP router queues through optical bypass and thereby saves a large amount of queuing delay. The other algorithms use IP resources with high priority and thus incur longer queuing delays. This effect is more pronounced under heavy traffic, because more requests need to be queued and the queuing times under the CSO, SA-RIR and FES algorithms increase, which lengthens the delay.
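The cost and latency metrics discussed above can be illustrated with a small sketch. It assumes a mixed path is simply a list of hops, each either an IP router or an optical switch. Only the 1:2 cost proportion and the choice to count IP forwarding plus optical transmission time come from the text. The names `Hop`, `network_cost`, `resilience_latency_ms` and `average_latency_ms`, as well as the per-hop latency values, are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass

# Only the 1:2 ratio (optical switch : IP router) comes from the text;
# the absolute cost units are arbitrary.
OPTICAL_SWITCH_COST = 1.0
IP_ROUTER_COST = 2.0


@dataclass
class Hop:
    kind: str          # "ip" for an IP router, "optical" for an optical switch
    latency_ms: float  # forwarding (IP) or transmission (optical) time, illustrative


def network_cost(path):
    """Sum the per-node cost along a mixed path using the 1:2 proportion."""
    cost = 0.0
    for hop in path:
        if hop.kind == "ip":
            cost += IP_ROUTER_COST
        elif hop.kind == "optical":
            cost += OPTICAL_SWITCH_COST
        else:
            raise ValueError(f"unknown hop kind: {hop.kind}")
    return cost


def resilience_latency_ms(path):
    """IP router forwarding plus optical transmission time along one path."""
    return sum(hop.latency_ms for hop in path)


def average_latency_ms(paths):
    """Average resilience latency over a set of recovered paths."""
    return sum(resilience_latency_ms(p) for p in paths) / len(paths)


if __name__ == "__main__":
    # Hypothetical paths: four IP routers versus two routers bridged by two optical switches.
    all_ip = [Hop("ip", 2.0) for _ in range(4)]
    mixed = [Hop("ip", 2.0), Hop("optical", 0.5), Hop("optical", 0.5), Hop("ip", 2.0)]
    print(network_cost(all_ip), network_cost(mixed))
    print(resilience_latency_ms(all_ip), resilience_latency_ms(mixed))
    print(average_latency_ms([all_ip, mixed]))
```

Under these illustrative numbers, replacing two routers with two optical switches lowers both the cost and the latency of the path, which is the qualitative effect of optical bypass described above. The sketch does not model queuing, so it cannot reproduce the load-dependent trends of Fig. 10(b).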
To meet the QoS requirement of data center service resilience after an edge optical node failure, this paper presents an MSRIR architecture for software defined inter-data center interconnect based on IP over EON. Additionally, the GRIR algorithm is introduced for MSRIR in the proposed architecture, which uses the auxiliary graph to calculate a mixed path for service resilience. The functional architecture and overall signaling procedure are described in this paper. The feasibility and efficiency of MSRIR are verified on the control plane of our OpenFlow-based eSDN testbed. We also quantitatively evaluate the performance of the GRIR algorithm under a heavy traffic load scenario in terms of path blocking probability, resource utilization and resilience latency, and compare it with other relevant state-of-the-art algorithms. The results indicate that MSRIR with the GRIR algorithm can utilize multi-stratum resources effectively and enhance the end-to-end resilience responsiveness of data center services, while reducing the blocking probability and saving cost.
Our future MSRIR work includes two aspects. One is to improve the performance of the GRIR algorithm in multiple node failure or mixed failure scenarios. The other is to develop new message types that support new functionalities for MSRIR, and to implement network virtualization in inter-data center interconnect with IP over EON on our OpenFlow-based testbed.
This work has been supported in part by NSFC project (61271189, 61201154, 60932004), RFDP Project (20090005110013, 20120005120019), the Fundamental Research Funds for the Central Universities (2015RC15), and Fund of State Key Laboratory of Information Photonics and Optical Communications (BUPT).
References and links
1. M. Al-Fares, A. Loukissas, and A. Vahdat, “A scalable, commodity data center network architecture,” Comput. Commun. Rev. 38(4), 63–74 (2008).
2. C. Kachris and I. Tomkos, “A survey on optical interconnects for data centers,” IEEE Comm. Surv. and Tutor. 14(4), 1021–1036 (2012).
3. I. Tomkos, S. Azodolmolky, J. Sole-Pareta, D. Careglio, and E. Palkopoulou, “A tutorial on the flexible optical networking paradigm: state of the art, trends, and research challenges,” Proc. IEEE 102(9), 1317–1337 (2014).
4. M. Jinno, H. Takara, Y. Sone, K. Yonenaga, and A. Hirano, “Multiflow optical transponder for efficient multilayer optical networking,” IEEE Commun. Mag. 50(5), 56–65 (2012).
5. T. Tanaka, A. Hirano, and M. Jinno, “Advantages of IP over elastic optical networks using multi-flow transponders from cost and equipment count aspects,” Opt. Express 22(1), 62–70 (2014).
6. H. Yang, J. Zhang, Y. Zhao, Y. Ji, H. Li, Y. Lin, G. Li, J. Han, Y. Lee, and T. Ma, “Performance evaluation of time-aware enhanced software defined networking (TeSDN) for elastic data center optical interconnection,” Opt. Express 22(15), 17630–17643 (2014).
7. H. Yang, Y. Zhao, J. Zhang, S. Wang, W. Gu, Y. Ji, J. Han, Y. Lin, and Y. Lee, “Multi-stratum resource integration for OpenFlow-based data center interconnect [Invited],” J. Opt. Commun. Netw. 5(10), A240–A248 (2013).
8. B. Guo, S. Huang, P. Luo, H. Huang, J. Zhang, and W. Gu, “Dynamic survivable mapping in IP over WDM network,” J. Lightwave Technol. 29(9), 1274–1284 (2011).
9. S. Zhang, C. Martel, and B. Mukherjee, “Dynamic traffic grooming in elastic optical networks,” J. Sel. Areas Commun. 31(1), 4–12 (2013).
10. S. Huang, B. Guo, X. Li, J. Zhang, Y. Zhao, and W. Gu, “Pre-configured polyhedron based protection against multi-link failures in optical mesh networks,” Opt. Express 22(3), 2386–2402 (2014).
11. K. Kumaki, ed., “Interworking requirements to support operation of MPLS-TE over GMPLS networks,” IETF RFC 5146 (2008). http://tools.ietf.org/html/rfc5146
12. T. Szyrkowiec, A. Autenrieth, P. Gunning, P. Wright, A. Lord, J. P. Elbers, and A. Lumb, “First field demonstration of cloud datacenter workflow automation employing dynamic optical transport network resources under OpenStack and OpenFlow orchestration,” Opt. Express 22(3), 2595–2602 (2014).
13. S. Das, G. Parulkar, and N. McKeown, “Why OpenFlow/SDN can succeed where GMPLS failed,” in Proceedings of European Conference on Optical Communication (ECOC 2012), (Optical Society of America, 2012), paper Tu.1.D.1.
14. L. Liu, W. R. Peng, R. Casellas, T. Tsuritani, I. Morita, R. Martínez, R. Muñoz, and S. J. B. Yoo, “Design and performance evaluation of an OpenFlow-based control plane for software-defined elastic optical networks with direct-detection optical OFDM (DDO-OFDM) transmission,” Opt. Express 22(1), 30–40 (2014).
15. M. Channegowda, R. Nejabati, M. Rashidi Fard, S. Peng, N. Amaya, G. Zervas, D. Simeonidou, R. Vilalta, R. Casellas, R. Martínez, R. Muñoz, L. Liu, T. Tsuritani, I. Morita, A. Autenrieth, J. P. Elbers, P. Kostecki, and P. Kaczmarek, “Experimental demonstration of an OpenFlow based software-defined optical network employing packet, fixed and flexible DWDM grid technologies on an international multi-domain testbed,” Opt. Express 21(5), 5487–5498 (2013).
16. F. Paolucci, F. Cugini, N. Hussain, F. Fresi, and L. Poti, “OpenFlow-based flexible optical networks with enhanced monitoring functionalities,” in Proceedings of European Conference and Exhibition on Optical Communications (ECOC 2012), (Optical Society of America, 2012), paper Tu.1.D.5.
17. L. Liu, R. Muñoz, R. Casellas, T. Tsuritani, R. Martínez, and I. Morita, “OpenSlice: an OpenFlow-based control plane for spectrum sliced elastic optical path networks,” in Proceedings of European Conference on Optical Communication (ECOC 2012), (Optical Society of America, 2012), paper Mo.2.D.3.
18. R. Muñoz, R. Casellas, R. Martínez, and R. Vilalta, “Control plane solutions for dynamic and adaptive flexi-grid optical networks,” in Proceedings of European Conference on Optical Communication (ECOC 2013), (Optical Society of America, 2013), paper We.3.E.1.
19. H. Yang, Y. Zhao, J. Zhang, J. Wu, J. Han, Y. Lin, Y. Lee, and Y. Ji, “Multi-stratum resilience with resources integration for software defined data center interconnection based on IP over elastic optical networks,” in Proceedings of European Conference on Optical Communication (ECOC 2014), (Optical Society of America, 2014), paper Tu.1.6.5.
20. R. Martínez, R. Casellas, R. Vilalta, and R. Muñoz, “Experimental assessment of GMPLS/PCE-controlled multi-flow optical transponders in flexgrid networks,” in Proceedings of Optical Fiber Communication Conference (OFC 2015), (Optical Society of America, 2015), paper Tu2B.4.