Towards a carrier SDN: an example for elastic inter-datacenter connectivity


Abstract

We propose a network-driven transfer mode for cloud operations in a step towards a carrier SDN. Inter-datacenter connectivity is requested in terms of volume of data and completion time. The SDN controller translates and forwards requests to an ABNO controller in charge of a flexgrid network.

© 2013 Optical Society of America

1. Introduction

Cisco's Global Cloud Index [1] forecasts datacenter (DC) traffic to quadruple over the next few years, reaching 554 EB per month by 2016. Two main components of traffic leaving DCs can be distinguished: traffic among DCs (DC2DC) and traffic between DCs and end users (DC2U). The former includes database (DB) synchronization among replicated services and virtual machine (VM) migration to manage the cloud elastically, whilst the latter is associated with applications such as web, email, video-on-demand, etc.

DC operations are commonly automated using cloud middleware that applies schedule-based algorithms to optimize some utility function, ensuring quality of service and service availability while containing operational costs (e.g., energy consumption). The outcome of those algorithms is the set of VMs to be activated or stopped in each DC, or to be migrated from one DC to a remote one, as well as the set of DBs to be synchronized. As a result, the connectivity required between two DCs varies greatly along the day, presenting dramatic differences on an hourly time scale as a consequence of the huge amount of raw data being transferred.

We propose to use a flexgrid-based optical network to interconnect datacenters [2], since that technology provides finer and multiple granularities compared to traditional wavelength-switched optical networks (WSON) based on wavelength division multiplexing (WDM) [3]. In flexgrid optical networks the available optical spectrum, e.g. the C-band, is divided into frequency slices of fixed spectrum width, e.g. 6.25 GHz. Optical connections can be allocated a variable number of these slices, which is a function of the requested capacity, the modulation technique applied, and the slice width. In addition, the capacity of optical connections can be dynamically increased or decreased to accommodate spikes by allocating or releasing slices, provided that the spectrum allocated to an optical connection remains contiguous [4].
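
To make the slice arithmetic concrete, the short Python sketch below estimates how many 6.25 GHz slices a connection of a given bitrate would occupy; the per-slice capacities per modulation format are assumptions made for illustration, not figures from the cited works.

```python
import math

# Illustrative per-slice capacities (Gb/s per 6.25 GHz slice) for a few
# modulation formats; assumed values for this sketch, not taken from [2]-[4].
GBPS_PER_SLICE = {"QPSK": 12.5, "8QAM": 18.75, "16QAM": 25.0}
SLICE_WIDTH_GHZ = 6.25

def slices_needed(bitrate_gbps, modulation="QPSK"):
    # Number of contiguous frequency slices required for the requested bitrate
    return math.ceil(bitrate_gbps / GBPS_PER_SLICE[modulation])

n = slices_needed(400, "QPSK")          # e.g. a 400 Gb/s connection
print(n, "slices =", n * SLICE_WIDTH_GHZ, "GHz of spectrum")   # 32 slices = 200 GHz
```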

Dynamic and elastic connections can be requested from an Application-Based Network Operations (ABNO) controller [5]. The ABNO controller delegates some functions to other entities; for instance, the provisioning process is performed by an active stateful PCE [6]. However, competition for network resources could lead to connections' capacity being reduced, or even to requests being blocked at request time, resulting in poor cloud performance. Therefore, the cloud middleware must perform connection request retries, similar to I/O polling (software-driven I/O) in computers, either to increase the bitrate of already established connections or to set up new ones.

2. Carrier software-defined network (SDN)

To provide an abstraction layer over the underlying network, a new stratum on top of the ABNO, the carrier SDN, could be deployed. Figure 1 shows a carrier SDN controller implementing a northbound interface to request transfer operations. Those application operations are transformed into network connection requests; the northbound interface uses application-oriented semantics, freeing application developers from understanding and dealing with network specifics and complexity.
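
As a purely illustrative sketch of such a northbound interface, the following code shows how a transfer request expressed in application terms (volume and completion time) might be translated into a connection request towards the ABNO controller; all field and function names are hypothetical.

```python
import math

def translate_transfer(transfer, slice_gbps=12.5):
    # Bitrate needed to move the requested volume within the completion time
    required_gbps = 8 * transfer["volume_GB"] / transfer["completion_time_s"]
    # Hypothetical connection request handed to the ABNO controller
    return {
        "src": transfer["src_dc"],
        "dst": transfer["dst_dc"],
        "slices": math.ceil(required_gbps / slice_gbps),
        "elastic": True,   # allow later spectrum increments/decrements
    }

req = {"src_dc": "DC-Spain", "dst_dc": "DC-Illinois",
       "volume_GB": 500, "completion_time_s": 1800}
print(translate_transfer(req))
```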

Fig. 1 Current control architecture where the applications request connections to the ABNO (a). Proposed control architecture where the carrier SDN offers an application-oriented semantic interface (b).

As an example of that paradigm, in this paper we propose an operation mode where the cloud middleware requests transfers using its native semantics, i.e. amount of data to be transferred, destination DC, completion time, etc. The SDN controller is in charge of managing inter-DC connectivity; if not enough resources are available at request time, notifications (similar to interrupts in computers) are sent from the ABNO to the SDN controller each time specific resources are released. Upon receiving a notification, the SDN controller decides whether to increase the bitrate associated with a transfer. Therefore, we have effectively moved from polling to a network-driven transfer mode.
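
A minimal sketch of this notification mechanism, with hypothetical names: the ABNO side keeps track of which LSPs the SDN controller is interested in and calls back whenever spectrum is released on a link along their routes.

```python
class AbnoNotifier:
    # Hypothetical ABNO-side helper for the notification mechanism.

    def __init__(self, sdn_callback):
        self.sdn_callback = sdn_callback   # invoked as sdn_callback(lsp_id)
        self.watched = {}                  # lsp_id -> set of link ids on its route

    def watch(self, lsp_id, route_links):
        self.watched[lsp_id] = set(route_links)

    def on_spectrum_released(self, link_id):
        # Emulates the event fired when another connection frees slices on a link
        for lsp_id, links in self.watched.items():
            if link_id in links:
                self.sdn_callback(lsp_id)
```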

3. Scenario

We consider a scenario where a flexgrid-based network is used to interconnect federated DCs. A follow-the-work strategy for VM migration was implemented; VMs are moved to the DCs closest to the users, reducing the user-to-service latency. Cloud scheduling algorithms run periodically, taking VM migration decisions. Once the set of VMs to be migrated and the DB synchronization needs are determined, the cloud management performs those operations in collaboration with the inter- and intra-DC networks.
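
The scheduling algorithm itself is not detailed here; as a hypothetical illustration of a follow-the-work placement, each VM could simply be assigned to the DC with the lowest user-weighted latency:

```python
def follow_the_work(vm_user_activity, latency_ms):
    # vm_user_activity: {vm_id: {region: active_users}}
    # latency_ms: {(region, dc): latency in ms}; hypothetical inputs
    dcs = {dc for (_, dc) in latency_ms}
    placement = {}
    for vm, activity in vm_user_activity.items():
        total = sum(activity.values()) or 1
        def weighted_latency(dc):
            return sum(users * latency_ms[(region, dc)]
                       for region, users in activity.items()) / total
        placement[vm] = min(dcs, key=weighted_latency)   # DC closest to its users
    return placement

users = {"vm1": {"Europe": 800, "Asia": 200}}
lat = {("Europe", "DC-Spain"): 20, ("Asia", "DC-Spain"): 180,
       ("Europe", "DC-Taiwan"): 190, ("Asia", "DC-Taiwan"): 25}
print(follow_the_work(users, lat))   # -> {'vm1': 'DC-Spain'}
```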

Since our scheduling algorithms run periodically every hour, VM migrations must be completed within each period. In fact, the shorter the transfer time, the better the offered network service, since the VMs reach their proper locations earlier.

As illustrated in Fig. 2, the carrier SDN is deployed between the ABNO controller and the DC middleware and implements two connectivity models: i) the application-driven model and ii) the network-driven model.

Fig. 2 Carrier SDN implementing transfer operations for the application layer and request/response and notifications towards the network.

In the application-driven model, each local cloud middleware manages connectivity to remote DCs so as to perform VM migration in the shortest total time. The source cloud manager requests Label Switched Path (LSP) set-up and teardown, as well as elastic operations, from the SDN controller, which forwards them to the ABNO controller, as shown in Fig. 3(a). In this model, the SDN controller works as a proxy between applications and the network. After checking local policies, the ABNO controller forwards the requests to the active stateful PCE, which performs the LSP operations on the controlled network.

Fig. 3 Software-driven (a) and network-driven (b) models.

It is worth noting that, although applications have full control over the connectivity process, physical network resources are shared among a number of clients, and LSP set-up and elastic spectrum increments could be blocked as a result of lack of resources in the network. Hence, applications need to implement some sort of periodical retries to increase the allocated bandwidth until the required level is reached. These retries could negatively impact the performance of the inter-DC control plane and do not guarantee that a higher bandwidth will be obtained.
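
The kind of polling loop this forces on the cloud middleware could look like the sketch below; the 'sdn' client and its methods are hypothetical placeholders for the actual northbound interface.

```python
import time

def poll_for_bitrate(sdn, lsp_id, target_gbps, period_s=60, deadline_s=1800):
    # Software-driven mode: retry elastic increments until the target bitrate
    # is granted or the deadline expires (sdn is a hypothetical client object).
    elapsed = 0
    current = sdn.current_bitrate(lsp_id)
    while current < target_gbps and elapsed < deadline_s:
        time.sleep(period_s)                                   # wait before retrying
        elapsed += period_s
        current = sdn.request_increase(lsp_id, target_gbps)    # may be blocked
    return current
```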

In the network-driven model in Fig. 3(b), applications request transfers instead of connectivity. The source cloud manager sends a transfer request to the SDN controller specifying the destination DC, the amount of data that needs to be transferred, and the maximum completion time. Upon its reception, the SDN controller requests the ABNO controller to find the greatest spectrum width available, taking into account local policies and current service level agreements (SLA), and sends a response back to the cloud manager with the best completion time. The source cloud manager organizes the data transfer and sends a new transfer request with the suggested completion time. A new connection is established and its capacity is returned in the response message; in addition, the SDN controller requests the ABNO controller to keep it informed whenever more resources become available along the route of that LSP. The ABNO controller has access to both the Traffic Engineering Database (TED) and the LSP-DB. Algorithms deployed in the ABNO controller monitor spectrum availability on those physical links. When resource availability allows increasing the allocated bitrate of some LSP, the SDN controller performs elastic spectrum operations so as to ensure the committed transfer completion times. Each time the SDN controller modifies the bitrate by performing elastic spectrum operations, a notification containing the new throughput is sent to the source cloud manager. The cloud manager then optimizes VM migration as a function of the actual throughput, while delegating the enforcement of the committed completion time to the SDN controller.
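
A sketch of the SDN-controller side of this workflow, under assumed interfaces: on every ABNO notification it computes the bitrate still needed to meet the committed deadline, asks for an elastic increment, and reports the new throughput to the source cloud manager. Class and method names are hypothetical.

```python
class SdnTransferManager:
    # Hypothetical network-driven transfer logic in the carrier SDN controller.

    def __init__(self, abno, cloud_manager):
        self.abno = abno              # assumed ABNO client (elastic operations)
        self.cloud = cloud_manager    # assumed cloud-manager client (notifications)
        self.transfers = {}           # lsp_id -> (remaining_GB, committed deadline in s)

    def on_notification(self, lsp_id, now_s):
        # Called when the ABNO controller reports released spectrum on the route
        remaining_gb, deadline_s = self.transfers[lsp_id]
        needed_gbps = 8 * remaining_gb / max(deadline_s - now_s, 1)
        granted_gbps = self.abno.elastic_increase(lsp_id, needed_gbps)
        # Inform the source cloud manager so it can re-plan VM migration
        self.cloud.notify_throughput(lsp_id, granted_gbps)
```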

4. Illustrative results

For evaluation purposes, we developed scheduling algorithms in an OpenNebula-based cloud middleware emulator. The federated DCs are connected to an ad hoc event-driven simulator developed in OMNeT++. The simulator implements the SDN controller and the flexgrid network with an ABNO controller on top, as depicted in Fig. 2. Regarding the PCE, the algorithm for elastic operations described in [7] was implemented.

For our experiments, we assume the global 11-node topology depicted in Fig. 4. These locations are used as sources for DC2U traffic. In addition, four DCs are strategically placed in Illinois, Spain, India, and Taiwan. DC2DC and DC2U traffic compete for resources in the physical network. We fixed the optical spectrum width to 4 THz, the spectral granularity to 6.25 GHz, and the capacity of the ports connecting the DCs to 1 Tb/s; we considered 35,000 VMs with an image size of 5 GB each and 300,000 DBs, each with a differential image size of 450 MB and a total size of 5 GB, i.e. half the size of Wikipedia [8]. Additionally, TCP, IPv4, GbE and MPLS header overheads have been taken into account.
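
As a back-of-the-envelope check of these figures (ignoring the header overheads that the simulations do account for), the link spectrum and the raw data volumes work out as follows:

```python
# Scenario figures from the text; header overheads are ignored in this sketch.
spectrum_ghz, slice_ghz = 4000, 6.25
slices_per_link = int(spectrum_ghz / slice_ghz)       # 640 slices per link

vm_volume_tb = 35_000 * 5 / 1000                      # 175 TB of VM images in total
db_delta_tb = 300_000 * 450 / 1_000_000               # 135 TB of DB differential images

print(slices_per_link, vm_volume_tb, db_delta_tb)
```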

Fig. 4 Global inter-DC topology.

Figure 5 shows the bitrate required to migrate VMs and synchronize DBs within 30 minutes. VM migration follows the follow-the-work strategy and thus connectivity is used only during part of the day. In contrast, DB synchronization is performed throughout the day, although the bitrate depends on the amount of data to be transferred, i.e. on users' activity.
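
The bitrates in Fig. 5 follow directly from the volume to be moved within the 30-minute window; a hedged illustration (the per-interval volumes themselves are simulation outputs, not given in the text):

```python
def required_gbps(volume_gb, window_s=1800):
    # Bitrate needed to transfer volume_gb within window_s (30 minutes by default)
    return 8 * volume_gb / window_s

# e.g. migrating 1,000 VM images of 5 GB each within 30 minutes
print(round(required_gbps(1000 * 5), 1), "Gb/s")      # ~22.2 Gb/s, illustrative only
```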

Fig. 5 Required bitrate in a day.

Figures 6(a) and 6(b) depict the assigned bitrate for DB synchronization and VM migration, respectively, between two DCs during a 24-hour period when the software-driven model is used. Figures 6(c) and 6(d) show the assigned bitrate when the network-driven model is used. The software-driven model tends to yield longer transfer times because retries frequently fail to obtain additional bitrate. Note that, although the initial bitrate assigned at each interval is the same, intervals tend to be narrower under the network-driven model. The reason is that this model assigns additional bitrate to the connections as soon as resources are released by other connections.

Fig. 6 DC2DC connection bitrate vs. time.

Table 1 shows the number of request messages per hour needed to increase connections' bitrate for the whole scenario. As illustrated, only 50% of those requests succeeded in increasing connections' bitrate under the software-driven model, in contrast to 100% under the network-driven model. Additionally, Table 1 shows that when using the network-driven model both the maximum and the average required time-to-transfer are significantly lower than when the software-driven model is used. The longest transfers could be completed in only 28 minutes when the network-driven model was used, compared to just under 60 minutes using the software-driven model. Note that the amount of requested bitrate is the same for both models.

Table 1. Performance results

Finally, let us analyze the performance of both connectivity models under increasing DC2U traffic loads. Figure 7(a) plots the percentage of VMs not moved as first scheduled against the normalized background traffic intensity. Both models behave the same when the background traffic intensity is either low or high. When the intensity is low, there are enough resources in the network, so even when elastic connection operations are requested, both models are able to perform the scheduled VM migrations within the required period. When the background intensity is high, connection requests are rejected or established with a reduced capacity that is unlikely to be modified; as a result, a high percentage of the scheduled VM migrations cannot be performed. However, as the background load increases while the total blocking probability remains below 5%, the number of VMs not moved under the application-driven model increases remarkably, whereas under the network-driven model the increase is gentler until the normalized background load exceeds 0.4; beyond that point, the lack of resources starts affecting the latter model as well.

Fig. 7 Percentage of VMs not moved as first scheduled (a) and number of connection requests (b).

It is also interesting to examine the total number of requests generated under each connectivity model. Figure 7(b) plots the number of set-up, elastic capacity increment or decrement, and teardown requests arriving at the ABNO controller. When the application-driven model is used, the number of requests is very high compared to that under the network-driven model. However, since the requests are generated by the cloud middleware without any knowledge of the state of the resources, the majority of them are blocked as a result of lack of resources; note that such high utilization of the network resources is precisely the network operator's target. In contrast, in the network-driven model, elastic capacity increment or decrement requests are generated by the carrier SDN, which knows that some resources along the route of established connections have been released and that the elastic capacity operation can be successfully applied. In this case, the number of requests is much lower and most of them are successfully completed (although a few can still be blocked).

5. Conclusions

A DC federation scenario has been used as an example of a carrier SDN controller implementing a northbound interface with application-oriented semantics. Two connectivity models have been compared. The software-driven model needs periodical retries requesting increments of connections' bitrate, which do not translate into immediate bitrate increments and could have a negative impact on the performance of the inter-DC control plane. In contrast, the network-driven model takes advantage of notify messages and is able to reduce the time-to-transfer remarkably.

Finally, the proposed network-driven model gives network operators the opportunity to implement policies so as to dynamically manage the bitrate of a set of customers' connections while simultaneously fulfilling their SLAs.

Acknowledgments

The research leading to these results has received funding from the European Community's Seventh Framework Programme FP7/2007-2013 under grant agreement no. 317999 (IDEALIST project) and from the MINECO through the TEC2011-27310 ELASTIC project.

References and links

1. Cisco, "Global Cloud Index," 2012.

2. M. Jinno, H. Takara, B. Kozicki, Y. Tsukishima, Y. Sone, and S. Matsuoka, “Spectrum-efficient and scalable elastic optical path network: architecture, benefits, and enabling technologies,” IEEE Commun. Mag. 47(11), 66–73 (2009). [CrossRef]  

3. A. Castro, L. Velasco, M. Ruiz, M. Klinkowski, J. P. Fernández-Palacios, and D. Careglio, “Dynamic routing and spectrum (re)allocation in future flexgrid optical networks,” Elsevier Comp. Netw. 56(12), 2869–2883 (2012). [CrossRef]  

4. M. Klinkowski, M. Ruiz, L. Velasco, D. Careglio, V. Lopez, and J. Comellas, "Elastic spectrum allocation for time-varying traffic in flexgrid optical networks," IEEE J. Sel. Areas Comm. 31(1), 26–38 (2013). [CrossRef]

5. D. King and A. Farrel, "A PCE-based architecture for application-based network operations," IETF draft, 2013.

6. E. Crabbe, J. Medved, R. Varga, and I. Minei, “PCEP extensions for stateful PCE,” IETF draft, 2013.

7. A. Asensio, M. Klinkowski, M. Ruiz, V. López, A. Castro, L. Velasco, and J. Comellas, “Impact of aggregation level on the performance of dynamic lightpath adaptation under time-varying traffic,” in Proc. IEEE International Conference on Optical Network Design and Modeling (ONDM), 2013.

8. Wikipedia, http://en.wikipedia.org/wiki/Wikipedia_talk:Size_of_Wikipedia#size_in_GB.
