
An Online Integrated Resource Allocator for Guaranteed Performance in Data Centers

Dinil Mon Divakaran, Member, IEEE, Tho Ngoc Le, and Mohan Gurusamy, Senior Member, IEEE

Abstract—As bandwidth is shared in a best-effort way in today's data centers, traffic generated between a set of VMs (virtual machines) affects the traffic between another set of VMs (possibly belonging to another tenant) sharing the same physical links, leading to unpredictable performance of applications running on these VMs. This article addresses the problem of allocation of not only server resources (computational and storage) but also network bandwidth, to provide performance guarantees in multi-tenant data centers. Bandwidth being a critical shared resource, we formulate the problem as an optimization problem that minimizes bandwidth demand between clusters of VMs of a tenant, and we prove it to be NP-hard. We develop fast online heuristics as an integrated resource allocator (IRA) that decides on the admission of dynamically arriving requests and allocates resources for the accepted ones. We also present a modified version of IRA, called B-IRA, that bounds the cost of bandwidth allocation while exploring a smaller search space for solutions. We demonstrate that IRA accommodates a significantly higher number of requests in comparison to a load-balancing resource allocator (LBRA) that does not consider reducing bandwidth between clusters of VMs. IRA also outperforms B-IRA when traffic demands of VMs in an input are not localized.

Index Terms—Data center, resource allocation, bandwidth, NP-hard, virtual machines


1 INTRODUCTION

CLOUD providers today are looking forward to leasing out multiple instances of data centers, or virtual data centers (VDCs), to tenants. Simply put, a VDC is a set of resources that a tenant requires from a data center for running one's tasks on a set of virtual machines (VMs). Leasing out VDCs to multiple tenants obviously leads to better business for the providers. Realization of multi-tenancy in data centers requires guaranteeing predictable performance to the tasks carried out in the VDCs allocated to tenants. Guaranteeing performance in turn requires some form of admission control to ensure resources are not over-subscribed.

An accepted VDC will be allocated the requested server resources, such as computational and storage, by the provider. The VDC also obtains performance isolation from other VDCs, thanks to the coming of age of server virtualization. But the time to complete the applications or tasks running on the VMs depends not only on these resources (and their performance isolation), but also on another important resource: the network bandwidth. The bandwidth available for communication between VMs of a VDC is dependent on the traffic demands between VMs possibly belonging to other VDCs. This network resource, unless allocated as part of the VDC, need not necessarily be available at the required time on the path(s) between the communicating VMs, resulting in unpredictable delay of the applications running on the VMs of a VDC. Studies have shown that the variability of network performance in data centers is significant [3], [17], and hence cannot be ignored.

As the performance meted out to a tenant's VDC depends critically on the network bandwidth, providers need to shun the best-effort way of bandwidth-sharing adopted in today's data centers. Instead, bandwidth should be accounted and allocated in such a way that, even while maximizing the number of simultaneous VDCs hosted in a data center, predictable performance is provided to all VDCs. To support such a system, a request for a VDC can specify not only the (computational and storage) resources at the server, but also the bandwidth required between VMs of the request. The resource allocator then has to take into account the bandwidth requirements of VMs of a request, along with the server resources, for allocating a VDC to a tenant. This importance is highlighted using a simple example in Section 1 of the online supplementary file, which is available in the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPDS.2013.212.

We argue that an integrated approach for allocating computational, storage and bandwidth resources for a VDC-request is the need of the hour; the lack of one or more of these resources can affect the performance of tasks running on the corresponding VDCs. Such an integrated resource management scheme should try to meet two important objectives:

1. Allocate server resources (computational and storage), as well as bandwidth, for accepted VDCs;

2. Maximize the number of concurrent VDCs hosted with guaranteed performance.

. D.M. Divakaran and M. Gurusamy are with the Department of Electrical and Computer Engineering, National University of Singapore (NUS), Singapore 117583. E-mail: {eledmd, elegm}@nus.edu.sg.

. T.N. Le is with the Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208-3118 USA. E-mail: [email protected].

Manuscript received 30 Oct. 2012; revised 25 July 2013; accepted 13 Aug. 2013. Date of publication 22 Aug. 2013; date of current version 16 May 2014. Recommended for acceptance by K. Li.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TPDS.2013.212

1045-9219 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


The first objective has received attention recently, though not necessarily as a single joint problem, whereas the second objective was not considered in the problems studied in this context. Combining the joint resource allocation problem from the first objective with the optimization from the second objective makes integrated resource allocation a challenging task. Observe that the second objective is important for providers of data centers, as their revenue is a function of the number of VDCs hosted. Obviously, guaranteed performance can be achieved only by accepting those VDC-requests that do not cause a resource bottleneck in the data center. Intelligent placement of VMs is necessary to ensure bandwidth is used efficiently for accepting as many VDCs as possible. Hence, the second objective is tied up with the problem of mapping of VDCs considering their use of shared bandwidth. Our key observation here is the importance of inter-cluster bandwidth, the bandwidth between groups or clusters of VMs of a request, in increasing the number of VDCs (accepted) with guaranteed performance.

This paper proposes a three-phase integrated resource allocator (IRA) that allocates both server resources and network resources as per the demands of accepted requests. We first formulate the resource allocation as an optimization problem that minimizes the inter-cluster bandwidth. We prove inter-cluster bandwidth minimization to be an NP-hard problem. We then develop IRA, constituting fast online heuristics, to form VM-clusters and map them onto the data center, in three phases. Taking the network as the critical resource for sharing, IRA attempts to increase the number of concurrent VDCs hosted by forming clusters of VMs (hereafter referred to as VM-clusters), and then mapping them onto clusters of servers, called server-clusters (defined more precisely in Section 3), in such a way that the inter-cluster bandwidth is reduced, leading to more efficient use of network bandwidth. For comparison, we also develop a modified version of IRA, called B-IRA, that bounds the cost of bandwidth allocation while exploring a smaller search space for solutions. Through extensive simulations, we demonstrate the effectiveness of our proposal. We compare IRA with B-IRA; in addition, we compare IRA with a load-balancing resource allocator (LBRA) that aims to form balanced VM-clusters (with respect to the number of VMs) oblivious of the traffic demands between them, and then maps the VM-clusters onto server-clusters. We demonstrate that our proposed mechanism performs significantly better than LBRA. Depending on the scenarios, the number of VDC-requests accepted by IRA can be approximately twice as many as that accepted by LBRA. While IRA and B-IRA perform similarly in most scenarios, IRA outperforms B-IRA when the traffic demands of VMs in an input VDC are not localized.

Our previous work took the initial steps towards addressing this problem, and the solution approach focussed specifically on grouping VMs of a VDC into just two VM-clusters [9]. We now take this forward and make new contributions in this paper, specifically:

. We formulate the problem explicitly, and prove that the grouping problem, of grouping VMs of a VDC into VM-clusters such that the inter-cluster bandwidth is minimized, is NP-hard.

. We develop a three-phase integrated resource allocator, IRA, that generalizes the concept of grouping introduced in [9]. Different from [9], the algorithms developed here exploit the optimal min-cut algorithm to form VM-clusters.

. We modify IRA to define another resource allocator called B-IRA, which uses an approximation algorithm giving bounds on the cost of bandwidth allocation of a VDC.

. We present useful insights and more comprehensive performance evaluations of the IRA mechanism.

After discussing the related works, we present the problem formulation and complexity in Section 3. The three-phase IRA is developed in Section 4. The resource allocator B-IRA is defined in Section 5. We conduct performance evaluations in Section 6, before concluding in Section 7.

2 RELATED WORKS

SecondNet is a data center network virtualization architecture [8]; it assumes, as part of a request, a matrix specifying bandwidth demands between every VM pair. The allocation algorithm first locates a cluster based on the server resource requirement, and then proceeds to build a bipartite graph of VMs and servers based on both the server resources and the bandwidth requirement. A matching is obtained by translating the problem to min-cost network flow, where the weight of a link is the used server bandwidth. Hence this mechanism does not try to reduce the bandwidth used on inter-cluster links. Besides, it is the (server-)clustering heuristics that play the major role in determining the VM-clusters.

Seawall focuses on enforcing link-bandwidth allocation to competing VMs based on the weights assigned to VMs [17]. Bandwidth obtained by a VM on a shared link is proportional to its weight, and end-to-end bandwidth is the minimum of the bandwidth of all links in the path. In [14], the authors argue that having such a weight as the payment, and guaranteeing minimal bandwidth to VMs, should be two important objectives in data center networks, while there exists a trade-off between these two objectives. Similar to Seawall, Gatekeeper is designed to support bandwidth guarantees [15]. While Seawall shares a link bandwidth among VMs, Gatekeeper can achieve bandwidth-sharing among competing tenants.

In another work on bandwidth allocation for VDCs, the authors consider a single aggregate value for the (ingress and egress) bandwidth requirement of each VM in a request [3]. The allocation manager searches for a suitable sub-tree at each level that satisfies the server resources as well as the bandwidth requirements. Due to the assumption of a single aggregate bandwidth requirement per VM, the problem of reducing inter-cluster bandwidth does not arise here. Works related to virtual networks, Cloud-network platforms and pricing of Cloud bandwidth are summarized in Section 2 of the supplementary file available online.

While the above works address issues related to bandwidth provisioning, none has focussed on reducing the bandwidth between clusters of VMs. As far as we know, there are only two works that looked into bandwidth demands between VM-clusters (given traffic demands between VM pairs). In [13], the traffic-aware VM placement problem was identified as an NP-hard problem; and assuming static, single-path routing, the authors proposed offline algorithms for clustering of servers and VMs, and a method to find a mapping between server-clusters and VM-clusters using traffic between VMs as a metric; but, importantly, without considering server resource requirements of VMs. It is worth noting that application of an offline framework to solve the problem online can result in inefficient migration of VMs. Our work is different, as we focus on an online approach for the allocation of VDCs, and define successful mapping of the VM-clusters in a data center based on the inter-cluster bandwidth as well as the server resource and bandwidth demands of each VM within a cluster, while aiming to accommodate as many VDCs as possible.

Another work that considers bandwidth between VM-clusters (albeit from another perspective) for VM placement is [10]. While the problem addressed is similar to what we present here, the authors in [10] view it differently as a traffic engineering problem, and hence their objective is to minimize congestion (averaged over all links) in the network. They attempt to find the near-optimal configuration (in terms of placement of VMs and finding the less-congested paths) using the Markov approximation technique developed in [5]. The state-space of the Markov chain is a function of the number of VDCs, VMs, servers, as well as the number of paths between the servers. This essentially means there can be millions of states for a large data center with thousands of nodes hosting hundreds of VDCs and having multiple paths between servers, raising doubts about the scalability of the approach. Since the objectives are different, the performance evaluation in [10] also takes a different route, focussing on utilization of network links.

3 PROBLEM FORMULATION AND COMPLEXITY

Before formulating the problem, we discuss the data center architecture considered here. Many different architectures have been proposed for data centers recently; fat-tree [2], VL2 [6] and BCube [7] being interesting examples. Some are designed with the aim to minimize the over-subscription in data center networks. For example, the over-subscription ratio in a fat-tree is 1:1, which means, for bandwidth allocation, an allocation manager needs to check the residual bandwidth of only those links that connect the edge-switches and the servers. But fat-tree and its successors face a deployment challenge, as they limit incremental expansion of data centers [18], besides having high complexity [11].

Our focus here is on the more generic three-tier hierarchical architecture (refer to Fig. 1) commonly used in data centers today [1]. The three tiers are: Core, Aggregation and Edge. As depicted in the figure, we use the term server-cluster to refer to the set of servers connected to the same edge-switch. Hence, if an entire VDC can be mapped onto a single server-cluster, in terms of network resources it means the VDC will use only the links of a single switch, the edge-switch connecting the server-cluster, for communication among its VMs. The figure also marks Pods. A Pod is a set of server and network resources, used for easy expansion of data centers. More importantly, observe that mapping an entire VDC onto a single Pod ensures traffic is localized within the Pod. The architecture allows multi-path routing as well as over-subscription between different tiers, as discussed in Section 6.2. Importantly, this means that when servers try to communicate at their full link-capacities, congestion may occur at the edge switches or at the aggregation switches.

3.1 Problem Formulation

A request for a VDC from a user is of the form {N, R, B}, where N is the number of VMs required, R is a vector of server resource units required by the VMs, and B is the matrix of bandwidth demands; that is, B_{i,j} is the bandwidth demand from VM v_i to VM v_j. We refer to the graph formed of VMs of a request as the VM-graph.
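For concreteness, a toy instance of such a request can be written down directly from this definition; the numbers below are illustrative only and are not taken from the paper.

```python
import numpy as np

# A toy VDC-request {N, R, B}: three VMs, their server resource units,
# and the bandwidth-demand matrix (B[i, j] = demand from VM i to VM j, in Mbps).
N = 3
R = np.array([2, 4, 3])              # server resource units per VM
B = np.array([[ 0, 100,  20],
              [80,   0,  50],
              [10,  60,   0]])
req = (N, R, B)
```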

For simplicity, the computational and storage requirements are combined and expressed in terms of server resource units. Let M be the number of servers in a data center, and r_k, 1 ≤ k ≤ M, denote the server resource units available at server s_k. The bandwidth available in the path connecting servers s_k and s_m is denoted by b_{k,m}. The cost of mapping two VMs, say v_i and v_j, onto servers s_k and s_m, respectively, is defined as the product of B_{i,j} and C_{k,m}, where we define C_{k,m} as

$$
C_{k,m} = \begin{cases} 0 & \text{if } s_k \text{ and } s_m \text{ are on the same server-cluster} \\ 1 & \text{otherwise.} \end{cases} \qquad (1)
$$

One could also define a higher cost if VMs of a VDC are mapped to different Pods (by identifying the server indices k and m with Pods), in comparison to the case where all the VMs are mapped to the same Pod.

Fig. 1. Three-tier hierarchical data center architecture.

Observing network bandwidth as the critical resource for guaranteeing performance to tasks running on the VMs of a VDC, we set the objective to minimize the overall bandwidth between every pair of edge switches. For a single VDC-request, the problem of mapping VMs to servers while minimizing the network bandwidth can be formulated as a nonlinear programming problem:

$$
\begin{aligned}
\text{minimize:} \quad & \sum_{k=1}^{M} \sum_{m=1}^{M} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} x_{i,k}\, x_{j,m}\, B_{i,j}\, C_{k,m} \\
\text{s.t.} \quad & \sum_{i} R_i\, x_{i,k} \le r_k, \qquad k = 1, \ldots, M, \\
& \sum_{k=1}^{M} x_{i,k} = 1, \qquad i = 1, \ldots, N, \\
& \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} x_{i,k}\, x_{j,m}\, B_{i,j} \le b_{k,m}, \qquad k, m = 1, \ldots, M, \\
& x_{i,k} \in \{0, 1\}, \qquad i = 1, \ldots, N, \quad k = 1, \ldots, M. \qquad (2)
\end{aligned}
$$

Here, x_{i,k} is a binary variable that takes value one if and only if VM v_i is mapped to server s_k. The first constraint ensures that, as VMs are mapped to servers, the server resources available are not over-subscribed. The second constraint ensures that each VM is mapped to exactly one of the servers. The (residual) bandwidth on every path connecting the servers with mapped VMs is respected through the third constraint. Note that, following common practice, the above formulation can be linearized by introducing appropriate binary variables.
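One standard linearization (a sketch of the common practice mentioned above, not reproduced from the paper) replaces each product x_{i,k} x_{j,m} by an auxiliary binary variable y_{i,k,j,m} constrained to equal the product:

$$
\begin{aligned}
& y_{i,k,j,m} \le x_{i,k}, \qquad y_{i,k,j,m} \le x_{j,m}, \\
& y_{i,k,j,m} \ge x_{i,k} + x_{j,m} - 1, \qquad y_{i,k,j,m} \in \{0, 1\},
\end{aligned}
$$

after which both the objective and the third constraint become linear in the variables x and y.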

In other words, the problem is to find the optimal (minimum-cost) grouping of VMs into VM-clusters in one stage, such that the VM-clusters can be successfully mapped to server-clusters in another stage.

3.2 Problem Complexity

As mentioned earlier, a subset of this problem (focussing only on the bandwidth requirement) was studied in [13], and was formulated as an instance of the quadratic assignment problem (QAP), an NP-hard problem [4]. However, the problem we formulated (in Eq. (2)) belongs to the class of generalized quadratic assignment problems (GQAP), which, as the name indicates, is a generalized version of QAP [12]. GQAP is also known to be NP-hard; in fact it is strongly NP-hard [12], implying that a fully polynomial approximation scheme cannot be expected to exist. In the following, we show that even the grouping problem is in itself an NP-hard problem.

Theorem. Given a request for a VDC with an inter-VM bandwidth requirement matrix, the problem of grouping VMs into VM-clusters such that the sum of the bandwidths between VM-clusters is minimized, is NP-hard.

Proof. The proof is given in Section 3 of the supplementary file available online.

4 INTEGRATED RESOURCE ALLOCATOR (IRA)

Next we develop the three-phase integrated resource allocator, IRA, and discuss its run-time complexity. As said before, a VDC-request is of the form {N, R, B}. Without loss of generality, we assume every VM in a VDC-request requires the same number of server resource units. That is, instead of the vector R, from now on, we assume each VM of a VDC-request requires a constant R units of server resources specified as part of the input. The total number of server resource units required by the VDC is thus N × R. Obviously, different VDC-requests can have different values for R. This simplification helps us to focus more on network resource allocation.

4.1 Overview of IRA

There is a centralized integrated resource allocator, IRA, for a data center, which receives requests for allocations of VDCs. The IRA goes through three phases to decide on and allocate a VDC to a tenant. Phase One is responsible for generating a list of partitions for a given VDC-request, where each partition (in the set-theoretic sense) is a set of mutually exclusive VM-clusters formed from the VMs of the VDC-request. The heuristics in this phase associate a cost with each partition, the cost being the sum of the bandwidth between every pair of VM-clusters of the partition. The heuristics aim for a (possibly local) minimum cost while forming partitions. The IRA takes the minimum-cost partition from the list generated, and in Phase Two attempts to find a set of server-cluster candidates for each of the VM-clusters in the partition. The IRA then selects a candidate for each VM-cluster and checks for paths between every two server-clusters with bandwidth not less than the bandwidth demand between the corresponding VM-clusters. In the worst case, Phase Two will explore all the paths between all the candidates of the VM-clusters. If Phase Two is successful, IRA proceeds to Phase Three to allocate resources as per the mapping discovered in Phase Two; in case Phase Two fails, the next minimum-cost partition is selected for resource discovery in Phase Two. This is continued either until a successful partition is mapped and allocated, or until all the partitions in the list have been explored.

We highlight here that the IRA strives to fit (the VM-clusters of) a VDC onto a single server-cluster, and if not, onto multiple server-clusters of the same Pod, and if not, onto server-clusters of different Pods. The basic idea is to localize the traffic of a VDC.

The steps for IRA are given in Algorithm 1. Comments are preceded with '##'. A VDC-request is denoted as req = {N, R, B}. The number of VM-clusters formed in a single partition of a VDC is limited to n. A partition of size k (k ≤ n), i.e., with k VM-clusters, is referred to as a k-partition. Denote by m the maximum number of 2-partitions that will be explored for a VDC-request. For example, if m = 10, ten 2-partitions, each with two VM-clusters, will be generated. If n = 4, the maximum size of all partitions would be four; in other words, each VDC will be grouped into no more than four VM-clusters.


Algorithm 1 IRA(req, n, m)
 1: g ← all VMs of req  ## partition of 1 VM-cluster
 2: if find_resources((0, {g}), B) then
 3:   allocate((0, {g}))
 4:   return ACCEPT
 5: end if
 6: Γ ← gen_two_partition(req, m)  ## Each item, ω ∈ Γ, is of the form (C, G_k)
 7: for ω = Γ.delete() do
 8:   if find_resources(ω, B) then
 9:     allocate(ω)
10:     return ACCEPT
11:   end if
12:   ω′ = gen_new_partition(ω, n, B)
13:   if ω′ ≠ NULL then
14:     Γ.insert(ω′)
15:   end if
16: end for
17: return REJECT

Initially, the algorithm groups the entire VDC as a partition with one VM-cluster {g}. Then, in line no. 2, invoking find_resources for resource discovery, it tries to find a single server-cluster to map the VDC as a single VM-cluster with zero cost. If resources were discovered, they are allocated by a call to allocate (line no. 3), and the VDC-request is accepted. In case a single server-cluster with sufficient resources is not available, the algorithm proceeds to group the VDC into multiple VM-clusters with the aim of reducing the aggregate bandwidth between the VM-clusters. The algorithm initializes Γ, a min-priority queue, to the list of m 2-partitions generated by the function gen_two_partition (line no. 6). The lowest priority corresponds to the minimum cost of the partition (see Section 4.1.1 of the supplementary file available online). Each item in Γ is of the form (C, G_k), where C is the cost of mapping the partition G_k onto the data center. The partition G_k is a set of k (2 ≤ k ≤ n) VM-clusters g_1, g_2, ..., g_k. The resource discovery for each partition is achieved through calls to find_resources. For every failed partition, say of size k, corresponding to item ω, a new partition of size k + 1 is generated using the function gen_new_partition, if and only if k < n. The newly created partition, along with its cost, is inserted as an item into Γ (line no. 14). Γ will never have more than m candidate partitions at any point in time. Therefore the termination of the algorithm is guaranteed, either when it stops after successfully mapping and allocating a partition (thereby accepting the request), or after it has unsuccessfully explored (and also deleted) all the candidates in Γ, in which case it rejects the VDC-request.
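The control flow of Algorithm 1 can be sketched compactly with a binary heap serving as the min-priority queue Γ. The sketch below assumes the helper functions gen_two_partition, gen_new_partition, find_resources and allocate behave as described in the text; it is an illustration, not the authors' implementation.

```python
import heapq
import itertools

def ira(req, n, m, find_resources, allocate,
        gen_two_partition, gen_new_partition):
    """Sketch of the control flow of Algorithm 1. The helper functions are
    assumed to behave as described in the text; a candidate partition omega
    is represented as (cost, list_of_vm_clusters). Not the authors' code."""
    N, R, B = req
    tie = itertools.count()                     # tie-breaker for equal costs

    # Lines 1-5: try the whole VDC as a single VM-cluster (zero cost).
    single = (0, [frozenset(range(N))])
    if find_resources(single, B):
        allocate(single)
        return "ACCEPT"

    # Line 6: Gamma, a min-priority queue initialized with up to m 2-partitions.
    gamma = [(cost, next(tie), clusters)
             for cost, clusters in gen_two_partition(req, m)]
    heapq.heapify(gamma)

    # Lines 7-16: explore partitions in increasing order of cost.
    while gamma:
        cost, _, clusters = heapq.heappop(gamma)
        omega = (cost, clusters)
        if find_resources(omega, B):
            allocate(omega)
            return "ACCEPT"
        # On failure, refine the partition into one with an extra VM-cluster.
        refined = gen_new_partition(omega, n, B)
        if refined is not None:
            heapq.heappush(gamma, (refined[0], next(tie), refined[1]))

    return "REJECT"
```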

Next, we develop the algorithms supporting Algorithm 1, classifying them under three phases.

4.2 Phase One: Grouping

Grouping (of VMs into multiple VM-clusters) is needed only if a VDC cannot be allocated on a single server-cluster. Recall that, as per our definition, all the servers in a server-cluster are attached to the same edge-switch. In this phase, the VMs of a VDC-request are grouped into VM-clusters depending on their bandwidth requirements. As the different VM-clusters formed are finally mapped onto different server-clusters residing in the same Pod or in different Pods, the important cost parameter here is the inter-cluster bandwidth demand.

4.2.1 The Cost of a Partition

Consider a partition G for a given VDC-request. Corresponding to every such partition G of size k, there exists an inter-cluster-bandwidth (ICB) matrix, a square matrix I^G of order k, representing the inter-cluster bandwidth between VM-clusters of the partition G. That is, for 1 ≤ i, j ≤ k ≤ n, I^G_{i,j} denotes the bandwidth required for communication from the ith VM-cluster to the jth VM-cluster, both belonging to the partition G. The diagonal elements of such matrices are zeroes. The cost of a partition, therefore, is nothing but the sum of the entries in its ICB.
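As an illustration, both the ICB matrix and the cost of a partition follow directly from B. The sketch below assumes B is an N x N numpy array and a partition is a list of disjoint sets of VM indices; the function names are ours, not the paper's.

```python
import numpy as np

def icb_matrix(B, partition):
    """Inter-cluster-bandwidth (ICB) matrix I^G of a partition G of the VMs.
    B is the N x N VM-to-VM bandwidth demand matrix; partition is a list of
    disjoint sets of VM indices. Diagonal entries are zero."""
    k = len(partition)
    icb = np.zeros((k, k))
    for a, ga in enumerate(partition):
        for b, gb in enumerate(partition):
            if a != b:
                icb[a, b] = B[np.ix_(sorted(ga), sorted(gb))].sum()
    return icb

def partition_cost(B, partition):
    """Cost of a partition: the sum of all entries of its ICB matrix."""
    return icb_matrix(B, partition).sum()
```

For instance, splitting a four-VM request into {0, 1} and {2, 3} yields a 2 × 2 ICB whose two off-diagonal entries are the total demand crossing the split in each direction.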

The functions gen_two_partition and gen_new_partition, invoked in Algorithm 1, are part of this phase. While the first one is invoked only once, to generate a list of 2-partitions, the second one is called each time a partition fails to find sufficient resources. That is, gen_new_partition generates a new partition from an existing partition for which sufficient resources could not be discovered. Descriptions and run-time complexities of these functions are given in Section 4.2 of the supplementary file available online; below, we briefly discuss the logic.

4.2.2 Function gen_two_partition

This function generates a list of 2-partitions by first obtaining an optimal 2-partition using a min-cut implementation, thereby making an important deviation from our previous work [9] (the algorithm in [9] does not start with an optimal 2-partition). Then, using a simple heuristic of iteratively moving the most expensive node from one group to the other, the algorithm generates the remaining 2-partitions.
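A sketch of this step under stated assumptions: the optimal 2-partition is obtained from a Stoer-Wagner minimum cut of the (symmetrized) VM-graph, and further 2-partitions are derived by moving, one at a time, the VM with the largest demand towards the other group. The exact move rule is in the supplementary file; the selection rule and names below are our own illustration, and the VM-graph is assumed connected.

```python
import networkx as nx

def gen_two_partition(B, m):
    """Sketch of the 2-partition generation in Phase One. Starts from an
    optimal (minimum-weight) 2-cut of the symmetrized VM-graph, then derives
    further 2-partitions by moving one VM at a time across the cut."""
    N = B.shape[0]
    W = B + B.T                                   # symmetrize directed demands
    G = nx.Graph()
    G.add_nodes_from(range(N))
    for i in range(N):
        for j in range(i + 1, N):
            if W[i, j] > 0:
                G.add_edge(i, j, weight=float(W[i, j]))
    cut_value, (g1, g2) = nx.stoer_wagner(G)      # optimal 2-partition
    g1, g2 = set(g1), set(g2)
    partitions = [(float(cut_value), [frozenset(g1), frozenset(g2)])]
    while len(partitions) < m and len(g1) > 1:
        # Move the VM of g1 with the largest total demand towards g2.
        v = max(g1, key=lambda u: sum(W[u, x] for x in g2))
        g1.remove(v)
        g2.add(v)
        cost = float(sum(W[i, j] for i in g1 for j in g2))
        partitions.append((cost, [frozenset(g1), frozenset(g2)]))
    return partitions
```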

4.2.3 Function gen_new_partition

Given a failed partition of size k (k < n), this function generates a new partition of size k + 1 by partitioning the largest unmapped VM-cluster (for which resources could not be discovered) of the partition using min-cut.

4.3 Phase Two: Resource Discovery

For the minimum-cost partition in the list Γ, the IRA has to check if server-cluster candidates can be found for the VM-clusters. A server-cluster is a candidate for mapping a VM-cluster onto it, if it can satisfy the resource requirements, server resources and bandwidth, of the VMs in the VM-cluster. The partition is successfully selected for mapping only if:

1. there is at least one server-cluster candidate for every VM-cluster in the partition, and

2. there is sufficient bandwidth, respecting the inter-cluster bandwidth matrix, in the path(s) connecting the selected server-clusters.


If any of the above conditions cannot be met, the partition is abandoned, and the next partition with minimum cost is selected for resource discovery (as given in Algorithm 1). Recall, the mechanism always attempts to map all the VM-clusters (of a partition) onto server-clusters belonging to the same Pod, and if not, onto server-clusters of different Pods.

Algorithm 2 defines the find_resources function invoked in Algorithm 1. Let n_p denote the number of Pods in a data center, and n_e the number of edge-switches within a Pod. Every edge-switch is connected to n_s servers. The total number of servers in a data center is, therefore, n_p × n_e × n_s. The maximum size of a server-cluster is n_s (servers). The matrix A_{p,e}, corresponding to the server-cluster at edge-switch e (e ∈ 1..n_e) in Pod p (p ∈ 1..n_p), maintains the residual server and bandwidth resources available in the server-cluster. Section 4.1.2 in the supplementary file available online defines structures for discovering a list of candidate server-clusters in O(n_p · n_e) steps.

Algorithm 2 find_resources(ω, B)
 1: Ψ ← {}
 2: (C, G) ← ω
 3: for each g ∈ G do
 4:   for each server-cluster at (p, e) do
 5:     res = is_avail_serverC(A_{p,e}, g)
 6:     if res ≠ FAIL then
 7:       Ψ[g].append(res)
 8:     end if
 9:   end for
10:   if Ψ[g] == {} then
11:     return FAIL  ## no candidate found
12:   end if
13: end for
14: return find_mapping(G, Ψ)

Ψ is a dictionary that holds the server-cluster candidates for each VM-cluster of a given partition. The value corresponding to each key (VM-cluster) is a list of server-cluster candidates. For each VM-cluster g in a partition, Algorithm 2 iterates over all server-clusters to collect potential candidates for mapping using the function is_avail_serverC. The algorithm can return a failed status in two cases: 1) if there is no candidate server-cluster for one or more VM-clusters (line no. 11), and 2) if it could not successfully map all of the VM-clusters onto the data center due to unavailability of paths between the server-cluster candidates (line no. 14). The second case is determined by the call to the function find_mapping in the last line.

4.3.1 Complexity

The function is_avail_serverC is called O(n · n_p · n_e) times, while find_mapping is called just once.

The run-time complexities of the functions is_avail_serverC and find_mapping are discussed in Sections 4.3 and 4.4, respectively, of the supplementary file available online, which also defines and describes the functions in detail. Below, we give the logic.

4.3.2 Function is_avail_serverC

This function checks if a given server-cluster is a candidate for the VM-cluster g.

4.3.3 Function find_mapping

Once find_resources succeeds in finding candidates for all VM-clusters, it invokes find_mapping. The function find_mapping uses recursive backtracking to finalize, for each VM-cluster of the partition, a server-cluster from its list of candidates. The algorithm in each stage temporarily selects a server-cluster, say c, for an unmapped VM-cluster, say g, if c has paths with sufficient bandwidths to all previously selected server-clusters. After this mapping, a recursive call is made to find a server-cluster for the next unmapped VM-cluster. An unsuccessful recursive call results in backtracking the current mapping of c to g, and exploring another candidate cluster from the list.
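A minimal sketch of this backtracking search, assuming a predicate has_path_bandwidth(c1, c2, bw) that checks whether a path with at least bw residual bandwidth exists between two server-clusters (the actual path and bandwidth bookkeeping is in the supplementary file); the names and interface are illustrative, not the authors' code.

```python
def find_mapping(clusters, candidates, icb, has_path_bandwidth):
    """Recursive backtracking that fixes one server-cluster per VM-cluster.
    clusters:   list of VM-clusters of the partition
    candidates: dict mapping each VM-cluster to its list of candidate
                server-clusters (as collected by find_resources)
    icb:        inter-cluster-bandwidth matrix of the partition
    has_path_bandwidth(c1, c2, bw): assumed predicate for path bandwidth."""
    chosen = {}                          # index of VM-cluster -> server-cluster

    def place(i):
        if i == len(clusters):
            return True                  # every VM-cluster has been mapped
        for c in candidates[clusters[i]]:
            # c must reach every previously chosen server-cluster with at
            # least the corresponding inter-cluster demand, in both directions.
            ok = all(has_path_bandwidth(c, chosen[j], icb[i][j]) and
                     has_path_bandwidth(chosen[j], c, icb[j][i])
                     for j in range(i))
            if ok:
                chosen[i] = c
                if place(i + 1):
                    return True
                del chosen[i]            # backtrack, try the next candidate
        return False

    return chosen if place(0) else None
```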

4.4 Phase Three: Allocation

In this final phase, the resources are allocated according to the mapping. The function allocate invoked in Algorithm 1 comes in this phase. We skip the function definition as the steps are simple. The dictionary final_mapping keeps the final mapping of VM-clusters to server-clusters; and accordingly the server and bandwidth resources are allocated and the corresponding data structures updated to maintain residual capacities.

4.5 IRA Complexity

The running time of IRA is dominated by the find_resources function, which is called O(nm) times. The complexity of find_resources is O(n · n_p · n_e^2 (log(n_e) + N) + |E| · S^n), where S is the maximum number of server-cluster candidates to explore for mapping and |E| is the number of links connecting the switches (excluding the server links) in the data center network. It is safe to say that the first part (of the outermost summation) does not play a major role in deciding the running time. Indeed, the running time of IRA is governed by the second term, O(m · n · |E| · S^n).

Remarks. The dominant factor in the running time is S^n, n being the maximum size of partitions to be explored for a VDC. But as the performance study (Section 6) reveals, n = 4 gives good results, with higher values of n giving diminishing returns as one or more resources eventually become a bottleneck. The value of S is not more than a couple of hundreds for a large data center (upper bounded by n_p · n_e). Besides, S^n accounts for the worst-case mapping time. The actual number of server-cluster candidates that need to be explored for a VM-cluster can be brought down considerably by maintaining simple counters at Pods. Each Pod maintains the index of the server-cluster that has the maximum available bandwidth, as well as the available bandwidth (breaking ties randomly). This can be achieved by updating the information after every successful allocation in O(1) time. With this information, find_resources can now prioritize the server-cluster candidates in increasing order of maximum available bandwidth, which can later be used by find_mapping to choose (among all the different Pods for a VM-cluster) the Pod corresponding to the least bandwidth. By doing so, the algorithm incrementally exhausts Pods of the bandwidth resource, leaving fewer Pods (and hence, fewer server-clusters) to be searched in the worst-case scenarios.

5 BOUNDED IRA

The IRA initially generates m 2-partitions, and for each partition that fails to be allocated resources, a partition of a higher size (incremented by one) is generated. Hence, in the worst case, O(mn) partitions are explored. Though the value of m is in a few tens, we now discuss an approximation algorithm that explores only n candidates, but at the same time gives a performance bound.

The VM-clusters in a partition of size k, where 2 ≤ k ≤ n, are similar to the subgraphs (connected components) formed by a k-cut. But, apart from the partition(s) formed by the min-cut, all other partitions of IRA are formed by a simple heuristic that guarantees no bound on the sum of the inter-cluster bandwidth, or simply put, on the cost of the partition. Our motivation for this heuristic in IRA is to obtain more candidates for each input VDC, aiming for a higher possibility of accepting a VDC-request.

Though min k-cut is an NP-hard problem, there are polynomial-time approximation algorithms that give bounded performance. One such is the SPLIT algorithm developed by Saran and Vazirani [16], which gives a k-cut having cost (the sum of the inter-cluster bandwidth) within (2 - 2/k) of the optimal k-cut. We define a bounded integrated resource allocator, or B-IRA, that makes no more than n - 1 calls to SPLIT to generate a maximum of n - 1 candidate partitions (of size greater than one) in Phase One (Grouping), and works exactly as IRA in the remaining two phases. The run-time complexity of B-IRA is dominated by O(n · |E| · S^n), as Phase One takes only polynomial time. But more importantly, all the candidate partitions are now guaranteed to have cost within (2 - 2/k) of the optimal.
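The greedy-splitting idea behind SPLIT can be sketched as follows: keep applying the cheapest minimum cut available within any current component until k components remain. This is our illustrative reading of [16] built on networkx's Stoer-Wagner minimum cut, not the authors' implementation.

```python
import networkx as nx

def split_k_cut(G, k):
    """Greedy k-cut in the spirit of SPLIT [16]: repeatedly apply the cheapest
    minimum cut found within any current component until k components remain.
    G is an undirected graph whose edges carry a positive 'weight' attribute."""
    components = [G.copy()]
    while len(components) < k:
        best = None                       # (cut_value, index, side_a, side_b)
        for idx, comp in enumerate(components):
            if comp.number_of_nodes() < 2:
                continue
            if not nx.is_connected(comp):
                # A disconnected component can be split at zero cost.
                parts = list(nx.connected_components(comp))
                cut_value, a, b = 0.0, parts[0], set().union(*parts[1:])
            else:
                cut_value, (a, b) = nx.stoer_wagner(comp)
            if best is None or cut_value < best[0]:
                best = (cut_value, idx, a, b)
        if best is None:
            break                         # nothing left to split
        _, idx, a, b = best
        comp = components.pop(idx)
        components.extend([comp.subgraph(a).copy(), comp.subgraph(b).copy()])
    return [set(c.nodes()) for c in components]
```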

6 PERFORMANCE STUDY

We evaluate IRA using simulations. We compare it with B-IRA as well as with a load-balancing resource allocator (LBRA) described in the next section. The simulation settings, the metrics used for the performance study, and the scenarios are described in Sections 6.2, 6.3, and 6.4, respectively. In Section 6.5, we discuss the results.

6.1 LBRA: Load-Balancing Resource Allocator

As in IRA, the LBRA first attempts to fit a VDC onto a single server-cluster. If this is not possible, it splits the VMs of a VDC-request into two balanced VM-clusters (balanced with respect to the number of VMs), and if that fails, it splits the VMs into three balanced VM-clusters, and so forth. For example, if there are 24 VMs in a VDC-request, the LBRA will try to fit all the 24 VMs onto a single server-cluster. If this is not successful, it splits the VMs into two VM-clusters of size 12 each, and then tries to fit these two VM-clusters onto server-clusters. If it fails again, it tries VM-clusters of size eight each, and so forth. We assume no other intelligence in the formation of the VM-clusters; specifically, the LBRA does not attempt to reduce the inter-cluster-bandwidth requirement while forming VM-clusters, hence the cost (which is the inter-cluster bandwidth) is dependent on the sets of VMs that form the different VM-clusters. The resource-discovery phase of LBRA is similar to IRA's, rejecting a request due to the unavailability of server-clusters or path(s) with sufficient bandwidth. The LBRA, like IRA (and B-IRA), also tries to fit the groups within the same Pod if possible, before trying to fit them into server-clusters of different Pods. In short, LBRA and IRA differ only in Phase One, i.e., grouping.
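The grouping step of LBRA amounts to a demand-oblivious, near-equal split of the VM list; a minimal sketch (names are ours):

```python
def lbra_balanced_clusters(vm_ids, k):
    """Grouping step of LBRA: split a VDC's VMs into k VM-clusters balanced
    in size, oblivious of the traffic demands between them."""
    n = len(vm_ids)
    base, extra = divmod(n, k)
    clusters, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        clusters.append(frozenset(vm_ids[start:start + size]))
        start += size
    return clusters

# Example from the text: 24 VMs are tried as one cluster first, then as
# lbra_balanced_clusters(list(range(24)), 2) -> two clusters of 12 VMs,
# then as three clusters of 8 VMs, and so forth.
```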

6.2 Settings for Simulations

We consider the multi-rooted three-tier hierarchical architecture depicted in Fig. 1 (refer to [1, Ch. 3] for further details). There are eight core switches and 16 Pods. Each Pod has two aggregation-switches and 12 edge-switches. An edge-switch has two ports to connect to the two aggregation-switches in the same Pod, and 48 ports to connect 48 servers. Therefore, the total number of servers is 9216. The total number of server-clusters is n_p × n_e = 192, which is also the upper bound for S.

The capacities of links connecting edge-switches and aggregation-switches, as well as of links connecting aggregation-switches and core-switches, were set depending on the required over-subscription. This architecture allows different over-subscription ratios between different tiers. Let us consider as reference an aggregation-to-core over-subscription ratio of 1.5:1, and an edge-to-aggregation over-subscription ratio of 2.4:1. With 1G links connecting the servers and the edge-switches, the capacity on an edge-to-aggregation link as well as that on each aggregation-to-core link is 10G. This means that, under high load, the bandwidth available per server is limited to about 277 Mbps [1, Ch. 3]. We vary the edge-to-aggregation over-subscription ratio for different scenarios (as explained later).
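One way to arrive at these reference numbers from the stated topology: the edge-to-aggregation ratio comes from 48 servers at 1G sharing two 10G uplinks, and the per-server figure divides the 1G server link by the combined over-subscription along the path to the core:

$$
\frac{48 \times 1\,\text{G}}{2 \times 10\,\text{G}} = 2.4:1, \qquad
\frac{1\,\text{Gbps}}{2.4 \times 1.5} \approx 277\ \text{Mbps}.
$$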

Regarding server resources, each server was assigned 12 units, though the IRA can work with heterogeneous servers. VDC-requests are generated randomly and fed to the simulator one after the other, incrementally, until the maximum number of requests is reached; that is, VDC-requests arrive dynamically at the system. Hence, the ranges we define for various parameters to generate the input are set so as to have load ranging from low to medium to high, with an increasing number of VDCs at the input. The size of a VDC in the number of VMs (N), as well as the server resource units (R) of VMs of a request, are both generated randomly. The range of VDC sizes is [150-225] VMs. The range of the server resource requirement of a VM is [1-5]. Recall, the bandwidth on the VM-graph of a VDC is represented by B, the matrix specifying the bandwidth demand between every VM pair. There is a link between any two VMs of a VDC-request with a certain probability, and these link-probabilities define the VM-graph of a VDC-request. The probabilities and the values in B depend on the scenarios, and will be discussed in Section 6.4. The aggregate bandwidth demand of a VM is set to 250 Mbps (on average), getting close to the maximum achievable throughput under high load for an edge-to-aggregation over-subscription ratio of 2.4:1. The value of m, the number of 2-partitions to be generated for a VDC in IRA, is set to 25.


6.3 Metrics

We use three metrics for performance evaluation:

1. Percentage of VDCs accepted: Going by our objective, we consider the percentage of input VDCs that are accepted by the system as an important metric.

2. Cost: This cost is the inter-cluster bandwidth. Every VDC-request that is split into multiple VM-clusters will incur a cost equal to the sum of the entries in the inter-cluster bandwidth matrix of the corresponding mapped partition. We plot the average cost per accepted VDC to analyse the results.

3. Partition size: To view the effect of grouping, we observe how many VDCs were mapped as k VM-cluster(s), 1 ≤ k ≤ n.

6.4 Scenarios

We consider four scenarios for the study, each differing in one or both of the following: 1) the input VM-graph, and 2) the edge-to-aggregation over-subscription ratio in the data center network. The aggregation-to-core over-subscription ratio is 1.5:1 for all scenarios.

6.4.1 Scenario 1

In this scenario, the distribution of VM-to-VM bandwidth is non-uniform (highly uneven). Specifically, the VM-graph is generated as a graph of two subgraphs, where the VM-to-VM bandwidth is higher within each subgraph and much lower between subgraphs (in the ratio 10:1). The probability of a link between two VMs of the same subgraph is set to a high value of 0.9, and that between two VMs of different subgraphs is 0.1. This is similar to the partitioned traffic model of [13], but with the link probabilities defined so as to allow communication (though low) between subgraphs, and high communication within each subgraph. The edge-to-aggregation over-subscription ratio here is 2.4:1. The link capacities are as mentioned in Section 6.2.
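A generator matching the stated Scenario-1 parameters (two subgraphs, link probability 0.9 within and 0.1 across, 10:1 intra-to-inter demand ratio) might look as follows. The absolute demand values and the scaling to roughly 250 Mbps per VM are not specified beyond this, so the constants below are placeholders.

```python
import numpy as np

def scenario1_vm_graph(N, rng=None, p_intra=0.9, p_inter=0.1,
                       bw_intra=10.0, bw_inter=1.0):
    """Scenario-1 style demand matrix B: N VMs form two subgraphs with dense,
    heavy traffic inside each subgraph and sparse, light traffic across them
    (10:1 demand ratio, link probabilities 0.9 / 0.1). Bandwidth units are
    placeholders; the paper scales per-VM aggregate demand to ~250 Mbps."""
    rng = rng or np.random.default_rng()
    half = N // 2
    group = np.array([0] * half + [1] * (N - half))
    B = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            same = group[i] == group[j]
            p, bw = (p_intra, bw_intra) if same else (p_inter, bw_inter)
            if rng.random() < p:
                B[i, j] = bw
    return B
```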

6.4.2 Scenario 2

This scenario is similar to Scenario 1, except that the edge-to-aggregation over-subscription ratio is now increased to 4.8:1. This is achieved by changing the capacities of (only) edge-to-aggregation and aggregation-to-core links.

6.4.3 Scenario 3

This is obtained by modifying Scenario 2: the input graph here consists of four subgraphs, with high demands within subgraphs and low demands between subgraphs. The over-subscription ratio and the probabilities of links between VM-pairs are the same as in the previous scenario.

6.4.4 Scenario 4

Unlike the previous scenarios, this scenario does not form subgraphs with high intra-subgraph traffic demands. A VM can form a link to any other VM in the input VM-graph with the same probability; we set this link-probability to 0.25. Bandwidth demands on links from VM v are uniformly and randomly selected from the range [1, b_v] Mbps, where b_v is set, depending on the number of links, so as to have an aggregate bandwidth demand of 250 Mbps from v (as mentioned in Section 6.2). The edge-to-aggregation over-subscription ratio is 4.8:1.

6.5 Results

Here, we present and discuss the results for the different scenarios. In each run, a pre-determined number of VDC-requests were incrementally fed to the three allocators (IRA, B-IRA and LBRA), all making online decisions on the admission of requests. This number of input VDCs ranged from 100 to 225 in steps of 25, with 100 requests considered as a low load, and 225 requests considered as a high load. Corresponding to each setting (i.e., a fixed number of input VDC-requests), 15 instances were simulated. Therefore, each point on the graphs (and each bar on the histograms) is the mean of the values obtained from 15 runs. The graphs for acceptance percentage and cost mark the 95 percent confidence intervals.

It is obvious that the number of VDCs accepted will not keep increasing with increasing n, the maximum size of partitions, as one (or more) of the resources will eventually become a bottleneck. As we will see later, for most of our experiments, grouping VMs into no more than four VM-clusters was sufficient. Hence, for the results obtained below, the maximum size of a partition, n, was set to four, unless otherwise mentioned.

6.5.1 Scenario 1

Fig. 2a plots the percentage of accepted VDCs for different numbers of VDC-requests given at the input. Recall that for each experiment, VDCs arrive dynamically to the system. We see that, though all allocators accepted close to 100 percent of the requests at low loads, with an increasing number of VDC-requests, IRA and B-IRA accept a higher percentage of VDCs than LBRA, accepting approximately 27 percent more VDCs in the best case. From Fig. 2b, we observe that this improved performance comes at a significantly lower cost. In fact, it is because IRA and B-IRA try to reduce the inter-cluster bandwidth that more VDCs get accepted by them. Besides, it can be seen that, even when all three allocators accept relatively the same percentage of input VDCs (when the number of input VDCs is 100 and 125), LBRA still incurs a higher (average) cost in allocating a VDC. This shows the effectiveness of IRA and B-IRA in reducing the inter-cluster bandwidth. Note, IRA and B-IRA perform similarly in this scenario, under the observed metrics.

Fig. 2. Scenario 1: Acceptance percentage, average cost, and number of VM-clusters of size k (k ≤ 4) formed. (a) Percentage of VDCs accepted. (b) Cost, in terms of bandwidth. (c) Number of VM-clusters formed.

The histograms in Fig. 2c show, for each resource allocator, the number of accepted VDC-requests mapped as k VM-clusters, 1 ≤ k ≤ 4. Here, most of the VDCs were formed using VM-clusters of sizes one and two, under IRA, B-IRA as well as LBRA. As LBRA did not form VM-clusters based on the inter-cluster bandwidth, it had to resort to a considerable number of 4-partitions to accept more VDC-requests. For IRA and B-IRA, while 3-partitions helped in increasing the number of accepted VDC-requests, the contribution due to 4-partitions was insignificant. When one or more data center resources become a bottleneck, increasing the partition size is not helpful. Recall that the VM-graph of an input VDC-request in this scenario had two subgraphs such that the bandwidth demand between two VMs of the same subgraph was much higher than that between two VMs of different subgraphs. It is clear that IRA and B-IRA have made better use of this traffic characteristic, by mapping most of the VDCs as two VM-clusters (and a good percentage of the remaining as one VM-cluster).

6.5.2 Scenario 2

This scenario has a higher edge-to-aggregation over-subscription ratio, in comparison to the previous scenario. The results for this scenario are discussed in Section 5 of the supplementary file available online.

6.5.3 Scenario 3

In comparison to Scenario 2, this scenario increases the number of subgraphs in the VM-graph of an input VDC to four (from two). Figs. 3a and 3b plot the acceptance percentage and average cost. We see that LBRA is at best able to accept 70 percent of VDC-requests, and this decreases significantly with increasing load. LBRA also incurs a much higher cost for allocation. While IRA and B-IRA gave much better performance, in the best case accepting more than twice as many as LBRA accepted, this time they used a higher number of three and four VM-clusters to map the accepted VDCs (see Fig. 5a). Observe that 3-partitions contributed more towards increasing the number of accepted VDCs than 4-partitions. This happens as both allocators start by exploring the possibility of mapping a VDC with one VM-cluster, and then increase the number of VM-clusters by one each time the mapping fails. For the same reason, even if the minimum k-cut may happen to be for k = 4, IRA and B-IRA may be able to find a successful mapping using three VM-clusters (without attempting to explore partitions with four VM-clusters), albeit at a higher cost.

Fig. 3. Scenario 3: Acceptance percentage and average cost. (a) Percentage of VDCs accepted. (b) Cost, in terms of bandwidth.

Fig. 4. Scenario 4: Acceptance percentage and average cost. (a) Percentage of VDCs accepted. (b) Cost, in terms of bandwidth.


6.5.4 Scenario 4

In this scenario, a VM forms a communication link to any other VM in the input VM-graph with equal probability, with the bandwidth on links of a VM being uniformly distributed. Hence, the formation of subgraphs in an input VM-graph with high intra-subgraph demand and low inter-subgraph demand is highly unlikely. We studied this scenario for an edge-to-aggregation over-subscription ratio of 2.5 in [9], where we observed a 10-23 percent increase in the number of accepted VDCs by IRA over LBRA, due to 2-partitions. The (edge-to-aggregation) over-subscription ratio we use here is 4.8:1. Different from previous scenarios, Figs. 4a and 5b reveal that IRA outperforms both B-IRA and LBRA in this scenario, with IRA accepting close to 80 percent of requests. The number of VDCs accepted by LBRA is limited to an average of 86 (see Fig. 5b), as bandwidth-oblivious grouping can soon make the network bandwidth a bottleneck.

B-IRA accepts the lowest number of VDC-requests, not accepting more than 40 percent for any load. As the VM-graph of a request does not have subgraphs with relatively high localized traffic, B-IRA fails to find a k-cut (2 ≤ k ≤ 4) with a realistic inter-cluster bandwidth cost. Fig. 5b substantiates this argument: B-IRA could form only a negligible number of partitions with size greater than one. Most of the VDCs accepted by B-IRA were mapped as single VM-clusters; hence, as seen in Fig. 4b, the average cost per accepted VDC is the least for B-IRA (a VDC mapped as one VM-cluster onto a single server-cluster incurs zero cost). Also observe that the gap in cost between IRA and LBRA has reduced in comparison to previous scenarios, revealing that grouping of VM-clusters may not bring large cost savings when the input VDC does not have localized traffic demands.
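One way to see why B-IRA degenerates to single VM-clusters here is the admission check sketched below; it assumes, purely for illustration, that the cost bound is a fraction alpha of the VDC's total bandwidth demand, which is a hypothetical stand-in for the bound B-IRA actually uses.

def within_bound(demands, assignment, alpha):
    """Accept a partition only if its inter-cluster bandwidth does not exceed
    an alpha fraction of the VDC's total demand (hypothetical criterion)."""
    total = sum(demands.values())
    crossing = sum(bw for (i, j), bw in demands.items()
                   if assignment[i] != assignment[j])
    return crossing <= alpha * total

# With non-localized traffic, almost any k-cut (k > 1) sends a large share of
# the demand across clusters, so such a check fails and the request is either
# mapped as one VM-cluster or rejected.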

Fig. 5b shows that IRA and LBRA had to rely on partitions of sizes three and four to map a good percentage of the accepted VDCs. To see if a higher value of n could increase the number of VDCs accepted, we simulated this scenario with n set to six. Fig. 5c plots the partition sizes used for allocating VDCs. Increasing n to six did not help B-IRA and LBRA improve their performance, though LBRA used 5-partitions to allocate VDCs. A higher partition size benefited IRA for low to moderate loads, increasing the acceptance percentage by about 10 percent for input-VDC numbers less than or equal to 175. Yet, most of the (new) contributions were due to 5-partitions, with six VM-clusters contributing none. As the input load increased beyond 175 VDCs, a higher partition size did not improve the performance of IRA. Clearly, as load increases, higher-size partitions, which also lead to higher costs (and hence higher network bandwidth for mapping such VDCs), cannot increase the number of VDCs accepted.

The observations of the performance study are summarized in Section 6 of the supplementary file available online.

7 CONCLUSION

This work addressed the problem of resource allocation in the context of guaranteeing performance to tenants of multi-tenant data centers. We took an integrated approach for the allocation of server resources and bandwidth as VDCs to tenants. Noting that bandwidth is an important shared resource affecting performance guarantees in a data center, we formulated the resource allocation as an optimization problem with the objective of increasing the number of accepted VDCs by minimizing the bandwidth between VM-clusters of a VDC. Showing the problem to be NP-hard, we developed a set of algorithms forming an online integrated resource allocator, IRA, that groups VMs of a VDC-request into VM-clusters and performs resource-discovery to map the VM-clusters onto the data center such that their resource requirements can be satisfied. The performance study demonstrated the effectiveness of IRA over both B-IRA and LBRA: while IRA accepted a significantly higher number of requests than LBRA in all scenarios, it also outperformed B-IRA when the traffic demands of VMs were not localized. We also observed that IRA achieves this performance using small partition sizes.

ACKNOWLEDGMENT

This work was supported by Singapore A*STAR-SERC research grant No. 112-172-0015 (NUS WBS No. R-263-000-665-305). T.N. Le was affiliated with NUS while carrying out this work.

REFERENCES

[1] Cisco Data Center Infrastructure 2.5 Design Guide. [Online]. Available: www.cisco.com/application/pdf/en/us/guest/netsol/ns107/c649/ccmigration_09186a008073377d.pdf

Fig. 5. Number of VM-clusters of size k (k ≤ n) formed. (a) Scenario 3, maximum partition size n = 4. (b) Scenario 4, maximum partition size n = 4. (c) Scenario 4, maximum partition size n = 6.



[2] M. Al-Fares, A. Loukissas, and A. Vahdat, "A Scalable, Commodity Data Center Network Architecture," in Proc. SIGCOMM, 2008, pp. 63-74.

[3] H. Ballani, P. Costa, T. Karagiannis, and A. Rowstron, "Towards Predictable Datacenter Networks," in Proc. SIGCOMM, 2011, pp. 242-253.

[4] E. Cela, The Quadratic Assignment Problem: Theory and Algorithms. Norwell, MA, USA: Kluwer, 1998.

[5] M. Chen, S.C. Liew, Z. Shao, and C. Kai, "Markov Approximation for Combinatorial Network Optimization," in Proc. INFOCOM, 2010, pp. 1-9.

[6] A. Greenberg, J.R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D.A. Maltz, P. Patel, and S. Sengupta, "VL2: A Scalable and Flexible Data Center Network," in Proc. SIGCOMM, 2009, pp. 51-62.

[7] C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, and S. Lu, "BCube: A High Performance Server-Centric Network Architecture for Modular Data Centers," in Proc. SIGCOMM, 2009, pp. 63-74.

[8] C. Guo, G. Lu, H.J. Wang, S. Yang, C. Kong, P. Sun, W. Wu, and Y. Zhang, "SecondNet: A Data Center Network Virtualization Architecture with Bandwidth Guarantees," in Proc. ACM CoNEXT, 2010, pp. 15:1-15:12.

[9] M. Gurusamy, T.N. Le, and D.M. Divakaran, "An Integrated Resource Allocation Scheme for Multi-Tenant Data-Center," in Proc. IEEE LCN, Oct. 2012, pp. 496-504.

[10] J.W. Jiang, T. Lan, S. Ha, M. Chen, and M. Chiang, "Joint VM Placement and Routing for Data Center Traffic Engineering," in Proc. INFOCOM, 2012, pp. 2876-2880.

[11] S. Kandula, J. Padhye, and V. Bahl, "Flyways to De-Congest Data Center Networks," in Proc. ACM HotNets, 2009, pp. 1-6.

[12] C. Lee and Z. Ma, "The Generalized Quadratic Assignment Problem," Dept. Mech. Ind. Eng., Univ. Toronto, Toronto, ON, Canada, Tech. Rep., 2003.

[13] X. Meng, V. Pappas, and L. Zhang, "Improving the Scalability of Data Center Networks with Traffic-Aware Virtual Machine Placement," in Proc. INFOCOM, 2010, pp. 1-9.

[14] L. Popa, A. Krishnamurthy, S. Ratnasamy, and I. Stoica, "FairCloud: Sharing the Network in Cloud Computing," in Proc. ACM HotNets, 2011, p. 22.

[15] H. Rodrigues, J.R. Santos, Y. Turner, P. Soares, and D. Guedes, "Gatekeeper: Supporting Bandwidth Guarantees for Multi-Tenant Datacenter Networks," in Proc. 3rd Conf. WIOV, 2011, p. 6.

[16] H. Saran and V.V. Vazirani, "Finding k-Cuts Within Twice the Optimal," in Proc. 32nd Annu. Symp. Found. Comput. Sci., 1991, pp. 743-751.

[17] A. Shieh, S. Kandula, A. Greenberg, C. Kim, and B. Saha, "Sharing the Data Center Network," in Proc. NSDI, 2011, p. 23.

[18] A. Singla, C.-Y. Hong, L. Popa, and P.B. Godfrey, "Jellyfish: Networking Data Centers Randomly," in Proc. NSDI, 2012, p. 17.

Dinil Mon Divakaran is a Research Fellow in the Department of Electrical and Computer Engineering at the National University of Singapore (NUS). Prior to this, he worked as an assistant professor in the School of Computing and Electrical Engineering at the Indian Institute of Technology (IIT) Mandi. He carried out his PhD at the INRIA RESO team in ENS Lyon, France. His research revolves around applications of game theory, queueing theory, and probabilistic models, as well as the study of optimization problems and the design of heuristics, all in the broad area of computer networks. He is also keenly interested in the study of architectures, protocols (specifically TCP), and QoS mechanisms for networks. He is a member of the IEEE.

Tho Ngoc Le received the BS degree in electrical engineering from the National University of Singapore (NUS), Singapore, in 2011. He is pursuing the PhD degree at the Electrical Engineering and Computer Science Department, Northwestern University, Evanston, IL, USA. In Summer 2010, he was an internship student at the Institute for Infocomm Research (I2R), A*STAR, Singapore. From 2011 to 2012, he was a research engineer at the Communications and Networks Laboratory, NUS. His research interests include resource allocation, game theory, wireless communications, and social networks. Mr. Le was a recipient of the Singapore Scholarship from the Singapore Ministry of Foreign Affairs, the Alcatel-Lucent Technologies Prize, the Singapore Indian Chamber of Commerce Medal, the ST Electronics Prize, and the Vietnamese Education Foundation Fellowship.

Mohan Gurusamy received the PhD degree in computer science and engineering from the Indian Institute of Technology, Madras, in 2000. He joined the National University of Singapore in June 2000, where he is currently an Associate Professor in the Department of Electrical and Computer Engineering. He is currently serving as an editor for IEEE Transactions on Cloud Computing, the Elsevier Computer Networks journal, and the Springer Photonic Network Communications journal. He has served as the lead guest editor for two special issues of the IEEE Communications Magazine (OCS), August 2005 and November 2005, and as a co-guest editor for a special issue of the Elsevier Optical Switching and Networking journal, November 2008. He was the organizer and lead chair for the CreateNet GOSP workshops co-located with the Broadnets conference, October 2005, October 2006, and September 2008. His research interests are in the areas of data center networks, optical networks, Metro/Carrier Ethernet networks, and wireless sensor networks. He has over 140 publications to his credit, including two books and three book chapters in the area of optical networks. He is a Senior Member of the IEEE.

