
Gossip-based Resource Management for Cloud Environments

Fetahi Wuhib and Rolf Stadler
ACCESS Linnaeus Center

KTH Royal Institute of Technology
Email: {fetahi,stadler}@kth.se

Mike Spreitzer
IBM T.J. Watson Research Center

    Email: [email protected]

Abstract: We address the problem of resource management for a large-scale cloud environment that hosts sites. Our contribution centers around outlining a distributed middleware architecture and presenting one of its key elements, a gossip protocol that meets our design goals: fairness of resource allocation with respect to hosted sites, efficient adaptation to load changes, and scalability in terms of both the number of machines and sites. We formalize the resource allocation problem as that of dynamically maximizing the cloud utility under CPU and memory constraints. While we can show that an optimal solution without considering memory constraints is straightforward (but not useful), we provide an efficient heuristic solution for the complete problem instead. We evaluate the protocol through simulation and find its performance to be well-aligned with our design goals.

Index Terms: cloud computing, distributed management, resource allocation, gossip protocols

I. INTRODUCTION

We consider the problem of resource management

for a large-scale cloud environment. Such an environment includes the physical infrastructure and associated control functionality that enables the provisioning and management of cloud services. The perspective we take is that of a cloud service provider, which hosts sites in a cloud environment.

The stakeholders are depicted in Figure 1a. The cloud service provider owns and administrates the physical infrastructure on which cloud services are provided. It offers hosting services to site owners through a middleware that executes on its infrastructure (see Figure 1b). Site owners provide services to their respective users via sites that are hosted by the cloud service provider.

The type of sites we have in mind for this work ranges from smaller websites (that are hosted on a single physical machine) up to medium-scale e-commerce sites (whose demand can be satisfied with the resources of a few dozen machines per site). A cloud, by which we mean the infrastructure and the middleware depicted in Figure 1b, would run millions of such sites and include hundreds of thousands of machines.

Fig. 1. (a) Deployment scenario with the stakeholders of the cloud environment considered in this work. (b) Overall architecture of the cloud environment; this work focuses on resource management performed by the middleware layer.

This work contributes towards engineering a middleware layer that performs resource allocation in such a cloud environment, with the following design goals.

1) Performance objective: We consider computational and memory resources, and the objective is to achieve max-min fairness for computational resources under memory constraints.

2) Adaptability: The resource allocation process must dynamically and efficiently adapt to changes in the demand for cloud services.

3) Scalability: The resource allocation process must be scalable both in the number of machines in the cloud and the number of sites that the cloud hosts. This means that the resources consumed per machine in order to achieve a given performance objective must increase sublinearly with both the number of machines and the number of sites.

Our approach centers around a decentralized design whereby the components of the middleware layer run on every processing node of the cloud environment. (We refer to a processing node as a machine in the remainder of the paper.) To achieve scalability, we envision that all key tasks of the middleware layer, including estimating global states, placing site modules and computing policies for request forwarding, are based on distributed algorithms. Further, we rely on a global directory for




routing requests from users on the Internet to access points of particular sites inside the cloud.

The core contribution of the paper is a gossip protocol that can be used to meet the design goals above. While gossip protocols have been studied before for load balancing in distributed systems, there are no results available (to our knowledge) for the case that considers memory constraints and the cost of a configuration change, which make the resource allocation problem much harder. Coming up with an optimal and efficient solution for the problem that does not consider memory constraints is quite straightforward (but not useful in our context), and we can only provide an efficient heuristic for the problem we set out to solve, since the memory constraints make it NP-hard.

The paper is structured as follows. Section II outlines the architecture of a middleware layer that performs resource management for a large-scale cloud environment. Section III formalizes the resource allocation problem. Section IV presents two protocols to solve this problem, one of which is evaluated through simulation in Section V. Section VI reviews related work, and Section VII contains the conclusions of this research and outlines future work.

    II. SYSTEM ARCHITECTURE

A cloud environment spans several datacenters interconnected by an internet. Each of these datacenters contains a large number of machines that are connected by a high-speed network. Users access sites hosted by the cloud environment through the public Internet. A site is typically accessed through a URL that is translated to a network address through a global directory service, such as DNS. A request to a site is routed through the Internet to a machine inside a datacenter that either processes the request or forwards it. In this paper, we restrict ourselves to a cloud that spans a datacenter containing a single cluster of machines and leave for further work the extension of our contribution to an environment including multiple datacenters.

Figure 2 (left) shows the architecture of the cloud middleware. The components of the middleware layer run on all machines. The resources of the cloud are primarily consumed by module instances, whereby the functionality of a site is made up of one or more modules. In the middleware, a module either contains part of the service logic of a site (denoted by m_i in Figure 2) or a site manager (denoted by SM_i).

Each machine runs a machine manager component that computes the resource allocation policy, which includes deciding the module instances to run. The resource allocation policy is computed by a protocol (later

Fig. 2. The architecture for the cloud middleware (left) and components for request handling and resource allocation (right).

in the paper called P) that runs in the resource manager component. This component takes as input the projected demand for each module that the machine runs. The computed allocation policy is sent to the module scheduler for implementation/execution, as well as to the site managers for making decisions on request forwarding. The overlay manager implements a distributed algorithm that maintains an overlay graph of the machines in the cloud and provides each resource manager with a list of machines to interact with.

Our architecture associates one or more site managers with each site. Each site manager handles user requests to a particular site. It has two important components: a demand profiler and a request forwarder. The demand profiler estimates the resource demand of each module of the site based on request statistics, QoS targets, etc. (An example of such a profiler can be found in [1].) This estimate is forwarded to all machine managers that run instances of modules belonging to this site. Similarly, the request forwarder sends user requests for processing to instances of modules belonging to this site. Request forwarding decisions take into account the resource allocation policy and constraints such as session affinity. Figure 2 (right) shows the components of a site manager and how they relate to machine managers.

The remainder of this paper focuses on the functionality of the resource manager component. For other components of our architecture, such as the overlay manager and the demand profiler, we rely on known solutions.

III. FORMALIZING THE PROBLEM OF RESOURCE ALLOCATION BY THE CLOUD MIDDLEWARE

For this work, we consider a cloud as having computational resources (i.e., CPU) and memory resources, which are available on the machines in the cloud infrastructure. As explained earlier, we restrict the discussion to the case where all machines belong to a single cluster and cooperate as peers in the task of resource allocation. The specific problem we address is that of placing modules (more precisely: identical instances of modules)


on machines and allocating cloud resources to these modules, such that a cloud utility is maximized under constraints. As cloud utility we choose the minimum utility generated by any site, where the utility of a site is defined as the minimum utility of its module instances. We formulate the resource allocation problem as that of maximizing the cloud utility under CPU and memory constraints. The solution to this problem is a configuration matrix that controls the module scheduler and request forwarder components. At discrete points in time, events occur, such as load changes, addition and removal of sites or machines, etc. In response to such an event, the optimization problem is solved again, in order to keep the cloud utility maximized. We add a secondary objective to the optimization problem, which states that the cost of changing from the current configuration to the new configuration must be minimized.

A. The Model

We model the cloud as a system with a set of sites S and a set of machines N that run the sites. Each site s ∈ S is composed of a set of modules denoted by M_s, and the set of all modules in the cloud is M = ∪_{s∈S} M_s.

We model the CPU demand as the vector ω(t) = [ω_1(t), ω_2(t), . . . , ω_{|M|}(t)]^T and the memory demand as the vector γ = [γ_1, γ_2, . . . , γ_{|M|}]^T, assuming that CPU demand is time dependent while memory demand is not [2].

We consider a system that may run more than one instance of a module m, each on a different machine, in which case its CPU demand is divided among its instances. The demand ω_{n,m}(t) of an instance of m running on machine n is given by ω_{n,m}(t) = α_{n,m}(t) ω_m(t), where ∑_{n∈N} α_{n,m}(t) = 1 and α_{n,m}(t) ≥ 0. We call the matrix A with elements α_{n,m}(t) the configuration (matrix) of the system. A is a non-negative matrix with 1^T A = 1^T.
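(As an illustration outside the paper: the relation ω_{n,m}(t) = α_{n,m}(t) ω_m(t) and the column-sum condition 1^T A = 1^T can be checked with a few lines of numpy. The sizes and demand values below are arbitrary.)

    import numpy as np

    num_machines, num_modules = 3, 4          # arbitrary example sizes
    omega = np.array([10.0, 4.0, 6.0, 8.0])   # CPU demand per module, omega_m(t)

    # Configuration matrix A: column m holds the shares alpha_{n,m} of module m
    A = np.random.rand(num_machines, num_modules)
    A /= A.sum(axis=0)                        # normalize so that 1^T A = 1^T

    omega_nm = A * omega                      # instance demands omega_{n,m} = alpha_{n,m} * omega_m

    assert np.allclose(A.sum(axis=0), 1.0)            # shares of each module add up to 1
    assert np.allclose(omega_nm.sum(axis=0), omega)   # the module demand is fully divided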

A machine n ∈ N in the cloud has a CPU capacity Ω_n and a memory capacity Γ_n. We use Ω and Γ to denote the vectors of CPU and memory capacities of all the machines in the system. An instance of module m running on machine n demands ω_{n,m}(t) CPU resource and γ_m memory resource from n. Machine n allocates to module m the CPU capacity ω̂_{n,m}(t) (which may be different from ω_{n,m}(t)) and the memory capacity γ_m. The value for ω̂_{n,m}(t) depends on the allocation policy in the cloud; our specific policy for computing the allocation matrix Ω̂(t) is described in Section IV-A.

We define the utility u_{n,m}(t) generated by an instance of module m on machine n as the ratio of the allocated CPU capacity to the demand of the instance on that particular machine, namely, u_{n,m}(t) = ω̂_{n,m}(t) / ω_{n,m}(t). (An instance with ω_{n,m} = 0 generates a utility of ∞.) We further define the utility of a module m as u_m(t) = min_{n∈N} {u_{n,m}(t)} and that of a site as the minimum of the utility of its modules. Finally, the utility of the cloud U^c is the minimum of the utilities of the sites it hosts. As a consequence, the utility of the cloud becomes the minimum utility of any module instance in the system.

S, N, M: sets of all sites, machines and modules
M_s, s ∈ S: set of modules of site s
ω(t), γ ∈ R^{|M|}: CPU and memory demand vectors
A(t) ∈ R^{|N|×|M|}: configuration matrix
ω_{n,m}(t) ∈ R: CPU demand of the instance of module m on machine n
ω̂_{n,m}(t) ∈ R: CPU allocated to the instance of module m on machine n
Ω̂(t) ∈ R^{|N|×|M|}: CPU allocation matrix with elements ω̂_{n,m}(t)
Ω, Γ ∈ R^{|N|}: CPU and memory capacity vectors
u_{n,m}(t), U^c: utility generated by an instance of module m on machine n; utility generated by the cloud

TABLE I: NOTATION USED IN FORMALIZING RESOURCE ALLOCATION

We model the system as evolving at discrete points in time t = 0, 1, . . .

    Table I summarizes the notations used in this paper.
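(To make the utility hierarchy concrete, the following sketch, which is not from the paper, computes U^c from an allocation matrix, a demand matrix and a site-to-module mapping; the dense numpy matrices and identifier names are ours. It shows that U^c collapses to the minimum instance utility.)

    import numpy as np

    def cloud_utility(alloc, demand, site_modules):
        """alloc, demand: |N| x |M| matrices holding omega_hat_{n,m} and omega_{n,m};
        site_modules: dict mapping each site to the list of its module indices."""
        with np.errstate(divide="ignore", invalid="ignore"):
            u = np.where(demand > 0, alloc / demand, np.inf)   # u_{n,m}; no demand yields infinite utility
        u_module = u.min(axis=0)                               # u_m: minimum over the instances of m
        u_site = {s: min(u_module[m] for m in mods) for s, mods in site_modules.items()}
        return min(u_site.values())                            # U^c, the minimum over all sites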

    B. The Optimization Problem

For the above model, we consider a cloud with CPU capacity Ω, memory capacity Γ, and demand vectors ω, γ. We first discuss a simplified version of the problem. It consists of finding a configuration A that maximizes the cloud utility U^c:

    maximize    U^c(A, ω)
    subject to  A ≥ 0, 1^T A = 1^T    (a)
                Ω̂(A, ω) 1 ≤ Ω         (b)
                                      (OP(1))

Our concept of utility is max-min fairness (cf. [2]), and our goal is to achieve fairness among sites. This means that we want to maximize the minimum utility of all sites, which we achieve by maximizing the minimum utility of all module instances.

Constraint (a) of OP(1) relates to dividing the CPU demand of each module into the demands of its instances. The constraint expresses that all shares are non-negative and add up to 1 for each module. Constraint (b) says that, for each machine in the cloud, the allocated CPU resources cannot be larger than the available capacity. Ω̂ is the resource allocation function, which we discuss in Section IV-A.
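(For concreteness, a configuration can be checked against constraints (a) and (b) along the following lines. This is a sketch under our own naming; omega_hat stands for the allocation function Ω̂ discussed in Section IV-A.)

    import numpy as np

    def feasible_op1(A, omega, Omega, omega_hat, tol=1e-9):
        """A: |N| x |M| configuration, omega: module CPU demands, Omega: machine CPU
        capacities, omega_hat: function (A, omega) -> |N| x |M| CPU allocation matrix."""
        shares_ok = bool(np.all(A >= 0)) and np.allclose(A.sum(axis=0), 1.0)    # constraint (a)
        cpu_ok = bool(np.all(omega_hat(A, omega).sum(axis=1) <= Omega + tol))   # constraint (b)
        return shares_ok and cpu_ok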


We now extend OP(1) to a problem that captures the cloud environment in more detail. First, we take into account the memory constraints on individual machines, which significantly increases the problem complexity. Second, we consider the fact that the system must adapt to the external events described above, in order to keep the cloud utility maximized. Therefore, the problem becomes one of adapting the current configuration A(t) at time t to a new configuration A(t+1) at time t+1 which achieves maximum utility at minimum cost of adapting the configuration.

    maximize    U^c(A(t+1), ω(t+1))
    minimize    c(A(t), A(t+1))
    subject to  A(t+1) ≥ 0, 1^T A(t+1) = 1^T
                Ω̂(A(t+1), ω(t+1)) 1 ≤ Ω
                sign(A(t+1)) γ ≤ Γ
                                      (OP(2))

This optimization problem has prioritized objectives, in the sense that, among all configurations A that maximize the cloud utility, we select one that minimizes the cost function c. (The cost function we choose for this work gives the number of module instances that are started to reconfigure the system from the current to the new configuration.)
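(A sketch of this cost function, under the assumption that an instance counts as started when its entry in the configuration matrix changes from zero to nonzero; the function name is ours.)

    import numpy as np

    def reconfiguration_cost(A_old, A_new):
        """Number of module instances started when moving from A_old to A_new."""
        return int(np.sum((A_new > 0) & (A_old == 0)))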

While this paper considers only events in the form of changes in demand, OP(2) allows us to express (and solve) the problem of finding a new allocation after other events, including adding or removing sites or machines.

IV. A PROTOCOL FOR DISTRIBUTED RESOURCE ALLOCATION

In this section, we present a protocol P, which is a heuristic algorithm for solving OP(2) and which represents our proposed protocol for resource allocation in a cloud environment.

P is a gossip protocol and has the structure of a round-based distributed algorithm (whereby round-based does not imply that the protocol is synchronous). When executing a round-based gossip protocol, each node selects a subset of other nodes to interact with, whereby the selection function is often probabilistic. Nodes interact via small messages, which are processed and trigger local state changes. In this work, node interaction follows the so-called push-pull paradigm, whereby two nodes exchange state information, process this information and update their local states during a round.

P runs on all machines of the cloud. It is invoked at discrete points in time, in response to a load change. The output of the protocol, the configuration matrix A, is distributed across the machines of the system. A controls the start and stop of module instances and determines the control policies for module schedulers and request forwarders. The protocol executes in the resource manager components of the middleware architecture (see Figure 2). A set of candidate machines to interact with is maintained by the overlay manager component of the machine manager. We assume that the time it takes for P to compute a new configuration A is small compared to the time between events that trigger consecutive runs of the protocol. At the time of initialization, P reads as input a feasible configuration of the system (see below). At later invocations, the protocol reads as input the configuration matrix produced during the previous run.

    A. Functionalities the protocol P Uses

a) Random selection of machines: P relies on the ability of a machine to select another machine of the cloud uniformly at random. In this work, we approximate this ability by using CYCLON, an overlay protocol that produces a time-varying network graph with the properties of a random network [3].

b) Resource allocation and module scheduling policy: In this work, machines apply a resource allocation policy that allocates CPU resources to module instances proportional to their respective demands, i.e., ω̂_{n,m}(t) = ω_{n,m}(t) Ω_n / ∑_i ω_{n,i}(t). Such a policy respects the constraints in OP(1) and OP(2) regarding CPU capacity.
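(A sketch of this proportional allocation policy in numpy; identifier names are ours.)

    import numpy as np

    def allocate_cpu(omega_nm, Omega):
        """omega_nm: |N| x |M| instance demands omega_{n,m}; Omega: machine CPU capacities.
        Each machine divides its capacity among its instances in proportion to their demand."""
        totals = omega_nm.sum(axis=1, keepdims=True)            # sum_i omega_{n,i} per machine
        with np.errstate(divide="ignore", invalid="ignore"):
            omega_hat = np.where(totals > 0, omega_nm * (Omega[:, None] / totals), 0.0)
        return omega_hat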

c) Computing a feasible configuration: P requires a feasible configuration as input during its initialization phase. A simple greedy algorithm can be used for this purpose, which we present in [4] due to space limitations.

B. Protocol P*: An Optimal Solution to OP(1)

We developed the protocol P*, which is a distributed solution to OP(1). P* is a gossip protocol that produces a sequence of configuration matrices A(r), r = 1, 2, . . ., such that the series of cloud utilities U^c(A(r), ω) converges exponentially fast to the optimal utility. Due to space limitations, P* is described and its properties are proved in [4]. We encourage the reader to look up this protocol, as it is quite simple and enables a better understanding of P, which can be seen as an extension of P*. During each round of P*, two machines perform an equalization step whereby CPU demand is moved from one machine to another machine in such a way that their relative demands are equalized. The relative demand of a machine n is defined as v_n = ∑_m ω_{n,m} / Ω_n.
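(The equalization step of P* can be sketched as follows, ignoring memory constraints; the full protocol and its convergence proof are in [4], and the identifiers here are ours.)

    def equalize_step(src, dst, cap_src, cap_dst):
        """src, dst: dicts module -> CPU demand on the machine with the larger and the
        smaller relative demand; cap_src, cap_dst: their CPU capacities Omega.
        Shifts demand so that the relative demands v = sum(omega)/Omega become equal."""
        s_src, s_dst = sum(src.values()), sum(dst.values())
        # delta solves (s_src - delta)/cap_src == (s_dst + delta)/cap_dst
        delta = (cap_dst * s_src - cap_src * s_dst) / (cap_src + cap_dst)
        for m in list(src):                       # move demand module by module
            moved = min(delta, src[m])
            src[m] -= moved
            dst[m] = dst.get(m, 0.0) + moved      # without memory limits any module may start on dst
            delta -= moved
            if delta <= 0:
                break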


C. Protocol P: A Heuristic Solution to OP(2)

OP(2) differs from OP(1) in that the memory constraints of individual machines are considered and a secondary objective is added for the purpose of minimizing the cost of adapting the system from the current configuration to a new configuration that maximizes the utility for the new demand. Introducing local memory constraints to the optimization problem turns OP(1), which we showed can be efficiently solved for many practical cases [4], into an NP-hard problem [2].

P employs the same basic mechanism as P*: it attempts to equalize the relative demands of pairs of machines during a protocol round. Due to the local memory constraints, such a step does not always succeed.

P uses the following approach to achieve its objectives. First, pairs of machines that execute an equalization step are often chosen in such a way that they run instances of common modules. To support this concept, we maintain on each machine n the set N_n of machines in the cloud that run module instances in common with n. To avoid the possibility of the cloud being partitioned into disjoint sets of interacting machines, n is occasionally paired with a machine outside of the set N_n to execute an equalization step. This dual approach keeps low the need for starting new module instances and thus keeps the cost low. Second, during an equalization step, P attempts to reduce the difference in relative demand between two machines in case it cannot equalize the demand. Further, P attempts to execute an equalization step in such a way that the demand for a specific module is shifted to one machine only. This concept aims at increasing the probability that an equalization step succeeds in equalizing the relative demands, thus increasing the cloud utility.

The pseudocode of P is given in Algorithm 1. To keep the presentation simple, we omit the thread synchronization primitives which prevent concurrent machine-to-machine interactions. Note that setting ω_{n,m} = 0 implies stopping module m on machine n.

During the initialization of machine n, the algorithm reads the CPU demand vector, the CPU and memory capacity vectors, and the row of the configuration matrix for n. (For an efficient implementation, n must only read those vector components that refer to itself and its module instances.) Then, it starts two threads: an active thread, in which the machine periodically executes a round, and a passive thread that waits for another machine to start an interaction.

The active thread executes r_max rounds. In each round, n chooses a machine n' uniformly at random, from the set N_n with probability p and from the set N \ N_n with probability 1 − p.

Algorithm 1 Protocol P computes a heuristic solution for OP(2) and returns a configuration matrix A. Code for machine n.

initialization
1: read ω, Ω, γ, Γ, row_n(A), N_n;
2: start the passive and active threads

active thread
3: for r = 1 to r_max do
4:   if rand(0..1) < p then
5:     choose n' at random from N_n;
6:   else
7:     choose n' at random from N \ N_n;
8:   send(n', row_n(A)); row_{n'}(A) = receive(n');
9:   equalizeWith(n', row_{n'}(A));
10:  sleep(roundDuration);
11: write row_n(A);

passive thread
12: while true do
13:   row_{n'}(A) = receive(n'); send(n', row_n(A));
14:   equalizeWith(n', row_{n'}(A));

proc equalizeWith(j, row_j(A))
1: l = arg max{v_n, v_j}; l' = arg min{v_n, v_j};
2: if j ∈ N_n then
3:   moveDemand1(l, row_l(A), l', row_{l'}(A));
4: else
5:   moveDemand2(l, row_l(A), l', row_{l'}(A));

proc moveDemand1(l, row_l(A), l', row_{l'}(A))
1: compute δ such that (1/Ω_l)(∑_m ω_{l,m} − δ) = (1/Ω_{l'})(∑_m ω_{l',m} + δ)
2: let mod be an array of all modules that run on both l and l', sorted by increasing ω_{l,m}
3: for i = 1 to |mod| do
4:   m = mod[i]; δ' = min(δ, ω_{l,m});
5:   δ −= δ'; ω_{l',m} += δ'; ω_{l,m} −= δ';

proc moveDemand2(l, row_l(A), l', row_{l'}(A))
1: compute δ such that (1/Ω_l)(∑_m ω_{l,m} − δ) = (1/Ω_{l'})(∑_m ω_{l',m} + δ)
2: let mod be an array of all modules that run on l, sorted by decreasing ω_{l,m}/γ_m
3: for i = 1 to |mod| do
4:   m = mod[i]; δ' = min(δ, ω_{l,m});
5:   if γ_m + ∑_{i | ω_{l',i} > 0} γ_i ≤ Γ_{l'} then
6:     δ −= δ'; ω_{l',m} += δ'; ω_{l,m} −= δ';

(For the evaluation of P in Section V, we set p = |N_n| / (1 + |N_n|).) Then, n sends its state (i.e., row_n(A)) to n', receives the state of n' as a response, and calls the procedure equalizeWith(), which performs the equalization step. The passive thread


executes in a continuous loop. Whenever n receives the state from another machine n', it responds by sending its own state to n' and performing an equalization step by invoking equalizeWith().

The procedure equalizeWith() attempts to equalize the relative demands of machines n and n'. It first identifies the machine l with the larger (or equal) relative demand and the machine l' with the lower relative demand. Then, if n' belongs to N_n and thus runs at least one module instance in common with n, procedure moveDemand1() is invoked; otherwise moveDemand2() is invoked.

moveDemand1() equalizes (or reduces the difference between) the relative demands of the two machines by shifting demand from the machine l with the larger relative demand to the machine l' with the smaller relative demand. It starts by computing the demand δ that needs to be shifted from l to l' (step 1). Then, starting with the instance with the smallest demand on l among the modules that run on both machines, it proceeds to shift demand from l to l' until a total of δ demand has been shifted or it has exhausted the set of modules.

moveDemand2() likewise equalizes (or reduces the difference between) the relative demands of the two machines by moving demand from the machine with the larger relative demand to the machine with the smaller relative demand. Unlike moveDemand1(), moveDemand2() starts one or more module instances at the destination machine in order to move demand from the source machine to the destination, provided sufficient memory is available at the destination machine. Finding a set of instances at the source that equalizes the relative demands of the participating machines while observing the available memory of the destination is a knapsack problem. A greedy approximation is applied, whereby the module m with the largest value of ω_{l,m}/γ_m is moved first, followed by the second largest, etc., until the relative demands are equalized or the set of candidate modules is exhausted [5].
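(A sketch of this greedy selection in Python; the dictionaries and names below are our own reading of moveDemand2(), not code from the paper.)

    def greedy_shift(src, dst, gamma, mem_capacity_dst, delta):
        """src, dst: dicts module -> CPU demand on source and destination; gamma: module ->
        memory demand; delta: CPU demand to shift. Modules with the largest omega/gamma
        ratio go first, and a module is only started on the destination if memory remains."""
        used = sum(gamma[m] for m, d in dst.items() if d > 0)      # memory already used on destination
        for m in sorted(src, key=lambda m: src[m] / gamma[m], reverse=True):
            if delta <= 0:
                break
            starts_new = dst.get(m, 0.0) == 0
            if starts_new and used + gamma[m] > mem_capacity_dst:
                continue                                           # not enough memory to start m
            moved = min(delta, src[m])
            if starts_new:
                used += gamma[m]                                   # a new instance of m is started
            src[m] -= moved
            dst[m] = dst.get(m, 0.0) + moved
            delta -= moved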

    V. EVALUATION THROUGH SIMULATION

We have evaluated P through extensive simulations using a discrete-event simulator that we developed in-house. We simulate a distributed system that runs the machine manager components of all machines in the cloud. Specifically, these machine managers execute the protocol P, which computes the allocation matrix A, and also the CYCLON protocol, which provides for P the function of selecting a random neighbor. The external events for this simulation are the changes in the demand vector ω.

Evaluation metrics: We evaluate the protocol in various scenarios and measure the following metrics. We express the fairness of resource allocation through the coefficient of variation of the site utilities, computed as the standard deviation divided by the average of the utilities. We measure the satisfied demand as the fraction of sites that generate a utility larger than 1. We measure the cost of reconfiguration as the ratio of module instances started to module instances running, per machine.
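(These three metrics can be computed as follows; a sketch with our own naming.)

    import numpy as np

    def evaluation_metrics(site_utilities, instances_started, instances_running):
        """site_utilities: per-site utilities; the instance counts are totals over one machine."""
        fairness = site_utilities.std() / site_utilities.mean()   # coefficient of variation (0 is ideal)
        satisfied = float(np.mean(site_utilities > 1.0))          # fraction of sites with utility > 1
        cost = instances_started / instances_running              # reconfiguration cost per machine
        return fairness, satisfied, cost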

Generating the demand vectors ω and γ: In all scenarios, the number of modules of a site is chosen from a discrete Poisson distribution with mean 1, incremented by 1. The memory demand of a module is chosen uniformly at random from the set c_γ · {128MB, 256MB, 512MB, 1GB, 2GB}. For a site s, at each change in demand, the demand profiler generates CPU demands chosen from an exponential distribution with a site-specific mean. We choose the distribution of these means among all sites to be Zipf distributed with exponent 0.7, following evidence in [6]. The maximum value of the distribution is c_ω · 500 GCPU units and the population size used is 20,000. For each module m of site s, we choose a demand factor, with the factors of a site chosen uniformly at random such that they sum to 1, which describes the share of module m in the demand of site s. c_ω and c_γ are scaling factors, see below.
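(A sketch of the per-site part of this generator; the site mean drawn from the Zipf distribution is taken as given, and the use of a Dirichlet draw for the uniformly random demand factors is our interpretation.)

    import numpy as np

    rng = np.random.default_rng()
    MEM_SIZES_MB = np.array([128, 256, 512, 1024, 2048])

    def generate_site_demand(mean_cpu_site, c_gamma=1.0):
        """mean_cpu_site: the Zipf-distributed mean CPU demand of the site (given).
        Returns per-module memory demands and one sample of per-module CPU demands."""
        n_modules = rng.poisson(1) + 1                              # number of modules of the site
        gamma = c_gamma * rng.choice(MEM_SIZES_MB, size=n_modules)  # memory demand per module
        factors = rng.dirichlet(np.ones(n_modules))                 # demand factors, sum to 1
        site_cpu = rng.exponential(mean_cpu_site)                   # CPU demand at this load change
        return gamma, factors * site_cpu                            # per-module CPU demands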

Capacity of machines: A machine has a CPU capacity selected uniformly at random from {2, 4, 8, 16} GCPU units and a memory capacity selected uniformly at random from {2, 4, 8, 16} GB (CPU and memory capacities are chosen independently of each other).

Choosing machines for interaction: Each machine runs CYCLON, a gossip protocol that implements random neighbor selection.

Scenario parameters: We evaluate the performance of our resource allocation protocol P under varying intensities of CPU and memory load, which are defined as follows. The CPU load intensity is measured by the CPU load factor (CLF), which is the ratio of the total CPU demand of sites to the total CPU capacity of machines in the cloud. Similarly, the memory load factor (MLF) is the ratio of the total memory demand of sites (assuming each module runs only one instance) to the total memory capacity of all machines in the cloud. In the simulations, we vary CLF and MLF by changing c_ω and c_γ. In the reported experiments, we use the following parameters unless stated otherwise:

|N| = 10,000, |S| = 24,000
r_max = 30, CLF = MLF = 0.5
maximum number of instances/module: 100
number of load changes during a run: 100


A. Fairness

First, we evaluate P regarding the fairness of resource allocation for CLF = {0.1, 0.4, 0.7, 1.0, 1.3} and MLF = {0.15, 0.35, 0.55, 0.75, 0.95}. (We leave out MLF = 1 because a feasible solution may not exist, or our initialization algorithm may not find one.) We compare the performance of our protocol to that of an ideal system, which can be thought of as a centralized system that performs resource allocation on a giant single machine with the aggregate CPU and aggregate memory capacities of the entire cloud. The performance of this centralized system gives us a bound on the performance of P.

Figure 3a shows that the measured fairness metric appears to be independent of CLF. (Note that the value of this metric that corresponds to our fairness goal is 0.) We expect this because P allocates CPU resources proportional to the demand of a module instance, regardless of the available capacity on the particular machine. Second, the figure shows that resource allocation becomes less fair when MLF is increased. For example, the average deviation of the site utilities is about 16% of the average utility for an MLF of 15%. However, this value increases to about 100% when MLF is at 75%. This behavior is also to be expected, since increasing MLF makes machines less likely to have sufficient memory for starting new instances. Note that the ideal system always achieves our fairness goal since its memory is not fragmented.

    B. Scalability

In this scenario, we measure the dependence of our evaluation metrics on the size of the cloud. To achieve this, we run simulations for a cloud with (2,500, 5,000, 10,000, 20,000, 40,000, 160,000) machines and (6,000, 12,000, 24,000, 48,000, 96,000, 384,000) sites, respectively (keeping the ratio of sites to machines at 2.4, which ensures that CLF and MLF stay close to the default value of 0.5). Figure 3b shows the results obtained, which indicate that all metrics considered for this evaluation are independent of the system size. In other words, if the number of machines grows at the same rate as the number of sites (while the CPU and memory capacities of a machine, as well as all parameters characterizing a site, such as demand, number of modules, etc., stay the same), then we expect all considered metrics to remain constant. Note that our conclusion relates exclusively to the scalability of the protocol P. The complete resource management system includes many more functions that have not been evaluated here, for instance, the scalability of effectively choosing a random peer.

Fig. 3. The performance of the resource allocation protocol P with respect to fairness and scalability. (a) Fairness of the resource allocation as a function of the CPU load factor (CLF) and the memory load factor (MLF) of the cloud; a value of 0 achieves the fairness objective of the cloud. (b) Scalability with respect to the number of machines and sites.

    C. Satisfied demand and cost of reconfiguration

We evaluated P with regard to satisfied demand and cost of reconfiguration for different values of CLF and MLF. Due to space limitations, the details of the evaluation and the results are presented in [4]. With regard to satisfied demand, the results show that our protocol satisfies more than 95% of all site demands for CLF ≤ 70% and MLF ≤ 55%. With regard to the cost of reconfiguration, the results show that the cost increases with decreasing MLF. For instance, the cost of reconfiguration is less than 3% for an MLF of 95%, but it increases to 35% for an MLF of 15%.

VI. RELATED WORK

The problem of application placement in the context of resource management for datacenters has been studied before (e.g., [2], [7]), and solutions are already available in middleware products [8]. While these product solutions allow for a fair resource allocation in a similar way as our scheme does, they rely on centralized architectures, which do not scale to the system sizes we consider in this paper.


The work in [9], which has been extended by [10], presents a distributed middleware layer for application placement in datacenters. As in this paper, the goal of that work is to maximize a cluster utility under changing demand, although a different concept of utility is used. The design proposed in [9], [10] scales with the number of machines, but it does not scale in the number of applications, as the design in this paper does. (The concept of an application in the referenced work roughly corresponds to the concept of a site in this paper.)

Distributed load balancing algorithms have been extensively studied for homogeneous as well as heterogeneous systems, for both divisible and indivisible demands. These algorithms typically fall into two classes: diffusion algorithms (e.g., [11]) and dimension exchange algorithms (e.g., [12]). Convergence results for different network topologies and different norms (that measure the distance between the system state and the optimal state) have been reported, and it seems to us that the problem is well understood today. The key difference to the problem addressed in this paper is that these algorithms do not take memory constraints into account. Considering memory constraints makes the problem NP-hard and requires a new approach.

    VII. DISCUSSION AND CONCLUSION

With this paper, we make a significant contribution towards engineering a resource management middleware for a site-hosting cloud environment. We identify a key component of such a middleware and present a protocol that can be used to meet our design goals for resource management: fairness of resource allocation with respect to sites, efficient adaptation to load changes, and scalability of the middleware layer in terms of both the number of machines in the cloud and the number of hosted sites.

We presented a gossip protocol that computes a heuristic solution to the resource allocation problem and evaluated its performance through simulation. In all the scenarios we investigated, we observe that the protocol qualitatively behaves as expected based on its design. For instance, regarding fairness, the protocol performs close to an ideal system for scenarios where the ratio of the total memory capacity to the total memory demand is large. More importantly, the simulations suggest that the protocol is scalable in the sense that all investigated metrics do not change when the system size (i.e., the number of machines) increases in proportion to the external load (i.e., the number of sites). Note that if we were to solve the resource allocation problem expressed in OP(2) with P in a centralized system, then the CPU and memory demand of that resource allocation system would increase linearly with the system size. This strongly suggests to us that a centralized solution for the problem we address in this paper will not be feasible.

The results reported in this paper are building blocks towards engineering a resource management solution for large-scale clouds. Pursuing this goal, we plan to address the following issues in future work: (1) develop a distributed mechanism that efficiently places new sites; (2) extend the middleware design to become robust to machine failures; (3) extend the middleware design to span several clusters and several datacenters, while keeping module instances of the same site close to each other, in order to minimize response times and communication overhead.

REFERENCES

[1] G. Pacifici, W. Segmuller, M. Spreitzer, and A. Tantawi, "Dynamic estimation of CPU demand of web traffic," in Valuetools '06: Proceedings of the 1st International Conference on Performance Evaluation Methodologies and Tools. New York, NY, USA: ACM, 2006, p. 26.

[2] D. Carrera, M. Steinder, I. Whalley, J. Torres, and E. Ayguade, "Utility-based placement of dynamic web applications with fairness goals," in Network Operations and Management Symposium (NOMS 2008). IEEE, April 2008, pp. 9-16.

[3] S. Voulgaris, D. Gavidia, and M. van Steen, "CYCLON: Inexpensive membership management for unstructured P2P overlays," Journal of Network and Systems Management, vol. 13, no. 2, pp. 197-217, 2005.

[4] F. Wuhib, R. Stadler, and M. Spreitzer, "Gossip-based resource management for cloud environments (long version)," KTH Royal Institute of Technology, Tech. Rep. TRITA-EE 2010:032, August 2010.

[5] G. B. Dantzig, "Discrete-variable extremum problems," Operations Research, vol. 5, no. 2, pp. 266-288, 1957.

[6] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, "Web caching and Zipf-like distributions: evidence and implications," in INFOCOM '99: Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 1, 1999, pp. 126-134.

[7] C. Tang, M. Steinder, M. Spreitzer, and G. Pacifici, "A scalable application placement controller for enterprise data centers," in WWW '07: Proceedings of the 16th International Conference on World Wide Web. New York, NY, USA: ACM, 2007, pp. 331-340.

[8] IBM, "IBM WebSphere Application Server." [Online]. Available: http://www.ibm.com/software/webservers/appserv/extend/virtualenterprise/

[9] C. Adam and R. Stadler, "Service middleware for self-managing large-scale systems," IEEE Transactions on Network and Service Management, vol. 4, no. 3, pp. 50-64, April 2008.

[10] J. Famaey, W. De Cock, T. Wauters, F. De Turck, B. Dhoedt, and P. Demeester, "A latency-aware algorithm for dynamic service placement in large-scale overlays," in IM '09: Proceedings of the 11th IFIP/IEEE International Symposium on Integrated Network Management. Piscataway, NJ, USA: IEEE Press, 2009, pp. 414-421.

[11] G. Cybenko, "Dynamic load balancing for distributed memory multiprocessors," Journal of Parallel and Distributed Computing, vol. 7, no. 2, pp. 279-301, 1989.

[12] C. Z. Xu and F. C. M. Lau, "Analysis of the generalized dimension exchange method for dynamic load balancing," Journal of Parallel and Distributed Computing, vol. 16, pp. 385-393, 1992.
