31
Online Virtual Machine Allocation with Lifetime and Load Predictions Niv Buchbinder * Yaron Fairstein Konstantina Mellou Ishai Menache § Joseph (Seffi) Naor Abstract The cloud computing industry has grown rapidly over the last decade, and with this growth there is a significant increase in demand for compute resources. Demand is manifested in the form of Virtual Machine (VM) requests, which need to be assigned to physical machines in a way that minimizes resource fragmentation and efficiently utilizes the available machines. This problem can be modeled as a dynamic version of the bin packing problem with the objective of minimizing the total usage time of the bins (physical machines). Earlier works on dynamic bin packing assumed that no knowledge is available to the scheduler and later works studied models in which lifetime/duration of each “item” (VM in our context) is available to the scheduler. This extra information was shown to improve exponentially the achievable competitive ratio. Motivated by advances in Machine Learning that provide good estimates of workload char- acteristics, this paper studies the effect of having extra information regarding future (total) demand. In the cloud context, since demand is an aggregate over many VM requests, it can be predicted with high accuracy (e.g., using historical data). We show that the competitive factor can be dramatically improved by using this additional information; in some cases, we achieve constant competitiveness, or even a competitive factor that approaches 1. Along the way, we design new offline algorithms with improved approximation ratios for the dynamic bin-packing problem. 1 Introduction Cloud computing is a growing business which has revolutionized the way computing resources are consumed. The emergence of cloud computing is attributed to lowering the risks for end-users (e.g., scaling-out resource usage based on demand), while allowing providers to reduce their costs by efficient management and operation at scale. One popular way of consuming cloud resources is through Virtual Machine (VM) offerings. Users rent VMs on demand with the expectation of a seamless experience until they decide to terminate usage. In turn, cloud resource managers place VMs on physical servers that have enough capacity to serve them. The specific VM allocation decisions have a direct impact on resource efficiency and return on investment. For example, * Tel Aviv University, [email protected] Technion, [email protected] Microsoft Research, [email protected] § Microsoft Research, [email protected] Technion, [email protected] 1

Online Virtual Machine Allocation with Lifetime and Load

  • Upload
    others

  • View
    13

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Online Virtual Machine Allocation with Lifetime and Load

Online Virtual Machine Allocation with Lifetime and Load

Predictions

Niv Buchbinder∗ Yaron Fairstein† Konstantina Mellou‡ Ishai Menache§

Joseph (Seffi) Naor ¶

Abstract

The cloud computing industry has grown rapidly over the last decade, and with this growththere is a significant increase in demand for compute resources. Demand is manifested in theform of Virtual Machine (VM) requests, which need to be assigned to physical machines in away that minimizes resource fragmentation and efficiently utilizes the available machines. Thisproblem can be modeled as a dynamic version of the bin packing problem with the objective ofminimizing the total usage time of the bins (physical machines). Earlier works on dynamic binpacking assumed that no knowledge is available to the scheduler and later works studied modelsin which lifetime/duration of each “item” (VM in our context) is available to the scheduler.This extra information was shown to improve exponentially the achievable competitive ratio.

Motivated by advances in Machine Learning that provide good estimates of workload char-acteristics, this paper studies the effect of having extra information regarding future (total)demand. In the cloud context, since demand is an aggregate over many VM requests, it can bepredicted with high accuracy (e.g., using historical data). We show that the competitive factorcan be dramatically improved by using this additional information; in some cases, we achieveconstant competitiveness, or even a competitive factor that approaches 1. Along the way, wedesign new offline algorithms with improved approximation ratios for the dynamic bin-packingproblem.

1 Introduction

Cloud computing is a growing business which has revolutionized the way computing resources areconsumed. The emergence of cloud computing is attributed to lowering the risks for end-users(e.g., scaling-out resource usage based on demand), while allowing providers to reduce their costsby efficient management and operation at scale. One popular way of consuming cloud resourcesis through Virtual Machine (VM) offerings. Users rent VMs on demand with the expectation of aseamless experience until they decide to terminate usage. In turn, cloud resource managers placeVMs on physical servers that have enough capacity to serve them. The specific VM allocationdecisions have a direct impact on resource efficiency and return on investment. For example,

∗Tel Aviv University, [email protected]†Technion, [email protected]‡Microsoft Research, [email protected]§Microsoft Research, [email protected]¶Technion, [email protected]

1

Page 2: Online Virtual Machine Allocation with Lifetime and Load

inefficient placement mechanisms might result in fragmentation and unnecessary over-provisioningof physical resources.

Our goal in this paper is to design algorithms for allocating VMs to physical machines in acloud facility (e.g., cluster, region), so that the total active machine-time, taken over all machines,is minimized; a machine is considered active if one or more VMs run on it. When a machine becomesinactive, it can be returned to the general pool of machines, and therefore does not contribute tothe cost function. In certain scenarios, the same optimization can also lead to power savings, underthe assumption that empty machines can be kept in an idle, low-power mode [13, 24].

The problem of allocating VMs to physical machines can be modeled as a generalization of theclassic (and extensively studied) static bin packing problem, where the goal is to pack a set of itemsof varying sizes, while minimizing the number of bins used [14, 27, 8]. The VM allocation problemcorresponds to a dynamic bin packing problem in which items, or VMs, arrive over time and laterdepart [9]. Minimizing the total active-machine time translates then to minimizing the total usagetime of the bins, or machines [21, 23, 29, 3]. The VM allocation problem is of interest in both theuniform size case, in which all items have the same size (and each bin can pack at most g items)[11], but especially under the more general setting, which we refer to as the non-uniform size case.The problem is known to be NP-hard even in the uniform case when g = 2 [31].

The dynamic bin packing problem has been studied in both offline and online settings. In theonline setting, items arrive over time, giving rise to two different models. In the non-clairvoyantcase [11, 21, 30] no information is given to the scheduler upon arrival of a new item, and indeedonly poor performance is obtained when there is a large variation in item duration times [15] (seeadditional discussion later). In the clairvoyant setting [23, 29, 3] the departure time (or duration)of an item is revealed upon arrival, allowing for significant performance improvements.

The clairvoyant model assumes that highly accurate lifetime predictions are available to ascheduler. In the cloud context, this information has recently been obtained through MachineLearning (ML) tools [10, 5, 22], which are deployed to support resource management decisions forthe underlying systems (see [20, 6, 10, 12] and references therein). ML is increasingly used, not onlyfor lifetime prediction, but also to predict other metrics, such as machine health [10] and futuredemand [13].

Motivated by the recent momentum in applying ML for cloud systems, we take the online dy-namic bin packing model a step further, and study the advantage of having additional information,on top of VM or item lifetimes. Specifically, we focus on designing online algorithms that possesssome form of prediction about future demand. From a practical perspective, we note that demandis an aggregate over numerous requests; as such, it can be predicted with high accuracy [13, 33] (infact, higher accuracy than individual VM lifetime predictions).

1.1 Our Results

We first describe the setting in which we study the VM scheduling problem. We assume that eachVM (item) has a demand (size), and each machine (bin) has unit size. Thus, the total demandof VMs assigned to a machine at any point of time cannot exceed 1. In the uniform size case weassume that the size of all VMs is 1

g for some integer g. The VMs arrive over time and need to beassigned to machines for their duration (lifetime) in the system. As there is no migration of VMsacross physical machines, the initial assignment remains as it is until the VM terminates.

We can assume without loss of generality that at any given time there is at least one active VM.We refer to the aggregate size of the VMs that are active at time t as the total demand/load at t.

2

Page 3: Online Virtual Machine Allocation with Lifetime and Load

A physical machine is considered active when one or more VMs are assigned to it. The goal is tominimize the total time that the machines remain active. An important parameter in our resultsis µ, defined to be the ratio between the maximum and minimum duration, taken over all VMs.Finally, let k denote the parameter of the known harmonic bin packing algorithm, and let Πk beits asymptotic competitive ratio. 1 [19].

As there is always at least one active VM in each time step, the optimal cost of the dynamicbin packing problem is at least T , the length of the time horizon. To facilitate the understandingof our results, we divide both the optimal cost, OPT , as well as our algorithm’s cost by T . LetOPTavg denote the optimal cost divided by T . The value OPTavg should thus be read as theaverage number of machines used by an optimal solution. This change, of course, does not affectthe multiplicative factor in the approximation/competitive ratios we get. However, any additiveterm should now be read as the average number of additional machines our algorithm is usingover the (average) number of machines an optimal solution is using. We note that in our cloudcomputing context the average number of machines is typically in the order of thousands.

We study the VM scheduling problem in both offline and online settings.

1.1.1 Offline Algorithms

We first show the following result for the offline problem.

Theorem 1.1. For any integer k ≥ 3, there exist offline scheduling algorithms whose averagecost is at most:

Non-uniform size case Uniform size case

Algorithms 3, 8 Πk ·OPTavg +O(√

OPTavg · k · logµ)

OPTavg +O(√

OPTavg · logµ)

Algorithms 2, 8 2Πk(1 + 1k−2) ·OPTavg + k 2 ·OPTavg

Previous results 4 ·OPTavg [29] 2 ·OPTavg [1, 17]

The 2 ·OPTavg upper bound for the uniform size case is well known [1, 17, 29]. However, it isdescribed here not only to compare against our other results, but also because the techniques usedto prove it are later used in the online case; though fairly simple, these techniques are somewhatdifferent from previous proofs. In the above theorem we obtain two improved new bounds for theoffline case. These improvements are in the spirit of the asymptotic approximation ratio commonlyused in the standard bin packing problem. Algorithm 2 has multiplicative approximation ratio2Πk(1+ 1

k−2), while using extra k machines in each time step. Thus, its “asymptotic” approximationapproaches 2Π∞ ≈ 3.38, which is better than the best known (strict) 4-approximation for theproblem2.

Algorithm 3 achieves an even better multiplicative approximation ratio of Πk (that approaches1.69), albeit only when the average number of machines used by an optimal solution is relativelylarge. Specifically, it uses an extraO(

√OPTavg · k · logµ) machines in each time step. If the average

number of machines used in the optimal solution is much larger than logµ, the latter additive termbecomes negligible compared to OPTavg. In the uniform size case, the asymptotic approximationof the algorithm is 1, as may be expected when sizes are uniform. When the maximum demand of

1The harmonic algorithm is parameterized by k, which controls an additive term in its competitive ratio. Πk isa monotonically decreasing number that approaches Π∞ ≈ 1.691. Πk quickly becomes close to 1.691, for example,Π6 = 1.7 and Π12 ≈ 1.692.

2We remark that Algorithm 2 also achieves the same 4-approximation (with no additive term).

3

Page 4: Online Virtual Machine Allocation with Lifetime and Load

any VM is small (and also in some other scenarios), our performance guarantees are better thanthose outlined in Theorem 1.1 (actually, in both offline and online settings). We refer the readerto Section 3 and Section 5 for more details.

1.1.2 Online Algorithms with Additional Information

Our main contribution in this paper is the construction of new online algorithms that have access toadditional information on future demand, leading to improved competitive ratios. Interestingly, ouronline algorithms are inspired by their offline counterparts. Earlier results assumed that an onlinescheduler gets no extra information upon arrival of a VM request. The performance guarantee ofthese algorithms turned out to be very poor in many cases. Better results were later obtained forthe clairvoyant model, in which duration of requests are revealed upon arrival.

Specifically, we explore two novel models in which the scheduler is provided with predictionsabout demand. In the first model, the average load is known to the scheduler (a single value), andin the second model the total load in each of the future time steps is known to the scheduler. Weremark that predicting future cumulative demand is much simpler than obtaining the full structureof an instance, which requires predicting future arrivals of individual requests.

Theorem 1.2. For any integer k ≥ 2, there exist online scheduling algorithms with average costat most:

Extra information Non-uniform size case Uniform size case

beyond duration

Average load Πk ·OPTavg + k ·O(√

OPTavg · logµ) OPTavg +O(√

OPTavg · logµ)

Future load vector 8 ·OPTavg 2 ·OPTavg

No extra information O(√

logµ) ·OPTavg [3] O(√

logµ) ·OPTavg [3]

In the above table, we compare our results with the previously best known online result inthe clairvoyant model, due to Azar and Vainstein [3]3. As indicated in the table, [3] designed analgorithm whose total cost is at most O(

√logµ) · OPTavg, and proved that this ratio is optimal.

Our results demonstrate that with more information the competitive ratio can be dramaticallyimproved. Suppose that the only additional information provided is the average load (taken overthe full time horizon), and that the average number of machines used is much larger than logµ;then, we obtain a constant competitive ratio that approaches Π∞ ≈ 1.69 in the non-uniform sizecase and an asymptotic ratio of 1 in the uniform size case. Thus, our performance guarantee isalways better than [3], and it is the same when the average load is O(1).

If the load at all future times is known, we achieve a (strict) constant competitive ratio underno additional assumptions. This is in contrast to the Ω(

√logµ) lower bound on the competitive

ratio of any algorithm without this extra knowledge [3].In Section 4.4 we complement our results and analyze the performance of our algorithms when

the average load prediction, as well as interval lengths predictions, are inaccurate.We complement the above results by generalizing the lower bound of [3] to take into account

also OPTavg showing that the additive term O(√

OPTavg · logµ) is indeed unavoidable, if only theaverage future load (and lifetime) is available to an online algorithm.

3We note that similarly to the algorithm of [3], our algorithm also does not need to know the value of µ upfront.

4

Page 5: Online Virtual Machine Allocation with Lifetime and Load

Theorem 1.3. The average cost of any online algorithm is at least Ω(√

OPTavg · logµ). The

bound holds even for the uniform size case and with prior knowledge of the average load and µ.

We also remark that the lower bound of approximately 1.542 [4] on the asymptotic competitiveratio of any static online bin packing algorithm, carries over to the dynamic clairvoyant case4.

1.2 Techniques

The main issue we cope with in the online setting is how to improve the competitive ratio byutilizing the additional information provided to an online scheduler. At a high level, this is achievedby drawing on ideas from our new offline algorithms; we show that these algorithms can be to someextent “simulated” in the online case, even when less information is available. Yet, the loss toperformance is bounded.

How to utilize future load predictions? When loads for each future time step are available, wedraw on ideas from Algorithm 2. In each iteration, this offline algorithm considers the unscheduledrequests, and finds greedily (and carefully) a set of requests (among the unscheduled requests) thatcan be scheduled on one or two machines, and having high enough load in every time step. This setcan be interpreted as a cover of the time horizon. The offline algorithm then repeats this processwith the remaining unscheduled requests, till all requests are scheduled.

Achieving this goal online is tricky, as multiple covers of the time horizon must be created inparallel without knowing future requests. Each “error” in assigning requests to covers may eitherincrease the number of machines required for scheduling a cover, or increase the number of covers(again, resulting in too many active machines). We show that when given information on futuredemand, the number of “extra” covers generated is bounded. The high level idea is to maintainseveral open machines (and not just two as in the offline case), and schedule a new request on thelowest index machine that “must” accept it in order to preserve a “high load” invariant. Findingthe right machine is done online utilizing the predictions on the remaining future demand and thenew request’s lifetime. Surprisingly, we are able to get a constant competitive factor in this case,even though the online scheduler is not familiar with the full interval structure of the instance, asin the offline setting, only with cumulative load. The results are presented in Section 5.

How to use average load prediction? Interestingly, we show that this single value parametercan be extremely useful in improving the competitive factor. This is done by mimicking Algorithm 3.This offline algorithm finds a dense subset of requests of roughly the same duration that can bescheduled together (similarly in spirit to the greedy set-cover algorithm). In the offline setting weshow that this is possible whenever there exists a point in time in which demand is high enough.

The online scheduler is not familiar with the demand ahead of time. Hence, it uses a carefulclassification of the requests by their duration, the current demand, and the average demand. Whilethe idea of classification has been used before in the context of dynamic bin packing, we classifyintervals in a more sophisticated way. Finally, to get our refined bounds we schedule each class ofintervals using a new family of non-clairvoyant algorithms (discussed in the sequel) that trade offcarefully multiplicative and additive terms. The results are presented in Section 4.

4Simply use the same (long) duration for all requests that arrive (almost) at the same time according to theadversarial arrival sequence.

5

Page 6: Online Virtual Machine Allocation with Lifetime and Load

A new family of non-clairvoyant algorithms. To get our refined bounds we show a generalreduction that transforms any k-bounded space (static) bin packing algorithm (see exact definitionsin Section 2.1) into a non-clairvoyant algorithm for the dynamic bin packing problem. Note thatthe optimal cost in the static bin packing is simply the number of bins (and not the total duration).Hence, we use OPTS to emphasize that this is the optimal solution for static instances. We provethe following.

Lemma 1.4. Given a k-bounded space bin packing algorithm whose cost is at most c ·OPTS + `,there exists an online non-clairvoyant algorithm for the dynamic bin packing setting whose averagecost is at most c · µ ·OPTavg + maxk, `.

For example, substituting in this theorem the performance of the Harmonic Algorithm, weobtain a non-clairvoyant algorithm whose average cost is at most Πk · µ ·OPTavg + k.

1.3 Related Work

In the remainder of this section, we discuss additional relevant work from an algorithmic perspective.Flammini et al. [11] analyzed a natural First-Fit heuristic for the offline problem, and proved it isa 4-approximation. Tang et al. [30] proved that a First-Fit heuristic in the online non-clairvoyantsetting is µ+4 competitive. This was proven to be almost optimal by Li et al. [23] who showed thatthe competitiveness of any Fit-Packing algorithm cannot be better than (µ + 1). Ren et al. [29]designed a First-Fit based algorithm for the clairvoyant bin packing problem. Using additionalpredictions of maximum and minimum lifetimes of items they achieved competitive ratio 2

õ+ 3.

The online dynamic bin packing problem was first introduced by Coffman et al. [9]. Theirobjective was minimizing the maximum number of active machines over the time horizon. Theydesigned an 2.788-competitive algorithm. For this model, Wong et al. [32] obtain a lower bound of83 ≈ 2.666 on the competitive ratio.

Additional interval scheduling models have been recently considered in the context of cloudcomputing (e.g., [25, 2, 7, 16] and references therein). These models are fundamentally differentthan our VM scheduling setup, mainly because the “jobs” in these works have some flexibility(termed slackness) as to when they are executed. Finally, we note that there has been growinginterest in designing resource management algorithms with ML-assisted (and potentially inaccurate)predictions; see, e.g., recent work on online caching [26], scheduling [18] and the ski-rental problem[28].

Organization. Our model is formally defined in Section 2. In Section 3, we consider the offlinecase. The online case is studied under two different settings: in Section 4, we assume that theaverage load information is available, whereas in Section 5 the scheduler is equipped with thefuture load vector predictions.

2 Model and Preliminaries

We model each VM request as a time interval I = [s, e); we often use the term interval whenreferring to a VM request. Each interval is associated with a start time, sI , end time, eI , and asize, wI ≤ 1. The intervals are scheduled on machines/bins whose size is normalized to 1. We saythat t ∈ I if sI ≤ t < eI . Let `I = eI − sI be the length of interval I. We assume without loss ofgenerality that the minimum length of an interval is 1, and denote by µ the maximum length of an

6

Page 7: Online Virtual Machine Allocation with Lifetime and Load

interval (which is not necessarily known in advance). Let β = maxI wI be the maximum size of aninterval (which, again, is not necessarily known in advance). In the uniform size model the size ofall intervals is 1

g for some integer value g. In the non-uniform size model the size of each intervalis arbitrary.

The static bin packing problem is a classic NP-hard problem in which the goal is to pack a setof items of varying sizes, while minimizing the number of bins used. The problem has been studiedextensively in both offline and online settings [14, 27, 8]. The problem of allocating VMs to physicalmachines is equivalent to the dynamic bin packing problem in which items (VMs) arrive over timeand later depart [9]. The goal is to minimize the total usage time of the bins which is the same asminimizing the total time the machines are active [21, 23, 29, 3]. In the online setting items arriveover time, and in the non-clairvoyant case no information is given to the scheduler upon arrival ofa new item, while in the clairvoyant setting the departure time (or duration) of an item is revealedupon its arrival.

A machine is said to be active or open at time t if at least one VM is running on it. Ourgoal is to schedule the VMs so as to minimize the total (or equivalently the average) number ofactive machines over the time horizon. We assume that the cloud capacity is large enough, so thatVM requests can always be accommodated. Without loss of generality, we further assume that ateach time t there is at least one active request (otherwise, the time horizon can be partitioned intoseparate time horizons).

Let I be a set of all intervals (VM requests). We define I(t) = I ∈ I|t ∈ I as the set ofintervals that are active at time t, and let Nt = |I(t)|. Let v = (v1, v2, . . . , vT ) be the load vectorover time, where vt = d

∑I∈I(t)wIe. In our analysis, we use several norms of the load vector:

‖v‖1 =∑T

t=1 vt, ‖v‖∞ = maxTt=1vt, and ‖v‖0 =∑T

t=1 1(Nt>0) (i.e., the total number of time

epochs in which there is at least one active VM request). In addition, let vavg = ‖v‖1T be the

average value of the load vector (or the average demand). Throughout the paper, we will use loadvector notions not only for the set I of all intervals, but also for different subsets S ⊆ I. In everysuch use case, we describe explicitly the corresponding subset. A simple (known) lower bound onthe value of the optimal solution, using our load vector notation, is the following:

Observation 2.1. The total active-machine time required by any scheduler is at least ‖v‖1.

We next provide a useful lemma about intersecting intervals (intervals that are all active at thesame time t). We use this lemma frequently in our algorithms’ analyses to guarantee that in eachsuch set, there is one interval that sees a high load, i.e., the total load is above a certain threshold,for its whole duration.

Lemma 2.2 (Intersecting intervals). Let I be a set of intervals with load vector v that are all usingtime t (i.e., t ∈ I for all I ∈ I). For any α > 0, if vt > α, then there exists an interval I ∈ I suchthat vt′ > α/2 for all t′ ∈ I.

Proof. Let α > 0 be a lower bound on the load at time t, i.e., vt > α. Since all intervals in Iare active at time t, it can be seen that their load vector is non-decreasing until time t, and non-increasing after time t. Let I1, I2, . . . , IJ ∈ I be the intervals sorted by their starting times (whichare all prior to t). Let A1 = I1, . . . , Ij ⊆ I be such that

∑ji=1wIi ≤ α/2, but

∑j+1i=1 wIi > α/2.

For each interval in I \ A1 the load at its starting point is strictly more than α/2. Similarlydefine A2 = Ik, ..., IJ such that

∑Ji=k wIi ≤ α/2, but

∑Ji=k−1wIi > α/2. and let I \ A2 be the

subset of intervals whose load in their endpoint is strictly more than α/2. As the total load in I is

7

Page 8: Online Virtual Machine Allocation with Lifetime and Load

strictly more than α, and the load of intervals in A1 ∪ A2 is at most α, there must be an intervalI ∈ I \ (A1 ∪ A2). The load that an interval I ∈ I \ (A1 ∪ A2) observes is strictly more than α/2at both its start and end times, and hence it is strictly more than α/2 at any t′ ∈ I.

2.1 Static Bin Packing Algorithms

We now discuss several well known static bin packing algorithms and related definitions. We proveuseful properties that are later used by our dynamic bin packing algorithms.

First, the well known First-Fit algorithm appears as Algorithm 1.

Algorithm 1: First-Fit

1 When an interval I arrives at time t, assign it to a machine with the earliest opening timeamong the available machines. If no machine is available, open a new machine.

The following properties hold for the First-Fit algorithm. The proofs are presented in AppendixB.1.

Lemma 2.3. The total cost of Algorithm 1 is at most:

1. ‖v‖∞ · ‖v‖0 for the uniform size case.

2.(

11−β · ‖v‖∞ + 1

)· ‖v‖0 for the non-uniform size case when β ≤ 1

2 .

3. 4 · ‖v‖∞ · ‖v‖0 for the non-uniform size case when β > 12 .

4. ‖v‖0 if ‖v‖∞ ≤ 1.

Definition 2.4. An online static bin packing algorithm is said to be k-bounded-space if at anytime the number of bins that are accepting new items (active bins) is at most k.

The Next-Fit algorithm is a prime example of a bounded space algorithm. It holds exactly oneactive bin at any time. Upon arrival of an item that does not fit in the active bin, it closes it andopens a new one (in which the new item is placed). Thus, the Next-Fit algorithm is 1-bounded.

Another important example of a bounded space algorithm is the Harmonic algorithm [19].The k-bounded space Harmonic algorithm partitions the instance I = ∪kj=1Ij such that Ij =I ∈ I | wI ∈ ( 1

j+1 ,1j ]

for j = 1, . . . , k − 1 and Ik =I ∈ I | wI ∈ (0, 1

k ]

. Each sub-instance

Ij is packed separately using Next-Fit. Given an instance I of static bin packing, the cost ofthe k-bounded space Harmonic algorithm is Πk · OPTS(I) + k. Πk is a monotonically decreasingnumber that approaches Π∞ ≈ 1.691. Πk quickly becomes very close to this number, for example,Π12 ≈ 1.692. As shown by [19], no constant bounded space algorithm can achieve an approximationratio better than Π∞.

For our analysis we need the following stronger guarantee for the performance of an online staticbin packing algorithm, where A(I) denotes the cost of algorithm A on instance I.

Definition 2.5. An online (static) bin packing algorithm A is (c, `)-decomposable if for everyinstance I =

⋃nj=1 Ij,

∑nj=1A(Ij) ≤ c ·OPTS(I) + n · `.

8

Page 9: Online Virtual Machine Allocation with Lifetime and Load

In particular, plugging n = 1, an algorithm A being (c, `)-decomposable implies that for anyinstance I it holds that A(I) ≤ c ·OPTS(I) + `.

Lemma 2.6. The following algorithms are decomposable:

1. Next-Fit is (c, 1)-decomposable where c = 1 in the uniform size case and c = min

2, 11−β

in

the non-uniform size case.

2. k-Harmonic is (Πk, k)-decomposable.

The proofs are presented in Appendix B.1.

3 Offline Scheduling Algorithms

In this section we design two offline algorithms proving Theorem 1.1. We start with the CoveringAlgorithm, which iteratively finds a set of requests that cover the time horizon and can be scheduledusing one or two machines. We then present our Density Algorithm that finds dense subsets ofrequests of roughly the same duration.

3.1 The Covering Algorithm

This section presents the Covering Algorithm as Algorithm 2 whose performance is given by thefollowing theorem. The proofs appear in Appendix B.2.

Theorem 3.1. The total cost of Algorithm 2 is at most 2 · ‖v‖1 for the uniform size case, 4 · ‖v‖1in the non-uniform case. If β ≤ 1

4 the total cost is at most∑

td2vt

1−2β e.

The main tool is the following idea of covers that are subsets of intervals that can be easilyscheduled together.

Definition 3.2. Given a set of intervals I with load vector v, a subset of intervals C ⊆ I is an[`, u]-cover if its load vector v′ satisfies v′t ∈ [minvt, `, u] for any time t.

We show that we can efficiently find a cover with load at any time at most 2 in the uniformsize case, and at most 1 in the non-uniform size case. This property allows us to schedule each ofthese covers using just two or one machines respectively. At the same time, these covers preservea “high load” invariant (important property for our algorithm’s performance), by ensuring a lowerbound on their load, as presented more formally in the following lemma.

Lemma 3.3. Let I be a set of intervals. Then, it is possible to efficiently find

• A [1, 2]-cover for the uniform size case.

• A [12 − β, 1]-cover for the non-uniform size case when β < 1

2 .

Given Lemma 3.3 the algorithm is simple. We iteratively construct covers according to Lemma3.3 and schedule each cover separately using the First-Fit algorithm.

9

Page 10: Online Virtual Machine Allocation with Lifetime and Load

Algorithm 2: Covering Algorithm

1 In the non-uniform size case: Schedule each interval with size greater than 14 on a

separate machine and remove it from I.2 while I 6= ∅ do3 Find a cover C ⊂ I as guaranteed by Lemma 3.3.4 Schedule the intervals in C using Algorithm 1 (First-Fit), and remove the intervals

from I.

5 end

3.2 The Density Algorithm

In this section we design our second algorithm whose cost is at most c · ‖v‖1 +O(∑T

t=1

√vt logµ

)where c = 1 in the uniform size case and c = min

2, 1

1−β

in the non-uniform size case. As can be

deduced from the performance bound, the algorithm performs best when the load is much largerthan log µ. The cost of the algorithm is up to twice less in both uniform and non-uniform casescompared to previous algorithms. We abuse here the notation of vt and define it as

∑I∈I(t)wI and

not⌈∑

I∈I(t)wI

⌉. We prove the theorem with respect to these smaller values of vt (making the

result only stronger).The algorithm is based on the following lemma that shows it is possible to find very dense

packing whenever the load is large. The proof appears in Appendix B.3.

Lemma 3.4. Let c = 1 in the uniform size case and c = min

2, 11−β

in the non-uniform size

case. Let I be a set of intervals, and let t be a time at which vt ≥ 2 + 4 lnµ. Then, it is possible tofind efficiently a set C ⊆ I(t) such that 1

c ≤∑

I∈C wI ≤ 1, and a length ` such that:

1. The length of each interval I ∈ C is at least `.

2. All intervals in C can be scheduled on a single machine of length at most `(

1 + 2√

2+4 lnµvt

).

Using Lemma 3.4 we design Algorithm 3. While the load is large, Algorithm 3 iteratively findsa set of intervals that create a dense packing if scheduled together, and removes them from theinstance. When the remaining load is low, the remaining intervals are scheduled using the CoveringAlgorithm of Section 3.1.

Algorithm 3: Density Algorithm

1 Let I be our current set of intervals.2 while ‖v‖∞ of the current set I is at least 2 + 4 lnµ do3 Apply Lemma 3.4 on tmax = arg maxt vt to find subset of intervals C ⊆ I(tmax).4 Schedule the intervals in C on a single machine, and remove C from I.

5 end6 Schedule the remaining intervals using Algorithm 2.

10

Page 11: Online Virtual Machine Allocation with Lifetime and Load

Theorem 3.5. The total cost of Algorithm 3 is at most

c · ‖v‖1 +O

(T∑t=1

√vt logµ

)≤ c ·OPT + T ·O

(√OPTavg logµ

)where c = 1 in the uniform size case and c = min

2, 1

1−β

in the non-uniform size case.

Proof. Consider an iteration r of the loop of Algorithm 3. Let Ir be the current set of intervalswith corresponding load values vrt , and let Cr and `r be the subset of intervals and the lengthpromised by Lemma 3.4. By Lemma 3.4, the total cost paid by the algorithm in this iteration is at

most `r(

1 + 2√

D‖vr‖∞

), where D = 2 + 4 lnµ. Let ∆vrt be the decrease in vrt after removing the

intervals in the subset Cr from Ir. Since the sum of sizes of intervals in Cr is at least 1/c, and thelength of each interval is at least `r, we get that

∑Tt=1 ∆vrt ≥ 1

c · `r. Let R be the total number of

iterations in the loop. The total cost over all iterations is at most,

R∑r=1

`r

(1 + 2

√D

‖vr‖∞

)≤

R∑r=1

c

T∑t=1

∆vrt ·

(1 + 2

√D

‖vr‖∞

)(1)

≤ cR∑r=1

T∑t=1

∆vrt ·min

(1 + 2

√D

vrt

), 3

(2)

= c

R∑r=1

T∑t=1

∆vrt ·

(1 + min

2

√D

vrt, 2

).

Inequality (1) follows since∑T

t=1 ∆vrt ≥ 1c · `

r. Inequality (2) follows since ‖vr‖∞ ≥ vrt , and since‖vr‖∞ ≥ D inside the loop.

Next, for each time t, we may analyze the summation∑R

r=1 ∆vrt ·(

1 + min

2√

Dvrt, 2)

. Let

vt = v1t be the starting value in the original instance I. We get that,

R∑r=1

∆vrt ·

(1 + min

2

√D

vrt, 2

)≤ vt +

R∑r=1

∆vrt ·min

2

√D

vrt, 2

≤ vt +∑

r|vrt<D

2∆vrt + 2√D

∑r|vrt≥D

∆vrt√vrt

≤ vt + 2D + 2√D

∑r|vrt≥D

∆vrt√vrt

As 1√v

is a decreasing function of v for v > 0,∑

r|vrt≥D∆vrt√vrt≤ 1√

D+∫ vtD

dv√v

= 1√D

+

2(√

vt −√D)

. Plugging this, we get that the total cost of all iterations is at most c · ‖v‖1 +

O(∑T

t=1

√vt logµ

).

Finally, by Theorem 3.1, the total cost of Algorithm 2 is at most 4‖v′‖1, where v′ is the finalload vector (after applying all iterations). However, by the stopping rule of our algorithm we have

11

Page 12: Online Virtual Machine Allocation with Lifetime and Load

‖v′‖∞ ≤ 2 + 4 logµ. Hence, the total additional cost is at most:

4‖v′‖1 = 4T∑t=1

√v′t ·√v′t ≤ 4

T∑t=1

√v′t(2 + 4 logµ) ≤ O(

T∑t=1

√vt logµ)

Using Jensen’s inequality and substituting vavg ≤ OPTavg we get that∑T

t=1

√vt logµ ≤ T ·√

OPTavg logµ, which concludes the proof.

3.3 Improving the Approximation For Non-Uniform Sizes

In Appendix A we show how to draw on ideas from the analysis of the Harmonic Algorithm forstatic bin packing to improve the performance of algorithms for the dynamic bin packing problem.In particular, we partition the intervals into subsets based on their size, and schedule each subsetseparately using our algorithms. This proves the bounds in Theorem 1.1.

4 Online Algorithm Using Lifetime and Average Load Predictions

In this section we design an algorithm having extra knowledge of the average load, which is asingle value (the total load divided by the length of the time horizon). We start by presenting atransformation of certain static bin packing algorithms to the dynamic case, and then use it as abuilding block in the design of our Combined Algorithm. We complement this result with a lowerbound when the lifetimes and average load are available to the algorithm. Finally, we discuss theeffect of prediction errors, or noise, on the algorithms’ guarantees.

4.1 Transforming Static Bounded Space Algorithms to Non-Clairvoyant Dy-namic Algorithms

In this section we show a general transformation of an online k-bounded space (static) bin packingalgorithm (see Definition 2.4) to a non-clairvoyant online algorithm for the dynamic bin packingsetting. We first define a static bin packing instance, given a dynamic bin packing instance, andprove an easy observation.

Definition 4.1. Let I = I1, . . . , In be an instance of the dynamic bin packing problem where thesize of interval Ij is wIj . We define a corresponding static bin packing instance, IS = i1, . . . , in,such that for j = 1, . . . , n : wIj = wij . Let OPT (I) be the cost of the optimal solution for I andOPTS(IS) be the number of bins in an optimal solution for IS.

Observation 4.2. For any instance I, OPTS(IS) ≤ OPT (I).

Proof. Consider the instance I. Obviously, shrinking all intervals to unit length can only decreaseOPT without affecting OPTS(IS). Hence, we can assume that all intervals are of unit length.Next, consider machine m that is active in the range [s, t) in the optimal solution of I. Let Ij(m)be the set of intervals that have arrived in the range [s+ j−1, s+ j) and are assigned to m. Noticethat all items in IS that correspond to intervals in Ij(m) can be placed in a single bin in a solutionof IS . Thus, all items in IS corresponding to intervals that are assigned to machine m can beassigned to bt−sc bins. Hence, we can construct a feasible solution for instance IS of cost (numberof bins) no more than OPT (I).

12

Page 13: Online Virtual Machine Allocation with Lifetime and Load

Algorithm 4 is given as input an online k-bounded static bin packing algorithm and applies itto the dynamic setting. The static bin packing instance is generated according to Definition 4.1.

Algorithm 4: Dynamic non-clairvoyant algorithm

1 Let A be k-bounded static bin packing algorithm (which maintains at any time at most kactive bins b1, b2, . . . , bk)

2 Upon arrival of a new interval I:3 begin4 If A opens a new bin, then open a new machine.5 If A accepts the item to bin bi, accept the interval to machine mi.

6 end7 Upon departure of an interval I:8 begin9 If I is not the last interval departing from its machine, do nothing.

10 If I is the last interval departing from a non-active bin, close the machine.11 If I is the last interval departing from an active bin, close the machine and associate a

new machine with the bin. Open the new machine if a new assignment to therespective active bin is made.

12 end

In the following lemma we analyze the cost of Algorithm 4.

Lemma 4.3. Let A be a k-bounded space bin packing algorithm that for instance IS has cost atmost c ·OPTS(IS) + `. Then, Algorithm 4 is an online non-clairvoyant algorithm for the dynamicbin packing setting whose cost on any instance I is at most:

c · µ ·OPT (I) + maxk, ` · ‖v‖0.

If A is also (c, `)-decomposable (see Definition 2.5), then its total cost when run separately on ninstances I1, . . . , In, where instance Ij has load vector vj and value µj, is at most:

c · µmax ·OPT (I) + maxk, ` ·n∑j=1

‖vj‖0,

where I =⋃nj=1 Ij and µmax = maxnj=1 µj.

The following is obtained by plugging Next-Fit and the Harmonic algorithm into Lemma 4.3.

Corollary 4.4. Algorithm 4 is a non-clairvoyant algorithm with total cost of at most:

• c · µ · OPT (I) + ‖v‖0 when the underlying static bin packing algorithm is Next-Fit, where

c = 1 if sizes are uniform; otherwise, c = min

2, 11−β

.

• Πk ·µ ·OPT (I) + k · ‖v‖0 when the underlying static bin packing algorithm is Harmonic withparameter k.

13

Page 14: Online Virtual Machine Allocation with Lifetime and Load

Proof of Lemma 4.3. Consider an instance I, and let M be the set of machines that the algorithmopens due to A opening a new bin. For the purposes of this analysis, when the last interval departsfrom an active bin, we consider the corresponding machine closed only if no other assignment isdone on this bin. Otherwise, we consider the machine to be in a “frozen” state, where it is inactive(therefore not paying any cost), but it can become active again if a new assignment is made tothe respective active bin. Each of these machines appears once in the set M . For each machinem ∈ M , let sm and em be the times that the first and last interval is assigned to m respectively,and let fm denote the duration for which m remains frozen during the interval [sm, em) (can bezero if m never becomes frozen).

Algorithm A has at most k active bins at any time t, and an interval is accepted to a machineonly if A accepts the item to the corresponding active bin. As a result, at any time there are atmost k accepting machines. We can therefore partition all machines M into k sets P ′1, . . . , P

′k, such

that for j = 1, . . . , k, any two machines m1,m2 ∈ P ′j are not accepting at the same time. It isobvious that such partition can also be found for any k′ ≥ k. For the following analysis, if ` ≥ k,we want to consider the partition into ` sets instead of k. So, for simplicity of exposition, we defineα = maxk, `, and consider the partition P1, . . . , Pα such that any two machines m1,m2 ∈ Pj arenot accepting at the same time. Let Mj be the number of machines in Pj .

Let mji be the machine with the i-th earliest start time among the machines in Pj . Let sji denote

the start time of mji , e

ji the time it accepts its last interval, and f ji the duration for which it is

frozen (can be zero). As a result, mji remains open at most during the interval [sji ,mineji + µ, T)

minus its freezing periods. Furthermore, it is obvious that sji ≤ eji , ∀i ∈ 1, . . . ,Mj − 1, and by the

properties of the algorithm, eji ≤ sji+1, ∀i ∈ 1, . . . ,Mj − 1, since mji+1 can become accepting only

after mji has stopped accepting intervals, i.e. after time eji . Finally, let Fj =

∑Mj

i=1 fji denote the

total freezing time over all machines in Mj . The active machine-time required by the machines ofeach set Pj is at most:

Mj∑i=1

(minT, eji + µ − sji − f

ji

)≤ T − sjMj

− Fj +

Mj−1∑i=1

(eji + µ− sji

)

≤ µ · (Mj − 1) + T − sjMj− Fj +

Mj−1∑i=1

(sji+1 − s

ji

)≤ µ · (Mj − 1) + T − sj1 − Fj ≤ µ · (Mj − 1) + ‖v‖0.

Summing up the costs for all j = 1, . . . , α, the total cost of the algorithm is at most:

µ ·α∑j=1

(Mj − 1) + α · ‖v‖0.

Notice that the algorithm can open a new machine only when A opens a new bin. Since by theguarantee of A the optimal number of bins is at most c ·OPTS(IS) + `, we conclude that:

α∑j=1

Mj ≤ c ·OPTS(IS) + `.

14

Page 15: Online Virtual Machine Allocation with Lifetime and Load

Using this observation, we obtain that the total cost of the algorithm is at most:

µ ·α∑j=1

(Mj − 1) + α · ‖v‖0 ≤ µ · (c ·OPTS(IS) + `− α) + α · ‖v‖0

≤ µ · c ·OPT (I) + α · ‖v‖0,

where the last inequality follows from Observation 4.2. This concludes the first part of the proof.For the second part of the proof, let A be a (c, `)-decomposable algorithm and assume that we

run it separately on n instances I1, . . . , In, where I = ∪nr=1Ir. Let vr be the load vector of Ir. Wedenote by P r1 , . . . , P

rα the partition of the machines opened in the solution of Ir, and by M r

j thenumber of machines in P rj . Using previous arguments, the total cost of the solution for instance Iris at most:

µr ·α∑j=1

(M rj − 1) + α · ‖vr‖0

Since algorithm A is (c, `)-decomposable,

n∑r=1

α∑j=1

M rj ≤ c ·OPTS(IS) + n · `

As a result, the total cost of the algorithm is at most:

n∑r=1

µr ·α∑j=1

(M rj − 1) + α ·

n∑r=1

‖vr‖0 ≤ µmax · (c ·OPTS(IS) + n · `− n · α) + α ·n∑r=1

‖vr‖0

≤ µmax · c ·OPT (I) + α ·n∑r=1

‖vr‖0

where again the last inequality follows from Observation 4.2. This concludes the proof.

4.2 Combined Algorithm

In this section we design an online algorithm (Algorithm 5) that uses the lifetimes and averageload information and whose total cost is at most Πk ·OPT + T · k ·O(

√vavg logµ). The algorithm

uses two parameters. Let εj = min1, jvavg. We say that an interval I is in class cj if its predicted

length `I ∈ [e∑

i<j εi , eεj · e∑

i<j εi ]. Let vj be the load vector of intervals in class cj . The second

parameter used by the algorithm is qj = min1,√

vavgj . Note that the values εj and qj depend

only on vavg and not on µ.The way the algorithm schedules the intervals differs depending on the class of each interval

and its current load. In particular, when an interval I arrives, if its current class load is low, theCombined Algorithm schedules it using First-Fit. If its current class load is high, then the CombinedAlgorithm schedules it with intervals from the same class using Algorithm 4. The intuition is thatif many intervals from the same class (having similar length) arrive close in time, we can achievebetter packing by scheduling them together. To achieve that, the Combined Algorithm runs manycopies of Algorithm 4 simultaneously (one for each class), and schedules intervals of each class usingthe respective copy.

15

Page 16: Online Virtual Machine Allocation with Lifetime and Load

Algorithm 5: Combined Algorithm (with vavg prediction)

1 Hold a single copy of Algorithm 1 (First-Fit), and several copies of Algorithm 4 with anunderlying (static) k-bounded space (c, `)-decomposable algorithm (see Definitions 2.4,2.5).

2 Upon arrival of a new interval I ∈ cj at time t:

3 if vjt ≤ qj then4 Schedule the interval I using the single copy of Algorithm 1 (First-Fit).5 else6 Schedule the interval I using the jth copy of Algorithm 4.7 end

Let vj1 be the load vector of intervals I ∈ cj that were scheduled by Algorithm 1. Let vj2 be theload vector of intervals I ∈ cj that were scheduled be Algorithm 4. By this definition vj = vj1 +vj2.

Lemma 4.5. For any j,

1. ‖vj1‖∞ ≤ qj.

2. ‖vj2‖0 ≤ (1 + eεj )|t | vjt ≥qj2 | ≤ (1 + eεj )

∑Tt=1 min1, 2vjt

qj.

Proof. We prove the two claims.Proof of (1): Consider any t, and take the intervals in vj1t . Let I be the interval in vj1t that

arrived last, and let t′ ≤ t be its start time. Since I was scheduled using Algorithm 1, vjt′ ≤ qj .

At time t′, all other intervals in vj1t are alive, and no other interval arrives in vj1t between t′ and t,since I was the last one. Therefore, vj1t ≤ qj , and since this holds for all t, ‖vj1‖∞ ≤ qj .

Proof of (2): Since we are working with a single class cj , we will remove the index j and simplyrefer to it as c and to its load vector as v. Let [`, ` · eε) denote the lengths of intervals in class c.Let c′ be the set of intervals in c that were scheduled by Algorithm 4, and let v′ be the load of c′.Finally, we use R1, R2, . . . , Rr for the disjoint ranges (time intervals) in which the load is at leastq2 and there is an interval I ∈ c′ with sI ∈ Ri. We use |Ri| for the total duration of Ri.

By the behavior of the algorithm, for each interval I ∈ c′, we have vsI > q, i.e. sI ∈ Ri for someindex i. Therefore, all I ∈ c′ can be associated with a range Ri with sI ∈ Ri. Let c′i be the set ofintervals that are associated with range Ri. We prove that:

‖c′‖0 ≤r∑i=1

‖c′i‖0 ≤r∑i=1

(|Ri|+ ` · eε) ≤ (1 + eε)r∑i=1

|Ri| ≤ (1 + eε)|t | vjt ≥qj2|

The first inequality follows since⋃c′i = c′. The second inequality follows since each interval

I ∈ c′i starts within Ri and ends at most ` ·eε after Ri’s end. The last inequality is true by definitionof Ri, and we will now show that |Ri| ≥ ` for all i which proves the third inequality and completesthis part of the proof.

Consider a range Ri and take an interval I ∈ c′i with start time sI ∈ Ri. Let A ⊆ c be theintervals that are active at time sI . Using Lemma 2.2 with α = q, we see that there exists aninterval I ∈ A (with length at least `) that is active at time t and sees more than q

2 load for itswhole duration. This means that it is associated with range Ri (otherwise the ranges would not bedisjoint), and proves that Ri has length at least `.

16

Page 17: Online Virtual Machine Allocation with Lifetime and Load

Finally, it is easy to see that |t | vjt ≥qj2 | =

∑Tt=1 min1, b2vjt

qjc which concludes the proof.

Lemma 4.6. Let n be the maximal class such that cn 6= ∅. Then,

n =

O(√vavg logµ) vavg ≥ 2 logµ

O(logµ) vavg ≤ 2 logµ

Proof. The value n satisfies that e∑n−1

j=1 εj ≤ µ ≤ e∑n

j=1 εj , which means∑n−1

j=1 εj ≤ logµ. By the

choice of εj = min1, jvavg, we get:

minn,vavg−1∑j=1

j

vavg+

n−1∑vavg

1 = min(n− 1)n

2vavg,vavg − 1

2+ max0, n− vavg ≤ logµ (3)

We now consider separately the two cases:

• When vavg ≥ 2 logµ: In this case, n ≤ vavg, as if we assume the contrary, (3) leads to a con-

tradiction. Therefore, (3) becomes logµ ≥ (n−1)n2vavg

. Rearranging this gives n = O(√vavg logµ).

• When vavg ≤ 2 logµ: If n ≤ vavg, then n ≤ 2 logµ and we are done. In the opposite case,(3) becomes:

vavg − 1

2+ n− vavg ≤ logµ⇒ n ≤ logµ+

vavg2

+1

2≤ 2 logµ+

1

2

and this concludes the proof.

We are now ready to prove Theorem 4.7.

Theorem 4.7. Given an instance of Clairvoyant Bin Packing with average load prediction, thetotal cost of Algorithm 5 when it is executed with an underlying k-bounded space (c, `)-decomposablealgorithm is at most c ·OPT + T ·maxk, ` ·O(

√vavg logµ).

Proof. The total cost of the algorithm is composed of two parts; the cost of Algorithm 1 (First-Fit)and the cost of all copies of Algorithm 4. Let Ij be the set of intervals scheduled by the j-th copyof Algorithm 4 and I ′ = ∪nj=1Ij . The underlying static bin packing algorithm of Algorithm 4 isk-bounded and (c, `)-decomposable. Thus, by Lemma 4.3 the total cost of scheduling I ′ is at most

c · eεn ·OPT (I ′) + maxk, ` ·n∑j=1

‖vj2‖0

17

Page 18: Online Virtual Machine Allocation with Lifetime and Load

The cost of Algorithm 15 is at most 4T · ‖∑n

j=1 vj1‖∞. Combine the two bounds to get:

Cost of Alg ≤ 4T · ‖n∑j=1

vj1‖∞ + c · eεn ·OPT (I ′) + maxk, ` ·n∑j=1

‖vj2‖0

≤ 4T ·n∑j=1

qj + c · eεnOPT (I) + maxk, ` ·n∑j=1

(1 + eεj )

T∑t=1

min1, 2vjtqj (4)

≤ 4T ·n∑j=1

qj + c · eεnOPT (I) + 2(1 + e) maxk, ` ·n∑j=1

minT, ‖vj‖1qj (5)

≤ 4T ·n∑j=1

qj + c · eεnOPT (I) + 2(1 + e) maxk, ` · T ·minn, vavgqn (6)

≤ c ·OPT (I) + T ·maxk, ` ·O

εn · vavg +n∑j=1

qj + minn, vavgqn

. (7)

Inequality (4) follows by using the fact that εj is increasing as j is larger, and applying Lemma

4.5. Inequality (5) follows since εj ≤ 1 and using that∑T

t=1 min1, vjtqj ≤ minT, ‖v

j‖1qj. Inequality

(6) follows since qj is non-increasing in j. Finally, Inequality (7) follows by rearranging and usingthat εj ≤ 1 and hence eεn ≤ 1 + (e − 1)εn = 1 + O(εn) and the fact that OPT ≤ 4‖v‖1. We nextanalyze two cases:

Case 1, vavg ≥ 2 logµ: In this case, by Lemma 4.6, n = O(√vavg logµ). Substituting n for

this value, we get that εn = min1, nvavg ≤ n

vavg= O(

√log µvavg

). Finally, as qj ≤ 1,∑n

j=1 qj ≤ n =

O(√vavg logµ). Plugging these bounds into Inequality (7) we get the desired result.

Case 2, vavg ≤ 2 logµ: In this case vavg = O(√vavg logµ). By the Lemma 4.6, n = O(logµ).

Using this bound we get that: qn = min1,√

vavgn . Thus,

vavgqn≤ maxvavg, O(

√vavg logµ) =

O(√vavg logµ). Finally,

n∑j=1

qj ≤vavg∑j=1

qj +

n∑j=vavg

qj ≤ vavg +

n∑j=vavg

√vavgj≤ vavg +

√vavg · n = O(

√vavg logµ)

Using that εn ≤ 1, and plugging everything into Inequality (7) we get the desired result.

As seen in Lemma 2.6 the Next-Fit and Harmonic algorithms are decomposable (see Definition2.5). Thus, using the Next-Fit algorithm (in the uniform size case) and the Harmonic algorithm(in the non-uniform size case) as the underlying algorithms of Algorithm 4 produces the followingcorollary.

Corollary 4.8. The total cost of Algorithm 5 is at most:

• OPT + T ·O(√vavg logµ) (uniform size).

• Πk ·OPT + T · k ·O(√vavg logµ) (non-uniform size).

5We remark that in the analysis of Algorithm 1 we took the worst case performance of 4. This only affects theperformance by additional constants.

18

Page 19: Online Virtual Machine Allocation with Lifetime and Load

4.3 Lower Bound: Dynamic Clairvoyant Bin Packing

In this section we complement the results of Section 4.2 by generalizing the lower bound of [3]to take into account also OPTavg showing that the additive term O(

√OPTavg · logµ) is indeed

unavoidable, if only the average future load and lifetime is available to an online algorithm.

Lemma 4.9. For any values µ, vavg the total cost of any algorithm is at least:

Ω(T ·√vavg logµ

)= Ω

(T ·√

OPTavg · logµ).

Proof. First if logµvavg

≤ 2. Then the bound is meaningless since in this case the cost of OPT is at

least ‖v‖1 · Ω(1) = Ω(T ·√vavg logµ

). Otherwise; vavg <

log µ2 , and we show an adversary that

given a parameter a ← vavg (the desired average load) and µ, creates an instance such that the

average load is always at least a and the algorithm pays at least ‖v‖1 ·√

log µ2a . If the actual average

load of the instance is strictly more than a (the desired average load), we can extend the timehorizon without adding new requests until the average load drops to the desired average, a. Ofcourse, this extension does not affect the total cost. The adversary initiates the following sequence:

• at each time t = 1, . . . , µ as long as the algorithm has strictly less than N =√

2a logµ activemachines:

• Initiate sequentially requests of size w =√

2alog µ ≤ 1 (since a ≤ log µ

2 ) and of increasing length

of 2i, i = 0, 1, . . . , dlogµe.

Since each request is of length at most µ, the total length of the time horizon T ≤ 2µ (and thisadversarial sequence can be repeated again afterwards).

First, the adversary indeed manages to make the algorithm open at least N =√

2a logµmachines at each time t since otherwise the load of the requests initiated at time t is at least

dlogµe · w ≥ logµ ·√

2alog µ = N . Hence, the average load at each time t = 1, . . . , µ is at least

w · N = 2a, and, since the length of the time horizon of the sequence is at most 2µ, the averageload over the whole horizon is at least a as promised.

Let `t be the longest interval that the adversary releases at time t (0 if there is no such interval).We have that:

‖v‖1 ≤ 2

µ∑t=1

w · `t (8)

≤ 2w · calg = calg · 2√

2a

logµ(9)

Inequality (8) follows since the total size of the last interval in the round dominates all previousones at that round by the geometric power of 2 (the longest item dominates the rest). Inequality(9) follows by the observation that the algorithm opens a new machine for the last interval theadversary gives at a certain round. Hence, calg ≥

∑µt=1 `t. Rearranging, we get that calg ≥

‖v‖1 · Ω(√

log µvavg

)= Ω

(T ·√vavg logµ

). Lastly, Theorem 3.1 states that OPTavg ≤ 4 · vavg which

concludes the proof.

19

Page 20: Online Virtual Machine Allocation with Lifetime and Load

4.4 Handling Inaccurate Predictions

So far, we have assumed that predictions for either interval lengths or average load are accurate (or“noiseless”). Naturally, some predictions are in practice prune to errors. In this section we examinethe performance of Algorithm 5 in the presence of prediction errors. We show that the algorithmis robust to prediction errors with respect to both (vavg and interval lengths). Formally,

Theorem 4.10. Suppose that Algorithm 5 is given predicted values v′avg and interval lengths `′I

such that v′avg ∈[vavg1+δ , vavg · (1 + δ)

], and each `′I ∈

[`I

1+α , `I · (1 + λ)], with δ, α, λ ≥ 0. Then, its

total cost is at most:

• (1 + α) · (1 + λ) · (OPT + T ·O(√

(1 + δ)vavg logµ)) (uniform size).

• (1 + α) · (1 + λ) · (Πk ·OPT + Tk ·O(√

(1 + δ)vavg logµ)) (non-uniform size).

Proof. We first show how to handle the inaccuracy in predicting vavg. We claim that for thepurpose of analysis (with loss in performance) we can assume that our prediction v′avg is accurate.Indeed, if v′avg < vavg, we can extend the time horizon with no additional requests from T to T ′

such that T ′ · v′avg = T · vavg. This makes the average load v′avg (and clearly does not change thecosts of the algorithm and OPT). However, the additive cost increases to T ′ · O(

√v′avg logµ) =

T ·O(√

((1 + δ)vavg logµ). If v′avg > vavg, we can add fictitious T (v′avg − vavg) intervals of length 1at the end of the time horizon. This increases (for the analysis) the cost of both the algorithm andOPT by this value. Again, this increases the actual load to v′avg, and the additive term becomes

T ·O(√v′avg logµ) = T ·O(

√(1 + δ)vavg logµ).

To analyse the errors in the length predictions we observe that in this case the analysis ofAlgorithm 5 is almost unchanged. There are only two modifications to be made. First, we generalizeLemma 4.5 as follows. For any j,

‖vj2‖0 ≤ (1 + α)(1 + λ)(1 + eεj )|t | vjt ≥qj2| ≤ (1 + α)(1 + λ)(1 + eεj )

T∑t=1

min1, 2vjtqj.

Second, the maximum length ratio of all intervals scheduled by each copy of Algorithm 4 grows bythe length prediction error to at most eεn · (1 + α)(1 + λ). Thus, the total scheduling cost is atmost:

c(1 + α)(1 + λ) · eεn ·OPT (I ′) + maxk, ` ·n∑j=1

‖vj2‖0.

Note that as a special case of the above theorem, the algorithm pays a rather small multiplicativefactor (1+δ) when the only inaccurate prediction is in the average load. This is significant in otherdomains of interest in which the interval lengths are precisely known upon arrival (e.g., establishingvirtual network connections, see [3]). Unfortunately, in the general case, Theorem 4.10 implies thatthe algorithm pays (at the worst case) an additional cost which is proportional to the noisiest lengthprediction. This result is tight as suggested by the lemma below.

Lemma 4.11. Given length prediction `′I such that `′I ∈[`I

1+α , `I · (1 + λ)]

with α, λ ≥ 0, the cost

of any online algorithm is at least (1 + α)(1 + λ)OPT .

20

Page 21: Online Virtual Machine Allocation with Lifetime and Load

Proof. At time 0 the adversary adds k2 intervals with predicted length (1 + λ) and width 1k . All

these intervals are located on at least k machines. The true length of each interval is as follows: oneach machine there is exactly one interval of length (1 +α)(1 + λ) and the length of the rest of theintervals is 1.

The total cost of the algorithm is at least (1+α)(1+λ)k as there are at least k active machines.The cost of the optimal solution is (1 + α)(1 + λ) + k − 1. For k large enough compared to theprediction error of the length, we get that the ratio between the cost of any algorithm and theoptimal solution is,

(1 + α)(1 + λ)k

(1 + α)(1 + λ) + k − 1→ (1 + α)(1 + λ).

5 Online Algorithm Using Lifetime and Load Vector Predictions

In this section we design an online algorithm with an extra knowledge of the future load. Inparticular, at each time t the algorithm is given the value vt′ for all t′ ∈ [t, t+ µ). Without loss ofgenerality (by refining the discretization of the time steps), we assume that at most one intervalarrives at any time t. Note that, even though the load vector reveals the time in which the nextinterval arrives (the next time step with a positive residual load), we cannot infer the interval’slength.

Algorithm 6 shows how to construct a single cover, similar in nature to the covers offlineAlgorithm 2 is creating, but in an online fashion. To this end, it takes as an input a load vectorv′ which might be an overestimate load prediction6 of the real load v, i.e. v′ ≥ v, and let∆ = v′ − v ≥ 0. When an interval arrives, Algorithm 6 decides whether to accept it or reject it,with accepted intervals becoming part of the cover. A cover accepts an interval only if its rejectionwould not leave enough future load for the cover to reach its lower bound. Both load and lengthpredictions are necessary to make this decision as the algorithm queries the residual load at everytime step along the interval. Algorithm 7 then uses Algorithm 6 to create a set of covers andschedule all intervals to machines.

Algorithm 6: Online covering with overestimate predictions

1 Let v′ ≥ v be an overestimate load prediction given to the algorithm.2 When interval I arrives at time t′. Let va(t′), vr(t′) be the load vector of the intervals that

arrived prior to time t′, and were accepted or rejected respectively.3 Accept I if there is a time t ∈ I such that,

v′t − vrt (t′) ≤

1 in the uniform size case12 in the non-uniform size case

Otherwise, reject I.

6This can be an overestimate due to prediction errors, however, the algorithm itself later produces overestimatesby design (and not due to errors), which are used recursively in our analysis.

21

Page 22: Online Virtual Machine Allocation with Lifetime and Load

Lemma 5.1. Let va be the intervals that Algorithm 6 accepted. Then, for each time t:

min(1−∆t)+, vt ≤ vat ≤ 2 for the uniform size case

min(1

2− β −∆t)

+, vt ≤ vat ≤ 1 for the non-uniform size case

Proof. We first show that the upper bounds hold, and then provide a proof for the lower bounds.

Upper bounds: Suppose there is a time t where vat exceeds the upper bound. We will show thatthe algorithm cannot have accepted all intervals in va. Consider the intervals I that belong to va

and which are active at time t, i.e., t ∈ I. Let that set of intervals be denoted by A, and its loadwith v′′. Since all intervals in A are active at time t, we can apply Lemma 2.2.

For the uniform case: Using Lemma 2.2 with α = 2, we can see that there is at least one intervalthat observes load strictly more than 1 for its whole duration. Let I be the first such interval, andlet sI be its arrival time. The intervals in va are never rejected by the algorithm and hence,vrt (sI) ≤ vt − vat . Hence, upon arrival of I, for any time t ∈ I v′t − vrt (sI) ≥ vt − vrt (sI) ≥ vat > 1,and I will not be accepted by the algorithm.

For the non-uniform case: Similarly, using Lemma 2.2 with α = 1, we see that there is at leastone interval that observes load greater than 1

2 for its whole duration. Let I be the first such interval,and let sI be its arrival time. The intervals in va are never rejected by the algorithm and hence,vrt (sI) ≤ vt − vat . Hence, upon arrival of I, for any time t ∈ I v′t − vrt (sI) ≥ vt − vrt (sI) ≥ vat >

12 ,

and I will not be accepted by the algorithm.

Lower bounds: This proof is also by contradiction. Let t be a time for which vat is smaller thanthe lower bound. Let I be the last arrived interval that is active at time t and which was rejectedby the algorithm. Upon I’s arrival at time sI , the current load vector of rejected intervals at timet is vrt (sI) (does not yet include I), and so, vrt (sI) + vat + wI = vt. Upon I’s arrival, the algorithmconsiders the quantity v′t − vrt (sI).

For the uniform case: The assumption that the lower bound is violated is translated to vat <min(1 − ∆t)

+, vt. If ∆t ≥ 1, this leads to an obvious contradiction, since it would imply thatvat < 0. Therefore, we focus on the case ∆t < 1, so (1−∆t)

+ = 1−∆t. This gives: v′t − vrt (sI) =v′t−vt+vat +wI = ∆t+vat +wI < ∆t+min1−∆t, vt+wI = min1, vt+∆t+wI ≤ 1+wI . Sinceall intervals have size 1/g for some integer g, both the overestimated load v′t and the rejected loadvrt (sI) are multiples of 1/g (if v′t is not, it can be rounded down to the closest 1/g multiple, since weknow the extra load does not correspond to some interval). Therefore, since v′t − vrt (sI) < 1 + wIand both loads are multiples of 1/g, we have v′t − vrt (sI) ≤

g−1g + 1

g = 1. This shows that I is

accepted by the algorithm and leads to a contradiction proving that vat ≥ min(1 −∆t)+, vt for

each time t.For the non-uniform case: We assumed that vat < min(1

2 − β −∆t)+, vt. If ∆t ≥ 1

2 − β, thisleads to an obvious contradiction, since it would imply that vat < 0. Therefore, we focus on thecase ∆t <

12 − β, so (1

2 − β −∆t)+ = 1

2 − β −∆t. We then have v′t − vrt (sI) = v′t − vt + vat + wI =∆t+v

at +wI < ∆t+min1

2−β−∆t, vt+wI = min12−β, vt+∆t+wI ≤ 1

2−β+β = 12 . Therefore, I

is accepted by the algorithm leading to a contradiction that proves that vat ≥ min(12−β−∆t)

+, vtfor each time t.

We now present the Online Covering Algorithm. This algorithm uses copies of Algorithm 6 tocreate a set of covers online and schedule all intervals to machines. In particular, when an interval

22

Page 23: Online Virtual Machine Allocation with Lifetime and Load

I arrives, the algorithm passes it as input to the first copy of Algorithm 6. If it gets rejected, itis then passed on to the second copy, followed by the remaining copies in increasing order until itgets accepted by a cover. Each copy i of Algorithm 6 also receives as input an estimate v′i of theload it will receive; v′i might be an overestimate of the real load the copy will end up receiving.

Algorithm 7: Online Covering Algorithm (with load vector predictions)

1 In the non-uniform size case: Schedule each interval with size greater than 14 on a

separate machine.2 Run copies i = 1, 2, . . . of the online covering algorithm with overestimate predictions

(Algorithm 6). The ith copy receives an overestimate of

v′it =

(vt − (i− 1))+ in the uniform size case(vt − (i− 1) · (1

2 − β))+ in the non-uniform size case

The ith copy of the algorithm receives as its input all intervals that are rejected fromcopies 1, 2, . . . , i− 1.

3 Schedule all intervals accepted by copy i using Algorithm 1 (First Fit).

Lemma 5.2. Let vat,i be the total load of accepted intervals of copy i at time t. Let vw and vn be

the load vector of the intervals that have size more than 14 and at most 1

4 respectively, and let βn bethe largest size of intervals in vn. For every time t, and j:

j∑i=1

vat,i ≥ minj, vt for the uniform size case

j∑i=1

vat,i ≥ min(j · (1

2− βn)− vwt )+, vnt for the non-uniform size case

Proof. The proof is by induction on j. For the first copy, j = 1, we have ∆t = 0 for the uniformsize case and ∆t = vwt for the non-uniform size case, where vwt is the total width of intervals of sizeslarger than 1/4. For j = 1 the claim follows by Lemma 5.1. For time t, let v1, v2, . . . , vj−1 be theload accepted by copies 1, 2, . . . , j − 1.

For the uniform case: By our guarantee,∑j−1

i=1 vi ≥ minj − 1, vt. We assume that vt >∑j−1i=1 vi, and

∑j−1i=1 vi < j, otherwise we are done. For the last copy the actual load at time t is v′′ =

vt−∑j−1

i=1 vi. Hence, ∆t =∑j−1

i=1 vi− (j− 1) (which is greater than 0 by the induction hypothesis),and it is guaranteed to accept at least min(1−∆t)

+, v′′t load by Lemma 5.1. Therefore, the totalload of accepted intervals of the first j copies is at least:

j−1∑i=1

vi + min(1−∆t)+, v′′t =

j−1∑i=1

vi + min(j −j−1∑i=1

vi)+, vt −

j−1∑i=1

vi = minj, vt

For the non-uniform case: Similarly to the uniform case, by our guarantee,∑j−1

i=1 vi ≥ min((j−1) · (1

2 − βn) − vwt )+, vnt . We assume that vnt >∑j−1

i=1 vi, and∑j−1

i=1 vi < j · (12 − βn) − vwt ,

otherwise we are done. For the last copy the actual load at time t is v′′ = vnt −∑j−1

i=1 vi. Hence,

23

Page 24: Online Virtual Machine Allocation with Lifetime and Load

∆t =∑j−1

i=1 vi−(j−1) ·(12−βn)+vwt , and it is guaranteed to accept at least min(1

2−βn−∆t)+, v′′t

load by Lemma 5.1. Therefore, the total load of accepted intervals of the first j copies is at least:

j−1∑i=1

vi + min(1

2− βn −∆t)

+, v′′t =

j−1∑i=1

vi + min(j · (1

2− βn)− vwt −

j−1∑i=1

vi)+, vnt −

j−1∑i=1

vi

≥ min(j · (1

2− βn)− vwt )+, vnt

Theorem 5.3. Given an instance of Clairvoyant Bin Packing with load vector predictions, thetotal cost of Algorithm 7 is at most 2 · ‖v‖1, for the uniform size case, 8 · ‖v‖1 in the non-uniformcase. If β ≤ 1

4 the total cost is at most∑

td2vt

1−2β e.

Proof. For the uniform case: From Lemma 5.2, for each time t, the first vt copies will acceptall intervals that are active at time t, and according to Lemma 2.3, each copy can schedule itsintervals using 2 machines. As a result, the total cost of the algorithm is

∑t 2vt = 2‖v‖1, proving

2-competitiveness.For the non-uniform case: If β ≤ 1

4 , then vnt = vt and vwt = 0. Then, by Lemma 5.2, the firstd vt

12−β e copies will accept all intervals that are active at each time t. By Lemma 5.1, each of them

is paying 1, so the total cost is at most∑

td2vt

1−2β e.If β > 1

4 , let Wt be the number of intervals with size larger than 14 that are active at time t.

Since the algorithm opens a separate machine of unit size for each of them, it pays cost Wt at eachtime t. The cost at time t to schedule intervals of size smaller than 1

4 is at most dvwt +vnt12− 1

4

e = d4vte.Thus, the total cost is at most: Wt + d4vte = dWt + 4vte ≤ d4vwt + 4vte ≤ 8dvte.

6 Conclusion

This paper studies the VM allocation problem in both offline and online settings. Our maincontribution is the design of novel algorithms that use certain predictions about the load (eitherits average or the future time series). We show that this extra information leads to substantialimprovements in the competitive ratios. As future work, we plan to consider additional models ofprediction errors and examine the performance on real-data simulations.

Acknowledgements

The research of Niv Buchbinder and Joseph (Seffi) Naor is supported in part by ISF grant 1585/15and BSF grant 2014414.

References

[1] Alicherry, M., and Bhatia, R. Line system design and a generalized coloring problem. InEuropean Symposium on Algorithms (2003), Springer, pp. 19–30.

24

Page 25: Online Virtual Machine Allocation with Lifetime and Load

[2] Azar, Y., Kalp-Shaltiel, I., Lucier, B., Menache, I., Naor, J., and Yaniv, J.Truthful online scheduling with commitments. In Proceedings of the Sixteenth ACM Conferenceon Economics and Computation (2015), pp. 715–732.

[3] Azar, Y., and Vainstein, D. Tight bounds for clairvoyant dynamic bin packing. ACMTrans. Parallel Comput. 6, 3 (2019), 15:1–15:21.

[4] Balogh, J., Bekesi, J., Dosa, G., Epstein, L., and Levin, A. A new lower bound forclassic online bin packing. In International Workshop on Approximation and Online Algorithms(2019), Springer, pp. 18–28.

[5] Bianchini, R., Cortez, E., Fontoura, M. F., and Bonde, A. Method for deployingvirtual machines in cloud computing systems based on predicted lifetime, Sept. 24 2019. USPatent 10,423,455.

[6] Boutaba, R., Salahuddin, M. A., Limam, N., Ayoubi, S., Shahriar, N., Estrada-Solano, F., and Caicedo, O. M. A comprehensive survey on machine learning for net-working: evolution, applications and research opportunities. Journal of Internet Services andApplications 9, 1 (2018), 16.

[7] Chawla, S., Devanur, N., Kulkarni, J., and Niazadeh, R. Truth and regret in onlinescheduling. In Proceedings of the 2017 ACM Conference on Economics and Computation(2017), pp. 423–440.

[8] Coffman, E. G., and Csirik, J. Performance guarantees for one-dimensional bin packing.In Handbook of approximation algorithms and metaheuristics. CRC Press, 2007, pp. 32–1.

[9] Coffman, Jr, E. G., Garey, M. R., and Johnson, D. S. Dynamic bin packing. SIAMJournal on Computing 12, 2 (1983), 227–258.

[10] Cortez, E., Bonde, A., Muzio, A., Russinovich, M., Fontoura, M., and Bianchini,R. Resource central: Understanding and predicting workloads for improved resource manage-ment in large cloud platforms. In Proceedings of the 26th Symposium on Operating SystemsPrinciples (2017), pp. 153–167.

[11] Flammini, M., Monaco, G., Moscardelli, L., Shachnai, H., Shalom, M., Tamir, T.,and Zaks, S. Minimizing total busy time in parallel scheduling with application to opticalnetworks. Theoretical Computer Science 411, 40-42 (2010), 3553–3562.

[12] Gao, J. Machine learning applications for data center optimization. Available also from:https://ai. google/research/pubs/pub42542 (2014).

[13] Guenter, B., Jain, N., and Williams, C. Managing cost, performance, and reliabilitytradeoffs for energy-aware server provisioning. In 2011 Proceedings IEEE INFOCOM (2011),IEEE, pp. 1332–1340.

[14] Johnson, D. S. Near-optimal bin packing algorithms. PhD thesis, Massachusetts Institute ofTechnology, 1973.

25

Page 26: Online Virtual Machine Allocation with Lifetime and Load

[15] Kamali, S., and Lopez-Ortiz, A. Efficient online strategies for renting servers in thecloud. In SOFSEM 2015: Theory and Practice of Computer Science - 41st InternationalConference on Current Trends in Theory and Practice of Computer Science, Pec pod Snezkou,Czech Republic, January 24-29, 2015. Proceedings (2015), G. F. Italiano, T. Margaria-Steffen,J. Pokorny, J. Quisquater, and R. Wattenhofer, Eds., vol. 8939 of Lecture Notes in ComputerScience, Springer, pp. 277–288.

[16] Kumar, S., and Khuller, S. Brief announcement: A greedy 2 approximation for the activetime problem. In Proceedings of the 30th on Symposium on Parallelism in Algorithms andArchitectures, SPAA 2018, Vienna, Austria, July 16-18, 2018 (2018), ACM, pp. 347–349.

[17] Kumar, V., and Rudra, A. Approximation algorithms for wavelength assignment. InFSTTCS 2005: Foundations of Software Technology and Theoretical Computer Science, 25thInternational Conference, Hyderabad, India, December 15-18, 2005, Proceedings (2005), R. Ra-manujam and S. Sen, Eds., vol. 3821 of Lecture Notes in Computer Science, Springer, pp. 152–163.

[18] Lattanzi, S., Lavastida, T., Moseley, B., and Vassilvitskii, S. Online scheduling vialearned weights. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on DiscreteAlgorithms (2020), SIAM, pp. 1859–1877.

[19] Lee, C. C., and Lee, D.-T. A simple on-line bin-packing algorithm. Journal of the ACM(JACM) 32, 3 (1985), 562–572.

[20] Li, Y. Deep reinforcement learning: An overview. arXiv preprint arXiv:1701.07274 (2017).

[21] Li, Y., Tang, X., and Cai, W. On dynamic bin packing for resource allocation in the cloud.In 26th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’14, Prague,Czech Republic - June 23 - 25, 2014 (2014), pp. 2–11.

[22] Li, Y., Tang, X., and Cai, W. Play request dispatching for efficient virtual machine usagein cloud gaming. IEEE Transactions on Circuits and Systems for Video Technology 25, 12(2015), 2052–2063.

[23] Li, Y., Tang, X., and Cai, W. Dynamic bin packing for on-demand cloud resource alloca-tion. IEEE Trans. Parallel Distrib. Syst. 27, 1 (2016), 157–170.

[24] Lin, M., Wierman, A., Andrew, L. L., and Thereska, E. Dynamic right-sizing forpower-proportional data centers. IEEE/ACM Transactions on Networking 21, 5 (2012), 1378–1391.

[25] Lucier, B., Menache, I., Naor, J., and Yaniv, J. Efficient online scheduling for deadline-sensitive jobs. In Proceedings of the twenty-fifth annual ACM symposium on Parallelism inalgorithms and architectures (2013), pp. 305–314.

[26] Lykouris, T., and Vassilvitskii, S. Competitive caching with machine learned advice.arXiv preprint arXiv:1802.05399 (2018).

[27] man Jr, E. C., Garey, M., and Johnson, D. Approximation algorithms for bin packing:A survey. Approximation algorithms for NP-hard problems (1996), 46–93.

26

Page 27: Online Virtual Machine Allocation with Lifetime and Load

[28] Purohit, M., Svitkina, Z., and Kumar, R. Improving online algorithms via ml predic-tions. In Advances in Neural Information Processing Systems (2018), pp. 9661–9670.

[29] Ren, R., and Tang, X. Clairvoyant dynamic bin packing for job scheduling with minimumserver usage time. In Proceedings of the 28th ACM Symposium on Parallelism in Algorithmsand Architectures, SPAA 2016, Asilomar State Beach/Pacific Grove, CA, USA, July 11-13,2016 (2016), pp. 227–237.

[30] Tang, X., Li, Y., Ren, R., and Cai, W. On first fit bin packing for online cloud server allo-cation. In 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)(2016), IEEE, pp. 323–332.

[31] Winkler, P., and Zhang, L. Wavelength assignment and generalized interval graph color-ing. In Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms(2003), Society for Industrial and Applied Mathematics, pp. 830–831.

[32] Wong, P. W., Yung, F. C., and Burcea, M. An 8/3 lower bound for online dynamicbin packing. In International Symposium on Algorithms and Computation (2012), Springer,pp. 44–53.

[33] Zhang, Y., Prekas, G., Fumarola, G. M., Fontoura, M., Goiri, I., and Bianchini,R. History-based harvesting of spare cycles and storage in large-scale datacenters. In 12thUSENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (2016),pp. 755–770.

A Improving the Approximation For Non-Uniform Sizes

In this section we show how to use ideas from the analysis of the Harmonic algorithm for the staticbin packing to improve the performance of algorithms for the dynamic bin packing. This can bedone for any algorithm for the dynamic case (with certain good properties). The reduction is givenas Algorithm 8.

Algorithm 8: Partition Algorithm (parameter k)

1 Let A be an offline algorithm for the dynamic bin packing problem.

2 Partition I so that I =⋃kj=1 Ij , where Ij = I ∈ I | wI ∈ ( 1

j+1 ,1j ] for j = 1, . . . , k − 1,

and Ik = I ∈ I | wI ≤ 1k.

3 Schedule each subset Ij separately using A.

Lemma A.1. Let A be an offline dynamic bin packing algorithm that for instance I with loadvector v when measured without the ceiling on each coordinate has a total cost of:

• c · ‖v‖1 + f(v) for the uniform size case.

• cβ · ‖v‖1 + g(v) for the non-uniform size when parametrized by β,

27

Page 28: Online Virtual Machine Allocation with Lifetime and Load

where f and g are non-decreasing functions of the load vector. Then, for integer k ≥ 3, the totalcost of Algorithm 8 is at most

Πk ·maxc, c 1k· k − 1

k ·OPT + (k − 1)f(2v) + g(2v)

If f is also concave then the total cost is at most:

Πk ·maxc, c 1k· k − 1

k ·OPT + (k − 1) · f

(2v

k − 1

)+ g(2v)

Proof. We define a new size for each interval. For I ∈ Ij , 1 ≤ j ≤ k − 1 we set w′I = 1j and for

I ∈ Ik we set w′I = wI · kk−1 .

Let vj and v′j be the load vectors with respect to wI and w′I (respectively) of Ij , 1 ≤ j ≤ k− 1.In any feasible schedule at most j intervals can be scheduled on the same machine given both sizefunctions. The interval sizes w′ in instance Ij are uniform thus, the total cost of scheduling Ij isat most c · ‖v′j‖1 + f(v′j).

The total cost of the schedule of Ik created by algorithm A with respect to load vector vk iscβ · ‖vk‖1 + g(vk), β ≤ 1

k . The load of each machine with respect to w′ is larger as the size of each

interval is multiplied by kk−1 . Thus, the total cost with respect to vector v′k is c 1

k· k−1k ·‖v

′k‖1 +g(v′k).

Summing over I1, ..., Ik, the total cost of the algorithm is at most

maxc, c 1k· k − 1

k ·

k∑j=1

‖v′j‖1 +k−1∑j=1

f(v′j) + g(v′k)

≤ maxc, c 1k· k − 1

k · ‖v′‖1 + (k − 1) · f(v′) + g(v′)

If f is concave we can use Jensen’s inequality to bound∑k−1

j=1 f(v′j) from above by (k−1)f( 1k−1 ·∑k−1

j=1 v′j) ≤ (k − 1)f( v′

k−1).Any optimal solution must pay 1 to pack Πk of the load defined by w′. Thus, we can bound the

optimal solution, OPT ≥ ‖v′‖1

Πk. In addition, v′ ≤ 2v and ‖v′‖0 = ‖v‖0 which proves the lemma.

Corollary A.2. The total cost of Algorithm 8 is at most:

• 2 ·(

1 + 1k−2

)·Πk ·OPT + k · ‖v‖0 for k ≥ 4 with Algorithm 2 as A.

• Πk ·OPT +(√

k + 1)·O(∑T

t=1

√vt logµ

)with Algorithm 3 as A.

Proof. As proven in Theorem 3.1 the cost of Algorithm 2 is at most 2‖dve‖1 ≤ 2‖v‖1 + ‖v‖0 in theuniform size case and

∑Tt=1d

2vt1−2β e ≤

21−2β‖v‖1 + ‖v‖0 in the non-uniform size case. Thus, c = 2,

c 1k

= 2kk−2 and f(v) = g(v) = ‖v‖0. Thus, the total cost of Algorithm 8 with Algorithm 2 as A is at

most

c 1k· k − 1

k·Πk ·OPT + k · ‖2v‖0 ≤ 2 ·

(1 +

1

k − 2

)·Πk ·OPT + k · ‖v‖0

28

Page 29: Online Virtual Machine Allocation with Lifetime and Load

By Theorem 3.5 Algorithm 3 has performance guarantee c = 1, c 1k

= kk−1 and f(v) = g(v) =

O(∑T

t=1

√vt logµ

)which are concave functions. Thus, the total cost of Algorithm 8 with Algo-

rithm 3 as A is at most

Πk ·OPT + k ·O

(T∑t=1

√vtk

logµ

)≤ Πk ·OPT +O

(√k ·OPT · T · logµ

)where the inequality follows by Jensen’s inequality.

B Proofs Omitted

B.1 Proofs omitted from Section 2.1

Proof of Lemma 2.3. The proof for each case follows.Uniform case. Assume in contradiction that there exists a time t in which an interval I arrives,resulting in ‖v‖∞ + 1 open machines. This can only happen if the ‖v‖∞ machines that were openprior to the arrival of I (at time t) are all fully occupied. Along with the interval I, by the propertiesof first fit, it implies that the number of intervals at time t is greater than Nt, a contradiction.Non uniform case when β ≤ 1

2 . Let M denote the maximum number of machines used at anytime over the horizon, and let t be a time where the algorithm decided to open an Mth machine.Since the size of each interval is at most β and at time t the algorithm could not accommodate anarriving interval in the existing M − 1 machines, that implies that the M − 1 first machines haveload at least 1− β at time t. Furthermore, the total load of machines M − 1 and M at time t is atleast 1, otherwise the algorithm would not need to open a new machine. As a result, at time t:

‖v‖∞ > (M − 2) · (1− β) + 1 =⇒ 1

1− β· ‖v‖∞ ≥M − 2 +

1

1− β≥M − 1.

Therefore, M ≤ 11−β · ‖v‖∞ + 1, as claimed.

Non uniform case when β > 12 . Fix a time t. A machine is called wide if it has load at least

1/2 and narrow otherwise. Denote by Mw and Mn the number of wide and narrow machines,respectively. We show that Mw ≤ 2 · ‖v‖∞ and Mn ≤ 2 · ‖v‖∞.

• Wide machines: Assume Mw ≥ 2 · ‖v‖∞ + 1. The wide machines have load at least 1/2,thus their total load is at least 1

2Mw ≥ ‖v‖∞ + 12 > ‖v‖∞, which is a contradiction. Thus,

Mw ≤ 2 · ‖v‖∞.

• Narrow machines: Assume Mn ≥ 2 · ‖v‖∞ + 1 and let m denote the last machine activatedat t, among the narrow machines. At time t, by definition, m has at least one interval I withsize less than 1/2. At the start time sI of I, the algorithm chose to assign it to machine m,implying that I could not be assigned to all other narrow machines, meaning they each hadload at least 1/2 at time sI . Hence, the total load on the narrow machines at time sI is atleast (Mn − 1)1

2 + wI ≥ ‖v‖∞ + wI > ‖v‖∞, which is a contradiction.

As a result, at time t the total number of open machines is at most Mw +Mn ≤ 4‖v‖∞, and, sincethis is true for all t, we get the desired result.When ‖v‖∞ ≤ 1: If the algorithm opens more than a single machine at any time t, the total loadat that time is more than 1, contradicting our assumption.

29

Page 30: Online Virtual Machine Allocation with Lifetime and Load

Proof of Lemma 2.6. Proof of (1): Next-Fit holds a single active bin that is still accepting inter-vals. The rest of the bins are full in the uniform size case and at least max1

2 , 1 − β-full in thenon-uniform size case. Similarly, in an instance decomposed into n sub-instances there are n activebins, while the rest of the bins are full in the uniform size case and at least max1

2 , 1 − β-full inthe non-uniform size case. This translates to a total cost of at most OPTS(I) + n in the uniform

size case and min

2, 11−β

·OPTS(I) + n in the non-uniform size case.

Proof of (2): The k-bounded harmonic algorithm is composed of k copies of the Next-Fitalgorithm. The k − 1 first copies (of the biggest items) can be seen as uniform size bin packingsince exactly j items are packed in each bin in the j-th copy. In the k-th copy sizes are not uniform,though for the sake of the analysis of the harmonic algorithm a bin is considered as full if it is 1− 1

kfull. Thus, this copy of Next-Fit is also (1, 1)-decomposable. Decomposing each copy of Next-Fitleads to an additional cost of n− 1, overall k · (n− 1).

B.2 Proofs omitted from Section 3.1

Proof of Lemma 3.3. We start with the non-uniform size case. Given a set of intervals I each withsize less than 1/2 (i.e., β < 1

2), we show how to efficiently construct a [12 − β, 1]-cover C ⊆ I.

Initially, we start with C = I and v′ is the load vector of the subset C. Clearly, initially, for everyt, v′t ≥ minvt, 1

2 − β. If for every t, v′t ≤ 1 then we are done. Otherwise, there exists a time tsuch that v′t > 1. Consider the set of intervals A ⊆ C that are active at time t. Using Lemma2.2 with α = 1, we see that there exists an interval I that observes load more than 1/2 for itswhole duration. Removing I, the load vector of the subset A remains at least 1

2 − β. Since A ⊂ Cremoving such an interval maintains the load at any time it intersects at least 1

2 − β also in thesubset C. Thus, after iteratively removing these intervals we have that v′t ∈ [minvt, 1

2 − β, 1].The proof for the uniform case follows the same lines. Given a set of intervals I, we initially set

C = I. Let v′ be the load vector of the subset C. Clearly, initially, for every t, v′t ≥ minvt, 1. Iffor every t, v′t ≤ 2 then we are done. Otherwise, there exists time t such that v′t > 2. Using Lemma2.2 with α = 2, we see that there exists an interval I that observes load strictly more than 1 at anypoint. Hence, removing any such intervals (and using the fact that we are in the uniform case), theload vector of the subset A remains at least 1. Since A ⊆ C this is true also for the subset C.

Proof of Theorem 3.1. Consider an iteration r of the while loop of Algorithm 2. Let Ir denote theset of intervals in the beginning of iteration r, i.e., the intervals that have not been assigned inprevious iterations. Denote by vr the load vector of Ir, and by Cr the cover that was obtained byLemma 3.3 during iteration r. Let v(Cr) be the load vector of Cr.

For the uniform case, since vt(Cr) ≤ 2 for all t by construction, we have ‖v(Cr)‖∞ ≤ 2, and

the total cost of the algorithm in this iteration is 2 · ‖vr‖0 by Lemma 2.3. By the properties of thecover, |Cr(t)| ≥ minvrt , 1 for every t. Hence, ‖vr‖1 decreases by at least ‖vr‖0. Summing up overall iterations we get that the cost is at most 2 · ‖v‖1.

For the non-uniform case, in each iteration r, ‖v(Cr)‖∞ ≤ 1, so First Fit schedules the cor-responding intervals using one machine by Lemma 2.3. Hence, if the maximum size interval is atmost 1

4 , by the properties of the cover, |Cr(t)| ≥ minvrt , 12 − β for every t, so summing up over

all iterations, the algorithm pays at most∑

tdvt

12−β e. If β > 1

4 , let Wt be the number of intervals

with width larger than 14 that are active at time t. Since the algorithm opens a separate machine

of unit size for each of them, it pays cost Wt at each time t. Let vw and vn be the load vector of

30

Page 31: Online Virtual Machine Allocation with Lifetime and Load

the intervals that have size more than 14 and at most 1

4 respectively, and let βn be the largest sizeof intervals in vn. At each time t the algorithm pays:

Wt + d vnt12 − βn

e ≤Wt + d4vnt e = dWt + 4vnt e ≤ 4 · d14·Wt + vnt e ≤ 4 · dvwt + vnt e = 4dvte

In the above, the first inequality follows from βn ≤ 14 and the subsequent equality from Wt being

an integer. The second inequality is due to dαxe ≤ αdxe for α integer, and the final inequalityis based on the fact that vwt ≥ 1

4 ·Wt since wide intervals have by definition size larger than 14 .

Summing over all t, the total cost of the algorithm is at most:∑t

4dvte = 4‖v‖1

B.3 Proofs omitted from Section 3.2

Proof of Lemma 3.4. Let t be a time with vt ≥ 2 + 4 lnµ. We partition the intervals in I(t) into 1

+ log1+ε µ length classes Ci, with ε =√

Dvt

where D = 2+4 lnµ. The ith class contains all intervals

in I(t) whose length is in the range [(1 + ε)i−1, (1 + ε)i). Note that since vt ≥ 2 + 4 lnµ, then ε ≤ 1.Consider the intervals in the ith class Ci, and let ` = (1 + ε)i−1. By our partition, all lengths of

intervals in Ci are in the range [`, `(1 + ε)). Furthermore, as they all belong to I(t) (and are henceactive at time t), the starting time of each interval I ∈ Ci is in the range (t− `(1 + ε), t]. We next,further partition the intervals in Ci to (1 + 1

ε ) sub-classes Ci,j by their starting times. Ci,j containsall intervals in Ci whose starting time is in the time interval (t−`(1+ε)+(j−1)ε`, t−`(1+ε)+j ·ε`].

Overall, the total number of sets in our partition is at most,(1

ε+ 1

)(1 +

lnµ

ln(1 + ε)

)≤(

1

ε+ 1

)(1 +

2 lnµ

ε

)≤ 2 + 4 lnµ

ε2=D

ε2

where the first inequality follows since for ε ≤ 1, ln(1 + ε) ≥ ε2 .

Hence, one of the sets must contain a load of at least vt · ε2

D ≥ 1 at time t. This means that inthe uniform case, where each interval has size 1/g for some integer g, at least one set contains atleast g intervals. In the non-uniform case it means there exists a set with size at least 1. Given amax size of β, this set contains a subset of size at least max1

2 , 1 − β. As all the intervals are oflength [`, `(1 + ε)), and their starting time is at most `ε apart, it is possible to open a machine of

length at most `(1 + 2ε) = `(

1 + 2√

Dvt

)for the selected subset of size at least 1/c.

31