
Computer Science

Resource Overbooking and Application Profiling in Shared Hosting Platforms

Bhuvan Urgaonkar, Prashant Shenoy, Timothy Roscoe †

UMASS Amherst and Intel Research †


Motivation

❒ Proliferation of Internet applications
  ❍ Electronic commerce, streaming media, online games, online trading, …

❒ Commonly hosted on clusters of servers
  ❍ Cheaper alternative to large multiprocessors

[Figure: clients connect over the Internet to a cluster hosting streaming, games, and e-commerce applications]


Hosting Platforms

❒ Hosting platform: server cluster that runs third-party applications

❒ Application providers pay for server resources
  ❍ CPU, disk, network bandwidth, memory

❒ Platform provider guarantees resource availability
  ❍ Performance guarantees provided to applications

❒ Central challenge: Maximize revenue while providing resource guarantees


Design Challenges

❒ How to determine an application’s resource needs?

❒ How to provision resources to meet these needs?

❒ How to map applications to nodes in the platform?

❒ How to handle dynamic variations in the load?


Talk Outline

➾ Introduction

❒ Inferring Resource Requirements

❒ Provisioning Resources

❒ Handling Dynamic Load Variations

❒ Experimental Evaluation

❒ Related Work


Hosting Platform Model

❒ Hosting Platforms: Dedicated vs Shared
  ❍ Dedicated: Applications get integral # nodes
  ❍ Shared: Applications may get fractional # nodes

❒ Our focus: Shared Hosting Platforms
  ❍ Nodes may have competing applications

❒ Capsule: component of an application running on a node
  ❍ Example: e-commerce application: HTTP server, app server, database server


Provisioning By Overbooking

❒ How should the platform allocate resources?
  ❍ Provision resources based on worst-case needs

❒ Worst-case provisioning is wasteful
  ❍ Low platform utilization

❒ Applications may be tolerant to occasional violations
  ❍ E.g., CPU guarantees should be met 99% of the time

❒ Possible to provide useful guarantees even after provisioning less than worst-case needs

⇒ Idea: Improve utilization by overbooking resources


Application Profiling

❒ Profiling: process of determining resource usage
  ❍ Run the application on an isolated set of nodes
  ❍ Subject the application to a real workload
  ❍ Model CPU and network usage as ON-OFF processes

❒ Use the Linux trace toolkit

[Figure: timeline of alternating ON and OFF periods; an ON period runs from the beginning to the end of a CPU quantum]


Resource Usage Distribution

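[Figure: resource usage is sampled over successive measurement intervals along the time axis. Two panels plot the resulting distribution against fractional usage (0 to 1): one shows probability, the other cumulative probability, marking r(99) at cumulative probability 0.99 and r(100) at 1]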
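The usage distribution and the percentiles r(99) and r(100) named on this slide can be computed from an ON-OFF trace roughly as follows. This is a minimal sketch, not the profiler used in the talk: the function names, the (begin, end) tuple format for ON periods, the sample trace, and the nearest-rank percentile rule are all assumptions made for illustration.

```python
import math

def fractional_usage(on_periods, interval, horizon):
    """Fraction of each measurement interval spent in ON periods.

    on_periods: list of (begin, end) times of ON periods (assumed format).
    interval:   length of one measurement interval.
    horizon:    total trace length, assumed to be a multiple of interval.
    """
    n = int(round(horizon / interval))
    busy = [0.0] * n
    for begin, end in on_periods:
        first = int(begin // interval)
        last = min(int(end // interval), n - 1)
        # Split an ON period across every interval it overlaps.
        for i in range(first, last + 1):
            lo = max(begin, i * interval)
            hi = min(end, (i + 1) * interval)
            if hi > lo:
                busy[i] += hi - lo
    return [b / interval for b in busy]

def r(usage, percentile):
    """Nearest-rank percentile of the usage samples; r(usage, 100) is the maximum."""
    ordered = sorted(usage)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical trace: ON periods in a 10-second run, 1-second intervals.
trace = [(0.0, 0.2), (1.0, 1.5), (2.5, 3.5), (7.0, 7.1)]
usage = fractional_usage(trace, interval=1.0, horizon=10.0)
print("r(100) =", r(usage, 100), " r(50) =", r(usage, 50))
```

Provisioning at r(99) instead of r(100) ignores the top 1% of samples, which is where a long-tailed application spends most of its worst-case demand.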


Capturing Burstiness: Token Bucket

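❒ Token Bucket (σ, ρ)
  ❍ Resource usage over an interval of length t ≤ σ·t + ρ
  ❍ Algorithm by Tang et al.

❒ Additional parameter T
  ❍ Satisfy token bucket guarantees only for t ≥ T

[Figure: usage vs. time, with two token-bucket envelopes σ1·t + ρ1 and σ2·t + ρ2 whose intercepts are ρ1 and ρ2]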
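To make the (σ, ρ) bound and the role of T concrete, the sketch below finds the smallest ρ that a given σ needs on a usage trace. It is a brute-force illustration, not the algorithm by Tang et al. cited on the slide; the function name `min_burst`, the per-slot trace format, and the sample numbers are assumptions.

```python
def min_burst(usage, sigma, slot=1.0, t_min=0.0):
    """Smallest rho such that usage over every window of length t >= t_min
    stays within sigma * t + rho.

    usage: amount of resource consumed in each slot (assumed trace format).
    sigma: rate term of the token bucket, in resource units per unit time.
    slot:  duration of one slot.
    t_min: the parameter T; shorter windows are not checked.
    """
    # Prefix sums give the usage of any window in constant time.
    prefix = [0.0]
    for u in usage:
        prefix.append(prefix[-1] + u)
    rho = 0.0
    n = len(usage)
    for start in range(n):
        for end in range(start + 1, n + 1):
            t = (end - start) * slot
            if t < t_min:
                continue
            used = prefix[end] - prefix[start]
            rho = max(rho, used - sigma * t)
    return rho

# Hypothetical CPU trace: seconds of CPU used in each 1-second slot.
trace = [0.1, 0.9, 0.8, 0.1, 0.1, 0.1]
print(min_burst(trace, sigma=0.5))             # every time scale
print(min_burst(trace, sigma=0.5, t_min=3.0))  # only windows of 3 s or longer
```

Raising T lets short bursts go unchecked, so the same σ can be paired with a smaller ρ.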


Profiles of Server Applications

❒ Applications exhibit different degrees of burstiness
  ❍ May have a long tail

❒ Insight: Choose (σ, ρ) based on a high percentile

[Figure: three probability distributions of resource usage. Postgres Server, 10 clients (probability vs. fraction of CPU); Apache Web Server, 50% cgi-bin (probability vs. fraction of CPU); Streaming Media Server, 20 clients (probability vs. fraction of NW bandwidth)]


Resource Overbooking

❒ Applications specify overbooking tolerance O
  ❍ Probability with which capsule needs may be violated

❒ Controlled overbooking via admission control:

$$\sum_{k=1}^{K} (\sigma_k \cdot T_{\min} + \rho_k)\,(1 - O_k) \le C \cdot T_{\min}$$

$$\Pr\left(\sum_{k=1}^{K} U_k > C\right) \le \min(O_1, \ldots, O_K)$$

❒ A node that has sufficient resources for a capsule is feasible for it
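The first admission-control condition on this slide can be checked directly. The sketch below is one possible reading under stated assumptions: `Capsule`, `admits`, and `is_feasible` are hypothetical names, C is taken to be the node's capacity, T_min is passed in as a parameter, and the sample numbers are invented. The probabilistic condition on the sum of the U_k is not checked here.

```python
from dataclasses import dataclass

@dataclass
class Capsule:
    sigma: float      # rate term of the capsule's token bucket
    rho: float        # burst term of the capsule's token bucket
    tolerance: float  # overbooking tolerance O, e.g. 0.01 for 1%

def admits(capsules, capacity, t_min):
    """Check sum_k (sigma_k * T_min + rho_k) * (1 - O_k) <= C * T_min."""
    demand = sum((c.sigma * t_min + c.rho) * (1.0 - c.tolerance)
                 for c in capsules)
    return demand <= capacity * t_min

def is_feasible(node_capsules, new_capsule, capacity, t_min):
    """A node is feasible for a capsule if the test still holds with it added."""
    return admits(node_capsules + [new_capsule], capacity, t_min)

# Hypothetical node with capacity 1.0 (one full CPU) and two resident capsules.
resident = [Capsule(0.3, 0.05, 0.01), Capsule(0.4, 0.05, 0.05)]
small = Capsule(0.2, 0.02, 0.05)
strict = Capsule(0.3, 0.0, 0.0)
print(is_feasible(resident, small, capacity=1.0, t_min=1.0))
print(is_feasible(resident, strict, capacity=1.0, t_min=1.0))
```

A capsule with a larger tolerance O contributes less to the left-hand side, which is how overbooking frees up room on a node.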


Mapping Capsules to Nodes

❒ A bipartite graph of capsules and feasible nodes
  ❍ Greedy mapping: consider capsules in non-decreasing order of degrees: O(c · log c)
  ❍ Guaranteed to find a placement if one exists!
  ❍ Multiple feasible nodes => best fit, worst fit, random…

[Figure: bipartite graph linking capsules 1 to 3 with their feasible nodes among nodes 1 to 4, followed by the final mapping]
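The greedy mapping on this slide can be sketched as below. This is an illustration under assumptions, not the prototype's code: the function name `place`, the dictionary representation of the bipartite graph, the per-node `slack` values used for best fit and worst fit, and the rule that each capsule goes to a distinct node are all assumed here.

```python
import random

def place(feasible, slack=None, policy="best"):
    """Greedy mapping of capsules to distinct nodes.

    feasible: dict capsule -> set of feasible nodes (edges of the bipartite graph).
    slack:    dict node -> spare capacity, used by best fit and worst fit (assumed).
    policy:   "best", "worst", or "random" choice among feasible nodes.
    Returns a dict capsule -> node, or None if the greedy pass gets stuck.
    """
    slack = slack or {}
    used = set()
    mapping = {}
    # Consider capsules in non-decreasing order of degree.
    for capsule in sorted(feasible, key=lambda c: len(feasible[c])):
        candidates = sorted(n for n in feasible[capsule] if n not in used)
        if not candidates:
            return None
        if policy == "best":
            node = min(candidates, key=lambda n: slack.get(n, 0.0))
        elif policy == "worst":
            node = max(candidates, key=lambda n: slack.get(n, 0.0))
        else:
            node = random.choice(candidates)
        mapping[capsule] = node
        used.add(node)
    return mapping

# Hypothetical graph: three capsules, four nodes.
feasible = {"c1": {1, 2}, "c2": {2}, "c3": {2, 3, 4}}
slack = {1: 0.4, 2: 0.1, 3: 0.6, 4: 0.2}
print(place(feasible, slack, policy="best"))
print(place(feasible, slack, policy="worst"))
```

Placing the most constrained capsules first keeps scarce nodes available for the capsules that have no alternative; the policy only matters when several feasible nodes remain.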


Handling Flash Crowds

❒ Detect overloads by online profiling

[Figure: three probability distributions of Apache Web Server CPU usage (probability vs. fraction of CPU): Offline Profile, Expected Workload, and Overload]

❒ Reacting to overloads (ongoing work)
  ❍ Compute new allocations
  ❍ Change allocations, move capsules, add servers
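One simple way to detect an overload by online profiling is to compare a high percentile of the online usage samples against the offline profile. The slides do not say which test is used, so the sketch below is purely illustrative: the percentile, the margin, the function names, and the sample data are all assumptions.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of usage samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def overloaded(offline, online, p=95, margin=0.1):
    """Flag an overload when the online p-th percentile exceeds the
    offline one by more than margin (both thresholds are assumptions)."""
    return percentile(online, p) > percentile(offline, p) + margin

# Hypothetical samples of fractional CPU usage per measurement interval.
offline_profile = [0.10, 0.12, 0.15, 0.20, 0.22, 0.25, 0.30, 0.18, 0.14, 0.16]
expected_load = [0.11, 0.13, 0.16, 0.19, 0.24, 0.26, 0.28, 0.17, 0.15, 0.12]
flash_crowd = [0.40, 0.45, 0.55, 0.60, 0.62, 0.58, 0.65, 0.50, 0.48, 0.61]

print(overloaded(offline_profile, expected_load))
print(overloaded(offline_profile, flash_crowd))
```

Under the expected workload the online distribution tracks the offline profile, so the check stays quiet; a flash crowd shifts the whole distribution to the right and trips it.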


Talk Outline

➾ Introduction

➾ Inferring Resource Requirements

➾ Provisioning Resources

➾ Handling Dynamic Load Variations

❒ Experimental Evaluation

❒ Related Work


The SHARC Prototype

❒ A Linux-based Shared Hosting Platform

❍ 6 Dell PowerEdge 1550 servers

❍ Gigabit Ethernet link

❒ Software Components
  ❍ Profiling: vanilla Linux + Linux trace toolkit
  ❍ Control plane: overbooking, placement
  ❍ QoS-enhanced Linux kernel: HSFQ schedulers


Experimental Setup

❒ Prototype running on a 5-node cluster
  ❍ Each server: 1 GHz PIII with 512MB RAM and Gigabit Ethernet
  ❍ Control plane runs on a dedicated node
  ❍ Applications run on the other four nodes

❒ Workload: mix of server applications
  ❍ PostgreSQL database server with pgbench (TPC-B) benchmark
  ❍ Apache web server with SPECWeb99 (static & dynamic HTTP)
  ❍ MPEG streaming server with 1.5 Mb/s VBR MPEG-1 clients
  ❍ Quake I game server with “terminator” bots


Resource Overbooking Benefits

❒ Small amounts of overbooking can yield large gains
  ❍ Bursty applications yield larger benefits

[Figure: two plots against number of nodes (0 to 140), each with curves for No Ovb, Ovb=1%, and Ovb=5%. Placement of Streaming Media Servers (number of apps placed); Placement of Apache Web Servers (web servers placed)]


Capsule Placement Algorithms

❒ Diverse requirements: worst-fit outperforms others

❒ Similar requirements: all perform similarly

[Figure: two bar charts titled "Placement Algorithms, Ovb=5%", showing number of apps placed vs. number of nodes (16, 32, 64) for Random, Best-fit, and Worst-fit]


Performance with Overbooking

❒ Performance degradation is within specified overbooking tolerance

Application   Metric           Isolated   100th   99th    95th    Avg
Apache        Tput (req/s)     67.9       67.51   66.91   64.81   39.8
PostgreSQL    Tput (trans/s)   22.8       22.46   22.21   21.78   9.04
Streaming     Viol (sec)       0          0       0.31    0.59    5.23


Related Work

❒ Single node resource management
  ❍ Proportional share schedulers: WFQ, SFQ, BVT, …
  ❍ Reservation based schedulers: Nemesis, Rialto, …

❒ Cluster-based resource management
  ❍ Cluster Reserves [Aron00], Aron thesis [Aron00]
  ❍ MUSE [Chase01]: economic approach
  ❍ Oceano [IBM], Planetary computing [HP]
  ❍ Clusters for high availability: Porcupine [Saito99]
  ❍ Grid computing


Concluding Remarks

❒ Resource management in shared hosting platforms
  ❍ Application profiling to determine resource usage
  ❍ Revenue maximization using controlled overbooking
  ❍ Ability to handle dynamic workloads (ongoing work)

❒ URL: http://lass.cs.umass.edu