University of Massachusetts, Amherst, Department of Computer Science

Dynamic Resource Management in Internet Hosting Platforms

Ph.D. Thesis Defense: Bhuvan Urgaonkar

Advisor: Prashant Shenoy


Internet Applications

Proliferation of Internet applications, e.g., auction sites, online games, online retail stores

Growing significance in personal and business affairs

Focus: Internet server applications


Hosting Platforms

Data centers: clusters of servers, storage devices, high-speed interconnect

Hosting platforms rent resources to third-party applications and provide performance guarantees in return for revenue

Benefits: applications do not need to maintain their own infrastructure, since they rent server resources, possibly on demand; the platform provider generates revenue by renting resources


Goals of a Hosting Platform

Meet service-level agreements: satisfy application performance guarantees, e.g., average response time, throughput

Maximize revenue, e.g., maximize the number of hosted applications

Question: How should a hosting platform manage its resources to meet these goals?


Challenge #1: Dynamic Workloads

Multi-time-scale variations: time-of-day, hour-of-day

Overloads, e.g., flash crowds

User threshold for response time: 8-10 s

Key issue: How to provide good response time under varying workloads?

[Figures: workload traces showing request rate (req/min) vs. time (hours) and arrivals per minute vs. time (days)]


Challenge #2: Complexity of Applications

Complex software architecture: diverse software components, e.g., Web servers, Java application servers, databases

Multiple classes of clients: how to provide differentiated service?

Replicable components: how many replicas to have?

Tunable configuration parameters, e.g., MaxClients in Apache: how to set these parameters?

Key issue: How to capture all this complexity?


Talk Outline

Motivation, Thesis contributions, Application modeling, Dynamic provisioning, Scalable request policing, Conclusions


Hosting Platform Models

Small applications require only a fraction of a server, e.g., shared Web hosting at $20/month to run one's own Web site

Shared hosting: multiple applications on a server; co-located applications compete for server resources


Hosting Platform Models

Large applications may span multiple servers: the eBay site uses thousands of servers!

Dedicated hosting: at most one application per server; allocation is at the granularity of a single server


Thesis Contributions

Dynamic resource management in hosting platforms

Shared hosting: statistical multiplexing and under-provisioning [OSDI 2002]; application placement [PDCS 2004]

Dedicated hosting: analytical model for an Internet application [SIGMETRICS 2005]; dynamic provisioning [Autonomic Computing 2005]; scalable request policing [PODC 2004, WWW 2005]


Talk Outline

Motivation, Thesis contributions, Application modeling, Dynamic provisioning, Scalable request policing, Conclusions


Internet Application Architecture

Multi-tier architecture: each tier uses services provided by its successor

Session-based workloads

[Figure: request processing in an online bookstore. A search for “moby” passes through the HTTP, J2EE, and database tiers; the database is queried, and the response lists Melville’s ‘Moby Dick’ and music CDs by Moby]


Baseline Application Model

The model consists of two components: a sub-system that captures the behavior of clients and a sub-system that captures request processing inside the application [SIGMETRICS 2005]


Modeling Clients

Clients think between successive requests: an infinite-server system Q0 captures the think time Z, reflecting that Z is independent of the processing inside the application

[Figure: clients 1…N, each with think time Z, modeled by the infinite-server queue Q0 in front of the application]


Modeling Request Processing

[Figure: tiers 1…M modeled as queues Q1…QM with service times S1…SM and transition probabilities p1…pM, where pM = 1]

Transitions are defined to capture the circulation of requests: a request may move to the next queue or to the previous queue

Multiple requests are processed concurrently at each tier: processor-sharing scheduling discipline

Caching effects get captured implicitly!


Putting It All Together

[Figure: the client sub-system (Q0, think time Z) connected to the tier queues Q1…QM (service times S1…SM, transition probabilities p1…pM with pM = 1), with N sessions]

A closed queuing model that captures a given number N of simultaneous sessions being served


Mean-value Analysis

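The model is a product-form closed queuing network, so mean-value analysis (MVA) applies. Let $L_m$ be the average length of $Q_m$ and $A_m$ the average number of clients in $Q_m$ seen by an arriving client.

A client arriving when there are $n+1$ clients sees the mean queue lengths of the network with $n$ clients:

$A_m(n+1) = L_m(n), \quad m = 1, \dots, M$

MVA applies this relation iteratively, starting from an empty network, to compute mean queue lengths and sojourn times.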
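The iteration can be sketched in Python as below. This is a minimal sketch of textbook exact MVA, assuming one delay station for the think time Z and M processor-sharing tiers with visit ratios $V_m$ and mean service times $S_m$; the function and parameter names are hypothetical, and the sketch leaves out the concurrency-limit and load-imbalance enhancements described later.

```python
from typing import List, Tuple


def mva(num_sessions: int, think_time: float, visit_ratios: List[float],
        service_times: List[float]) -> Tuple[float, float, List[float]]:
    """Exact MVA for a closed network: one delay station (think time Z)
    followed by M processor-sharing tiers.

    Returns (mean response time per request, throughput, mean queue lengths L_m).
    """
    if len(visit_ratios) != len(service_times):
        raise ValueError("need one visit ratio and one service time per tier")
    M = len(service_times)
    queue_len = [0.0] * M          # L_m(0) = 0: the network starts empty
    resp, tput = 0.0, 0.0
    for n in range(1, num_sessions + 1):
        # Arrival theorem: an arriving request sees A_m(n) = L_m(n-1),
        # so its residence time at tier m is V_m * S_m * (1 + L_m(n-1)).
        residence = [visit_ratios[m] * service_times[m] * (1.0 + queue_len[m])
                     for m in range(M)]
        resp = sum(residence)
        tput = n / (think_time + resp)             # Little's law for the whole cycle
        queue_len = [tput * r for r in residence]  # Little's law at each tier
    return resp, tput, queue_len
```

For a single tier with V = 1, S = 1 s and no think time, `mva(2, 0.0, [1.0], [1.0])` returns `(2.0, 1.0, [2.0])`: the two sessions share the tier, so each request takes twice its service time.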


Parameter Estimation

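Visit ratios are equivalent to transition probabilities for MVA: $V_i \approx \lambda_i / \lambda_{req}$, where $\lambda_{req}$ is measured at the sentry and $\lambda_i$ is obtained from the logs of tier $i$

Service times are derived from the residence time $X_i$ logged at tier $i$. For the last tier, $S_M \approx X_M$; for the other tiers,

$S_i = X_i - \frac{V_{i+1}}{V_i} \, X_{i+1}$

Think time is measured at the application sentry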
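The estimation rules translate into a short backward pass over the tiers. This sketch assumes that per-tier request rates and mean per-visit residence times have already been extracted from the logs; the function name and argument layout are hypothetical.

```python
from typing import List, Tuple


def estimate_params(lambda_req: float, tier_rates: List[float],
                    residence: List[float]) -> Tuple[List[float], List[float]]:
    """Estimate visit ratios V_i and service times S_i for tiers 1..M.

    lambda_req: request arrival rate measured at the sentry
    tier_rates: lambda_i, request rate seen at tier i (e.g., from its logs)
    residence:  X_i, mean residence time per visit logged at tier i,
                which includes time spent waiting on downstream tiers
    """
    if lambda_req <= 0 or not residence or len(tier_rates) != len(residence):
        raise ValueError("need a positive sentry rate and one rate/residence pair per tier")
    visits = [rate / lambda_req for rate in tier_rates]   # V_i ~ lambda_i / lambda_req
    M = len(residence)
    service = [0.0] * M
    service[M - 1] = residence[M - 1]                     # last tier: S_M ~ X_M
    for i in range(M - 2, -1, -1):
        # Remove the time spent at tier i+1 per visit to tier i.
        service[i] = residence[i] - (visits[i + 1] / visits[i]) * residence[i + 1]
    return visits, service
```

With made-up measurements `estimate_params(10.0, [10.0, 20.0, 40.0], [1.0, 0.3, 0.1])`, the visit ratios come out as `[1.0, 2.0, 4.0]` and the service times as roughly `[0.4, 0.1, 0.1]`, ready to feed into MVA.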


Evaluation of Baseline Model

Auction site RUBiS, one server per tier (Apache, JBoss, MySQL)

Concurrency limits are not captured by the baseline model

[Figure: average response time (msec) vs. number of sessions, observed vs. basic model]


Handling Concurrency Limits

Requests may be dropped due to concurrency limits: we need to model the finiteness of queues!

[Figure: the baseline closed model (Q0 with think time Z, tiers Q1…QM with service times S1…SM, N sessions), with some requests dropped]


Handling Concurrency Limits

Approach: add subsystems that capture dropped requests, so that the processing of dropped requests is distinguished from that of other requests

[Figure: the closed model extended with a drop subsystem at each tier Q1…QM]


Estimating Drop Probabilities and Delay Values

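Drop probability: Step 1: estimate the throughput using MVA, assuming no concurrency limits. Step 2: estimate $p_i^{drop}$ as the drop probability of an M/M/1/$K_i$ queue, where $K_i$ is the concurrency limit of tier $i$

Delay value for tier i: subject the application to an offline workload that causes the limit to be exceeded only at tier i, and record the response time of failed requests

[Figure: tier i with concurrency limit $K_i$ receives throughput $t$; $t(1 - p_i^{drop})$ is served and $t \cdot p_i^{drop}$ is dropped. In the offline experiment, tier i has a low limit and the other tiers have high limits]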
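Step 2 can be illustrated with the standard M/M/1/K blocking probability, $p_i^{drop} = \frac{(1-\rho_i)\rho_i^{K_i}}{1-\rho_i^{K_i+1}}$ with $\rho_i = \lambda_i S_i$ (and $1/(K_i+1)$ when $\rho_i = 1$). The sketch below assumes the arrival rate at tier i is the Step 1 MVA throughput scaled by the visit ratio, $\lambda_i = t \cdot V_i$; that mapping and both function names are assumptions for illustration.

```python
import math


def mm1k_drop_probability(arrival_rate: float, service_time: float, capacity: int) -> float:
    """Probability that an arrival finds an M/M/1/K queue full (K = capacity)."""
    if capacity < 1:
        raise ValueError("capacity K must be at least 1")
    rho = arrival_rate * service_time              # offered load
    if math.isclose(rho, 1.0):
        return 1.0 / (capacity + 1)                # limit of the formula at rho = 1
    return (1.0 - rho) * rho ** capacity / (1.0 - rho ** (capacity + 1))


def tier_drop_probability(mva_throughput: float, visit_ratio: float,
                          service_time: float, concurrency_limit: int) -> float:
    """Hypothetical helper for Step 2 at one tier, assuming lambda_i = t * V_i."""
    return mm1k_drop_probability(mva_throughput * visit_ratio, service_time,
                                 concurrency_limit)
```

With $K = 1$ the formula reduces to $\rho/(1+\rho)$, so `mm1k_drop_probability(0.5, 1.0, 1)` gives 1/3.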


Response Time Prediction

[Figure: average response time (msec) vs. number of sessions: observed, basic model, enhanced model]

The enhanced model can capture concurrency limits


Replication and Load Imbalances

Causes of imbalance: “sticky” sessions; variation in session durations and resource requirements

Imbalance factor for the jth most-loaded replica of tier i:

$\text{imbalance}(i, j) = \frac{\text{num\_arrivals}(i, j)}{\text{num\_arrivals}(i)}$

Scale the visit ratio accordingly: $V_{i,j} = V_i \cdot \text{imbalance}(i, j)$

[Figure: an Apache tier feeding two JBoss replicas, backed by MySQL]
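The per-replica scaling is a one-line computation once per-replica arrival counts are available. This sketch assumes those counts come from the replicas' logs over a measurement window; the function name is hypothetical.

```python
from typing import List


def per_replica_visit_ratios(tier_visit_ratio: float, replica_arrivals: List[int]) -> List[float]:
    """Scale a tier's visit ratio V_i by each replica's imbalance factor.

    Replicas are ordered from most to least loaded, so element j-1 of the
    result is V_{i,j} for the j-th most-loaded replica.
    """
    total = sum(replica_arrivals)          # num_arrivals(i)
    if total == 0:
        raise ValueError("no arrivals observed at this tier")
    ordered = sorted(replica_arrivals, reverse=True)
    # imbalance(i, j) = num_arrivals(i, j) / num_arrivals(i)
    return [tier_visit_ratio * (count / total) for count in ordered]
```

For example, a tier with $V_i = 2$ whose three replicas saw 300, 600, and 100 arrivals yields per-replica visit ratios of about 1.2, 0.6, and 0.2. Under perfect load balancing, each would be 2/3.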


Capturing Load Imbalance

[Figure: number of requests (per replica) vs. time (sec) for replicas 1, 2, and 3]

[Figure: response times (based on load): average response time (msec) for the least loaded, medium loaded, and most loaded replicas and the average, comparing observed, perfect load balancing, and enhanced model]

Session affinity causes load imbalance, and the imbalance shifts among replicas

Our enhancement helps improve response time prediction


Talk Outline

Motivation, Thesis contributions, Application modeling, Dynamic provisioning, Scalable request policing, Conclusions


Dynamic Provisioning

Key idea: increase or decrease the number of allocated servers to handle workload fluctuations. Monitor the incoming workload, compute the current or future demand, and match the number of allocated servers to the demand

[Figure: provisioning cycle: monitor workload, compute current/future demand, adjust allocation]

[Autonomic Computing 2005]
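The "match servers to demand" step can be illustrated with a deliberately simple rule. This is not the model-based method of the thesis: it is a stand-in based on the utilization law, assuming each tier's load is demand × $V_i$ × $S_i$ and that servers are run at a hypothetical target utilization.

```python
import math
from typing import List


def servers_per_tier(demand: float, visit_ratios: List[float],
                     service_times: List[float], target_util: float = 0.7) -> List[int]:
    """Servers needed at each tier to carry `demand` req/s at `target_util`.

    Utilization-law approximation (an assumption, not the thesis's model):
    busy servers at tier i = demand * V_i * S_i.
    """
    if not 0 < target_util <= 1:
        raise ValueError("target utilization must be in (0, 1]")
    counts = []
    for v, s in zip(visit_ratios, service_times):
        busy = demand * v * s                     # average number of busy servers
        counts.append(max(1, math.ceil(busy / target_util)))
    return counts
```

For 90 req/s with $V = (1, 2)$, $S = (10, 20)$ ms, and a 50% target, the rule asks for 2 and 8 servers. A queuing model such as MVA would instead pick the smallest allocation whose predicted response time meets the target.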


Dynamic Provisioning at Multiple Time-scales

Predictive provisioning: certain Internet workload patterns can be predicted, e.g., time-of-day effects or increased workload during Thanksgiving. Provision using the model at a time-scale of hours or days

Reactive provisioning: applications may see unpredictable fluctuations, e.g., increased workload to news sites after an earthquake. Detect such anomalies and react fast (within minutes)


Request Policing

Key idea: if the incoming request rate exceeds the current capacity, turn away the excess requests

Why police when you can provision? Provisioning is not instantaneous: residual sessions remain on the reallocated server, and application and OS installation and configuration add overheads, for a total of several (5-30) minutes

[Figure: sentry policing, where the sentry drops excess requests]


Existing Work

Lots of existing work on request policing [Kanodia00, Li00, Verma03, Welsh03, Abdelzaher99, …]

Shortcomings of existing work: it does not attempt to integrate policing and provisioning, and it does not address the scalability of the policer! The policer itself may become the bottleneck during overloads


Policer: Design Goals

Each class should sustain its guaranteed admission rate

Class-based differentiation and revenue maximization: challenging due to the online nature of the problem, since an admitted request may cause a more important request arriving later to be dropped

Approach: preferential admission to higher-class requests

Scalability: the policer should remain operational even under extremely high arrival rates


Overview of Policer Design

Our policer has three components: a request classifier with per-class leaky buckets, class-specific queues, and admission control

[Figure: requests pass through the classifier and leaky buckets into class-specific queues (gold, silver, bronze), which are processed with delays d_gold, d_silver, d_bronze; admission control then admits or drops them]

[PODC 2004, WWW 2005]
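A per-class leaky bucket can be sketched as a rate meter, written here in the equivalent token-bucket form. This is a minimal illustration, not the policer's actual code: the class name, the `burst` parameter, and the rates in the `buckets` table are hypothetical stand-ins for SLA values, and how non-conforming requests are then handled is left open.

```python
import time
from typing import Callable


class LeakyBucket:
    """Per-class rate meter in token-bucket form.

    conforms() returns True while a class stays within its SLA rate;
    credit accumulates at `rate` per second, up to `burst`.
    """

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate      # guaranteed admission rate (req/s)
        self.burst = burst    # bucket depth: requests allowed back-to-back
        self.clock = clock    # injectable for testing
        self.tokens = burst   # start with a full bucket
        self.last = clock()

    def conforms(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical per-class configuration; the rates stand in for SLA values.
buckets = {
    "gold": LeakyBucket(rate=100.0, burst=10.0),
    "silver": LeakyBucket(rate=50.0, burst=5.0),
    "bronze": LeakyBucket(rate=20.0, burst=2.0),
}
```

After classification, the policer would call `buckets[cls].conforms()` for each request. Injecting the clock keeps the meter deterministic under test.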


Class-based Differentiation

Each incoming request undergoes classification. Per-class leaky buckets are used to ensure that the rates guaranteed in the SLA are admitted


Revenue Maximization

Idea: use different delays in processing requests of different classes, so that more important requests are processed more frequently. A methodology computes the delay values in an online manner

This bounds the probability of a request denying admission to a more important request [Appendix B of thesis]


Admission Control

Goal: ensure that an admitted request meets its response time target. A measurement-based admission control algorithm uses information about the current load on the servers and the estimated size of the new request to make its decision


Scalability of Admission Control

Idea #1: reduce the per-request admission control cost, since running admission control on every request may be expensive. Batches get formed anyway: arrivals are bursty during overloads, and requests accumulate in the class queues during the delays used for class-based differentiation. Admission control therefore operates on batches instead of individual requests
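Batch-based admission can be sketched as one pass over a batch, most important class first. The admission test used here (queued work plus the request's estimated size must drain within the response-time target) is a hypothetical stand-in for the measurement-based algorithm, and the request tuple layout and function name are assumptions.

```python
from typing import List, Tuple

# Hypothetical request record: (class rank, estimated service time in seconds);
# a lower rank is a more important class (e.g., 0 = gold).
Request = Tuple[int, float]


def admit_batch(batch: List[Request], outstanding_work: float,
                num_servers: int, target: float) -> Tuple[List[Request], List[Request]]:
    """Decide a whole batch at once instead of one request at a time.

    Assumed test: admit a request if the work already queued on the servers
    plus its own estimated size can drain within the target,
    i.e. (work + size) / num_servers <= target.
    """
    if num_servers < 1:
        raise ValueError("need at least one server")
    admitted: List[Request] = []
    dropped: List[Request] = []
    work = outstanding_work
    # A stable sort keeps arrival order within each class.
    for req in sorted(batch, key=lambda r: r[0]):
        _, size = req
        if (work + size) / num_servers <= target:
            admitted.append(req)
            work += size          # the admitted request adds to the queued work
        else:
            dropped.append(req)
    return admitted, dropped
```

The cost is one sort and one linear scan per batch rather than a separate admission decision per arrival, which is the source of the savings during bursts.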

Idea #2: sacrifice accuracy to reduce computational overhead when batch-based processing becomes prohibitive

Threshold-based scheme, e.g., admit all Gold requests and drop all Silver and Bronze requests, with thresholds chosen based on observed arrival rates and service times
+ Extremely efficient
- A wrong threshold leads to bad response times or fewer admitted requests
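Threshold selection can be sketched as a greedy pass over the classes in priority order. This is an illustrative assumption, not the thesis's exact procedure: it admits whole classes while their cumulative offered load (observed arrival rate × mean service time) fits in a hypothetical usable capacity, for example the number of servers times a target utilization.

```python
from typing import List, Tuple


def choose_threshold(classes: List[Tuple[str, float, float]], capacity: float) -> int:
    """Return how many classes (most important first) to admit wholesale.

    classes:  (name, observed arrival rate in req/s, mean service time in s),
              ordered from most to least important
    capacity: usable server capacity in busy-server units (an assumption)
    """
    load = 0.0
    admitted = 0
    for _, rate, service_time in classes:
        load += rate * service_time     # offered load of this class
        if load > capacity:
            break
        admitted += 1
    return admitted


def threshold_policer(class_rank: int, threshold: int) -> bool:
    """Per-request decision is a single comparison: admit iff rank < threshold."""
    return class_rank < threshold
```

With Gold and Silver each offering 100 req/s at 5 ms and a capacity of 0.9, only Gold fits, so the threshold admits all Gold requests and drops all Silver and Bronze requests. A stale threshold is exactly the failure mode noted above.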


Scaling Even Further

At extremely high arrival rates, protocol processing overheads will saturate the sentry's resources, and indiscriminate dropping of requests will occur: important requests may be turned away without even undergoing the admission control test, causing a loss in revenue! The sentry should still be able to process each arriving request

Idea: dynamic capacity provisioning for the sentry. Pull in an additional sentry if the CPU utilization of the existing sentries exceeds a threshold (e.g., 90%), and use round-robin DNS to balance load among the sentries
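The sentry provisioning rule and the round-robin distribution can be sketched as follows. The averaging rule is an assumption (the slide does not say whether the threshold applies to the average or to every sentry), `itertools.cycle` only mimics what round-robin DNS does across lookups, and the sentry hostnames are hypothetical.

```python
import itertools
from typing import Iterator, List


def sentries_needed(current: int, utilizations: List[float], threshold: float = 0.9) -> int:
    """Pull in one more sentry when the existing sentries are saturated.

    utilizations: recent CPU utilization (0..1) of each active sentry.
    Assumed rule: add a sentry when the average exceeds the threshold.
    """
    if not utilizations:
        raise ValueError("no sentry measurements")
    avg = sum(utilizations) / len(utilizations)
    return current + 1 if avg > threshold else current


def round_robin(sentries: List[str]) -> Iterator[str]:
    """Mimic round-robin DNS: successive lookups rotate through the sentries."""
    return itertools.cycle(sentries)


rr = round_robin(["sentry1.example", "sentry2.example"])   # hypothetical hosts
```

Successive calls to `next(rr)` alternate between the two sentries, which spreads protocol processing evenly as long as clients do not cache the DNS answer for long.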


Class-based Differentiation

[Figure: arrival rate (req/s) vs. time (sec) for Gold, Silver, and Bronze requests]

[Figure: fraction admitted vs. time (sec) for Gold, Silver, and Bronze requests]

With three classes of requests (Gold, Silver, Bronze), the policer succeeds in providing preferential admission to important requests


Threshold-based: Higher Scalability

[Figure: sentry CPU utilization (%) vs. arrival rate (req/s) for batch-based and threshold-based processing]

Threshold-based processing allows the policer to handle up to 4 times higher arrival rates; a single sentry can handle about 19,000 req/s


Threshold-based: Loss of Accuracy

[Figure: admission rate (req/s) vs. time (sec) for Gold, Silver, and Bronze requests]

[Figure: 95th percentile response time (msec) vs. time (sec) for Gold, Silver, and Bronze requests]

Higher scalability comes at the cost of accuracy in admission control, resulting in more violations of response time targets


Talk Outline

Motivation, Thesis contributions, Application modeling, Dynamic provisioning, Scalable request policing, Summary and Future Research


Thesis Contributions

Dynamic resource management in hosting platforms

Shared hosting: statistical multiplexing and under-provisioning [OSDI 2002]; application placement [PDCS 2004]

Dedicated hosting: analytical model for Internet applications [SIGMETRICS 2005]; dynamic provisioning [Autonomic Computing 2005]; scalable request policing [PODC 2004, WWW 2005]


Future Research Directions

Virtual machine based hosting: recent research has shown the feasibility of migrating VMs across nodes, which adds a new dimension to the capacity provisioning problem

Characterizing multi-tier workloads: workloads for standalone Web servers are well characterized, but what are, e.g., typical service times at the Java tier or query processing times? An offshoot of this study: workload generators for multi-tier applications

Automated determination of provisioning parameters: the predictor and reactor are invoked at manually chosen frequencies, and system administrators rely on rules of thumb, which is error-prone


Thanks to …

Advisor: Prashant Shenoy

Thesis committee: Emery Berger, Jim Kurose, Don Towsley, Tilman Wolf

Collaborators: Abhishek Chandra, Pawan Goyal, Giovanni Pacifici, Timothy Roscoe, Arnold Rosenberg, Mike Spreitzer, Asser Tantawi

All my teachers: Paul Cohen, Mani Krishna, Don Towsley

Friends and family


Questions or comments?


Query Caching at the Database

[Figure: average response time (msec) vs. % of queries cached, observed vs. model]

Caching effects are captured by tuning $V_i$ and/or $S_i$

Bulletin-board site RUBBoS, 50 sessions

SELECT SQL_NO_CACHE causes MySQL not to cache the response to a query


Agile Switching Using Virtual Machine Monitors

Use VMMs to enable fast switching of servers; switching time is limited only by residual sessions

[Figure: two servers, each running a VMM that hosts virtual machines VM1, VM2, VM3 in active or dormant states]

VMMs allow multiple “virtual” machines on a server, e.g., Xen, VMware, …


Prototype Data Center

40+ Linux servers, Gigabit switches, multi-tier applications

Applications: auction (RUBiS) and bulletin-board (RUBBoS), using Apache and JBoss (replicable) and a MySQL database

[Figure: the control plane (application placement, dynamic provisioning) manages server nodes; each node runs the OS, a nucleus (resource monitoring, parameter estimation), and application capsules; sentries]


Sentry Provisioning

[Figure: sentry CPU utilization (%) vs. time (sec)]

[Figure: arrival rate (req/s) vs. time (sec): total arrival rate and arrival rate at sentry 1]


System Overview

Control plane: centralized resource manager

Nucleus: per-server measurements and resource management

Sentry: per-application admission control

Capsule: component of an application running on a server

[Figure: the control plane (application placement, dynamic provisioning) manages server nodes; each node runs the OS, a nucleus (resource monitoring, parameter estimation), and application capsules; sentries]


Existing Application Models

Models for Web servers [Chandra03, Doyle03]: do not model the Java server, database, etc.

Black-box models [Kamra04, Ranjan02]: unaware of the bottleneck tier

Extensions of single-tier models [Welsh03]: fail to capture interactions between tiers

Existing models are inadequate for multi-tier Internet applications


Existing Work

Predictable resource management within a single server: proportional-share schedulers for CPU and network [Duda, Goyal, Waldspurger] and for multi-processors [Chandra]; memory management [Berger, Waldspurger]; disk scheduling [Shenoy]

Hosting platforms and Internet applications: Rice, Duke, Penn State: shared platforms for Web servers; IBM, HP Labs: shared platforms, workload prediction; Berkeley: novel architecture for Internet applications

Main shortcomings: possible statistical multiplexing gains in shared platforms are unexplored; most work assumes simplistic applications (e.g., only Web servers); provisioning is either purely reactive or purely predictive; handling of extreme overloads is not addressed satisfactorily


Predictive Provisioning

[Figure: the Monitor collects workload measurements from the servers; Predictors turn the observed workload into a predicted workload; Application Models map it to resource requirements; the Allocator issues server allocations]


Reactive Provisioning

Idea: react to current conditions. Reactive provisioning is useful for capturing significant short-term fluctuations and can correct errors in predictions

Track the error between long-term predictions and the actual workload, and allocate additional servers if the error exceeds a threshold; the reactor can also be invoked if the request drop rate exceeds a threshold

Operates over a time scale of a few minutes. Purely reactive provisioning lags the workload: reactive plus predictive provisioning is more effective!

[Figure: time series of predicted and actual workload; when the prediction error exceeds a threshold, the reactor is invoked to allocate servers]
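The reactor's trigger can be sketched as a relative-error check on top of the predictive allocation. This is an illustration under stated assumptions: the relative error measure, the 20% default threshold, the per-server capacity `per_server_rate`, and the function name are all hypothetical, and the alternative drop-rate trigger is not shown.

```python
import math


def reactive_extra_servers(predicted: float, actual: float, per_server_rate: float,
                           error_threshold: float = 0.2) -> int:
    """Extra servers to allocate when the observed rate outruns the prediction.

    predicted, actual: long-term predicted and currently observed arrival rates (req/s)
    per_server_rate:   request rate one server can sustain (assumed known)
    error_threshold:   relative prediction error that triggers the reactor
    """
    if predicted <= 0 or per_server_rate <= 0:
        raise ValueError("rates must be positive")
    error = (actual - predicted) / predicted
    if error <= error_threshold:
        return 0   # prediction is good enough; keep the predictive allocation
    # Cover only the unforeseen excess on top of the predicted allocation.
    return math.ceil((actual - predicted) / per_server_rate)
```

With a prediction of 100 req/s and servers that sustain 25 req/s each, an observed 110 req/s leaves the allocation unchanged, while 160 req/s triggers the reactor to add 3 servers.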


Dynamic Capacity Provisioning

[Figure: three panels vs. time (min): workload (arrivals per min); server allocations (number of Web servers and app servers); response time (msec)]

Auction application RUBiS, with a factor of 4 increase in workload within 30 min

Server allocations increased to match the increased workload, keeping response time below 2 seconds
