22
How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim [email protected] [email protected] [email protected] Department of Electrical and Computer Engineering Carnegie Mellon University LISA 2010 - November 11, 2010

How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim [email protected] [email protected]

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

How to Tame your VM: an Automated Control System for Virtualized Services

Akkarit Sangpetch Andrew Turner Hyong Kim [email protected] [email protected] [email protected]

Department of Electrical and Computer Engineering

Carnegie Mellon University

LISA 2010 - November 11, 2010

Page 2: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Virtualized Services

• Virtualized Infrastructure (vSphere,Hyper-V,ESX,KVM)

– Easy to deploy

– Easy to migrate

– Easy to re-use virtual machines

• Multi-tier services are configured and deployed as virtual machines

2

DB2

App1

Hypervisor

App2

Web1

Hypervisor

Web2

DB1

Hypervisor

Db3 Web3 App3

Page 3: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Problem: What about performance?

• VMs share physical resource (CPU time / Disk / Network / Memory bandwidth)

• Need to maintain service quality

– Response time / Latency

• Existing infrastructure provides mismatch interface between the service’s performance goal (latency) and configurable parameters (resource shares) – Admins need to fully understand the

services before able to tune the system

3

DB2

App1

Hypervisor

App2

Web1

Hypervisor

Web2

DB1

Hypervisor

Db3 Web3 App3

Page 4: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Existing Solutions – Managing Service Performance

• Resource Provisioning

– Admins define resource shares / reservations / limits

– The “correct” values depends on hardware / applications / workloads

• Load Balancing

– Migrate VMs to balance host utilization

– Free up capacity for each host (low utilization ~ better performance)

4

vm2

vm1 vm3

Host 1

vm5

Host 2

vm4

vm4

vm1

vm2

vm3

vm4

50

10

30

10

# Shares

Page 5: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Automated Resource Control

• Dynamically adjust resource shares during run-time

• Input = required service response time – I want to serve pages from

vm1 in less than 4 seconds

• The system calculates number of shares required to maintain service level – Adapt to changes in workload

– Admins do not have to guess the number of shares

5

vm1

vm2

vm3

vm4

25

25

25

vm1

vm2

vm3

vm4

50

10

10

30

25

Control system

vm1 target vm1 response

vm2 target vm2 response

vm3 target vm3 response

vm4 target vm4 response

vm1 target vm1 response

vm2 target vm2 response

vm3 target vm3 response

vm4 target vm4 response

Page 6: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Control System

Controller

Modeler

VM

VM

Sensor

Actuator

VM

VM

Sensor

Actuator

KVM Host

VM

VM

Sensor

Actuator

6

KVM Host KVM Host

Record service response time

Adjust host CPU shares

Calculate model parameters

Calculate # of shares required

Page 7: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Sensor unit

Server VM

Sensor VM

Host Bridge

Request Packets

xtables TEE

tshark

To Modeler analyzer

{service:blog, tier:web, vm: www1, response time 250 ms}, {service:blog, tier:db, vm:db1, response time 2.5 ms}

7

• Record service response time • Starts when the server sees

“request” packet • Until the server sends

“response” packet • Based on libpcap / tshark – need

to recognize the application’s header

Request Packets

Request Packets

Page 8: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Actuator Unit

Actuator Agent

Hypervisor

Shares Allocation

8

CPU Cgroup

Server 2 VM

• Adjust number of shares based on controller’s allocation

• We use Linux/KVM as our hypervisor • Cgroup cpu.shares –

control CFS scheduling shares

• Similar mechanism exists on other platform (Xen weight / ESX shares)

Server 1 VM

50

{server1:80, server2:20 }

80

50

20

Page 9: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Modeler

Web server Database

Client

TDB

Tweb

9

Tweb = f (TDB ) TDB = f (Scpu)

• The modeler calculates the expected service response time based on shares allocated to each VM

• We use an intuitive model for our 2-tier services in the experiment

Scpu

Page 10: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Resource Shares Effect

Host 1 Host 2

Web server

VM (Wordpress)

SQL Server VM

(mysql)

Cruncher VM

(100% cpu)

NAS

Test Clients

10

Page 11: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

SQL Server CPU Allocation effect

0

50

100

150

200

250

300

350

400

450

500

0

100

200

300

400

500

600

700

800

1 101 201 301 401 501 601 701 801 901

# o

f D

B C

PU

sh

are

s (o

ut

of

10

00

)

We

b R

esp

on

se T

ime

(m

s)

Time step (5s)

Web Response

thttp-msec cpu

0

50

100

150

200

250

300

350

400

450

500

0

5

10

15

20

25

30

35

40

1 101 201 301 401 501 601 701 801 901

# o

f D

B C

PU

sh

are

s (o

ut

of

10

00

)

Daa

bas

e R

esp

on

se T

ime

(m

s)

Time step (5s)

Database Response

tdb-msec cpu

11

Page 12: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Database Response vs CPU

12

0

2

4

6

8

10

12

14

16

18

20

0 50 100 150 200 250 300 350 400 450 500

Dat

abas

e R

esp

on

se T

ime

(m

s)

# of CPU Shares allocated to SQL VM (out of 1000)

TDB = a0 (Scpu)b0

Page 13: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Database Response vs CPU (controlled)

13

0

5

10

15

20

25

30

0 100 200 300 400 500 600 700 800 900 1000

Dat

abas

e R

esp

on

se T

ime

(m

s)

# of CPU Shares allocated to SQL server VM (out of 1000)

Page 14: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

HTTP vs Database response

0

100

200

300

400

500

600

700

800

0 2 4 6 8 10 12 14 16 18 20

We

b R

esp

on

se (

ms)

Database Response (ms)

14

Tweb = a1 (TDB ) + b1

Page 15: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Controllers

• Model uses readings to estimate the service parameters <a0,b0,a1,b1> – TDB = a0 (Scpu)b0

– Tweb = a1 (TDB ) + b1

• Controller finds the minimal Scpu such that Tweb < specified response time – Long-term control: uses moving average to find model

parameters & shares – Short-term control: uses the last-period reading to

find model parameters • Avoid excessive response time violation while waiting for the

long-term model to settle

15

Page 16: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

System Evaluations

Host 1 Host 2

NAS

Test Clients

Host 3

Web server 1 VM DB Server 1

VM

Web server 2 VM

DB server 2 VM

Actuator Sensor Sensor Sensor

16

Page 17: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Static Workload

0

1000

2000

3000

4000

5000

6000

7000

8000

1 11 21 31 41 51 61 71 81 91 101 111 121 131 141 151 161 171 181 191

web1 web1-target web2 web2-target

17

Serv

ice

Re

spo

nse

Tim

e (

ms)

Se

rvic

e R

esp

on

se T

ime

(m

s)

0

1000

2000

3000

4000

5000

6000

7000

8000

1 11 21 31 41 51 61 71 81 91 101 111 121 131 141 151 161 171 181 191

web1 web1-target web2 web2-target

Page 18: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Dynamic Workload

18

0

1000

2000

3000

4000

5000

6000

7000

1 101 201 301 401 501 601 701 801 901 1001 1101 1201 1301 1401 1501 1601

web1 web1-target web2 web2-target 50 per. Mov. Avg. (web1) 50 per. Mov. Avg. (web2)

Web1 Load

0

1000

2000

3000

4000

5000

6000

7000

1 101 201 301 401 501 601 701 801 901 1001 1101 1201 1301 1401 1501 1601

web1 web1-target web2 web2-target 50 per. Mov. Avg. (web1) 50 per. Mov. Avg. (web2)

Serv

ice

Re

spo

nse

Tim

e (

ms)

Se

rvic

e R

esp

on

se T

ime

(m

s)

Page 19: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Deviation from Target

Without Control With Control

Mean Deviation (ms)

Target Violation Period

Mean Deviation (ms)

Target Violation Period

Static Load

Instance 1 540 0% 109 (-80%) 9%

Instance 2 1043 100% 282 (-73%) 26%

Dynamic Load

Instance 1 276 7% 182 (-34%) 9%

Instance 2 354 27% 336 (-5%) 12%

19

Our control system helps track the target service response time

Page 20: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Further Improvements

• Enhance Service Model

– Service dependencies

• Cache / Proxy / Load balancer effects

– Non-cpu resource (disk / network / memory)

• Robust Control system

– Still susceptible to temporal system effects (garbage collections / cron jobs)

– Need to determine feasibility of the target performance

20

Page 21: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Conclusions

• Automated Resource Control system

– Maintain target system’s response time

– Allows admin to express allocation in term of performance target

• Managing service performance could be automate with sufficient domain knowledge

– Need basic framework to describe performance model

– Leave the details to the machines (parameter fitting / optimization / resource monitoring)

21

Page 22: How to Tame your VM - USENIX · How to Tame your VM: an Automated Control System for Virtualized Services Akkarit Sangpetch Andrew Turner Hyong Kim asangpet@andrew.cmu.edu andrewtu@andrew.cmu.edu

Sample Response & Share values

22

0

100

200

300

400

500

600

700

1 51 101 151 201 251 301 351

tWeb

p95

target

0

100

200

300

400

500

1 51 101 151 201 251 301 351

shares

share