David Oppenheimer

UCB ROC Retreat, 12 January 2005

A case for resource discovery in shared distributed platforms

Introduction

• Application performance is a function of
  1. resources available to the application
  2. resources needed by the application
     - or, “application sensitivity to resource constraints”
• At the summer retreat, described SWORD: at app deployment time, find the best set of nodes given
  1. resources available on a set of distributed nodes
  2. application sensitivity to resource constraints
• Assumptions
  1. available resources vary among nodes enough to matter
     - spare CPU, mem, disk space; inter-node latency, avail. bw; ...
  2. applications are sensitive to resource constraints enough to matter
• Focus of this talk: verify assumption (1)

Introduction (cont.)

• Questions we will address
  - is there enough variation among nodes at any given (deployment) time to justify service placement?
  - is there enough variation over time on a single node to justify periodic task migration?
  - are there correlations between attributes on a single node, or among nodes at the same site?

• All of these questions are important in designing a system for resource discovery and service placement (like SWORD)

Outline

1. How much does the available amount of per-node resources vary among nodes at a fixed time?

2. How much does the available amount of per-node resources vary over time? How much do inter-node latency and available bandwidth vary over time?

3. On a given node, are any per-node attributes strongly correlated? Are inter-node latency and available bandwidth correlated?

Experimental environment

• Per-node attributes: Ganglia, CoMon
  - two-week period (Oct 10-Oct 24, 2004)
  - each node polled every 5 minutes
  - free memory, free swap, free disk, load average, network bytes sent and received/sec, # active slices
• Inter-node latency: all-pairs pings
  - one-month period ending Oct 24, 2004
  - each pair of nodes measured every 15 minutes
• Inter-node bandwidth: Iperf
  - one-month period ending Oct 24, 2004
  - each pair of nodes measured 1-2x/week

• About 250 nodes in the trace each day

Resource heterogeneity: averages

• How much do available resources vary over the trace?

attribute            mean    std. dev.   10th %ile   90th %ile
# of CPUs            1.0     0.0         1.0         1.0
CPU speed (MHz)      1942    572         1263        2652
Total disk (GB)      127     88.5        35.1        232
Total memory (MB)    1153    467         628         2017
Total swap (GB)      1.0     0.0         1.0         1.0

Resource heterogeneity: averages

• How much do available resources vary over the trace?

attribute            mean      std. dev.   10th %ile   90th %ile
1 min load average   6.81      20.06       1.05        11.86
Free memory (MB)     62.359    125.234     13.668      105.432
Free swap (MB)       755.596   178.795     524.336     1000.268
Free disk (GB)       102.8     86.04       8.088       208.3
Active slices        13.3      5.96        0.0         20.0
Bytes/s in           50477     117023      5568        92877
Bytes/s out          52543     130112      5476        96214

Resource heterogeneity: CV vs. time

[Figure: coefficient of variation (y-axis, 0-6) vs. time (x-axis) for aslices, bytes_in, bytes_out, disk_free, load_one, mem_free, and swap_free]
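The coefficient of variation plotted on this slide is the standard deviation divided by the mean. A minimal sketch of how it could be computed across nodes at each polling time follows; the tuple layout, the node names, the sample readings, and the use of the population standard deviation are all assumptions for illustration, not details taken from the trace.

```python
import math
from collections import defaultdict

def cv_by_time(samples):
    """Map each timestamp to the coefficient of variation (std. dev. / mean)
    of one attribute across all nodes reporting at that timestamp.

    `samples` is an iterable of (timestamp, node, value) tuples.
    Timestamps whose mean is zero are skipped, since CV is undefined there.
    """
    by_time = defaultdict(list)
    for timestamp, _node, value in samples:
        by_time[timestamp].append(value)

    result = {}
    for timestamp, values in sorted(by_time.items()):
        mean = sum(values) / len(values)
        if mean == 0:
            continue
        # Population variance: an assumption, the slides do not say which form was used.
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        result[timestamp] = math.sqrt(variance) / mean
    return result

# Hypothetical load_one readings for three nodes at two polling times (seconds).
readings = [
    (0, "node-a", 1.0), (0, "node-b", 1.0), (0, "node-c", 1.0),
    (300, "node-a", 1.0), (300, "node-b", 2.0), (300, "node-c", 3.0),
]
cv = cv_by_time(readings)
```

Because CV is dimensionless, attributes measured in different units (MB, bytes/s, slice counts) can share one y-axis, which is presumably why the chart uses it.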

Variability of per-node attributes over time

Average magnitude of % difference from initial value, 30 minute windows
[Figure: CDF; x-axis: average magnitude of % difference from initial value (0-100); y-axis: fraction of nodes (0-1); curves: disk_free, swap_free, aslices, mem_free, load_one, bytes_out, bytes_in]
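The slides name the metric "average magnitude of % difference from initial value" without defining it. One plausible reading, sketched here purely as an assumption, is to split a node's readings into consecutive fixed-length windows, compare every later reading in a window against the window's first reading, and average the absolute percentage differences. The function name, parameters, and sample readings are hypothetical.

```python
def avg_pct_diff_from_initial(series, window_len):
    """Average magnitude of % difference from the first value of each window.

    `series` is one node's readings at a fixed polling interval and
    `window_len` is the number of readings per window (for example 6 readings
    per 30-minute window at 5-minute polling). Incomplete trailing windows and
    windows that start at zero are skipped. Returns None if nothing is usable.
    """
    diffs = []
    for start in range(0, len(series) - window_len + 1, window_len):
        window = series[start:start + window_len]
        initial = window[0]
        if initial == 0:
            continue  # percentage difference is undefined for a zero baseline
        for value in window[1:]:
            diffs.append(abs(value - initial) / abs(initial) * 100.0)
    if not diffs:
        return None
    return sum(diffs) / len(diffs)

# Hypothetical readings: two windows of three samples each.
readings = [100.0, 110.0, 90.0, 50.0, 50.0, 75.0]
variability = avg_pct_diff_from_initial(readings, 3)
```

Computing this value once per node and plotting the distribution over nodes would give a CDF of the kind shown on these slides.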

Variability of per-node attributes over time

Average magnitude of % difference from initial value, 1 hour windows
[Figure: CDF; x-axis: average magnitude of % difference from initial value (0-100); y-axis: fraction of nodes (0-1); curves: disk_free, swap_free, aslices, mem_free, load_one, bytes_out, bytes_in]

Variability of per-node attributes over time

Average magnitude of % difference from initial value, 4 hour windows
[Figure: CDF; x-axis: average magnitude of % difference from initial value (0-100); y-axis: fraction of nodes (0-1); curves: disk_free, swap_free, load_one, mem_free, aslices, bytes_out, bytes_in]

Variability of per-node attributes over time

• Can rank degree of variability of each attribute
  - disk, swap < mem, load < net bytes; #slices moderate to significant
• CDF curve shifts to the right as interval length increases
  - attributes vary less over short time periods than long
  - migration interval: find “sweet spot” in curve of variability vs. interval length
• CDF slope decreases as median variability of attribute increases
  - may be able to classify nodes as high/low variability over time for mem, load, net bytes (they have high median variability)
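To make the "CDF shifts right" and "sweet spot" observations concrete, the sketch below builds an empirical CDF over per-node variability values and tracks the median as the window length grows. All numbers are invented for illustration, and the dictionary keyed by window length in minutes is an assumed layout, not the format of the actual analysis.

```python
from statistics import median

def empirical_cdf(values):
    """Return sorted (value, fraction of nodes at or below that value) points."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]

# Hypothetical per-node variability (%), keyed by window length in minutes.
variability_by_window = {
    30: [2.0, 4.0, 6.0, 8.0],
    60: [3.0, 6.0, 9.0, 12.0],
    240: [5.0, 10.0, 20.0, 40.0],
}

# Median variability per window length: a rising median corresponds to the
# CDF curve moving to the right as the interval gets longer.
medians = {w: median(v) for w, v in variability_by_window.items()}

cdf_30 = empirical_cdf(variability_by_window[30])

# Split nodes into low/high variability around the median for one window length.
threshold = medians[30]
high_variability = [v for v in variability_by_window[30] if v > threshold]
```

Plotting `medians` against window length is one way to look for an interval beyond which variability rises quickly, which would be a candidate migration interval.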

[Figures: CDFs of average magnitude of % difference from initial value for 30 minute, 1 hour, and 4 hour windows]

Inter-node latency and BW variation over time

Average magnitude of % difference from initial value for latency and bandwidth
[Figure: CDF; x-axis: average magnitude of % difference from initial value (0-80); y-axis: fraction of node pairs (0-1); curves: Latency, Bandwidth]

• Most node pairs have low latency (and bw) variability even over a month-long trace
  - migration may not be worthwhile

Correlation among per-node attributes

• No strong correlations between different attrs.
  - though some one-hour trace segments had some
• Some correlation between nodes at same site

r           loadone    memfree    swapfree   diskfree   actvslice  byte_in    byte_out
loadone      .080
memfree     -.050       .627
swapfree    -.231       .274       .473
diskfree    -.035       .192       .212       .929
actvslice    .079      -.050      -.219       .049       .773
byte_in      .059      -.033      -.074       .059       .140       .209
byte_out     .058      -.033      -.059       .078       .137       .443       .188
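For readers who want to reproduce the off-diagonal entries of a table like this one, here is a minimal sketch of the Pearson correlation coefficient between pairs of attributes on a single node. The attribute series are invented, and the slides do not say how the original values were aggregated across nodes or time, so this covers only the basic per-pair calculation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns None when either sequence is constant, since r is undefined there.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def pairwise_r(attrs):
    """Return {(a, b): r} for every unordered pair of distinct attributes."""
    names = list(attrs)
    return {
        (a, b): pearson_r(attrs[a], attrs[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Hypothetical readings from one node (not taken from the trace).
trace = {
    "load_one": [1.0, 2.0, 3.0, 4.0],
    "mem_free": [80.0, 60.0, 40.0, 20.0],
    "bytes_in": [10.0, 20.0, 30.0, 40.0],
}
r = pairwise_r(trace)
```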

Correlation between latency and avail BW

[Figure: scatter plot of bandwidth (kb/s, 0-12000) vs. latency (ms, 0-200); r=-.59]

• Moderate inverse power law correlation
  - using latency to estimate BW gives 233% error
  - some nodes are bandwidth-capped, some in weird ways
• Some node pairs showed strong lat-BW correlation
  - 17% within 25%, 56% within 50%
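An inverse power law relationship has the form bw = a * latency^b with b < 0. The slides do not describe the fitting or error method actually used, so the following is only a sketch of one common approach: ordinary least squares on the logarithms of both quantities, followed by a mean absolute percentage error of the resulting estimates. The data points are invented and chosen to lie exactly on a power law.

```python
import math

def fit_power_law(latencies_ms, bandwidths_kbps):
    """Least-squares fit of bw = a * latency**b in log-log space.

    All inputs must be strictly positive. Returns (a, b).
    """
    xs = [math.log(lat) for lat in latencies_ms]
    ys = [math.log(bw) for bw in bandwidths_kbps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log-log space = log(a)
    return a, b

def mean_abs_pct_error(a, b, latencies_ms, bandwidths_kbps):
    """Mean absolute % error of the bandwidth predicted from latency."""
    errors = [
        abs(a * lat ** b - bw) / bw * 100.0
        for lat, bw in zip(latencies_ms, bandwidths_kbps)
    ]
    return sum(errors) / len(errors)

# Hypothetical node pairs lying exactly on bw = 8000 / latency.
latencies = [10.0, 20.0, 40.0, 80.0]
bandwidths = [800.0, 400.0, 200.0, 100.0]
a, b = fit_power_law(latencies, bandwidths)
error = mean_abs_pct_error(a, b, latencies, bandwidths)
```

On real measurements, bandwidth-capped nodes would sit well below the fitted curve, which is one way a fit with moderate correlation can still produce large estimation errors.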

Conclusion

1. How much does the available amount of per-node resources vary among nodes at a fixed time?
   - significantly; enough to warrant svc. placement
2. How much does the available amount of per-node resources vary over time? How much do inter-node latency and available bandwidth vary over time?
   - moderate variability; may warrant migration
3. On a given node, are any per-node attributes strongly correlated? Are inter-node latency and available bandwidth correlated?
   - no strong correlation between diff. attrs.
   - some correlation between same attr, same site
   - latency can predict avail. bandwidth

Future work

• Ask same questions but use application model to answer, rather than analysis of raw data
  - different apps have different resource sensitivities
  - different apps have different migration costs
• Can we predict attribute values?
  - give warning before migration, or just don’t bother to deploy on “bad” nodes

• How much “better” could we do if SWORD could schedule jobs?
