Tuan V. Dinh, Lachlan Andrew and Yoni Nazarathy Modelling a supercomputer with the model Australia and New Zealand Applied Probability Workshop

Page 1: Modelling a supercomputer with the                 model

Tuan V. Dinh, Lachlan Andrew and Yoni Nazarathy

Modelling a supercomputer with the model

Australia and New Zealand Applied Probability Workshop

Page 2: Modelling a supercomputer with the                 model

http://caia.swin.edu.au/cv/tdinh 10 July 2013 Slide 2 Australia and New Zealand Applied Probability Workshop

Supercomputer clusters

large scale simulation: climate, genome, astronomy, etc.

foundation of cloud computing

BIG DATA

EXASCALE COMPUTING

MORE COMPUTING POWER DESIRED

Electricity bills

Heat – thermal management

Investment – cooling systems, hardware, etc.

Page 3: Modelling a supercomputer with the                 model

Power proportionality

[Figure: power vs. load for a single server (1), comparing the ideal curve with reality; in reality the curve starts at about 60% of peak power.]

(1) Barroso and Hölzle, “The Case for Energy-Proportional Computing”, 2007.

idle server ~ 60% peak power

turn off idle servers

challenges: switching cost (setup, wear-and-tear), performance impacts?

Swinburne Supercomputer

Page 4: Modelling a supercomputer with the                 model

An energy saving framework

CONTROL FRAMEWORK

system congestion model

number of active servers needed?

historical implications?

ongoing system states?

arrival characteristics?

job elapsed times?

Objective: min (energy + switching + performance penalty)

Page 5: Modelling a supercomputer with the                 model

Congestion model

[Control framework diagram as on Page 4.]

Page 6: Modelling a supercomputer with the                 model

Congestion model -

batch Poisson arrivals, with a rate function

batch size distribution, with a c.d.f.

i.i.d. service times

WHY?

jobs arrive in a “batch” manner, i.e. within seconds, from the same user

system mostly under-utilized, hence the infinite-server approximation

substantial daily variations
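The three ingredients above (batch Poisson arrivals with a rate function, a batch size distribution, i.i.d. service times, infinite servers) can be sketched as a small simulation. This is only an illustration: the sinusoidal daily rate, the uniform batch sizes, the exponential service times and all function names are assumptions, not values from the talk. Batch epochs are generated by thinning a homogeneous Poisson process.

```python
import math
import random

def simulate_jobs(rate, rate_max, horizon, draw_batch, draw_service, rng):
    """Return (arrival, departure) pairs for an infinite-server system.

    Batches arrive as a non-homogeneous Poisson process with intensity
    rate(t) <= rate_max, generated by thinning. Each batch brings
    draw_batch(rng) jobs; with infinite servers every job starts service
    immediately and stays for draw_service(rng) time units.
    """
    jobs = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_max)            # candidate batch epoch
        if t > horizon:
            return jobs
        if rng.random() <= rate(t) / rate_max:    # keep with prob rate(t)/rate_max
            for _ in range(draw_batch(rng)):
                jobs.append((t, t + draw_service(rng)))

def jobs_present(jobs, t):
    """Number of jobs in the system at time t."""
    return sum(1 for arrival, departure in jobs if arrival <= t < departure)

def daily_rate(t):
    """Hypothetical rate function: batches per hour with a 24-hour period."""
    return 2.0 + 1.5 * math.sin(2 * math.pi * t / 24.0)

rng = random.Random(1)
jobs = simulate_jobs(
    rate=daily_rate, rate_max=3.5, horizon=72.0,
    draw_batch=lambda r: r.randint(1, 8),            # assumed batch sizes 1..8
    draw_service=lambda r: r.expovariate(1 / 3.0),   # assumed mean service 3 h
    rng=rng,
)
load_at_48h = jobs_present(jobs, 48.0)
```

Because every job is served immediately, the number of jobs present at a time is the load that the later slides try to predict.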

Page 7: Modelling a supercomputer with the                 model

Discrete-time cost

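[Timeline: time axis marked at t, t+k and T+t; the current running jobs are shown at t.]

At t+k, the jobs in the system are:

{jobs arriving in (t, t+k], still around at t+k}

{jobs arriving before t, still around at t+k}

C(k) = C1(k) + C2(k) + C3(k)

C1(k): energy, the n(k) term

C2(k): switching, the |n(k) - n(k-1)| term

C3(k): performance penalty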
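A minimal sketch of the per-slot cost, under stated assumptions: the weights are hypothetical, and the performance penalty is taken to be proportional to the shortfall of active servers below the load, which is one plausible form and not necessarily the one used in the talk.

```python
def slot_cost(n_now, n_prev, load, w_energy=1.0, w_switch=0.5, w_penalty=4.0):
    """Cost of one time slot with n_now active servers.

    energy    : proportional to the number of active servers n(k)
    switching : proportional to |n(k) - n(k-1)|
    penalty   : ASSUMED proportional to unmet load max(load - n(k), 0)
    All three weights are illustrative placeholders.
    """
    energy = w_energy * n_now
    switching = w_switch * abs(n_now - n_prev)
    penalty = w_penalty * max(load - n_now, 0)
    return energy + switching + penalty

# 10 servers now, 8 in the previous slot, 12 jobs wanting a server.
example = slot_cost(n_now=10, n_prev=8, load=12)
```

With these placeholder weights the example evaluates to 10 (energy) + 1 (switching) + 8 (penalty).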

Page 8: Modelling a supercomputer with the                 model

Optimization formulation

C(k) = C1(k) + C2(k) + C3(k)

C1(k): energy, the n(k) term

C2(k): switching, the |n(k) - n(k-1)| term

C3(k): performance penalty

(*)

Solving (*) requires estimating the load far into the future.

The system can feed back the ACTUAL load U(s) for s < k.

Page 9: Modelling a supercomputer with the                 model

A Model Predictive Control framework

[Control framework diagram as on Page 4, with an MPC block added.]

Page 10: Modelling a supercomputer with the                 model

Model Predictive Control execution

[Timeline: at time t the look-ahead window of length T runs to T+t; at time t+1 it runs to T+t+1.]

At t: solve (**), obtain {n*(0), n*(1), …}. ONLY “execute” n*(0).

At t+1: solve (**), obtain {n*(0), n*(1), …}. ONLY “execute” n*(0).

(**)

Limited look-ahead

1. Less sensitive to load estimation accuracy

2. Use “on-going” information: know how many jobs actually arrived in (t, t+1]
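The receding-horizon idea (plan over a limited window, execute only n*(0), then re-plan with what has actually been observed) can be sketched as follows. This is an illustration under assumptions: the planner is a small dynamic program over integer server counts standing in for the LP, the cost weights and shortfall penalty are hypothetical, and `persistence` is a deliberately naive forecaster that is not part of the talk.

```python
def plan(n_prev, forecast, n_max, w_energy=1.0, w_switch=0.5, w_penalty=4.0):
    """Cost-minimising server counts n*(0..K-1) for a load forecast.

    Dynamic programming over integer server counts 0..n_max; a stand-in
    for the LP. Weights and the shortfall penalty are hypothetical.
    """
    levels = range(n_max + 1)
    cost_to_go = [0.0] * (n_max + 1)   # cost of later slots, indexed by current n
    choices = []                       # per slot: best n(k) for each n(k-1)
    for load in reversed(forecast):
        new_cost, choice = [], []
        for prev in levels:
            best_n, best_c = 0, float("inf")
            for n in levels:
                c = (w_energy * n + w_switch * abs(n - prev)
                     + w_penalty * max(load - n, 0) + cost_to_go[n])
                if c < best_c:
                    best_n, best_c = n, c
            new_cost.append(best_c)
            choice.append(best_n)
        cost_to_go = new_cost
        choices.append(choice)
    choices.reverse()
    sequence, prev = [], n_prev
    for choice in choices:
        prev = choice[prev]
        sequence.append(prev)
    return sequence

def run_mpc(actual_load, forecaster, horizon, n_max, n_start=0):
    """At each slot: forecast, plan over the horizon, execute only n*(0)."""
    executed, n_prev = [], n_start
    for t in range(len(actual_load)):
        # The forecaster only sees loads observed before slot t.
        forecast = forecaster(t, actual_load[:t], horizon)
        n_prev = plan(n_prev, forecast, n_max)[0]
        executed.append(n_prev)
    return executed

def persistence(t, observed, horizon):
    """Naive forecaster: repeat the last observed load (0 if none yet)."""
    last = observed[-1] if observed else 0
    return [last] * horizon

executed = run_mpc([2, 2, 5, 5, 3], persistence, horizon=3, n_max=8)
```

Re-planning at every slot is what lets the controller use the "on-going" information: each call to the forecaster receives the loads that actually occurred, rather than relying on one long-range estimate.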

Page 11: Modelling a supercomputer with the                 model

Solving the optimization problem

C(k) = C1(k) + C2(k) + C3(k), k = 0, 1, …, K-1

C1(k): energy, n(k); C2(k): switching, |n(k) - n(k-1)|; C3(k): performance penalty

Normal approximation

{ n(k) + |u(k)| } (***)

subject to constraints for k = 0, 1, …, K-1

solved numerically using LP
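One way a normal approximation can turn a probabilistic performance requirement into a linear bound on n(k) is sketched below. The constraint form is an assumption, since the talk's own constraints are not recoverable here: the load in a slot is approximated as normal with a given mean and variance, and n is chosen so that the load exceeds it with probability at most `overflow_prob`, a hypothetical parameter that is not necessarily the ε used later in the talk.

```python
import math
from statistics import NormalDist

def servers_needed(mean_load, var_load, overflow_prob):
    """Smallest integer n with P(load > n) <= overflow_prob.

    Uses a normal approximation of the load: n >= mean + z * std, where z
    is the (1 - overflow_prob) quantile of the standard normal.
    """
    if var_load == 0:
        return max(0, math.ceil(mean_load))
    z = NormalDist().inv_cdf(1.0 - overflow_prob)
    return max(0, math.ceil(mean_load + z * math.sqrt(var_load)))

# Mean load 100 jobs, variance 25, at most 2.5% chance of a shortfall.
n_min = servers_needed(100.0, 25.0, 0.025)
```

A bound of this kind is linear in n(k), and |u(k)| can be linearised by splitting u(k) into non-negative increase and decrease parts, which is what makes an LP formulation possible.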

Page 12: Modelling a supercomputer with the                 model

X(k): new arrivals

X(k) = {jobs arriving in (t, t+k], still around at t+k}

[Carrillo,89]: X(k) is a compound Poisson RV, with a batch rate that depends on s = (k+1/2)Δ, where Δ is the slot-time.

even if the arrival process is NOT Poisson [Whitt,99].

N ~ Poisson( )

b_i: i.i.d. batch size, with mean and variance
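For a compound Poisson sum of N i.i.d. batch sizes with N Poisson distributed, the mean and variance follow from the Poisson parameter and the first two moments of the batch size. A short sketch, where the function and parameter names are illustrative and `poisson_mean` stands for the parameter of N:

```python
def compound_poisson_moments(poisson_mean, batch_mean, batch_var):
    """Mean and variance of b_1 + ... + b_N with N ~ Poisson(poisson_mean).

    E[X]   = poisson_mean * E[b]
    Var[X] = poisson_mean * E[b^2] = poisson_mean * (Var[b] + E[b]^2)
    """
    mean = poisson_mean * batch_mean
    var = poisson_mean * (batch_var + batch_mean ** 2)
    return mean, var

# Illustrative numbers: 4 batches expected, batch size mean 3 and variance 2.
mean_x, var_x = compound_poisson_moments(4.0, 3.0, 2.0)
```

Only the mean and variance of the batch size enter, so the full batch size distribution is not needed for a normal approximation of X(k).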

Page 13: Modelling a supercomputer with the                 model

U(k): existing jobs

U(k) = {jobs arriving before t, still around at t+k}

[Carrillo,91]: U(k) is a binomial RV, with parameters that depend on s = (k+1/2)Δ, where Δ is the slot-time.

Hence:

one can use job elapsed runtimes to calculate

[Whitt,99]
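How elapsed runtimes can enter the estimate of U(k) is sketched below, as an assumption-laden illustration rather than the talk's formula. A job that has already run for time a is still around after a further time s with probability P(G > a + s) / P(G > a); here that survival function is estimated empirically from a hypothetical list of past job durations. With different elapsed times the count is a sum of independent Bernoulli variables, which reduces to a binomial when all probabilities are equal.

```python
from bisect import bisect_right

def survival(sorted_durations, x):
    """Empirical P(service time > x) from a sorted list of past durations."""
    return 1.0 - bisect_right(sorted_durations, x) / len(sorted_durations)

def remaining_job_moments(elapsed_times, sorted_durations, s):
    """Mean and variance of the number of current jobs still running after s."""
    mean = var = 0.0
    for a in elapsed_times:
        alive = survival(sorted_durations, a)
        # Conditional survival given the job has already run for time a.
        p = survival(sorted_durations, a + s) / alive if alive > 0 else 0.0
        mean += p
        var += p * (1.0 - p)
    return mean, var

# Hypothetical history of durations and two running jobs (elapsed 0 and 2).
history = sorted([1.0, 2.0, 3.0, 4.0])
mean_u, var_u = remaining_job_moments([0.0, 2.0], history, s=1.0)
```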

Page 14: Modelling a supercomputer with the                 model

Summary of analytical framework

[Control framework diagram as on Page 4, with MPC, LP optimization and Normal approximation added.]

Page 15: Modelling a supercomputer with the                 model

Numerical evaluation

[Diagram: Swinburne supercomputer logs drive a supercomputer simulator; the simulator passes system states to the CONTROLLER and receives a control decision; the output is cost performance.]

Page 16: Modelling a supercomputer with the                 model

Scheme 1: All up (no turn off)

[Evaluation diagram as on Page 15, with NO CONTROL in place of the CONTROLLER.]

Page 17: Modelling a supercomputer with the                 model

Scheme 2: twait heuristic

[Evaluation diagram as on Page 15, with the twait heuristic in place of the CONTROLLER.]

Server idle for twait => turn OFF
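The twait rule for a single server can be sketched as below. The representation of a server's history as a list of idle-gap lengths and the function name are assumptions for illustration; setup delays and their effect on performance are ignored.

```python
def twait_summary(idle_gaps, t_wait):
    """Apply 'idle for t_wait => turn OFF' to one server's idle gaps.

    idle_gaps: lengths of the periods between consecutive jobs on the server.
    Returns (time spent idle but powered on, number of switch-ons needed).
    """
    idle_on_time = 0.0
    switch_ons = 0
    for gap in idle_gaps:
        if gap > t_wait:
            idle_on_time += t_wait   # waits t_wait, then powers off
            switch_ons += 1          # must be switched on for the next job
        else:
            idle_on_time += gap      # next job arrives before the timeout
    return idle_on_time, switch_ons

# Three idle gaps (minutes) with a 10-minute timeout.
idle_on, switches = twait_summary([5.0, 30.0, 12.0], t_wait=10.0)
```

A small twait saves idle energy but causes more switch-ons; a large twait does the opposite, which is the trade-off the cost terms capture.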

Page 18: Modelling a supercomputer with the                 model

Scheme 3: predictive control

[Evaluation diagram as on Page 15, with MPC in place of the CONTROLLER.]

estimated from historical data

Page 19: Modelling a supercomputer with the                 model

S.3: rate function

[Plot: arrival rate vs. time of day, for 2010 and 2011.]

use daily periodic rates
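A daily periodic rate can be estimated from logs by folding arrival times onto a 24-hour clock. A minimal sketch, assuming arrival times are given in hours since the start of the log and that hourly bins are fine enough; both choices and the function name are illustrative.

```python
def hourly_rate_profile(arrival_hours, n_days):
    """Average number of arrivals per hour-of-day over n_days of logs.

    arrival_hours: arrival times in hours since the start of the log.
    Returns a list of 24 rates (arrivals per hour), one per hour of the day.
    """
    counts = [0] * 24
    for t in arrival_hours:
        counts[int(t % 24)] += 1     # fold onto the 24-hour clock
    return [c / n_days for c in counts]

# Five arrivals spread over three days of (made-up) log data.
profile = hourly_rate_profile([0.5, 1.2, 24.7, 25.9, 49.1], n_days=3)
```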

Page 20: Modelling a supercomputer with the                 model

S.3: service time & batch size

[Lublin et al.,2003]: Hyper-Gamma, Log-uniform

[Li et al.,2005]: Log Normal, Weibull

[Plots: c.d.f. of service time (time, sec) and c.d.f. of batch size (size, CPU); empirical (2010) and Gamma curves.]

Our approximations only concern the MEAN and VARIANCE of X.

X: batch size

G: service time
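When only a mean and a variance are used, a Gamma distribution can be matched to them by the method of moments. The sketch below is an illustration, not the talk's fitting procedure; the numbers are made up. Python's `random.gammavariate(alpha, beta)` takes the shape and the scale, so its mean is alpha * beta.

```python
import random

def gamma_from_moments(mean, var):
    """Shape and scale of the Gamma distribution with the given mean and variance.

    For Gamma(shape, scale): mean = shape * scale, var = shape * scale**2.
    """
    shape = mean * mean / var
    scale = var / mean
    return shape, scale

# Illustrative moments: mean 6, variance 12.
shape, scale = gamma_from_moments(6.0, 12.0)

# Draw a few samples from the matched distribution.
rng = random.Random(0)
samples = [rng.gammavariate(shape, scale) for _ in range(5)]
```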

Page 21: Modelling a supercomputer with the                 model

S.3: cost performance

[Plot: normalised cost, with ε ~ service availability.]

Cost 1 = total cost when there is NO CONTROL (energy only)

Simulation period: 1 year

Page 22: Modelling a supercomputer with the                 model

Cost performance: all schemes

[Chart: cost of S.1, S.2 and S.3 (ε = 0.58), compared with the “offline” optimal cost [Lu et al., 12], no perf. penalty.]

Consider predictive settings (S.3) whose demand penalty cost is the same as that of the twait heuristic (S.2).

After all, the model is there to estimate the θ(k)s.

Still > 20% to gain.

Page 23: Modelling a supercomputer with the                 model

Remarks and considerations

1. Room for improvement: ~20% to gain!

2. Examining our estimations?

rate function not accurate

Use job elapsed times

Normal approximation?

3. Fundamental bound on what to achieve given uncertainty?

[Dinh,Andrew and Branch,CCgrid13]

Page 24: Modelling a supercomputer with the                 model

Thank you

[Control framework diagram as on Page 4, with MPC, LP optimization and Normal approximation added.]