Scalably Verifiable Dynamic Power Management Opeoluwa (Luwa) Matthews, Meng Zhang, and Daniel J. Sorin Duke University HPCA-20 Orlando, FL, February 19, 2014


Page 1: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Scalably Verifiable Dynamic Power Management

Opeoluwa (Luwa) Matthews, Meng Zhang,

and Daniel J. Sorin

Duke University

HPCA-20 Orlando, FL, February 19, 2014

Page 2: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Executive Summary

• Dynamic Power Management (DPM) is used to improve power efficiency at several levels of the computing stack
  - within multicore chip, across servers in datacenter, etc.
• Deploying a DPM scheme is risky if it is not fully verified
  - difficult to verify scheme for large-scale systems
• Our contribution: Fractal DPM
  - framework for designing scalably verifiable DPM
  - implement Fractal DPM on 2-chip (16-core) system
  - experimental evaluation on real system

Page 3: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Dynamic Power Management

DPM aims to:
  - dynamically allocate power to computing resources (e.g. cores, chips, servers, etc.)
  - attain best performance at given power budget
  - achieve lowest power consumption for desired performance

[Figure: n cores in a CMP each send "Request Power" to the DPM, which replies with grant or deny]

Page 4: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Dynamic Power Management

[Figure: n machines in a datacenter each send "Request Power" to the DPM, which replies with grant or deny]

Page 5: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Case for Dynamic Power Management

• Chips have hit power density ceiling

[Figure source: Hennessy and Patterson, Computer Architecture]

Page 6: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Case for Dynamic Power Management

• Datacenters consume increasing amounts of power
• Reducing cloud electricity consumption by half would save as much electricity as the UK consumes

[Figure: cloud map of UK; source: hp.com]

Page 7: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Case for Verifiable DPM

• DPM can greatly improve energy efficiency
• Unverified DPM could
  - overshoot power budget ⟹ system damage
  - underutilize resources
  - deadlock
• Want formal verification
  - prove correctness for all possible DPM allocations
  - guarantee safety of DPM scheme

Page 8: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Why Scalably Verifiable DPM is Hard

• CMPs and datacenters have many computing resources
• n computing resources (CR) + S power states per CR ⟹ S^n possible DPM states
• Checking S^n states is intractable for typical values of S and n
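To make the S^n growth concrete, here is a minimal Python sketch that counts joint power-state assignments. The function name `dpm_state_count` is hypothetical; S = 5 matches the five power states used later in the deck, and the values of n are chosen only for illustration.

```python
def dpm_state_count(states_per_cr: int, num_crs: int) -> int:
    """Number of joint power-state assignments for num_crs resources (S**n)."""
    return states_per_cr ** num_crs


if __name__ == "__main__":
    # S = 5 power states per CR; n values are illustrative only.
    for n in (2, 8, 16, 64):
        print(f"S=5, n={n}: {dpm_state_count(5, n):,} states")
```

Even at n = 64 the count already has 45 decimal digits, which is why exhaustive checking of a flat system does not scale.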

Page 9: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Hypothesis and Assumptions

Problem: verification of existing DPM protocols is unscalable

Hypothesis: We can design DPM such that it is scalably verifiable
  - key idea: design DPM amenable to inductive verification
  - change architecture to match verification methodologies

Approach:
  - abstract away details of computing resources
  - abstract power states, e.g. Medium power
  - focus on decision policy (not mechanism, e.g. DVFS)

Page 10: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Outline

• Background and Motivation

• Fractal DPM

• Experimental Evaluation

• Conclusions


Page 11: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Our Inductive Approach

• Induction is key to scalable verification ⟹ can prove DPM correct for arbitrary number of computing resources
• Base case: small-scale system with few CRs is correct
  - small enough that it's easy to verify with existing tools
• Inductive step: system behaves the same at every scale ⟹ fractal behavior
• Prove base case + prove inductive step ⟹ DPM scheme is correct for any number of CRs
• Approach more general than DPM, borrowed from prior work on coherence protocols [Zhang 2010]

Page 12: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Attaining Scalable Verification - base case of induction

• CRs request power from DPM controller
• DPM controller grants or denies each request
• Few states ⟹ easy to verify that DPM is correct
  - note: over-simplified base case for now

[Figure: two CRs send "Request Power" to a DPM controller (DPM-C), which replies Grant/Deny]

Page 13: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Attaining Scalable Verification - base case of induction

• Base case
  - Refine our base case a little
  - Need all types of structures: CR, DPM-C, Root DPM-C

[Figure: tree with a Root DPM-C, one DPM-C, and three CRs]

Page 14: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Attaining Scalable Verification - inductive step

• behavior must be fractal

[Figure: two CRs send "Request Power" to a DPM controller (DPM-C), which replies Grant/Deny]

Page 15: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Attaining Scalable Verification - inductive step

• can scale system by replacing CR with larger system
• {DPM-C + 2 CRs} "behaves just like" 1 CR ⟹ observational equivalence

[Figure: two-level tree in which one CR of a DPM-C is replaced by a DPM-C with two CRs; "Request Power" and "Grant/Deny" messages flow at both levels]

Page 16: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Attaining Scalable Verification - observational equivalence

• Inductive Step: Two Observational Equivalences

1) "Looking-down" equivalence check

Observed externally from P1, A and A' behave the same

[Figure: (a) Small System with A and (b) Large System with A', each observed from P1; labels O1]

Page 17: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Attaining Scalable Verification - observational equivalence

• Inductive Step: Two Observational Equivalences

2) "Looking-up" equivalence check

Observed externally from P2, B and B' behave the same

[Figure: (a) Small System with B and (b) Large System with B', each observed from P2; labels O2]

• By induction, protocol correct for all scales

Page 18: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Fractal DPM Design

• CR can be in 1 of 5 power states: L(ow), LM, M(ed), MH and H(igh)
• Parent DPM controller "sees" child DPM controller in averaged state
• DPM controller state is <Left Child State>:<Right Child State>

[Figure: example tree with CRs in states H, L, and L; child controller in state H:L, seen by its parent as M; parent controller in state M:L]

Avg(H:L) = M

Page 19: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Fractal DPM Design

[Figure: example tree with CRs in states MH, H, and L; child controller in state MH:H, seen by its parent as H; parent controller in state H:L]

Avg(MH:H) = H
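The averaging rule can be sketched in a few lines of Python. The slides do not state how ties are rounded; this sketch assumes integer levels L=0 through H=4 and rounds half up, which is consistent with the two examples shown (Avg(H:L) = M and Avg(MH:H) = H). The names `LEVELS`, `NAMES`, and `avg_state` are hypothetical.

```python
# Assumed integer encoding of the five abstract power states.
LEVELS = {"L": 0, "LM": 1, "M": 2, "MH": 3, "H": 4}
NAMES = {level: name for name, level in LEVELS.items()}


def avg_state(left: str, right: str) -> str:
    """State in which a parent 'sees' a controller whose state is left:right.

    Assumption: the mean of the two levels, with halves rounded up.
    """
    total = LEVELS[left] + LEVELS[right]
    return NAMES[(total + 1) // 2]


if __name__ == "__main__":
    print(avg_state("H", "L"))    # example from the slides
    print(avg_state("MH", "H"))   # example from the slides
```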

Page 20: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Fractal DPM Design - fractal invariant

• Fractal design + inductive proof ⟹ invariant must also be fractal
  - Invariant must apply at every scale of system
  - Not OK to specify, e.g., <75% of all CRs are in H state
• Our fractal invariant: children of DPM controller not both in H

[Figure: two ILLEGAL examples. Left: CRs in H and H put their controller in state H:H. Right: CRs in H and MH put their controller in state H:MH, which its parent sees as H; with an H sibling the parent is in state H:H]

Page 21: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Translating Fractal Invariant to System-Wide Cap

• We must have fractal invariant for fractal design
• But most people are interested in system-wide invariants
• We prove (not shown) that our fractal invariant implies system-wide power cap
• Max power for n CRs is: (n-1)MH + H, i.e., (n-1) CRs in state MH and one CR in state H
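The cap can be sanity-checked by brute force on small trees. The sketch below is not the authors' proof (which is not shown); it enumerates every assignment of states to the CRs of a balanced binary tree, keeps the ones that satisfy the fractal invariant at every controller, and reports the largest total. It assumes integer levels L=0 through H=4, round-half-up averaging, and power proportional to the level index; all three are illustrative assumptions, and the names `check` and `max_legal_total` are hypothetical.

```python
from itertools import product

MH, H = 3, 4  # assumed encoding: L=0, LM=1, M=2, MH=3, H=4


def check(leaves):
    """Return (legal, seen_state) for a binary tree built over `leaves`.

    legal is True when no controller in the tree sees both children in H.
    seen_state is the averaged state the tree presents to its parent.
    """
    if len(leaves) == 1:
        return True, leaves[0]
    mid = len(leaves) // 2
    ok_left, s_left = check(leaves[:mid])
    ok_right, s_right = check(leaves[mid:])
    legal = ok_left and ok_right and not (s_left == H and s_right == H)
    return legal, (s_left + s_right + 1) // 2  # average, halves rounded up


def max_legal_total(n):
    """Largest sum of CR levels over all assignments that keep the invariant."""
    best = -1
    for leaves in product(range(5), repeat=n):
        legal, _ = check(leaves)
        if legal:
            best = max(best, sum(leaves))
    return best


if __name__ == "__main__":
    for n in (2, 4):
        print(n, max_legal_total(n), (n - 1) * MH + H)
```

Under these assumptions the enumerated maximum matches (n-1)MH + H for the sizes tried; the search visits 5^n assignments, so it is only practical for small n.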

Page 22: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Fractal DPM Design - illustration

• CR requests MH

[Figure: CRs in states H, L, and L; controller in state H:L; root in state M:L; the CR in H sends "Req. MH" to its controller]

Page 23: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Fractal DPM Design - illustration

• CR requests MH
• Granting request doesn't change controller's Avg state: Avg(H:L) = Avg(MH:L) = M
• Request granted, doesn't violate invariant
• Controller blocks waiting for ack

[Figure: controller moves to state MH:L, sends "Grant MH" to the CR, and blocks; root stays in M:L]

Page 24: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Fractal DPM Design - illustration

• CR sets its state
• CR sends ack to Controller

[Figure: CR now in state MH; controller in state MH:L, still blocked, receives ack; root in M:L]

Page 25: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Fractal DPM Design - illustration

• Controller unblocks

[Figure: tree with CR states H, L, L; controller H:L; root M:L]

Page 26: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Fractal DPM Design - illustration

• Computing Resource requests H

[Figure: all three CRs in state L; controller in state L:L; root in state L:L; one CR sends "Req. H" to its controller]

Page 27: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Fractal DPM Design - illustration

• CR requests H from its Controller
• Controller defers request to its parent
  - new request is M (not H) because Avg(H:L) = M

[Figure: CR sends "Req. H" to its controller (L:L); controller sends "Req. M" to the root (L:L)]

Page 28: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Fractal DPM Design - illustration

• Root grants request to Controller, blocks

[Figure: root moves to state M:L, sends "Grant M" to the controller (still L:L), and blocks]

Page 29: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Fractal DPM Design - illustration

• Controller grants request to CR, blocks

[Figure: controller moves to state H:L, sends "Grant H" to the CR, and blocks; root in M:L remains blocked after "Grant M"]

Page 30: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Fractal DPM Design - illustration

• acks percolate up tree from CR

[Figure: CR now in state H sends ack to its controller (H:L); controller and root (M:L) are both still blocked]

Page 31: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Fractal DPM Design - illustration

• acks percolate up tree from CR
• Controllers unblock upon receiving ack

[Figure: controller (H:L) has unblocked and forwards ack to the root (M:L), which is still blocked]

Page 32: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Fractal DPM Design - illustration

• acks percolate up tree from CR
• Controllers unblock upon receiving ack

[Figure: final state with CRs in H, L, L; controller H:L; root M:L; no controller blocked]
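The two walk-throughs above can be approximated with a small Python sketch of the decision policy. It is a simplified reading of the slides, not the authors' implementation: it omits message passing, blocking, and acks; it assumes round-half-up averaging over levels L=0 through H=4; and the class `Controller`, its fields, and `avg_state` are hypothetical names.

```python
LEVELS = {"L": 0, "LM": 1, "M": 2, "MH": 3, "H": 4}
NAMES = {level: name for name, level in LEVELS.items()}


def avg_state(left, right):
    """Averaged state seen by a parent (assumption: halves round up)."""
    return NAMES[(LEVELS[left] + LEVELS[right] + 1) // 2]


class Controller:
    def __init__(self, left, right, parent=None, slot=0):
        self.children = [left, right]  # child states as seen by this controller
        self.parent = parent           # None for the root controller
        self.slot = slot               # which child of the parent this is (0 or 1)

    def seen_state(self):
        return avg_state(*self.children)

    def request(self, slot, new_state):
        """Child `slot` asks to move to `new_state`; return True if granted."""
        proposed = list(self.children)
        proposed[slot] = new_state
        if proposed == ["H", "H"]:
            return False  # would violate the fractal invariant here
        new_avg = avg_state(*proposed)
        # If the averaged state changes, defer to the parent with the new average.
        if self.parent is not None and new_avg != self.seen_state():
            if not self.parent.request(self.slot, new_avg):
                return False
        self.children = proposed
        return True


if __name__ == "__main__":
    root = Controller("L", "L")
    ctrl = Controller("L", "L", parent=root, slot=0)
    print(ctrl.request(0, "H"), ctrl.children, root.children)
```

In the second walk-through, the request for H changes the controller's average from L to M, so the sketch forwards a request for M to the root before granting; in the first, moving from H:L to MH:L leaves the average at M and is granted locally.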

Page 33: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Verification Procedure

• Use model checker to verify base case
  - we use well-known, automated Murphi model checker
• Use same model checker to verify observational equivalences
  - use prior aggregation method for equivalence check (Park, TCAD 2000)

Page 34: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Outline

• Background and Motivation

• Fractal DPM

• Experimental Evaluation

• Conclusions

Page 35: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Experimental Evaluation - fractal inefficiency: cost of fractal behavior

• Our fractal invariant implies system-wide cap > n*MH
• Violating fractal invariant does not necessarily mean overshooting system-wide power cap
• Situations are few and don't significantly degrade performance

[Figure: Legal: four CRs in MH (controllers MH:MH, root MH:MH), total power = 4MH. Illegal: CRs in M, M, H, H (controllers M:M and H:H, root M:H), total power = 4MH, violates fractal invariant]

Page 36: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Experimental Evaluation - system model

• Implemented Fractal DPM on 16-core Linux system, 2 sockets
  - 2 cores act as a CR
  - controllers communicate through UDP across sockets

Page 37: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Experimental Evaluation - experimental setup

Power Mode DVFS Mappings

Power Mode     L     LM    M     MH    H
Freq. (GHz)    1.4   2.1   2.7   3.3   3.6

• Entire system plugged into power meter (Watts up?)

Page 38: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Experimental Evaluation - comparison schemes

• Static Scheme:
  - no DPM, set all CRs to the same power state (e.g. MH)
  - trivially correct, poor energy efficiency
• Oracle DPM:
  - allocates for optimal energy efficiency (ED^2) under budget
  - oracle doesn't scale, unimplementable
• Optimized Fractal DPM (OptFractal):
  - CRs re-request lower power state when denied
  - no change to Fractal DPM decision algorithm
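The OptFractal re-request behavior might look like the following Python sketch on the CR side. The slides only say that a denied CR re-requests a lower power state; stepping down one state at a time is an assumption, and `request_with_fallback` and the `try_request` callback are hypothetical names standing in for whatever mechanism sends a request to the controller.

```python
POWER_STATES = ["L", "LM", "M", "MH", "H"]  # ordered from lowest to highest


def request_with_fallback(desired, try_request):
    """Ask for `desired`; on denial, retry with each lower state in turn.

    try_request(state) is a caller-supplied function returning True on grant.
    Returns the granted state, or None if every request was denied.
    """
    start = POWER_STATES.index(desired)
    for state in reversed(POWER_STATES[: start + 1]):
        if try_request(state):
            return state
    return None


if __name__ == "__main__":
    # Stand-in controller that only grants M or below.
    grantable = {"L", "LM", "M"}
    print(request_with_fallback("H", lambda s: s in grantable))
```

Because the retry logic lives entirely in the CR, the controller's decision algorithm is untouched, which matches the note that OptFractal makes no change to the Fractal DPM decision algorithm.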

Page 39: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Experimental Evaluation

• Benchmarks: Details in the paper.


Page 40: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Results - compared to static scheme

• OptFractal DPM within 2% of Oracle DPM ED^2 savings
• Fractal DPM within 8% of Oracle DPM ED^2 savings

[Figure: bar chart of Delay and Energy; y-axis: % Savings from Static-MH, ticks from -5 to 20]

Page 41: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Results - response latency

• Most power requests serviced within 1 ms
  - UDP packet round trip ~0.6 ms

[Figure: CDF (%, 0 to 100) of Response Time (ms), x-axis from 0 to 3.5]

Page 42: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Conclusions

• We show how a scalably verifiable DPM can be built
• Fractal behavior enables one-time verification for all scales
• Entire verification is fully automated in a model checker
• Fractal DPM achieves energy efficiency close to that of an optimal allocator


Page 44: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Benchmarks

Important: experiments must stress all Fractal DPM power modes
• Each CR repeatedly launches bodytrack (from PARSEC benchmark suite), under a range of predetermined duty cycles
• Under given duty cycle, CRs request power state that minimizes ED^2

Why rely on duty cycle, not just different benchmarks or phases?
• Stressing all Fractal DPM power modes means stressing DVFS states
• Without varying duty cycle, optimal ED^2 always under highest frequency for all benchmarks tried [Dhiman 2008]
• Predetermined set of duty cycles for launching bodytrack that directly maps to set of power modes (or DVFS states)
• Experiment constitutes running sequence of bodytrack jobs, randomly selecting duty cycles from predetermined set

Page 45: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Results

• Millions of time steps simulated
• For each time step, system perf = $\sum_{CR} \mathit{perf}(CR)$
• % system perf loss = ( … ) * 100%

[Figure: CDF (%) of % system performance loss]

Page 46: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Results

• On 72.6% of time steps, Fractal DPM ≡ Oracle DPM

Page 47: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Results

• On 99.9% of time steps, Fractal DPM < 20% off from Oracle

Page 48: Opeoluwa  (Luwa) Matthews,  Meng Zhang, and  Daniel J.  Sorin

Results

• Worst case, Fractal DPM < 36.4% off from Oracle