41
The SEEC Computational Model Henry Hoffmann, Anant Agarwal PEMWS-2 April 6, 2011

The SEEC Computational Model

  • Upload
    jock

  • View
    76

  • Download
    3

Embed Size (px)

DESCRIPTION

The SEEC Computational Model. Henry Hoffmann, Anant Agarwal PEMWS-2 April 6, 2011. In the beginning…*. Application programmers had one goal:. Performance. *The beginning, in this case, refers to the beginning of my career (1999). beat/s. Lo. Hi. Power. - PowerPoint PPT Presentation

Citation preview

Page 1: The SEEC  Computational Model

The SEEC Computational Model

Henry Hoffmann, Anant Agarwal

PEMWS-2April 6, 2011

Page 2: The SEEC  Computational Model

In the beginning…*

2*The beginning, in this case, refers to the beginning of my career (1999)

Performance

Application programmers had one goal:

Page 3: The SEEC  Computational Model

But Modern Systems Have Increased the Burden on Application Programmers

3

Performance

Pow

er

Quality

Resiliency

Many additional constraints

Even worse, constraints can change dynamicallyE.g. power cap, workload fluctuation, core failure

beat/s

Lo Hi

Power

Page 4: The SEEC  Computational Model

Most Execution Models Designed for Performance

5

Coherent Shared Memory Message Passing

Communication

Concurrency

Coordination

Multi-threaded Multi-process

Through Memory Through Network

Locks Messages

Control Procedural Procedural

Global Shared Cache

LocalCache

RF

data

store

LocalCache

RF

data

load

Core 0 Core 1

data

LocalMemory

RF

data

store

LocalMemory

RFNetworkInterface

data

load

Core 0 Core 1

NetworkInterface

Network

Procedural control insufficient to meet the needs of modern systems

Page 5: The SEEC  Computational Model

6

Angstrom Replaces Procedural Control with Self-Aware Control

Procedural Control Self-Aware Control

Decide

Act

• Run in open loop • Assumptions made at design time• Based on guesses about future

Decide Act

Observe

• Run in closed loop• Understand user goals • Monitor the environment

• Application optimized for system• No flexibility to adapt to changes

• System optimizes for application• Flexibly adapt behavior

The self-aware model allows the system to solve constrained optimization problems dynamically

Page 6: The SEEC  Computational Model

7

Outline

• Introduction/Motivating Example

• The SEEC Model and Implementation

• Experimental Validation

• Conclusions

7

Page 7: The SEEC  Computational Model

8

Angstrom’s SElf-awarE Computing (SEEC) Model

• Goal: Reduce programmer burden by continuously optimizing online

• Key Features: 1. Applications explicitly state goals and progress2. System software and hardware state available actions3. The SEEC runtime system dynamically selects actions to maintain goals

Application Developer

Systems Developer

Observe Act

Decide

API SPI

SEECRuntime

A unified decision engine adapts applications, system software, and hardware

Page 8: The SEEC  Computational Model

9

Example Self-Aware System Built from SEEC

Video Encoder

Actuators

Goals:30 beat/s, Minimize Power

20 b/s30 b/s33 b/s

Algorithm Cores

Frequency Bandwidth

Control and Learning System

_r

Hea

rt R

ate

Time

pure delayslow convergenceoscillating

SEECController

Application-

Desired Heart Rate

_r

Observed Heart Rate

r(k)

Error

e(k)

Speedup

s(k)

SEECController

Application-

Desired Heart Rate

_r

Observed Heart Rate

r(k)

Error

e(k)

Speedup

s(k)

1

2

3

4

5

6

7

8

9

0 5 10 15 20 25 30

Action

Spee

dup

Initial Model

Observe

Decide Act

Page 9: The SEEC  Computational Model

10

Roles in the SEEC Model

Application Developer

Systems Developer

SEECRuntime System

Express application goals and progress(e.g. frames/ second)

Read goals and performance

Determine how to adapt (e.g. How much to speed up the application)

Provide a set of actions and a callback function(e.g. allocation of cores to process)

Initiate actions based on results of decision phase

Observe

Decide

Act

Page 10: The SEEC  Computational Model

11

Registering Application Goals

• Performance– Goals: target heart rate and/or latency between tagged heartbeats– Progress: issue heartbeats at important intervals

• Quality– Goals: distortion (distance from application defined nominal value)– Progress: distortion over last heartbeat

• Power– Goals: target heart rate / Watt and/or target energy between tagged heartbeats– Progress: Power/energy over last heartbeat interval

Observe

Application

Lo Hi

Power

SEECDecision Engine

Performance

Power

Quality

Research to date focuses on meeting performance while minimizing power/maximizing quality

Page 11: The SEEC  Computational Model

Actuators

12

Registering System Actions

Each action has the following attributes:• Estimated Speedup

– Predicted benefit of taking an action• Cost

– Predicted downside of taking an action (increased power, lowered quality)• Callback

– A function that takes an id an implements the associated action

Act

SEECDecision Engine

Estimated Speedup

Cost

Callback

Algorithm Cores

Frequency Bandwidth

Page 12: The SEEC  Computational Model

SEECDecision Engine

13

Picking Actions to Meet Goals

Decisions are made to select actions given observations:• Read application goals and heartbeats• Determine speedup with control system• Translate speedup into one or more actions

Decide

Application System Services

Lo Hi

Power

The control system provides predictable and analyzable behavior

Controller(Decide)

Actuator(Act)

Application(Observe)-

PerformanceGoal

CurrentPerformance

Page 13: The SEEC  Computational Model

14

Optimizing Resource Allocation with SEEC

• SEEC can observe, decide and act

• How does this enable optimal resource allocation?

• Let’s implement the video encoder example from the introduction

Page 14: The SEEC  Computational Model

15

Performance/Watt Adaptation in Video Encoding

Performance goal

0

10

20

30

40

50

60

70

80

50 150 250 350 450

Time (Heartbeat)

Per

form

ance

(Fra

me/

s)

130

140

150

160

170

180

0 2 4 6 8 10 12 14 16

Time (s)

Pow

er (W

)

Performance Power

Page 15: The SEEC  Computational Model

16

System Models in SEEC

16

1

2

3

4

5

6

7

8

9

0 5 10 15 20 25 30

Action

Spee

dup

Initial Model

SEEC’s control system takes actions based on models (of speedup and cost per action) associated with actions

What if the models are inaccurate?

Page 16: The SEEC  Computational Model

SEEC Decision Engine

17

Updating Models in SEEC

• After every action, SEEC updates application and system models• Kalman filters used to estimate true state of application and system

– We are exploring a number of different alternatives here

Decide

SEEC combines predictability of control systems with adaptability of learning systems

Controller(Decide)

Actuator(Act)

Application(Observe)-

PerformanceGoal

ApplicationModel

(Decide)

SystemModel

(Decide)

CurrentPerformance

Adaptation Level 2

Adaptation Level 1

Adaptation Level 0

System Services

Application

Lo Hi

Power

Page 17: The SEEC  Computational Model

18

SEEC Online Learning of Speedup Model for Application with Local Minima

1

2

3

4

5

6

7

8

9

0 5 10 15 20 25 30

Action

Spee

dup

Initial Model Actual Speedups Learned Speedups

Page 18: The SEEC  Computational Model

SEEC Decision Engine

19

Handling Multiple Applications

19

Decide

• Control actions computed separately for each application• For finite resources, several alternatives:

• Priorities (for real-time, admission control)• Weighting/Centroid method (for overall system throughput)

Controller(Decide)

Actuator(Act)

Application(Observe)-

PerformanceGoal

ApplicationModel

(Decide)

SystemModel

(Decide)

CurrentPerformance

Adaptation Level 2

Adaptation Level 1

Adaptation Level 0

Application

Lo Hi

Power

Application

Lo Hi

Power

Application

Lo Hi

Power

Application

Lo Hi

Power

System Services

Page 19: The SEEC  Computational Model

20

Outline

• Introduction/Motivating Example

• The SEEC Model and Implementation

• Experimental Validation

• Conclusions

20

Page 20: The SEEC  Computational Model

21

Systems Built with SEEC

21

System Actions Tradeoff BenchmarksDynamic Loop Perforation

Skip some loop iterations Performance vs. Quality

7/13 PARSECs

Dynamic Knobs Make static parameters dynamic

Performance vs. Quality

bodytrack, swaptions, x264, SWISH++

Core Scheduler Assign N cores to application

Compute vs. Power 11/13 PARSECs

Clock Scaler Change processor speed Compute vs. Power 11/13 PARSECs

Bandwidth Allocator Assign memory controllers to application

Memory vs. Power STREAM (doesn’t make a difference for PARSEC)

Power Manager Combination of the three above

Performance vs. Power

PARSEC, STREAM, simple test apps (mergesort, binary search)

Learned Models Power Manager with speedup and cost learned online

Performance vs. Power

PARSECs

Multi-App Control Power Manager with multiple applications

Performance vs. Power and Quality for multiple applications

Combinations of PARSECs

Page 21: The SEEC  Computational Model

22

Systems Built with SEEC

System Actions Tradeoff BenchmarksDynamic Loop Perforation

Skip some loop iterations Performance vs. Quality

7/13 PARSECs

Dynamic Knobs Make static parameters dynamic

Performance vs. Quality

bodytrack, swaptions, x264, SWISH++

Core Scheduler Assign N cores to application

Compute vs. Power 11/13 PARSECs

Clock Scaler Change processor speed Compute vs. Power 11/13 PARSECs

Bandwidth Allocator Assign memory controllers to application

Memory vs. Power STREAM (doesn’t make a difference for PARSEC)

Power Manager Combination of the three above

Performance vs. Power

PARSEC, STREAM, extra test apps (mergesort, binary search)

Learned Models Power Manager with speedup and cost learned online

Performance vs. Power

PARSECs

Multi-App Control Power Manager with multiple applications

Performance vs. Power and Quality for multiple applications

Combinations of PARSECs

22

Page 22: The SEEC  Computational Model

23

Dynamic Knobs: Creating Adaptive Applications

23

Application Goals

System Actions

Experiment

Turn static command line parameters into dynamic structure

Detail in Hoffmann et al. “Dynamic Knobs for Power Aware Computing” ASPLOS 2011

Maintain performance and minimize quality loss

Adjust memory locations to change application settings

Benchmarks: bodytrack, swaptions, SWISH++, x264

Maintain performance when clock speed changes

Page 23: The SEEC  Computational Model

24

0

0.2

0.4

0.6

0.8

1

1.2

1.4

0 50 100 150 200 250

Time

Nor

mal

ized

Per

form

ance

Baseline Power Cap Dynamic Knobs

ResultsEnabling Dynamic Applications

bodytrack

Clock drops 2.4-1.6GHz

w/o SEEC perf. drops

w/ SEEC perf. recovers

Clock rises 1.6-2.4 GHz

SEEC returns quality to baseline

Page 24: The SEEC  Computational Model

25

swaptionsSWISH++ x264

Dynamic knobs automatically enable dynamic response for a range of applications using a single mechanism

Maintains performance despite noise

Perfect behavior Maintains baseline performance

0

0.2

0.4

0.6

0.8

1

1.2

1.4

0 200 400 600 800 1000

Time

Nor

mal

ized

Per

form

ance

Baseline Power Cap Dynamic Knobs

0

0.2

0.4

0.6

0.8

1

1.2

1.4

0 100 200 300 400 500

Time

Nor

mal

ized

Per

form

ance

Baseline Power Cap Dynamic Knobs

0

0.2

0.4

0.6

0.8

1

1.2

1.4

0 100 200 300 400 500 600

Time

Norm

aliz

ed P

erfo

rman

ce

Baseline Power Cap Dynamic Knobs

ResultsEnabling Dynamic Applications

Page 25: The SEEC  Computational Model

26

Optimizing Performance per Watt for Video Encoding

26

Application Goals

System Actions

Experiment

Adapt system behavior to needs of individual inputs

Maintain 30 frame/s while minimizing power

Change cores, clock speed, and memory bandwidth

Benchmark: x264 w/ 16 different 1080p inputs

Compare performance/Watt w/ SEEC to best static allocation of resources for each input

Page 26: The SEEC  Computational Model

Average Performance per Watt for Video Encode

27

avera

ge0

0.2

0.4

0.6

0.8

1

1.2static worst static oracle AL 0 AL 1 AL 2

Input

Nor

mal

ized

Per

form

ance

/Wat

t

Page 27: The SEEC  Computational Model

Performance per Watt for All Inputs

28

0

0.2

0.4

0.6

0.8

1

1.2

Nor

mal

ized

Perf

orm

ance

/Watt

Input

static worst static oracle AL 0 AL 1 AL 2

Page 28: The SEEC  Computational Model

29

Learning Models Online

29

Application Goals

System Actions

Experiment

Adapt system behavior when initial models are wrong

Four targets: Min, Median, Max, Max+

Minimize power consumption for each target

Change cores, clock speed, and mem. bandwidth

Initial models are incredibly optimistic(Assume linear speedup with any resource increase)

Benchmark: STREAM

Converge to within 5% of target performance, measure power and convergence time

Page 29: The SEEC  Computational Model

Complex Response to Resources

30

1 2 3 4 5 6 7 80.8

1

1.2

1.4

1.6

1.8

2

2.2

2.4

2.62.4 GHz, 2 MC 2.4 GHz, 1 MC 1.6 Ghz, 2 MC 1.6 GHz, 1 MC

Cores

Spee

dup

Adding Memory Bandwidth Can Slow

Down

Adding Cores Can Slow the App Down

Page 30: The SEEC  Computational Model

Power Savings

31

Min Median Max Max+0

50

100

150

200

250Oracle AL 0 AL 1 AL 2

Performance Target

Tota

l Sys

tem

Pow

er (W

atts

)

AL 2 can provide 30 Watt power savings over AL1

Page 31: The SEEC  Computational Model

Convergence Time

32

Min Median Max Max+0

10

20

30

40

50

60Oracle AL 0 AL 1 AL 2

Performance Target

Con

verg

ence

Tim

e (q

uant

a)

Page 32: The SEEC  Computational Model

33

Managing Application and System Resources Concurrently

33

Application Goals

System Actions

Experiment

Manage multiple applications when clock frequency changes

bodytrack: maintain performance, minimize power

x264: maintain performance, minimize quality loss

Change core allocation to both applications

Change x264’s algorithms

Maintain performance of both applications when clock frequency changes

Page 33: The SEEC  Computational Model

34

ResultsSEEC Management of Multiple Applications

bodytrack x264

34

0

0.5

1

1.5

2

2.5

40 90 140 190 240Time (Heartbeat)

Nor

mal

ized

Per

form

ance

0

1

2

3

4

5

6

7

Cor

es

bodytrack w/ adaptationbodytrackbodytrack cores

0

0.5

1

1.5

2

2.5

40 90 140 190 240Time (Heartbeat)

Nor

mal

ized

Per

form

ance

0

1

2

3

4

5

6

7

Cor

es

x264 w/ adaptationx264x264 cores

Clock drops 2.4-1.6GHz w/o SEEC app

misses goals

SEEC allocates cores to bodytrack

w/o SEEC app exceeds goals

SEEC removes cores from x264

SEEC adjusts algorithm to meet goals

Page 34: The SEEC  Computational Model

Summary of Case Studies

Experiment Demonstrated Benefit of SEECDynamic Knobs Maintains application performance in the face of loss of

compute resources

Performance/Watt Out-performs oracle for static allocation of resources by adapting to fluctuations in input data

Performance/Watt with learning

Learns models online starting from initial model which is ~10x off true values

Multi-App control Maintains performance of multiple apps by managing algorithm and system resources to adapt to loss of compute resources

35

Page 35: The SEEC  Computational Model

36

Outline

• Introduction/Motivating Example

• The SEEC Framework

• Experimental Validation

• Conclusions

36

Page 36: The SEEC  Computational Model

Angstrom’s Hardware Support for Self-Awareness

• “If you want to change something, measure it.”

• We need sensors to measure everything we want to manage:– Performance (there is a rich body of work in this area)– Power – Quality– Resiliency– …

37

Page 37: The SEEC  Computational Model

Need Performance-level Support for Power Analysis

1. gprof -> pprof?

38

index % time self children called name Power?? Energy??? <spontaneous>----------------------------------------------- 0.00 0.05 1/1 main [2] [1] 100.0 0.00 0.05 1 report [3] 0.00 0.03 8/8 timelocal [6] 0.00 0.01 1/1 print [9] 0.00 0.01 9/9 fgets [12] 0.00 0.00 12/34 strncmp <cycle 1> [40] 0.00 0.00 8/8 lookup [20] 0.00 0.00 1/1 fopen [21] 0.00 0.00 8/8 chewtime [24] 0.00 0.00 8/16 skipspace [44] ----------------------------------------------- [2] 59.8 0.01 0.02 8+472 <cycle 2 as a whole> [4] 0.01 0.02 244+260 offtime <cycle 2> [7] 0.00 0.00 236+1 tzset <cycle 2> [26] -----------------------------------------------

Page 38: The SEEC  Computational Model

Angstrom’s Hardware Support for SEEC

Low-power core for SEEC runtime system

39

Page 39: The SEEC  Computational Model

40

Angstrom’s System Software and Hardware Support for SEEC

Actuators

Algorithm Cores

Frequency Memory Bandwidth

Cache

Registers

Network Bandwidth

ALU Width

Enable tradeoffs wherever possible

Support fine-grain allocation of resources

TLB Size

Page 40: The SEEC  Computational Model

41

Conclusions

• SEEC is designed to help ease programmer burden– Solves resource allocation problems– Adapts to fluctuations in environment

• SEEC has three distinguishing features– Incorporates goals and feedback directly from the application– Allows independent specification of adaptation– Uses an adaptive second order control system to manage adaptation

• Demonstrated the benefits of SEEC in several experiments– SEEC can optimize performance per Watt for video encoding– SEEC can adapt algorithms and resource allocation to meet goals in the

face of power caps or other changes in environment

• SEEC navigates tradeoff spaces– So build more tradeoffs into apps, system software, system hardware, etc.41

Page 41: The SEEC  Computational Model

42

SEEC References

• Application Heartbeats Interface:– Hoffmann, Eastep, Santambrogio, Miller, Agarwal. Application Heartbeats: A Generic

Interface for Specifying Program Performance and Goals in Autonomous Computing Environments. ICAC 2010

• Controlling Heartbeat-enabled Systems:– Maggio, Hoffmann, Santambrogio, Agarwal, Leva. Controlling software applications within

the Heartbeats framework. CDC 2010– Maggio, Hoffmann, Agarwal, Leva. Control theoretic allocation of CPU resources. FeBID

2011• Creating Adaptive Applications:

– Hoffmann, Misailovic, Sidiroglou, Agarwal, Rinard. Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures. MIT-CSAIL-TR-2209-042. 2009

– Hoffmann, Sidiroglou, Carbin, Misailovic, Agarwal, Rinard. Dynamic Knobs for Power-Aware Computing. ASPLOS 2011.

• The SEEC Framework:– Hoffmann, Maggio, Santambrogio, Leva, Agarwal. SEEC: A Framework for Self-aware

Management of Multicore Resources. MIT-CSAIL-TR-2011-016 2011.– Hoffmann, Maggio, Santambrogio, Leva, Agarwal. SEEC: A Framework for Self-aware

Computing. MIT-CSAIL-TR-2010-049 2010.

42