Upload
jock
View
76
Download
3
Tags:
Embed Size (px)
DESCRIPTION
The SEEC Computational Model. Henry Hoffmann, Anant Agarwal PEMWS-2 April 6, 2011. In the beginning…*. Application programmers had one goal:. Performance. *The beginning, in this case, refers to the beginning of my career (1999). beat/s. Lo. Hi. Power. - PowerPoint PPT Presentation
Citation preview
The SEEC Computational Model
Henry Hoffmann, Anant Agarwal
PEMWS-2April 6, 2011
In the beginning…*
2*The beginning, in this case, refers to the beginning of my career (1999)
Performance
Application programmers had one goal:
But Modern Systems Have Increased the Burden on Application Programmers
3
Performance
Pow
er
Quality
Resiliency
Many additional constraints
Even worse, constraints can change dynamicallyE.g. power cap, workload fluctuation, core failure
beat/s
Lo Hi
Power
Most Execution Models Designed for Performance
5
Coherent Shared Memory Message Passing
Communication
Concurrency
Coordination
Multi-threaded Multi-process
Through Memory Through Network
Locks Messages
Control Procedural Procedural
Global Shared Cache
LocalCache
RF
data
store
LocalCache
RF
data
load
Core 0 Core 1
data
LocalMemory
RF
data
store
LocalMemory
RFNetworkInterface
data
load
Core 0 Core 1
NetworkInterface
Network
Procedural control insufficient to meet the needs of modern systems
6
Angstrom Replaces Procedural Control with Self-Aware Control
Procedural Control Self-Aware Control
Decide
Act
• Run in open loop • Assumptions made at design time• Based on guesses about future
Decide Act
Observe
• Run in closed loop• Understand user goals • Monitor the environment
• Application optimized for system• No flexibility to adapt to changes
• System optimizes for application• Flexibly adapt behavior
The self-aware model allows the system to solve constrained optimization problems dynamically
7
Outline
• Introduction/Motivating Example
• The SEEC Model and Implementation
• Experimental Validation
• Conclusions
7
8
Angstrom’s SElf-awarE Computing (SEEC) Model
• Goal: Reduce programmer burden by continuously optimizing online
• Key Features: 1. Applications explicitly state goals and progress2. System software and hardware state available actions3. The SEEC runtime system dynamically selects actions to maintain goals
Application Developer
Systems Developer
Observe Act
Decide
API SPI
SEECRuntime
A unified decision engine adapts applications, system software, and hardware
9
Example Self-Aware System Built from SEEC
Video Encoder
Actuators
Goals:30 beat/s, Minimize Power
20 b/s30 b/s33 b/s
Algorithm Cores
Frequency Bandwidth
Control and Learning System
_r
Hea
rt R
ate
Time
pure delayslow convergenceoscillating
SEECController
Application-
Desired Heart Rate
_r
Observed Heart Rate
r(k)
Error
e(k)
Speedup
s(k)
SEECController
Application-
Desired Heart Rate
_r
Observed Heart Rate
r(k)
Error
e(k)
Speedup
s(k)
1
2
3
4
5
6
7
8
9
0 5 10 15 20 25 30
Action
Spee
dup
Initial Model
Observe
Decide Act
10
Roles in the SEEC Model
Application Developer
Systems Developer
SEECRuntime System
Express application goals and progress(e.g. frames/ second)
Read goals and performance
Determine how to adapt (e.g. How much to speed up the application)
Provide a set of actions and a callback function(e.g. allocation of cores to process)
Initiate actions based on results of decision phase
Observe
Decide
Act
11
Registering Application Goals
• Performance– Goals: target heart rate and/or latency between tagged heartbeats– Progress: issue heartbeats at important intervals
• Quality– Goals: distortion (distance from application defined nominal value)– Progress: distortion over last heartbeat
• Power– Goals: target heart rate / Watt and/or target energy between tagged heartbeats– Progress: Power/energy over last heartbeat interval
Observe
Application
Lo Hi
Power
SEECDecision Engine
Performance
Power
Quality
Research to date focuses on meeting performance while minimizing power/maximizing quality
Actuators
12
Registering System Actions
Each action has the following attributes:• Estimated Speedup
– Predicted benefit of taking an action• Cost
– Predicted downside of taking an action (increased power, lowered quality)• Callback
– A function that takes an id an implements the associated action
Act
SEECDecision Engine
Estimated Speedup
Cost
Callback
Algorithm Cores
Frequency Bandwidth
SEECDecision Engine
13
Picking Actions to Meet Goals
Decisions are made to select actions given observations:• Read application goals and heartbeats• Determine speedup with control system• Translate speedup into one or more actions
Decide
Application System Services
Lo Hi
Power
The control system provides predictable and analyzable behavior
Controller(Decide)
Actuator(Act)
Application(Observe)-
PerformanceGoal
CurrentPerformance
14
Optimizing Resource Allocation with SEEC
• SEEC can observe, decide and act
• How does this enable optimal resource allocation?
• Let’s implement the video encoder example from the introduction
15
Performance/Watt Adaptation in Video Encoding
Performance goal
0
10
20
30
40
50
60
70
80
50 150 250 350 450
Time (Heartbeat)
Per
form
ance
(Fra
me/
s)
130
140
150
160
170
180
0 2 4 6 8 10 12 14 16
Time (s)
Pow
er (W
)
Performance Power
16
System Models in SEEC
16
1
2
3
4
5
6
7
8
9
0 5 10 15 20 25 30
Action
Spee
dup
Initial Model
SEEC’s control system takes actions based on models (of speedup and cost per action) associated with actions
What if the models are inaccurate?
SEEC Decision Engine
17
Updating Models in SEEC
• After every action, SEEC updates application and system models• Kalman filters used to estimate true state of application and system
– We are exploring a number of different alternatives here
Decide
SEEC combines predictability of control systems with adaptability of learning systems
Controller(Decide)
Actuator(Act)
Application(Observe)-
PerformanceGoal
ApplicationModel
(Decide)
SystemModel
(Decide)
CurrentPerformance
Adaptation Level 2
Adaptation Level 1
Adaptation Level 0
System Services
Application
Lo Hi
Power
18
SEEC Online Learning of Speedup Model for Application with Local Minima
1
2
3
4
5
6
7
8
9
0 5 10 15 20 25 30
Action
Spee
dup
Initial Model Actual Speedups Learned Speedups
SEEC Decision Engine
19
Handling Multiple Applications
19
Decide
• Control actions computed separately for each application• For finite resources, several alternatives:
• Priorities (for real-time, admission control)• Weighting/Centroid method (for overall system throughput)
Controller(Decide)
Actuator(Act)
Application(Observe)-
PerformanceGoal
ApplicationModel
(Decide)
SystemModel
(Decide)
CurrentPerformance
Adaptation Level 2
Adaptation Level 1
Adaptation Level 0
Application
Lo Hi
Power
Application
Lo Hi
Power
Application
Lo Hi
Power
Application
Lo Hi
Power
System Services
20
Outline
• Introduction/Motivating Example
• The SEEC Model and Implementation
• Experimental Validation
• Conclusions
20
21
Systems Built with SEEC
21
System Actions Tradeoff BenchmarksDynamic Loop Perforation
Skip some loop iterations Performance vs. Quality
7/13 PARSECs
Dynamic Knobs Make static parameters dynamic
Performance vs. Quality
bodytrack, swaptions, x264, SWISH++
Core Scheduler Assign N cores to application
Compute vs. Power 11/13 PARSECs
Clock Scaler Change processor speed Compute vs. Power 11/13 PARSECs
Bandwidth Allocator Assign memory controllers to application
Memory vs. Power STREAM (doesn’t make a difference for PARSEC)
Power Manager Combination of the three above
Performance vs. Power
PARSEC, STREAM, simple test apps (mergesort, binary search)
Learned Models Power Manager with speedup and cost learned online
Performance vs. Power
PARSECs
Multi-App Control Power Manager with multiple applications
Performance vs. Power and Quality for multiple applications
Combinations of PARSECs
22
Systems Built with SEEC
System Actions Tradeoff BenchmarksDynamic Loop Perforation
Skip some loop iterations Performance vs. Quality
7/13 PARSECs
Dynamic Knobs Make static parameters dynamic
Performance vs. Quality
bodytrack, swaptions, x264, SWISH++
Core Scheduler Assign N cores to application
Compute vs. Power 11/13 PARSECs
Clock Scaler Change processor speed Compute vs. Power 11/13 PARSECs
Bandwidth Allocator Assign memory controllers to application
Memory vs. Power STREAM (doesn’t make a difference for PARSEC)
Power Manager Combination of the three above
Performance vs. Power
PARSEC, STREAM, extra test apps (mergesort, binary search)
Learned Models Power Manager with speedup and cost learned online
Performance vs. Power
PARSECs
Multi-App Control Power Manager with multiple applications
Performance vs. Power and Quality for multiple applications
Combinations of PARSECs
22
23
Dynamic Knobs: Creating Adaptive Applications
23
Application Goals
System Actions
Experiment
Turn static command line parameters into dynamic structure
Detail in Hoffmann et al. “Dynamic Knobs for Power Aware Computing” ASPLOS 2011
Maintain performance and minimize quality loss
Adjust memory locations to change application settings
Benchmarks: bodytrack, swaptions, SWISH++, x264
Maintain performance when clock speed changes
24
0
0.2
0.4
0.6
0.8
1
1.2
1.4
0 50 100 150 200 250
Time
Nor
mal
ized
Per
form
ance
Baseline Power Cap Dynamic Knobs
ResultsEnabling Dynamic Applications
bodytrack
Clock drops 2.4-1.6GHz
w/o SEEC perf. drops
w/ SEEC perf. recovers
Clock rises 1.6-2.4 GHz
SEEC returns quality to baseline
25
swaptionsSWISH++ x264
Dynamic knobs automatically enable dynamic response for a range of applications using a single mechanism
Maintains performance despite noise
Perfect behavior Maintains baseline performance
0
0.2
0.4
0.6
0.8
1
1.2
1.4
0 200 400 600 800 1000
Time
Nor
mal
ized
Per
form
ance
Baseline Power Cap Dynamic Knobs
0
0.2
0.4
0.6
0.8
1
1.2
1.4
0 100 200 300 400 500
Time
Nor
mal
ized
Per
form
ance
Baseline Power Cap Dynamic Knobs
0
0.2
0.4
0.6
0.8
1
1.2
1.4
0 100 200 300 400 500 600
Time
Norm
aliz
ed P
erfo
rman
ce
Baseline Power Cap Dynamic Knobs
ResultsEnabling Dynamic Applications
26
Optimizing Performance per Watt for Video Encoding
26
Application Goals
System Actions
Experiment
Adapt system behavior to needs of individual inputs
Maintain 30 frame/s while minimizing power
Change cores, clock speed, and memory bandwidth
Benchmark: x264 w/ 16 different 1080p inputs
Compare performance/Watt w/ SEEC to best static allocation of resources for each input
Average Performance per Watt for Video Encode
27
avera
ge0
0.2
0.4
0.6
0.8
1
1.2static worst static oracle AL 0 AL 1 AL 2
Input
Nor
mal
ized
Per
form
ance
/Wat
t
Performance per Watt for All Inputs
28
0
0.2
0.4
0.6
0.8
1
1.2
Nor
mal
ized
Perf
orm
ance
/Watt
Input
static worst static oracle AL 0 AL 1 AL 2
29
Learning Models Online
29
Application Goals
System Actions
Experiment
Adapt system behavior when initial models are wrong
Four targets: Min, Median, Max, Max+
Minimize power consumption for each target
Change cores, clock speed, and mem. bandwidth
Initial models are incredibly optimistic(Assume linear speedup with any resource increase)
Benchmark: STREAM
Converge to within 5% of target performance, measure power and convergence time
Complex Response to Resources
30
1 2 3 4 5 6 7 80.8
1
1.2
1.4
1.6
1.8
2
2.2
2.4
2.62.4 GHz, 2 MC 2.4 GHz, 1 MC 1.6 Ghz, 2 MC 1.6 GHz, 1 MC
Cores
Spee
dup
Adding Memory Bandwidth Can Slow
Down
Adding Cores Can Slow the App Down
Power Savings
31
Min Median Max Max+0
50
100
150
200
250Oracle AL 0 AL 1 AL 2
Performance Target
Tota
l Sys
tem
Pow
er (W
atts
)
AL 2 can provide 30 Watt power savings over AL1
Convergence Time
32
Min Median Max Max+0
10
20
30
40
50
60Oracle AL 0 AL 1 AL 2
Performance Target
Con
verg
ence
Tim
e (q
uant
a)
33
Managing Application and System Resources Concurrently
33
Application Goals
System Actions
Experiment
Manage multiple applications when clock frequency changes
bodytrack: maintain performance, minimize power
x264: maintain performance, minimize quality loss
Change core allocation to both applications
Change x264’s algorithms
Maintain performance of both applications when clock frequency changes
34
ResultsSEEC Management of Multiple Applications
bodytrack x264
34
0
0.5
1
1.5
2
2.5
40 90 140 190 240Time (Heartbeat)
Nor
mal
ized
Per
form
ance
0
1
2
3
4
5
6
7
Cor
es
bodytrack w/ adaptationbodytrackbodytrack cores
0
0.5
1
1.5
2
2.5
40 90 140 190 240Time (Heartbeat)
Nor
mal
ized
Per
form
ance
0
1
2
3
4
5
6
7
Cor
es
x264 w/ adaptationx264x264 cores
Clock drops 2.4-1.6GHz w/o SEEC app
misses goals
SEEC allocates cores to bodytrack
w/o SEEC app exceeds goals
SEEC removes cores from x264
SEEC adjusts algorithm to meet goals
Summary of Case Studies
Experiment Demonstrated Benefit of SEECDynamic Knobs Maintains application performance in the face of loss of
compute resources
Performance/Watt Out-performs oracle for static allocation of resources by adapting to fluctuations in input data
Performance/Watt with learning
Learns models online starting from initial model which is ~10x off true values
Multi-App control Maintains performance of multiple apps by managing algorithm and system resources to adapt to loss of compute resources
35
36
Outline
• Introduction/Motivating Example
• The SEEC Framework
• Experimental Validation
• Conclusions
36
Angstrom’s Hardware Support for Self-Awareness
• “If you want to change something, measure it.”
• We need sensors to measure everything we want to manage:– Performance (there is a rich body of work in this area)– Power – Quality– Resiliency– …
37
Need Performance-level Support for Power Analysis
1. gprof -> pprof?
38
index % time self children called name Power?? Energy??? <spontaneous>----------------------------------------------- 0.00 0.05 1/1 main [2] [1] 100.0 0.00 0.05 1 report [3] 0.00 0.03 8/8 timelocal [6] 0.00 0.01 1/1 print [9] 0.00 0.01 9/9 fgets [12] 0.00 0.00 12/34 strncmp <cycle 1> [40] 0.00 0.00 8/8 lookup [20] 0.00 0.00 1/1 fopen [21] 0.00 0.00 8/8 chewtime [24] 0.00 0.00 8/16 skipspace [44] ----------------------------------------------- [2] 59.8 0.01 0.02 8+472 <cycle 2 as a whole> [4] 0.01 0.02 244+260 offtime <cycle 2> [7] 0.00 0.00 236+1 tzset <cycle 2> [26] -----------------------------------------------
Angstrom’s Hardware Support for SEEC
Low-power core for SEEC runtime system
39
40
Angstrom’s System Software and Hardware Support for SEEC
Actuators
Algorithm Cores
Frequency Memory Bandwidth
Cache
Registers
Network Bandwidth
ALU Width
Enable tradeoffs wherever possible
Support fine-grain allocation of resources
TLB Size
…
41
Conclusions
• SEEC is designed to help ease programmer burden– Solves resource allocation problems– Adapts to fluctuations in environment
• SEEC has three distinguishing features– Incorporates goals and feedback directly from the application– Allows independent specification of adaptation– Uses an adaptive second order control system to manage adaptation
• Demonstrated the benefits of SEEC in several experiments– SEEC can optimize performance per Watt for video encoding– SEEC can adapt algorithms and resource allocation to meet goals in the
face of power caps or other changes in environment
• SEEC navigates tradeoff spaces– So build more tradeoffs into apps, system software, system hardware, etc.41
42
SEEC References
• Application Heartbeats Interface:– Hoffmann, Eastep, Santambrogio, Miller, Agarwal. Application Heartbeats: A Generic
Interface for Specifying Program Performance and Goals in Autonomous Computing Environments. ICAC 2010
• Controlling Heartbeat-enabled Systems:– Maggio, Hoffmann, Santambrogio, Agarwal, Leva. Controlling software applications within
the Heartbeats framework. CDC 2010– Maggio, Hoffmann, Agarwal, Leva. Control theoretic allocation of CPU resources. FeBID
2011• Creating Adaptive Applications:
– Hoffmann, Misailovic, Sidiroglou, Agarwal, Rinard. Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures. MIT-CSAIL-TR-2209-042. 2009
– Hoffmann, Sidiroglou, Carbin, Misailovic, Agarwal, Rinard. Dynamic Knobs for Power-Aware Computing. ASPLOS 2011.
• The SEEC Framework:– Hoffmann, Maggio, Santambrogio, Leva, Agarwal. SEEC: A Framework for Self-aware
Management of Multicore Resources. MIT-CSAIL-TR-2011-016 2011.– Hoffmann, Maggio, Santambrogio, Leva, Agarwal. SEEC: A Framework for Self-aware
Computing. MIT-CSAIL-TR-2010-049 2010.
42