A Run-Time Feedback Based Energy Estimation Model for Embedded Systems
Selim Gürün, Chandra Krintz
Department of Computer Science, U.C. Santa Barbara
International Conference on Hardware/Software Codesign and System Synthesis (CODES-ISSS)
Seoul, Korea, October 22-25, 2006
CODES-ISSS’06 2
Power-Aware Execution: Big Picture
• Power-aware methods divide task execution into operations and prepare an execution plan for each
  • Operation: smallest user-visible unit of execution
  • Typical operations: rendering a scene, translating a sentence, calculating a shortest path in a map
• Need to know the energy cost of each plan
Knowing future energy cost of operations requires profiling them at run-time
Identify Operations -> Profile at Run-time -> Predict Future Costs -> Develop Power-Aware Execution Strategy
Outline
• Extant run-time power profiling techniques
  • Power profiling methodologies for embedded computers
• Proposed model
  • Overview
  • Model construction
  • Capturing system dynamics
• Evaluation
• Summary and Conclusion
Run-Time Energy Profiling: Overview
OS Interfaces like ACPI:
+ Provide a simple API to battery voltage sensors
+ OK for different hardware power levels
- Very coarse
- Not precise
Execution Time:
+ Simple to measure
+ Fast and precise
- Not correlated to power
- Not suitable when hardware power levels change: DVS, sleep
HPMs (hardware performance monitors):
+ Fast access
+ Quite accurate
- Architecture dependent
- Not designed for power estimation: many events missing
Run-Time Energy Profiling: HPMs
• CPU counters provide unparalleled insight into program behavior
  • Cache and TLB misses
  • Instructions executed per cycle (IPC)
• How can we accurately estimate program energy consumption by monitoring key parameters?
Use HPMs as pseudo CPU component access counters:
$E = \text{ICache} \cdot a_0 + \text{DCache} \cdot a_1 + \text{ALU} \cdot a_2 + \cdots$
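The linear model above can be sketched in a few lines. This is a minimal illustration, assuming per-event energy coefficients are already known; the event names (`icache_access`, `dcache_access`, `alu_op`) and the coefficient values are hypothetical placeholders, not measurements from this work.

```python
# Minimal sketch of the linear HPM energy model: each hardware event count
# acts as a pseudo access counter for a CPU component and is weighted by a
# per-event energy coefficient. Names and values are hypothetical.

def estimate_energy(counts: dict[str, float], coeffs: dict[str, float]) -> float:
    """Return sum over events of counts[event] * coeffs[event]."""
    missing = set(coeffs) - set(counts)
    if missing:
        raise KeyError(f"missing counter values: {sorted(missing)}")
    return sum(counts[event] * a for event, a in coeffs.items())


# Hypothetical per-event energies in joules per event.
coeffs = {"icache_access": 2.0e-9, "dcache_access": 3.0e-9, "alu_op": 1.0e-9}
# Hypothetical event counts for one measurement interval.
counts = {"icache_access": 1_000_000, "dcache_access": 400_000, "alu_op": 800_000}
print(estimate_energy(counts, coeffs))
```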
• Useful but not enough:
  • The CPU consumes only a portion of total energy; power-aware strategies need to know the full picture.
  • Fails when hardware changes its behavior: DVS, sleep states
• A different strategy is needed!
Proposed Energy Profiling Model
Offline analysis:
  • Construct power model
  • Determine model coefficients
Fine-grain energy estimation:
  • Predict energy consumption using the HPM model
  • Derive a power-efficient execution plan
Continuous model improvement at run-time:
  • Measure energy consumption over large intervals and compare to HPM model estimates
  • Update model coefficients
Case Study: Intel XScale on Stargate
• 32-bit XScale, 400 MHz
• 64 MB RAM
• Runs Familiar Linux
• No display
• Wireless 802.11
• CompactFlash
XScale major HPM events:
• Instruction/data cache misses
• Data dependency stalls
• Instruction/data TLB misses
• Branches mispredicted
• Instructions executed
Constructing Model
• Are there any correlations between HPM values and full-system power consumption?
  • Absolutely! But some challenges exist.
• Good correlation in the memory/CPU subsystem
  • High IPC -> CPU-intensive application
  • High cache misses/hits -> memory-intensive application
• But I/O is the problem!
  • Some heuristics are possible, e.g., low memory activity and low IPC -> possible I/O wait state
  • Better to use software counters embedded into drivers
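A minimal sketch of the interval-classification heuristic on this slide, assuming per-interval counts of cycles, instructions, and cache events are available. The `IntervalCounters` type, the `classify_interval` function, and all thresholds are hypothetical and would need per-platform tuning.

```python
from dataclasses import dataclass


@dataclass
class IntervalCounters:
    cycles: int        # core clock cycles in the interval
    instructions: int  # instructions executed in the interval
    cache_events: int  # cache misses/hits observed in the interval


def classify_interval(c: IntervalCounters,
                      ipc_high: float = 0.8,
                      ipc_low: float = 0.2,
                      mem_per_kcycle_high: float = 50.0,
                      mem_per_kcycle_low: float = 5.0) -> str:
    """Label an interval using the slide's heuristics (hypothetical thresholds)."""
    if c.cycles <= 0:
        raise ValueError("cycles must be positive")
    ipc = c.instructions / c.cycles
    mem_rate = 1000.0 * c.cache_events / c.cycles  # cache events per 1000 cycles
    if ipc >= ipc_high:
        return "cpu-intensive"
    if mem_rate >= mem_per_kcycle_high:
        return "memory-intensive"
    if ipc <= ipc_low and mem_rate <= mem_per_kcycle_low:
        return "possible-io-wait"  # low memory activity and low IPC
    return "mixed"
```

As the slide notes, driver-level software counters (for example, bytes sent and received) give a more reliable I/O signal than this kind of inference.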
Model Coefficients
$E = X_1 a_1 + X_2 a_2 + X_3 a_3 + \cdots$
$X_i$: independent variables; $a_i$: coefficients
• Estimate coefficients using least squares linear regression (LSQ)
  • Stable and simple
  • Linearity assumption
LSQ Model: Which variables?
Only major variables:
+ Efficient, clear
+ Easier to understand
- Less accurate
All related variables:
+ More accurate
- Run-time overhead
- Modeling difficulties due to variable dependencies
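The offline coefficient fit can be sketched as an ordinary least-squares solve via the normal equations $(X^T X)\,a = X^T E$, where each row of $X$ holds one interval's counter values and $E$ the measured energies. This is a pure-Python illustration under that assumption; production code would prefer a QR- or SVD-based solver (for example NumPy's `numpy.linalg.lstsq`), which is numerically safer when variables are nearly collinear.

```python
def solve(A, b):
    """Solve A x = b for square A by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        # Crude absolute threshold: catches exactly singular systems,
        # e.g. perfectly collinear variables.
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear variables?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_lsq(X, E):
    """Return coefficients a minimizing sum_k (E[k] - X[k] . a)^2."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    XtE = [sum(row[i] * e for row, e in zip(X, E)) for i in range(p)]
    return solve(XtX, XtE)
```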
Parameter Selection & Dependencies
• Hard to include all variables: too many parameters clutter the model
• Parameter dependencies -> unstable parameter estimates
  • E.g., $\text{Volume} = a_0 + a_1 \cdot \text{pounds} + a_2 \cdot \text{grams}$
• Work-around is non-trivial; HPM characteristics, e.g.:
  • TLB miss -> more CPU cycles and cache misses
  • Memory stall -> fewer instructions executed
Multicollinearity!
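The pounds/grams example can be made concrete. In this hypothetical sketch the true relationship is Volume = 2 × pounds (with $a_0 = 0$); because grams is an exact multiple of pounds, three very different coefficient vectors produce the same predictions, so least squares has no unique answer and small measurement errors can swing the estimates between them.

```python
GRAMS_PER_POUND = 453.59237


def predict(a, pounds):
    """Volume = a0 + a1 * pounds + a2 * grams, with grams derived from pounds."""
    a0, a1, a2 = a
    grams = pounds * GRAMS_PER_POUND
    return a0 + a1 * pounds + a2 * grams


pounds = [0.5, 1.0, 2.0, 3.5]
candidates = [
    (0.0, 2.0, 0.0),                            # all weight on pounds
    (0.0, 0.0, 2.0 / GRAMS_PER_POUND),          # all weight on grams
    (0.0, 1002.0, -1000.0 / GRAMS_PER_POUND),   # large, opposite-signed pair
]
for a in candidates:
    # Every candidate yields the same predictions (up to rounding).
    print(a, [round(predict(a, p), 9) for p in pounds])
```

The same effect arises, less exactly, among correlated HPM events such as TLB misses, cycles, and cache misses.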
Run-Time Energy Estimation
Model variables by workload type and model complexity:
Computation:
  • Simple: Core Clock Cycles, Data Stalls
  • Complex: Core Clock Cycles, Instruction Cache Misses, Instructions Not Delivered, Data Stalls, ITLB Misses, DTLB Misses
Communication:
  • Simple: Core Clock Cycles, Bytes Transmitted, Bytes Received
  • Complex: Core Clock Cycles, Bytes Transmitted, Bytes Received, Packets Transmitted, Packets Received
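One way to encode the four model variants is a table from (complexity, workload type) to an ordered tuple of counters. This is a hypothetical sketch: the counter identifiers are illustrative names, not XScale PMU event codes, and the byte/packet counts are assumed to come from driver-level software counters.

```python
# Hypothetical encoding of the four model variants (counter names are illustrative).
MODEL_FEATURES = {
    ("simple", "computation"): ("core_cycles", "data_stalls"),
    ("simple", "communication"): ("core_cycles", "bytes_tx", "bytes_rx"),
    ("complex", "computation"): ("core_cycles", "icache_misses", "inst_not_delivered",
                                 "data_stalls", "itlb_misses", "dtlb_misses"),
    ("complex", "communication"): ("core_cycles", "bytes_tx", "bytes_rx",
                                   "packets_tx", "packets_rx"),
}


def feature_vector(sample: dict[str, int], complexity: str, workload: str) -> list[int]:
    """Select, in a fixed order, the counters used by one model variant.

    Raises KeyError if the variant is unknown or a required counter is missing.
    """
    names = MODEL_FEATURES[(complexity, workload)]
    return [sample[name] for name in names]
```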
Run-Time Model Improvement
• Global coefficients
  • Computed using the offline model
• Continuously update coefficients
  • Improve using the most recent data
  • Gradually phase out previous measurements
• Recursive least squares with exponential decay
  • Smaller decay factor -> more agile
Global Coefficients -> Measure Power -> Update with RLS
Model parameters: decay factor, update period, measurement error
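The run-time update can be sketched as standard recursive least squares with an exponential forgetting (decay) factor `lam`, starting from the offline (global) coefficients. The `RLS` class name, the initial covariance scale `delta`, and the default values are assumptions for illustration, not the authors' implementation.

```python
class RLS:
    """Recursive least squares with exponential forgetting factor lam (0 < lam <= 1)."""

    def __init__(self, theta0, lam=0.9, delta=1e3):
        n = len(theta0)
        self.theta = list(theta0)  # start from offline (global) coefficients
        self.lam = lam
        # P tracks the inverse of the weighted X^T X; a large delta means
        # low initial confidence in theta0 (an assumed tuning value).
        self.P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]

    def predict(self, x):
        return sum(t * xi for t, xi in zip(self.theta, x))

    def update(self, x, measured):
        n = len(x)
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(x[i] * Px[i] for i in range(n))
        k = [v / denom for v in Px]        # gain vector
        err = measured - self.predict(x)   # a-priori prediction error
        self.theta = [t + ki * err for t, ki in zip(self.theta, k)]
        # P <- (P - k x^T P) / lam; x^T P equals (P x)^T because P is symmetric.
        self.P = [[(self.P[i][j] - k[i] * Px[j]) / self.lam for j in range(n)]
                  for i in range(n)]
        return err
```

Here `x` would be the counter totals over one update period and `measured` the battery monitor's energy reading for the same period. A smaller `lam` forgets old measurements faster and adapts more quickly; with noisy readings, a value close to 1 averages errors out at the cost of slower adaptation.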
Feedback Source: DS 2760
• Measures current flow in and out of the battery
• Internally: a small A/D converter attached to a high-precision internal resistor
Pros/Cons:
+ Highly available, e.g., iPAQs, sensor network gateways, cell phones
- Not precise enough for monitoring task energy consumption: 0.25 mAh error in each reading
- Slow, one-wire serial interface
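The 0.25 mAh per-reading error can be converted to energy with $E = Q \cdot V$. The sketch below assumes a nominal battery voltage of 3.7 V (not stated in the talk) and one reading's worth of error per measurement window; differencing two accumulated readings could double it. Under these assumptions one reading's error is about 3.33 J, so keeping the relative error below 5% requires windows that consume at least about 66.6 J, which is why feedback is taken over large intervals.

```python
MAH_TO_COULOMB = 3.6  # 1 mAh = 1e-3 A * 3600 s = 3.6 C


def reading_error_joules(err_mah: float = 0.25, volts: float = 3.7) -> float:
    """Energy uncertainty of one reading; the 3.7 V default is an assumption."""
    return err_mah * MAH_TO_COULOMB * volts


def min_window_energy(target_rel_err: float, err_mah: float = 0.25,
                      volts: float = 3.7) -> float:
    """Smallest window energy (J) whose relative error stays <= target_rel_err,
    assuming one reading's worth of error per window."""
    return reading_error_joules(err_mah, volts) / target_rel_err
```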
Stargate and Our Evaluation Bench
[Figure: evaluation bench; labeled components: PowerTool, VPerfmon, VPMon, high-precision data acquisition device, programmable power supply]
Methodology
• Collect energy consumption periodically: every 10 million instructions (a so-called interval)
• Validate model accuracy on imprecise measurement data
  • Inject uniformly distributed random error
  • Evaluate various precision (error) levels: 1X to 8X
  • Predict energy consumption of each interval
  • Continuously improve model parameters every K intervals (10M × K instructions)
• Use a large set of workloads
  • Computational benchmarks
  • Computation- and communication-oriented benchmarks
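The error-injection step can be sketched as follows. The sketch assumes that precision level NX divides a base error bound by N, and that each feedback measurement covers K consecutive 10M-instruction intervals; `noisy_feedback` and `base_err` are hypothetical names, and trailing intervals that do not fill a window are dropped.

```python
import random


def noisy_feedback(interval_energies, k, base_err, precision, rng=None):
    """Yield one perturbed energy measurement per window of k intervals.

    Assumption: precision level "NX" shrinks the uniform error bound to base_err / N.
    """
    rng = rng or random.Random(0)
    bound = base_err / precision
    for start in range(0, len(interval_energies) - k + 1, k):
        true_window = sum(interval_energies[start:start + k])
        yield true_window + rng.uniform(-bound, bound)  # uniformly distributed error
```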
Static vs. Adaptive Models
[Figure: prediction error (%) per benchmark for the static model and the adaptive model (decay factor 0.9); y-axis: Error %, 0-20%; x-axis: bisort, em3d, gsmdecode, gsmencode, jpegdecode, jpegencode, life, mpeg2decode, mpeg2encode, pvkx, bpvnx, treeadd, AVERAGE]
Average Error Rates
Error rates and interval sizes (simple model), by measurement precision:
Interval size | 1X    | 2X    | 4X    | 8X    | Best
100           | 53.3% | 29.9% | 14.5% | 16.9% | 3.8%
200           | 68.8% | 24.1% | 12.9% | 7.3%  | 2.7%
400           | 33.0% | 22.0% | 9.1%  | 7.7%  | 2.8%
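The talk does not define the averaging metric behind these tables. A common choice, assumed in this sketch, is the mean of per-interval absolute relative errors $|\hat{E} - E| / E$, where $\hat{E}$ is the predicted and $E$ the measured energy.

```python
def average_error_rate(predicted, measured):
    """Mean absolute relative error (assumed metric); measured values must be positive."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(abs(p - m) / m for p, m in zip(predicted, measured)) / len(measured)
```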
Average Error Rates: Complex Model
Interval size | 8X (simple) | 8X (complex) | Best (simple) | Best (complex)
100           | 16.9%       | 28.1%        | 3.8%          | 4.3%
200           | 7.3%        | 33.3%        | 2.7%          | 3.8%
400           | 7.7%        | 24.0%        | 2.8%          | 4.1%
Measurement imprecision reduces complex model quality more than simple model quality!
Related Work
• High-end CPU power models
  • Define CPU component access rates using HPM access heuristics
  • Power consumption of OS calls as a function of IPC
• Embedded CPU power models
  • Five HPM counters for XScale
  • Also evaluated a memory model
• Memory models
  • UltraSPARC memory subsystem
• All of the above are static models
• Power profiling setups
  • PowerScope
Summary & Conclusions
• Our goal: an accurate, efficient run-time power profiling system
  • Hardware counters are key
  • Define software counters for I/O
  • Smart battery monitors expose dynamics in power behavior
  • We propose a hybrid system that combines both
• Lessons learned
  • Dynamic models are much better than static ones in power modeling
  • Models should decay old measurements conservatively when measurement errors are present
  • Measurement errors in the presence of multicollinearity can be deadly
Backup Slides
Decay Factor vs. Accuracy
[Figure: prediction error (%) per benchmark (bisort through treeadd, plus AVERAGE) for decay factors 0.5, 0.7, 0.9, and 1; y-axis: Error %, 0-120%]
Execution Cost
[Figure: execution cost in milliseconds (0-40) for the compact computation, compact network, complex network, and complex computation models, broken down into BMU access and processing]
Benefit from an Offline Profiler
[Figure: prediction error (%) per benchmark (bisort through treeadd, plus AVERAGE) with and without offline profiling; y-axis: Error %, 0-20%]
Power-Aware Execution: Case Study
[Figure: energy (Joules, 0-160) of speech recognition execution plans: Baseline, Local Reduced, Remote Reduced; source: Flinn'01]
Requires run-time power prediction of different
execution plans!
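The decision this case study calls for can be sketched as picking the plan with the lowest predicted energy among those the application allows. The `choose_plan` function and its `allowed` parameter are hypothetical; the predicted energies are assumed to come from the run-time model.

```python
def choose_plan(predicted_energy: dict[str, float], allowed=None) -> str:
    """Return the admissible plan with the lowest predicted energy."""
    candidates = {plan: e for plan, e in predicted_energy.items()
                  if allowed is None or plan in allowed}
    if not candidates:
        raise ValueError("no admissible execution plan")
    return min(candidates, key=candidates.get)
```

Restricting `allowed` models cases where some plans (for example, a remote plan without connectivity) are not currently usable.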