Upload
elijah-mathews
View
214
Download
0
Tags:
Embed Size (px)
Citation preview
ECE 692 2009 1
Power Control for Chip Multiprocessors
Xue Li
Oct 27, 2009
ECE 692 2009 2
Outline
• Two ways to control power of chip multiprocessors– MPC control with online model estimation– Simple closed loop control with risk evaluation
ECE 692 2009 3
Temperature-Constrained Power Control for Chip Multiprocessors with
Online Model Estimation
Yefu Wang, Kai Ma, Xiaorui Wang
ECE 692 2009 4
Introduction
• Power and thermal are the major constraints for further throughput improvement of CMP– Peak power consumption of a CMP should be
controlled to enable higher computing densities.– The temperature of a CMP should be kept lower than
a threshold in case of thermal failures.– Performance delivered per watt needs to be
maximized.
ECE 692 2009 5
State of the Art
• Power control for CMP– Open-loop search or optimization [Isci’06],
[Teodorescu’08], etc.• Highly dependent on the accuracy of the system model
– Heuristics [Isci’06], [Meng’08], etc.• No theoretical guarantee of control accuracy/stability
– Chip-wide DVFS (Dynamic Voltage and Frequency Scaling) [McGowen’06], [Floyd’07], etc.
• Suboptimal in performance
• Dynamic thermal management– Heuristics or feedback control theory [Brooks’01],
[Skadron’03], etc.• Power and temperature are controlled separately
ECE 692 2009 6
Challenges and Solutions
• Multiple cores may need to be manipulated simultaneously to control both power and temperature.
Multi-Input-Multi-Output (MIMO) control
• Optimal control algorithms need to be designed for power shifting among different cores.
Model predictive control (MPC) theory
• Different cores may be coupled together.
Specific design constraints
• Workload is unpredictable at design time.
Online parameter estimation
• Control accuracy and system stability is critical
Theoretically guaranteed control performance and stability
ECE 692 2009 7
Temperature-Constrained Power Control
• MIMO control loop invoked periodically– Power monitor sends the chip-level power consumption to the controller– Controller reads temperature and performance metrics of each core – Controller computes new DVFS levels based on MPC control theory– New per-core DVFS levels are sent to the cores– Online model estimator updates the power model
ECE 692 2009 8
Steps of Model Predictive Control
• System modeling– Power model
• Controller design– MPC controller design– Constrains:
• Frequency range • Power budget • Temperature • Other design requirements
• System stability analysis
ECE 692 2009 9
Steps of Model Predictive Control
• System modelingSystem modeling– Power modelPower model
• Controller design– MPC controller design– Constrains:
• Frequency range • Power budget • Temperature • Other design requirements
• System stability analysis
ECE 692 2009 10
System Modeling: Power Model (1)
• Power consumption of one core
iiii ckfakp )()( )()()1( kfakpkp iiii
A • Estimated system parameters • Initial value can be defined by system identification• May change for different workloads• Can be updated by online estimation
p(k +1) = p(k) + AΔf(k)
ECE 692 2009 11
1
1
( )
( 1) ( )
( )N
N
f k
cp k cp k a a
f k
System Modeling: Power Model (2)
• Total power consumption of CMP
• Power model validation
Total power consumption of the chip
ECE 692 2009 12
Steps of Model Predictive Control
• System modeling– Power model
• Controller designController design– MPC controller designMPC controller design– Constrains: Constrains:
• Frequency range Frequency range • Power budget Power budget • Temperature Temperature • Other design requirementsOther design requirements
• System stability analysis
ECE 692 2009 13
Controller Design: MPC Controller
• Control objective: minimize the cost function
1
0
22
)(1
)()()(M
iiQ
P
i
kikrefkikcpkVR(i)maxF)kif(k)kiΔf(k
Control accuracy Performance optimization
Model prediction
Measured from power meter:
feedback
( | ) ( ( ))ref
Ti
T
s sref k i k P e P cp k
ECE 692 2009 14
Controller Design: Constraints (1)
• Physical frequency range
• Power budget for each core
• Other design requirements
min, max,( 1)j j jF f k F (1 )j N
( ) scp k P
( 1) ( 1)i jf k f k (1 , )i j N
ECE 692 2009 15
Controller Design: Constraints (2)
• Model between temperature and frequency– Temperature & power
– Power & frequency
( ) ( )i i i ip k a f k c
• Temperature constraint
( 1)i it k T
( 1) ik s iB f
t tt(k +1) = A t(k) +B p(k)
tΔt(k) = A Δt(k -1) +BΔf(k -1)
ECE 692 2009 16
Steps of Model Predictive Control
• System modeling– Power model
• Controller design– MPC controller design– Constrains:
• frequency range • power budget • temperature • other design requirements
• System stability analysisSystem stability analysis
ECE 692 2009 17
Controller Design: Stability Analysis
• Stability: – Converge to desired bounds from any initial condition
• Unknown system gain: – Actual system parameter , estimating system parameter
– The bigger range, the better system adaptability.
• The system is proved to be stable in a wide range– Uniform workload
• 0< g ≤ 8.83
– Different workload• 0 < g1 ≤ 15.7• 0 < g2 ≤ 17.6
The model can work as long as the real parameter
of a system is less than 8.83 times of the value
used to design the system.
A =GAA A
gG I
ECE 692 2009 18
Online Model Estimation
• Recursive Least Square (RLS) estimator to update the model periodically
– RLS estimator records and – The estimator calculates and– The estimator updates with in the system model
( )cp k e eA f (k) 1 Na a ceA (k)
Tef (k) = f(k) 1
f(k)
ef (k) eA (k)
A ia
( )cp k
ECE 692 2009 19
System ImplementationPower lines
(Current signal)
Current probe(1mv/A)
USB interface
Physical Testbed Simulation
CPU Intel Xeon X5365 Alpha 21264 like
Cores 4 4, 8, 16
Power Monitor Digital MultiMeter Wattch
Temp Sensor Coretemp driver
Controller Software
Workload SPEC CPU 2006
ECE 692 2009 20
Experimentation
• Baselines
• Empirical results– Control accuracy– Application performance– Temperature constraints– Online model estimator
• Simulation results– Control accuracy– Application performance
ECE 692 2009 21
Experimentation
• Baselines
• Empirical results– Control accuracy– Application performance– Temperature constraints– Online model estimator
• Simulation results– Control accuracy– Application performance
ECE 692 2009 22
Baselines
• Priority– Per-core DVFS– Heuristic based
• Power > budget
DVFS decreases by 1 • Power < budget
DVFS increases by 1
• Improved priority– Priority with safety
margin
• MaxBIPS– Per-core DVFS– Predictive based: uses
a typical workload to build a static table offline
– Exhaustive search from combination of DVFS levels for all cores
Workload sensitive
ECE 692 2009 23
Baselines: MaxBIPS
• Define two N*M matrices: Power and BIPS
– N: number of cores
– M: number of power modes
• Fill in the matrices with actual and predictive values
– Power: cubic scaling
– BIPS: linear scaling
• Find out the power and core combination to achieve best BIPS under power budget
Core1 Core2
Mode1 20 16
Mode2 17 14
Core1 Core2
Mode1 80 60
Mode2 69 52
PowerMatrix
BIPSMatrix
Actual value
317.15 20 0.95
Mode Power Savings
Performance Degradation
Mode1 None None
Mode2 15% 5%
BIPS Power
80+60=140 20+16=36
80+52=132 20+14=34
69+60=129 17+16=33
69+52=121 17+14=31
If power budget is 32, last one
will be selected
ECE 692 2009 24
Experimentation
• Baselines
• Empirical results– Control accuracy– Application performance– Temperature constraints– Online model estimator
• Simulation results– Control accuracy– Application performance
ECE 692 2009 25
Empirical Results: Control Accuracy (1)
• Comparison of steady state errors– Steady state error: violation of power budget at
different power level.– MPC follows the set point well.
ECE 692 2009 26
Empirical Results: Control Accuracy (2)
• MPC V.S. MaxBIPS / Priority / Improved Priority Much lower than
the set point
Fits well
Oscillates around the set point
Exceeds the budget at times
ECE 692 2009 27
Empirical Results: Application Performance
• SPEC performance between MPC, MaxBIPS and improved priority under different power budgets.
– MPC achieves better performance because MPC can precisely achieve the set-point power.
– Average improvement of MPC is 9.69% over MaxBIPS and 8.95% over Improved Priority.
ECE 692 2009 28
Empirical Results: Temperature Constraints
• Emulate a thermal emergency by lowering the temperature constraint– Figure (a) shows that the temperature of cores are
quickly constrained to the lower bound.– Figure (b) shows that the temperature constraints
works effectively to reduce power consumption.
ECE 692 2009 29
Empirical Results: Online Model Estimator
• MPC V.S. MPC with estimator– Workload may change significantly at run time.– Estimator can correct system parameters
dynamically.– MPC without estimator suffers large oscillations.
ECE 692 2009 30
Experimentation
• Baselines
• Empirical results– Control accuracy– Application performance– Temperature constraints– Online model estimator
• Simulation results– Control accuracy– Application performance
ECE 692 2009 31
Simulation Results: Control Accuracy
• Simulation with more cores (4, 8, 16)– Average power and standard deviation of
different control method.• MPC precisely converges to the budget.
MaxBIPS’ absence of 16 due to
exponentially increase of static prediction table
ECE 692 2009 32
Simulation Results: Application Performance
• SPEC benchmark performance comparison under different number of cores (Set point = 95%, 85%)
ECE 692 2009 33
Conclusion
• A temperature-constrained chip-level power controller– Designed based on MPC control theory– Accurately controls power consumption– Temperatures of the cores are limited to stay below the
constraint.– An online model estimator periodically updates the system
model
• Compared with state-of-the-art work– More accurate power control– Better application performance
ECE 692 2009 34
Multi-Optimization Power Management for Chip
Multiprocessors
Ke Meng, Russ Joseph, Robert P. Dick
Northwestern University
Li Shang
University of Colorado
ECE 692 2009 35
Introduction
• Power is still a first-class design constraint in CMP era.– Higher transistor density– Higher leakage power
• Power is still a precious computing resource– When power is limited, maximizing the chip-
wide performance requires global and local coordination.
High power density
ThermalIssues
ECE 692 2009 36
System Framework
Select power optimizations and allowable power modes
Collect data from sensors and counters; calculate power /performance.
Analyze, search and tune
Soft-limit budget
ECE 692 2009 37
Optimization Pool (1)
• DVFS– Simple models
• Frequency: linear with voltage• Power: changes cubically with voltage• Performance: roughly linear with frequency
– High efficiency• Cubical relationship between frequency and power
ECE 692 2009 38
Optimization Pool (2)
• Cache resizing– Large leakage: big savings– Workload variety: unused private capacity
ECE 692 2009 39
Models and Experimentations
• Models– Dynamic voltage / frequency scaling (DVFS)
– Cache resizing
– Unified analytic models
– Risk evaluation
– Search algorithms
• Experimentation– Configuration
– Model validation
– Model evaluation
– Power violation
ECE 692 2009 40
Models and Experimentations
• ModelsModels– Dynamic voltage / frequency scaling (DVFS)Dynamic voltage / frequency scaling (DVFS)
– Cache resizingCache resizing
– Unified analytic modelsUnified analytic models
– Risk evaluationRisk evaluation
– Search algorithmsSearch algorithms
• Experimentation– Configuration
– Model validation
– Model evaluation
– Power violation
ECE 692 2009 41
Analytic Models: DVFS
• DVFS modeling – CPI stack counters: counts computing stalls
and L2 miss stalls• Computing stalls: changes with frequency• L2 miss stalls: constant in spite of frequency
– Performance model
iCPU
iMem
( )ji ii
j ii i
FreqCPU Mem
FreqPerf Perf
CPU Mem
Power: Cubic with
frequency
j jCPU Mem
ECE 692 2009 42
Analytic Models: Cache Resizing
• Cache resizing modeling– Non-stall cycles– Stall cycles due to cache misses
jj i
i
MissCPI BaseCPIPerf Perf
MissCPI BaseCPI
MissCPI
BaseCPI
Power: Average leakage power of a cache way times number of active
ways
ECE 692 2009 43
Analytic Models: Unification
• Unified analytic models with DVFS and cache resizing– Performance
• Weak interaction among multiple optimization allow independent speed-ups
– Power• DVFS has a strong influence • Additive contribution of cache resizing
jk
ki
PerfOptSpeedup
Perf
iOptPower
3( ) ( )jj i kk i
FreqPower Power OptPower
Freq
ECE 692 2009 44
Analytic Models: Risk Evaluation
• Why to do risk evaluation?– Some optimizations are more prone to phase
adjustment. – Severe performance loss and power violation.
• How to do risk evaluation?– DVFS: assume zero risk.– Cache resizing: cache activities variation
threshold.
ECE 692 2009 45
• Brute-force search– Traverse all possible power modes– Always find the best combination– Slow when search space are large
• Greedy search– Take currently best step available
• Current best step: power mode with the maximal delta power/performance ratio.
– Fast– Can get stuck in local minima
Results show it happens rarely
Analytic Models: Search Algorithms(1)
ECE 692 2009 46
Models and Experimentations
• Models– Dynamic voltage / frequency scaling (DVFS)
– Cache resizing
– Unified analytic models
– Risk evaluation
– Search algorithms
• ExperimentationExperimentation– ConfigurationConfiguration
– Model validationModel validation
– Model evaluationModel evaluation
– Power violationPower violation
ECE 692 2009 47
Experiment: Configuration
Processor Setup
Cores 4 Alpha21264-like cores
L1 I/D Cache 64KB 2-way private 64B blocks
L2 Cache 2MB 8-way private 128B blocks
Tech node 65 nm
DVFS Range 85%, 90%, 95%, 100% of 3GHz
Group No. Workloads Stability
Group A equake, swim, sixtrack, gcc Moderate
Group B applu, gap, facerec, vortex Moderate
Group C mesa, eon, lucas, wupwise Stable
Group D art, mcf, parser, vpr Un-stable
ECE 692 2009 48
Experiment: Model Validation
• Cache CPI model validation
ECE 692 2009 49
Experiment: Model Evaluation (1)
• Modeling-greedy vs. modeling-global / trial-and-error– Trial-and-error (DVFS + cache re
sizing):• Starting trial-stage when entering a s
table phase• Only works with workloads possessi
ng stable phases (Group C).
– Analytical modeling (DVFS + cache resizing):
• 8% perf loss vs. 35% power saving• Greedy search works extremely well
ECE 692 2009 50
Experiment: Model Evaluation (2)
• Modeling with risk management vs. MaxBIPS– Simple (DVFS + cache resizing):
Analytical modeling without risk evaluation.
– With risk evaluation:Results either better or almost unchanged.
– MaxBIPS (only DVFS): Not always the worst.Difficult to manage multiple optimizationsEven with risk evaluation, errors can be made before risk being identified.
ECE 692 2009 51
Conclusion
• Power problem is critical in CMP.
• CMP power management must coordinate global and local power usage.
• Analytical modeling are more favorable than trial-and-error.
• Risk evaluation is necessary to avoid frequent prediction errors.
ECE 692 2009 52
Comparison
1st paper 2nd Paper
Controlling Predictive based Heuristic based
Power budget Hard limit Soft limit
Temperature management
Yes No
L2 cache involvement
No Yes
Hardware implementation
Yes No
ECE 692 2009 53
Critiques
• First paper– Temperature constraint seems much higher
than normal working condition.– Explanation of in temperature constraints is
not very clear.
• Second paper– Modeling accuracy is low.– No absolute guarantee of power consumption.– Too many arbitrary assumptions.
is
ECE 692 2009 54
Thank you