An Analysis of Efficient Multi-Core Global Power Management Policies
Authors: Canturk Isci†, Alper Buyuktosunoglu†, Chen-Yong Cher†, Pradip Bose† and Margaret Martonosi
The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)
Speaker: Jun Shen
Agenda
Background
Motivation
Contributions
Details of the contributions
– Overview of the global power management policy
– Brief of the simulation
– Comparison of the different policies
– Evaluation methodology
– Three neglected issues
Advantages and drawbacks of the paper
The relationship between the paper and the course
The impact of the paper on the project
Q&A
Background
Multicore architectures are increasingly popular and widespread because of the well-known “walls”
Power and temperature problems are becoming increasingly critical
Motivation
To answer two “how” questions:
– How to enforce a power budget through a global power manager?
– How to minimize power given a performance target?
Contributions
Primarily three contributions:
– The creation of a global power manager (PM)
– A fast static power management analysis tool
– Evaluation of different PM policies (with different focuses, such as prioritization, fairness and throughput)
Overview of global PM (1)
Why do we need a global power manager?
– To exploit the widely known variability in the demand and characteristics of workloads (e.g. across threads and cores)
– To coordinate the adaptive actions of the individual cores under a given power budget
Overview of global PM (3)
Some preconditions
Each core
– has its own dynamic controller
– has its own power-performance monitor (e.g. current monitor, performance-monitoring counter hardware)
– can run in multiple power modes
Overview of global PM (4)
The PM's work loop:
– The PM periodically collects power-performance data from the local monitors
– The PM reports the data to the OS
– The OS returns the power budget, thread affinities, and high-level scheduling and load-balancing plan to the PM
– The PM decides the power mode of each core based on this information
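The loop above can be sketched in a few lines of Python. This is only an illustration of the data flow, not the paper's implementation: the `Sample` and `OsPlan` types, the three callbacks, and the mode names `"turbo"` and `"eff"` are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Sample:
    """Power-performance data from one core's local monitor (hypothetical fields)."""
    power_w: float
    bips: float


@dataclass
class OsPlan:
    """What the OS hands back to the PM; this sketch only uses the budget."""
    power_budget_w: float
    affinities: Dict[int, int] = field(default_factory=dict)


def pm_iteration(
    read_monitors: Callable[[], List[Sample]],
    report_to_os: Callable[[List[Sample]], OsPlan],
    choose_modes: Callable[[List[Sample], OsPlan], List[str]],
) -> List[str]:
    """Run one period of the global PM loop and return one mode per core."""
    samples = read_monitors()           # PM collects data from the local monitors
    plan = report_to_os(samples)        # PM reports to the OS, OS returns its plan
    return choose_modes(samples, plan)  # PM decides the power mode of each core


def toy_policy(samples: List[Sample], plan: OsPlan) -> List[str]:
    """Hypothetical rule: keep turbo if the chip fits the budget, else slow all cores."""
    total = sum(s.power_w for s in samples)
    mode = "turbo" if total <= plan.power_budget_w else "eff"
    return [mode] * len(samples)


modes = pm_iteration(
    lambda: [Sample(20.0, 2.0), Sample(15.0, 1.0)],
    lambda samples: OsPlan(power_budget_w=30.0),
    toy_policy,
)
print(modes)  # ['eff', 'eff'] because 35 W exceeds the 30 W budget
```

Passing the three steps as callbacks keeps the loop itself independent of how the PM is physically implemented.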
Overview of global PM (5)
Implementation options for the PM:
– A separate on-die microcontroller with Foxton-like underlying monitors
– A separate helper daemon on a dedicated core
– A low-level hypervisor-like program interface
Brief of simulation (1)
Based on the IBM Turandot simulator
Power statistics from IBM PowerTimer
The list of core parameters
Brief of simulation (2)
Use single-threaded Turandot results for each power-mode simulation
Simulate a multicore by progressing simultaneously over Turandot traces, where each trace is the execution of a different benchmark
Validate the simulation against a cycle-accurate full CMP implementation of Turandot (???)
Brief of simulation (3)
New ideas:
– Time-driven L2
– Thread synchronization to handle the multiple-clock-domain mode
Experimental results:
– Simulated power deviates from the CMP implementation by less than 5%
– Performance variation in [9%, 30%]
Note: the upper bound is reached with a highly memory-bound application
Brief of simulation (4)
Core power modes
– Target: a PowerSavings : PerformanceDegradation ratio of 3 : 1
– Comparison between the target and the experimental estimates
Global PM policies (1)
Introduction to the policies
Priority: every core has a predefined priority; a core with higher priority runs at a higher voltage and therefore achieves higher throughput
Power balancing: try to equalize the power consumption of all cores
Throughput optimization: pick the combination of power modes that maximizes throughput under the power budget
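As a rough illustration of the throughput optimization policy, the sketch below searches every combination of per-core power modes and keeps the one with the highest total BIPS that still fits the budget. The exhaustive search, the function name, and the example numbers are assumptions made for this illustration, not details taken from the paper.

```python
from itertools import product


def max_throughput_modes(power, bips, budget):
    """Pick one mode per core to maximize total BIPS within the budget.

    power[c][m] and bips[c][m] are the estimated power and throughput of
    core c in mode m. Returns a tuple of mode indices, or None if no
    combination fits the budget.
    """
    n_cores = len(power)
    n_modes = len(power[0])
    best_combo, best_bips = None, float("-inf")
    for combo in product(range(n_modes), repeat=n_cores):
        total_power = sum(power[c][m] for c, m in enumerate(combo))
        if total_power > budget:
            continue  # this combination overshoots the budget
        total_bips = sum(bips[c][m] for c, m in enumerate(combo))
        if total_bips > best_bips:
            best_combo, best_bips = combo, total_bips
    return best_combo


# Hypothetical two-core, three-mode example (mode 0 is the fastest).
power = [[10.0, 8.0, 6.0], [20.0, 17.0, 12.0]]
bips = [[1.0, 0.95, 0.85], [2.0, 1.9, 1.7]]
print(max_throughput_modes(power, bips, budget=25.5))  # (1, 1)
```

The search space grows as modes^cores, so this brute-force form is only practical for a small number of cores and modes.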
Global PM policies (2)
Chip-wide DVFS: an alternative
Advantage: simple implementation (no synchronization across cores)
Disadvantages:
– High penalty for a small power overshoot when there are few power modes
– Large performance deviation across different types of tasks
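The first disadvantage can be seen in a small sketch. Chip-wide DVFS must move all cores to the same mode, so a slight overshoot at one mode forces the whole chip down to the next one. The function name and the numbers below are hypothetical.

```python
def chip_wide_mode(power, budget):
    """Return the fastest single mode that keeps the whole chip within budget.

    power[c][m] is the estimated power of core c in mode m, with modes
    ordered from fastest (index 0) to slowest. Returns None if even the
    slowest mode overshoots.
    """
    n_modes = len(power[0])
    for mode in range(n_modes):
        if sum(core[mode] for core in power) <= budget:
            return mode
    return None


# Hypothetical two-core, three-mode example.
power = [[10.0, 8.0, 6.0], [20.0, 17.0, 12.0]]

# All-turbo needs 30 W. With a 29 W budget the 1 W overshoot pushes both
# cores to mode 1, which uses only 25 W and leaves 4 W of budget unused.
print(chip_wide_mode(power, budget=29.0))  # 1
```

A per-core policy could instead slow down only one core in this situation, which is the motivation for the global PM policies above.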
Evaluation Methodology (1)
Proposed evaluation methodology
– Policy curve: overall performance degradation under several budgets (w.r.t. all-turbo execution)
– Budget curve: the percentage of power consumed (under one specific policy) plotted against the original power budget
Evaluation Methodology (4)
Other issues:
– What about fairness? Some cores may always get the full budget while others always starve
– Some fairness metrics: weighted speedup, harmonic mean of thread speedups
– Weighted slowdown = harmonic mean of the individual speedups w.r.t. turbo execution (the harmonic mean is dominated by the slowest threads, so it stresses unfairness the most)
– Speedup = performance with enhancement / baseline performance; in this paper this is actually a slowdown
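A minimal sketch of the fairness metric, assuming each thread's speedup is its performance under the policy divided by its all-turbo performance (so values are at most 1). The function names and numbers are illustrative.

```python
def speedup(perf, baseline_perf):
    """Speedup = performance with enhancement / baseline performance."""
    return perf / baseline_perf


def harmonic_mean(values):
    """Harmonic mean; small values pull the result down strongly."""
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical case: one thread keeps full speed, the other runs at half speed.
speedups = [speedup(2.0, 2.0), speedup(0.5, 1.0)]
arithmetic = sum(speedups) / len(speedups)
harmonic = harmonic_mean(speedups)
print(arithmetic, harmonic)  # 0.75 versus roughly 0.667
```

The harmonic mean comes out lower than the arithmetic mean whenever the speedups differ, which is why it penalizes policies that starve some threads.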
Issues (1)
How can we learn the power/performance behavior of applications?
– Careful exploration: try small-scale power changes?
  Not suitable for aggressive adaptation policies
– Build a profile of each application from past experience?
  Not always reliable
Issues (2)
A new solution:
– Rationale: an application's behavior at another DVFS setting can be estimated analytically with reasonable accuracy
– How to do it:
  Set up a core × power-mode matrix
  Power has a cubic relationship with the scaling ratio
  BIPS has a linear relationship with the scaling ratio
  Frequency has a linear dependency on voltage
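The analytical estimate can be sketched as follows. Dynamic power scales as V²f, and with frequency linear in voltage this gives the cubic dependence on the scaling ratio. The scaling ratios 1.0, 0.95 and 0.85 and the turbo measurements below are assumptions chosen for illustration, not values taken from the slides.

```python
def estimate_at_ratio(turbo_power, turbo_bips, ratio):
    """Estimate power and BIPS at a DVFS setting from turbo measurements.

    ratio is the voltage/frequency scaling ratio relative to turbo.
    Power is modeled as cubic in the ratio and BIPS as linear in it.
    """
    return turbo_power * ratio ** 3, turbo_bips * ratio


def build_matrices(turbo_samples, ratios):
    """Build the core x power-mode matrices of estimated power and BIPS."""
    power = [[estimate_at_ratio(p, b, r)[0] for r in ratios]
             for p, b in turbo_samples]
    bips = [[estimate_at_ratio(p, b, r)[1] for r in ratios]
            for p, b in turbo_samples]
    return power, bips


RATIOS = [1.0, 0.95, 0.85]             # hypothetical power modes
TURBO = [(20.0, 2.0), (10.0, 1.0)]     # hypothetical (watts, BIPS) per core

power, bips = build_matrices(TURBO, RATIOS)
for core_power, core_bips in zip(power, bips):
    print(core_power, core_bips)
```

With the illustrative 0.95 ratio, this model gives about 14% power savings for a 5% throughput loss, which is close to the 3 : 1 target mentioned for the core power modes.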
Issues (3)
Validity of the solution
– With SPEC, the power estimation error ranges from 0.1% to 0.3%
– The BIPS estimation error ranges from 2% to 4%
Issues (4)
Where is the ceiling of optimization?
– If we could know the future, everything would be easy.
(Sorry, I don't know how the data for the oracle are obtained.)
Advantages and Drawbacks
Advantages
– See the contributions
Drawbacks
– Some important details are skipped, such as how the data for the oracle policy are obtained
– It is not explained how the 3 : 1 power-saving / performance-degradation ratio is maintained
– The authors do not reveal the relationship between the number of power modes and power management efficiency
Link between this paper and CSE520
Exploring the power and performance relationship in a CMP system
The optimization approach can be extended to architecture design
Project
My project will explore how the number of power modes influences the efficiency of power management policies