
An Analysis of Efficient Multi-Core Global Power Management Policies

Authors: Canturk Isci†, Alper Buyuktosunoglu†, Chen-Yong Cher†, Pradip Bose† and Margaret Martonosi

The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)

Speaker: Jun Shen

Agenda

Background
Motivation
Contributions
Details of the contributions
Overview of the global power management policy
Brief overview of the simulation
Comparison of the different policies
Evaluation methodology
Three neglected issues
Strengths and drawbacks of the paper
The relationship between the paper and the course
The impact of the paper on the project
Q&A

Background

Multicore architectures are increasingly widespread because of the well-known “walls”

Power and temperature problems are becoming increasingly critical

Motivation

Two “how” questions to answer:
– How to enforce a power budget through a global power manager?
– How to minimize power given a performance target?

Contributions

Three primary contributions:
– The creation of a global power manager (PM)
– A fast static power management analysis tool
– An evaluation of different PM policies (with different focuses such as prioritization, fairness, and throughput)

Overview of global PM (1)

Why do we need a global power manager?
– To exploit the widely known variability in the demand and characteristics of workloads (e.g., across threads or cores)
– To coordinate the adaptive actions of each core under a given power budget



Some preconditions

Each core:
– has its own dynamic controller
– has its own power-performance monitor (e.g., current monitor, performance-monitoring counter hardware)
– can run in multiple power modes


The PM work loop:
– The PM periodically collects power-performance data from the local monitors
– The PM reports it to the OS
– The OS returns the power budget, thread affinities, and the high-level scheduling and load-balancing plan to the PM
– The PM decides the power mode of each core based on that information
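The loop above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `CoreSample`, `pm_step`, the placeholder policy `throttle_if_over`, and the mode names `turbo` and `eff1` are all hypothetical, and the OS interaction is reduced to a power budget passed in as an argument.

```python
from dataclasses import dataclass

@dataclass
class CoreSample:
    """What a local monitor might report for one interval (hypothetical fields)."""
    power: float  # measured core power over the last interval
    bips: float   # measured throughput, billions of instructions per second

def throttle_if_over(samples, budget):
    """Placeholder policy: keep all cores in 'turbo' if the chip fits the budget,
    otherwise move every core to a slower 'eff1' mode."""
    within = sum(s.power for s in samples) <= budget
    return ["turbo" if within else "eff1" for _ in samples]

def pm_step(samples, budget, policy):
    """One PM iteration: aggregate monitor data, then choose per-core modes."""
    report = {
        "total_power": sum(s.power for s in samples),  # what the PM reports upward
        "total_bips": sum(s.bips for s in samples),
    }
    modes = policy(samples, budget)                    # PM decision for the next interval
    return report, modes

samples = [CoreSample(power=10, bips=2.0), CoreSample(power=12, bips=1.5)]
report, modes = pm_step(samples, budget=20, policy=throttle_if_over)
print(report, modes)
```

Passing the policy as a function keeps the collection and decision steps separate, so different policies can be plugged into the same loop.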


Possible implementations of the PM:
– A separate on-die microcontroller with Foxton-like underlying monitors
– A separate helper daemon on a dedicated core
– A low-level, hypervisor-like program interface

Brief of simulation (1)

Based on the IBM Turandot simulator
Power statistics from IBM PowerTimer
List of core parameters

Brief of simulation(2)

Use single-threaded Turandot results for each power-mode simulation

Simulate the multicore by progressing simultaneously over the Turandot traces, where each trace is the execution of a different benchmark

Validate the simulation against a cycle-accurate full-CMP implementation of Turandot


New ideas:
– Time-driven L2
– Thread synchronization to handle multiple clock-domain modes

Experiment results:
– Power deviates from the full CMP simulation by less than 5%
– Performance deviates by 9% to 30%

Note: the upper bound is reached with a highly memory-bound application


Core power modes

Target: a PowerSavings : PerformanceDegradation ratio of 3 : 1

Comparison between the target and the experimental estimates (table not reproduced; columns: Target, Estimation)
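As a worked example of the 3 : 1 target, the check below computes the savings-to-degradation ratio for each mode and flags modes that fall short. The mode names, the percentages, and the 10% tolerance are hypothetical placeholders, not the values from the slide's table.

```python
TARGET_RATIO = 3.0  # power savings : performance degradation = 3 : 1

# Hypothetical per-mode estimates in percent: (power savings, performance degradation).
estimates = {"eff1": (15, 5), "eff2": (40, 15)}

def ratio(savings_pct, degradation_pct):
    """Power savings obtained per unit of performance degradation."""
    return savings_pct / degradation_pct

def meets_target(savings_pct, degradation_pct, tolerance=0.1):
    """True if the ratio is within `tolerance` (relative) of the 3 : 1 target."""
    deviation = abs(ratio(savings_pct, degradation_pct) - TARGET_RATIO)
    return deviation <= tolerance * TARGET_RATIO

report = {mode: (ratio(s, d), meets_target(s, d)) for mode, (s, d) in estimates.items()}
for mode, (r, ok) in report.items():
    print(f"{mode}: ratio {r:.2f}, meets target: {ok}")
```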

Global PM policies (1)

Policy introduction
– Priority: every core has a predefined priority; a higher-priority core gets a higher voltage and therefore higher throughput
– Power balancing: try to equalize the power consumption of all cores
– Throughput optimization: pick the combination of power modes that maximizes throughput within the power budget
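Two of these policies can be sketched over a small core × mode table. This is a simplified illustration under stated assumptions: the mode names, the `POWER` and `BIPS` numbers, and the function names are invented for the example, and the priority policy shown here simply slows the lowest-priority cores first, which may differ in detail from the paper's version.

```python
from itertools import product

# Illustrative predictions for 2 cores x 3 modes (fastest to slowest).
# Mode names and all numbers are assumptions, not data from the paper.
MODES = ("turbo", "eff1", "eff2")
POWER = [[100, 86, 61], [80, 69, 49]]        # POWER[core][mode], arbitrary units
BIPS = [[2.0, 1.9, 1.7], [1.0, 0.98, 0.92]]  # BIPS[core][mode]

def total(matrix, combo):
    """Sum a per-core quantity over one assignment (one mode index per core)."""
    return sum(matrix[core][mode] for core, mode in enumerate(combo))

def throughput_optimal(power, bips, budget):
    """Exhaustively pick the assignment with the highest total BIPS within budget."""
    combos = product(range(len(MODES)), repeat=len(power))
    feasible = [c for c in combos if total(power, c) <= budget]
    return max(feasible, key=lambda c: total(bips, c)) if feasible else None

def priority(power, budget, order):
    """Slow down the lowest-priority cores first until the budget is met.

    `order` lists core indices from highest to lowest priority.
    """
    modes = [0] * len(power)                  # start with every core in the fastest mode
    for core in reversed(order):              # lowest priority first
        while total(power, modes) > budget and modes[core] < len(MODES) - 1:
            modes[core] += 1
    return tuple(modes)

best = throughput_optimal(POWER, BIPS, budget=150)
prio = priority(POWER, budget=150, order=[1, 0])
print(best, total(BIPS, best))
print(prio, total(BIPS, prio))
```

The exhaustive search visits every mode combination, so it is only practical for a handful of cores and modes; it is used here because it makes the objective explicit.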


Chip-wide DVFS: an alternative
Advantage: simple implementation (no synchronization across cores)
Disadvantages:
– With few power modes, a small power overshoot incurs a high penalty
– Performance impact varies greatly across different types of tasks
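The overshoot penalty is easy to see in a small sketch. Assuming every core must share one mode, the chip drops a whole mode as soon as the fastest one exceeds the budget, even slightly. The `POWER` table and the function name are hypothetical.

```python
# POWER[core][mode], modes ordered fastest to slowest; numbers are illustrative only.
POWER = [[100, 86, 61], [80, 69, 49]]

def chip_wide_mode(power, budget):
    """Return the fastest single mode index whose chip-wide power fits the budget.

    Falls back to the slowest mode if even that one exceeds the budget.
    """
    n_modes = len(power[0])
    for mode in range(n_modes):
        if sum(core[mode] for core in power) <= budget:
            return mode
    return n_modes - 1

budget = 175                                  # all-fastest needs 180: a small overshoot
mode = chip_wide_mode(POWER, budget)
used = sum(core[mode] for core in POWER)
print(f"mode index {mode}, power used {used} of {budget}")
```

In this example an overshoot of 5 units forces both cores down a mode and leaves 20 units of the budget unused, which is the penalty a per-core policy can avoid.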


Evaluation Methodology (1)

Proposed evaluation methodology
– Policy curve: overall performance degradation under several budgets (with respect to all-Turbo execution)
– Budget curve: the percentage of power consumed (under one specific policy) relative to the original power budget
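Both curves can be computed by sweeping the budget and recording two numbers per point. The sketch below assumes budgets are expressed as fractions of all-Turbo power and uses an exhaustive throughput-maximizing chooser as the policy under test; the data tables and function names are hypothetical.

```python
from itertools import product

# Illustrative per-core predictions, modes ordered fastest to slowest (assumed numbers).
POWER = [[100, 86, 61], [80, 69, 49]]
BIPS = [[2.0, 1.9, 1.7], [1.0, 0.98, 0.92]]

def total(matrix, combo):
    """Sum a per-core quantity over one assignment (one mode index per core)."""
    return sum(matrix[core][mode] for core, mode in enumerate(combo))

def choose(power, bips, budget):
    """Policy under test: highest-throughput assignment within budget.
    Assumes the budget admits at least the all-slowest assignment."""
    combos = product(range(len(power[0])), repeat=len(power))
    feasible = [c for c in combos if total(power, c) <= budget]
    return max(feasible, key=lambda c: total(bips, c))

def curves(power, bips, budget_fractions):
    """Return (budget fraction, performance degradation, power used / budget) points."""
    turbo = (0,) * len(power)
    turbo_power, turbo_bips = total(power, turbo), total(bips, turbo)
    points = []
    for frac in budget_fractions:
        budget = frac * turbo_power
        combo = choose(power, bips, budget)
        degradation = 1 - total(bips, combo) / turbo_bips  # policy-curve value
        utilization = total(power, combo) / budget         # budget-curve value
        points.append((frac, degradation, utilization))
    return points

points = curves(POWER, BIPS, [1.0, 0.9, 0.8, 0.7])
for frac, degradation, utilization in points:
    print(f"budget {frac:.0%}: degradation {degradation:.1%}, budget used {utilization:.1%}")
```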


Policy Curve


Budget Curve


Other issues:
– What about fairness? Some cores may always get the full budget while others always starve
– Some fairness metrics: weighted speedup, harmonic mean of thread speedups
– Weighted slowdown = harmonic mean of the individual speedups with respect to Turbo execution (the harmonic mean penalizes unfairness most strongly)
– Speedup = performance with enhancement / baseline performance; in this paper it is actually a slowdown
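A short numeric example shows why the harmonic mean is the fairness-sensitive choice. The function name and the throughput numbers are made up for illustration; the two scenarios have the same arithmetic-mean speedup but different harmonic means.

```python
def harmonic_mean_speedup(bips, turbo_bips):
    """Harmonic mean of per-thread speedups relative to Turbo execution.

    Each speedup is at most 1 here, since power management only slows threads down.
    """
    speedups = [b / t for b, t in zip(bips, turbo_bips)]
    return len(speedups) / sum(1.0 / s for s in speedups)

turbo = [2.0, 1.0]                                   # per-thread BIPS with all cores in Turbo
unfair = harmonic_mean_speedup([2.0, 0.5], turbo)    # speedups 1.0 and 0.5
fair = harmonic_mean_speedup([1.5, 0.75], turbo)     # speedups 0.75 and 0.75
print(f"unfair: {unfair:.3f}, fair: {fair:.3f}")
```

Both scenarios average to a speedup of 0.75 arithmetically, but the harmonic mean is lower when one thread bears most of the slowdown.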


Weighted slowdown



Dynamic adaptability

How can the PM learn the power/performance behavior of applications?
– Careful exploration, i.e., trying small power changes? Not suitable for aggressive adaptation policies
– Building a profile of each application from past experience? Not always reliable

Issues (1)

Issues (2)

A new solution:
– Rationale: an application's behavior at another DVFS setting can be estimated analytically with reasonable accuracy
– How: set up a core × power-mode matrix
  – Power has a cubic relationship with the scaling ratio
  – BIPS has a linear relationship with the scaling ratio
  – Frequency has a linear dependency on voltage
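The cubic rule follows from dynamic power being proportional to V²·f: if frequency scales linearly with voltage, scaling both by a ratio s scales power by s³, while BIPS is taken to scale with s. The sketch below fills one matrix row per core from a single observation in the core's current mode. The scaling ratios in `SCALING`, the function names, and the sample observations are assumptions for illustration.

```python
# Assumed voltage/frequency scaling ratio per mode (turbo = no scaling); illustrative values.
SCALING = {"turbo": 1.00, "eff1": 0.95, "eff2": 0.85}

def predict_row(current_mode, power, bips):
    """Estimate (power, BIPS) in every mode from one observation in `current_mode`."""
    row = {}
    for mode, s in SCALING.items():
        r = s / SCALING[current_mode]             # scaling ratio relative to the current mode
        row[mode] = (power * r ** 3, bips * r)    # cubic power, linear BIPS
    return row

def build_matrix(observations):
    """observations: one (current_mode, measured_power, measured_bips) tuple per core."""
    return [predict_row(mode, p, b) for mode, p, b in observations]

matrix = build_matrix([("turbo", 10.0, 2.0), ("eff2", 6.0, 0.9)])
for core, row in enumerate(matrix):
    print(core, {m: (round(p, 2), round(b, 2)) for m, (p, b) in row.items()})
```

The linear BIPS rule is an approximation: a memory-bound thread loses less throughput than the frequency ratio suggests.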

Issues (3)

Validity of the solution:
– With SPEC, the power estimation error ranges from 0.1% to 0.3%
– The BIPS estimation error ranges from 2% to 4%

Issues (4)

Where is the ceiling of optimization?
– If the future were known (an oracle), choosing the best modes would be easy

(Note: it is unclear to me how the oracle data was obtained.)

Issues (5)

How efficient is MaxBIPS in the general case?

Issues (6)

Strengths and Drawbacks

Strengths
– See the contributions

Drawbacks
– Some important details are skipped, such as how the oracle policy data is obtained
– How the 3 : 1 ratio of power savings to performance degradation is maintained is not explained
– The authors do not reveal the relationship between the number of power modes and power management efficiency

Link between this paper and CSE520

Explores the power and performance relationship in a CMP system

The optimization approach can be extended to architecture design

Project

My project will explore how the number of power modes affects the efficiency of a power management policy

Q&A

The End
