
1

Wattch: A Framework for Architecture-Level Power Analysis and Optimizations

Author: D. Brooks, V. Tiwari and M. Martonosi

Reviewer: Junxia Ma

University of Connecticut, School of Engineering

Department of Electrical & Computer Engineering

May 1st, 2008

2

Contents

Overview of this Work
Power Modeling Methodology
Model Validation
Case Studies
Conclusions

3

Overview: motivation

Power is increasingly important in modern processors.
Power/performance tradeoffs need to be made more visible.
Circuit-level power analysis tools are slow and can be applied only late in the design process.

4

Overview: contribution

Wattch: a framework for analyzing and optimizing CPU power consumption at the architecture level.
Runs 1000X or more faster than layout-level power tools.
Maintains accuracy within 10% of estimates from layout-level power tools.
Based on parameterized power models plus per-cycle resource usage counts.

5

Overview: contribution

Scenario A: Microarchitectural tradeoffs
Figure: one application binary is simulated by SimPower under Config 1 and Config 2, yielding Watts-1 and Watts-2.

Scenario B: Compiler Optimizations
Figure: two binaries of one application (Binary 1, Binary 2) are simulated by SimPower under a common config, yielding Watts-1 and Watts-2.

6

Overview: contribution

Scenario C: Hardware Optimizations
Figure: the application binary is simulated by SimPower with Config 1 (Watts-1) and with additional hardware (Watts-2). If the added hardware is an array structure, the current models are used; if it is a custom structure, its power is estimated separately before running SimPower.

7

Power Modeling Methodology-1

Figure: Overall Structure of the Power Simulator. The Binary and Hardware Config are inputs to a Cycle-Level Performance Simulator (the foundation of this work), which produces a Performance Estimate and Cycle-by-Cycle Hardware Access Counts; the access counts feed Parameterizable Power Models, which produce a Power Estimate.

8

Power Modeling Methodology-2

Main processor units:
Array structures
Fully associative content-addressable memories
Combinational logic and wires
Clocking

Dynamic power:

$$P_d = C V_{dd}^2 a f$$

where C is the load capacitance, V_dd the supply voltage, a the switching activity, and f the clock frequency.
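To make the dynamic power equation concrete, here is a small Python sketch that evaluates P_d = C * V_dd^2 * a * f. The numeric inputs are hypothetical, chosen only to illustrate units and the quadratic dependence on supply voltage; they are not values from the paper.

```python
def dynamic_power(c_load_f, v_dd_v, activity, freq_hz):
    """Dynamic power P_d = C * V_dd^2 * a * f, returned in watts."""
    return c_load_f * v_dd_v ** 2 * activity * freq_hz


# Hypothetical inputs: 1 nF switched capacitance, 1.5 V supply,
# activity factor 0.5, 600 MHz clock.
p = dynamic_power(1e-9, 1.5, 0.5, 600e6)
print(f"P_d = {p:.3f} W")  # P_d = 0.675 W

# Quadratic dependence on V_dd: halving the supply quarters the power.
print(f"{dynamic_power(1e-9, 0.75, 0.5, 600e6) / p:.2f}")  # 0.25
```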

9

Power Modeling Methodology-3

Equations for Capacitance of Critical Nodes

10

Array Structure

Figure: array structure schematic. Labels: PreCharge, From Decoder, Wordline Driver, Cell Access Transistors, To Sense amps, Num. of Wordlines, Num. of Bitlines.
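A hedged sketch of how wordline and bitline capacitance can be assembled from the elements labeled in the array schematic (wordline driver, cell access transistors, precharge, number of wordlines and bitlines). It assumes the general form "driver or precharge diffusion + per-cell term × number of cells on the line + wire capacitance × line length"; the exact terms Wattch uses are in the paper's capacitance equations. Every parameter name and value below (including the cell pitch) is hypothetical, not technology data.

```python
from dataclasses import dataclass


@dataclass
class ArrayParams:
    """Geometry plus hypothetical per-device capacitances (farads) for one array."""
    num_wordlines: int              # rows, e.g. register-file entries
    num_bitlines: int               # columns, e.g. bits per entry
    c_diff_wordline_driver: float = 2.0e-15
    c_gate_cell_access: float = 1.0e-15
    c_diff_precharge: float = 2.0e-15
    c_diff_cell_access: float = 0.5e-15
    c_metal_per_um: float = 0.2e-15
    cell_width_um: float = 2.0
    cell_height_um: float = 2.0


def wordline_cap(p: ArrayParams) -> float:
    """Driver diffusion + one access-transistor gate per bitline + row wire."""
    length_um = p.num_bitlines * p.cell_width_um
    return (p.c_diff_wordline_driver
            + p.c_gate_cell_access * p.num_bitlines
            + p.c_metal_per_um * length_um)


def bitline_cap(p: ArrayParams) -> float:
    """Precharge diffusion + one access-transistor diffusion per wordline + column wire."""
    length_um = p.num_wordlines * p.cell_height_um
    return (p.c_diff_precharge
            + p.c_diff_cell_access * p.num_wordlines
            + p.c_metal_per_um * length_um)


def read_energy(p: ArrayParams, v_dd: float) -> float:
    """Energy of one read: one wordline fires and every bitline is assumed to
    swing fully (a simplification; real designs often use reduced bitline swing)."""
    return (wordline_cap(p) + p.num_bitlines * bitline_cap(p)) * v_dd ** 2


regfile = ArrayParams(num_wordlines=128, num_bitlines=64)
print(f"wordline: {wordline_cap(regfile) * 1e15:.1f} fF")  # wordline: 91.6 fF
print(f"bitline:  {bitline_cap(regfile) * 1e15:.1f} fF")   # bitline:  117.2 fF
```

The sketch shows why the number of wordlines and bitlines are the key parameters: adding entries lengthens and loads every bitline, while widening entries lengthens and loads every wordline.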

11

CAM Structure

Key sizing parameters in this CAM:
The issue/commit width of the machine (W)
The instruction window size (impacts the CAM’s height)
The physical register tag size (impacts the CAM’s width)
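The three sizing parameters can be tied to capacitance with a sketch like the following. It is built on assumptions, not on equations given in the slide: taglines run down the CAM's height (one compare gate per window entry), matchlines run across its width (two compare-transistor diffusions per tag bit, assumed), and each extra port of width W widens the cell pitch. All names and values are hypothetical.

```python
def cam_line_caps(window_size, tag_bits, width_w,
                  c_gate_compare=0.8e-15, c_diff_tag_driver=3.0e-15,
                  c_diff_compare=0.5e-15, c_diff_match_precharge=2.0e-15,
                  c_metal_per_um=0.2e-15, base_pitch_um=2.0, pitch_per_port_um=0.5):
    """Return (tagline_cap, matchline_cap) in farads for a wakeup CAM.

    All capacitance and pitch defaults are hypothetical placeholders.
    """
    # Assumed layout rule: every extra port (issue/commit width W) adds wiring,
    # so the cell pitch grows linearly with W.
    pitch_um = base_pitch_um + pitch_per_port_um * width_w
    # Tagline runs down the CAM's height: one compare gate per window entry.
    tagline = (c_diff_tag_driver
               + c_gate_compare * window_size
               + c_metal_per_um * window_size * pitch_um)
    # Matchline runs across the CAM's width: assumed two compare-transistor
    # diffusions per tag bit, plus precharge and wire.
    matchline = (c_diff_match_precharge
                 + 2 * c_diff_compare * tag_bits
                 + c_metal_per_um * tag_bits * pitch_um)
    return tagline, matchline


def cam_wakeup_energy(window_size, tag_bits, width_w, tags_broadcast, v_dd):
    """Worst-case energy of one cycle in which `tags_broadcast` result tags are
    compared against every window entry (every matchline assumed to discharge)."""
    tagline, matchline = cam_line_caps(window_size, tag_bits, width_w)
    per_tag = tag_bits * tagline + window_size * matchline
    return tags_broadcast * per_tag * v_dd ** 2


for window in (32, 64):
    print(window, cam_wakeup_energy(window, tag_bits=7, width_w=4,
                                    tags_broadcast=4, v_dd=1.5))
```

Under these assumptions, a larger instruction window lengthens every tagline, a larger physical register tag lengthens every matchline, and a wider machine both broadcasts more tags per cycle and widens every cell.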

12

Complex Logic Blocks

Two larger complex logic blocks are considered:

i) instruction selection logic; ii) dependency check logic

Result buses: power is modeled by estimating the bus length.
ALUs: power is scaled based on previous research results.
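Result-bus power is modeled by estimating bus length. A minimal sketch, assuming wire-dominated capacitance and a hypothetical floorplan in which the bus runs past every ALU and along the register file; the function names, the 0.5 activity factor, and the capacitance per micron are illustrative only.

```python
def estimate_bus_length_um(num_alus, alu_height_um, regfile_height_um):
    """Hypothetical floorplan estimate: the result bus runs past every ALU
    and along the register file."""
    return num_alus * alu_height_um + regfile_height_um


def result_bus_energy(length_um, bus_bits, v_dd, activity=0.5,
                      c_metal_per_um=0.2e-15):
    """Per-cycle result-bus energy, with capacitance taken as wire-dominated."""
    c_per_wire = c_metal_per_um * length_um
    return bus_bits * c_per_wire * v_dd ** 2 * activity


length = estimate_bus_length_um(num_alus=4, alu_height_um=100.0,
                                regfile_height_um=300.0)
print(length)  # 700.0
print(f"{result_bus_energy(length, bus_bits=64, v_dd=1.5) * 1e12:.2f} pJ")  # 10.08 pJ
```

Because capacitance is proportional to length here, adding ALUs grows result-bus energy linearly, which is why a length estimate is enough for this structure.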

13

Clocking

The clocking network can be the most significant source of power consumption. Sources of clock power consumption:

i) Global clock metal lines; ii) Global clock buffers; iii) Clock loading
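The three clock power sources can be combined with the dynamic power equation as follows. This is a sketch, assuming each source can be lumped into an equivalent switched capacitance; the capacitance values are hypothetical. Because the clock node charges and discharges every cycle, the activity factor is taken as a = 1.

```python
def clock_power(c_wire_f, c_buffer_f, c_load_f, v_dd, freq_hz):
    """Clock power from global metal lines, global buffers, and clock loading.

    The clock toggles every cycle, so a = 1 in P = C * V_dd^2 * a * f.
    """
    total_c = c_wire_f + c_buffer_f + c_load_f
    return total_c * v_dd ** 2 * freq_hz


def clock_breakdown(c_wire_f, c_buffer_f, c_load_f):
    """Fraction of clock power due to each source (shares of total capacitance)."""
    total_c = c_wire_f + c_buffer_f + c_load_f
    return {
        "global clock metal lines": c_wire_f / total_c,
        "global clock buffers": c_buffer_f / total_c,
        "clock loading": c_load_f / total_c,
    }


# Hypothetical capacitances (farads), for illustration only.
print(clock_power(0.5e-9, 0.3e-9, 1.2e-9, v_dd=1.5, freq_hz=600e6))
print(clock_breakdown(0.5e-9, 0.3e-9, 1.2e-9))
```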

14

Common CPU hardware structures and model type used

Uses SimpleScalar’s hardware configuration parameters as inputs.

15

SimpleScalar Interface

The power models are interfaced with SimpleScalar; the simulator keeps track of which units are accessed each cycle and records the total energy consumed for an application.

SimpleScalar provides a simulation environment for out-of-order processors with 5-stage pipelines.
Speculative execution is supported.
This work extended SimpleScalar to provide a variable number of additional pipestages between fetch and issue.
A 7-cycle branch mispredict penalty is assumed.
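The per-cycle bookkeeping described above (which units are accessed each cycle, accumulated into total energy) can be sketched as below. This is not Wattch's code: the class, the unit names, and the per-access energies are hypothetical, and idle units are charged nothing, i.e. ideal clock gating is assumed.

```python
from collections import defaultdict


class PowerAccumulator:
    """Accumulates energy from per-cycle unit access counts.

    `energy_per_access` maps a unit name to the energy (J) of one access;
    the unit names and values used below are hypothetical.
    """

    def __init__(self, energy_per_access, cycle_time_s):
        self.energy_per_access = dict(energy_per_access)
        self.cycle_time_s = cycle_time_s
        self.cycles = 0
        self.unit_energy = defaultdict(float)

    def tick(self, accesses):
        """Record one simulated cycle; `accesses` maps unit -> access count.
        Units absent from `accesses` are treated as idle and charged nothing."""
        for unit, count in accesses.items():
            self.unit_energy[unit] += count * self.energy_per_access[unit]
        self.cycles += 1

    @property
    def total_energy(self):
        return sum(self.unit_energy.values())

    def average_power(self):
        return self.total_energy / (self.cycles * self.cycle_time_s)


acc = PowerAccumulator(
    {"icache": 0.5e-9, "regfile": 0.2e-9, "alu": 0.3e-9},
    cycle_time_s=1e-9,  # hypothetical 1 GHz clock
)
trace = [
    {"icache": 1, "regfile": 2, "alu": 1},  # 1.2 nJ
    {"icache": 1},                          # 0.5 nJ
]
for cycle_accesses in trace:
    acc.tick(cycle_accesses)
print(f"{acc.total_energy * 1e9:.2f} nJ over {acc.cycles} cycles")  # 1.70 nJ over 2 cycles
print(f"{acc.average_power():.2f} W")  # 0.85 W
```

Keeping a per-unit tally (`unit_energy`) alongside the total is what makes a breakdown of power by structure possible after the run.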

16

Conditional Clocking Styles

Power consumption of benchmarks with conditional clocking on multi-ported hardware
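The slide does not define the clocking styles. The sketch below assumes the three styles as we recall them from the Wattch paper (labeled cc1 to cc3 there); verify against the paper before relying on it. Under cc1, a unit draws full power if any port is used. Under cc2, power scales linearly with the ports used. Under cc3, power also scales linearly, but an idle unit still draws 10% of its maximum. The function names, the clamping of accesses to the port count, and the example unit are assumptions of this sketch.

```python
def cc1(ports, accesses, max_power):
    """All-or-nothing gating: full power if any port is used, else zero."""
    return max_power if accesses > 0 else 0.0


def cc2(ports, accesses, max_power):
    """Power scales linearly with ports used; an idle unit draws zero.
    Accesses beyond the port count are clamped (an assumption of this sketch)."""
    used = min(accesses, ports)
    return max_power * used / ports


def cc3(ports, accesses, max_power, idle_fraction=0.1):
    """Like cc2, but an idle unit still draws `idle_fraction` of max power."""
    if accesses == 0:
        return idle_fraction * max_power
    return cc2(ports, accesses, max_power)


# Hypothetical 4-ported unit with 10 W maximum power.
for n in range(5):
    print(n, cc1(4, n, 10.0), cc2(4, n, 10.0), cc3(4, n, 10.0))
```

The styles differ most for lightly used multi-ported units: cc1 charges full power for a single access, while cc2 and cc3 charge only a quarter of it on a 4-ported unit.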

17

Simulation Speed

For lower-level tools, running PowerMill on a 64-bit adder for 100 test vectors takes ~1 hr. In the same amount of time, Wattch can simulate a full CPU running roughly 280M SimpleScalar instructions and generate both power and performance estimates!
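The slide's numbers can be restated as per-second rates with a few lines of arithmetic. The two rates measure different units of work (adder test vectors vs. full-CPU instructions), so this only rescales the slide's figures; the helper name is hypothetical.

```python
SECONDS_PER_HOUR = 3600.0


def per_second(count, hours):
    """Convert a count achieved in `hours` into a per-second rate."""
    return count / (hours * SECONDS_PER_HOUR)


# Figures from the slide: ~1 hour for each tool.
wattch_ips = per_second(280e6, 1.0)   # full-CPU SimpleScalar instructions
powermill_vps = per_second(100, 1.0)  # 64-bit adder test vectors

print(f"Wattch:    ~{wattch_ips:,.0f} instructions/s")    # ~77,778 instructions/s
print(f"PowerMill: ~{powermill_vps:.3f} test vectors/s")  # ~0.028 test vectors/s
```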

18

Model Validation-1

Three methods of validation

Validation 1: Model Capacitance vs. Physical Schematics

Modeled total capacitance values are within 6~11% of the schematic values.

19

Model Validation-2

Validation 2: Relative power consumption by structure

Figure panels: Comparison for Pentium Pro; Comparison for Alpha 21264

The clock power model used by Wattch is based on an H-tree style, which was used in the Alpha 21264 but not in the Intel processors.

20

Model Validation-3

Validation 3: Maximum power consumption for three CPUs

Configuration of Processors

Maximum power, modeled vs. reported

Modeled values are on average 30% lower than reported, reflecting systematic underestimation.

21

Validation Summary

Capacitance estimates: within ~10% (validation 1)
Relative accuracy: within 10~13% (validation 2)
Limitations:
Does not model all of the miscellaneous logic in real microprocessors
Different circuit design styles can lead to different results
The most up-to-date industrial fabrication data is unavailable

The model will be most accurate when comparing CPUs of similar fabrication technology.

22

Case Studies

Baseline Configuration of Simulated Processor

Use the SPECint95 and SPECfp95 benchmark suites.

Benchmarks are compiled using the Compaq Alpha cc compiler.

For each program, simulate 200M instructions.

Metrics used:

i) Power

ii) Performance

iii) Energy

iv) Energy-Delay Product
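The four metrics are related through execution time. A minimal sketch, assuming delay = instructions / (IPC × frequency), energy = average power × delay, and energy-delay product = energy × delay; the paper may normalize these differently (for example, per instruction). The class name and all numbers except the 200M instruction count are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    instructions: float   # instructions simulated (the study uses 200M)
    ipc: float            # instructions per cycle
    avg_power_w: float    # average power reported by the power model
    freq_hz: float        # clock frequency

    @property
    def delay_s(self) -> float:
        """Execution time: cycles / frequency, with cycles = instructions / IPC."""
        return self.instructions / self.ipc / self.freq_hz

    @property
    def energy_j(self) -> float:
        """Energy = average power x execution time."""
        return self.avg_power_w * self.delay_s

    @property
    def energy_delay(self) -> float:
        """Energy-delay product (J*s): penalizes designs that save energy
        only by running slower."""
        return self.energy_j * self.delay_s


# Hypothetical numbers for two configurations of the same 200M-instruction run.
small = RunResult(instructions=200e6, ipc=1.5, avg_power_w=20.0, freq_hz=600e6)
large = RunResult(instructions=200e6, ipc=2.0, avg_power_w=25.0, freq_hz=600e6)
for name, r in (("small", small), ("large", large)):
    print(f"{name}: {r.delay_s:.4f} s, {r.energy_j:.3f} J, EDP {r.energy_delay:.4f} J*s")
```

In this made-up example the larger configuration draws more power but finishes sooner, so it comes out ahead on both energy and energy-delay product; power alone would have ranked the two the other way.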

23

Case Study – A Microarchitectural Exploration

Figure panels: IPC for gcc; IPC for turb3d; Power for gcc; Power for turb3d; Energy-delay product for gcc; Energy-delay product for turb3d

24

Case Study – Power Analysis of Loop Unrolling

Effect of loop unrolling on performance and power

25

Case Study – Power Analysis of Loop Unrolling

Detailed breakdown of power dissipation

26

Conclusions

This paper presents a simulator framework for a wide range of architectural and compiler evaluations.
Wattch has the benefit of low-level validation, which can help researchers do power modeling at higher levels of abstraction.
Wattch can provide feedback to compilers on power consumption, enabling power-aware compilers.
Design choices differ slightly when power metrics are taken into account; Wattch is intended to help explore these tradeoffs.
Wattch has limitations and needs improvement.