Wattch: A Framework for Architecture-Level Power Analysis and Optimizations
Authors: D. Brooks, V. Tiwari and M. Martonosi
Reviewer: Junxia Ma
University of Connecticut, School of Engineering
Department of Electrical & Computer Engineering
May 1st, 2008
Contents
Overview of this Work
Power Modeling Methodology
Model Validation
Case Studies
Conclusions
Overview: motivation
Power is increasingly important in modern processors
Power/performance tradeoffs need to be made more visible
Circuit-level power analysis tools are slow and become usable only late in the design process
Overview: contribution
Wattch: a framework for analyzing and optimizing CPU power consumption at the architecture level
Runs 1000X or more faster than layout-level power tools
Maintains accuracy within 10% of estimates from layout-level power tools
Based on parameterized power models combined with per-cycle resource usage counts
Overview: contribution
Scenario A: Microarchitectural tradeoffs
[Figure: one application binary is run through SimPower under Config 1 and Config 2, producing Watts-1 and Watts-2]
Scenario B: Compiler optimizations
[Figure: Binary 1 and Binary 2 of the same application are run through SimPower under a common config, producing Watts-1 and Watts-2]
Overview: contribution
Scenario C: Hardware optimizations
[Figure: the application binary is run through SimPower under Config 1 (Watts-1) and again with additional hardware (Watts-2). Flowchart labels for the additional hardware: "Custom Structure?", "Array Structure?", "Use Current Models", "Estimate Power of Structure"]
Power Modeling Methodology-1
Overall Structure of the Power Simulator
[Figure: the binary and the hardware config feed a cycle-level performance simulator, which produces the performance estimate and passes cycle-by-cycle hardware access counts to the parameterizable power models, which produce the power estimate]
Foundation of this work
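The structure above can be sketched in a few lines: a performance simulator supplies per-cycle access counts, and a power model turns them into energy and average power. This is only an illustration under stated assumptions. The unit names, the per-access energies, and the helper functions `total_energy_nj` and `average_power_w` are hypothetical and are not taken from Wattch.

```python
# Hypothetical per-access energies in nanojoules (placeholder values, not Wattch's).
ENERGY_PER_ACCESS_NJ = {"icache": 0.5, "regfile": 0.25, "alu": 0.125}


def total_energy_nj(per_cycle_accesses):
    """Sum energy over all cycles from per-cycle hardware access counts."""
    total = 0.0
    for accesses in per_cycle_accesses:
        for unit, count in accesses.items():
            total += count * ENERGY_PER_ACCESS_NJ[unit]
    return total


def average_power_w(per_cycle_accesses, clock_hz):
    """Average power = total energy / elapsed time (cycles / clock frequency)."""
    seconds = len(per_cycle_accesses) / clock_hz
    return total_energy_nj(per_cycle_accesses) * 1e-9 / seconds


# A made-up three-cycle trace, as a cycle-level simulator might report it.
trace = [
    {"icache": 1, "regfile": 2, "alu": 1},
    {"icache": 1, "regfile": 0, "alu": 0},
    {"icache": 0, "regfile": 4, "alu": 2},
]

if __name__ == "__main__":
    print("energy (nJ):", total_energy_nj(trace))
    print("average power (W) at 1 GHz:", average_power_w(trace, 1e9))
```

Keeping the access counts separate from the per-access energies mirrors the split in the figure: the same trace can be re-evaluated under a different hardware configuration by swapping only the energy table.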
Power Modeling Methodology-2
Main processor units:
Array Structures
Fully Associative Content-Addressable Memories
Combinational Logic and Wires
Clocking
Dynamic power: $P_d = C V_{dd}^2 a f$
where $C$ is the load capacitance, $V_{dd}$ the supply voltage, $a$ the switching activity, and $f$ the clock frequency
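As a worked example of the dynamic power relation on this slide (load capacitance times supply voltage squared times switching activity times clock frequency), the sketch below evaluates it for made-up values. The function name `dynamic_power_w` and all numbers are assumptions chosen for illustration only.

```python
import math


def dynamic_power_w(load_cap_f, vdd_v, activity, freq_hz):
    """Dynamic power P_d = C * Vdd^2 * a * f, in watts."""
    return load_cap_f * vdd_v ** 2 * activity * freq_hz


if __name__ == "__main__":
    # Hypothetical values: 1 nF of switched capacitance, 2.0 V, a = 0.5, 600 MHz.
    base = dynamic_power_w(1e-9, 2.0, 0.5, 600e6)
    # Halving the supply voltage cuts dynamic power to one quarter.
    scaled = dynamic_power_w(1e-9, 1.0, 0.5, 600e6)
    print(base, scaled, math.isclose(base, 4 * scaled))
```

The quadratic dependence on supply voltage is why voltage appears as the strongest lever in this relation, while capacitance, activity, and frequency each contribute linearly.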
Power Modeling Methodology-3
[Table: Equations for Capacitance of Critical Nodes]
Array Structure
[Figure: schematic of an array structure. Labels: precharge, wordline driver (from decoder), cell access transistors, bitlines to sense amps, number of wordlines, number of bitlines]
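The capacitance equations themselves did not survive in this transcript, so the sketch below assumes a common form suggested by the figure labels: a wordline sees its driver plus one access-transistor gate and a stretch of wire per bitline, and a bitline sees its precharge device plus one access-transistor diffusion and a stretch of wire per wordline. The class `ArrayParams`, the field names, and every capacitance value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArrayParams:
    num_wordlines: int        # rows in the array
    num_bitlines: int         # columns in the array
    c_diff_driver: float      # wordline driver diffusion cap (fF), assumed
    c_gate_access: float      # access transistor gate cap (fF), assumed
    c_diff_precharge: float   # precharge transistor diffusion cap (fF), assumed
    c_diff_access: float      # access transistor diffusion cap (fF), assumed
    c_metal_per_cell: float   # wire cap per cell crossed (fF), assumed


def wordline_cap(p: ArrayParams) -> float:
    """Driver + one gate load and one wire segment per bitline crossed."""
    return (p.c_diff_driver
            + p.c_gate_access * p.num_bitlines
            + p.c_metal_per_cell * p.num_bitlines)


def bitline_cap(p: ArrayParams) -> float:
    """Precharge + one diffusion load and one wire segment per wordline crossed."""
    return (p.c_diff_precharge
            + p.c_diff_access * p.num_wordlines
            + p.c_metal_per_cell * p.num_wordlines)


if __name__ == "__main__":
    params = ArrayParams(64, 32, 10.0, 2.0, 8.0, 1.0, 0.5)
    print("wordline cap (fF):", wordline_cap(params))
    print("bitline cap (fF):", bitline_cap(params))
```

Because both capacitances scale with the array dimensions, a parameterized model of this kind can be re-evaluated for a different number of entries or ports without new circuit-level simulation.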
CAM Structure
Key sizing parameters in this CAM:
The issue/commit width of the machine (W)
The instruction window size (impacts the CAM's height)
The physical register tag size (impacts the CAM's width)
Complex Logic Blocks
Two larger complex logic blocks are considered:
i) instruction selection logic; ii) dependency check logic
For result buses: power consumption is modeled by estimating the length of the buses
For ALUs: power is scaled based on previous research results
Clocking
The clocking network can be the most significant source of power consumption
Sources of clock power consumption:
i) Global clock metal lines ii) Global clock buffers iii) Clock loading
Common CPU hardware structures and model type used
[Table: common CPU hardware structures and the model type used for each]
SimpleScalar's hardware configuration parameters are used as inputs
SimpleScalar Interface
The power models are interfaced with SimpleScalar, which:
i) keeps track of which units are accessed per cycle
ii) records the total energy consumed for an application
SimpleScalar provides a simulation environment for out-of-order processors with 5-stage pipelines
Speculative execution is supported
This work extends SimpleScalar to allow a variable number of additional pipeline stages between fetch and issue
A mispredict penalty of 7 cycles is assumed
Conditional Clocking Styles
[Figure: power consumption of benchmarks with conditional clocking on multi-ported hardware]
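The slide does not spell out the clocking styles being compared, so the sketch below is an assumption about what such styles could look like for a multi-ported unit: no gating, all-or-nothing gating, power scaled linearly with port usage, and linear scaling with a small idle floor. The style names, the function `gated_power`, and the 10% idle fraction are illustrative choices, not values read from the slide.

```python
def gated_power(max_power, ports_used, num_ports, style, idle_fraction=0.1):
    """Per-cycle power of a multi-ported unit under a hypothetical gating style."""
    if not 0 <= ports_used <= num_ports:
        raise ValueError("ports_used must be between 0 and num_ports")
    if style == "none":
        # No conditional clocking: full power every cycle.
        return max_power
    if style == "all_or_nothing":
        # Full power if the unit is touched at all, zero otherwise.
        return max_power if ports_used > 0 else 0.0
    usage = ports_used / num_ports
    if style == "linear":
        # Power scales with the fraction of ports in use.
        return max_power * usage
    if style == "linear_with_idle":
        # As "linear", but an unused unit still burns a fixed fraction.
        return max_power * usage if ports_used > 0 else max_power * idle_fraction
    raise ValueError(f"unknown style: {style}")


if __name__ == "__main__":
    for style in ("none", "all_or_nothing", "linear", "linear_with_idle"):
        print(style, [gated_power(8.0, n, 4, style) for n in range(5)])
```

Comparing the rows printed for each style shows how much the estimated power of a lightly used unit depends on the gating assumption, which is the kind of sensitivity the benchmark figure reports.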
Simulation Speed
With a lower-level tool, running PowerMill on a 64-bit adder for 100 test vectors takes about 1 hour
In the same amount of time, Wattch can simulate a full CPU running roughly 280M SimpleScalar instructions and generate both power and performance estimates
Model Validation-1
Three methods of validation
Validation 1: Model capacitance vs. physical schematics
Modeled total capacitance values are within 6-11% of the schematic values
Model Validation-2
Validation 2: Relative power consumption by structure
[Figures: comparison for the Pentium Pro; comparison for the Alpha 21264]
The clock power model used by Wattch is based on an H-tree style network, which was used in the Alpha 21264 but not in Intel processors
Model Validation-3
Validation 3: Maximum power consumption for three CPUs
[Table: configuration of processors]
[Figure: maximum power, modeled vs. reported]
Modeled maximum power is on average 30% lower than reported, which reflects systematic underestimation
Validation Summary
Capacitance estimates: within about 10% (validation 1)
Relative accuracy: 10-13% (validation 2)
Limitations:
i) Not all of the miscellaneous logic in real microprocessors is modeled
ii) Different circuit design styles can lead to different results
iii) The most up-to-date industrial fabrication data is unavailable
The model will be most accurate when comparing CPUs of similar fabrication technology
Case Studies
[Table: baseline configuration of the simulated processor]
Benchmarks come from the SPECint95 and SPECfp95 suites
Benchmarks are compiled using the Compaq Alpha cc compiler
For each program, 200M instructions are simulated
Metrics used:
i) Power
ii) Performance
iii) Energy
iv) Energy-Delay Product
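The four metrics are related by simple arithmetic: energy is average power times execution time, and the energy-delay product multiplies energy by execution time once more. The sketch below applies these definitions to two made-up configurations. The function names, the 500 MHz clock, and all power and cycle numbers are assumptions for illustration and are not results from the paper.

```python
def delay_s(cycles, clock_hz):
    """Execution time in seconds."""
    return cycles / clock_hz


def energy_j(avg_power_w, cycles, clock_hz):
    """Energy = average power * execution time."""
    return avg_power_w * delay_s(cycles, clock_hz)


def energy_delay_product(avg_power_w, cycles, clock_hz):
    """EDP = energy * execution time, in joule-seconds."""
    return energy_j(avg_power_w, cycles, clock_hz) * delay_s(cycles, clock_hz)


if __name__ == "__main__":
    clock_hz = 500e6
    # Hypothetical configs: B draws more power but finishes in fewer cycles.
    configs = {"A": (20.0, 400e6), "B": (25.0, 300e6)}
    for name, (power, cycles) in configs.items():
        print(name,
              "delay:", delay_s(cycles, clock_hz),
              "energy:", energy_j(power, cycles, clock_hz),
              "EDP:", energy_delay_product(power, cycles, clock_hz))
```

In this made-up case configuration B has the higher power yet the lower energy and energy-delay product, which shows why the ranking of two designs can change depending on which metric is used.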
Case Study: A Microarchitectural Exploration
[Figures: IPC, power, and energy-delay product, each plotted for gcc and for turb3d]
Case Study: Power Analysis of Loop Unrolling
[Figure: effect of loop unrolling on performance and power]
Case Study: Power Analysis of Loop Unrolling
[Figure: detailed breakdown of power dissipation]
Conclusions
This paper presents a simulator framework that supports a wide range of architectural and compiler evaluations
Wattch has the benefit of low-level validation, which can help researchers do power modeling at higher abstraction levels
Wattch can provide feedback to compilers on power consumption (power-aware compilers)
Design choices are slightly different when power metrics are taken into account; Wattch is intended to help explore these tradeoffs
Wattch has limitations and needs improvement