Power estimation in the algorithmic and register-transfer level
September 25, 2006
Chong-Min Kyung



Software power analysis

• Objective:
  – Compare different programs
  – Select processors
  – Optimize software

• Three levels of granularity (according to execution speed, availability, and accuracy):
  – Source code level
  – Instruction level
  – BFM (Bus Functional Model) level


• Execution performed on:
  – 1) Target processor: compile the source code and run it.
    • Measure the heat generated to estimate the power?
    • Or monitor execution (with inserted monitoring instructions or some extra hardware, both ideally with negligible overhead and disturbance on power and speed) to count the occurrences of each instruction and compute the total estimated power?
    • Dynamic code can also be handled.
    • Minimal disturbance from the overhead code is the key to accuracy.


• Execution performed on:
  – 2) Another processor: run a program that estimates the power consumption, taking the target-compiled code as input data.
    • Only the power consumption of static code can be estimated.
  – 3) Simulator:
    • Either at the source code level,
    • Or at the instruction code level (same as ‘Another processor’).


Power estimation of Software

• Simplest approach: energy consumption is proportional to the program execution time.

• Instruction-set approach: energy consumption differs for each instruction class (a class of instructions with similar power behavior) and for each class of instruction pair (inter-instruction dependency).
  – Measurement is done by running long loops of the same instruction.
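
The instruction-set approach can be sketched as a lookup of per-class base costs plus per-pair overheads summed over an instruction trace. The class names and energy values below are hypothetical placeholders, not measured data; real tables would come from the loop measurements described above.

```python
from collections import Counter

# Hypothetical base energy per instruction class (nJ); illustrative values only.
BASE_ENERGY_NJ = {"alu": 1.0, "load": 2.5, "store": 2.0, "branch": 1.5}

# Hypothetical inter-instruction overhead (nJ) for adjacent class pairs.
# Pairs that are not listed are assumed to add no extra energy.
PAIR_OVERHEAD_NJ = {("alu", "load"): 0.3, ("load", "alu"): 0.3, ("load", "store"): 0.5}


def estimate_energy_nj(trace):
    """Sum the base cost of every instruction plus the overhead of every adjacent pair."""
    base_counts = Counter(trace)
    pair_counts = Counter(zip(trace, trace[1:]))
    base = sum(BASE_ENERGY_NJ[cls] * n for cls, n in base_counts.items())
    overhead = sum(PAIR_OVERHEAD_NJ.get(pair, 0.0) * n for pair, n in pair_counts.items())
    return base + overhead


# A short trace of instruction classes, e.g. as counted on the target or on an ISS.
trace = ["load", "alu", "alu", "store", "branch"]
total_nj = estimate_energy_nj(trace)
```

Here the only pair with a listed overhead is (load, alu), so the estimate is the sum of the five base costs plus one overhead term.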


Power estimation of Software

• Becomes more difficult with more complex processors (multi-threading, out-of-order execution, …) and memory system architectures (caches, …).

• Accurate estimation requires software profiling on an ISS (instruction set simulator) together with the bus access pattern.

• An estimation model accurate to within 5% was developed for the ARM processor [Simunic, T., ‘Cycle-accurate simulation of energy consumption in embedded systems’, DAC 99].


Algorithmic-level power estimation

• Algorithmic-level power estimation consists of:
  – Architecture estimation
  – Activation estimation
  – Power model evaluation

• Architecture estimation by High-Level Synthesis (HLS):
  – Allocation, Scheduling, and Binding (ASB). Allocation in the narrow sense is ‘unit selection’, which is needed when an operation can be performed by more than one unit.
  – Allocation and Scheduling affect each other.

• HLS considering communication (interconnect):
  – ASB + floorplanning
  – Cycle time violation check based on wire delay (derived from wire length estimation)

• HLS considering both interconnect and power


Target architecture of HLS

• Target architecture of HLS:
  – Datapath <- dataflow of the CDFG
  – Controller <- dataflow and control flow
  – Clock tree


Target architecture of HLS

• Architecture synthesis =
  – Schedule the operations under timing and resource constraints, and
  – Allocate the required resources (operation units).
    • An operation unit can be an arithmetic module, a logic module, or a memory module.
  – The output of architecture synthesis is:
    • A set of operation units
    • Registers
    • Steering logic to transfer data between operation units and registers, and
    • A controller whose control signals steer the MUXes, the operation units (OUs), and the enable signals of the registers

• How to integrate power optimization into HLS?
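
As a rough illustration of how scheduling under a resource constraint interacts with allocation, the sketch below runs a simple list scheduler over a tiny dataflow graph. It assumes unit-latency operations and an alphabetical priority order; the function name `list_schedule` and the example graph are hypothetical and are not taken from any particular HLS tool.

```python
def list_schedule(ops, deps, units):
    """Resource-constrained list scheduling with unit-latency operations.

    ops:   dict mapping operation name -> operation type (e.g. "mul", "add")
    deps:  dict mapping operation name -> list of predecessor operation names
    units: dict mapping operation type -> number of allocated operation units
    Returns a dict mapping operation name -> control step (starting at 0).
    """
    schedule = {}
    step = 0
    while len(schedule) < len(ops):
        free = dict(units)  # units still unused in this control step
        progressed = False
        for op in sorted(ops):  # assumed priority: alphabetical order
            if op in schedule:
                continue
            # An operation is ready once all predecessors finished in earlier steps.
            ready = all(p in schedule and schedule[p] < step for p in deps.get(op, []))
            if ready and free.get(ops[op], 0) > 0:
                schedule[op] = step
                free[ops[op]] -= 1
                progressed = True
        if not progressed:
            raise ValueError("no operation can be scheduled; check units and dependencies")
        step += 1
    return schedule


# y = (m1 + m2) followed by one more addition, with two multiplications feeding a1.
ops = {"m1": "mul", "m2": "mul", "a1": "add", "a2": "add"}
deps = {"a1": ["m1", "m2"], "a2": ["a1"]}

one_multiplier = list_schedule(ops, deps, {"mul": 1, "add": 1})
two_multipliers = list_schedule(ops, deps, {"mul": 2, "add": 1})
```

With one multiplier the two multiplications are serialized and the schedule takes four control steps; allocating a second multiplier shortens it to three, which is one way to see that allocation and scheduling affect each other.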


RTL Power Modeling

• RTL power modeling = constructing a model $\text{Power} = P(X_1, X_2, \ldots, X_n)$ from n model parameters


Issues of RTL power modeling

• Granularity
• Choice of model parameters:
  – Activity model, complexity model, or both?
• Semantics of the model:
  – Cumulative or cycle-accurate?
• How to build and store the model:
  – Top-down or bottom-up?
  – Table or equation?


Model granularity

• Model granularity:
  – Should not be too big:
    • E.g., a single monolithic model is too time-consuming to build, inaccurate, and inflexible.
  – Nor too small.
  – FSMD (FSM with datapath) is a reasonable choice, as an RTL design is an interaction of datapath and controller.

• Five main components:
  – Controller
  – Register file
  – Bus
  – Memory
  – Functional blocks


Activity model or Complexity model, or both?

• Model parameters:
  – What parameters are to be included in the model?
  – Model parameters must be observable at the RTL.
    • $P_{total} = k \sum_i A_i C_i$: the power model is decoupled into two separate models, i.e., an activity model ($A_i$) and a capacitance model ($C_i$).

• Activity model, complexity model, or both?
  – The complexity model can be just a capacitance model, or it can also include the transistor count to account for the leakage current.
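
A minimal sketch of evaluating the decoupled activity/capacitance model follows. The slide does not define the constant k; the code assumes the usual dynamic-power reading k = 0.5 · Vdd² · f_clk, with each activity expressed as toggles per clock cycle. The function name and the numbers are hypothetical.

```python
def total_power_w(activities, capacitances_f, vdd_v, f_clk_hz):
    """Evaluate P_total = k * sum(A_i * C_i), assuming k = 0.5 * Vdd^2 * f_clk."""
    if len(activities) != len(capacitances_f):
        raise ValueError("need one activity value per capacitance value")
    k = 0.5 * vdd_v ** 2 * f_clk_hz
    return k * sum(a * c for a, c in zip(activities, capacitances_f))


# Two hypothetical components: a busy node with a small load, a quiet node with a large load.
power_w = total_power_w(
    activities=[0.5, 0.1],            # toggles per clock cycle (assumed unit)
    capacitances_f=[2e-12, 10e-12],   # switched capacitance in farads
    vdd_v=1.0,
    f_clk_hz=100e6,
)
```

Both components contribute the same activity-capacitance product here, which shows why neither the activity model nor the capacitance model is sufficient alone.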


Activity parameters

• RTL activity: an approximation of all intra-clock-cycle activities, projected onto the relevant clock transition point.

• The main parameters are static and transition probabilities.
  – Choose between bit-wise and word-wise probabilities according to the desired accuracy and speed.
    • An n-input, m-output component has (n+m) bit-wise parameters, but only two word-wise parameters.

• Additional parameters:
  – Transition density: average switching rate per second
    • Includes non-periodic signals
  – Correlation measures: useful for computing switching power
    • Spatial correlation
    • Temporal correlation = transition probability
  – Entropy: somewhat similar to the transition probability $2p(1-p)$
    • $H(p) = p \log_2(1/p) + (1-p) \log_2(1/(1-p))$
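
The activity parameters above can be computed from a sampled bit stream as in the following sketch. It assumes one sample per clock cycle for a single bit; the helper names are hypothetical.

```python
import math


def static_probability(bits):
    """Fraction of cycles in which the bit is 1."""
    return sum(bits) / len(bits)


def transition_probability(bits):
    """Fraction of consecutive cycle pairs in which the bit toggles."""
    toggles = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    return toggles / (len(bits) - 1)


def entropy(p):
    """Binary entropy p*log2(1/p) + (1-p)*log2(1/(1-p)), taken as 0 at p = 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))


bits = [0, 1, 1, 0, 1, 0, 0, 1]
p = static_probability(bits)
t = transition_probability(bits)
h = entropy(p)
uncorrelated_estimate = 2 * p * (1 - p)  # transition probability if cycles were independent
```

For this stream the measured transition probability is higher than the 2p(1-p) estimate, which is the kind of temporal correlation that the static probability alone cannot capture.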


Complexity parameters

• Capacitance ~ gate count, transistor count, etc.

• The only complexity parameters available at the RTL are:
  – Width of a component: number of inputs and outputs
  – Number of states: applicable to the controller

• Architecture-specific models:
  – $k_1 N^2$ for an N×N multiplier
  – $k_2 N$ for a ripple-carry adder
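
To make the width dependence concrete, the sketch below evaluates the two architecture-specific models, assuming they read as k1·N² for the multiplier and k2·N for the ripple-carry adder. The coefficients are hypothetical fitting constants, and the function names are invented for this example.

```python
def multiplier_capacitance(n_bits, k1):
    """Quadratic complexity model for an N x N multiplier: k1 * N^2."""
    return k1 * n_bits ** 2


def ripple_adder_capacitance(n_bits, k2):
    """Linear complexity model for an N-bit ripple-carry adder: k2 * N."""
    return k2 * n_bits


K1, K2 = 1.0, 1.0  # hypothetical coefficients in arbitrary capacitance units

# Doubling the word width from 8 to 16 bits:
mult_growth = multiplier_capacitance(16, K1) / multiplier_capacitance(8, K1)
adder_growth = ripple_adder_capacitance(16, K2) / ripple_adder_capacitance(8, K2)
```

Under these models, doubling the width quadruples the multiplier estimate but only doubles the adder estimate.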


Model semantics

• Cumulative (average) vs. cycle-accurate:
  – Cumulative power = summation of the average (cumulative) power over the modules
  – Cycle-accurate power = summation of the power over the modules for each clock cycle

• Cumulative power is only good for tracking battery life, average heat dissipation, etc.

• Cycle-accurate power is needed for IR drop, noise, and reliability (electromigration) analysis.

• Pseudo-cycle-accurate power estimation may be adequate for dynamic power management.
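
The difference between the two semantics can be seen on a small per-cycle power trace. The module names and values below are hypothetical.

```python
# Hypothetical per-cycle power samples (mW) for two modules over four clock cycles.
module_power_mw = {
    "datapath": [4.0, 9.0, 3.0, 8.0],
    "controller": [1.0, 1.0, 2.0, 2.0],
}


def cycle_accurate_power(per_module):
    """Total power in each clock cycle: sum over the modules, cycle by cycle."""
    return [sum(samples) for samples in zip(*per_module.values())]


def cumulative_power(per_module):
    """Average total power: sum over the modules of each module's average power."""
    return sum(sum(samples) / len(samples) for samples in per_module.values())


per_cycle_mw = cycle_accurate_power(module_power_mw)
average_mw = cumulative_power(module_power_mw)
peak_mw = max(per_cycle_mw)
```

The cumulative figure hides the peak that the cycle-accurate trace exposes, which is why the latter is needed for IR drop and noise analysis.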


How to build and store the model

• Model construction:
  – Top-down: good
    • When the implementation follows some predictable template, e.g., memory
    • When dealing with a new circuit that has no measured data available
  – Bottom-up:
    • Can be equation-based
      – A template for the power model is given first.
      – Statistical techniques are then used to fit the measured values to the model by adjusting the coefficients.

• Model storage:
  – Equation-based
  – Table-based


Accuracy issue

• Metric: $E = |P_e - P| / \max(P_e, P)$, where $P_e$ is the estimated power and $P$ is the reference power

• Average error
• Standard deviation
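
A short sketch of the accuracy metric and its summary statistics follows. It assumes that Pe denotes the estimated power and P the reference (measured or lower-level simulated) power, both positive; the sample values are hypothetical.

```python
import statistics


def relative_error(p_est, p_ref):
    """E = |Pe - P| / max(Pe, P); assumes both powers are positive."""
    return abs(p_est - p_ref) / max(p_est, p_ref)


estimated_mw = [9.0, 10.0, 12.0]   # hypothetical model outputs
reference_mw = [10.0, 10.0, 10.0]  # hypothetical reference values

errors = [relative_error(e, r) for e, r in zip(estimated_mw, reference_mw)]
average_error = statistics.mean(errors)
error_std_dev = statistics.pstdev(errors)  # population standard deviation
```

Dividing by the larger of the two values keeps the metric between 0 and 1 regardless of whether the model overestimates or underestimates.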


Macro modeling flow

1. Choose model parameters
   – E.g., average switching activity of inputs and/or outputs

2. Design the training set
   – Good coverage, unbiasedness, and resemblance to actual operating conditions

3. Characterization
   – Run a power-accurate lower-level simulator
     • For example, for RTL training, run a gate-level simulator with good coverage of input/output switching activities

4. Model extraction
   – For equation-based models, run an LMS regression engine
   – For table-based models, merge entries according to the available table space
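
For the equation-based case, model extraction can be sketched as an ordinary least-squares fit of a one-parameter template P = c0 + c1 · activity. This stands in for the regression engine mentioned above and is not a specific tool's method; the training data is hypothetical and would normally come from the characterization step.

```python
def fit_linear_macro_model(activities, powers):
    """Least-squares fit of the template P = c0 + c1 * activity."""
    n = len(activities)
    mean_a = sum(activities) / n
    mean_p = sum(powers) / n
    sxx = sum((a - mean_a) ** 2 for a in activities)
    sxy = sum((a - mean_a) * (p - mean_p) for a, p in zip(activities, powers))
    c1 = sxy / sxx
    c0 = mean_p - c1 * mean_a
    return c0, c1


# Hypothetical characterization results: average input switching activity
# versus power reported by a lower-level simulator (mW).
training_activity = [0.1, 0.2, 0.3, 0.4]
training_power_mw = [2.0, 3.0, 4.0, 5.0]

c0, c1 = fit_linear_macro_model(training_activity, training_power_mw)
predicted_mw = c0 + c1 * 0.25  # evaluate the macro model at a new activity value
```

A table-based model would instead store the characterized points directly and interpolate between them, trading storage space for freedom from a fixed template.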


C. Piguet, ‘Low Power CMOS: technology, logic design and CAD tools’, CRC (until 10/25)

• 9/4: introduction
• 9/18: physics and limits of power dissipation in CMOS (2,3)
• 9/20: system-level power analysis and estimation (18)
• 9/25: algorithmic and RTL power estimation (18,19)
• 9/27, 10/2, 10/4, 10/9, 10/11: synthesis for low power circuits and logic blocks (7,8,9,10,13)
• 10/16: driving interconnects for low power (14)
• 10/18: new device candidates (4,5)
• 10/23: ultimate low power logic (16)
• 10/25: robustness of low power logic (17)
• 10/30: low power memory
• 11/1: software for low power
• 11/6: energy recovery circuits
• 11/8: adaptive power supply systems
• 11/13, 11/15: student presentation
• 11/20, 11/22: low power DSP
• 11/27: low power design methodology
• 11/29, 12/4, 12/6, 12/11, 12/13: student presentation