Power estimation in the algorithmic and register-transfer level, September 25, 2006, Chong-Min Kyung


Software power analysis

• Objectives:
  – Compare different programs
  – Select processors
  – Optimize software

• Three levels of granularity (trading off execution speed, availability, and accuracy):
  – Source code level
  – Instruction level
  – BFM (Bus Functional Model) level

• Execution performed on:
  – 1) Target processor: compile the source code and run it.
    • Measure the heat generated to estimate the power?
    • Or monitor execution (with inserted monitoring instructions or with extra hardware, both ideally with negligible overhead and disturbance on power and speed) to count the occurrences of each instruction and compute the total estimated power?
    • Dynamic code can also be handled.
    • Minimal disturbance from the overhead code is the key to accuracy.

• Execution performed on:
  – 2) Another processor: run a program that estimates the power consumption, taking the target-compiled code as input data.
    • Only the power consumption of static code can be estimated.
  – 3) Simulator:
    • Either at the source code level,
    • or at the instruction code level (same as 'Another processor').

Power estimation of Software

• Simplest approach: energy consumption is proportional to the program execution time.

• Instruction set approach: energy consumption differs for each instruction class (a class of instructions with similar power behavior) and for each class of instruction pair (inter-instruction dependency).
  – Measurement is done by running long loops of the same instruction.
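The instruction set approach can be illustrated with a minimal sketch: total energy is the sum of a base cost per instruction class plus an overhead for each adjacent pair of classes. The class names and all energy values below are hypothetical, chosen only to make the arithmetic easy to follow; real values would come from the loop measurements described above.

```python
from collections import Counter

# Hypothetical base energy per instruction class, in nJ per instruction.
BASE_ENERGY_NJ = {"alu": 1.0, "load": 2.0, "branch": 1.5}

# Hypothetical inter-instruction overhead, in nJ per adjacent class pair.
# Pairs that are not listed are assumed to add no overhead.
PAIR_OVERHEAD_NJ = {
    ("alu", "load"): 0.5,
    ("load", "alu"): 0.5,
    ("alu", "branch"): 0.25,
}


def estimate_energy_nj(trace):
    """Sum base costs per class plus overheads per adjacent class pair."""
    base = sum(BASE_ENERGY_NJ[cls] * n for cls, n in Counter(trace).items())
    pairs = Counter(zip(trace, trace[1:]))
    overhead = sum(PAIR_OVERHEAD_NJ.get(pair, 0.0) * n for pair, n in pairs.items())
    return base + overhead


# A short executed-instruction trace, already mapped to instruction classes.
trace = ["alu", "load", "alu", "alu", "branch"]
energy = estimate_energy_nj(trace)
print(energy)
```

For this trace the base cost is 6.5 nJ and the pair overhead is 1.25 nJ, so the estimate is 7.75 nJ.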

Power estimation of Software

• Becomes more difficult with more complex processors (multi-threading, out-of-order execution, …) and memory system architectures (caches, …).

• Accurate estimation requires software profiling on an ISS (instruction set simulator) with bus access patterns.

• An estimation model accurate to within 5% was developed for an ARM processor [Simunic, T., 'Cycle-accurate simulation of energy consumption in embedded systems', DAC 99].

Algorithmic-level power estimation

• Algorithmic-level power estimation consists of:
  – Architecture estimation
  – Activation estimation
  – Power model evaluation

• Architecture estimation by High-Level Synthesis (HLS):
  – Allocation, Scheduling, and Binding (ASB). Allocation in the narrow sense is 'unit selection', where each operation can be performed by more than one unit.
  – Allocation and Scheduling affect each other.

• HLS considering communication (interconnect):
  – ASB + floorplanning
  – Cycle time violation check based on wire delay (derived from wire length estimation)

• (HLS considering interconnect) and power
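The three estimation steps listed above can be sketched in a few lines. This is only an illustration under strong assumptions: the binding of operations to units, the operation trace, and the energy per activation are all hypothetical values, and the power model is reduced to a single number per unit type.

```python
# Step 1 (architecture estimation): hypothetical binding of each operation
# type to the unit type that executes it, as HLS might produce.
BINDING = {"add": "adder", "sub": "adder", "mul": "multiplier"}

# Input to step 3: hypothetical energy per activation of each unit type, in pJ.
ENERGY_PER_ACTIVATION_PJ = {"adder": 2.0, "multiplier": 16.0}


def count_activations(executed_ops):
    """Step 2 (activation estimation): count activations per unit type."""
    activations = {}
    for op in executed_ops:
        unit = BINDING[op]
        activations[unit] = activations.get(unit, 0) + 1
    return activations


def evaluate_energy_pj(activations):
    """Step 3 (power model evaluation): weight activations by unit energy."""
    return sum(ENERGY_PER_ACTIVATION_PJ[unit] * n for unit, n in activations.items())


# Operations executed by one run of the algorithm (hypothetical).
ops = ["mul", "mul", "add", "sub", "add"]
acts = count_activations(ops)
energy = evaluate_energy_pj(acts)
print(acts, energy)
```

Here the two multiplications dominate: 2 × 16 pJ plus 3 × 2 pJ gives 38 pJ for the run.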

Target architecture of HLS

• Target architecture of HLS:
  – Datapath <- dataflow of the CDFG
  – Controller <- dataflow and control flow
  – Clock tree

Target architecture of HLS

• Architecture synthesis =
  – Schedule the operations under timing and resource constraints, and
  – Allocate the required resources (operation units).
    • An operation unit can be an arithmetic module, a logic module, or a memory module.
  – The output of architecture synthesis is:
    • A set of operation units
    • Registers
    • Steering logic to transfer data between operation units and registers, and
    • A controller whose control signals steer the MUXes, the OUs, and the Enable signals of the registers

• How to integrate power optimization into HLS?
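Scheduling under resource constraints can be sketched with a simple list scheduler. This is one common heuristic, not necessarily the one used by any particular HLS tool. The sketch assumes single-cycle operations, an acyclic dependency graph, and at least one allocated unit for every unit type in use; the operation names and unit types are hypothetical.

```python
def list_schedule(ops, deps, unit_of, available):
    """Resource-constrained list scheduling with single-cycle operations.

    ops:       operation names in priority order
    deps:      operation -> set of predecessor operations
    unit_of:   operation -> unit type that executes it
    available: unit type -> number of allocated units
    Returns a dict mapping each operation to its control step (from 0).
    """
    step_of = {}
    step = 0
    while len(step_of) < len(ops):
        free = dict(available)  # units still free in this control step
        for op in ops:
            if op in step_of:
                continue
            # Ready only if every predecessor finished in an earlier step.
            ready = all(step_of.get(p, step) < step for p in deps.get(op, ()))
            if ready and free.get(unit_of[op], 0) > 0:
                free[unit_of[op]] -= 1
                step_of[op] = step
        step += 1
    return step_of


# z = (a*b + c*d) - e, expressed as four operations.
ops = ["m1", "m2", "a1", "s1"]
deps = {"a1": {"m1", "m2"}, "s1": {"a1"}}
unit_of = {"m1": "mul", "m2": "mul", "a1": "alu", "s1": "alu"}

one_mul = list_schedule(ops, deps, unit_of, {"mul": 1, "alu": 1})
two_mul = list_schedule(ops, deps, unit_of, {"mul": 2, "alu": 1})
print(one_mul)
print(two_mul)
```

With one multiplier the two multiplications are serialized and the schedule needs four control steps; with two multipliers it needs three. This shows how the allocation decision changes the schedule.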

RTL Power Modeling

• RTL power modeling = constructing a model $\text{Power} = P(X_1, X_2, \ldots, X_n)$ from $n$ model parameters

Issues of RTL power modeling

• Granularity
• Choice of model parameters:
  – Activity model, complexity model, or both?
• Semantics of the model:
  – Cumulative or cycle-accurate?
• How to build and store the model:
  – Top-down or bottom-up?
  – Table or equation?

Model granularity

• Model granularity:
  – Should not be too coarse:
    • E.g., a single monolithic model is too time-consuming to build, inaccurate, and inflexible.
  – Should not be too fine either.
  – FSMD (FSM with datapath) is a reasonable choice, as an RTL design is an interaction of datapath and controller.

• Five main components:
  – Controller
  – Register file
  – Bus
  – Memory
  – Functional blocks

Activity model or Complexity model, or both?

• Model parameters:
  – What parameters are to be included in the model?
  – Model parameters must be observable at the RTL.

• $P_{total} = k \sum_i A_i C_i$: the power model is decoupled into two separate models, i.e., an activity model ($A_i$) and a capacitance model ($C_i$).

• Activity model, complexity model, or both?
  – The complexity model can be just a capacitance model, or it can also include the transistor count to account for leakage current.
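A minimal sketch of the decoupled model, reading the slide's formula as P_total = k · Σ A_i · C_i. The slide does not define k; the example below assumes it lumps together supply voltage and clock frequency as 0.5 · Vdd² · f, which is only one possible convention. All numeric values are hypothetical.

```python
def total_power(k, activities, capacitances):
    """Evaluate P_total = k * sum(A_i * C_i) over all components i."""
    if len(activities) != len(capacitances):
        raise ValueError("need one activity and one capacitance per component")
    return k * sum(a * c for a, c in zip(activities, capacitances))


# Assumed meaning of k (not given in the source): 0.5 * Vdd^2 * f.
vdd = 1.0            # supply voltage in volts (hypothetical)
f_clk = 100e6        # clock frequency in hertz (hypothetical)
k = 0.5 * vdd ** 2 * f_clk

activities = [0.5, 0.25]          # activity model: transitions per cycle
capacitances = [2e-12, 4e-12]     # capacitance model: farads per component

power_w = total_power(k, activities, capacitances)
print(power_w)
```

Both components contribute 1 pF of switched capacitance per cycle here, giving roughly 0.1 mW in total. Keeping the two lists separate mirrors the decoupling: either model can be refined without touching the other.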

Activity parameters

• RTL activity: an approximation of all intra-clock-cycle activities projected onto the relevant clock transition point.

• Main parameters are static and transition probabilities.
  – Choose between bit-wise and word-wise probabilities according to the desired accuracy and speed.
    • An n-input, m-output component has (n+m) bit-wise parameters, but only two word-wise parameters.

• Additional parameters:
  – Transition density: average switching rate per second
    • Includes non-periodic signals
  – Correlation measures: useful for computing switching power
    • Spatial correlation
    • Temporal correlation = transition probability
  – Entropy: somewhat similar to the transition probability $2p(1-p)$
    • $p \log_2(1/p) + (1-p) \log_2(1/(1-p))$
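The activity parameters above can be computed from a sampled bit stream. The sketch below uses a hypothetical 8-sample stream; the function names are illustrative. Note that 2p(1-p) is the transition probability only for a signal whose successive values are independent, so the measured transition probability of a correlated stream can differ from it.

```python
import math


def static_probability(bits):
    """Fraction of clock cycles in which the signal is 1."""
    return sum(bits) / len(bits)


def transition_probability(bits):
    """Fraction of consecutive cycle pairs in which the signal toggles."""
    toggles = sum(a != b for a, b in zip(bits, bits[1:]))
    return toggles / (len(bits) - 1)


def entropy(p):
    """Binary entropy p*log2(1/p) + (1-p)*log2(1/(1-p)), in bits."""
    if p in (0.0, 1.0):
        return 0.0  # limit value; avoids log of zero
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))


bits = [0, 1, 1, 0, 1, 0, 0, 1]   # hypothetical samples of one signal
p = static_probability(bits)
print(p, transition_probability(bits), 2 * p * (1 - p), entropy(p))
```

For this stream p is 0.5 and the measured transition probability is 5/7, while the independence estimate 2p(1-p) is 0.5. Both 2p(1-p) and the entropy peak at p = 0.5 and vanish at p = 0 and p = 1, which is the similarity the slide refers to.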

Complexity parameters

• Capacitance ~ gate count, transistor count, etc.

• The only complexity parameters available at RTL are:
  – Width of a component: number of inputs and outputs
  – Number of states: applicable to the controller

• Architecture-specific model:
  – $k_1 N^2$ for an $N \times N$ multiplier
  – $k_2 N$ for a ripple-carry adder
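A small sketch of the architecture-specific models, reading the multiplier model as k1·N² and the adder model as k2·N. The coefficients k1 and k2 are hypothetical and set to 1.0 so that only the scaling with the width N is visible.

```python
def multiplier_complexity(n, k1):
    """Complexity (capacitance) estimate of an N x N multiplier: k1 * N^2."""
    return k1 * n ** 2


def ripple_carry_adder_complexity(n, k2):
    """Complexity (capacitance) estimate of an N-bit ripple-carry adder: k2 * N."""
    return k2 * n


# Hypothetical coefficients in arbitrary capacitance units.
K1, K2 = 1.0, 1.0
for width in (8, 16):
    print(width, multiplier_complexity(width, K1), ripple_carry_adder_complexity(width, K2))
```

Doubling the width from 8 to 16 bits quadruples the multiplier estimate (64 to 256) but only doubles the adder estimate (8 to 16).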

Model semantics

• Cumulative (average) vs. cycle-accurate:
  – Cumulative power = summation of average (cumulative) power over modules
  – Cycle-accurate power = summation of power over modules for each clock cycle

• Cumulative power is only good for estimating battery life, average heat dissipation, etc.

• Cycle-accurate power is needed for IR drop, noise, and reliability (electromigration) analysis.

• Pseudo-cycle-accurate power estimation may be adequate for dynamic power management.
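The difference between the two semantics can be seen on a small per-module, per-cycle power trace. The module names and power values below are hypothetical.

```python
# Hypothetical power of each module in each of four clock cycles, in mW.
power_mw = {
    "controller": [1.0, 1.0, 1.0, 1.0],
    "datapath": [2.0, 8.0, 2.0, 4.0],
}


def cycle_accurate_power(traces):
    """One value per clock cycle: sum of module powers in that cycle."""
    return [sum(per_cycle) for per_cycle in zip(*traces.values())]


def cumulative_power(traces):
    """One value overall: sum of each module's average power."""
    return sum(sum(trace) / len(trace) for trace in traces.values())


per_cycle = cycle_accurate_power(power_mw)
average = cumulative_power(power_mw)
print(per_cycle, max(per_cycle), average)
```

The cumulative figure is 5 mW, but the cycle-accurate trace peaks at 9 mW in the second cycle. That peak is invisible to the cumulative model, which is why analyses that depend on instantaneous current need the cycle-accurate one.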

How to build and store the model

• Model construction:
  – Top-down: good
    • when the implementation follows some predictable template, e.g., memory
    • when dealing with a new circuit for which no measured data are available
  – Bottom-up:
    • Can be equation-based
      – A template for the power model is given first.
      – Statistical techniques are used to fit the measured values to the model by adjusting the coefficients.

• Model storage:
  – Equation-based
  – Table-based

Accuracy issue

• Metric: $E = |P_e - P| / \max(P_e, P)$

• Average error
• Standard deviation
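The metric and its two summary statistics can be computed as follows. The sketch assumes that Pe is the estimated power and P the reference power (the slide does not define them), and that both are positive. The sample values are hypothetical.

```python
import statistics


def relative_error(p_est, p_ref):
    """E = |Pe - P| / max(Pe, P); stays within [0, 1] for positive powers."""
    return abs(p_est - p_ref) / max(p_est, p_ref)


# Hypothetical (estimated, reference) power pairs in mW.
samples = [(9.0, 10.0), (10.0, 8.0), (5.0, 5.0)]
errors = [relative_error(p_est, p_ref) for p_est, p_ref in samples]

average_error = statistics.mean(errors)
std_deviation = statistics.pstdev(errors)  # population standard deviation
print(errors, average_error, std_deviation)
```

Dividing by the larger of the two values makes the metric symmetric: overestimating and underestimating by the same factor give the same error.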

Macro modeling flow

1. Choose model parameters
  – E.g., average switching activity of inputs and/or outputs

2. Design the training set
  – Good coverage, unbiased, resembling actual operating conditions

3. Characterization
  – Run the power-accurate lower-level simulator
    • For example, for RTL training, run a gate-level simulator with good coverage of input/output switching activities

4. Model extraction
  – For equation-based models, run an LMS regression engine
  – For table-based models, merge entries according to the available table space
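The model extraction step for an equation-based macro-model can be sketched with an ordinary least-squares fit, used here as a stand-in for the regression engine. The sketch assumes a single model parameter (average input switching activity) and the template P = c0 + c1 · activity. The training data are hypothetical; in the real flow they would come from the characterization runs.

```python
def fit_linear_macromodel(activities, powers):
    """Least-squares fit of the template P = c0 + c1 * activity."""
    n = len(activities)
    mean_a = sum(activities) / n
    mean_p = sum(powers) / n
    sxx = sum((a - mean_a) ** 2 for a in activities)
    sxy = sum((a - mean_a) * (p - mean_p) for a, p in zip(activities, powers))
    c1 = sxy / sxx
    c0 = mean_p - c1 * mean_a
    return c0, c1


# Hypothetical training set: activity values chosen for even coverage,
# power values (mW) standing in for lower-level simulation results.
training_activity = [0.1, 0.2, 0.3, 0.4, 0.5]
training_power_mw = [1.2, 1.4, 1.6, 1.8, 2.0]

c0, c1 = fit_linear_macromodel(training_activity, training_power_mw)


def macromodel_power_mw(activity):
    """Evaluate the extracted macro-model at a new activity value."""
    return c0 + c1 * activity


print(c0, c1, macromodel_power_mw(0.35))
```

The training points lie on a straight line by construction, so the fit recovers c0 ≈ 1.0 and c1 ≈ 2.0. With real characterization data the residuals would be nonzero, and the accuracy of the extracted model would have to be checked separately.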

C. Piguet, 'Low Power CMOS: technology, logic design and CAD tools', CRC (until 10/25)

• 9/4: introduction
• 9/18: physics and limits of power dissipation in CMOS (2, 3)
• 9/20: system-level power analysis and estimation (18)
• 9/25: algorithmic and RTL power estimation (18, 19)
• 9/27, 10/2, 10/4, 10/9, 10/11: synthesis for low power circuits and logic blocks (7, 8, 9, 10, 13)
• 10/16: driving interconnects for low power (14)
• 10/18: new device candidates (4, 5)
• 10/23: ultimate low power logic (16)
• 10/25: robustness of low power logic (17)
• 10/30: low power memory
• 11/1: software for low power
• 11/6: energy recovery circuits
• 11/8: adaptive power supply systems
• 11/13, 11/15: student presentation
• 11/20, 11/22: low power DSP
• 11/27: low power design methodology
• 11/29, 12/4, 12/6, 12/11, 12/13: student presentation
