Power estimation in the algorithmic and register-transfer level
September 25, 2006
Chong-Min Kyung

Software power analysis
• Objectives:
  – Compare different programs
  – Select processors
  – Optimize software
• Three levels of granularity (according to execution speed, availability, and accuracy):
  – Source code level
  – Instruction level
  – BFM (Bus Functional Model) level

• Execution performed on
  – 1) Target processor: compile the source code and run it
    • Measure the heat generated to estimate the power?
    • Or monitor execution (with inserted monitoring instructions or some extra hardware, both with hopefully negligible overhead/disturbance on power and speed) to count the occurrences of each instruction and compute the total estimated power?
    • Dynamic code can also be handled.
    • Minimal disturbance from the overhead code is the key to accuracy.

• Execution performed on
  – 2) Another processor: run a program that estimates the power consumption, taking the target-compiled code as input data.
    • Only the power consumption of the static code can be estimated.
  – 3) Simulator:
    • Either at the source code level,
    • Or at the instruction code level (same as 'Another processor')

Power estimation of Software
• Simplest approach: energy consumption is proportional to the program execution time.
• Instruction set approach: energy consumption differs for each instruction class (a class of instructions with similar power behavior) and for each class of instruction pair (inter-instruction dependency).
  – Measurement is done by running long loops of the same instruction
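
The instruction set approach can be sketched as a table lookup: one base cost per instruction class plus an overhead for each adjacent pair of classes. This is only a minimal illustration. The class names, the table layout, and every energy value (in nJ) are hypothetical placeholders, not measured data for any processor.

```python
from collections import Counter

# Hypothetical base energy per instruction class, in nJ (illustrative values only).
BASE_ENERGY_NJ = {"alu": 1.0, "load": 2.5, "store": 2.0, "branch": 1.5}

# Hypothetical extra energy when one class directly follows another
# (inter-instruction effect). Pairs not listed are assumed to add nothing.
PAIR_OVERHEAD_NJ = {("alu", "load"): 0.3, ("load", "alu"): 0.2, ("alu", "branch"): 0.1}


def estimate_energy_nj(trace):
    """Sum the base cost of every instruction and the overhead of every adjacent pair."""
    counts = Counter(trace)
    pairs = Counter(zip(trace, trace[1:]))
    base = sum(BASE_ENERGY_NJ[cls] * n for cls, n in counts.items())
    overhead = sum(PAIR_OVERHEAD_NJ.get(pair, 0.0) * n for pair, n in pairs.items())
    return base + overhead


# A short, made-up trace of instruction classes.
trace = ["alu", "load", "alu", "alu", "store", "branch"]
total = estimate_energy_nj(trace)
print(f"estimated energy: {total:.2f} nJ")
```

In practice the two tables would be filled from measurements such as the long single-instruction loops mentioned above; the pair table needs loops that alternate two classes.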

Power estimation of Software
• Becomes more difficult with more complex processors (multi-threading, out-of-order execution, …) and memory system architectures (caches, …)
• Accurate estimation requires software profiling on an ISS (instruction set simulator) with the bus access pattern.
• An estimation model accurate to within 5% was developed for an ARM processor [T. Simunic et al., "Cycle-accurate simulation of energy consumption in embedded systems", DAC 99]

Algorithmic-level power estimation
• Algorithmic-level power estimation consists of
  – Architecture estimation
  – Activation estimation
  – Power model evaluation
• Architecture estimation by High-Level Synthesis (HLS)
  – Allocation, Scheduling, and Binding (ASB). Allocation in the narrow sense is 'unit selection', where each operation can be performed by more than one unit.
  – Allocation and Scheduling affect each other.
• HLS considering communication (interconnect)
  – ASB + floorplanning
  – Cycle time violation check based on wire delay (based on wire length estimation)
• (HLS considering interconnect) and power

Target architecture of HLS
• Target architecture of HLS
  – Datapath <- dataflow of CDFG
  – Controller <- dataflow and control flow
  – Clock tree

Target architecture of HLS
• Architecture synthesis =
  – Schedule the operations under timing & resource constraints, and
  – Allocate the required resources (operation units)
    • An operation unit can be an arithmetic module, a logic module, or a memory module.
  – Output of architecture synthesis is
    • A set of operation units
    • Registers
    • Steering logic to transfer data between operation units and registers, and
    • A controller whose control signals steer the MUXes, the operation units (OUs), and the enable signals of the registers
• How to integrate power optimization into HLS?

RTL Power Modeling
• RTL power modeling = constructing a model Power = P(X1, X2, …, Xn) from n model parameters

Issues of RTL power modeling
• Granularity
• Choice of model parameters
  – Activity model or complexity model or both?
• Semantics of the model
  – Cumulative or cycle-accurate?
• How to build and store the model
  – Top-down or bottom-up?
  – Table or equation?

Model granularity
• Model granularity
  – Should not be too coarse
    • E.g., a single monolithic model is too time-consuming to build, inaccurate, and inflexible
  – Should not be too fine
  – FSMD (FSM with datapath) is a reasonable choice, as an RTL design is an interaction of datapath and controller
• Five main components
  – Controller
  – Register file
  – Bus
  – Memory
  – Functional blocks

Activity model or Complexity model, or both?
• Model parameters
  – What parameters are to be included in the model?
  – Model parameters must be observable at the RTL
    • P_total = k Σ_i A_i C_i: the power model is decoupled into two separate models, i.e., an activity model and a capacitance model
• Activity model or complexity model, or both?
  – The complexity model can be just a capacitance model, or it can also include the transistor count to account for the leakage current.
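
The decoupled form P_total = k Σ A_i C_i can be evaluated directly once each component has an activity and a capacitance. The sketch below assumes k = 0.5 · Vdd² · f_clk, the usual constant for dynamic switching power; the slide itself does not define k. The component activities, capacitances, supply voltage, and clock frequency are hypothetical.

```python
def total_power_w(activities, capacitances_f, vdd_v, f_clk_hz):
    """P_total = k * sum(A_i * C_i), with k assumed to be 0.5 * Vdd^2 * f_clk."""
    k = 0.5 * vdd_v ** 2 * f_clk_hz
    return k * sum(a * c for a, c in zip(activities, capacitances_f))


# Hypothetical per-component data: activity (transitions per cycle) from the
# activity model, switched capacitance (farads) from the capacitance model.
activities = [0.5, 0.25, 0.1]
capacitances_f = [2e-12, 4e-12, 10e-12]

p = total_power_w(activities, capacitances_f, vdd_v=1.0, f_clk_hz=100e6)
print(f"P_total = {p * 1e6:.1f} uW")
```

Because the two lists come from separate models, the activity numbers can be re-extracted for a new input stream without re-characterizing the capacitances.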

Activity parameters
• RTL activity: an approximation of all intra-clock-cycle activities, projected onto the relevant clock transition point.
• Main parameters are static and transition probabilities
  – Choose between bit-wise and word-wise probabilities according to the desired accuracy and speed
    • An n-input, m-output component has (n+m) bit-wise parameters, but only two word-wise parameters
• Additional parameters
  – Transition density: average switching rate per second
    • Includes non-periodic signals
  – Correlation measures: useful for computing switching power
    • Spatial correlation
    • Temporal correlation = transition probability
  – Entropy: somewhat similar to the transition probability 2p(1-p)
    • H = p log2(1/p) + (1-p) log2(1/(1-p))
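
The bit-wise activity parameters can be extracted from a sampled signal as follows. This is a minimal sketch: the bit stream is made up, one sample per clock cycle is assumed, and the function names are illustrative.

```python
import math


def static_probability(bits):
    """Fraction of cycles in which the signal is 1."""
    return sum(bits) / len(bits)


def transition_probability(bits):
    """Fraction of consecutive cycle pairs in which the signal toggles."""
    toggles = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    return toggles / (len(bits) - 1)


def entropy(p):
    """Binary entropy H = p*log2(1/p) + (1-p)*log2(1/(1-p)), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))


bits = [0, 1, 1, 0, 1, 0, 0, 1]  # hypothetical samples of one signal
p = static_probability(bits)
t = transition_probability(bits)
h = entropy(p)
print(f"static p = {p}, measured transition prob = {t:.3f}, "
      f"2p(1-p) = {2 * p * (1 - p)}, entropy = {h}")
```

For a signal with no temporal correlation the measured transition probability approaches 2p(1-p); a gap between the two, as in this short stream, indicates temporal correlation.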

Complexity parameters
• Capacitance ~ gate count, transistor count, …
• The only complexity parameters available at RTL are
  – Width of a component: # of inputs, outputs
  – # of states: applicable to the controller
• Architecture-specific model
  – k1·N^2 for an N×N multiplier
  – k2·N for a ripple-carry adder
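
The two architecture-specific models translate directly into functions of the bit width N. The coefficients k1 and k2 below are hypothetical; in practice they would come from characterizing the target library.

```python
def multiplier_capacitance(n_bits, k1):
    """Complexity model for an N x N multiplier: C = k1 * N^2."""
    return k1 * n_bits ** 2


def ripple_carry_adder_capacitance(n_bits, k2):
    """Complexity model for a ripple-carry adder: C = k2 * N."""
    return k2 * n_bits


K1, K2 = 0.5, 2.0  # hypothetical coefficients, arbitrary capacitance units

for n in (8, 16):
    print(n, multiplier_capacitance(n, K1), ripple_carry_adder_capacitance(n, K2))
```

Doubling the width quadruples the multiplier estimate but only doubles the adder estimate, which is why the width of a component is such a useful RTL-observable parameter.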

Model semantics
• Cumulative (average) vs. cycle-accurate
  – Cumulative power = summation of the average (cumulative) power over the modules
  – Cycle-accurate power = summation of the power over the modules for each clock cycle
• Cumulative power is only good enough for tracking battery life, average heat dissipation, etc.
• Cycle-accurate power is needed for IR drop, noise, and reliability (electromigration) analysis.
• Pseudo-cycle-accurate power estimation may be okay for dynamic power management.
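
The difference between the two semantics shows up as soon as per-cycle data is available. The sketch below uses hypothetical per-cycle power samples (in mW) for three modules over four clock cycles.

```python
# Hypothetical per-cycle power samples (mW) for each module.
module_power_mw = {
    "controller": [1.0, 1.0, 1.0, 1.0],
    "datapath":   [2.0, 8.0, 2.0, 4.0],
    "memory":     [0.0, 6.0, 0.0, 2.0],
}

# Cycle-accurate: sum over the modules, kept separately for each clock cycle.
cycle_power_mw = [sum(samples) for samples in zip(*module_power_mw.values())]

# Cumulative: sum over the modules of each module's average power.
cumulative_power_mw = sum(sum(s) / len(s) for s in module_power_mw.values())

peak_power_mw = max(cycle_power_mw)
print("per cycle:", cycle_power_mw)
print("cumulative:", cumulative_power_mw, "peak:", peak_power_mw)
```

The cumulative figure (7 mW here) hides the 15 mW peak in the second cycle, which is exactly the information that IR drop and noise analysis need.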

How to build and store the model
• Model construction
  – Top-down: good
    • When the implementation follows some predictable template, e.g., memory
    • When dealing with a new circuit having no measured data available
  – Bottom-up
    • Can be equation-based
      – A template for the power model is given first,
      – Statistical techniques are used to fit the measured values to the model by adjusting coefficients
• Model storage
  – Equation-based
  – Table-based

Accuracy issue
• Metric: E = |Pe - P| / max(Pe, P)
  • Average error
  • Standard deviation
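
The metric and its two summary statistics can be computed as below. The sketch assumes that Pe is the estimated power and P the reference power (the slide does not spell this out), that both are positive, and that the standard deviation is taken over the whole set of test cases. The sample values are made up.

```python
import statistics


def relative_error(p_est, p_ref):
    """E = |Pe - P| / max(Pe, P); symmetric in its arguments, in [0, 1) for positive powers."""
    return abs(p_est - p_ref) / max(p_est, p_ref)


# Hypothetical (estimated, reference) power pairs in mW.
pairs = [(9.0, 10.0), (12.0, 10.0), (5.0, 5.0)]
errors = [relative_error(pe, p) for pe, p in pairs]

average_error = statistics.mean(errors)
std_deviation = statistics.pstdev(errors)  # population standard deviation
print(f"average error = {average_error:.3f}, std dev = {std_deviation:.3f}")
```

Dividing by the larger of the two values keeps the error bounded and treats over- and underestimation symmetrically.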

Macro modeling flow
1. Choose model parameters
  – E.g., average switching activity of inputs and/or outputs
2. Design the training set
  – Good coverage, unbiased, resembling actual operating conditions
3. Characterization
  – Run the power-accurate lower-level simulator
    • For example, for RTL training, run a gate-level simulator with good coverage of input/output switching activities
4. Model extraction
  – For equation-based models, run an LMS regression engine
  – For table-based models, merge entries according to the available table space
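
For the equation-based case, model extraction can be sketched as a one-parameter fit. An ordinary closed-form least-squares fit stands in here for the LMS regression engine, and the model template P = c0 + c1 · activity is an assumption chosen for illustration. The training data is hypothetical and stands in for the results of gate-level characterization.

```python
def fit_linear_macromodel(activities, powers):
    """Closed-form least-squares fit of the template P = c0 + c1 * activity."""
    n = len(activities)
    mean_a = sum(activities) / n
    mean_p = sum(powers) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(activities, powers))
    var = sum((a - mean_a) ** 2 for a in activities)
    c1 = cov / var
    c0 = mean_p - c1 * mean_a
    return c0, c1


# Steps 2-3 (hypothetical): average input switching activity of each training
# vector set and the power (mW) a lower-level simulator reported for it.
train_activity = [0.1, 0.2, 0.3, 0.4, 0.5]
train_power_mw = [1.4, 1.8, 2.2, 2.6, 3.0]

# Step 4: extract the coefficients, then use the macro model at RTL.
c0, c1 = fit_linear_macromodel(train_activity, train_power_mw)
predicted_mw = c0 + c1 * 0.25
print(f"P = {c0:.2f} + {c1:.2f} * activity, P(0.25) = {predicted_mw:.2f} mW")
```

A table-based model would instead store the characterized points, bucketed by activity, and interpolate between neighboring entries.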

C. Piguet, 'Low Power CMOS: technology, logic design and CAD tools', CRC (until 10/25)
• 9/4: introduction
• 9/18: physics and limits of power dissipation in CMOS (2,3)
• 9/20: system-level power analysis and estimation (18)
• 9/25: algorithmic and RTL power estimation (18,19)
• 9/27, 10/2, 10/4, 10/9, 10/11: synthesis for low power circuits and logic blocks (7,8,9,10,13)
• 10/16: driving interconnects for low power (14)
• 10/18: new device candidates (4,5)
• 10/23: ultimate low power logic (16)
• 10/25: robustness of low power logic (17)
• 10/30: low power memory
• 11/1: software for low power
• 11/6: energy recovery circuits
• 11/8: adaptive power supply systems
• 11/13, 11/15: student presentation
• 11/20, 11/22: low power DSP
• 11/27: low power design methodology
• 11/29, 12/4, 12/6, 12/11, 12/13: student presentation