Leakage Power Minimization using SA-Based Gate Sizing and Threshold Voltage Assignment
Chih-Chuan Yu
Outline
• Introduction
• Related Work
• Problem Formulation
• Proposed Methodology
• Experimental Results
• Conclusion and Future Work
Introduction
• Low power and high performance
  • Mobile devices
• Rising leakage power
  • ITRS Roadmap 2009 [33]
  • Technology scales down
Leakage Power Minimization Methods
• Gate Sizing
• Threshold Voltage Assignment
  • Low Vth on critical paths
  • High Vth on non-critical paths
Outline
• Introduction• Related Work• Problem Formulation• Proposed Methodology• Experimental Results• Conclusion and Future Work
5
Related Work
Continuous methods
• Linear Programming (LP)
• Geometric Programming (GP)

Discrete methods
• Sensitivity-based approach
• Slack and delay budgeting
• Dynamic Programming (DP)
• Lagrangian Relaxation (LR)
• Linear Programming (LP)
• Simulated Annealing (SA)
Continuous Methods
• Linear Programming (LP)
  • Linear delay model
  • Gate selection is expressed as a linear function
• Geometric Programming (GP)
  • Polynomial delay model
Discrete Methods
• Sensitivity-based approach
  • Score and rank gates according to a defined sensitivity
  • Iteratively select the best gate for optimization until no further improvement can be made
• Slack and delay budgeting
  • Allocate a slack budget to each gate
  • Use each gate's slack budget to trade delay for power
• Dynamic Programming (DP)
  • Use decision stages and a cost-to-go function
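To make the sensitivity-based idea concrete, here is a minimal greedy sketch on a toy model. It is an illustration, not any specific published method. It assumes a single path whose delay is the sum of its gates' delays. It also uses a hypothetical sensitivity, leakage saved per picosecond of delay added. The `Option` type, `greedy_downsize`, and the `t_max` value are illustrative names. The sample delays and leakages are the size-1 in01 values from the library slide, treated as fixed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Option:
    delay: float    # ps (treated as fixed in this toy model)
    leakage: float  # uW


def greedy_downsize(options: Dict[str, List[Option]], t_max: float) -> Dict[str, int]:
    """Start every gate at its fastest option, then repeatedly apply the single move
    with the highest sensitivity (leakage saved / delay added) that keeps the path
    delay within t_max. Stop when no move reduces leakage."""
    choice = {g: min(range(len(opts)), key=lambda k: opts[k].delay)
              for g, opts in options.items()}
    path_delay = sum(options[g][k].delay for g, k in choice.items())
    while True:
        best = None  # (sensitivity, gate, option index)
        for g, opts in options.items():
            cur = opts[choice[g]]
            for k, opt in enumerate(opts):
                saved = cur.leakage - opt.leakage
                added = opt.delay - cur.delay
                if saved <= 0 or path_delay + added > t_max:
                    continue  # no leakage gain, or the move would violate timing
                sens = saved / max(added, 1e-9)  # guard against zero or negative added delay
                if best is None or sens > best[0]:
                    best = (sens, g, k)
        if best is None:
            return choice
        _, g, k = best
        path_delay += options[g][k].delay - options[g][choice[g]].delay
        choice[g] = k  # every accepted move strictly lowers leakage, so the loop ends


if __name__ == "__main__":
    in01_x1 = [Option(11.7, 1.0), Option(10.7, 4.0), Option(9.1, 16.0)]  # s, m, f
    print(greedy_downsize({"g1": in01_x1, "g2": in01_x1}, t_max=21.0))
    # prints {'g1': 0, 'g2': 2}
```

The sketch shows why these methods are labelled "local optimal" in the comparison. Each step commits to the single best-looking move and never revisits it.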
Discrete Methods (cont.)
• Lagrangian Relaxation (LR)
  • Convert a constrained problem into an unconstrained one
  • Lagrange multipliers
• Linear Programming (LP)
  • Gate selection is implemented by assigning a value to a binary variable: 1 if chosen, 0 otherwise
• Simulated Annealing (SA)
  • Probabilistic method for finding a good approximation to the global optimum
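For reference, the formulas below show two things. The first is the 0-1 selection used by LP-based discrete methods. The second is how LR moves the timing constraints into the objective. The notation is assumed rather than taken from the slides:
• x_{i,k} = 1 if gate i uses option k
• P_{i,k} is that option's leakage
• a_j(x) is the arrival time at timing endpoint j
• T_req is the required time at that endpoint

```latex
% 0-1 option selection (LP/ILP view)
\min_{x}\; \sum_{i}\sum_{k} P_{i,k}\, x_{i,k}
\quad \text{s.t.} \quad \sum_{k} x_{i,k} = 1 \;\; \forall i,
\qquad x_{i,k} \in \{0,1\},
\qquad a_j(x) \le T_{req} \;\; \forall j

% Lagrangian relaxation of the timing constraints
\mathcal{L}(x,\lambda) = \sum_{i}\sum_{k} P_{i,k}\, x_{i,k}
  + \sum_{j} \lambda_j \bigl(a_j(x) - T_{req}\bigr), \qquad \lambda_j \ge 0
```

For fixed multipliers λ, the relaxed problem has no hard timing constraints. The multipliers are then adjusted between iterations.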
Related Work Comparison
Continuous sizing
• LP, GP. Pros: fast. Cons: modeling error, mapping issue
Discrete sizing
• Sensitivity-based. Cons: local optimum
• Slack and delay budgeting. Cons: ignores delay interaction
• LP, DP. Cons: solution-space explosion
• LR. Pros: large scale. Cons: solution oscillation
• SA. Pros: global optimum, fast solution-space exploration. Cons: approximation
Motivational Example
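Figure: three oa cells (u1, u2, u3) connected through nets n1 to n4, with a 50 ps timing constraint.

Solution     u1    u2    u3    Timing violation   Total leakage power
Solution 1   s10   s06   s04   -2.32              26
Solution 2   s10   s06   f04   0                  86
Solution 3   s10   s06   m04   0                  38

Solutions 2 and 3 both meet timing. Solution 3 does so with less than half the leakage power of Solution 2 (38 vs. 86).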
Problem Formulation
• Inputs:
  • Standard cell library
  • Gate-level netlist
  • Timing constraints
  • Interconnect parasitics
• Outputs:
  • The size and threshold-voltage selection of each cell
• Objective:
  • Satisfy all performance constraints
  • Minimize total leakage power
Performance constraints
• Slack violation:
  • Negative slack exists at a primary output (PO) or DFF input.
• Slew (transition time) violation:
  • At a PO or cell input pin, the transition time exceeds the maximum transition-time limit.
• Max-load violation:
  • At a cell output pin, the total fan-out load exceeds the cell's maximum capacitance.
Problem Assumptions
• Interconnect parasitics are modeled as lumped capacitances.
• Sequential cells are not sized: there is only one option for sequential cells.
• Ideal clock network:
  • No clock buffers, zero skew, and the clock net has zero lumped capacitance.
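The sketch below shows the three performance-constraint checks under the lumped-capacitance assumption. It is a minimal illustration, not a real tool's data model. The function names, dictionary layouts, and pin names such as `u1/Y` are hypothetical. A driver's load is taken as the lumped wire capacitance of its net plus the input-pin capacitances of its fan-out.

```python
from typing import Dict, List


def slack_violations(endpoint_slack: Dict[str, float]) -> List[str]:
    """Endpoints (POs and DFF data inputs) with negative slack."""
    return [p for p, s in endpoint_slack.items() if s < 0]


def slew_violations(pin_slew: Dict[str, float], max_slew: Dict[str, float]) -> List[str]:
    """POs / cell input pins whose transition time exceeds their limit."""
    return [p for p, s in pin_slew.items() if s > max_slew[p]]


def load_violations(fanout: Dict[str, List[str]], pin_cap: Dict[str, float],
                    wire_cap: Dict[str, float], max_cap: Dict[str, float]) -> List[str]:
    """Cell output pins whose load exceeds their max capacitance.
    Load = lumped wire capacitance of the driven net + sum of fan-out pin capacitances."""
    bad = []
    for out, sinks in fanout.items():
        load = wire_cap.get(out, 0.0) + sum(pin_cap[s] for s in sinks)
        if load > max_cap[out]:
            bad.append(out)
    return bad


def is_feasible(endpoint_slack, pin_slew, max_slew, fanout, pin_cap, wire_cap, max_cap) -> bool:
    """A solution is feasible only if all three checks report no violations."""
    return not (slack_violations(endpoint_slack)
                or slew_violations(pin_slew, max_slew)
                or load_violations(fanout, pin_cap, wire_cap, max_cap))
```

For example, take a driver `u1/Y` with fan-out pins of 12.8 fF and 14.4 fF and a 5 fF lumped wire. Its load is 32.2 fF, so a 30 fF max-capacitance limit is flagged as a max-load violation.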
Proposed Methodology
• Phase I: Iterative algorithm for the initial solution
  • Produces an initial solution that satisfies the timing requirement
• Phase II: Simulated-annealing-based algorithm
  • Leakage power minimization
Phase I: Pseudo Code
Iterative Algorithm: upsize cells to obtain a feasible solution
Inputs: netlist, cell library, timing constraints, and interconnect parasitics
Outputs: each cell's size and threshold-voltage assignment
Step 1: Count how many times each cell is visited by negative-slack paths
Step 2: Sort cells by their counters
Step 3: Upsize iteratively in the order defined above
Phase I: Pseudo Code (Step 1)
Step 1: Count how many times each cell is visited by negative-slack paths
    Run the timing engine to calculate each cell's slack;
    Initialize each cell's counter to zero;
    Initialize each cell to its smallest type-size;
    foreach (negative-slack path)
        foreach (cell in the selected path)
            if (selected cell has negative slack)
                Increase the selected cell's counter;
Phase I: Pseudo Code (Step 2 & 3)
Step 2: Sort cells by counter
    Sort cells by their counters, from largest to smallest;
Step 3: Upsize iteratively in the order defined above
    do
        foreach (cell in the order defined above)
            if (selected cell has negative slack)
                while (selected cell has a larger type-size)
                    if (new Pleakage < old Pleakage)
                        Update type-size;
    until (no negative slack)
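The three Phase I steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions:
• The `timer` callback is a hypothetical stand-in for the timing engine. It returns per-cell slack and the list of negative-slack paths.
• Each cell's options are indexed from lowest to highest leakage, so option 0 is the smallest type-size.
• Step 3 is read as "move a negative-slack cell one option up, then re-time". The exact selection rule inside the slide's `while` loop may differ.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Slack = Dict[str, float]
Timer = Callable[[Dict[str, int]], Tuple[Slack, List[List[str]]]]


def count_visits(neg_paths: List[List[str]], slack: Slack) -> Counter:
    """Step 1: how often each negative-slack cell lies on a negative-slack path."""
    counter: Counter = Counter()
    for path in neg_paths:
        for cell in path:
            if slack[cell] < 0:
                counter[cell] += 1
    return counter


def upsize_order(counter: Counter) -> List[str]:
    """Step 2: most-visited cells first (ties broken by name for determinism)."""
    return sorted(counter, key=lambda c: (-counter[c], c))


def phase1(n_options: Dict[str, int], timer: Timer, max_passes: int = 100) -> Dict[str, int]:
    """Step 3: upsize negative-slack cells one option at a time, re-timing after
    every change, until no negative slack remains or nothing can be upsized."""
    choice = {c: 0 for c in n_options}           # start from the smallest type-size
    slack, neg_paths = timer(choice)
    order = upsize_order(count_visits(neg_paths, slack))
    for _ in range(max_passes):
        if not neg_paths:
            break
        changed = False
        for c in order:
            if slack[c] < 0 and choice[c] + 1 < n_options[c]:
                choice[c] += 1
                slack, neg_paths = timer(choice)  # an incremental timer would be used here
                changed = True
        if not changed:
            break                                  # no larger option left to try
    return choice


if __name__ == "__main__":
    DELAYS = {"u1": [30.0, 20.0, 10.0], "u2": [25.0, 15.0, 10.0]}  # hypothetical, ps

    def toy_timer(choice: Dict[str, int]) -> Tuple[Slack, List[List[str]]]:
        """One path u1 -> u2 with a 40 ps requirement; each cell gets the path slack."""
        s = 40.0 - sum(DELAYS[c][k] for c, k in choice.items())
        return {c: s for c in choice}, ([["u1", "u2"]] if s < 0 else [])

    print(phase1({"u1": 3, "u2": 3}, toy_timer))  # prints {'u1': 1, 'u2': 1}
```

The toy shows the two roles of the ordering. Cells that appear on more failing paths are upsized first, and re-timing after each change stops the loop as soon as every path meets timing.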
Phase II: Simulated-Annealing-Based
1. Solution representation: the size and Vt type of every cell.
2. Solution perturbation: randomly pick a cell and change its size and threshold-voltage assignment.
3. Cost function: total leakage power.
4. Annealing schedule: described in the following slides.
Phase II: SA — Temperature check
IF T > ε THEN NEXT_ITER
ELSE FINISHED

Flowchart: START → initialization → T > ε? (No: FINISHED) → find new solution → accept? (Yes: update current solution) → update T? (Yes: update temperature T) → back to the T > ε check
Phase II: SA — New solution
1. Randomly pick a cell
2. Randomly pick a new type and size
3. Call the timer and recalculate the cost
Phase II: SA — Solution acceptance
IF C_new < C_last
    IF C_new < C_best THEN state = UPD
    ELSE state = NEW
ELSE IF Accept. Prob. > Random THEN state = ACP
ELSE state = REJ

$$\text{Accept. Prob.} = \exp\!\left(-\frac{(C_{new} - C_{old})/C_{old}}{K \cdot T}\right), \qquad \text{Random} \in [0, 1]$$
Phase II: SA — Solution update
IF state = UPD or NEW or ACP THEN S_last = S_new
ELSE S_last = S_last (unchanged)
Phase II: SA — Temperature update
IF γ > φ THEN DROP_TEMP
ELSE NEXT_ITER
where γ counts successive "Reject" states and φ is a constant.
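The four SA steps fit together as in the minimal Python sketch below. It is not the author's implementation, and several choices in it are assumptions:
• `leakage` and `feasible` are hypothetical callbacks standing in for the cost function and the timer.
• Moves that violate timing are rejected outright.
• The perturbation always picks a different option for the chosen cell.
• `alpha` (the cooling factor), `K`, `phi`, and the `max_iters` guard are assumed parameters.

```python
import math
import random
from typing import Callable, Dict, Tuple

Solution = Dict[str, int]  # cell -> option index (one size/Vt combination)


def anneal(init: Solution, n_options: Dict[str, int],
           leakage: Callable[[Solution], float], feasible: Callable[[Solution], bool],
           T0: float = 1.0, eps: float = 1e-3, alpha: float = 0.9,
           K: float = 0.1, phi: int = 20, max_iters: int = 1_000_000,
           seed: int = 0) -> Tuple[Solution, float]:
    rng = random.Random(seed)
    cells = [c for c in init if n_options[c] > 1]   # cells that can actually change
    s_last, c_last = dict(init), leakage(init)      # init: a feasible Phase I solution
    s_best, c_best = dict(init), c_last
    T, gamma, it = T0, 0, 0
    while T > eps and it < max_iters:               # temperature check
        it += 1
        cell = rng.choice(cells)                    # 1. randomly pick a cell
        k = rng.randrange(n_options[cell] - 1)      # 2. pick a different type/size
        if k >= s_last[cell]:
            k += 1
        s_new = dict(s_last)
        s_new[cell] = k
        if not feasible(s_new):                     # 3. call the timer (violations rejected)
            state = "REJ"
        else:
            c_new = leakage(s_new)                  #    and recalculate the cost
            if c_new < c_last:                      # solution acceptance
                state = "UPD" if c_new < c_best else "NEW"
            else:
                prob = math.exp(-((c_new - c_last) / c_last) / (K * T))
                state = "ACP" if prob > rng.random() else "REJ"
        if state == "REJ":
            gamma += 1                              # count successive rejects
        else:                                       # solution update: S_last = S_new
            s_last, c_last, gamma = s_new, c_new, 0
            if state == "UPD":
                s_best, c_best = dict(s_new), c_new
        if gamma > phi:                             # temperature update: DROP_TEMP
            T *= alpha
            gamma = 0
    return s_best, c_best


if __name__ == "__main__":
    LEAK = [1.0, 4.0, 16.0]      # s/m/f leakage of a size-1 in01 cell (library slide)
    DELAY = [30.0, 20.0, 10.0]   # hypothetical per-option delays (ps) on one path

    def total_leakage(s: Solution) -> float:
        return sum(LEAK[k] for k in s.values())

    def meets_timing(s: Solution) -> bool:
        return sum(DELAY[k] for k in s.values()) <= 60.0

    start = {"u1": 2, "u2": 2, "u3": 2}          # all-fast, feasible start
    print(anneal(start, {c: 3 for c in start}, total_leakage, meets_timing))
```

Because the temperature only drops after φ successive rejects, the time spent at each temperature grows with the acceptance rate. The `max_iters` guard is a safety net for that behaviour and is not part of the slides' schedule.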
Experimental Results
• Experimental setting
  • Standard library
  • Timing engine
  • Acceptance probability
  • Benchmark
• The trend of leakage power minimization
• Cost comparison
Standard Library
• Cell library in Synopsys Liberty format
• Combinational cells:
  • 11 footprints: in01, na02, na03, na04, no02, no03, no04, ao12, ao22, oa12, and oa22
  • Each cell has 30 options: 3 threshold-voltage types × 10 gate sizes
• Sequential cells:
  • 1 footprint: ms08
Power, Capacitance, & Delay LUTs
Footprint: in01 (Vt types s, m, f)

Gate   Leakage Power (μW)   Capacitance (fF)       Delay Time (ps)
size   s    m    f          s      m      f        s     m     f
1      1    4    16         12.8   14.4   16       11.7  10.7  9.1
3      3    12   48         38.4   43.2   48       8.2   7.2   6.5
4      4    16   64         51.2   57.6   64       6.5   5.7   5.2
6      6    24   96         76.8   86.4   96       6.5   5.7   5.2
8      8    32   128        102.4  115.2  128      6.5   5.7   5.2
Delay Time Look-Up Table
• Delay time = f(input slew, output load)
• 2D linear interpolation

Slew (ps) \ Load (fF)   5      10     15     20     25     30
0                       6.5    7.6    8.8    10.0   11.1   12.3
1                       7.8    9.0    10.2   11.4   12.6   13.8
2                       9.1    10.3   11.5   12.8   14.0   15.2
3                       10.4   11.7   12.9   14.2   15.5   16.7
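Below is a minimal Python sketch of the 2D linear (bilinear) interpolation on the delay table above, with the slew and load axes taken as printed. The slides do not say how points outside the table are handled. This sketch assumes linear extrapolation from the nearest edge segment. The names `lookup_delay` and `_bracket` are illustrative.

```python
from bisect import bisect_right
from typing import List

SLEWS = [0.0, 1.0, 2.0, 3.0]                   # ps, row axis as printed
LOADS = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]    # fF, column axis
DELAY = [
    [6.5, 7.6, 8.8, 10.0, 11.1, 12.3],
    [7.8, 9.0, 10.2, 11.4, 12.6, 13.8],
    [9.1, 10.3, 11.5, 12.8, 14.0, 15.2],
    [10.4, 11.7, 12.9, 14.2, 15.5, 16.7],
]


def _bracket(axis: List[float], x: float) -> int:
    """Index i of the segment [axis[i], axis[i+1]] used for x (edge segments outside the range)."""
    i = bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)


def lookup_delay(slew: float, load: float, slews: List[float] = SLEWS,
                 loads: List[float] = LOADS, table: List[List[float]] = DELAY) -> float:
    i, j = _bracket(slews, slew), _bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])   # fractional position along slew
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])   # fractional position along load
    # Interpolate along the load axis on the two bracketing rows, then along slew.
    d0 = table[i][j] + tl * (table[i][j + 1] - table[i][j])
    d1 = table[i + 1][j] + tl * (table[i + 1][j + 1] - table[i + 1][j])
    return d0 + ts * (d1 - d0)
```

For example, `lookup_delay(0.5, 7.5)` lies midway between the four corner entries 6.5, 7.6, 7.8, and 9.0. It therefore returns their average, 7.725 ps.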
Timing Engine
Runtime (seconds/iteration)

Design         PrimeTime®   Full functional timer   Incremental update + full functional timer
DMA            10.00        1.50                    0.00087
pci_bridge32   11.00        1.90                    0.00096
des_perf       33.00        4.60                    0.00067
vga_lcd        44.00        6.80                    0.00208
b19            63.00        9.70                    0.00238
leon3mp        375.00       26.30                   0.01582
netcard        393.00       38.40                   0.06694
Acceptance Probability
Figure: acceptance probability (0 to 1) versus ΔC (0 to 1), with one curve for high K and one for low K.

$$\text{Accept. Prob.} = \exp\!\left(-\frac{(C_{new} - C_{old})/C_{old}}{K \cdot T}\right)$$
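The small Python sketch below shows how K shapes the acceptance curve. It assumes ΔC on the plot is the relative cost increase (C_new − C_old)/C_old. The K values 0.5 ("high") and 0.1 ("low") are illustrative, not taken from the slides.

```python
import math


def accept_prob(delta_c: float, K: float, T: float) -> float:
    """exp(-delta_c / (K * T)), with delta_c = (C_new - C_old) / C_old >= 0."""
    return math.exp(-delta_c / (K * T))


if __name__ == "__main__":
    T = 1.0
    for label, K in (("high K", 0.5), ("low K", 0.1)):         # illustrative K values
        curve = [accept_prob(d / 10, K, T) for d in range(11)]  # delta_c = 0.0 ... 1.0
        print(label, [round(p, 3) for p in curve])
```

With the larger K, a 50% cost increase is still accepted with probability e^(-1) ≈ 0.37 at T = 1. With the smaller K, the probability is only e^(-5) ≈ 0.007. A higher K therefore lets the search escape local minima more easily at the same temperature.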
Benchmark
Design         # IO pins   # Comb. cells   # Seq. cells   # Total cells
DMA            959         23K             2K             25K
pci_bridge32   361         30K             3K             33K
des_perf       374         102K            9K             111K
vga_lcd        184         148K            17K            165K
b19            47          213K            7K             219K
leon3mp        333         540K            109K           649K
netcard        1,846       861K            98K            959K
The Trend of Leakage Power Minimization
Figure: leakage power (μW) versus iteration for DMA (x-axis in units of 18K iterations), pci_bridge32 (16K), des_perf (23K), and vga_lcd (15K).

The Trend of Leakage Power Minimization (cont.)
Figure: leakage power (μW) versus iteration for b19 (x-axis in units of 16K iterations), netcard (14K), and leon3mp (15K).
Cost Comparison
Figure: total leakage power (μW) of IR+SA, NTUgs, PowerValve, and eOPT on DMA, pci_bridge32, des_perf, and vga_lcd. The vga_lcd panel is annotated "↓ 73%".

Cost Comparison (cont.)
Figure: total leakage power (μW) of IR+SA, NTUgs, PowerValve, and eOPT on b19, netcard, and leon3mp.
Conclusion
• The iterative algorithm is necessary for initialization. Without it, the SA approach may not converge within a fixed runtime.
• Our approach reaches a feasible solution of the same order of magnitude as related works on all benchmarks.
• In some cases, our approach yields a better solution than previous work. It reduces leakage power by more than 70% relative to the initial solution in a short time.
Future Work
• A more realistic RC network model
• Leakage power minimization of sequential circuits
Q&A
Thank you!