Enhancing Performance of Iterative Heuristics for VLSI Netlist Partitioning. Dr. Sadiq M. Sait, Dr. Aiman El-Maleh, Mr. Raslan Al Abaji. Computer Engineering Department, King Fahd University of Petroleum & Minerals


1

Enhancing Performance of Iterative Heuristics

for VLSI Netlist Partitioning

Dr. Sadiq M. Sait

Dr. Aiman El-Maleh

Mr. Raslan Al Abaji.

Computer Engineering Department

King Fahd University of Petroleum & Minerals

2

• Introduction

• Problem Formulation

• Cost Functions

• PowerFM

• Experimental Results

• Conclusion

Outline

3

• Technology: 0.1 µm
• Transistors: 200 M
• Logic gates: 40 M
• Size: 520 mm²
• Clock: 2 - 3.5 GHz
• Chip I/Os: 4,000
• Wiring levels: 7 - 8
• Voltage: 0.9 - 1.2 V
• Power: 160 Watts
• Supply current: ~160 Amps

Performance, power consumption, noise immunity, area, cost, time-to-market

Tradeoffs!

The VLSI Chip in 2006

4

• Decomposition of a complex system into smaller subsystems

• Each subsystem can be designed independently, speeding up the design process (divide-and-conquer approach)

• Decompose a complex IC into a number of functional blocks, each designed by one engineer or a team of engineers

• The decomposition scheme has to minimize the interconnections between subsystems

Why do we need partitioning?

5

• System Level Partitioning: System → PCBs

• Board Level Partitioning: PCBs → Chips

• Chip Level Partitioning: Chips → Sub-circuits/Blocks

Levels of Partitioning

6

Partitioning Algorithms

Group Migration

1. Kernighan-Lin

2. Fiduccia-Mattheyses (FM)

3. Multilevel K-way Partitioning

Simulation-Based Iterative

1. Simulated annealing

2. Simulated evolution

3. Tabu Search

4. Genetic

Performance Driven

1. Lawler et al.

2. Vaishnav

3. Choi et al.

4. Jun'ichiro et al.

Others

1. Spectral

2. Multilevel Spectral

Classification of Partitioning Algorithms

7

Objective: Design a class of iterative algorithms for multi-objective VLSI partitioning that optimize power, delay, and cutset together.

Constraint: Partitions must be balanced to within a given tolerance (10%).

Problem formulation

8

• Based on hypergraph model H = (V, E)

• c(e) = 1 if hyperedge e spans more than one block, and 0 otherwise

• Cutset = sum of the hyperedge costs c(e)

cutset = 3 (in the slide's example figure)

Cutset
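The cutset definition above can be sketched in a few lines of Python. This is only an illustration: the netlist, the cell names, and the `block_of` mapping are hypothetical and are not taken from the slides.

```python
def cutset(hyperedges, block_of):
    """Count hyperedges whose cells lie in more than one block."""
    cut = 0
    for cells in hyperedges:
        blocks = {block_of[c] for c in cells}
        if len(blocks) > 1:  # c(e) = 1 when e spans more than one block
            cut += 1
    return cut


# Hypothetical 6-cell netlist with 4 nets (not the slide's figure).
nets = [("a", "b"), ("b", "c", "d"), ("d", "e"), ("c", "f")]
block_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
example_cut = cutset(nets, block_of)  # nets (b, c, d) and (c, f) are cut
```

A multi-pin net counts once no matter how many of its pins sit in the other block, which is what distinguishes the hypergraph model from a plain graph model.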

9

Delay

• Gate delay: $d(v)$

• Constant inter-chip wire delay: $d_c$

• Path delay between nodes $v_i$ and $v_j$: $d(p_{ij})$

• Number of nodes cut along path $p_{ij}$: $ncut(p_{ij})$

• Objective: Minimize $d(p_{ij}) = \sum_{v_i \in p_{ij}} d(v_i) + d_c \times ncut(p_{ij})$
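As a rough illustration of the delay model, the sketch below adds up the gate delays along one path and charges the constant inter-chip delay once per cut. It assumes that ncut counts the consecutive gate pairs on the path that sit in different blocks; that reading, the function name, and all numbers are assumptions made for the example.

```python
def path_delay(path, gate_delay, block_of, d_c):
    """Sum of gate delays on the path plus d_c for every cut along it."""
    total_gate = sum(gate_delay[v] for v in path)
    # Assumption: a cut is a pair of consecutive gates in different blocks.
    ncut = sum(1 for u, v in zip(path, path[1:])
               if block_of[u] != block_of[v])
    return total_gate + d_c * ncut


# Hypothetical path of four gates with integer delays.
path = ["g1", "g2", "g3", "g4"]
gate_delay = {"g1": 1, "g2": 2, "g3": 1, "g4": 3}
block_of = {"g1": 0, "g2": 0, "g3": 1, "g4": 0}
example_delay = path_delay(path, gate_delay, block_of, d_c=5)
```

Here the gate delays sum to 7 and the path crosses the partition boundary twice, so the inter-chip term contributes 10.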

10

The average dynamic power consumed by a CMOS logic gate in a synchronous circuit is given by:

$P_i^{average} = 0.5 \times \frac{V_{dd}^2}{T_{cycle}} \times C_i^{Load} \times N_i$

$N_i$: the number of output gate transitions per cycle (switching probability)

$C_i^{Load}$: the load capacitance

Power
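The per-gate expression (half the squared supply voltage over the cycle time, multiplied by the load capacitance and the switching probability) can be checked numerically. The function name and the operating point below are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def average_dynamic_power(v_dd, t_cycle, c_load, n_i):
    """Average dynamic power of one CMOS gate, in watts."""
    return 0.5 * (v_dd ** 2 / t_cycle) * c_load * n_i


# Hypothetical operating point: 1.0 V supply, 1 ns cycle,
# 10 fF load, switching probability 0.2.
p_gate = average_dynamic_power(v_dd=1.0, t_cycle=1e-9, c_load=10e-15, n_i=0.2)
```

With these numbers the gate dissipates about 1 µW. The expression is linear in the load capacitance, which is why extra off-chip capacitance on cut nets matters for power.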

11

$C_i^{Load} = C_i^{basic} + C_i^{extra}$

$C_i^{basic}$: load capacitances driven by a cell before partitioning

$C_i^{extra}$: additional load due to off-chip capacitance (cut net)

Total power dissipation of a circuit:

$P = \frac{V_{dd}^2}{2\,T_{cycle}} \sum_i \left(C_i^{basic} + C_i^{extra}\right) N_i$

Power

12

Objective: Minimize $\sum_{i \in v} N_i$

$C_i^{extra} \gg C_i^{basic}$

$C_i^{extra}$: can be assumed identical for all nets

$v$: set of visible gates, i.e. gates driving a load outside the partition

Power
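The reduced power objective, the sum of switching probabilities over visible gates, might be evaluated as follows. The `fanout` representation (driver gate mapped to its sink cells) and every value in the example are assumptions for illustration.

```python
def power_cost(fanout, block_of, switching):
    """Sum N_i over visible gates: drivers with a sink in another block."""
    cost = 0.0
    for driver, sinks in fanout.items():
        if any(block_of[s] != block_of[driver] for s in sinks):
            cost += switching[driver]
    return cost


# Hypothetical circuit: each driver gate maps to the cells it drives.
fanout = {"a": ["c"], "b": ["c", "d"], "c": ["e"], "d": ["e"]}
block_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
switching = {"a": 0.2, "b": 0.1, "c": 0.4, "d": 0.3}
example_power = power_cost(fanout, block_of, switching)  # gates b and c are visible
```

Only gates b and c drive a load in the other block, so the cost is 0.1 + 0.4. Gates a and d contribute nothing, however often they switch.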

13

The balance constraint is expressed in terms of Cells(Block1) and Cells(Block2). [Constraint formula not recoverable from the extracted slide.]

However, balance as a hard constraint is not appealing because it may prohibit many good moves. Balance is therefore treated as an objective to minimize:

Objective: |Cells(Block1) − Cells(Block2)|

Balance

14

• A good partitioning can be described by the following fuzzy rule:

IF a solution has small cutset AND low power AND short delay AND good balance

THEN it is a good solution.

Fuzzy Cost Function

15

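The above rule is translated to an AND-like OWA operator:

$\mu(x) = \beta \times \min(\mu_C, \mu_P, \mu_D, \mu_B) + (1 - \beta) \times \frac{1}{4} \sum_{j = C, P, D, B} \mu_j$

$\mu(x)$ represents the total fuzzy fitness of the solution; the aim is to maximize this fitness. $\beta$ is a constant in [0, 1].

$\mu_C, \mu_P, \mu_D, \mu_B$ are the cutset, power, delay, and balance fitness values, respectively.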

Fuzzy cost function
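A minimal sketch of an AND-like ordered weighted averaging (OWA) aggregation in its usual form: a weighted blend of the minimum and the arithmetic mean of the individual memberships. The parameter name `beta`, its value, and the membership values are assumptions for illustration; the slides do not state the constant that was used.

```python
def owa_fitness(memberships, beta):
    """AND-like OWA: beta * min + (1 - beta) * mean of the memberships."""
    values = list(memberships)
    return beta * min(values) + (1.0 - beta) * sum(values) / len(values)


# Hypothetical memberships for cutset, power, delay, and balance.
mu = {"C": 0.8, "P": 0.6, "D": 0.9, "B": 1.0}
example_fitness = owa_fitness(mu.values(), beta=0.7)
```

With `beta = 1` the operator reduces to a pure minimum (strict AND), and with `beta = 0` it becomes a plain average. Intermediate values let one poor objective pull the fitness down without ignoring the others.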

16

$O_i$ and $C_i$ are the lower bound and the actual cost of objective $i$.

$\mu_i(x)$ is the membership of solution $x$ in the set "good $i$".

$g_i$ is the relative acceptance limit for each objective.

Membership functions
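The shape of the membership functions is not preserved in the text of this slide. One plausible form, assumed here purely for illustration, is linear in the ratio of actual cost to lower bound: membership 1 when the cost reaches the lower bound, falling to 0 when the ratio reaches the acceptance limit. The function name and the numbers are hypothetical.

```python
def membership(cost, lower_bound, g):
    """Assumed linear membership in the fuzzy set "good i".

    Returns 1.0 when cost/lower_bound <= 1, 0.0 when the ratio
    reaches the acceptance limit g (with g > 1), and a linear
    interpolation in between.
    """
    ratio = cost / lower_bound
    if ratio <= 1.0:
        return 1.0
    if ratio >= g:
        return 0.0
    return (g - ratio) / (g - 1.0)


# Hypothetical objective: lower bound 100, actual cost 150, limit g = 2.
example_membership = membership(cost=150, lower_bound=100, g=2.0)
```

Under this assumption a solution whose cost is 1.5 times the lower bound, with an acceptance limit of 2, has membership 0.5.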

Start with a balanced partition P = {X, Y}.

Repeat

    For i = 1 to n:

        Choose a free cell b ∈ X ∪ Y such that moving b to the other side gives the highest power gain, Pgain(b), and moving b preserves balance in P.

        Move and lock b.

        Let g_i = Pgain(b).

    Find k such that G = g_1 + g_2 + ... + g_k is maximized, and move the first k cells to their complement partitions.

Until G = 0

PowerFM Algorithm
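The pass structure above can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: gains are recomputed by brute force instead of with FM's incremental bucket structure, the power cost is taken to be the total switching weight of cut nets, and balance is enforced through a hypothetical `max_imbalance` parameter (maximum difference in cell count between the two sides). The netlist and weights are invented.

```python
def cut_weight(nets, weights, side):
    """Total switching weight of nets with cells on both sides."""
    return sum(w for cells, w in zip(nets, weights)
               if len({side[c] for c in cells}) > 1)


def power_fm_pass(nets, weights, side, max_imbalance):
    """Move every cell at most once, then keep the best prefix of moves."""
    side = dict(side)
    moves, gains, locked = [], [], set()
    for _ in range(len(side)):
        base = cut_weight(nets, weights, side)
        best = None
        for cell in side:
            if cell in locked:
                continue
            side[cell] ^= 1  # tentative move to the other side
            size0 = sum(1 for s in side.values() if s == 0)
            balanced = abs(2 * size0 - len(side)) <= max_imbalance
            gain = base - cut_weight(nets, weights, side)
            side[cell] ^= 1  # undo the tentative move
            if balanced and (best is None or gain > best[0]):
                best = (gain, cell)
        if best is None:  # no free cell can move without breaking balance
            break
        gain, cell = best
        side[cell] ^= 1
        locked.add(cell)
        moves.append(cell)
        gains.append(gain)
    # Find k maximizing G = g1 + ... + gk (k = 0 keeps no move at all).
    best_k, best_sum, running = 0, 0.0, 0.0
    for k, g in enumerate(gains, start=1):
        running += g
        if running > best_sum + 1e-12:
            best_k, best_sum = k, running
    for cell in moves[best_k:]:  # roll back moves beyond the best prefix
        side[cell] ^= 1
    return side, best_sum


def power_fm(nets, weights, side, max_imbalance=2):
    """Repeat passes until a pass yields no positive total gain G."""
    while True:
        side, total_gain = power_fm_pass(nets, weights, side, max_imbalance)
        if total_gain <= 0:
            return side


# Hypothetical netlist: two tightly coupled clusters {a, b, d} and {c, e, f}.
nets = [("a", "b"), ("a", "d"), ("b", "d"), ("c", "e"),
        ("c", "f"), ("e", "f"), ("b", "c")]
weights = [0.3, 0.4, 0.2, 0.4, 0.3, 0.2, 0.1]
start = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
result = power_fm(nets, weights, start)
```

Starting from {a, b, c} | {d, e, f}, the first pass moves c and d across and keeps only those two moves, leaving net (b, c) as the single cut net. The next pass finds no positive partial sum, so the loop stops.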

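[Figure: a six-cell example (cells a to f). Starting from the partition {a, b, c} | {d, e, f}, cells are moved and locked one at a time, beginning with b; the first four moves have gains g1, g2, g3, and g4.]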

An Example

19

[Figure, continued: the last two moves have gains g5 and g6, after which all cells are locked.]

If G = g1 + g2 + g3 + g4 is the largest partial sum, the final partition after this pass is:

{c, d, e} | {a, f, b}

An Example

20

Power Gain Calculation

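$Pgain(i) = \sum_{j=1,\, j \in X_i}^{K} S_j - \sum_{j=1,\, j \in U_i}^{K} S_j$

[Figure: seven cells (1 to 7) split between Partition 1 and Partition 2, with nets labeled with values from 0.1 to 0.4.]

$Pgain(7) = 0 - (0.3 + 0.4) = -0.7$

$Pgain(1) = 0.1 - 0 = 0.1$

$X_i$: the set of cut critical nets.

$U_i$: the set of uncut critical nets.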
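The gain rule can be sketched as a small function: nets that become uncut when the cell moves add their value, and nets that become cut subtract it. The per-net values are assumed here to be switching probabilities. The topology below is hypothetical, chosen only so that cells 7 and 1 reproduce the two worked values on the slide; the slide's actual circuit is not recoverable from the text.

```python
def power_gain(cell, nets, switching, side):
    """Gain of moving `cell`: cut critical nets minus uncut critical nets."""
    gain = 0.0
    for net_cells, s in zip(nets, switching):
        if cell not in net_cells:
            continue
        same = sum(1 for c in net_cells if side[c] == side[cell])
        other = len(net_cells) - same
        if other > 0 and same == 1:
            gain += s  # cut net that becomes uncut (member of X_i)
        elif other == 0 and len(net_cells) > 1:
            gain -= s  # uncut net that becomes cut (member of U_i)
    return gain


# Hypothetical topology: cells 1-3 in partition 0, cells 4-7 in partition 1.
nets = [(1, 4), (2, 3), (3, 5), (5, 7), (6, 7)]
switching = [0.1, 0.2, 0.2, 0.3, 0.4]
side = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}
gain_7 = power_gain(7, nets, switching, side)  # two uncut nets would be cut
gain_1 = power_gain(1, nets, switching, side)  # one cut net would be uncut
```

Cell 7 sits on two uncut nets (0.3 and 0.4), so moving it costs 0.7. Cell 1 is the only cell on its side of a cut net with value 0.1, so moving it gains 0.1.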

21

Experimental Results

ISCAS 85-89 Benchmark Circuits

22

PowerFM vs. SimE for Power

For larger circuits, performance degrades.

23

GA from PowerFM vs Random Start

(D = delay, C = cutset, P = power)

            GA Random Start            GA Start From PowerFM
Circuit     D       C       P          D       C       P
S298        233     19      1013       191     10      921
S386        356     36      1529       345     31      1401
S641        1043    45      2355       861     43      2343
S832        444     45      3034       441     42      3032
S953        526     96      2916       465     89      3012
S1196       396     123     5443       390     86      4921
S1238       475     127     5713       461     91      5702
S1488       571     104     5648       541     83      5248
S1494       614     102     5474       601     97      5123
S2081       302     26      787        260     15      740
S3330       571     299     10358      435     203     9296
S5378       587     573     18437      442     423     15356
S9234       1313    1090    38149      856     375     28305
s13207      1399    1683    45611      951     750     39620
s15850      1820    2183    51747      1350    851     43680

24

TS from PowerFM vs Random Start

(D = delay, C = cutset, P = power)

            TS Random Start            TS Start From PowerFM
Circuit     D       C       P          D       C       P
S298        197     24      926        189     10      849
S386        386     30      1426       333     27      1264
S641        889     59      2281       844     48      2476
S832        446     50      2731       431     40      3135
S953        466     99      2518       430     85      2999
S1196       301     106     4920       335     77      4823
S1238       408     79      4597       401     74      5190
S1488       528     98      5529       521     94      6005
S1494       585     101     5339       534     95      5058
S2081       225     17      770        244     12      704
S3330       533     295     10298      419     257     9288
S5378       590     430     16527      432     400     15319
S9234       1052    918     34055      835     705     31837
s13207      843     1332    41114      823     1310    40235
s15850      1411    1671    47480      1210    1332    45320

25

Conclusion

• Proposed a modification to the FM algorithm, PowerFM, targeting low power.

• PowerFM results are comparable to SimE but with a faster runtime.

• Investigated the use of PowerFM solutions as the starting point for two iterative algorithms, GA and TS.

• GA performed significantly better when starting from PowerFM.