Implementation and Extensibility of an Analytic Placer

Andrew B. Kahng and Qinke Wang

UCSD CSE Department, {abk, qiwang}@cs.ucsd.edu

Work partially supported by Cadence Design Systems, Inc., the California MICRO program, the MARCO Gigascale Silicon Research Center, NSF MIP-9901174 and the Semiconductor Research Corporation.

Motivation

• Automated placement: always critical
– new challenges: larger design sizes, shorter turnaround times, a variety of additional physical and geometrical constraints, etc.
• New analytical methods: simultaneously spread cells and optimize wirelength
– force-directed placement [Eisenmann et al. 98]
– cell attracting and repelling (ARP) [Etawil et al. 99]
• Problem: wirelength is easily degraded by improper forces and attractors

Our Contribution

• A novel objective function for spreading cells was proposed recently [Naylor et al. 01]
• We implement an analytic placer (APlace) based on this idea
– study characteristics of the objective function
– extend the objective function with congestion information
– implement a top-down multi-level placer: its WL outperforms that of QPlace, Dragon and Capo
– extend the placer to perform I/O-core co-placement for area-array I/O designs
– extend the placer with constraint handling for mixed-signal designs

Outline

• Problem Formulation

• Implementation & Results

• Extensions

• Conclusion and Ongoing Work

Outline

• Problem Formulation
– cell spreading = density control
– wirelength minimization
• Implementation & Results
• Extensions
• Conclusion and Ongoing Work

Cell Spreading (I)

• Common strategy
– divide the placement area into grids
– equalize the total cell area in each grid
• Penalty of an uneven cell distribution
– not smooth or differentiable
– difficult to optimize
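To see why the grid-based penalty is hard to optimize, here is a minimal sketch. The helper names `bin_areas` and `bin_penalty` are hypothetical, and as a simplification each cell's whole area is charged to the grid bin containing its center (area-overlap variants are piecewise linear and likewise non-differentiable at bin boundaries).

```python
def bin_areas(cells, width, height, n):
    """Total cell area per bin of an n x n grid; cells are (x, y, area) tuples.
    Simplification: each cell's whole area goes to the bin containing its center."""
    areas = [[0.0] * n for _ in range(n)]
    for x, y, a in cells:
        i = min(int(x / width * n), n - 1)
        j = min(int(y / height * n), n - 1)
        areas[i][j] += a
    return areas


def bin_penalty(cells, width, height, n):
    """Sum over bins of the squared deviation from the average bin area."""
    areas = bin_areas(cells, width, height, n)
    target = sum(a for _, _, a in cells) / (n * n)
    return sum((b - target) ** 2 for row in areas for b in row)


# Two unit-area cells on a 4 x 4 grid of unit bins; move the first cell rightward.
for x0 in (0.9, 0.99, 1.1):
    print(x0, bin_penalty([(x0, 0.5, 1.0), (1.5, 0.5, 1.0)], 4.0, 4.0, 4))
```

The penalty stays at 1.75 while the first cell moves inside its bin (zero gradient), then jumps to 3.75 once it crosses into the second cell's bin, which is exactly what a gradient-based solver cannot exploit.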

Cell Spreading (II)

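• A bell-shaped cell potential function [Naylor et al. US Patent 2001]
• Cell c has potential(c, g) with respect to grid point g:

$$\mathrm{potential}(c, g) = C \cdot p(|CellX - GridX|) \cdot p(|CellY - GridY|)$$

– cell c at (CellX, CellY) has area A
– grid point g = (GridX, GridY)
– p(d): bell-shaped function

$$p(d) = \begin{cases} 1 - 2d^2/r^2, & 0 \le d \le r/2 \\ 2(r - d)^2/r^2, & r/2 \le d \le r \\ 0, & d > r \end{cases}$$

– r: the radius of cells' potential
– C: a proportionality factor, s.t. $\sum_g \mathrm{potential}(c, g) = A$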
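The potential function above can be sketched directly. This is an illustration only: the function names `bell` and `cell_potentials` are hypothetical, and the grid points are passed as an explicit list of (GridX, GridY) pairs. The two pieces of p(d) meet at d = r/2 with equal value (1/2) and equal slope (-2/r), and p(d) falls smoothly to 0 at d = r, which is what makes the potential differentiable.

```python
def bell(d, r):
    """Bell-shaped p(d): 1 - 2d^2/r^2 on [0, r/2], 2(r - d)^2/r^2 on [r/2, r], 0 beyond r."""
    d = abs(d)
    if d <= r / 2:
        return 1.0 - 2.0 * d * d / (r * r)
    if d <= r:
        return 2.0 * (r - d) ** 2 / (r * r)
    return 0.0


def cell_potentials(cell_x, cell_y, area, grid_points, r):
    """potential(c, g) = C * p(|CellX - GridX|) * p(|CellY - GridY|) for each grid point,
    with C chosen so that the cell's potentials sum to its area A."""
    raw = [bell(cell_x - gx, r) * bell(cell_y - gy, r) for gx, gy in grid_points]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("cell is farther than r from every grid point")
    c = area / total  # proportionality factor C
    return [c * v for v in raw]


grid = [(float(gx), float(gy)) for gx in range(10) for gy in range(10)]
pots = cell_potentials(4.3, 5.6, 3.0, grid, r=2.0)
print(round(sum(pots), 9))  # 3.0
```

Because C is recomputed per cell, every cell contributes exactly its area to the grid, no matter where it sits relative to the grid points.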

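Cell Spreading (III)

• Penalty function

$$\mathrm{DensityPenalty} = \sum_g \Big( \sum_c \mathrm{potential}(c, g) - \mathrm{ExpPotential}(g) \Big)^2$$

– ExpPotential(g): expected total potential at grid point g
– minimized with a conjugate gradient solver
– stop when the max movement of any cell between iterations is small
• Discrepancy(A)
– max ratio of actual total cell area to expected cell area over all windows with area A
– measures the evenness of the cell distribution
– disc = Discrepancy(1% area)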
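Combining the potential function with the penalty above gives a sketch like the following. It assumes a uniform target, ExpPotential(g) = total cell area / number of grid points; that choice, the helper name `density_penalty`, and the toy 4 x 4 grid are assumptions of this illustration.

```python
def bell(d, r):
    """Bell-shaped p(d) with radius r."""
    d = abs(d)
    if d <= r / 2:
        return 1.0 - 2.0 * d * d / (r * r)
    if d <= r:
        return 2.0 * (r - d) ** 2 / (r * r)
    return 0.0


def density_penalty(cells, grid_points, r):
    """Sum over grid points g of (Potential(g) - ExpPotential(g))^2.

    Potential(g) adds up the normalized potentials of all cells at g.
    Assumption: ExpPotential(g) is uniform, total cell area / #grid points."""
    potential = [0.0] * len(grid_points)
    for x, y, area in cells:
        raw = [bell(x - gx, r) * bell(y - gy, r) for gx, gy in grid_points]
        c = area / sum(raw)  # factor C; the cell must lie within r of some grid point
        for k, v in enumerate(raw):
            potential[k] += c * v
    expected = sum(a for _, _, a in cells) / len(grid_points)
    return sum((p - expected) ** 2 for p in potential)


grid = [(gx + 0.5, gy + 0.5) for gx in range(4) for gy in range(4)]
spread = [(gx + 0.5, gy + 0.5, 1.0) for gx in range(4) for gy in range(4)]
clumped = [(1.5 + 0.1 * k, 1.5, 1.0) for k in range(16)]
print(density_penalty(spread, grid, r=1.0), density_penalty(clumped, grid, r=1.0))
```

An even spread drives the penalty to zero, while the clumped row leaves most grid points empty and is penalized heavily; since every term is smooth in the cell coordinates, a conjugate gradient solver can follow its gradient.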

grids | r = 2: disc | r = 2: iter | r = 2: CPU (s) | r = 4: disc | r = 4: iter | r = 4: CPU (s)
10 | 1.656 | 50 | 56 | 1.890 | 183 | 399
20 | 1.159 | 29 | 30 | 2.799 | 113 | 237
30 | 1.121 | 46 | 47 | 1.162 | 46 | 108
40 | 1.125 | 83 | 81 | 1.089 | 42 | 92
50 | 1.079 | 235 | 206 | 1.128 | 54 | 116
60 | 1.063 | 758 | 695 | 1.111 | 56 | 123
70 | - | - | - | 1.077 | 83 | 190
80 | - | - | - | 1.083 | 111 | 235
90 | - | - | - | 1.083 | 167 | 343
100 | - | - | - | 1.081 | 273 | 577

EXPERIMENT: Cell distribution results with different numbers of grids and cell potential radii (r's) for the ibm01-easy circuit.
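The Discrepancy(A) metric used in this table can be sketched as below. Square windows, a finite set of window positions (`steps` strides per axis), and cells treated as points carrying their area at their centers are all simplifying assumptions of this sketch; the default `frac=0.01` corresponds to disc = Discrepancy(1% area).

```python
import math


def discrepancy(cells, width, height, frac=0.01, steps=50):
    """Max ratio of actual to expected cell area over square windows whose area
    is `frac` of the placement area. Cells are (x, y, area) points at their centers."""
    side = math.sqrt(frac * width * height)
    total = sum(a for _, _, a in cells)
    expected = total * frac  # cell area an evenly spread placement puts in one window
    worst = 0.0
    for i in range(steps + 1):
        x0 = (width - side) * i / steps
        for j in range(steps + 1):
            y0 = (height - side) * j / steps
            inside = sum(a for x, y, a in cells
                         if x0 <= x < x0 + side and y0 <= y < y0 + side)
            worst = max(worst, inside / expected)
    return worst


# 400 unit cells on a 20 x 20 region: evenly spread vs. packed into one quarter.
even = [(i + 0.5, j + 0.5, 1.0) for i in range(20) for j in range(20)]
packed = [(0.5 * i + 0.25, 0.5 * j + 0.25, 1.0) for i in range(20) for j in range(20)]
print(discrepancy(even, 20.0, 20.0, frac=0.04, steps=10),
      discrepancy(packed, 20.0, 20.0, frac=0.04, steps=10))
```

A perfectly even placement scores 1.0; packing all cells into one quarter of the region at four times the average density scores 4.0, so values near 1 (as in the table's finer-grid rows) indicate an even spread.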

Outline

• Problem Formulation
– cell spreading = density control
– wirelength minimization
• Implementation & Results
• Extensions
• Conclusion and Ongoing Work

Wirelength Formulation (I)

• Linear vs. quadratic objective functions
• An approximation of linear objectives should be
– precise
– continuously differentiable
• Previous works
– Gordian-L objective [Sigl et al. 91]
– α-order objective function [Lillis et al. 95]
– convex approximations of HPWL [Alpert et al. 98] [Baldick et al. 99] [Kennings and Markov 00]

Wirelength Formulation (II)

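• Approximation of HPWL [Naylor et al. 01]
– log-sum-exp formula: picks the most dominant terms among pin coordinates

$$\mathrm{WL} = \alpha \sum_{e} \Big( \ln \sum_{v \in e} \exp(x_v/\alpha) + \ln \sum_{v \in e} \exp(-x_v/\alpha) + \ln \sum_{v \in e} \exp(y_v/\alpha) + \ln \sum_{v \in e} \exp(-y_v/\alpha) \Big)$$

– e: a net; (x_v, y_v): coordinates of pin v of net e
– α: smoothing parameter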
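The log-sum-exp formula can be checked on a single net with the short sketch below (the names `smooth_max`, `lse_net_wirelength` and `hpwl` are illustrative). Each term α·ln Σ exp(v/α) is a smooth maximum that exceeds the true maximum by at most α·ln(n) for n pins, so the approximation overestimates HPWL by at most 4α·ln(n), approaches HPWL as α goes to 0, and grows with α.

```python
import math


def smooth_max(values, alpha):
    """alpha * ln(sum(exp(v / alpha))), computed stably by factoring out max(values)."""
    m = max(values)
    return m + alpha * math.log(sum(math.exp((v - m) / alpha) for v in values))


def lse_net_wirelength(pins, alpha):
    """Log-sum-exp approximation of one net's HPWL; pins are (x, y) tuples."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (smooth_max(xs, alpha) + smooth_max([-x for x in xs], alpha)
            + smooth_max(ys, alpha) + smooth_max([-y for y in ys], alpha))


def hpwl(pins):
    """Exact half-perimeter wirelength of one net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


net = [(0.0, 0.0), (10.0, 2.0), (4.0, 7.0)]
print(hpwl(net))  # 17.0
for alpha in (0.1, 1.0, 10.0):
    print(alpha, lse_net_wirelength(net, alpha))
```

Small α tracks HPWL closely but makes the function nearly as sharp as max/min; large α gives a smoother landscape at the cost of accuracy, which is the trade-off measured in the next experiment.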

Wirelength Formulation (III)

• Experiments
– initial HPWL = 7.311
– 300 iterations
– smaller α: wirelength formulation more accurate
– larger α: WL minimized more quickly, and smaller final HPWL

EXPERIMENT: Wirelength minimization results with different smoothing parameters (α's) for the ibm01-easy circuit.

grids | α | init WL | final HPWL
10 | 3336 | 7.533 | 0.803
20 | 1668 | 7.369 | 0.836
30 | 1112 | 7.337 | 0.913
40 | 834 | 7.326 | 1.274
50 | 667 | 7.321 | 1.400
60 | 556 | 7.318 | 1.499
70 | 476 | 7.316 | 1.583
80 | 417 | 7.315 | 1.653
90 | 370 | 7.314 | 1.712
100 | 333.6 | 7.314 | 1.764

Outline

• Problem Formulation
• Implementation & Results
– Conjugate gradient optimizer
– Control factors
– Top-down hierarchical algorithm
– Placement results
• Extensions
• Conclusion and Ongoing Work

Conjugate Gradient Optimizer

• A series of line minimizations
– one-dimensional function minimization along some search direction

$$x_{k+1} = x_k + s_k d_k, \qquad d_k = -g_k + \beta_k d_{k-1}$$

• g_k: the gradient ∇f(x_k)
• d_k: the search direction
• s_k: a step length obtained by a Golden Section search algorithm
• β_k: ensures that d_k is the conjugate direction when the function is quadratic and the line search finds the exact minimum along the direction
– Polak-Ribiere formula:

$$\beta_k = \frac{g_k^{T}(g_k - g_{k-1})}{g_{k-1}^{T} g_{k-1}}$$
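The update rules above fit in a short pure-Python sketch of nonlinear conjugate gradient with a Polak-Ribiere β and a golden section line search. The caller supplies f and its gradient; the search bracket [0, max_step], the tolerances, and clipping β at zero (the common "PR+" restart safeguard) are assumptions of this sketch rather than details from the slides. The stopping test mirrors the slide's rule: stop when no coordinate moves appreciably.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618


def golden_section(phi, lo, hi, tol=1e-9):
    """Minimize a unimodal 1-D function phi on [lo, hi] by golden section search."""
    a, b = lo, hi
    c, d = b - INV_PHI * (b - a), a + INV_PHI * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:              # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = phi(c)
        else:                    # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = phi(d)
    return (a + b) / 2.0


def axpy(x, s, d):
    """Return x + s * d for plain Python lists."""
    return [xi + s * di for xi, di in zip(x, d)]


def conjugate_gradient(f, grad, x, max_step=1.0, max_iters=200, move_tol=1e-7):
    """x_{k+1} = x_k + s_k d_k, d_k = -g_k + beta_k d_{k-1} (Polak-Ribiere beta_k)."""
    g = grad(x)
    d = [-gi for gi in g]
    for _ in range(max_iters):
        s = golden_section(lambda t: f(axpy(x, t, d)), 0.0, max_step)  # step length s_k
        x_new = axpy(x, s, d)
        moved = max(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if moved < move_tol:     # max movement of any coordinate is small: stop
            break
        g_new = grad(x)
        gg = sum(gi * gi for gi in g)
        if gg == 0.0:
            break
        beta = sum(gn * (gn - go) for gn, go in zip(g_new, g)) / gg
        beta = max(beta, 0.0)    # PR+ safeguard (an assumption, not from the slides)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x


f = lambda v: (v[0] - 3.0) ** 2 + 10.0 * (v[1] + 1.0) ** 2
grad = lambda v: [2.0 * (v[0] - 3.0), 20.0 * (v[1] + 1.0)]
x = conjugate_gradient(f, grad, [0.0, 0.0])
print([round(v, 4) for v in x])  # [3.0, -1.0]
```

On this two-variable quadratic, CG with an accurate line search reaches the minimum in about two steps, the behavior the β_k definition is designed to guarantee.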


Control Factors

• Weights of wirelength and density objectives
– density weight: fixed
• larger: spreads the cells out hastily without a good wirelength
– wirelength weight
• larger: contracts cells together and prevents them from spreading out
• set to be large in the beginning
• divided by 2 when the solver slows down and a (locally) optimal solution is reached
• repeat until cells are spread evenly over the placement area
• #grids
– coarser grids at the beginning: spread out the cells faster
– finer grids at the final stages: a more even distribution
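The control schedule above can be sketched as an outer loop around the solver. Everything concrete here is hypothetical: the callbacks `solve(wl_weight, density_weight, grids)` and `is_spread(grids)`, the starting weights, the grid counts (10, 20, 40), and the rule of moving to a finer grid once cells are spread at the current one. The slides state only the fixed density weight, the halving of the wirelength weight, and the coarse-to-fine grids.

```python
def weight_schedule(solve, is_spread, wl_weight=1.0, density_weight=1.0,
                    grid_counts=(10, 20, 40), max_rounds=30):
    """Outer loop around the CG solver.

    solve(wl_weight, density_weight, grids) runs the solver until it slows down;
    is_spread(grids) reports whether cells are evenly spread at this grid.
    The density weight stays fixed; the wirelength weight starts large and is
    halved after each solve until the cells are spread. Grids go coarse to fine.
    """
    history = []
    for grids in grid_counts:           # coarse grids first, finer grids later
        for _ in range(max_rounds):
            solve(wl_weight, density_weight, grids)
            history.append((grids, wl_weight))
            if is_spread(grids):
                break                   # assumed: move on to the next (finer) grid
            wl_weight /= 2.0            # relax wirelength so density can win
    return history


# Stub solver: pretend cells count as "spread" once the WL weight drops below 0.2.
last_weight = []
trace = weight_schedule(lambda w, dw, g: last_weight.append(w),
                        lambda g: last_weight[-1] < 0.2)
print(trace)
```

With the stub, the weight is halved three times on the coarse grid (1.0, 0.5, 0.25, 0.125) and then kept for the finer grids, because the stub already reports an even spread there.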

Top-Down Multi-Level Algorithm

• A hierarchy of clusters
– MLPart [Caldwell et al. 99]
• Coarse grid: grid size comparable to the average cluster size
• Density penalty
– regard each cluster as a macro cell
– area of the macro cell = total area of the cluster
• Wirelength
– cells: at the center of their clusters
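The coarse-level bookkeeping described above can be sketched as follows. The clustering itself (MLPart in APlace) is taken as given input here; the dictionary layout, the helper names, and the rule that picks grids per side so that one grid square matches the average cluster area are illustrative assumptions.

```python
import math


def cluster_macros(cell_areas, clusters):
    """Treat each cluster as a macro cell whose area is the total area of its members.
    cell_areas: {cell: area}; clusters: {cluster_id: [cells]}."""
    return {cid: sum(cell_areas[c] for c in members) for cid, members in clusters.items()}


def coarse_grid_count(chip_area, macro_areas):
    """Grids per side so one grid square is about the average cluster size (square grid assumed)."""
    avg = sum(macro_areas.values()) / len(macro_areas)
    return max(1, round(math.sqrt(chip_area / avg)))


def cells_at_cluster_centers(clusters, cluster_pos):
    """For wirelength at the coarse level, every cell sits at its cluster's center."""
    return {c: cluster_pos[cid] for cid, members in clusters.items() for c in members}


cell_areas = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 2.0}
clusters = {0: ["a", "b"], 1: ["c", "d"]}
macros = cluster_macros(cell_areas, clusters)
print(macros)                            # {0: 3.0, 1: 5.0}
print(coarse_grid_count(100.0, macros))  # 5
print(cells_at_cluster_centers(clusters, {0: (1.0, 2.0), 1: (5.0, 5.0)}))
```

Net pins then take their cluster's position, so the same wirelength and density formulas run unchanged on the much smaller clustered problem.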

Discrepancy and Wirelength

[Figure: Discrepancy as a function of iterations for the ibm01-easy circuit (x-axis: Iteration; y-axis: Discrepancy).]

[Figure: HPWL as a function of iterations for the ibm01-easy circuit (x-axis: Iteration; y-axis: HPWL).]

Placement Process

• Iter 100
– WL: 4.06E5
– disc: 10.69
• Iter 200
– WL: 5.05E5
– disc: 4.17
• Iter 300
– WL: 4.04E5
– disc: 2.53
• Iter 400
– WL: 4.31E5
– disc: 1.86

Legalization

• A simple Tetris legalization algorithm [Hill 02]
– sort cells according to vertical coordinates
– from left to right, search the current nearest available position for each cell
– fast
– increases WL by 5% on average for IBM-PLACE 2.0 circuits
• Orientation optimization and row ironing [UCLApack]
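A greedy Tetris-style legalizer can be sketched in a few lines. The processing order (increasing x, matching the left-to-right sweep), the Manhattan displacement cost, and the per-row "frontier" bookkeeping are assumptions of this sketch; the slide's exact sort key and tie-breaking may differ, and `tetris_legalize` is a hypothetical name.

```python
def tetris_legalize(cells, row_ys, row_width):
    """Greedy Tetris-style legalization (sketch).

    cells: list of (name, x, y, width), x being the cell's desired left edge.
    Cells are taken in increasing x; each row keeps a frontier (its rightmost
    occupied x), and a cell goes to the row whose first free slot at or right
    of its desired x gives the smallest displacement."""
    frontier = [0.0] * len(row_ys)
    placed = {}
    for name, x, y, w in sorted(cells, key=lambda c: c[1]):
        best = None
        for r, ry in enumerate(row_ys):
            cx = max(x, frontier[r])          # nearest free slot in this row
            if cx + w > row_width:
                continue                      # row is full
            cost = abs(cx - x) + abs(ry - y)  # Manhattan displacement
            if best is None or cost < best[0]:
                best = (cost, r, cx)
        if best is None:
            raise ValueError(f"no room for cell {name!r}")
        _, r, cx = best
        placed[name] = (cx, row_ys[r])
        frontier[r] = cx + w
    return placed


cells = [("a", 0.2, 0.1, 2.0), ("b", 0.5, 0.2, 2.0), ("c", 3.0, 1.9, 1.0)]
print(tetris_legalize(cells, row_ys=[0.0, 1.0, 2.0], row_width=10.0))
```

Cell b would overlap a in row 0, so it moves up one row rather than sliding right; each cell is visited once per row, which is why this style of legalizer is fast.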

Placement Results

• Comparison (HPWL): APlace's average improvement (range)
– Cadence QPlace (SE5.4): 9.0% (4.5% ~ 12.7%)
– UCLA Dragon (2002): 4.8% (-6.5% ~ 10.2%)
– Capo (v8.7): 8.7% (5.7% ~ 11.4%)
• Comparison (Running Time)
– Xeon server (2.4GHz CPU, double-threaded)
– faster than Dragon (0.8X Dragon's runtime), much slower than Capo (13.2X Capo's runtime)

Placement results of APlace for eight IBM-PLACE 2.0 circuits.

ckts (IBM-Place 2.0) | cells | nets | QPlace WL_l | Dragon WL_l | Capo WL_l | APlace 1.0 WL | APlace 1.0 WL_l | disc | iter | CPU (m)
ibm01_easy | 12282 | 11507 | 0.59 | 0.57 | 0.57 | 0.48 | 0.52 | 1.19 | 1098 | 12.6
ibm01_hard | 12028 | 11507 | 0.56 | 0.55 | 0.56 | 0.46 | 0.50 | 1.18 | 1006 | 21.2
ibm02_easy | 19321 | 18429 | 1.56 | 1.60 | 1.60 | 1.41 | 1.45 | 1.12 | 1097 | 30.3
ibm02_hard | 19062 | 18429 | 1.52 | 1.47 | 1.56 | 1.38 | 1.44 | 1.11 | 1208 | 32.5
ibm07_easy | 45135 | 44394 | 3.72 | 3.66 | 3.71 | 3.17 | 3.29 | 1.14 | 968 | 63.8
ibm07_hard | 44811 | 44394 | 3.70 | 3.44 | 3.56 | 3.09 | 3.24 | 1.15 | 968 | 50.8
ibm08_easy | 50977 | 47944 | 3.95 | 3.61 | 3.93 | 3.51 | 3.65 | 1.11 | 887 | 75.4
ibm08_hard | 50672 | 47944 | 3.85 | 3.45 | 3.90 | 3.45 | 3.68 | 1.11 | 806 | 55.3
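The percentage comparisons can be reproduced approximately from the WL_l columns of this table. This sketch assumes "improvement" means (other - APlace) / other, averaged over the eight circuits; that definition is an interpretation, and the table values are rounded to two digits.

```python
# Legalized wirelength (WL_l) from the table: circuit -> (QPlace, Dragon, Capo, APlace)
WL_L = {
    "ibm01_easy": (0.59, 0.57, 0.57, 0.52),
    "ibm01_hard": (0.56, 0.55, 0.56, 0.50),
    "ibm02_easy": (1.56, 1.60, 1.60, 1.45),
    "ibm02_hard": (1.52, 1.47, 1.56, 1.44),
    "ibm07_easy": (3.72, 3.66, 3.71, 3.29),
    "ibm07_hard": (3.70, 3.44, 3.56, 3.24),
    "ibm08_easy": (3.95, 3.61, 3.93, 3.65),
    "ibm08_hard": (3.85, 3.45, 3.90, 3.68),
}


def improvement(col):
    """Mean, min and max of (other - APlace) / other over the circuits."""
    ratios = [(row[col] - row[3]) / row[col] for row in WL_L.values()]
    return sum(ratios) / len(ratios), min(ratios), max(ratios)


for col, name in enumerate(("QPlace", "Dragon", "Capo")):
    mean, lo, hi = improvement(col)
    print(f"{name}: {100 * mean:.1f}% ({100 * lo:.1f}% ~ {100 * hi:.1f}%)")
```

The recomputed averages land within a few tenths of a percent of the figures quoted above (about 8.9%, 4.7% and 8.7%), which is consistent with the two-digit rounding of the table; the negative minimum against Dragon comes from ibm08, where Dragon's legalized wirelength is lower.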

Outline

• Problem Formulation
• Implementation & Results
• Extensions
– Congestion-directed placement
– IO-core co-placement
– Constraint handling
• Conclusion and Ongoing Work

Congestion-Directed Placement (I)

• Accurate bend-based congestion estimator [Kahng and Xu, SLIP-03]
• Congestion-directed placement
– ExpPotential(g): expected total potential at grid point g
– ExpPotential(g) is reduced if g is congested, by an amount controlled by a congestion adjustment factor

Congestion-Directed Placement (II)

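• Experiments
– routability measured by WL in the gcell grid and the number of over-capacity gcells
– routability 38% better with a congestion adjustment factor of 0.05 (over-capacity gcells: 4035 → 2486)
– routability deteriorates with larger factors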

Placement and global routing results with varying congestion adjustment factors for the ibm01-hard circuit.

factor | Placement WL | Placement WL_l | Placement disc | Placement iter | Global routing WL | Global routing over-cap
0.00 | 0.459 | 0.502 | 1.18 | 1006 | 0.119 | 4035
0.02 | 0.462 | 0.505 | 1.18 | 997 | 0.118 | 3488
0.04 | 0.464 | 0.509 | 1.23 | 1006 | 0.119 | 3249
0.05 | 0.474 | 0.523 | 1.26 | 1086 | 0.120 | 2486
0.06 | 0.477 | 0.529 | 1.28 | 1086 | 0.121 | 2576
0.08 | 0.486 | 0.541 | 1.36 | 1006 | 0.123 | 2806
0.10 | 0.507 | 0.562 | 1.56 | 804 | 0.129 | 3350
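The ExpPotential adjustment used for congestion-directed placement can be illustrated with a deliberately simple, hypothetical form. The slides do not give the exact formula, so the linear shrink 1 - factor * congestion(g), the overflow-style congestion values, and the absence of renormalization are all assumptions of this sketch.

```python
def expected_potentials(total_area, congestion, factor):
    """HYPOTHETICAL form of ExpPotential(g): start from the uniform target
    total_area / #grid points and shrink it at congested grid points.

    congestion[g] >= 0 is an overflow measure from a congestion estimator
    (0 means not congested); factor is the congestion adjustment factor."""
    base = total_area / len(congestion)
    return [base * max(0.0, 1.0 - factor * c) for c in congestion]


targets = expected_potentials(16.0, [0.0, 0.0, 2.0, 0.0], factor=0.05)
print(targets)
```

Lowering the target at a congested grid point makes any potential there count as excess in the density penalty, so the solver pushes cells toward less congested regions. A large factor removes too much capacity and starts to cost wirelength and evenness, which matches the trend in the table where routability worsens again beyond 0.05.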

Experimental Results

• Comparison (routed WL): APlace's average improvement
– Cadence QPlace (SE5.4): 8.2%
– UCLA Dragon (2002): 4.2%
– Capo (v8.7): 10.4%
• With orientation optimization and row ironing
– Cadence QPlace (SE5.4): 12.0%
– UCLA Dragon (2002): 8.1%
– Capo (v8.7): 14.1%

Placement and routing results of APlace for eight IBM-PLACE 2.0 circuits with comparison to QPlace, Dragon and Capo.

Ckts | Placer | Placement WL | Placement CPU | Routing violations | Routing WL | Routing vias | Routing CPU
ibm01e | QPlace | 0.59 | 3 | 0 | 0.84 | 138563 | 58
ibm01e | Dragon | 0.57 | 27 | 0 | 0.86 | 141304 | 60
ibm01e | Capo | 0.57 | 1 | 587 | 0.85 | 146706 | 1446*
ibm01e | APlace | 0.53 | 23 | 0 | 0.75 | 139134 | 100
ibm01h | QPlace | 0.56 | 3 | 0 | 0.80 | 138593 | 82
ibm01h | Dragon | 0.55 | 26 | 0 | 0.80 | 139993 | 90
ibm01h | Capo | 0.56 | 1 | 1029 | 0.84 | 173715 | 1446*
ibm01h | APlace | 0.51 | 22 | 0 | 0.71 | 138745 | 82
ibm07e | QPlace | 3.72 | 12 | 0 | 4.61 | 572512 | 98
ibm07e | Dragon | 3.66 | 65 | 0 | 4.58 | 569087 | 103
ibm07e | Capo | 3.71 | 7 | 42 | 4.93 | 599806 | 996
ibm07e | APlace | 3.30 | 68 | 0 | 3.99 | 532963 | 74
ibm07h | QPlace | 3.70 | 12 | 0 | 5.04 | 617942 | 184
ibm07h | Dragon | 3.44 | 66 | 15 | 4.63 | 606561 | 135
ibm07h | Capo | 3.56 | 8 | 1799 | 5.14 | 631456 | 1483*
ibm07h | APlace | 3.26 | 55 | 0 | 4.10 | 547398 | 96

I/O-Core Co-Placement

• Peripheral I/O
– constrained clock/power distribution
– coupling and power issues for off-chip signaling
• Area-array I/O
– improved pad count and reliability
– reduced noise coupling
• Simultaneous I/O and core placement
– I/Os are spread over the placement area, in the same way and at the same time as core cells
– DensityWeight * DensityPenalty + IODensityWeight * IODensityPenalty
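The combined objective can be sketched by giving I/Os their own density penalty on the same grid, so they are spread evenly among themselves regardless of where core cells sit. The helper names, the uniform expected potential, and passing the wirelength in as a precomputed number are assumptions of this illustration.

```python
def bell(d, r):
    """Bell-shaped p(d) with radius r."""
    d = abs(d)
    if d <= r / 2:
        return 1.0 - 2.0 * d * d / (r * r)
    if d <= r:
        return 2.0 * (r - d) ** 2 / (r * r)
    return 0.0


def density_penalty(objs, grid, r):
    """Sum over grid points of (potential - uniform expected potential)^2."""
    potential = [0.0] * len(grid)
    for x, y, area in objs:
        raw = [bell(x - gx, r) * bell(y - gy, r) for gx, gy in grid]
        c = area / sum(raw)  # normalization so the object contributes its area
        for k, v in enumerate(raw):
            potential[k] += c * v
    expected = sum(a for _, _, a in objs) / len(grid)
    return sum((p - expected) ** 2 for p in potential)


def co_placement_objective(wirelength, cores, ios, grid, r,
                           density_weight, io_density_weight):
    """WL + DensityWeight * DensityPenalty + IODensityWeight * IODensityPenalty."""
    return (wirelength
            + density_weight * density_penalty(cores, grid, r)
            + io_density_weight * density_penalty(ios, grid, r))


grid = [(gx + 0.5, gy + 0.5) for gx in range(4) for gy in range(4)]
cores = [(gx + 0.5, gy + 0.5, 1.0) for gx in range(4) for gy in range(4)]
io_spread = [(0.5, 0.5, 1.0), (3.5, 0.5, 1.0), (0.5, 3.5, 1.0), (3.5, 3.5, 1.0)]
io_clump = [(0.5, 0.5, 1.0)] * 4
print(co_placement_objective(0.0, cores, io_spread, grid, 1.0, 1.0, 1.0),
      co_placement_objective(0.0, cores, io_clump, grid, 1.0, 1.0, 1.0))
```

With the core cells already even, the objective is driven entirely by the I/O term (3.0 for spread I/Os vs 15.0 for clumped ones); keeping the two penalties separate lets the few I/Os be distributed evenly instead of being swamped by the much larger core-cell area.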

I/O-Core Co-Placement Results

• Randomly select 400 or 1000 cells and regard them as I/Os
• I/Os: distributed fairly evenly
• WL, disc of core cell distribution, and running times: not seriously impaired

I/O-core co-placement results of APlace with different numbers of I/Os.

Ckts (IBM-Place v.2) | I/Os | WL | WL_l | IO disc | disc | iter | CPU (m)
ibm01e | 0 | 0.48 | 0.52 | - | 1.19 | 1098 | 12.6
ibm01e | 400 | 0.48 | 0.52 | 1.50 | 1.24 | 1144 | 24.9
ibm01e | 1000 | 0.50 | 0.54 | 1.34 | 1.31 | 1274 | 28.4
ibm02e | 0 | 1.41 | 1.45 | - | 1.12 | 1097 | 30.3
ibm02e | 400 | 1.36 | 1.41 | 1.62 | 1.09 | 1107 | 30.2
ibm02e | 1000 | 1.45 | 1.50 | 1.36 | 1.16 | 1279 | 34.4
ibm07e | 0 | 3.17 | 3.29 | - | 1.14 | 968 | 63.8
ibm07e | 400 | 3.36 | 3.47 | 1.70 | 1.14 | 1049 | 63.5
ibm07e | 1000 | 3.32 | 3.43 | 1.55 | 1.14 | 1049 | 58.8
ibm08e | 0 | 3.51 | 3.65 | - | 1.11 | 887 | 75.4
ibm08e | 400 | 3.59 | 3.83 | 1.55 | 1.12 | 887 | 55.5
ibm08e | 1000 | 3.55 | 3.73 | 1.20 | 1.11 | 887 | 61.0

I/O-core co-placement with 400 I/Os for the ibm01-easy circuit.

Placement with Geometric Constraints

• Mixed-signal ASIC designs: parasitic effects
– a large number of constraints
• Constraints in APlace: converted to penalty functions
– alignment constraint
– spacing constraint
– axial symmetry
– nodal symmetry
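The example penalty formulas for these constraint types did not survive extraction. As an assumption-laden sketch, here is one common way to turn each constraint type into a smooth penalty that is zero exactly when the constraint holds. The function names and the quadratic forms are illustrative, not necessarily APlace's; each penalty would be added to the objective with its own weight.

```python
def alignment_penalty(c1, c2):
    """Two cells should share a y coordinate (e.g., bottom alignment)."""
    return (c1[1] - c2[1]) ** 2


def spacing_penalty(x1, x2, min_gap):
    """x2 - x1 >= min_gap; zero when satisfied, quadratic when violated."""
    return max(0.0, min_gap - (x2 - x1)) ** 2


def axial_symmetry_penalty(c1, c2, axis_x):
    """c1 and c2 mirror each other about the vertical line x = axis_x."""
    return (c1[0] + c2[0] - 2.0 * axis_x) ** 2 + (c1[1] - c2[1]) ** 2


def nodal_symmetry_penalty(c1, c2, node):
    """c1 and c2 mirror each other through the point node = (a, b)."""
    return (c1[0] + c2[0] - 2.0 * node[0]) ** 2 + (c1[1] + c2[1] - 2.0 * node[1]) ** 2


print(alignment_penalty((0.0, 1.0), (5.0, 1.0)))                   # 0.0
print(spacing_penalty(0.0, 1.0, 2.0))                              # 1.0
print(axial_symmetry_penalty((1.0, 2.0), (4.0, 3.0), 3.0))         # 2.0
print(nodal_symmetry_penalty((1.0, 2.0), (5.0, 4.0), (3.0, 3.0)))  # 0.0
```

All four forms are continuously differentiable (the squared max(0, ·) in the spacing penalty has a continuous first derivative), so they can be minimized by the same conjugate gradient solver as the wirelength and density terms.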

Constraint Handling Results

• Average WL increase over unconstrained placement: 8.2%
• Blue: Alignments
• Red: Nodal Symmetries + Spacing
• Black: Axial Symmetries + Spacing

Placement results of APlace with 90 artificial geometric constraints.

Ckts (IBM-Dragon) | WL | WL_l | disc | iter | CPU (m)
ibm01e | 0.54 | 0.57 | 1.17 | 1309 | 57.3
ibm02e | 1.50 | 1.55 | 1.10 | 1208 | 70.2
ibm07e | 3.36 | 3.48 | 1.16 | 1049 | 122.8
ibm08e | 3.78 | 3.99 | 1.12 | 887 | 129.9

Placement of APlace with 90 artificial geometric constraints for the ibm01-easy circuit.

Conclusion and Ongoing Work

• Implemented APlace and analyzed its characteristics and results in depth
– on average, its placed and routed wirelengths outperform those of QPlace, Capo and Dragon
• Extended the basic formulation
– top-down hierarchical placement, congestion-directed placement, I/O-core co-placement, and constraint handling
• Ongoing work:
– timing-driven placement
– mixed-size placement

Thanks

Prof. C.-K. Cheng, UCSD
Bo Yao, UCSD
Prof. Igor Markov, Michigan
Saurabh Adya, Michigan
Shubhyant Chaturvedi, Michigan
Prof. C.-K. Koh, Purdue
Chen Li, Purdue
Prof. Andrew Kennings, Waterloo

Thank You!
