Partition-Driven Placement with Simultaneous Level Processing and Global Net Views
K. Zhong and S. Dutt
Department of Electrical Engineering and Computer Science, University of Illinois at Chicago
Zhong & Dutt, UIC, Nov. 2000
Overview
• Problem
• Previous Work
• New Partition-Driven Placement Algorithm (SPADE)
• Experimental Evaluation
• Conclusions and Future Work
Problem
• Placement for Deep Sub-Micron (DSM)
– Very large input size (up to tens of millions)
– More optimization objectives (area, delay, power)
– Various heterogeneous constraints (congestion, crosstalk, heat distribution, etc.)
Major Approaches to Placement
• Three mainstream placement approaches
– Partition-Driven Placement (PDP) (e.g. [Breuer, DAC ‘77], [Huang et al, ISPD ‘97])
– Simulated Annealing (SA) (e.g. [Sun et al, TCAD ‘95])
– Mathematical programming (e.g. [Eisenmann et al, DAC ‘98])
• Global and detailed placement
– NRG [Wang et al, ICCAD ‘97], Snap-On [Yang et al, ISPD ‘00], etc.
Advantages of PDP
• Time-efficient
– divide-and-conquer approach
• Balanced decision with a global view
– top-down placement flow
• Can tackle almost any objective function accurately (up to the interconnect length model)
– delay, WL, power (in iterative improvement, update cost per move)
• Flexibility in tackling multiple constraints
– iterative improvement: check per move
Previous PDP Work
• Sequential level partitioning [Breuer, DAC ‘77]
– regions at the same level are cut sequentially
– may result in sub-optimal wire-length or cutsize
• Terminal propagation [Dunlop et al, TCAD ‘85]
– addresses external connections during partitioning
• Quadrisection [Suaris et al, TCAS ‘88; Huang et al, ISPD ‘97]
– 4-way partitioning better controls wire length in both directions, but run time goes up
New PDP Techniques: Rectify Drawbacks of Prior PDP
• Placer SPADE (Simultaneous level PArtitioning with Distributed nEt views)
• Simultaneous Level Partitioning (SLP): rectifies the prior drawback of sequentially ordered optimization
• Global net views: rectify the prior drawback of localized subcircuit views and the cost and inaccuracy of terminal propagation
• Wire-length based gain computation: rectifies the prior drawback of mincut-based gain (not strictly WL)
• Modified CLIP-FM partitioner [Dutt et al, ICCAD ‘96]
• Maximum row length control
• Post-processing (cell swaps)
Simultaneous Level Partitioning
• Simultaneous partitioning of all regions within the same level
• Cell moves are naturally interleaved across all regions based on gains (as shown in the figure)
• Achieves simultaneous optimization across multiple regions
[Figure: numbered cell moves interleaved across regions]
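The interleaving described on this slide can be sketched in a few lines. This is only an illustration, not SPADE's implementation: the region names, cell names, and gain values are made up, the helper names `slp_move_order` and `sequential_move_order` are hypothetical, and gains are held fixed, whereas a real FM-style pass would update gains after every move.

```python
import heapq

def slp_move_order(region_gains):
    """Pick moves best-gain-first across ALL regions of a level (SLP-style)."""
    heap = [(-gain, region, cell)
            for region, cells in region_gains.items()
            for cell, gain in cells.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        neg_gain, region, cell = heapq.heappop(heap)
        order.append((region, cell, -neg_gain))
    return order

def sequential_move_order(region_gains):
    """Finish one region completely before touching the next one."""
    order = []
    for region in sorted(region_gains):
        cells = region_gains[region]
        for cell in sorted(cells, key=lambda c: -cells[c]):
            order.append((region, cell, cells[cell]))
    return order

# Made-up gains for two regions at the same level.
gains = {"upper": {"a": 4, "b": 1}, "lower": {"c": 3, "d": 2}}
slp = slp_move_order(gains)         # moves alternate between regions
seq = sequential_move_order(gains)  # all "lower" moves, then all "upper"
```

With these numbers the SLP order visits upper, lower, lower, upper, while the sequential order exhausts one region before starting the other.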
SLP vs. Sequential Level Partitioning
• Sequential level partitioning may not be able to escape local optima
[Figure: three panels showing cells u and v, with nets labeled with weights; legend: cells, pads]
– Initial partitioning: Orig Cost = 8
– SLP: only the cell in the lower region is moved; New Cost = 1
– Sequential: sub-optimal move sequence if the upper region is processed first; New Cost = 3
Global Net View vs. Terminal Propagation
• Terminal propagation may be inaccurate for wire length reduction
• With a global net view we can do better (e.g., in the figure, moving left is better because it shrinks the bounding box (BB), while moving right expands it)
[Figure: possible moves; the dummy position does not help]
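To make the bounding-box argument concrete, here is a small one-dimensional sketch. The helper names `bb_width` and `move_delta` and all coordinates are assumptions chosen for illustration; they are not taken from the slides.

```python
def bb_width(xs):
    """Width of a net's bounding box along one axis."""
    xs = list(xs)
    return max(xs) - min(xs)

def move_delta(pin_xs, cell, new_x):
    """Change in bounding-box width if `cell` moves to `new_x` (negative = shorter)."""
    before = bb_width(pin_xs.values())
    after = bb_width({**pin_xs, cell: new_x}.values())
    return after - before

# Made-up net: two pins outside the current region, plus cell "u" on the right edge.
net = {"a": 0, "b": 4, "u": 10}
left_delta = move_delta(net, "u", 6)    # moving left shrinks the BB
right_delta = move_delta(net, "u", 14)  # moving right expands the BB
```

Because the real positions of the external pins are visible, the two candidate moves get different scores, which is the distinction the slide draws against a propagated dummy position.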
De-coupled Regions: a Caveat
• Suitable for row-based designs
• Property: for a horizontal cut, the WL change due to cell moves in regions on one side of the previous-level cutline does not affect the WL of the subcircuits in regions on the other side
• Sequential partitioning of regions separated by previous-level horizontal cutlines is therefore justified
• Reduced run time at NO cost in wire length
[Figure: the two segments can be shrunk separately; regions spanning cutline c are de-coupled from those spanning c’ by the previous cutline d]
Wire-length Based Gain
• Pin coordinates (x or y) of each net along the direction orthogonal to current cutline are stored in a binary search tree
• SPADE-FM: A cell move can have non-zero gain only when it changes global bounding-boxes of connected nets
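A minimal sketch of this gain rule along one axis follows. It is an assumption-laden illustration: a sorted Python list with `bisect` stands in for the binary search tree mentioned above, the class name `NetSpan` is hypothetical, and the coordinates are made up.

```python
import bisect

class NetSpan:
    """Pin coordinates of one net along the axis orthogonal to the cutline."""

    def __init__(self, coords):
        self.coords = sorted(coords)

    def span(self):
        return self.coords[-1] - self.coords[0]

    def gain(self, old, new):
        """Span reduction if one pin moves from `old` to `new` (positive = shorter)."""
        i = bisect.bisect_left(self.coords, old)
        if i == len(self.coords) or self.coords[i] != old:
            raise ValueError("no pin at coordinate %r" % (old,))
        rest = self.coords[:i] + self.coords[i + 1:]
        lo = min(rest[0], new) if rest else new
        hi = max(rest[-1], new) if rest else new
        return self.span() - (hi - lo)

    def move(self, old, new):
        """Apply the move and return the gain it realized."""
        gain = self.gain(old, new)
        del self.coords[bisect.bisect_left(self.coords, old)]
        bisect.insort(self.coords, new)
        return gain

net = NetSpan([0, 3, 8])
boundary_gain = net.gain(8, 2)  # boundary pin moves inward: span 8 -> 3
internal_gain = net.gain(3, 5)  # internal pin stays inside the box
```

Only the pin on the boundary of the bounding box produces a non-zero gain here. The list slicing costs O(n) per query; a balanced search tree would bring that down to O(log n).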
Illustration of Gain Computation
• SPADE-FM: gain(u) = gain(w) = 0, since neither move can change the bounding box by itself; only gain(v) = 5L is positive, and all other cells have zero gain as “internal” nodes.
• SPADE-PROP: gain(u) = (d' - d)·p(u)·p(w)/p(u) + (d'' - d')·p(x), where p(y) is the probability of y. The gain has two parts: the single-step PROP gain of moving u and w, and the multi-step gain of moving cells not on the boundary of the BB (e.g., x) from the same side as u.
[Figure: cells u, v, w, x on a net; distances d, d', d'' marked; lengths 3L and 8L; g(v) = 5L]
Global Gain Update
• Every move may entail out-of-region update of cell gains
• Total time taken for such updates per pass is bounded by O(p*log(p)), where p is the number of pins
[Figure: a cell move and the cells whose gains need updating]
Maximum Row Length Control
• A decisive factor in die-area utilization
• Gradually increase row-balance deviations with partitioning tree levels, up to the maximum allowable
– cannot use the prescribed max. row-length deviation from the start, as it can freeze moves for future cuts (see figure below)
• Row deviation assigned inversely proportional to the logarithm of the number of rows of the target regions
[Figure: with the initial deviation set to the max allowed value, the max deviation is reached early and further partitioning is badly hampered; label: deviation available]
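One way such a schedule could look is sketched below. The slides state only that the deviation is inversely proportional to the logarithm of the number of rows; the base-2 logarithm, the clamp at the maximum for regions of one or two rows, the function name `row_deviation`, and the 8% budget are all assumptions made for this illustration.

```python
import math

def row_deviation(rows_in_region, max_dev):
    """Allowed row-length deviation for a region spanning `rows_in_region` rows.

    Assumed form: max_dev / log2(rows), clamped to max_dev for 1 or 2 rows.
    """
    if rows_in_region < 1:
        raise ValueError("a region must span at least one row")
    if rows_in_region <= 2:
        return max_dev
    return max_dev / math.log2(rows_in_region)

# Regions shrink from 16 rows to 1 row as the partitioning tree deepens.
schedule = [(rows, row_deviation(rows, 0.08)) for rows in (16, 8, 4, 2, 1)]
```

Under these assumptions the allowance starts at a quarter of the budget for 16-row regions and only reaches the full budget at the last levels, so early cuts cannot use it all up.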
Local Region Balance Control
• Relaxed local balance but strict row-balance control
• Setting Local Deviation (from the closest possible balance to 50-50) equal to Row Deviation overconstrains the problem
• Allow Local Deviation to be a factor greater than 1 times Row Deviation, but maintain the overall row deviation
Circuit Partitioning Engine
• CLIP-FM variation (SHRINK-FM) or SHRINK-PROP algorithm at the core
– shrinking initial gain helps cluster removal
– iterative mode: shrink factor gradually enlarged to get independent gains after most clusters are removed through earlier passes
• Two-level gain tree structure
– local binary search tree for each region
– top-gain cells of local trees sorted into global tree
• Efficient global cell selection strategy
– row-balance violation: search opposite global tree
– local violation: switch to opposite local tree
– tie-breaking: following latest move
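The two-level structure can be sketched roughly as follows. This is a simplified stand-in, not the engine itself: heaps replace the binary search trees, the global ordering is rebuilt on demand rather than maintained incrementally, the "opposite tree" fallbacks for balance violations are not modeled, and the class name `TwoLevelGains` and the sample data are hypothetical.

```python
import heapq

class TwoLevelGains:
    """Per-region gain heaps plus a global view over each region's top cell."""

    def __init__(self):
        self.local = {}  # region -> heap of (-gain, cell)

    def add(self, region, cell, gain):
        heapq.heappush(self.local.setdefault(region, []), (-gain, cell))

    def global_view(self):
        """Top-gain cell of every non-empty region, best first."""
        tops = [(heap[0][0], region, heap[0][1])
                for region, heap in self.local.items() if heap]
        return [(region, cell, -neg) for neg, region, cell in sorted(tops)]

    def pop_best(self):
        """Remove and return the best cell over all regions, or None if empty."""
        view = self.global_view()
        if not view:
            return None
        region, cell, gain = view[0]
        heapq.heappop(self.local[region])
        return region, cell, gain

gains = TwoLevelGains()
for region, cell, gain in [("r1", "a", 5), ("r1", "b", 2), ("r2", "c", 4), ("r3", "d", 1)]:
    gains.add(region, cell, gain)
first_view = gains.global_view()
picked = [gains.pop_best() for _ in range(5)]
```

Selecting through the global view is what lets moves from different regions interleave by gain; the fifth pop returns `None` because only four cells were added.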
Post-processing
• Intra-row horizontal neighbor swap
• Intra-row clustering based on int/ext nets ratio
• Inter-row vertical swap
– some cells have to be shifted due to cell overlap
• Results in about 1-2% improvement
[Figure: horizontal neighbor swap; vertical cell swap]
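A horizontal neighbor-swap pass might look like the sketch below. It assumes unit-width cells so that a cell's position is simply its index in the row, considers only nets whose pins all lie in that row, and uses hypothetical helper names `row_wl` and `neighbor_swap_pass`; the row and nets are made up.

```python
def row_wl(row, nets):
    """Total horizontal span of all nets for a given left-to-right cell order."""
    pos = {cell: i for i, cell in enumerate(row)}
    return sum(max(pos[c] for c in net) - min(pos[c] for c in net) for net in nets)

def neighbor_swap_pass(row, nets):
    """One left-to-right pass that keeps an adjacent swap only if it shortens WL."""
    row = list(row)
    best = row_wl(row, nets)
    for i in range(len(row) - 1):
        row[i], row[i + 1] = row[i + 1], row[i]
        wl = row_wl(row, nets)
        if wl < best:
            best = wl  # keep the improving swap
        else:
            row[i], row[i + 1] = row[i + 1], row[i]  # undo
    return row, best

start = ["a", "b", "c", "d"]
nets = [["a", "c"], ["b", "d"]]
start_wl = row_wl(start, nets)
new_row, new_wl = neighbor_swap_pass(start, nets)
```

Recomputing the full wire length after every trial swap keeps the sketch short; an incremental update over only the nets touching the two swapped cells would be the natural refinement.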
Experimental Evaluation
• MCNC standard cell benchmarks: up to 100k cells
• Compared with prior methods
– TimberWolf 7.0 [Sun et al, TCAD ‘95]
– FD-98 [Eisenmann et al, DAC ‘98]
– QUAD [Huang et al, ISPD ‘97]
– Snap-On [Yang et al, ISPD ‘00]
• Same number of rows as TimberWolf 7.0
• Part of IBM-PLACE circuits also tested (ibm11 - ibm15) and compared to iTools [internetCAD]
• Experiments conducted on 550 MHz Pentium-III Linux workstations
Comparison with Previous Methods
Circuit             SPADE-FM        TW 7.0    FD-98     QUAD      Snap-On   SPADE-PROP
primary1            0.74            0.83      0.87      0.9       0.95      0.74
struct              0.291           -         0.338     0.378     -         0.285
primary2            3.13            3.53      3.72      3.68      3.66      3.07
biomed              1.43            1.61      1.78      -         1.84      1.38
industry2           11.9            13.3      14.6      -         14.48     12.07
industry3           35.37           41.53     45.1      -         44.7      35.09
avqsmall            5.59            5.08      4.91      6.29      5.15      5.31
avqlarge            6.16            5.65      5.38      6.59      5.21      5.61
golem3              19.84           22.6      -         -         -         19.64
Total (8/8 ckts)    84.16 / 64.61   94.13 /   / 76.70   -         -         82.91 / 63.56
Total (5/7 ckts)    15.94 / 64.32   -         -         17.84 /   / 75.99   15.02 / 63.27
SPADE-FM imprv.     -               10.60%    15.80%    10.70%    15.30%    -
SPADE-PROP imprv.   -               11.92%    17.13%    15.81%    16.74%    -
run time (8 ckts): 15001 7173 57920 18108
run time (6 ckts): 14710 19034 18071
scaled time ratio: 1 0.69 0.26 1.16 1.21
SLP vs Seq.           SPADE-FM   Sequential   WL imprv.
Total WL (6 ckts)     52.86      65.57        19.38%
Total time (6 ckts)   7052       1719
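The improvement percentages can be rechecked from the totals. The sketch below assumes the improvement is computed as (other - ours) / other, which reproduces the reported figures to two decimals for the three cases shown; the function name `improvement` is hypothetical.

```python
def improvement(ours, theirs):
    """Percent wire-length reduction of `ours` relative to `theirs`."""
    return 100.0 * (theirs - ours) / theirs

# Totals taken from the tables on this slide.
slp_vs_seq = improvement(52.86, 65.57)  # SPADE-FM (SLP) vs. sequential, 6 circuits
prop_vs_tw = improvement(82.91, 94.13)  # SPADE-PROP vs. TW 7.0, 8 circuits
prop_vs_fd = improvement(63.56, 76.70)  # SPADE-PROP vs. FD-98, 8 circuits
```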
Other Experimental Results
• Results for IBM-PLACE Benchmarks
Circuit             SPADE-FM   SPADE-PROP   iTools
ibm11               37.27      36.28        39.76
ibm12               66.52      64.92        69.56
ibm13               42.94      42.4         49.11
ibm14               121.38     121.17       118.8
ibm15               134.68     130.45       130.6
Total WL            402.79     395.22       407.83
imprv. vs. iTools   1.24%      3.10%
• Trade-off between run time and solution quality of SPADE-FM with 8 and 16 runs for the MCNC suite
Trade-off    SPADE-FM/8   SPADE-FM/16   Best WL   16 vs 8
Total WL     89.65        84.45         82.87     5.81%
Total time   29117        37738         -         1.3 x
Conclusions and Future Work
• Introduced novel concepts of:
– SLP
– global net view
– bounding-box based gain computation
• PDP alone can be competitive (in fact better)
– up to 15.8% better in aggregate result than the state of the art
– among large circuits:
    • best-known result for the largest MCNC circuit, golem3
    • best-known results for ibm11-ibm13
• Run time reasonable, but can be reduced
– early-stop per pass
– multilevel clustering
• On-going work
– timing-driven PDP
– multi-constraint PDP (congestion, thermal distribution, multiple objectives)