APLACE: A General and Extensible Large-Scale Placer
Andrew B. Kahng Sherief Reda Qinke Wang
VLSICAD lab
University of CA, San Diego
Goals and Plan
Goals:
• Scalable and robust implementation
• Leave no stone unturned / no QoR on the table
• Leave nothing for competitors
Plan and Schedule:
• Use APlace as an initial framework
• One month for coding + one month for tuning
Implementation Framework
APlace weaknesses:
• Weak clustering
• Poor legalization / detailed placement
[Flow diagram] Global Phase: Clustering → Adaptive APlace engine → Unclustering → Legalization. Detailed Phase: Global moving → WS arrangement → Cell order polishing.
New APlace Flow
New APlace:
1. New clustering
2. Adaptive parameter setting for scalability
3. New legalization + iterative detailed placement
Clustering/Unclustering
• A multi-level paradigm with clustering ratio = 10
• 2000 top-level clusters
• Similar in spirit to [HuM04] and [AlpertKNRV05]
Algorithm Sketch (for each clustering level):
1. Calculate each node's clustering score to its neighbors, based on the number of connections.
2. Sort all scores and process nodes in order, as long as cluster size upper bounds are not violated.
3. If a node's score needs updating, update the score and re-insert it in order.
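The clustering pass above can be sketched as a greedy best-pair loop with lazy score invalidation; the data layout and all names below are assumptions for illustration, not APlace's actual code:

```python
import heapq
from collections import defaultdict

def cluster_level(edges, area, max_area):
    # One clustering level (sketch).  edges: {(u, v): #connections},
    # area: {node: area} (updated in place), max_area: cluster size bound.
    # Returns {node: cluster representative}.
    parent = {u: u for u in area}
    def find(u):                        # union-find with path halving
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    nbr = {u: defaultdict(int) for u in area}
    for (u, v), w in edges.items():
        nbr[u][v] += w
        nbr[v][u] += w

    # Score each node against its neighbors; process pairs in sorted order.
    heap = [(-w, u, v) for u in area for v, w in nbr[u].items() if u < v]
    heapq.heapify(heap)
    while heap:
        negw, u, v = heapq.heappop(heap)
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        if nbr[ru].get(rv, 0) != -negw:
            continue                    # stale score: a fresher entry exists
        if area[ru] + area[rv] > max_area:
            continue                    # respect the cluster size upper bound
        parent[rv] = ru                 # merge rv into ru
        area[ru] += area[rv]
        merged = defaultdict(int)       # recompute scores to all neighbors
        for x, w in list(nbr[ru].items()) + list(nbr[rv].items()):
            rx = find(x)
            if rx != ru:
                merged[rx] += w
        nbr[ru] = merged
        for rx, w in merged.items():    # rekey neighbor tables, re-insert scores
            nbr[rx].pop(rv, None)
            nbr[rx][ru] = w
            a, b = (ru, rx) if ru < rx else (rx, ru)
            heapq.heappush(heap, (-w, a, b))
    return {u: find(u) for u in area}
```

Stale heap entries are skipped rather than deleted, which matches the "update score and insert in order" step without an explicit decrease-key.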
Adaptive Tuning / Legalization
Adaptive Parameterization:
1. Automatically decide the initial weight for the wirelength objective according to the gradients.
2. Decrease the wirelength weight based on the current placement progress.

Legalization:
1. Sort all cells from left to right; move each cell (or group of cells) in order to the closest legal position(s).
2. Sort all cells from right to left; move each cell (or group of cells) in order to the closest legal position(s).
3. Pick the best of (1) and (2).
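The three-step legalization (left-to-right sweep, right-to-left sweep, keep the better result) can be sketched for a single row; cell grouping and multi-row handling are omitted and all names are illustrative:

```python
def sweep(cells, row_width):
    # Left-to-right pass: visit cells in x order and push each to the
    # closest non-overlapping position at or right of its current x.
    order = sorted(range(len(cells)), key=lambda i: cells[i][0])
    pos = [0.0] * len(cells)
    cursor = 0.0
    for i in order:
        x, w = cells[i]
        pos[i] = max(x, cursor)
        cursor = pos[i] + w
    if cursor > row_width:            # ran past the row end: push tail back left
        limit = row_width
        for i in reversed(order):
            pos[i] = min(pos[i], limit - cells[i][1])
            limit = pos[i]
    return pos

def legalize_row(cells, row_width):
    # cells: list of (x, width).  Step 1: left-to-right sweep.
    lr = sweep(cells, row_width)
    # Step 2: right-to-left sweep, realized by mirroring the row.
    mirrored = [(row_width - x - w, w) for x, w in cells]
    rl = [row_width - p - w
          for p, (_, w) in zip(sweep(mirrored, row_width), cells)]
    # Step 3: pick the sweep with the smaller total displacement.
    disp = lambda p: sum(abs(a - x) for a, (x, _) in zip(p, cells))
    return lr if disp(lr) <= disp(rl) else rl
```

Total displacement stands in here for the real wirelength objective when comparing the two sweeps.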
Whitespace Compaction: for each layout row, optimally arrange whitespace to minimize wirelength while maintaining relative cell order [KahngTZ99], [KahngRM04].
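One way to realize per-row whitespace arrangement is a dynamic program over discretized positions. This is only a sketch under two assumptions: an integer position grid, and a per-cell |x - target| cost standing in for the true wirelength (the cited papers use more efficient piecewise-linear formulations):

```python
def compact_row(widths, targets, row_width):
    # Fixed cell order; choose non-overlapping integer positions in [0, row_width]
    # minimizing sum_i |x_i - targets[i]|.  Exact on the grid.
    n = len(widths)
    INF = float('inf')
    prevmin = [0.0] * (row_width + 1)   # min cost of earlier cells within [0, r]
    arg = []                            # arg[i][r]: best left edge of cell i, budget r
    for i in range(n):
        w, t = widths[i], targets[i]
        newmin = [INF] * (row_width + 1)
        ai = [-1] * (row_width + 1)
        best, bestx = INF, -1
        for x in range(row_width - w + 1):
            c = prevmin[x] + abs(x - t)
            if c < best:                # running prefix-min over left edges
                best, bestx = c, x
            newmin[x + w], ai[x + w] = best, bestx
        arg.append(ai)
        prevmin = newmin
    pos, r = [0] * n, row_width         # backtrack the optimal positions
    for i in range(n - 1, -1, -1):
        pos[i] = arg[i][r]
        r = pos[i]
    return pos
```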
Cell Order Polishing: for a window of neighboring cells, optimally arrange cell order and whitespace to minimize wirelength.
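Window-based order polishing can be sketched as exhaustive enumeration over a small window; here slot positions are kept fixed and a per-cell target x stands in for the real wirelength objective (both are simplifying assumptions):

```python
from itertools import permutations

def polish_window(slots, targets):
    # slots: fixed x positions in the window; targets: each cell's ideal x.
    # Try every assignment of cells to slots; keep the cheapest.
    best_order, best_cost = None, float('inf')
    for order in permutations(range(len(targets))):
        cost = sum(abs(slots[s] - targets[c]) for s, c in enumerate(order))
        if cost < best_cost:
            best_cost, best_order = cost, order
    return list(best_order), best_cost
```

Enumeration is affordable only because the window is small (k! orders for k cells), which is why polishing works on a sliding window of neighboring cells rather than the whole row.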
Detailed Placement
Global Moving:
Optimally move a cell to a better available position to minimize wirelength
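Global moving can be sketched as trying each available position for a cell and keeping the one with the lowest HPWL; the netlist representation below is an assumption for illustration:

```python
def hpwl(nets, pos):
    # Half-perimeter wirelength: nets are tuples of cell names,
    # pos maps cell name -> (x, y).
    total = 0.0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def global_move(cell, free_sites, nets, pos):
    # Evaluate every available site for `cell`; commit the best one.
    best_site, best = pos[cell], hpwl(nets, pos)
    for site in free_sites:
        pos[cell] = site
        cost = hpwl(nets, pos)
        if cost < best:
            best, best_site = cost, site
    pos[cell] = best_site
    return best
```

A production placer would evaluate only the nets incident to the moved cell rather than recomputing total HPWL, but the greedy structure is the same.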
Parameterization and Parallelization

Tuning Knobs:
• Clustering ratio, # top-level clusters, cluster area constraints
• Initial wirelength weight, wirelength weight reduction ratio
• Max # CG iterations for each wirelength weight
• Target placement discrepancy
• Detailed placement parameters, etc.

Resources:
• SDSC ROCKS Cluster: 8 Xeon CPUs at 2.8 GHz
• Michigan (Prof. Sylvester's group): 8 assorted CPUs
• UCSD FWGrid: 60 Opteron CPUs at 1.6 GHz
• UCSD VLSICAD group: 8 Xeon CPUs at 2.4 GHz
Wirelength improvement after tuning: 2-3%
Artificial Benchmark Synthesis
• Created a number of artificial benchmarks to test code scalability and performance
• Used statistics of the given benchmarks to create synthesized versions of bigblue3 and bigblue4
• Mimicked the fixed-block layouts of the originals in the synthesized benchmarks
• Proved useful: exposed a clustering problem when many fixed blocks are present
Results
(GP = global placement, Leg = after legalization, DP = after detailed placement)

Circuit     GP HPWL   Leg HPWL   DP HPWL   CPU (h)
adaptec1      80.20      81.80     79.50       3
adaptec2      84.70      92.18     87.31       3
adaptec3     218.00     230.00    218.00      10
adaptec4     182.90     194.75    187.71      13
bigblue1      93.67      97.85     94.64       5
bigblue2     140.68     147.85    143.80      12
bigblue3     357.28     407.09    357.89      22
bigblue4     813.91     868.07    833.21      50