VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 1: Introduction
© KLMH Lienig
Taming the Complexity of Coordinated Place and Route
By Jin Hu, Myung-Chul Kim and Igor Markov
Presented By: Alvin Li
EECS 527. Layout Synthesis and Optimization
Taming the Complexity of Coordinated Place and Route
1. Introduction
2. Background
3. LIRE: Routing Estimation
4. Congestion Relief
5. Coordinated Place and Route
6. Empirical Validation
7. Comparison to Prior Art
8. Conclusions
1. Introduction
Interconnects
- 3 layers, uniform pitch
- More than 3 layers, non-uniform pitch
1. Introduction
• Interconnect complexity has increased since the 1980s
  • From 3 layers to 9-12 layers (non-uniform pitch)
  • Longer routing times
  • Lower-quality ICs
Interconnects (from Fig. 6.17, Chapter 6, VLSI Physical Design of Integrated Circuits)
1. Introduction
• Interconnects dominate:
  • IC performance
  • Power dissipation
  • Size
  • Signal integrity
1. Introduction: Significance of the Paper
• Global placement & global routing: standalone vs. integrated
  - Signal integrity and coupling capacitances in interconnect
  A set of individual optimizations or one simultaneous optimization?
• Streamlined system: Coordinated Place-and-Route (CoPR)
  • Routing estimation during placement
  • Placement technique that addresses three types of routing congestion
  • Interface to congestion elimination
2. Background – Dijkstra’s Algorithm
Basis of maze routing
Finds the shortest path from a source node to a target node
• Requires a graph with non-negative edge weights
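A minimal Python sketch of Dijkstra’s algorithm in a maze-routing setting. It assumes a 4-connected grid in which entering a cell costs a non-negative amount and `None` marks a blockage; the grid model and the name `dijkstra_grid` are illustrative choices, not part of the paper.

```python
import heapq


def dijkstra_grid(cost, source, target):
    """Shortest path on a 4-connected grid (illustrative sketch).

    Entering cell (r, c) costs cost[r][c] >= 0; None marks a blockage.
    Returns (path_cost, path) or (inf, []) if the target is unreachable.
    """
    rows, cols = len(cost), len(cost[0])
    dist = {source: 0}
    parent = {}
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)  # cheapest unprocessed node
        if node in done:
            continue  # stale queue entry
        done.add(node)
        if node == target:
            break
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and cost[nr][nc] is not None:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    parent[(nr, nc)] = node
                    heapq.heappush(heap, (nd, (nr, nc)))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:  # walk back along parent pointers
        path.append(parent[path[-1]])
    return dist[target], path[::-1]
```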
2. Background – A* Search Algorithm
Extension of Dijkstra’s algorithm that is typically faster because it estimates the remaining distance to the target
Node priority = Group 2 label in Dijkstra’s algorithm (cost from the source) + distance estimate, including vias, to the target node
31 nodes vs. 6 nodes visited
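To illustrate why A* visits fewer nodes, the sketch below runs the same grid search with and without a distance estimate. It assumes a 2-D grid with no via model (the slide’s estimate also counts vias), uses Manhattan distance times the cheapest cell cost as an admissible heuristic, and the name `grid_search` is hypothetical.

```python
import heapq


def grid_search(cost, source, target, use_heuristic=True):
    """Dijkstra (use_heuristic=False) or A* (True) on a 4-connected grid.

    Entering cell (r, c) costs cost[r][c] >= 0; None marks a blockage.
    Returns (path_cost, number_of_nodes_expanded).
    """
    rows, cols = len(cost), len(cost[0])
    min_cost = min(v for row in cost for v in row if v is not None)
    tr, tc = target

    def h(node):
        # Manhattan distance times the cheapest cell cost never overestimates,
        # so A* still returns an optimal cost.
        if not use_heuristic:
            return 0
        return (abs(node[0] - tr) + abs(node[1] - tc)) * min_cost

    g = {source: 0}
    heap = [(h(source), source)]  # priority = cost so far + estimate
    expanded = set()
    while heap:
        _, node = heapq.heappop(heap)
        if node in expanded:
            continue
        expanded.add(node)
        if node == target:
            return g[node], len(expanded)
        r, c = node
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nb
            if 0 <= nr < rows and 0 <= nc < cols and cost[nr][nc] is not None:
                ng = g[node] + cost[nr][nc]
                if ng < g.get(nb, float("inf")):
                    g[nb] = ng
                    heapq.heappush(heap, (ng + h(nb), nb))
    return float("inf"), len(expanded)
```

On an empty 7×7 grid from (0, 0) to (0, 6), both searches return cost 6, but plain Dijkstra expands 22 nodes while A* expands only the 7 nodes on the straight path.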
2. Background – Key Characteristics of A* Search Algorithm
Characteristic | Detail | Effect
Captures detours | Includes history cost and congestion | Speed
Priority queue | Selects the best path | Complexity
Pointer-based algorithm | | Cache misses
History cost | Used along with congestion to determine the optimal path | Overshadows functions based on straight-line distance
Admissible | Considers the fewest nodes | Cannot leverage incrementality (no incremental improvement)
2. Coordinated Place-and-Route
Proposed improvement to the A* search algorithm: a streamlined system for Coordinated Place-and-Route (CoPR)
• Cache-friendly routing primitives to estimate routing congestion
• Leverages incrementality in routing and congestion updates
• New categorization of congestion
• New congestion-relief techniques
3. LIRE: Routing Estimation
Lightweight Incremental Routing Estimator
• Produces congestion maps similar to those of a global router
• Routes 75K nets per second (quality can be traded off against runtime)
3.1 Faster Routing
Traditional global routing: maze routing
• Priority queue is complex and slow
• Large history-based costs
• Lacks incrementality
Linear-time cache-friendly routing
• Avoid priority-queue-based approaches
• Avoid pointers to improve the cache hit rate
→ Bellman-Ford algorithm
3.1 Faster Routing – Bellman Ford Algorithm
Bellman-Ford algorithm (1958)
Slower than Dijkstra’s algorithm
Each pass performs E relaxation steps of O(1) each (E = number of edges)
Goes through all nodes
Relaxes all edges instead of greedily selecting the unprocessed node with the minimum label
Repeats the pass (N-1) times (N = number of vertices), for O(N·E) time in total
Visits nodes in arbitrary order
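For reference, a textbook Bellman-Ford sketch in Python, assuming an edge-list graph representation. The early exit when a pass changes nothing is a common optimization, and `bellman_ford` is an illustrative name.

```python
def bellman_ford(num_nodes, edges, source):
    """Classic Bellman-Ford (illustrative sketch).

    edges is a list of (u, v, w) with u, v in range(num_nodes).
    Up to N-1 passes, each relaxing every edge once: O(N*E) total.
    """
    INF = float("inf")
    dist = [INF] * num_nodes
    dist[source] = 0
    for _ in range(num_nodes - 1):
        changed = False
        for u, v, w in edges:  # edges scanned in arbitrary (given) order
            if dist[u] + w < dist[v]:  # O(1) relaxation step
                dist[v] = dist[u] + w
                changed = True
        if not changed:  # labels are stable: stop early
            break
    # One extra pass: any further improvement means a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative-weight cycle reachable from source")
    return dist
```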
3.1 Faster Routing – Bellman Ford Algorithm
Monotonic Routing with One Linear-Time BF Pass
Consider only forward edges (toward T)
Consider only the space bounded by S and T
Visit nodes in order, processing each node exactly once
Runtime complexity is O(N) (N = number of nodes in the space bounded by S and T)
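A minimal sketch of the monotonic single pass, assuming a grid of per-cell costs with `None` for blockages. Because every allowed move points toward T, a row-by-row scan from S toward T is a topological order, so each node in the box is relaxed exactly once. The cell-cost model and the name `monotonic_route_cost` are assumptions for illustration.

```python
def monotonic_route_cost(cost, s, t):
    """One linear-time BF pass over the bounding box of S and T (sketch).

    Entering cell (r, c) costs cost[r][c]; None marks a blockage. Only moves
    that step toward T are allowed, so scanning the box row by row from S
    toward T visits every cell after both of its possible predecessors.
    """
    INF = float("inf")
    (sr, sc), (tr, tc) = s, t
    dr = 1 if tr >= sr else -1  # row step toward T
    dc = 1 if tc >= sc else -1  # column step toward T
    dist = {s: 0}
    for r in range(sr, tr + dr, dr):
        for c in range(sc, tc + dc, dc):
            if (r, c) == s or cost[r][c] is None:
                continue  # source already labeled; blocked cells stay unreachable
            # Predecessors outside the box or blocked are absent from dist -> INF.
            best = min(dist.get((r - dr, c), INF), dist.get((r, c - dc), INF))
            dist[(r, c)] = best + cost[r][c]
    return dist.get(t, INF)
```

A blockage that can only be bypassed by leaving the monotonic region makes the result infinite, which is the gap the non-monotonic passes address.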
3.1 Faster Routing – Bellman Ford Algorithm
Non-monotonic Routing with One Linear-Time BF Pass
Duplex-edge relaxation: relax each edge in both directions
Echo relaxation: propagate a smaller cost through all recently relaxed edges incident to the point
Effective for detouring short nets (the majority of nets are short)
3.1 Faster Routing
Bellman-Ford with Yen’s improvement (1970)
• J. Y. Yen suggested reversing the node ordering between BF passes
• This reduces the number of passes required to find the optimal path
• BFY finds optimal paths faster than A*-search for most nets in the experiment (Theorem 1)
3.1 Faster Routing – Bellman Ford Algorithm
Bellman-Ford with Yen’s improvement
• First forward pass finds the optimal monotonic path
• Backward pass finds a detour
• Second forward pass finds the optimal path
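The alternating passes can be sketched as follows, assuming a 4-connected grid of per-cell costs and a row-major node order. A forward pass relaxes only edges from earlier nodes (above and to the left) in increasing order, and a backward pass only edges from later nodes in decreasing order, which is Yen’s ordering; `max_passes` plays the role of m. The grid model and the name `bfy_route_cost` are illustrative, not the authors’ code.

```python
def bfy_route_cost(cost, s, t, max_passes=None):
    """Bellman-Ford with Yen's alternating order on a grid (sketch).

    Entering cell (r, c) costs cost[r][c] >= 0; None marks a blockage.
    Returns (cost of best path found to t, number of passes run).
    """
    INF = float("inf")
    rows, cols = len(cost), len(cost[0])
    dist = [[INF] * cols for _ in range(rows)]
    dist[s[0]][s[1]] = 0
    order = [(r, c) for r in range(rows) for c in range(cols)]  # row-major
    passes = 0
    while max_passes is None or passes < max_passes:
        forward = passes % 2 == 0
        # Forward: pull from upper/left neighbors (earlier in row-major order).
        # Backward: reversed order, pull from lower/right neighbors.
        step = -1 if forward else 1
        changed = False
        for r, c in (order if forward else reversed(order)):
            if cost[r][c] is None or (r, c) == s:
                continue
            for pr, pc in ((r + step, c), (r, c + step)):
                if 0 <= pr < rows and 0 <= pc < cols:
                    candidate = dist[pr][pc] + cost[r][c]
                    if candidate < dist[r][c]:
                        dist[r][c] = candidate
                        changed = True
        passes += 1
        # Each pass leaves all of its own edges relaxed, so once a pass after
        # the first changes nothing, every edge is relaxed: labels are optimal.
        if not changed and passes >= 2:
            break
    return dist[t[0]][t[1]], passes
```

On a grid where a blockage separates S and T in the same row, the first forward pass leaves T unreachable, the backward pass finds the detour, and a third pass confirms convergence; capping `max_passes=1` returns an infinite cost, illustrating the optimality loss from limiting m.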
3.1 Faster Routing
Bellman-Ford with Yen’s improvement (1970)
• With m passes, runtime complexity is O(mN)
(N = number of nodes in the space bounded by S and T)
• Limit m to reduce runtime
  • Small loss of optimality
  • Focus on incremental calls to BFY
• Incremental routing with BFY
  • Records partial costs along an existing route to reduce runtime (rip-up-and-reroute and repeated invocations of LIRE during placement)
  • Faster!
3.1 Faster Routing – Bellman Ford Algorithm
Incremental Routing with BFY
• Initial route with BFY
• Through relaxation, BFY preserves part of the route and finds a better partial segment
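One plausible way to reuse an existing route, sketched under assumptions: each cell on the old route is seeded with its partial cost along that route (recomputed with current cell costs) before running Yen-style alternating passes. The seeding scheme and the name `incremental_bfy` are hypothetical; the paper’s exact bookkeeping may differ. Because each seed is the cost of a real path, it is a valid upper bound, so for an unblocked old route even a single pass never returns a cost worse than the old route’s.

```python
def incremental_bfy(cost, old_route, max_passes=None):
    """Hypothetical warm-started BFY on a 4-connected grid.

    old_route lists adjacent cells from source to target. Entering cell (r, c)
    costs cost[r][c] >= 0; None marks a blockage. Returns the new path cost.
    """
    INF = float("inf")
    rows, cols = len(cost), len(cost[0])
    dist = [[INF] * cols for _ in range(rows)]
    s, t = old_route[0], old_route[-1]
    dist[s[0]][s[1]] = 0
    partial = 0
    for r, c in old_route[1:]:
        if cost[r][c] is None:  # route now crosses a blockage: stop seeding
            break
        partial += cost[r][c]  # partial cost along the existing route
        dist[r][c] = min(dist[r][c], partial)
    order = [(r, c) for r in range(rows) for c in range(cols)]
    passes = 0
    while max_passes is None or passes < max_passes:
        forward = passes % 2 == 0
        step = -1 if forward else 1  # neighbors preceding a cell in this pass
        changed = False
        for r, c in (order if forward else reversed(order)):
            if cost[r][c] is None or (r, c) == s:
                continue
            for pr, pc in ((r + step, c), (r, c + step)):
                if 0 <= pr < rows and 0 <= pc < cols and dist[pr][pc] + cost[r][c] < dist[r][c]:
                    dist[r][c] = dist[pr][pc] + cost[r][c]
                    changed = True
        passes += 1
        if not changed and passes >= 2:
            break
    return dist[t[0]][t[1]]
```

For example, if a cell on a straight old route becomes congested (cost 5), the passes keep the unaffected prefix and replace the congested segment with a cheaper detour through the neighboring row.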
4. Congestion Relief
Main goal: increase the porosity of placement regions with high routing congestion
How?
i. After global placement, shift cell locations and use congestion-driven detailed placement
ii. During global placement, inflate cells based on early congestion estimates and pin density
4. Congestion Relief
Traditional approaches are insufficient:
After global placement, shift cell locations and use congestion-driven detailed placement
Must preserve the structure of the resulting placement or risk unacceptable deterioration of interconnect length
During global placement, inflate cells based on early congestion estimates and pin density
When inflated cells move outside the congested region, new cells must be inflated, which may consume all whitespace without solving the root cause
4. Congestion Relief – Further Analysis
3 types of routing congestion:
i. Cell-based congestion caused by cell-to-cell proximity
ii. Local layout-based congestion caused by static design properties, such as blockages and reduced routing capacities
iii. Remotely-induced layout-based congestion attributed to non-local factors, such as long nets
4. Congestion Relief – Further Analysis
1. Cell-based congestion caused by cell-to-cell proximity
• Mitigated by cell inflation (only in the top 5% most congested GCells, to avoid exhausting whitespace)
2. Local layout-based congestion caused by static design properties, such as blockages and reduced routing capacities
• Locally inject whitespace (move cells out of the congested region)
3. Remotely-induced layout-based congestion attributed to non-local factors, such as long nets
• Enforce a non-uniform target density by:
i) Creating a packing peanut (fixed cell) at the center of every GCell
ii) Sizing it based on congestion
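The remedies for the first and third congestion types can be illustrated with a small sketch. It assumes congestion is given per GCell as a demand/capacity ratio; the 1.5× inflation factor, the linear peanut-sizing rule, and all function names are hypothetical placeholders rather than the paper’s formulas. Only the top-5% selection comes from the slide.

```python
import math


def top_congested_gcells(congestion, fraction=0.05):
    """Return the GCells whose congestion ranks in the top `fraction`."""
    k = max(1, math.ceil(len(congestion) * fraction))
    ranked = sorted(congestion, key=congestion.get, reverse=True)
    return set(ranked[:k])


def inflate_cells(cell_area, cell_gcell, hot_gcells, factor=1.5):
    """Inflate only cells located in hot GCells (factor is a placeholder)."""
    return {cell: area * factor if cell_gcell[cell] in hot_gcells else area
            for cell, area in cell_area.items()}


def peanut_areas(congestion, gcell_area, threshold=1.0, max_fraction=0.5):
    """Area of a fixed 'packing peanut' at each GCell center.

    Hypothetical rule: grows linearly with congestion above `threshold`,
    capped at max_fraction of the GCell area; zero in uncongested GCells.
    """
    return {g: min(max_fraction, max(0.0, c - threshold)) * gcell_area
            for g, c in congestion.items()}
```

Restricting inflation to the hottest GCells mirrors the slide’s concern about exhausting whitespace, while the peanuts lower the effective target density only where congestion is observed.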
5. Coordinated Place and Route
Integration of Routing and Placement
Incremental placement updates
• After its first invocation, LIRE maintains the overall congestion map and keeps track of the GCells traversed by each point-to-point connection
• In the next invocation, a connection whose endpoints remain the same is left unchanged
• This has a pronounced effect in later iterations and during detailed placement, when cell locations have stabilized
5. Coordinated Place and Route
Integration of Routing and Placement
Incremental-routing updates
• When invoked for the first time, LIRE generates routes from scratch
• After that, it reuses existing routes where possible
• Nets whose terminals are relocated to different GCells are rerouted using the original net ordering
• The remaining nets are checked for congested routes, which are mitigated by single incremental BFY passes
• Replicates the accuracy of a maze router with better runtime
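The update policy on these two slides can be sketched as a two-phase loop. All names, the per-net endpoint cache, and the callback signatures (`route`, `congested`, `improve`) are hypothetical; in LIRE the routing callback would presumably be BFY and `improve` a single incremental BFY pass.

```python
from typing import Callable, Dict, List, Tuple

GCell = Tuple[int, int]
Route = List[GCell]
Ends = Tuple[GCell, GCell]


def incremental_update(net_order: List[str],
                       ends: Dict[str, Ends],
                       cache: Dict[str, Tuple[Ends, Route]],
                       route: Callable[[GCell, GCell], Route],
                       congested: Callable[[Route], bool],
                       improve: Callable[[Route], Route]) -> Dict[str, str]:
    """Hypothetical sketch of incremental routing updates.

    cache maps each net to its endpoint GCells and route from the previous
    call and is updated in place. Returns what happened to each net.
    """
    action: Dict[str, str] = {}
    # Phase 1: route new nets, and reroute nets whose terminals moved to
    # different GCells, in the original net ordering.
    for net in net_order:
        if net not in cache or cache[net][0] != ends[net]:
            cache[net] = (ends[net], route(*ends[net]))
            action[net] = "rerouted"
    # Phase 2: reuse the remaining routes; apply one incremental pass
    # only where the existing route is congested.
    for net in net_order:
        if net in action:
            continue
        net_ends, old_route = cache[net]
        if congested(old_route):
            cache[net] = (net_ends, improve(old_route))
            action[net] = "improved"
        else:
            action[net] = "reused"
    return action
```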
6. Empirical Validation
Verifying Result
CoPR is implemented in C++ using the OpenMP library and compiled with g++ 4.7.0
Global placer derived from SimPL
Used by three of the top four teams at the ICCAD 2012 Contest
Results reported on the ICCAD 2012 benchmarks released by IBM researchers
6. Empirical Validation
With the same runtime, CoPR outperforms the finalists of the ICCAD 2012 Contest by 7% and 2% in quality metrics. With the same quality, it is 5.7× faster than another contestant.
Under the scoring formulas used at the ICCAD 2012 Contest, CoPR outperforms the winner.
7. Comparisons to Prior Art
Fast Routing: “A Fast Maze-free Routing Congestion Estimator With Hybrid Unilateral Monotonic Routing” by W.-H. Liu, Y.-L. Li and C.-K. Koh
Replaces A*-search with fast linear-time routing algorithms that exploit a different notion of monotonic routes
Uses multiple passes to find non-monotonic routes and does not claim optimality
Does not consider CPU cache effects or the connection with BFY
Not used to drive a competitive global placer, in contrast to the successful coordinated place-and-route results of CoPR
CoPR’s authors completed their work before this paper was published or made available
7. Comparisons to Prior Art
Fast Routing: “BonnTools: Mathematical Innovation for Layout and Timing Closure of Systems on a Chip” by B. Korte, D. Rautenbach and J. Vygen
Speeds up Dijkstra’s algorithm with sophisticated data structures and algorithms
Uses more memory for advanced data structures and requires significant up-front setup
The single-threaded version of LIRE takes <15% of the runtime of the entire place-and-route flow
CoPR’s authors avoided sophisticated routing algorithms and data structures
7. Comparisons to Prior Art
Incremental Routing Techniques
All modern routability-driven techniques use built-in congestion estimation that constructs new estimates from scratch at every invocation
This is unnecessarily time-consuming, especially when the placement has not changed significantly
7. Comparisons to Prior Art
Incremental Routing Techniques
“GDRouter: Interleaved Global Routing and Detailed Routing for Ultimate Routability” by Y. Zhang and C. Chu
Rips up and reroutes some congested nets
Assumes static routing and placement instances
CoPR:
• Accounts for dynamic placement and routing instances
• Takes advantage of previous partial routes
• Updates routes on an as-needed basis
8. Conclusions
• Interconnects play a dominant role in IC design:
Area
Volume
Delay
Power
Signal integrity
• Interconnects threaten to render Moore’s Law irrelevant
• Solution? Reduce interconnect demand
• IBM researchers:
• Design flows with separate placement and routing steps have become ineffective for modern ICs
• Combining the two brings tangible and significant benefits in IC cost
8. Conclusions
Why isn’t there more research on integrated optimizations?
•Sophisticated data structures
•Elaborate multistep optimizations used by state-of-the-art algorithms
•Unmaintainable source-code bases that are unnecessarily entangled
•Large sets of tuning parameters
•Significant runtime
8. Conclusions
Coordinated Place-and-Route (CoPR)
• Dramatic acceleration of constructive routing estimation through linear-time cache-friendly algorithms that do not require sophisticated data structures
• Significant reductions in the amount of work through pervasive incrementality at the interface between placement and routing
• Identification of two new types of routing congestion, as well as mechanisms by which a global placer can diagnose them and respond effectively
• Strong empirical results on the most recent benchmarks from IBM Research
8. Conclusions
Impact of this paper:
• More compact and less costly IC layouts
• Reduced back-end turnaround time, so IC designers can evaluate a greater number of micro-architectural configurations
• An algorithmic framework that:
  • Integrates routing and placement
  • Enhances performance
This paper will be presented in the paper sessions of DAC 2013 (Design Automation Conference) on June 6 in Austin, Texas
“One Small Step for Placement, One Big Leap for Routability!”
ICCAD
• Annual CAD Contest in Taiwan since 2000
• Aims to boost EDA research momentum in Taiwan
• The CAD Contest at ICCAD started in 2012, sponsored by IEEE CEDA and the Taiwan MoE
• Designed for university students
ICCAD
• The quality metrics are determined by the problem specifications
•Correctness
•Runtime
•Memory usage
• Evaluated by the announced benchmarks and hidden benchmarks
• Language: Standard C/C++ Library, MATLAB prohibited
• System Platform (Machine type & Linux/GNU libc/Gcc version) is announced in each problem
ICCAD 2013
• 2012 Contest:
• >50 teams from 7 regions
• Problems:
1. Finding the minimal logic difference for functional ECO (contributed by Cadence Design Systems Inc., Taiwan)
2. Design hierarchy aware routability-driven placement (contributed by IBM Corp., USA)
3. Fuzzy pattern matching for physical verification (contributed by Mentor Graphics Corp., USA)
First Place of Problem 2: Myung-Chul Kim & Jin Hu, University of Michigan (Advisor: Prof. Igor L. Markov)
ICCAD 2013
• 2013 Contest:
• Problems:
1. Technology Mapping for Macro Blocks (contributed by Taiwan Cadence Design Systems, Inc.)
2. Placement Finishing – Detailed Placement and Legalization (contributed by IBM Research, Austin, TX)
3. Mask Optimization (contributed by IBM Research, East Fishkill, NY)
Registration Deadline: May 15, 2013
• http://cad_contest.cs.nctu.edu.tw/CAD-contest-at-ICCAD2013/default.html
END
Thank you very much!
Proof of Theorem 1