Taming the Complexity of Coordinated Place and Route

VLSI Physical Design: From Graph Partitioning to Timing Closure Chapter 1: Introduction

© KLMH Lienig

By Jin Hu, Myung-Chul Kim and Igor Markov

Presented By: Alvin Li

EECS 527: Layout Synthesis and Optimization





1. Introduction

2. Background

3. LIRE: Routing Estimation

4. Congestion Relief

5. Coordinated Place and Route

6. Empirical Validation

7. Comparison to Prior Art

8. Conclusions


1. Introduction

Interconnects
- 3 layers: uniform pitch
- More than 3 layers: non-uniform pitch



1. Introduction

• Interconnect complexity has increased since the 1980s
  - From 3 layers to 9-12 layers (non-uniform pitch)

• Longer routing times

• Lower-quality ICs


Interconnects (from Fig. 6.17, Chapter 6, VLSI Physical Design of Integrated Circuits)



1. Introduction

• Interconnects dominate:
  - IC performance
  - Power dissipation
  - Size
  - Signal integrity


1. Introduction: Significance of the Paper

• Global placement and global routing: standalone vs. integrated
  - Signal integrity and coupling capacitance in interconnect

A set of individual optimizations, or one simultaneous optimization?

• Streamlined system: Coordinated Place-and-Route (CoPR)
  - Routing estimation during placement
  - A placement technique that addresses three types of routing congestion
  - An interface to congestion elimination


2. Background – Dijkstra’s Algorithm

The basis of maze routing

Finds the shortest path from a source node to a target node

• Requires a graph with non-negative edge weights
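A minimal Python sketch of Dijkstra’s algorithm on a routing grid, for illustration only. It assumes 4-connected GCells and a hypothetical `cost` matrix in which `cost[r][c]` is the non-negative cost of entering cell `(r, c)`; the priority queue is Python’s `heapq`, and the name `dijkstra_grid` is made up for this example.

```python
import heapq


def dijkstra_grid(cost, source, target):
    """Minimum path cost from source to target on a 4-connected grid.

    cost[r][c] is the non-negative cost of entering cell (r, c); the source
    cell itself costs nothing. Returns float("inf") if target is unreachable.
    """
    rows, cols = len(cost), len(cost[0])
    dist = {source: 0}
    heap = [(0, source)]          # (label, node): smallest label popped first
    finalized = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in finalized:
            continue              # stale entry left behind by a later improvement
        finalized.add(node)
        if node == target:
            return d
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return float("inf")


grid = [
    [1, 1, 1],
    [1, 9, 1],   # expensive (e.g., congested) center cell
    [1, 1, 1],
]
print(dijkstra_grid(grid, (0, 0), (2, 2)))  # 4: the path goes around the center
```

Every node popped from the heap for the first time has its final label, which is exactly why non-negative edge weights are required.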


2. Background – A* Search Algorithm

• Extension of Dijkstra’s algorithm, but faster, because it estimates the remaining distance to the target

• Node priority = (Group 2 label in Dijkstra’s algorithm) + (distance estimate, including vias, to the target node)

• Example: 31 nodes visited by Dijkstra’s algorithm vs. 6 nodes visited by A*
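To make the node-priority formula concrete, here is a hedged Python sketch (not the paper’s code) of one best-first search that behaves as Dijkstra’s algorithm when the estimate is zero and as A* when it adds the Manhattan distance to the target. It ignores vias and assumes every cell cost is at least 1, which keeps the Manhattan estimate admissible and consistent; the name `grid_search` is hypothetical.

```python
import heapq


def grid_search(cost, source, target, use_heuristic):
    """Best-first search on a 4-connected grid.

    use_heuristic=False: Dijkstra (priority = cost so far).
    use_heuristic=True:  A* (priority = cost so far + Manhattan distance).
    Assumes every cell cost is >= 1 so the Manhattan estimate never overshoots.
    Returns (path cost, number of nodes expanded).
    """
    rows, cols = len(cost), len(cost[0])

    def estimate(node):
        if not use_heuristic:
            return 0
        return abs(node[0] - target[0]) + abs(node[1] - target[1])

    dist = {source: 0}
    heap = [(estimate(source), 0, source)]   # (priority, cost so far, node)
    expanded = set()
    while heap:
        _, g, node = heapq.heappop(heap)
        if node in expanded:
            continue
        expanded.add(node)
        if node == target:
            return g, len(expanded)
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                ng = g + cost[nxt[0]][nxt[1]]
                if ng < dist.get(nxt, float("inf")):
                    dist[nxt] = ng
                    heapq.heappush(heap, (ng + estimate(nxt), ng, nxt))
    return float("inf"), len(expanded)


uniform = [[1] * 7 for _ in range(7)]
print(grid_search(uniform, (3, 0), (3, 6), use_heuristic=False))
print(grid_search(uniform, (3, 0), (3, 6), use_heuristic=True))  # (6, 7)
```

On this uniform grid A* expands only the 7 cells of the straight row, while Dijkstra’s algorithm expands many more cells for the same path cost; the 31-vs-6 counts on the slide come from a different example.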



2. Background – Key Characteristics of A* Search Algorithm

Characteristic: detail; effect

• Captures detours: includes history cost and congestion

• Speed:
  - Priority queue: selects the best path; effect: complexity
  - Pointer-based algorithm; effect: cache misses

• History cost: used with congestion to determine the optimal path; effect: overshadows cost functions based on straight-line distance

• Admissible: considers the fewest nodes; effect: cannot leverage incrementality (no incremental improvement)


2. Background – Coordinated Place-and-Route

Proposed improvement over A* search: a streamlined system, Coordinated Place-and-Route (CoPR)

• Cache-friendly routing primitives to estimate routing congestion

• Leverages incrementality in routing and congestion updates

• New categorization of congestion

• New congestion-relief techniques



3. LIRE: Lightweight Incremental Routing Estimator

• Produces congestion maps comparable to those of a global router

• Routes about 75K nets per second (quality can be traded off against runtime)


3.1 Faster Routing

Traditional global routing: maze routing

• The priority queue is complex and slow

• Large history-based costs

• Lacks incrementality

Linear-time, cache-friendly routing

• Avoid priority-queue-based approaches

• Avoid pointers to improve the cache hit rate

Approach: the Bellman-Ford algorithm


3.1 Faster Routing – Bellman-Ford Algorithm

Bellman-Ford algorithm (1958)

Slower than Dijkstra’s algorithm in the worst case: O(N·E) for N vertices and E edges

Each pass performs E relaxation steps, each taking O(1) time

Each pass goes through all nodes

Relaxes all edges, instead of greedily selecting the unprocessed node with the minimum label to relax next (as Dijkstra’s algorithm does)

Repeats the pass over all edges (N-1) times (N = number of vertices)

Visits nodes in an arbitrary order
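A short Python sketch of textbook Bellman-Ford on an edge list, for illustration. Each pass relaxes every edge once, in whatever order the list happens to have; the early exit when a pass changes nothing is a common optimization, not something stated on the slide, and negative-cycle detection is omitted because routing costs are non-negative. The function returns the pass count so the effect of edge order is visible.

```python
def bellman_ford(num_nodes, edges, source):
    """Shortest distances from source; edges is a list of (u, v, weight).

    Runs at most num_nodes - 1 passes, each relaxing all E edges in list
    order (E constant-time steps). Returns (distances, passes executed).
    """
    INF = float("inf")
    dist = [INF] * num_nodes
    dist[source] = 0
    passes = 0
    for _ in range(num_nodes - 1):
        passes += 1
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:      # labels are final; remaining passes are wasted work
            break
    return dist, passes


edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
print(bellman_ford(4, edges, 0))                  # ([0, 3, 1, 4], 2)
print(bellman_ford(4, list(reversed(edges)), 0))  # ([0, 3, 1, 4], 3)
```

Both orders reach the same distances, but the unfavorable order needs all N-1 = 3 passes; controlling the visit order is the lever that Yen’s improvement pulls.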



Monotonic Routing with One Linear-Time BF Pass

Consider only forward edges (each step moves toward T)

Consider only the region bounded by S and T

Visit nodes in order, processing each node exactly once

Runtime complexity is O(N) (N = number of nodes in the region bounded by S and T)
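A minimal Python sketch of a single linear-time pass for monotonic routes, assuming a hypothetical per-cell `cost` grid (`cost[r][c]` = cost of entering cell `(r, c)`). Cells in the bounding box of S and T are visited in order of increasing distance from S along both axes, so each forward predecessor is already final when it is read and every cell is touched exactly once; `monotonic_route_cost` is an illustrative name, not the paper’s.

```python
def monotonic_route_cost(cost, s, t):
    """Cheapest monotonic route cost from s to t in one pass, O(N) cells."""
    (sr, sc), (tr, tc) = s, t
    dr = 1 if tr >= sr else -1            # row direction toward t
    dc = 1 if tc >= sc else -1            # column direction toward t
    h, w = abs(tr - sr) + 1, abs(tc - sc) + 1
    INF = float("inf")
    # d[i][j] = best cost to reach cell (sr + i*dr, sc + j*dc)
    d = [[INF] * w for _ in range(h)]
    d[0][0] = 0
    for i in range(h):
        for j in range(w):
            if i == 0 and j == 0:
                continue
            best = INF
            if i > 0:
                best = min(best, d[i - 1][j])   # forward edge along rows
            if j > 0:
                best = min(best, d[i][j - 1])   # forward edge along columns
            d[i][j] = best + cost[sr + i * dr][sc + j * dc]
    return d[h - 1][w - 1]


grid = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
wall = [[1, 50, 1], [1, 50, 1], [1, 1, 1]]
print(monotonic_route_cost(grid, (0, 0), (2, 2)))  # 4
print(monotonic_route_cost(grid, (2, 2), (0, 0)))  # 4
print(monotonic_route_cost(wall, (0, 0), (0, 2)))  # 51
```

The last call shows the limitation: the pass can neither leave the bounding box nor step backward, so it returns 51 on a grid where a detour below the expensive column costs only 6.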



Non-monotonic Routing with One Linear-Time BF Pass

Duplex-edge relaxation: relax each edge in both directions

Echo-relaxation: when a node’s cost decreases, propagate the smaller cost through all recently relaxed edges incident to that node

Effective at detouring short nets (the majority of nets are short)


3.1 Faster Routing

Bellman-Ford with Yen’s improvement (1970)

• J. Y. Yen suggested reversing the node ordering between BF passes
  - Reduces the number of passes required to find the optimal path
  - BFY finds optimal paths faster than A* search for most nets in the experiments (Theorem 1)



Bellman-Ford with Yen’s improvement: the first forward pass finds the optimal monotonic path



Bellman-Ford with Yen’s improvement: the backward pass finds a detour



Bellman-Ford with Yen’s improvement: the second forward pass finds the optimal path
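The pass sequence can be sketched in Python as follows, under stated assumptions: GCells are numbered in row-major order, so a forward pass relaxes edges arriving from the cell above or to the left, and a backward pass visits cells in reverse order and relaxes edges arriving from below or the right. The sketch scans the whole grid rather than a bounding box, `cost[r][c]` is a hypothetical per-cell cost, and `max_passes` plays the role of m (stopping early returns an upper bound, i.e., a possibly suboptimal cost).

```python
def bfy_route_cost(cost, s, t, max_passes=None):
    """Bellman-Ford with Yen's improvement on a 4-connected grid.

    Alternates forward (row-major) and backward (reverse row-major) passes.
    Each pass is O(N) for N cells, so m passes cost O(mN).
    Returns (cost of best path found to t, number of passes run).
    """
    rows, cols = len(cost), len(cost[0])
    INF = float("inf")
    d = [[INF] * cols for _ in range(rows)]
    d[s[0]][s[1]] = 0
    passes, forward = 0, True
    while max_passes is None or passes < max_passes:
        if forward:
            order = [(r, c) for r in range(rows) for c in range(cols)]
            preds = ((-1, 0), (0, -1))   # earlier cells in row-major order
        else:
            order = [(r, c) for r in reversed(range(rows)) for c in reversed(range(cols))]
            preds = ((1, 0), (0, 1))     # later cells in row-major order
        changed = False
        for r, c in order:
            for dr, dc in preds:
                ur, uc = r + dr, c + dc
                if 0 <= ur < rows and 0 <= uc < cols:
                    cand = d[ur][uc] + cost[r][c]
                    if cand < d[r][c]:
                        d[r][c] = cand
                        changed = True
        passes += 1
        forward = not forward
        # After any pass, every edge of that pass's direction is satisfied.
        # If the following opposite-direction pass changes nothing, all edges
        # are satisfied and the labels are final.
        if not changed and passes >= 2:
            break
    return d[t[0]][t[1]], passes


wall = [
    [1, 50, 1],
    [1, 50, 1],   # expensive column blocks the direct route
    [1, 1, 1],
]
print(bfy_route_cost(wall, (0, 0), (0, 2)))                # (6, 3)
print(bfy_route_cost(wall, (0, 0), (0, 2), max_passes=1))  # (51, 1)
```

In this example pass 1 finds only the monotonic cost 51, the backward pass 2 discovers the detour under the expensive column, and pass 3 changes nothing, which confirms the result; capping m at 1 returns the monotonic cost instead.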


3.1 Faster Routing

Bellman-Ford with Yen’s improvement (1970)

• With m passes, runtime complexity is O(mN) (N = number of nodes in the region bounded by S and T)

• Limit m to reduce runtime
  - Small loss of optimality
  - Focus on incremental calls to BFY

• Incremental routing with BFY
  - Records partial costs along an existing route to reduce runtime (used in rip-up-and-reroute and in repeated invocations of LIRE during placement)
  - Faster



Incremental routing with BFY: initial route found by BFY



Incremental routing with BFY: through relaxation, BFY preserves part of the route and finds a better partial segment
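One plausible reading of these slides, sketched in Python under explicit assumptions (the paper’s actual data structures are not shown here): the old route’s prefix costs, recomputed with the current congestion costs, seed the labels before BFY passes run. Every seeded label is the cost of a real path, so relaxation can only replace it with something cheaper, and even a single pass returns a route no worse than the old one. The helpers `run_passes`, `fresh_labels`, and `warm_labels` are hypothetical.

```python
INF = float("inf")


def run_passes(cost, d, passes):
    """Alternate forward/backward Yen passes in place on label grid d."""
    rows, cols = len(cost), len(cost[0])
    for p in range(passes):
        if p % 2 == 0:  # forward: row-major order, edges from above/left
            order = [(r, c) for r in range(rows) for c in range(cols)]
            preds = ((-1, 0), (0, -1))
        else:           # backward: reverse order, edges from below/right
            order = [(r, c) for r in reversed(range(rows)) for c in reversed(range(cols))]
            preds = ((1, 0), (0, 1))
        for r, c in order:
            for dr, dc in preds:
                ur, uc = r + dr, c + dc
                if 0 <= ur < rows and 0 <= uc < cols:
                    d[r][c] = min(d[r][c], d[ur][uc] + cost[r][c])
    return d


def fresh_labels(cost, source):
    """Labels for routing from scratch: 0 at the source, infinity elsewhere."""
    d = [[INF] * len(cost[0]) for _ in cost]
    d[source[0]][source[1]] = 0
    return d


def warm_labels(cost, old_route):
    """Seed labels with prefix costs along an existing route (current costs)."""
    d = fresh_labels(cost, old_route[0])
    total = 0
    for r, c in old_route[1:]:
        total += cost[r][c]
        d[r][c] = min(d[r][c], total)   # each prefix cost is achievable
    return d


# Congestion has raised the cost of GCell (0, 1) on the old route.
cost = [
    [1, 10, 1],
    [1, 1, 1],
]
old_route = [(1, 0), (0, 0), (0, 1), (0, 2)]   # target is (0, 2)
print(run_passes(cost, warm_labels(cost, old_route), 1)[0][2])   # 12: old route kept
print(run_passes(cost, fresh_labels(cost, (1, 0)), 1)[0][2])     # inf: no route yet
print(run_passes(cost, warm_labels(cost, old_route), 2)[0][2])   # 3: better segment found
```

With one pass, the warm start keeps the old route’s cost (12) where a fresh pass has not reached the target yet; the backward pass then replaces the expensive segment through (0, 1) with a detour along the lower row, for a cost of 3.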



4. Congestion Relief

Main goal: increase the porosity of placement regions with high routing congestion

How?

i. After global placement, shift cell locations and use congestion-driven detailed placement

ii. During global placement, inflate cells based on early congestion estimates and pin density



Traditional approaches are insufficient:

• After global placement, shift cell locations and use congestion-driven detailed placement
  - Must preserve the structure of the resulting placement, or risk an unacceptable increase in interconnect length

• During global placement, inflate cells based on early congestion estimates and pin density
  - When inflated cells move out of the congested region, the cells that take their place must be inflated too, which may consume all whitespace without solving the root cause


4. Congestion Relief – Further Analysis

Three types of routing congestion:

i. Cell-based congestion, caused by cell-to-cell proximity

ii. Local layout-based congestion, caused by static design properties such as blockages and reduced routing capacities

iii. Remotely induced layout-based congestion, attributed to non-local factors such as long nets


4. Congestion Relief – Further Analysis

1. Cell-based congestion caused by cell-to-cell proximity
   • Mitigated by cell inflation (only in the top 5% most congested GCells, to avoid exhausting whitespace)

2. Local layout-based congestion caused by static design properties, such as blockages and reduced routing capacities
   • Locally inject whitespace (move cells out of the congested region)

3. Remotely induced layout-based congestion attributed to non-local factors such as long nets
   • Enforce non-uniform target density by:
     i) Creating a packing peanut (fixed cell) at the center of every GCell
     ii) Modifying its size based on congestion
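The 5% threshold for cell inflation can be illustrated with a short Python sketch. It assumes congestion per GCell is measured as routing demand divided by routing capacity (a common convention, assumed here rather than taken from the slides) and that every capacity is positive; `top_congested_gcells` is a hypothetical helper, and the amount by which each cell is inflated is left out.

```python
import math


def top_congested_gcells(demand, capacity, fraction=0.05):
    """Return the GCells (row, col) in the top `fraction` by demand/capacity.

    At least one GCell is returned for a non-empty grid. Ties are broken by
    position, so the result is deterministic.
    """
    scored = []
    for r, row in enumerate(demand):
        for c, dem in enumerate(row):
            scored.append((dem / capacity[r][c], r, c))
    scored.sort(reverse=True)                      # most congested first
    k = max(1, math.ceil(fraction * len(scored)))  # e.g. 5% of 20 GCells -> 1
    return {(r, c) for _, r, c in scored[:k]}


demand = [[4] * 5 for _ in range(4)]   # 20 GCells
demand[2][3] = 9                       # hot spot
demand[0][0] = 8                       # second-hottest GCell
capacity = [[10] * 5 for _ in range(4)]
print(top_congested_gcells(demand, capacity))                 # {(2, 3)}
print(sorted(top_congested_gcells(demand, capacity, 0.10)))   # [(0, 0), (2, 3)]
```

Only cells located in the returned GCells would then be inflated, which bounds how much whitespace inflation can consume.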



5. Coordinated Place and Route – Integration of Routing and Placement

Incremental placement updates

• After its first invocation, LIRE maintains the overall congestion map and keeps track of the GCells traversed by each point-to-point connection

• In the next invocation, a connection whose endpoints remain the same is left unchanged

• This has a pronounced effect in later iterations and during detailed placement, when cell locations have stabilized



Integration of Routing and Placement

Incremental routing updates

• When invoked for the first time, LIRE generates routes from scratch

• After that, it reuses existing routes where possible

• Nets whose terminals moved to different GCells are rerouted using the original net ordering

• The remaining nets are checked for congested routes, and any congestion is mitigated by single incremental BFY passes

• Replicates the accuracy of a maze router, with better runtime
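The update policy in these bullets can be sketched as control flow in Python. Everything here is an assumption for illustration: nets are two-pin, routes are lists of GCells, and the router, congestion test, and single improvement step are passed in as callables (`reroute`, `is_congested`, `improve_once`), with a toy L-shaped router and a vertical-first alternative standing in for BFY. None of these names come from the paper.

```python
from typing import Callable, Dict, List, Tuple

GCell = Tuple[int, int]
Route = List[GCell]


def l_route(src: GCell, dst: GCell, horizontal_first: bool = True) -> Route:
    """Toy router: an L-shaped route through GCells (stand-in for BFY)."""
    (r, c), (r1, c1) = src, dst
    path = [src]
    for leg in (("h", "v") if horizontal_first else ("v", "h")):
        if leg == "h":
            while c != c1:
                c += 1 if c1 > c else -1
                path.append((r, c))
        else:
            while r != r1:
                r += 1 if r1 > r else -1
                path.append((r, c))
    return path


def update_routes(
    net_order: List[str],
    pins: Dict[str, Tuple[GCell, GCell]],
    routes: Dict[str, Route],
    reroute: Callable[[GCell, GCell], Route],
    is_congested: Callable[[Route], bool],
    improve_once: Callable[[Route], Route],
) -> Tuple[Dict[str, Route], Dict[str, str]]:
    new_routes: Dict[str, Route] = {}
    action: Dict[str, str] = {}
    # Phase 1: nets with no route yet, or whose terminals moved to different
    # GCells, are routed again in the original net order.
    for net in net_order:
        src, dst = pins[net]
        old = routes.get(net)
        if old is None or (old[0], old[-1]) != (src, dst):
            new_routes[net] = reroute(src, dst)
            action[net] = "rerouted"
    # Phase 2: remaining nets keep their routes unless congested, in which
    # case they get one improvement step (e.g. a single incremental BFY pass).
    for net in net_order:
        if net in action:
            continue
        old = routes[net]
        if is_congested(old):
            new_routes[net] = improve_once(old)
            action[net] = "improved"
        else:
            new_routes[net] = old
            action[net] = "kept"
    return new_routes, action


order = ["a", "b", "c"]
pins = {"a": ((0, 0), (0, 3)), "b": ((1, 0), (3, 2)), "c": ((2, 2), (2, 4))}
hot = {(1, 1)}  # hypothetical congested GCell


def congested(route: Route) -> bool:
    return any(g in hot for g in route)


def vertical_first(route: Route) -> Route:
    return l_route(route[0], route[-1], horizontal_first=False)


routes, first = update_routes(order, pins, {}, l_route, congested, vertical_first)
pins["a"] = ((0, 0), (1, 3))  # net a's terminal moves to a different GCell
routes, second = update_routes(order, pins, routes, l_route, congested, vertical_first)
print(first)   # {'a': 'rerouted', 'b': 'rerouted', 'c': 'rerouted'}
print(second)  # {'a': 'rerouted', 'b': 'improved', 'c': 'kept'}
```

Splitting the work into two phases keeps the moved nets in their original order before any congestion check runs, and untouched, uncongested nets cost nothing beyond the lookup.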



6. Empirical Validation – Verifying Results

CoPR is implemented in C++ using the OpenMP library and compiled with g++ 4.7.0

The global placer is derived from SimPL

SimPL-based placers were used by three of the top four teams at the ICCAD 2012 Contest

Results are reported on the ICCAD 2012 benchmarks released by IBM researchers



At the same runtime, CoPR outperforms the finalists of the ICCAD 2012 Contest by 7% and 2% in quality metrics. It is 5.7× faster than another contestant with the same quality.

Under the scoring formulas used at the ICCAD 2012 Contest, CoPR outperforms the contest winner.


7. Comparison to Prior Art

Fast routing: “A Fast Maze-free Routing Congestion Estimator With Hybrid Unilateral Monotonic Routing” by W.-H. Liu, Y.-L. Li and C.-K. Koh

Replaces A* search with fast linear-time routing algorithms that exploit a different notion of monotonic routes

Uses multiple passes to find non-monotonic routes and does not claim optimality

Does not consider CPU cache effects or the connection to BFY

Not used to drive a competitive global placer, in contrast to CoPR’s successful results for coordinated place-and-route

CoPR’s authors completed their work before this paper was published or made available



Fast routing: “BonnTools: Mathematical Innovation for Layout and Timing Closure of Systems on a Chip” by B. Korte, D. Rautenbach and J. Vygen

Speeds up Dijkstra’s algorithm with sophisticated data structures and algorithms

Uses more memory for advanced data structures and requires significant up-front setup

By comparison, the single-threaded version of LIRE takes less than 15% of the runtime of the entire place-and-route flow

CoPR’s authors avoided sophisticated routing algorithms and data structures



Incremental Routing Techniques

All modern routability-driven techniques use built-in congestion estimation that constructs new estimates from scratch at every invocation

This is unnecessarily time-consuming, especially when the placement has not changed significantly



Incremental Routing Techniques

“GDRouter: Interleaved Global Routing and Detailed Routing for Ultimate Routability” by Y. Zhang and C. Chu

Rips up and reroutes some congested nets

Assumes a static routing and placement instance

CoPR:
• Accounts for dynamic placement and routing instances
• Takes advantage of previous partial routes
• Updates routes on an as-needed basis


8. Conclusions

• Interconnects play a dominant role in IC design:

Area

Volume

Delay

Power

Signal integrity

• Threatening to render Moore’s Law irrelevant

• Solution? Reduce interconnect demand

• IBM researchers:

• Design flows with separate placement and routing steps have become ineffective for modern ICs

• Combining the two brings tangible and significant benefits in IC cost



8. Conclusions

Why isn’t there more research on integrated optimizations?

•Sophisticated data structures

•Elaborate multistep optimizations used by state-of-the-art algorithms

•Unmaintainable source-code bases that are unnecessarily entangled

•Large sets of tuning parameters

•Significant runtime


8. Conclusions

Coordinated Place-and-Route (CoPR)

• Dramatic acceleration of constructive routing estimation through linear-time, cache-friendly algorithms that do not require sophisticated data structures

• Significant reductions in the amount of work through pervasive incrementality at the interface between placement and routing

• Identification of two new types of routing congestion, as well as mechanisms by which a global placer can diagnose them and respond effectively

• Strong empirical results on the most recent benchmarks from IBM Research


8. Conclusions

Impact of this paper:

•More compact and less costly IC layouts

• Reduced back-end turnaround time, so IC designers can evaluate a greater number of microarchitectural configurations

• An algorithmic framework that:
  - Integrates routing and placement
  - Enhances performance

This paper will be presented in the paper sessions of DAC 2013 (Design Automation Conference) on June 6 in Austin, Texas

“One Small Step for Placement, One Big Leap for Routability!”



ICCAD

• Annual CAD Contest in Taiwan since 2000

• Aims to boost EDA research momentum in Taiwan

• The CAD Contest at ICCAD started in 2012, sponsored by IEEE CEDA and the Taiwan MoE

• Designed for university students


ICCAD

• The quality metrics are determined by the problem specifications

•Correctness

•Runtime

•Memory usage

• Evaluated on announced benchmarks and hidden benchmarks

• Language: C/C++ with the standard libraries; MATLAB is prohibited

• System Platform (Machine type & Linux/GNU libc/Gcc version) is announced in each problem


ICCAD 2013

• 2012 Contest:
  - More than 50 teams from 7 regions
  - Problems:
    1. Finding the minimal logic difference for functional ECO (contributed by Cadence Design Systems Inc., Taiwan)
    2. Design-hierarchy-aware routability-driven placement (contributed by IBM Corp., USA)
    3. Fuzzy pattern matching for physical verification (contributed by Mentor Graphics Corp., USA)

First place in Problem 2: Myung-Chul Kim and Jin Hu, University of Michigan (advisor: Prof. Igor L. Markov)


ICCAD 2013

• 2013 Contest:

• Problems:
  1. Technology Mapping for Macro Blocks (contributed by Taiwan Cadence Design Systems, Inc.)
  2. Placement Finishing – Detailed Placement and Legalization (contributed by IBM Research, Austin, TX)
  3. Mask Optimization (contributed by IBM Research, East Fishkill, NY)

Registration Deadline: May 15, 2013

• http://cad_contest.cs.nctu.edu.tw/CAD-contest-at-ICCAD2013/default.html


END

Thank you very much!


Proof of Theorem 1
