F.F. Dragan (Kent State), A.B. Kahng (UCSD), I. Mandoiu (UCLA/UCSD), S. Muddu (Sanera Systems), A. Zelikovsky (Georgia State)

Practical Approximation Algorithms for Separable Packing LPs

Page 1: Practical Approximation Algorithms for Separable Packing LPs

F.F. Dragan (Kent State)

A.B. Kahng (UCSD)

I. Mandoiu (UCLA/UCSD)

S. Muddu (Sanera Systems)

A. Zelikovsky (Georgia State)

Practical Approximation Algorithms for Separable Packing LPs

Page 2: Practical Approximation Algorithms for Separable Packing LPs

2

Outline

• VLSI design motivation

– Global routing via buffer-blocks

– Separable packing ILP formulations

• PTAS for separable packing LPs

• Analysis

• Experimental results

Page 6: Practical Approximation Algorithms for Separable Packing LPs

6

VLSI Global Routing

Page 7: Practical Approximation Algorithms for Separable Packing LPs

7

Buffered VLSI Global Routing

Buffer Blocks

Page 8: Practical Approximation Algorithms for Separable Packing LPs

8

Problem Formulation

Global Routing via Buffer-Blocks (GRBB) Problem

Given:

• BB locations and capacities

• List of multi-pin nets

– upper-bound on #buffers for each source-sink path

• L/U bounds on the wirelength b/w consecutive buffers/pins

Find:

• Buffered routing of a maximum number of nets subject to the given constraints

Page 9: Practical Approximation Algorithms for Separable Packing LPs

9

Integer Program Formulation

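Routing graph: $G = (V, E)$

$V = \{\text{terminals}\} \cup \{\text{buffer blocks}\}$

$E = \{(u,v) : \mathrm{dist}(u,v) \in [L, U]\}$

$\mathrm{cap}(v) = 1$ if $v$ is a terminal, BB capacity otherwise

$\pi(T, v) = 1$ if $v \in T$, $0$ otherwise

\[
\max \sum_{T} f(T) \quad \text{s.t.} \quad \sum_{T} \pi(T,v)\, f(T) \le \mathrm{cap}(v) \;\; \forall v \in V, \qquad f(T) \in \{0,1\} \;\; \forall T
\]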
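To make the integer program concrete, here is a minimal sketch that solves a toy instance by brute force. The instance (vertex names, candidate trees, capacities) is invented for illustration, and exhaustive enumeration is only a stand-in for a real ILP solver; it is not the method used in the talk.

```python
from itertools import product

def solve_grbb_ilp(trees, cap):
    """Brute-force the 0/1 program: maximize the number of chosen trees
    so that every vertex v is used by at most cap[v] chosen trees."""
    best_value, best_choice = -1, None
    for choice in product((0, 1), repeat=len(trees)):
        load = {}
        for picked, tree in zip(choice, trees):
            if picked:
                for v in tree:
                    load[v] = load.get(v, 0) + 1
        if all(load[v] <= cap[v] for v in load):
            value = sum(choice)
            if value > best_value:
                best_value, best_choice = value, choice
    return best_value, best_choice

# Hypothetical instance: terminals have capacity 1, buffer blocks b1 and b2
# have capacities 1 and 2. Net 2 has two candidate trees.
cap = {"s1": 1, "t1": 1, "s2": 1, "t2": 1, "s3": 1, "t3": 1, "b1": 1, "b2": 2}
trees = [
    {"s1", "b1", "t1"},  # net 1 via b1
    {"s2", "b1", "t2"},  # net 2 via b1
    {"s2", "b2", "t2"},  # net 2 via b2
    {"s3", "b2", "t3"},  # net 3 via b2
]
value, choice = solve_grbb_ilp(trees, cap)
```

Because terminals have capacity 1, the two candidate trees of net 2 can never be selected together, so the capacity constraints alone ensure that each net is routed at most once.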

Page 10: Practical Approximation Algorithms for Separable Packing LPs

10

Enforcing Parity Constraints

• Inverting buffers change the polarity of the signal

• Each sink has a given polarity requirement

Parity constraints for the #buffers on each routed source-sink path

A path may use two buffers in the same buffer block

Integer program changes

• Split each BB vertex r of G into two copies, r’ and r’’

• Impose capacity constraint on the sets of vertices {r’, r’’}:

\[
\sum_{T} \left[ \pi(T,r') + \pi(T,r'') \right] f(T) \le \mathrm{cap}(r)
\]

Page 13: Practical Approximation Algorithms for Separable Packing LPs

13

Combining with compaction

Set capacity constraints: cap(BB1) + cap(BB2) ≤ const.

Page 14: Practical Approximation Algorithms for Separable Packing LPs

14

GRBB with Buffer Library

• Discrete buffer library: different buffer sizes/driving strengths

Need to allocate BB capacity between different buffer types

Integer program changes

• Replace each BB vertex r of G by a set X(r) of vertices (one for each buffer type)

• Modify edge set of G to take into account non-uniform driving strengths

• Impose capacity constraint on the sets of vertices X(r):

\[
\sum_{T} \sum_{r' \in X(r)} \pi(T,r')\, \mathrm{size}(r')\, f(T) \le \mathrm{cap}(r)
\]

Page 15: Practical Approximation Algorithms for Separable Packing LPs

15

“Relax+Round” Approach to GRBB

1. Solve the fractional relaxation

– Exact linear programming algorithms are impractical for large instances

– KEY IDEA: use an approximation algorithm

• allows fine-tuning the tradeoff between runtime and solution quality

2. Round to integer solution

– Provably good rounding [RT87]

– Practical runtime (random-walk based)

Page 16: Practical Approximation Algorithms for Separable Packing LPs

16

Outline

• VLSI design motivation

– Global routing via buffer-blocks

– Separable packing LP formulations

• PTAS for separable packing LPs

• Analysis

• Experimental results

Page 17: Practical Approximation Algorithms for Separable Packing LPs

17

Separable Packing LP

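Routing graph: $G = (V, E)$

Size function: $\mathrm{size}: V \to \mathbb{R}_+$ s.t. $\mathrm{size}(v) = 1$ for every terminal $v$

Capacity function: $\mathrm{cap}: 2^V \to \mathbb{Z}_+$ s.t. $\mathrm{cap}(\{v\}) = 1$ for every terminal $v$

$\pi(T, X) = \sum_{v \in T \cap X} \mathrm{size}(v)$

\[
\max \sum_{T} f(T) \quad \text{s.t.} \quad \sum_{T} \pi(T,X)\, f(T) \le \mathrm{cap}(X) \;\; \forall X \subseteq V, \qquad f(T) \ge 0 \;\; \forall T
\]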
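As a small illustration of the separable packing constraints, the sketch below computes π(T, X) as the total size of the vertices of T that fall in X and checks a fractional solution against every capacitated set. The instance (one buffer block split into a small and a big buffer type) and all names are assumptions made up for this example.

```python
def pi(tree, group, size):
    """pi(T, X): total size of the vertices of tree T that lie in the set X."""
    return sum(size[v] for v in tree & group)

def is_feasible(flows, groups, size, tol=1e-9):
    """Check f(T) >= 0 and sum_T pi(T, X) f(T) <= cap(X) for every listed X."""
    if any(f < 0 for f in flows.values()):
        return False
    for group, cap in groups:
        load = sum(pi(tree, group, size) * f for tree, f in flows.items())
        if load > cap + tol:
            return False
    return True

# Hypothetical instance: one buffer block with two buffer types.
terminals = ["s1", "t1", "s2", "t2", "s3", "t3"]
size = {v: 1.0 for v in terminals}
size.update({"r_small": 1.0, "r_big": 2.0})

groups = [(frozenset({v}), 1.0) for v in terminals]       # cap({v}) = 1
groups.append((frozenset({"r_small", "r_big"}), 4.0))      # cap(X(r)) = 4

A = frozenset({"s1", "r_small", "t1"})
B = frozenset({"s2", "r_big", "t2"})
C = frozenset({"s3", "r_big", "t3"})

ok = is_feasible({A: 1.0, B: 1.0, C: 0.5}, groups, size)        # load on X(r): 1 + 2 + 1 = 4
too_much = is_feasible({A: 1.0, B: 1.0, C: 1.0}, groups, size)  # load on X(r): 1 + 2 + 2 = 5
```

Only the sets X that actually carry a capacity need to be listed; here those are the terminal singletons and the buffer-block set X(r).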

Page 18: Practical Approximation Algorithms for Separable Packing LPs

18

Previous Work

• MCF and packing/covering LP approximation: [FGK73,SM90, PST91,G92,GK94,KPST94,LMPSTT95,R95,Y95,GK98,F00,…]

• Exponential length function to model flow congestion [SM90]

• Shortest-path augmentation + final scaling [Y95]

• Modified routing increment [GK98]

• Fewer shortest-path augmentations [F00]

• We extend speed-up idea of [F00] to separable packing LPs

Page 19: Practical Approximation Algorithms for Separable Packing LPs

19

Separable Packing LP Algorithm

    w(X) = δ for every X,  f = 0,  α = δ
    For i = 1 to N do
        For k = 1, …, #nets do
            Find min weight feasible Steiner tree T for net k
            While weight(T) < min{ 1, (1+ε)α } do
                f(T) = f(T) + 1
                For every X do
                    w(X) = ( 1 + ε π(T,X)/cap(X) ) * w(X)
                End For
                Find min weight feasible Steiner tree T for net k
            End While
        End For
        α = (1+ε) α
    End For
    Output f/N
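The loop structure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each net comes with an explicit, small list of candidate trees, so the min-weight feasible Steiner tree step is replaced by a plain minimum over that list; the weight of a tree is taken to be the sum over X of π(T,X)·w(X); and N is rounded up to an integer. The function name and the toy instance are hypothetical.

```python
import math

def separable_packing_lp(nets, groups, size, eps, delta):
    """nets: list (one entry per net) of candidate trees (frozensets of vertices).
    groups: list of (X, cap(X)) pairs, X a frozenset of vertices."""
    def pi(tree, X):
        return sum(size[v] for v in tree & X)

    w = [delta] * len(groups)   # w(X) = delta
    f = {}                      # f = 0
    alpha = delta
    # Assumption: round N up so that delta * (1 + eps)**N >= 1.
    N = math.ceil(math.log(1.0 / delta) / math.log(1.0 + eps))

    def weight(tree):
        return sum(pi(tree, X) * w[j] for j, (X, _) in enumerate(groups))

    for _ in range(N):
        for candidates in nets:
            T = min(candidates, key=weight)      # stand-in for the tree oracle
            while weight(T) < min(1.0, (1 + eps) * alpha):
                f[T] = f.get(T, 0.0) + 1.0
                for j, (X, cap) in enumerate(groups):
                    w[j] *= 1.0 + eps * pi(T, X) / cap
                T = min(candidates, key=weight)
        alpha *= 1 + eps
    return {T: val / N for T, val in f.items()}

# Hypothetical instance: two nets whose only trees share buffer block b (cap 1).
T0 = frozenset({"s0", "b", "t0"})
T1 = frozenset({"s1", "b", "t1"})
size = {v: 1.0 for v in T0 | T1}
groups = [(frozenset({v}), 1.0) for v in size]
eps = 0.1
delta = (1 + eps) * ((1 + eps) * 3) ** (-1.0 / eps)   # L = 3 vertices per tree
flows = separable_packing_lp([[T0], [T1]], groups, size, eps, delta)
total = sum(flows.values())
```

On this instance the LP optimum is 1, since both trees pass through b, and the returned total flow should land slightly below that value.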

Page 20: Practical Approximation Algorithms for Separable Packing LPs

20

Outline

• VLSI design motivation

– Global routing via buffer-blocks

– Separable packing ILP formulations

• PTAS for separable packing LPs

• Analysis

• Experimental results

Page 21: Practical Approximation Algorithms for Separable Packing LPs

21

Runtime

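Dual LP:

\[
\min \sum_{X} w(X)\, \mathrm{cap}(X) \quad \text{s.t.} \quad \sum_{X} \pi(T,X)\, w(X) \ge 1 \;\; \forall T, \qquad w(X) \ge 0 \;\; \forall X
\]

• Choose #iterations N such that all feasible trees have weight ≥ 1 after N iterations (i.e., α ≥ 1)

• Tree weight lower bound α is δ initially, and is multiplied by (1+ε) in each iteration

\[
N = \log_{1+\varepsilon} \frac{1}{\delta}
\]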

Page 22: Practical Approximation Algorithms for Separable Packing LPs

22

Approximation Guarantee

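Theorem: For every ε < .15, the algorithm finds a factor 1/(1+4ε) approximation by choosing

\[
\delta = (1+\varepsilon)\,\bigl((1+\varepsilon)L\bigr)^{-1/\varepsilon}
\]

where L is the maximum number of vertices in a feasible Steiner tree. For this value of δ, the running time is

\[
O\!\left(\frac{1}{\varepsilon^{2}}\,(\#\text{nets})\,\log L \cdot T_{\text{tree}}\right)
\]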
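A quick numeric check shows how δ and the iteration count behave for concrete parameters. The sketch below evaluates δ = (1+ε)((1+ε)L)^(-1/ε) and the number of iterations needed for the lower bound δ(1+ε)^N to reach 1; rounding N up to an integer is an assumption of this sketch, and the function names are hypothetical.

```python
import math

def choose_delta(eps, L):
    """delta = (1 + eps) * ((1 + eps) * L) ** (-1 / eps)."""
    return (1 + eps) * ((1 + eps) * L) ** (-1.0 / eps)

def num_iterations(eps, delta):
    """Smallest integer N with delta * (1 + eps) ** N >= 1 (rounded up)."""
    return math.ceil(math.log(1.0 / delta) / math.log(1 + eps))

eps, L = 0.1, 10
delta = choose_delta(eps, L)
N = num_iterations(eps, delta)
```

For ε = 0.1 and L = 10 this gives a δ on the order of 4e-11 and N = 251, which illustrates the roughly 1/ε² growth of the iteration count as ε shrinks.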

Page 23: Practical Approximation Algorithms for Separable Packing LPs

23

Outline

• VLSI design motivation

– Global routing via buffer-blocks

– Separable packing ILP formulations

• PTAS for separable packing LPs

• Analysis

• Experimental results

Page 24: Practical Approximation Algorithms for Separable Packing LPs

24

Implementation choices

| | 2-Pin | 3,4-pin | Multi-pin |
|---|---|---|---|
| Decomposition | Star, Minimum Spanning tree | Matching, 3-restricted Steiner tree | Not needed |
| Min-weight DRST | Shortest path (exact) | Try all Steiner pts + shortest paths (exact) | Very hard! Heuristics |
| Rounding | Random-walk | Backward random-walks | Backward random-walks |

Page 25: Practical Approximation Algorithms for Separable Packing LPs

25

Provably Good Rounding

1. Store fractional flows f(T) for every feasible Steiner tree T

2. Scale down each f(T) by 1-ε for small ε

3. Each net k routed with prob. f(k) = Σ{ f(T) | T feasible for k }

Number of routed nets ≥ (1-ε)OPT

4. To route net k, choose tree T with probability f(T) / f(k)

With high probability, no BB capacity is exceeded

Problem: Impractical to store all non-zero flow trees
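The four rounding steps can be sketched as below. This is an illustrative sketch, assuming the fractional flows are stored explicitly per net (the very thing the slide calls impractical at scale) and that each net's total flow is at most 1; the function name, the data layout, and the seeded random generator are choices made for this example.

```python
import random

def round_flows(flows_by_net, eps, rng):
    """flows_by_net: {net: [(tree, f(T)), ...]} with sum of f(T) per net <= 1.
    Returns {net: chosen tree} for the nets that get routed."""
    routed = {}
    for net, flows in flows_by_net.items():
        scaled = [(tree, (1 - eps) * f) for tree, f in flows]   # step 2
        f_k = sum(f for _, f in scaled)                         # step 3: f(k)
        if f_k > 0 and rng.random() < f_k:                      # route with prob. f(k)
            trees = [tree for tree, _ in scaled]
            weights = [f for _, f in scaled]
            # step 4: pick T with probability f(T) / f(k)
            routed[net] = rng.choices(trees, weights=weights, k=1)[0]
    return routed

# Hypothetical fractional solution for two nets.
flows_by_net = {
    "a": [("tree_a1", 0.3), ("tree_a2", 0.2)],
    "b": [("tree_b1", 0.0)],
}
rng = random.Random(0)
trials = [round_flows(flows_by_net, 0.1, rng) for _ in range(10000)]
frac_a = sum("a" in r for r in trials) / len(trials)   # expected near 0.9 * 0.5 = 0.45
```

Net "b" carries no flow and is therefore never routed, while net "a" is routed in roughly 45% of the trials.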

Page 26: Practical Approximation Algorithms for Separable Packing LPs

26

Random-Walk 2-TMCF Rounding

1. Store fractional flows f(T) for every valid routing tree T

2. Scale down each f(T) by 1-ε for small ε

3. Each net k routed with prob. f(k) = Σ{ f(T) | T routing for k }

Number of routed nets ≥ (1-ε)OPT

4. To route net k, choose tree T with probability f(T) / f(k)

With high probability, no BB capacity is exceeded

use random walk from source to sink

Practical: random walk requires storing only flows on edges
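A random walk that needs only per-edge flows might look like the sketch below: from the current vertex, the next edge is picked with probability proportional to its flow, so that a source-sink path is drawn without storing individual paths. The sketch assumes the edge flows of one 2-pin net form an acyclic flow that obeys conservation, so every walk reaches the sink; the function name and the toy flow are hypothetical.

```python
import random

def random_walk_path(edge_flow, source, sink, rng):
    """edge_flow: {(u, v): flow} for a single net. Returns one source-sink path."""
    out = {}
    for (u, v), f in edge_flow.items():
        if f > 0:
            out.setdefault(u, []).append((v, f))
    path = [source]
    node = source
    while node != sink:
        successors, flows = zip(*out[node])
        # Leave `node` along an edge with probability proportional to its flow.
        node = rng.choices(successors, weights=flows, k=1)[0]
        path.append(node)
    return path

# Hypothetical net: 0.3 units of flow via buffer block a, 0.7 via b.
edge_flow = {("s", "a"): 0.3, ("a", "t"): 0.3, ("s", "b"): 0.7, ("b", "t"): 0.7}
rng = random.Random(0)
paths = [random_walk_path(edge_flow, "s", "t", rng) for _ in range(5000)]
frac_b = sum(p == ["s", "b", "t"] for p in paths) / len(paths)   # expected near 0.7
```

Each path is drawn with probability proportional to the flow it carries, which matches choosing T with probability f(T)/f(k) in step 4 while keeping storage proportional to the number of edges.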

Page 27: Practical Approximation Algorithms for Separable Packing LPs

27

Random-Walk MTMCF Rounding

[Figure: source S and sinks T1, T2, T3]

Page 29: Practical Approximation Algorithms for Separable Packing LPs

29

The MTMCF Rounding Heuristic

1. Round each net k with probability f(k), using backward random walks

– No scaling-down, approximate MTMCF < OPT

2. Resolve capacity violations by greedily deleting routed paths

– Few violations

3. Greedily route remaining nets using unused BB capacity

– Further routing still possible

Page 30: Practical Approximation Algorithms for Separable Packing LPs

30

Implemented Heuristics

• Greedy buffered routing:

1. For each net, route sinks sequentially along shortest paths to source or node already connected to source

2. After routing a net, remove fully used BBs

• Generalized MCF approximation + randomized rounding

– G2TMCF

– G3TMCF (3-pin decomposition)

– G4TMCF (4-pin decomposition)

– GMTMCF (no decomposition, approximate DRST)

Page 31: Practical Approximation Algorithms for Separable Packing LPs

31

Experimental Setup

• Test instances extracted from next-generation SGI microprocessor

• Up to 5,000 nets, ~6,000 sinks

• U=4,000 µm, L=500-2,000 µm

• 50 buffer blocks

• 200-400 buffers / BB

Page 32: Practical Approximation Algorithms for Separable Packing LPs

32

% Routed Nets vs. Runtime

[Figure: % routed nets (y-axis, 93 to 99) vs. CPU seconds (x-axis, log scale from 0.1 to 100000) for MT-Greed, G2TMCF, G3TMCF, G4TMCF, and GMTMCF]

Page 33: Practical Approximation Algorithms for Separable Packing LPs

33

Conclusions and Ongoing Work

• Provably good algorithms and practical heuristics based on separable packing LP approximation

– Higher completion rates than previous algorithms

• Extensions:

– Combine global buffering with BB planning

– Buffer “site” methodology: tile graph

– Routing congestion (channel capacity constraints)

– Simultaneous pin assignment

Page 35: Practical Approximation Algorithms for Separable Packing LPs

35

% Sinks Connected

| #sinks/#nets | Greed | G2TMCF ε=.64 | G2TMCF ε=.04 | G3TMCF ε=.64 | G3TMCF ε=.04 | G4TMCF ε=.64 | G4TMCF ε=.04 | GMTMCF ε=.64 | GMTMCF ε=.04 |
|---|---|---|---|---|---|---|---|---|---|
| 2958/2396 | 92.2 | 93.8 | 95.5 | 96.2 | 97.8 | 96.6 | 98.3 | 96.7 | 97.4 |
| 3077/2438 | 92.3 | 93.9 | 96.5 | 96.4 | 98.5 | 96.9 | 98.8 | 97.6 | 99.3 |
| 3099/2784 | 92.1 | 93.6 | 95.5 | 96.4 | 98.0 | 96.6 | 98.1 | 97.3 | 98.7 |
| 6038/4764 | 93.5 | 94.8 | 96.8 | 95.7 | 97.6 | 96.5 | 98.4 | 96.3 | 97.7 |
| 6296/4925 | 93.6 | 96.2 | 97.6 | 97.0 | 98.6 | 97.7 | 99.1 | 97.7 | 98.4 |
| 6321/4938 | 93.3 | 96.2 | 97.5 | 96.8 | 98.4 | 97.7 | 98.9 | 97.7 | 98.2 |

Page 36: Practical Approximation Algorithms for Separable Packing LPs

36

Runtime (sec.)

| #sinks/#nets | Greed | G2TMCF ε=.64 | G2TMCF ε=.04 | G3TMCF ε=.64 | G3TMCF ε=.04 | G4TMCF ε=.64 | G4TMCF ε=.04 | GMTMCF ε=.64 | GMTMCF ε=.04 |
|---|---|---|---|---|---|---|---|---|---|
| 2958/2396 | .30 | 1.63 | 357 | 9.16 | 2,090 | 98.91 | 29,190 | 2.33 | 947 |
| 3077/2438 | .33 | 2.35 | 350 | 11.10 | 2,356 | 128.38 | 37,970 | 2.87 | 846 |
| 3099/2784 | .33 | 1.80 | 392 | 12.56 | 2,364 | 132.81 | 38,341 | 2.86 | 877 |
| 6038/4764 | .53 | 2.84 | 600 | 16.57 | 3,166 | 182.55 | 60,450 | 4.98 | 1,866 |
| 6296/4925 | .55 | 4.35 | 690 | 19.5 | 3,721 | 265.78 | 77,671 | 5.38 | 1,828 |
| 6321/4938 | .54 | 3.37 | 730 | 18.99 | 3,813 | 255.37 | 79,123 | 5.43 | 1,833 |

Page 37: Practical Approximation Algorithms for Separable Packing LPs

37

Resource Usage

| | Greed | G2TMCF ε=.64 | G2TMCF ε=.04 | G3TMCF ε=.64 | G3TMCF ε=.04 | G4TMCF ε=.64 | G4TMCF ε=.04 | GMTMCF ε=.64 | GMTMCF ε=.04 |
|---|---|---|---|---|---|---|---|---|---|
| # Conn. Sinks | 5,645 | 5,725 | 5,842 | 5,779 | 5,896 | 5,827 | 5,942 | 5,813 | 5,897 |
| % Conn. Sinks | 93.5 | 94.8 | 96.8 | 95.7 | 97.6 | 96.5 | 98.4 | 96.3 | 97.7 |
| WL (meters) | 42.22 | 45.18 | 47.80 | 44.48 | 47.66 | 44.18 | 47.49 | 45.33 | 47.51 |
| WL/sink (microns) | 7,479 | 7,891 | 8,182 | 7,697 | 8,083 | 7,582 | 7,992 | 7,798 | 8,057 |
| #Buff | 9,037 | 9,860 | 10,676 | 9,591 | 10,610 | 9,497 | 10,507 | 9,860 | 10,647 |
| #Buff/sink | 1.60 | 1.72 | 1.83 | 1.66 | 1.80 | 1.63 | 1.77 | 1.70 | 1.81 |

#nets = 4,764, #sinks = 6,038, 400 buffers/BB

Page 38: Practical Approximation Algorithms for Separable Packing LPs

38

Resource Usage for 100% Completion

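| | Greed | 4TMCF, ε=.04 | 4TMCF, ε=.04 | 4TMCF, ε=.04 | 4TMCF, ε=.04 |
|---|---|---|---|---|---|
| #buffers/BB | 1,000 or INF | 500 | 600 | 1,000 | INF |
| WL (meters) | 47.89 | 49.46 | 49.58 | 49.98 | 51.40 |
| WL/sink (microns) | 7,931 | 8,191 | 8,212 | 8,278 | 8,513 |
| #Buff | 10,330 | 11,079 | 11,115 | 11,373 | 11,803 |
| #Buff/sink | 1.71 | 1.83 | 1.84 | 1.88 | 1.95 |

#nets = 4,764, #sinks = 6,038

MTMCF wastes routing resources!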