Practical Approximation Algorithms for Separable Packing LPs

F.F. Dragan (Kent State)

A.B. Kahng (UCSD)

I. Mandoiu (UCLA/UCSD)

S. Muddu (Sanera Systems)

A. Zelikovsky (Georgia State)


Outline

• VLSI design motivation

– Global routing via buffer-blocks

– Separable packing ILP formulations

• PTAS for separable packing LPs

• Analysis

• Experimental results


VLSI Global Routing

[Figure: buffered VLSI global routing through buffer blocks]

Problem Formulation

Global Routing via Buffer-Blocks (GRBB) Problem

Given:

• BB locations and capacities

• List of multi-pin nets

– upper bound on #buffers for each source-sink path

• L/U bounds on the wirelength between consecutive buffers/pins

Find:

• Buffered routing of a maximum number of nets subject to the given constraints

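Integer Program Formulation

Routing graph $G = (V, E)$:

$$V = \text{terminals} \cup \text{buffer blocks}, \qquad E = \{(u,v) : \mathrm{dist}(u,v) \in [L, U]\}$$

$$\mathrm{cap}(v) = \begin{cases} 1 & \text{if } v \text{ is a terminal} \\ \text{BB capacity} & \text{otherwise} \end{cases} \qquad \pi(T, v) = \begin{cases} 1 & \text{if } v \in T \\ 0 & \text{otherwise} \end{cases}$$

$$\max \sum_T f(T) \quad \text{s.t.} \quad \sum_T \pi(T, v)\, f(T) \le \mathrm{cap}(v) \;\; \forall v \in V, \qquad f(T) \in \{0,1\} \;\; \forall T$$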

Enforcing Parity Constraints

• Inverting buffers change the polarity of the signal

• Each sink has a given polarity requirement

Parity constraints for the #buffers on each routed source-sink path

A path may use two buffers in the same buffer block

Integer program changes

• Split each BB vertex r of G into two copies, r' and r''

• Impose capacity constraint on the sets of vertices {r', r''}:

$$\sum_T \left[\pi(T, r') + \pi(T, r'')\right] f(T) \le \mathrm{cap}(r)$$

Combining with compaction

Set capacity constraints: cap(BB1) + cap(BB2) ≤ const.

GRBB with Buffer Library

• Discrete buffer library: different buffer sizes/driving strengths

• Need to allocate BB capacity between different buffer types

Integer program changes

• Replace each BB vertex r of G by a set X(r) of vertices (one for each buffer type)

• Modify edge set of G to take into account non-uniform driving strengths

• Impose capacity constraint on the sets of vertices X(r):

$$\sum_T \sum_{r' \in X(r)} \pi(T, r')\, \mathrm{size}(r')\, f(T) \le \mathrm{cap}(r)$$

“Relax+Round” Approach to GRBB

1. Solve the fractional relaxation

– Exact linear programming algorithms are impractical for large instances

– KEY IDEA: use an approximation algorithm

• allows fine-tuning the tradeoff between runtime and solution quality

2. Round to integer solution

– Provably good rounding [RT87]

– Practical runtime (random-walk based)


Separable Packing LP

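Routing graph $G = (V, E)$

Size function: $\mathrm{size} : V \to \mathbb{R}_+$ s.t. $\mathrm{size}(v) = 1$ for every terminal $v$

Capacity function: $\mathrm{cap} : 2^V \to \mathbb{Z}_+$ s.t. $\mathrm{cap}(\{v\}) = 1$ for every terminal $v$

$$\pi(T, X) = \sum_{v \in X \cap T} \mathrm{size}(v)$$

$$\max \sum_T f(T) \quad \text{s.t.} \quad \sum_T \pi(T, X)\, f(T) \le \mathrm{cap}(X) \;\; \forall X \subseteq V, \qquad f(T) \ge 0 \;\; \forall T$$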
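To make the separable packing constraint concrete, here is a minimal Python sketch that evaluates the load of a tree T on a vertex set X (the sum of sizes of the vertices of T inside X) and checks a fractional solution against the capacities. The helper names `pi` and `is_feasible` and the toy instance (one buffer block offering two buffer types) are assumptions made for illustration, not part of the original slides.

```python
from typing import Dict, FrozenSet, Iterable

def pi(tree: Iterable[str], X: FrozenSet[str], size: Dict[str, float]) -> float:
    """Load of tree T on set X: total size of the vertices of T that lie in X."""
    return sum(size[v] for v in set(tree) & X)

def is_feasible(flow, cap, size, tol=1e-9) -> bool:
    """Check sum_T pi(T, X) * f(T) <= cap(X) for every capacitated set X."""
    for X, c in cap.items():
        load = sum(pi(T, X, size) * f for T, f in flow.items())
        if load > c + tol:
            return False
    return True

# Hypothetical toy instance: terminals s and t, and one buffer block whose
# capacity is shared by a small buffer (size 1) and a large buffer (size 2).
size = {"s": 1, "t": 1, "r_small": 1, "r_large": 2}
block = frozenset({"r_small", "r_large"})
cap = {frozenset({"s"}): 1, frozenset({"t"}): 1, block: 3}

T1 = frozenset({"s", "r_small", "t"})
T2 = frozenset({"s", "r_large", "t"})
flow = {T1: 0.5, T2: 0.5}

print(pi(T2, block, size))
print(is_feasible(flow, cap, size))
```

Only the sets listed in `cap` are checked, which matches the usual situation where only a few sets (terminals and buffer blocks) carry a binding capacity.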

Previous Work

• MCF and packing/covering LP approximation: [FGK73,SM90, PST91,G92,GK94,KPST94,LMPSTT95,R95,Y95,GK98,F00,…]

• Exponential length function to model flow congestion [SM90]

• Shortest-path augmentation + final scaling [Y95]

• Modified routing increment [GK98]

• Fewer shortest-path augmentations [F00]

• We extend the speed-up idea of [F00] to separable packing LPs

Separable Packing LP Algorithm

w(X) = δ for every X, f = 0, α = δ
For i = 1 to N do
    For k = 1, …, #nets do
        Find min weight feasible Steiner tree T for net k
        While weight(T) < min{ 1, (1+ε)α } do
            f(T) = f(T) + 1
            For every X do
                w(X) ← ( 1 + ε π(T,X)/cap(X) ) * w(X)
            End For
            Find min weight feasible Steiner tree T for net k
        End While
    End For
    α = (1+ε)α
End For
Output f/N
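The loop structure above can be sketched in Python for a deliberately simplified setting. The assumptions are: 2-pin nets only (so a feasible tree is a source-sink path), every capacitated set is a single vertex with size 1, the bound on the number of buffers per path is ignored, and the wirelength bounds are already encoded in the edge set. Here `eps` is the accuracy parameter, `delta` the initial weight of every set, and `alpha` the current lower bound on tree weight; the iteration count N = ⌈log_{1+eps}(1/delta)⌉ is also an assumption. The function names and the toy instance are hypothetical, and Dijkstra with vertex weights stands in for the min-weight tree computation.

```python
import heapq
import math

def min_weight_path(adj, w, s, t):
    """Dijkstra with vertex weights: a path weighs the sum of w[v] over its vertices."""
    dist, prev, heap = {s: w[s]}, {}, [(w[s], s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj[u]:
            nd = d + w[v]
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if t not in dist:
        return None, math.inf
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return tuple(reversed(path)), dist[t]

def approx_packing(adj, cap, nets, eps, delta):
    """Return (scaled fractional flow per path, number of iterations N)."""
    w = {v: delta for v in adj}   # one weight per singleton set {v}
    f = {}                        # unscaled flow routed on each path
    alpha = delta                 # lower bound on the weight of any path
    N = math.ceil(math.log(1.0 / delta, 1.0 + eps))
    for _ in range(N):
        for s, t in nets:
            path, weight = min_weight_path(adj, w, s, t)
            while path is not None and weight < min(1.0, (1.0 + eps) * alpha):
                f[path] = f.get(path, 0) + 1
                for v in path:    # pi(T, {v}) = 1 for every vertex on the path
                    w[v] *= 1.0 + eps / cap[v]
                path, weight = min_weight_path(adj, w, s, t)
        alpha *= 1.0 + eps
    return {p: x / N for p, x in f.items()}, N

# Hypothetical toy instance: two nets, buffer blocks b1 (capacity 1) and b2 (capacity 2).
adj = {"s1": ["b1"], "t1": ["b1"], "s2": ["b1", "b2"], "t2": ["b1", "b2"],
       "b1": ["s1", "t1", "s2", "t2"], "b2": ["s2", "t2"]}
cap = {"s1": 1, "t1": 1, "s2": 1, "t2": 1, "b1": 1, "b2": 2}
nets = [("s1", "t1"), ("s2", "t2")]
eps, L = 0.1, 3                   # L: max number of vertices on a feasible path
delta = (1 + eps) * ((1 + eps) * L) ** (-1 / eps)
flow, N = approx_packing(adj, cap, nets, eps, delta)
print(N, sum(flow.values()))
```

For multi-pin nets the path oracle would have to be replaced by a min-weight feasible Steiner tree computation, which is where most of the runtime goes.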


Runtime

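Dual LP:

$$\min \sum_X w(X)\, \mathrm{cap}(X) \quad \text{s.t.} \quad \sum_X \pi(T, X)\, w(X) \ge 1 \;\; \forall T, \qquad w(X) \ge 0 \;\; \forall X$$

• Choose #iterations N such that all feasible trees have weight ≥ 1 after N iterations (i.e., α ≥ 1)

• Tree weight lower bound α is δ initially, and is multiplied by (1+ε) in each iteration

$$N = \left\lceil \log_{1+\varepsilon} \frac{1}{\delta} \right\rceil$$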
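As a short check of the iteration count, write $\alpha_i$ for the tree weight lower bound after $i$ iterations, $\delta$ for its initial value, and $\varepsilon$ for the accuracy parameter (these symbol names are assumed here). Multiplying by $(1+\varepsilon)$ once per iteration gives

$$\alpha_i = \delta\,(1+\varepsilon)^i \ge 1 \iff i \ge \log_{1+\varepsilon}\frac{1}{\delta},$$

so the smallest integer number of iterations after which every feasible tree has weight at least 1 is $\lceil \log_{1+\varepsilon}(1/\delta) \rceil$.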

Approximation Guarantee

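Theorem: For every ε < .15, the algorithm finds a factor 1/(1+4ε) approximation by choosing

$$\delta = (1+\varepsilon)\,\bigl((1+\varepsilon) L\bigr)^{-1/\varepsilon}$$

where L is the maximum number of vertices in a feasible Steiner tree. For this value of δ, the running time is

$$O\!\left(\varepsilon^{-2}\,(\#\text{nets})\; T_{\text{tree}} \log L\right)$$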
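To get a feel for the magnitudes involved, the following Python sketch evaluates the parameter choice numerically. It assumes the initial weight is chosen as delta = (1+eps) * ((1+eps) * L)^(-1/eps) and that the number of iterations is N = ⌈log_{1+eps}(1/delta)⌉; the function names and the sample values eps = 0.1, L = 20 are hypothetical.

```python
import math

def choose_delta(eps: float, L: int) -> float:
    """Initial weight delta for accuracy eps and maximum tree size L (assumed formula)."""
    return (1.0 + eps) * ((1.0 + eps) * L) ** (-1.0 / eps)

def num_iterations(eps: float, delta: float) -> int:
    """Smallest N with delta * (1 + eps)**N >= 1."""
    return math.ceil(math.log(1.0 / delta) / math.log(1.0 + eps))

eps, L = 0.1, 20
delta = choose_delta(eps, L)
N = num_iterations(eps, delta)
ratio = 1.0 / (1.0 + 4.0 * eps)   # approximation factor stated in the theorem

print(f"delta = {delta:.3e}, N = {N}, guaranteed factor = {ratio:.3f}")
```

Smaller values of eps improve the guaranteed factor but shrink delta very quickly, which increases N and hence the runtime.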


Implementation choices

                | 2-Pin                       | 3,4-pin                                      | Multi-pin
Decomposition   | Star, Minimum Spanning tree | Matching, 3-restricted Steiner tree          | Not needed
Min-weight DRST | Shortest path (exact)       | Try all Steiner pts + shortest paths (exact) | Very hard! Heuristics
Rounding        | Random-walk                 | Backward random-walks                        | Backward random-walks

Provably Good Rounding

1. Store fractional flows f(T) for every feasible Steiner tree T

2. Scale down each f(T) by 1-ε for small ε

3. Each net k routed with prob. f(k) = Σ{ f(T) | T feasible for k }

– Number of routed nets ≥ (1-ε)OPT

4. To route net k, choose tree T with probability f(T) / f(k)

– With high probability, no BB capacity is exceeded

Problem: Impractical to store all non-zero flow trees
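The four rounding steps can be sketched in a few lines of Python. This is a minimal illustration, assuming the fractional flows are available as a dictionary per net and that the flows of each net sum to at most 1; the function name `round_flows` and the data layout are hypothetical. The sketch does not check capacities, since the slide only claims that they hold with high probability.

```python
import random

def round_flows(flows_by_net, eps, rng):
    """flows_by_net: {net: {tree: f(T)}}. Returns {net: chosen tree} for the routed nets."""
    routed = {}
    for net, flows in flows_by_net.items():
        # Step 2: scale every flow down by the factor 1 - eps.
        scaled = {T: (1.0 - eps) * f for T, f in flows.items() if f > 0}
        # Step 3: route net k with probability f(k) = sum of its scaled tree flows.
        f_k = sum(scaled.values())
        if f_k <= 0 or rng.random() >= f_k:
            continue
        # Step 4: pick tree T with probability f(T) / f(k).
        trees = list(scaled)
        routed[net] = rng.choices(trees, weights=[scaled[T] for T in trees], k=1)[0]
    return routed

# Hypothetical fractional solution for two nets.
flows = {
    "net1": {("s1", "b1", "t1"): 0.6, ("s1", "b2", "t1"): 0.4},
    "net2": {("s2", "b2", "t2"): 0.7},
}
print(round_flows(flows, eps=0.05, rng=random.Random(0)))
```

The dictionary of per-tree flows is exactly what becomes impractical to store on large instances, which motivates working with edge flows instead.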

Random-Walk 2-TMCF Rounding

1. Store fractional flows f(T) for every valid routing tree T

2. Scale down each f(T) by 1-ε for small ε

3. Each net k routed with prob. f(k) = Σ{ f(T) | T routing for k }

– Number of routed nets ≥ (1-ε)OPT

4. To route net k, choose tree T with probability f(T) / f(k): use random walk from source to sink

– With high probability, no BB capacity is exceeded

Practical: random walk requires storing only flows on edges
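A random walk over edge flows can be sketched as follows for a single 2-pin net. The sketch assumes the net's flow is stored as a value per directed edge, that it satisfies flow conservation, and that it is acyclic, so the walk always reaches the sink; the helper names `edge_flows` and `random_walk` are hypothetical. At each vertex the walk leaves along an outgoing edge with probability proportional to the flow on that edge.

```python
import random
from collections import defaultdict

def edge_flows(path_flows):
    """Aggregate the path flows of one net into flows on directed edges."""
    g = defaultdict(float)
    for path, f in path_flows.items():
        for u, v in zip(path, path[1:]):
            g[(u, v)] += f
    return dict(g)

def random_walk(g, source, sink, rng):
    """Walk from source to sink, leaving u along (u, v) with probability g[(u, v)] / outflow(u)."""
    out = defaultdict(list)
    for (u, v), f in g.items():
        if f > 0:
            out[u].append((v, f))
    path = [source]
    while path[-1] != sink:
        nbrs = out[path[-1]]   # non-empty under flow conservation (assumed)
        nxt = rng.choices([v for v, _ in nbrs], weights=[f for _, f in nbrs], k=1)[0]
        path.append(nxt)
    return tuple(path)

# Hypothetical flow of one net, split over two buffer blocks a and b.
path_flows = {("s", "a", "t"): 0.3, ("s", "b", "t"): 0.5}
g = edge_flows(path_flows)
print(random_walk(g, "s", "t", random.Random(0)))
```

Only `g` has to be kept in memory during the approximation phase, one number per edge and net, instead of one number per tree with non-zero flow.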

Random-Walk MTMCF Rounding

[Figure: source S and sinks T1, T2, T3]

The MTMCF Rounding Heuristic

1. Round each net k with probability f(k), using backward random walks

– No scaling-down, approximate MTMCF < OPT

2. Resolve capacity violations by greedily deleting routed paths

– Few violations

3. Greedily route remaining nets using unused BB capacity

– Further routing still possible
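Step 2 of the heuristic, resolving capacity violations by deleting routes, might look like the sketch below. The slides do not say which route is deleted first, so the rule used here (drop the route that touches the most overloaded vertices) is an assumption, as are the function names and the toy data; deleted nets would then be handed to the greedy re-routing of step 3.

```python
from collections import Counter

def usage(routes):
    """Number of routed trees using each vertex."""
    use = Counter()
    for tree in routes.values():
        use.update(set(tree))
    return use

def repair(routes, cap):
    """Greedily delete routes until no capacity is exceeded; returns (kept, deleted nets)."""
    kept = dict(routes)
    deleted = []
    while True:
        use = usage(kept)
        over = {v for v, n in use.items() if n > cap[v]}
        if not over:
            return kept, deleted
        # Assumed greedy rule: drop the route touching the most overloaded vertices.
        worst = max(kept, key=lambda k: len(over & set(kept[k])))
        del kept[worst]
        deleted.append(worst)

# Hypothetical rounded solution: nets n1 and n2 both use buffer block b (capacity 1).
routes = {"n1": ("s1", "b", "t1"), "n2": ("s2", "b", "t2"), "n3": ("s3", "c", "t3")}
cap = {"s1": 1, "t1": 1, "s2": 1, "t2": 1, "s3": 1, "t3": 1, "b": 1, "c": 1}
kept, deleted = repair(routes, cap)
print(sorted(kept), deleted)
```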

Implemented Heuristics

• Greedy buffered routing:

1. For each net, route sinks sequentially along shortest paths to source or node already connected to source

2. After routing a net, remove fully used BBs

• Generalized MCF approximation + randomized rounding

– G2TMCF

– G3TMCF (3-pin decomposition)

– G4TMCF (4-pin decomposition)

– GMTMCF (no decomposition, approximate DRST)

Experimental Setup

• Test instances extracted from next-generation SGI microprocessor

• Up to 5,000 nets, ~6,000 sinks

• U=4,000 µm, L=500-2,000 µm

• 50 buffer blocks

• 200-400 buffers / BB

% Routed Nets vs. Runtime

[Figure: % routed nets (93 to 99) vs. CPU seconds (0.1 to 100,000, log scale) for MT-Greed, G2TMCF, G3TMCF, G4TMCF, GMTMCF]

Conclusions and Ongoing Work

• Provably good algorithms and practical heuristics based on separable packing LP approximation

– Higher completion rates than previous algorithms

• Extensions:

– Combine global buffering with BB planning

– Buffer “site” methodology: tile graph

– Routing congestion (channel capacity constraints)

– Simultaneous pin assignment

% Sinks Connected

#sinks/#nets | Greed | G2TMCF ε=.64 | G2TMCF ε=.04 | G3TMCF ε=.64 | G3TMCF ε=.04 | G4TMCF ε=.64 | G4TMCF ε=.04 | GMTMCF ε=.64 | GMTMCF ε=.04
2958/2396 | 92.2 | 93.8 | 95.5 | 96.2 | 97.8 | 96.6 | 98.3 | 96.7 | 97.4
3077/2438 | 92.3 | 93.9 | 96.5 | 96.4 | 98.5 | 96.9 | 98.8 | 97.6 | 99.3
3099/2784 | 92.1 | 93.6 | 95.5 | 96.4 | 98.0 | 96.6 | 98.1 | 97.3 | 98.7
6038/4764 | 93.5 | 94.8 | 96.8 | 95.7 | 97.6 | 96.5 | 98.4 | 96.3 | 97.7
6296/4925 | 93.6 | 96.2 | 97.6 | 97.0 | 98.6 | 97.7 | 99.1 | 97.7 | 98.4
6321/4938 | 93.3 | 96.2 | 97.5 | 96.8 | 98.4 | 97.7 | 98.9 | 97.7 | 98.2

Runtime (sec.)

#sinks/#nets | Greed | G2TMCF ε=.64 | G2TMCF ε=.04 | G3TMCF ε=.64 | G3TMCF ε=.04 | G4TMCF ε=.64 | G4TMCF ε=.04 | GMTMCF ε=.64 | GMTMCF ε=.04
2958/2396 | .30 | 1.63 | 357 | 9.16 | 2,090 | 98.91 | 29,190 | 2.33 | 947
3077/2438 | .33 | 2.35 | 350 | 11.10 | 2,356 | 128.38 | 37,970 | 2.87 | 846
3099/2784 | .33 | 1.80 | 392 | 12.56 | 2,364 | 132.81 | 38,341 | 2.86 | 877
6038/4764 | .53 | 2.84 | 600 | 16.57 | 3,166 | 182.55 | 60,450 | 4.98 | 1,866
6296/4925 | .55 | 4.35 | 690 | 19.5 | 3,721 | 265.78 | 77,671 | 5.38 | 1,828
6321/4938 | .54 | 3.37 | 730 | 18.99 | 3,813 | 255.37 | 79,123 | 5.43 | 1,833

Resource Usage

                  | Greed | G2TMCF ε=.64 | G2TMCF ε=.04 | G3TMCF ε=.64 | G3TMCF ε=.04 | G4TMCF ε=.64 | G4TMCF ε=.04 | GMTMCF ε=.64 | GMTMCF ε=.04
# Conn. Sinks     | 5,645 | 5,725 | 5,842 | 5,779 | 5,896 | 5,827 | 5,942 | 5,813 | 5,897
% Conn. Sinks     | 93.5 | 94.8 | 96.8 | 95.7 | 97.6 | 96.5 | 98.4 | 96.3 | 97.7
WL (meters)       | 42.22 | 45.18 | 47.80 | 44.48 | 47.66 | 44.18 | 47.49 | 45.33 | 47.51
WL/sink (microns) | 7,479 | 7,891 | 8,182 | 7,697 | 8,083 | 7,582 | 7,992 | 7,798 | 8,057
#Buff             | 9,037 | 9,860 | 10,676 | 9,591 | 10,610 | 9,497 | 10,507 | 9,860 | 10,647
#Buff/sink        | 1.60 | 1.72 | 1.83 | 1.66 | 1.80 | 1.63 | 1.77 | 1.70 | 1.81

#nets = 4,764, #sinks = 6,038, 400 buffers/BB

Resource Usage for 100% Completion

                  | Greed        | 4TMCF, ε=.04 | 4TMCF, ε=.04 | 4TMCF, ε=.04 | 4TMCF, ε=.04
#buffers/BB       | 1,000 or INF | 500 | 600 | 1,000 | INF
WL (meters)       | 47.89 | 49.46 | 49.58 | 49.98 | 51.40
WL/sink (microns) | 7,931 | 8,191 | 8,212 | 8,278 | 8,513
#Buff             | 10,330 | 11,079 | 11,115 | 11,373 | 11,803
#Buff/sink        | 1.71 | 1.83 | 1.84 | 1.88 | 1.95

#nets = 4,764, #sinks = 6,038

MTMCF wastes routing resources!
