EE4271 VLSI Design Interconnect Optimizations Buffer Insertion


Moore’s law

The number of transistors doubles approximately every two years, with clock frequency expected to double accordingly

Interconnects Dominate

[Figure: transistor/gate delay and interconnect delay (psec, 0 to 300) versus technology generation (μm; 0.8, 0.5, 0.35, 0.25, 0.18, 0.15). Source: Gordon Moore, Chairman Emeritus, Intel Corp.]

This is why Moore’s law is not true anymore.

Objectives

• What have we learned?
  – Compute circuit delay on wires and gates
  – Gate delay optimization
• What are we going to learn?
  – Interconnect delay optimization: buffer insertion
    • Why reduce delay
    • How to perform it
  – This is the most important optimization in circuit design

Why does this trend occur?

[Figure: transistor/gate delay and interconnect delay (psec) versus technology generation (μm). Source: Gordon Moore, Chairman Emeritus, Intel Corp.]

A scaling primer

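• Ideal process scaling:
  – Device geometries shrink by s (= 0.7x)
    • Device delay shrinks by s
  – Wire geometries shrink by s
    • Unit resistance R/μm: $\rho/(ws \cdot hs) = r/s^2$, where $r = \rho/(wh)$
    • Unit coupling capacitance $C_c$/μm: $\epsilon(hs)/(Ss)$, where S is the spacing to the neighboring wire
    • With s = 0.7, $1/s^2 \approx 2$: resistance doubles, while capacitance is roughly unchanged per unit length
• How about the change in wire length?

[Figure: a transistor with source (S), gate (G), and drain (D) terminals, and a wire of width w, height h, and length l with spacing S to its neighbor.]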

Technology scaling

• Global (long) interconnect lengths don’t shrink
  – Global interconnects link cells that are far apart
• Local (short) interconnect lengths shrink by s
  – Local interconnects link nearby cells

Interconnect delay scaling

• Delay of a wire of length l: $\tau_{int} = (rl)(cl) = rcl^2$ (a quadratic function of length)
• Local interconnects: $\tau_{int} : (r/s^2)(c)(ls)^2 = rcl^2$
  – Local interconnect delay unchanged
• Global interconnects: $\tau_{int} : (r/s^2)(c)(l)^2 = (rcl^2)/s^2$
  – Global interconnect delay doubled: unsustainable!
• Interconnect delay is increasingly dominant
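The scaling argument can be checked numerically. The following is a minimal sketch, assuming s = 0.7 and illustrative values r = c = 1 and l = 10 that are not from the slides; the function name `scaled_wire_delay` is hypothetical.

```python
def scaled_wire_delay(r, c, length, s, length_shrinks):
    """Wire delay r*c*l^2 (constant factor ignored) after one scaling step by s."""
    r_scaled = r / s**2  # unit resistance grows by 1/s^2
    c_scaled = c         # unit capacitance is roughly unchanged
    l_scaled = length * s if length_shrinks else length
    return r_scaled * c_scaled * l_scaled**2


s = 0.7                      # assumed scaling factor per generation
r, c, l = 1.0, 1.0, 10.0     # illustrative values, not from the slides
base = r * c * l**2

local_ratio = scaled_wire_delay(r, c, l, s, length_shrinks=True) / base
global_ratio = scaled_wire_delay(r, c, l, s, length_shrinks=False) / base

print(f"local delay ratio:  {local_ratio:.2f}")   # stays near 1
print(f"global delay ratio: {global_ratio:.2f}")  # near 1/s^2, about 2
```

The local wire keeps its delay because its shorter length cancels the higher resistance, while the global wire absorbs the full 1/s^2 increase.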

Buffer Insertion For Delay Reduction

Elmore Delay for Wire

[Figure: a wire of length x driving a load capacitance C. The wire has unit resistance r and unit capacitance c.]

Elmore Delay for Buffer

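[Figure: a buffer from node u to node v driving a load capacitance C. The buffer is characterized by its input capacitance and its driving resistance.]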

Elmore Delay for A Circuit

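• Delay = $\sum_{\text{all } R_i} \; \sum_{\text{all } C_j \text{ downstream from } R_i} R_i C_j$
• Elmore delay to n1: R(B)*(C1+C2)
• Elmore delay to n2: R(B)*(C1+C2)+R(w)*C2

[Figure: buffer B with driving resistance R(B) drives node n1 with capacitance C1; a wire of resistance R(w) connects n1 to node n2 with capacitance C2.]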
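The double sum can be sketched for a simple chain, where each resistance sees every capacitance at or beyond its own node. This is a minimal illustration: the function name `elmore_chain` and the numeric values are assumptions, not taken from the slides.

```python
def elmore_chain(resistances, capacitances):
    """Elmore delay to each node of an RC chain.

    resistances[i] connects node i-1 (or the source) to node i,
    and capacitances[i] is the capacitance at node i.
    """
    delays = []
    total = 0.0
    for i, r_i in enumerate(resistances):
        downstream_cap = sum(capacitances[i:])  # all Cj downstream from Ri
        total += r_i * downstream_cap
        delays.append(total)
    return delays


# Illustrative values: R(B) = 2, R(w) = 3, C1 = 1, C2 = 4
delay_n1, delay_n2 = elmore_chain([2.0, 3.0], [1.0, 4.0])
print(delay_n1)  # R(B)*(C1+C2)
print(delay_n2)  # R(B)*(C1+C2) + R(w)*C2
```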

Buffers Reduce Wire Delay

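[Figure: an unbuffered wire of length x with driver resistance R and load capacitance C, compared with the same wire split by a buffer at its midpoint. Each half of length x/2 has resistance rx/2 and capacitance cx/4 at each end.]

$t_{unbuf} = R(cx + C) + rx(cx/2 + C)$

$t_{buf} = 2R(cx/2 + C) + rx(cx/4 + C)$

$\Delta t = t_{buf} - t_{unbuf} = RC - rcx^2/4$

The buffer reduces delay when $\Delta t < 0$, that is, when $rcx^2/4 > RC$.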
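The two delay expressions can be compared directly. The sketch below assumes the buffer has the same driving resistance R and input capacitance C as the driver and load, as in the formulas; the values R = 4, C = 1, r = c = 1 and the helper name `break_even_length` are illustrative assumptions.

```python
import math


def t_unbuf(R, C, r, c, x):
    """Delay of an unbuffered wire of length x."""
    return R * (c * x + C) + r * x * (c * x / 2 + C)


def t_buf(R, C, r, c, x):
    """Delay of the same wire with one buffer at the midpoint."""
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C)


def break_even_length(R, C, r, c):
    """Wire length at which t_buf equals t_unbuf (RC = r*c*x^2/4)."""
    return 2 * math.sqrt(R * C / (r * c))


R, C, r, c = 4.0, 1.0, 1.0, 1.0  # illustrative values, not from the slides
for x in (2.0, 4.0, 8.0):
    delta = t_buf(R, C, r, c, x) - t_unbuf(R, C, r, c, x)
    print(f"x = {x}: t_buf - t_unbuf = {delta}")
```

With these values the break-even length is 4: the buffer adds delay on the shorter wire and removes delay on the longer one.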

Buffered global interconnects: Intuition

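Unbuffered: interconnect delay = $rcl^2/2$

Buffered: interconnect delay = $\sum_i rcl_i^2/2 < rcl^2/2$ (where $l = \sum_j l_j$), since $\sum_j l_j^2 < (\sum_j l_j)^2$

(Of course, we need to consider buffer delay as well)

[Figure: a wire of length l divided by buffers into segments of lengths l1, l2, l3, ..., ln.]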

Optimal Buffer Insertion on A Wire

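• Delay before buffer insertion = $rcL^2/2$
• Assume N identical buffers with equal inter-buffer length l (L = Nl)

$T = N\left[R_d(cl + C_g) + rl\left(cl/2 + C_g\right)\right] = L\left[\frac{rcl}{2} + (rC_g + R_d c) + \frac{R_d C_g}{l}\right]$

• For minimum delay, $\frac{dT}{dl} = 0$:

$L\left[\frac{rc}{2} - \frac{R_d C_g}{l_{opt}^2}\right] = 0 \;\Rightarrow\; l_{opt} = \sqrt{\frac{2R_d C_g}{rc}}$

• Rd – On resistance of inverter
• Cg – Gate input capacitance
• r, c – unit resistance and capacitance

[Figure: a wire of total length L with identical buffers spaced at length l.]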
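As a sanity check on the closed-form spacing, the stage delay can be evaluated for several spacings and compared with l_opt = sqrt(2 Rd Cg / (r c)). This is a sketch under assumed values (r = c = 1, Rd = 2, Cg = 1, L = 20, none from the slides) that treats N = L/l as continuous, as the derivation does; the function names are hypothetical.

```python
import math


def buffered_delay(L, l, r, c, Rd, Cg):
    """T = N * [Rd*(c*l + Cg) + r*l*(c*l/2 + Cg)] with N = L / l stages."""
    n_stages = L / l
    return n_stages * (Rd * (c * l + Cg) + r * l * (c * l / 2 + Cg))


def optimal_spacing(r, c, Rd, Cg):
    """Spacing that sets dT/dl to zero."""
    return math.sqrt(2 * Rd * Cg / (r * c))


r, c, Rd, Cg, L = 1.0, 1.0, 2.0, 1.0, 20.0  # illustrative values
l_opt = optimal_spacing(r, c, Rd, Cg)

# Compare the closed-form spacing against a few other spacings.
spacings = [0.5, 1.0, l_opt, 4.0, 5.0]
best = min(spacings, key=lambda l: buffered_delay(L, l, r, c, Rd, Cg))
for l in spacings:
    print(f"l = {l}: T = {buffered_delay(L, l, r, c, Rd, Cg)}")
```

Among the spacings tried, the smallest delay occurs at l_opt, and the delay rises on both sides of it.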

Optimal interconnect delay

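• Substituting $l_{opt}$ back into the interconnect delay expression:

$T_{opt} = L\left[\frac{rcl_{opt}}{2} + (rC_g + R_d c) + \frac{R_d C_g}{l_{opt}}\right] = L\left[\frac{rc}{2}\sqrt{\frac{2R_d C_g}{rc}} + (rC_g + R_d c) + R_d C_g\sqrt{\frac{rc}{2R_d C_g}}\right]$

$T_{opt} = L\left(\sqrt{2R_d C_g rc} + rC_g + R_d c\right)$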

Delay grows linearly with L (instead of quadratically)
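The contrast between quadratic and linear growth shows up when both expressions are evaluated over a range of wire lengths. The sketch below uses assumed values r = c = 1, Rd = 2, Cg = 1 (not from the slides) and hypothetical function names.

```python
import math


def unbuffered_delay(L, r, c):
    """Delay before buffer insertion: r*c*L^2 / 2."""
    return r * c * L**2 / 2


def optimal_buffered_delay(L, r, c, Rd, Cg):
    """T_opt = L * (sqrt(2*Rd*Cg*r*c) + r*Cg + Rd*c)."""
    return L * (math.sqrt(2 * Rd * Cg * r * c) + r * Cg + Rd * c)


r, c, Rd, Cg = 1.0, 1.0, 2.0, 1.0  # illustrative values
for L in (4.0, 10.0, 20.0, 40.0):
    print(L, unbuffered_delay(L, r, c), optimal_buffered_delay(L, r, c, Rd, Cg))
```

With these values the two delays are equal at L = 10. Below that length the buffers cost more than they save, and above it the buffered delay falls further behind the unbuffered delay each time L doubles.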

Total buffer count

• Ever-increasing fractions of total cell count will be buffers
  – 70% in 32nm
  – 25% is widely observed

[Figure: % of cells used to buffer nets (0 to 80) at the 90nm, 65nm, 45nm, and 32nm nodes; series: clk-buf, buf, tot-buf.]

ITRS projections

[Figure: relative delay (log scale, 0.1 to 100) versus feature size (250, 180, 130, 90, 65, 45, 32 nm); series: gate delay, local interconnect (M1,2), global interconnect with repeaters, global interconnect without repeaters. Source: ITRS, 2003.]

Exercise 1

• Given a wire of length 10 with r=2, c=2, what is its delay?

• Given a buffer with Rd =10, Cg=20, after optimally buffering the wire, what is the delay?

• What if wire length is 100?

• Any conclusion?

Exercise 2

• Relationship with gate sizing
  – If we can size the buffer, what is the best buffer size?
  – Let R0 denote the unit size buffer driving resistance, and C0 denote the unit size buffer input capacitance. Thus, Rd=R0/h and Cg=C0h
  – What is the best h leading to the smallest delay?

Analogy

• Advancing technology = period of city expansion, more transistors = larger city

• Interconnects = streets

• Buffers = gas stations

• Signal delay (timing) = time to cross the city

• Buffer insertion = gas station construction

Previous Result is Only Theoretical: Discrete Buffer Locations

Candidate buffer locations

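RAT: Required Arrival Time

[Figure: a driver connected to a sink by a wire with delay 80. The arrival time at the driver is AT = 0 and the required arrival time at the sink is RAT = 100.]

Propagating along the wire gives AT = 0 + 80 = 80 at the sink and RAT = 100 - 80 = 20 at the driver.

Slack: RAT - AT

• At the driver: Slack = 20 - 0 = 20
• At the sink: Slack = 100 - 80 = 20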

Minimizing circuit delay = maximizing RAT at driver = maximizing slack at driver

Motivation for Problem Formulation

RAT = Required Arrival Time

Slack = RAT - AT

Without a buffer (slack = -50):
• Sink 1: RAT = 300, AT = 350, Slack = RAT - AT = -50
• Sink 2: RAT = 700, AT = 600, Slack = 100

With a buffer that decouples the capacitive load from the critical path (slack = 50):
• Sink 1: RAT = 300, AT = 250, Slack = 50
• Sink 2: RAT = 700, AT = 400, Slack = 300

We need to maximize the slack (or RAT) at the driver

Timing Driven Buffering Problem Formulation

• Given
  – A Steiner tree
  – RAT at each sink
  – A buffer type
  – RC parameters
  – Candidate buffer locations
• Find a buffer insertion solution such that the slack (or RAT) at the driver is maximized

An Example for Buffer Insertion

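• r = 1, c = 1
• Rb = 1, Cb = 1
• Rd = 1

Each candidate is written (node, C, Q), where C is the downstream capacitance and Q is the RAT. The sink candidate is (v1, 1, 20), and each of the two wire segments has length 2.

• Add wire: (v1, 1, 20) -> (v2, 3, 16)
• Without a buffer: add wire -> (v3, 5, 8), add driver -> slack = 3
• Insert buffer: (v2, 3, 16) -> (v2, 1, 13), add wire -> (v3, 3, 9), add driver -> slack = 6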

Candidate Buffering Solution

• Definition
• Each candidate solution is associated with
  – vi: a node
  – ci: downstream capacitance
  – qi: RAT

[Figure: two cases. If vi is a sink, ci is the sink capacitance; otherwise v is an internal node.]

Van Ginneken’s Algorithm

Candidate solutions are propagated toward the source

Solution Propagation: Add Wire

• $c_2 = c_1 + cx$
• $q_2 = q_1 - rcx^2/2 - rxc_1$
• r: wire resistance per unit length
• c: wire capacitance per unit length

[Figure: a wire of length x from (v1, c1, q1) to (v2, c2, q2).]

Solution Propagation: Insert Buffer

• c1b = Cb

• q1b = q1 – Rbc1

• Cb: buffer capacitance

• Rb: buffer resistance

[Figure: candidate (v1, c1, q1) becomes (v1, c1b, q1b) after a buffer is inserted at v1.]

Solution Propagation: Add Driver

• q0d = q0 – Rdc0

• Rd: driver resistance

• Pick solution with max slack

[Figure: candidate (v0, c0, q0) becomes (v0, c0d, q0d) after the driver is added at v0.]
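The three propagation steps (add wire, insert buffer, add driver) can be sketched as small functions on (capacitance, RAT) pairs. The function names are hypothetical and the node label is dropped for brevity; the numbers reproduce the earlier example with r = c = 1, Rb = Cb = 1, Rd = 1, a sink candidate (1, 20), and two wire segments of length 2.

```python
def add_wire(cand, x, r, c):
    """Propagate a candidate across a wire of length x."""
    cap, q = cand
    return (cap + c * x, q - r * c * x**2 / 2 - r * x * cap)


def insert_buffer(cand, Rb, Cb):
    """Insert a buffer: the upstream load becomes Cb."""
    cap, q = cand
    return (Cb, q - Rb * cap)


def add_driver(cand, Rd):
    """Return the RAT at the driver input (the slack when AT = 0)."""
    cap, q = cand
    return q - Rd * cap


r, c, Rb, Cb, Rd = 1, 1, 1, 1, 1
at_v2 = add_wire((1, 20), 2, r, c)                # (3, 16.0)

# Option 1: no buffer at v2
unbuffered = add_driver(add_wire(at_v2, 2, r, c), Rd)

# Option 2: buffer at v2
buffered_v2 = insert_buffer(at_v2, Rb, Cb)        # (1, 13.0)
buffered = add_driver(add_wire(buffered_v2, 2, r, c), Rd)

print(unbuffered, buffered)  # slack 3 without the buffer, 6 with it
```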

Exercise

Unit Wire Cap = 5
Unit Wire Res = 3
Buffer C=5, R=1

[Figure: a driver connected through three wire segments of length 2 to a sink with candidate (20,400).]

Perform buffer insertion to maximize the slack at the driver

Exponential Runtime

2 solutions

4 solutions

8 solutions

16 solutions

n candidate buffer locations lead to $2^n$ solutions

Solution Pruning

• Two candidate solutions
  – (v, c1, q1)
  – (v, c2, q2)
• Solution 1 is inferior if
  – c1 ≥ c2: larger load
  – and q1 ≤ q2: tighter timing
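One way to sketch the pruning rule is to sort candidates by capacitance and keep a candidate only if its RAT is strictly better than every candidate with a smaller or equal load. This assumes the inferiority test is "capacitance no smaller and RAT no larger"; the function name `prune` is hypothetical and candidates are (capacitance, RAT) pairs at the same node.

```python
def prune(candidates):
    """Drop every candidate that is inferior to another one."""
    # Smallest load first; for equal loads, largest RAT first.
    ordered = sorted(candidates, key=lambda cq: (cq[0], -cq[1]))
    kept = []
    best_q = float("-inf")
    for cap, q in ordered:
        # A candidate with a larger load survives only if it has a larger RAT.
        if q > best_q:
            kept.append((cap, q))
            best_q = q
    return kept


print(prune([(40, 40), (5, 0), (15, 160), (5, 145)]))
# [(5, 145), (15, 160)]
```

After pruning, the surviving candidates are sorted by increasing capacitance and increasing RAT, so no candidate dominates another.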

An Analogy

[Figure: a race toward the finish (END) in which each competitor carries a load (LOAD).]

• Faster -> Smaller Delay -> Larger RAT (since RAT = RAToutput - Delay)
• Larger Load -> Larger Capacitance
• Faster & smaller load (larger RAT, smaller capacitance): Good
• Slower & larger load (smaller RAT, larger capacitance): Inferior
• When neither competitor is both faster and lighter: who will be the winner? Cannot tell at this moment, so keep both of them.

Pruning When Insert Buffer

Candidates produced by inserting a buffer all have the same load capacitance Cb, so only the one with the maximum q is kept

Generating Candidates

[Figure: candidate generation in steps (1), (2), and (3). From Dr. Charles Alpert.]

Pruning Candidates

[Figure: candidates (a) and (b) at step (3), and the result of pruning at step (4).]

Both (a) and (b) “look” the same to the source. Throw out the one with the worse slack.

Candidate Example Continued

[Figure: candidates at steps (4) and (5).]

Candidate Example Continued: After pruning

[Figure: candidates at step (5) after pruning.]

At the driver, compute which candidate maximizes slack. Result is optimal.

Example

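Unit Wire Cap = 5
Unit Wire Res = 3
Buffer C=5, R=1

Three wire segments of length 2 connect the sink to the driver. Candidates are written (capacitance, RAT).

• At the sink: (20,400)
• After the first segment: (30,250), and (5,220) with a buffer inserted
• After the second segment: (40,40), (5,0), (15,160), (5,145)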

Example Cont’d

• Candidates after the second segment: (40,40), (5,0), (15,160), (5,145)
• (5,0) is inferior to (5,145). (40,40) is inferior to (15,160)
• After pruning: (15,160), (5,145)
• At the driver end of the third segment: (5,15) from (15,160), and (5,70) from (5,145)

Pick the solution with the largest slack, then follow the arrows back to get the solution
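The whole procedure on a single line of wire segments can be sketched as below. It is a simplified illustration, not a full tree implementation: it assumes one buffer type, candidate buffer locations only between segments, and a driver resistance equal to the buffer resistance (Rd = 1), which reproduces the final slack of 70 in the example. The names `prune` and `buffer_line` are hypothetical.

```python
def prune(cands):
    """Keep only non-inferior (cap, rat, buffers) candidates."""
    ordered = sorted(cands, key=lambda s: (s[0], -s[1]))
    kept, best_q = [], float("-inf")
    for cand in ordered:
        if cand[1] > best_q:
            kept.append(cand)
            best_q = cand[1]
    return kept


def buffer_line(sink_cap, sink_rat, segments, r, c, Rb, Cb, Rd):
    """Return (best slack at driver, buffer locations counted from the sink)."""
    cands = [(sink_cap, sink_rat, ())]
    for i, x in enumerate(segments):
        # Add wire: propagate every candidate across segment i.
        cands = [(cap + c * x, q - r * c * x**2 / 2 - r * x * cap, bufs)
                 for cap, q, bufs in cands]
        # Insert buffer: all buffered candidates share load Cb, keep the best.
        if i < len(segments) - 1:
            cap, q, bufs = max(cands, key=lambda s: s[1] - Rb * s[0])
            cands.append((Cb, q - Rb * cap, bufs + (i,)))
        cands = prune(cands)
    # Add driver: pick the candidate with the largest slack.
    cap, q, bufs = max(cands, key=lambda s: s[1] - Rd * s[0])
    return q - Rd * cap, bufs


slack, buffers = buffer_line(20, 400, [2, 2, 2], r=3, c=5, Rb=1, Cb=5, Rd=1)
print(slack, buffers)  # 70.0 (0, 1)
```

Each candidate carries the tuple of buffer locations used to reach it, which plays the role of the arrows in the slides: location 0 is the candidate position nearest the sink.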

Exercise

• Without pruning, there will be an exponential number of candidate solutions (i.e., given n candidate buffer locations, there will be $2^n$ solutions). With pruning, how many solutions will we have?

Exercise

Unit Wire Cap = 1
Unit Wire Res = 1
Buffer C=1, R=1

[Figure: wire segments of length 2 with the partial candidate solutions (10,40), (8,50), (5,10), (15,40), (7,10), (9,30), (12,20).]

• Continue the following buffer insertion process. Assume that all partial candidate buffering solutions are as shown.

Summary

• Interconnect delay increases with technology scaling

• With optimal buffer insertion, interconnect delay grows linearly with wire length

• Buffer insertion with candidate buffer locations

• Pruning inferior candidates accelerates buffer insertion
