On-Chip Interconnect Trend and Design Optimization Chung-Kuan Cheng UC San Diego, La Jolla, CA


Outline
• Global Interconnect Technologies
  – RC Trees and Transmission Lines
• Prefix Adder Synthesis
  – Modeling
• FPGA Interconnect Architecture
  – Modeling
• Interconnect Architecture
  – Non-Manhattan Wire Arrangement

Interconnect Technologies
• Introduction
• On-Chip Global Interconnection
• Global Wire Modeling
• Performance Comparison

Introduction – Performance Impact
• Interconnect delay determines the system performance [ITRS08]
  – 542ps for a 1mm minimum-pitch Cu global wire w/o repeater @ 45nm
  – ~150ps for 10 levels of FO4 delay @ 45nm
[Ho2001] “Future of Wire”

Introduction – Power Dissipation
• Interconnects consume a significant portion of power
  – 1-2 orders of magnitude larger than gates
    • Half of the dynamic power is dissipated on repeaters to minimize latency [Zhang07]
  – Wires consume 50% of total dynamic power for a 0.13um microprocessor [Magen04]
    • About 1/3 is burned on the global wires.

Introduction – Technology Trend
• On-Chip Interconnect Scaling
  – Dimension shrinks
    • Wire resistance increases -> RC delay
    • Increasing capacitive coupling -> delay, power, noise, etc.
  – Performance of global wires decreases w/ technology scaling.

Scaling trend of PUL wire resistance and capacitance

Wire Category   Parameter      90nm     45nm     22nm
M1 Wire         Rw (kohm/mm)   1.914    8.860    34.827
M1 Wire         Cw (pF/mm)     0.183    0.157    0.129
Global Wire     Rw (kohm/mm)   0.532    2.970    11.000
Global Wire     Cw (pF/mm)     0.205    0.179    0.151

[Figure: Copper resistivity versus wire width]
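As a rough illustration of the scaling trend, the per-unit-length values in the table can be turned into an unrepeated wire delay estimate. The sketch below assumes the common 0.38·R·C approximation for the 50% delay of a distributed RC line and ignores driver and load; the function name and dictionary layout are illustrative and do not come from the slides.

```python
# Per-unit-length wire parameters copied from the table:
# (Rw in kohm/mm, Cw in pF/mm) for each technology node.
WIRES = {
    "M1":     {"90nm": (1.914, 0.183), "45nm": (8.860, 0.157), "22nm": (34.827, 0.129)},
    "Global": {"90nm": (0.532, 0.205), "45nm": (2.970, 0.179), "22nm": (11.000, 0.151)},
}

def distributed_rc_delay_ps(rw_kohm_per_mm, cw_pf_per_mm, length_mm):
    """Assumed model: 50% delay of an unrepeated distributed RC line,
    t = 0.38 * (Rw * L) * (Cw * L). kohm * pF = ns, so multiply by 1000 for ps."""
    return 0.38 * rw_kohm_per_mm * cw_pf_per_mm * length_mm ** 2 * 1000.0

for category, nodes in WIRES.items():
    for node, (rw, cw) in nodes.items():
        delay = distributed_rc_delay_ps(rw, cw, 1.0)
        print(f"{category:6s} {node}: {delay:7.1f} ps for 1 mm")
```

Under this assumed model the delay grows with the square of the wire length, which is why long global wires are normally broken up by repeaters.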


Organization of On-Chip Global Interconnections


Multi-Dimensional Design Consideration
Preliminary analysis results assuming 65nm CMOS process.
Application-oriented choice
• Low Latency: T-TL or UT-TL -> Single-Ended T-lines
• High Throughput: R-RC
• Low Power: PE-TL or UE-TL -> Differential T-lines
• Low Noise: PE-TL or UE-TL -> Differential T-lines
• Low Area/Cost: R-RC
For each architecture, the more area the pentagon covers, the better the overall performance.

On-Chip Global Interconnect Schemes (1)
Repeated RC wires (R-RC)
• R-RC structure
  – Repeater size/Length of segments
  – Adopt previous design methodology [Zhang07]
Un-Terminated and Terminated T-Line (UT-TL and T-TL)
• UT-TL structure
  – Full swing at wire-end
  – Tapered inverter chain as TX
• T-TL structure
  – Optimize eye-height at wire-end
  – Non-Tapered inverter chain as TX

On-Chip Global Interconnect Schemes (2)
Un-Equalized and Passive-Equalized T-Line (UE-TL and PE-TL)
• Driver side: Tapered differential driver
• Receiver side: Termination resistance, Sense-Amplifier (SA) + inverter chain
• Passive equalizer: parallel RC network
• Design Constraint: enough eye-opening (50mV) needed at the wire-end

Effects of driver impedance and termination resistance on step response
• Larger driver impedance leads to a slower rising edge and a lower saturation voltage
• Larger termination resistance causes a sharper rising edge but a larger reflection
[Figure annotation: Optimal Rload]

Bit-rate: 50Gbps
Rs=11.06ohm, Rd=350ohm, Cd=0.38pF, RL=107.69ohm

Global Wire Modeling – Single-Ended & Differential On-Chip T-lines
• Determine the bit rate
• Smallest wire dimensions that satisfy the eye constraint
• Notice PE-TL needs narrower wire -> Equalization helps to increase density.
• Orthogonal layers replaced by ground planes -> 2D cap extraction, accurate when loading density is high.
• Top-layer thick wires used -> dimensions are maintained as technology scales.
• LC-mode behavior dominant

Global Wire Modeling – RC wires and T-lines
• RC wire modeling
  – Distributed Π model composed of wire resistance and capacitance
  – Closed-form equations [Sim03] to calculate 2D wire capacitance
• T-line 2D-R(f)L(f)C parameter extraction
  – 2D-C Extraction Template
  – 2D-R(f)L(f) Extraction Template
• T-line Modeling
  – R(f)L(f)C Tabular model -> Transient simulation to estimate eye-height.
  – Synthesized compact circuit model [Kopcsay02] -> Study signal integrity issue.

Performance Analysis – Definitions
• Normalized delay (unit: ps/mm)

– Propagation delay includes wire delay and gate delay.

• Normalized energy per bit (unit: pJ/m)

– Bit rate is assumed to be the inverse of propagation delay for RC wires

• Normalized throughput (unit: Gbps/um)

Performance Analysis – Latency
• Variables: technology-defined parameters
  – Supply voltage: Vdd (unit: V)
  – Dielectric constant: $\epsilon_r$
  – Min-sized inverter FO4 delay (unit: ps)
• R-RC structure (min-d)
  – $c_{nmos} \propto S$, $r_w \propto 1/S^2$, $c_w \propto \epsilon_r$
  – Wire capacitance ($c_w \propto \epsilon_r$) is roughly constant
  – FO4 delay scales w/ scaling factor S
  – Normalized delay $\propto 1/\sqrt{S}$: increasing w/ technology scaling!
• T-line structures
  – Sum of wire delay and TX delay
  – Wire delay $\propto \sqrt{\epsilon_r}$
  – TX delay improved w/ FO4 delay
  – Decreasing w/ technology scaling!

Performance Analysis – Energy per Bit
• Same variables defined before
• R-RC structure (min-d)
  – Energy per bit $\propto \epsilon_r V_{DD}^2$
  – Vdd reduces as technology scales; $\epsilon_r$ reduces as technology scales
  – Energy decreases w/ technology scaling!
• T-line structures
  – Sum of power consumed on wire and TX
  – Power of T-line
  – Power of TX circuit $\propto f C V_{DD}^2$
  – FO4 delay reduces exponentially
  – Energy decreases w/ larger slope!!
[Slide annotation: Constant!]

Performance Analysis – Throughput
• Same variables defined before
• R-RC structure (min-d)
  – Assuming wire pitch
  – FO4 delay reduces exponentially
  – Throughput $\propto 1/\sqrt{S}$: increases by 20% per generation!
• T-line structures
  – TX bandwidth
  – Neglect the minor change of wire pitch
  – K1 = 0, for UT-TL
  – FO4 delay reduces exponentially
  – Throughput $\propto 1/S$: increases by 43% per generation!!
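The 20% and 43% per-generation figures are consistent with throughput trends of 1/√S for R-RC and 1/S for T-lines. The check below is a small sketch that assumes a linear scaling factor of S = 0.7 per generation, a typical value that the slide does not state explicitly.

```python
import math

S = 0.7  # assumed linear scaling factor per technology generation

rrc_gain = 1.0 / math.sqrt(S)   # R-RC throughput trend, proportional to 1/sqrt(S)
tline_gain = 1.0 / S            # T-line throughput trend, proportional to 1/S

print(f"R-RC:   +{(rrc_gain - 1) * 100:.1f}% per generation")
print(f"T-line: +{(tline_gain - 1) * 100:.1f}% per generation")
```

Rounded to whole percentages, the two gains come out at 20% and 43%.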

Design Framework for On-Chip T-line Schemes
• The proposed framework can be applied to design UT-TL/T-TL/UE-TL/PE-TL by changing the wire configuration and circuit structure.
• Different optimization routines (LP/ILP/SQP, etc.) can be adopted according to the problem formulation.

Experimental Settings
• Design objective: min-d
• Technology nodes: 90nm-22nm
• Five different global interconnection structures
• Wire length: 5mm
• Parameter extraction
  – 2D field solver CZ2D from EIP tool suite of IBM
  – Tabular model or synthesized model
• Transistor models
  – Predictive transistor model from [Uemura06]
  – Synopsys level 3 MOSFET model tuned according to ITRS roadmap
• Simulation
  – HSPICE 2005
• Modeling and Optimization
  – Linear or non-linear regression/SQP routine
  – MATLAB 2007

Performance Metric: Normalized Delay – Results and Comparison
• Technology trends
  – R-RC ↑
  – T-line schemes ↓
• T-line structures
  – Outperform R-RC beyond 90nm
  – Single-ended: lowest delay
• At 22nm node
  – R-RC: 55ps/mm
  – T-lines: 8ps/mm (85% reduction)
  – Speed of light: 5ps/mm
• Linear model
  – < 6% average percent error

Performance Metric: Normalized Energy per Bit – Results and Comparison
• Technology trends
  – R-RC and T-lines ↓
  – T-lines reduce more quickly
• T-line structures
  – Outperform R-RC beyond 45nm
  – Differential: lowest energy
  – Single-ended similar to R-RC; T-TL > UT-TL
• At 22nm node
  – R-RC: 100pJ/m
  – Single-ended: 60% reduction
  – Differential: 96% reduction
• Linear model
  – < 12% average percent error
  – Error for T-TL and PE-TL: RL and passive equalizers

Performance Metric: Normalized Throughput – Results and Comparison
• Technology trends
  – R-RC and T-lines ↑
  – T-lines increase more quickly
• T-line structures
  – Outperform R-RC beyond 32nm
  – Differential better than single-ended
• At 22nm node
  – R-RC: 12Gbps/um
  – T-TL: 30% improvement
  – UE-TL: 75% improvement
  – PE-TL: ~2X of R-RC
• Linear model
  – < 7% average percent error

Signal Integrity – single-ended T-lines
• Worst-case switching pattern for peak noise simulation
• UT-TL structure
  – 380mV peak noise at 1V supply voltage w/ 7ps rise time
  – SI could be a big issue as supply voltage drops
• T-TL less sensitive to noise
  – At the same rise time, ~50% reduction of peak noise
  – Peak noise ↓ as technology scales
[Figure labels: using w.c. pattern; using single or multiple PRBS patterns]

Signal Integrity – differential T-lines
• Worst-case switching pattern for peak noise simulation
• More reliable
  – Termination resistance
  – Common-mode noise reduction
• Peak noise
  – Within ~10mV range
• Eye-heights
  – UE-TL: eye reduces as bit rate ↑; harder to meet constraint
  – PE-TL: > 70mV eye even at 22nm node; equalization does help!

Summary (cont'd)

Item in the table: score/value. Score: the higher, the better in terms of the given metric; max. score is 5. The best structure in each column is marked in red on the slide.

Low-Latency Application (ps/mm)
Scheme   90nm    65nm    45nm    32nm    22nm
R-RC     3/35    1/42    1/46    1/55    1/55
UT-TL    5/15    5/13    5/10    5/9     5/8
T-TL     5/15    5/13    5/10    5/9     5/8
UE-TL    1/37    3/25    3/16    3/12    5/8
PE-TL    1/37    3/25    3/16    3/12    5/8

High-Throughput Application (Gbps/um)
Scheme   90nm    65nm    45nm    32nm    22nm
R-RC     5/5     5/6     3/8     3/10    2/12
UT-TL    2/3.3   1/3.3   1/3.3   1/3.3   1/3.3
T-TL     1/3     2/3.4   2/6     2/9     3/16
UE-TL    3/3     3/5     4/9     4/13    4/21
PE-TL    4/4     4/5.3   5/9     5/15    5/24

Low-Energy Application (pJ/m)
Scheme   90nm    65nm    45nm    32nm    22nm
R-RC     2/150   2/140   1/130   1/100   1/100
UT-TL    3/140   3/110   3/70    3/50    2/40
T-TL     1/260   1/200   2/100   2/60    3/40
UE-TL    4/60    4/36    4/20    4/10    5/4
PE-TL    5/26    5/16    5/8     5/5     5/2

Low-Noise Application (score only)
Scheme   90nm    65nm    45nm    32nm    22nm
R-RC     1       1       1       1       1
UT-TL    1       1       1       1       1
T-TL     3       3       3       3       3
UE-TL    5       5       4       4       4
PE-TL    4       4       5       5       5

Summary of Global Interconnect
• Compare five different global interconnections in terms of latency, energy per bit, throughput and signal integrity from 90nm to 22nm.
• A simple linear model is provided to link
  – Architecture-level performance metrics
  – Technology-defined parameters
• Some observations from experimental results
  – T-line structures have the potential to replace R-RC at future nodes
  – Differential T-lines are better than single-ended
    • Low-power/High-throughput/Low-noise
  – Equalization could be utilized for on-chip global interconnection
    • Higher throughput density, improved signal integrity
    • Even w/ lower energy dissipation (passive equalization)

Prefix Adder Synthesis
• Motivation
• Prefix Adder Formulation
  – Area/Timing/Power Models
  – Mixed-Radix (2,3,4) Adders
  – ILP Formulation
• Experimental Results

Motivation: Prefix Adder
• Increasing impact of physical design and concern of power.
[Figure: design factors. Logical levels, wire tracks, fanouts. Area: physical placement, detail routing. Timing: gate cap, wire cap, gate sizing, buffer insertion, signal slope, input arrival time, output required time. Power: static power, dynamic power, power gating, activity probability.]

Prefix Adder Formulation
• Input: two n-bit binary numbers $a_{n-1} \ldots a_1 a_0$ and $b_{n-1} \ldots b_1 b_0$, one-bit carry-in $c_0$
• Output: n-bit sum $s_{n-1} \ldots s_1 s_0$ and one-bit carry-out $c_n$
• Prefix Addition: carry generation & propagation
  – Generate: $g_i = a_i b_i$
  – Propagate: $p_i = a_i \oplus b_i$
  – $c_{i+1} = g_i + p_i c_i$
  – $s_i = c_i \oplus (a_i \oplus b_i)$

Prefix Addition – Formulation
• Pre-processing:
  $g_i = a_i b_i$, $p_i = a_i \oplus b_i$
• Prefix Computation:
  $G_{[i:k]} = G_{[i:j]} + P_{[i:j]} G_{[j-1:k]}$
  $P_{[i:k]} = P_{[i:j]} P_{[j-1:k]}$
• Post-processing:
  $c_{i+1} = G_{[i:0]} + P_{[i:0]} c_0$
  $s_i = p_i \oplus c_i$
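The three stages can be sketched in a few lines of Python. This is a minimal illustration, not the synthesized adder from the talk: it uses a serial (ripple) prefix structure, the simplest valid choice, and the function names `gp_combine` and `prefix_add` are made up for the example.

```python
def gp_combine(left, right):
    """GP cell: (G, P)[i:k] from (G, P)[i:j] (left) and (G, P)[j-1:k] (right)."""
    g_l, p_l = left
    g_r, p_r = right
    return (g_l | (p_l & g_r), p_l & p_r)

def prefix_add(a, b, n, c0=0):
    """Add two n-bit numbers; returns (n-bit sum, carry-out)."""
    # Pre-processing: per-bit generate and propagate signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    # Prefix computation with a serial prefix structure (an assumption made
    # for brevity); any valid prefix graph yields the same (G, P)[i:0] values.
    prefix = [(g[0], p[0])]
    for i in range(1, n):
        prefix.append(gp_combine((g[i], p[i]), prefix[i - 1]))

    # Post-processing: c_{i+1} = G[i:0] + P[i:0] c_0 and s_i = p_i xor c_i.
    carry = [c0] + [G | (P & c0) for (G, P) in prefix]
    s = [p[i] ^ carry[i] for i in range(n)]
    return sum(bit << i for i, bit in enumerate(s)), carry[n]

print(prefix_add(11, 6, 4))  # 11 + 6 = 17: 4-bit sum 1, carry-out 1
```

Replacing the serial loop with a different combination order changes depth, fanout, and wiring, which is the design space the ILP formulation explores, while the arithmetic result stays the same.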

Prefix Adder – Prefix Structure Graph
[Figure: prefix structure graph over bit positions 4, 3, 2, 1 with outputs 4:1, 3:1, 2:1, 1. Pre-processing: gp generator (inputs a_i, b_i; outputs g_i, p_i). Prefix Computation: GP cell combining GP[i, j] and GP[j-1, k] into GP[i, k]. Post-processing: sum generator (inputs p_i and G[i:0]; output s_i).]

Area Model
• Distinguish physical placement from logical structure, but keep the bit-slice structure.
[Figure: logical view (bit position 8 to 1 vs. logical level) and physical view (bit position 8 to 1 vs. physical level) after compact placement]

Timing Model
• Cell delay calculation: $d = f + p$
  – $f$: effort delay
  – $p$: intrinsic delay
• Effort delay: $f = g h$
  – $g$: logical effort
  – $h$: electrical effort = Cout/Cin = (fanouts + wirelength) / size
• Intrinsic properties of the cell
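The delay model is the standard logical-effort form, and a short sketch makes the role of cell size visible. The numeric values for g, p, and the loads are hypothetical, chosen only for illustration; the function name is not from the slides.

```python
def cell_delay(g, p, fanout_cap, wire_cap, size):
    """d = f + p with f = g * h and h = Cout / Cin = (fanouts + wirelength) / size."""
    h = (fanout_cap + wire_cap) / size   # electrical effort
    f = g * h                            # effort delay
    return f + p

# Hypothetical GP cell: g = 1.5, p = 2.0, driving 3 unit loads plus 1 unit of wire.
print(cell_delay(g=1.5, p=2.0, fanout_cap=3.0, wire_cap=1.0, size=1.0))  # 8.0
print(cell_delay(g=1.5, p=2.0, fanout_cap=3.0, wire_cap=1.0, size=2.0))  # 5.0
```

Doubling the cell size halves the effort delay but leaves the intrinsic delay unchanged, which is why gate sizing trades area and power against timing.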

Power Model
• Total power consumption: dynamic power + static power, Ptotal = Pdyn + Psta
• Static power: leakage current of devices; Psta is proportional to #cells
• Dynamic power: current switching capacitance; Pdyn is proportional to Cload
• The dynamic-power coefficient is the switching probability, which depends on j (j is the logical level*)
* Vanichayobon S, et al., “Power-speed Trade-off in Parallel Prefix Circuits”

Interval Adjacency Constraint
[Figure: 8-bit prefix graph over columns 8 to 1 with outputs H8 to H1; nodes are labeled (column id, logic level)]
• (7,3): Interval [7,1]
• (7,2): Interval [7,4]
• (3,2): Interval [3,1]
• The two input intervals must be adjacent, i.e. 4 = 3 + 1
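The adjacency rule can be written as a small check on (high, low) bit intervals. This is an illustrative sketch; the function name `merge_intervals` is hypothetical and the ILP itself encodes the rule with linear constraints rather than code like this.

```python
def merge_intervals(left, right):
    """Combine the intervals of a GP cell's two inputs.

    Intervals are (high, low) bit ranges. The left input covers the more
    significant bits; its low bound must equal the right input's high bound + 1.
    """
    left_hi, left_lo = left
    right_hi, right_lo = right
    if left_lo != right_hi + 1:
        raise ValueError(f"intervals {left} and {right} are not adjacent")
    return (left_hi, right_lo)

# Example from the slide: node (7,2) covers [7,4], node (3,2) covers [3,1].
print(merge_intervals((7, 4), (3, 1)))  # (7, 1), the interval of node (7,3)
```

A pair such as [7,4] and [2,1] is rejected because bit 3 would be skipped.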

Linearization for Interval Adjacency Constraint
• Node (i, j) takes its left input from (i, h) through wire wl and its right inputs from (k1, l1), (k2, l2) through wires wr1, wr2.
• Intervals: $[y^L_{(i,j)}, y^R_{(i,j)}]$, $[y^L_{(i,h)}, y^R_{(i,h)}]$, $[y^L_{(k1,l1)}, y^R_{(k1,l1)}]$, $[y^L_{(k2,l2)}, y^R_{(k2,l2)}]$
• Adjacency constraint: $y^R_{(i,h)} = y^L_{(k,l)} + 1$ if $wl(i,j,h) = 1$ and $wr(i,j,k,l) = 1$
• Left interval bound equal to column index: $y^L_{(i,j)} = i$
• Pseudo linear: with the left bound fixed to the column index, the constraint on $y^R_{(i,h)}$ is written in terms of $k$ and $wr(i,j,k,l)$ over the candidate right inputs $(k,l)$, conditional on $wl(i,j,h) = 1$
• Linearize: the condition on $wl(i,j,h)$ is replaced by a pair of inequalities with a slack term of the form $n\,(1 - wl(i,j,h))$

ILP Formulation Overview
• Structure variables:
  – GP cells
  – Connections (wires)
  – Physical positions
• Capacitance variables:
  – Gate cap
  – Vertical wire cap
  – Horizontal wire cap
• Timing variables:
  – Input arrival time
  – Output arrival time
• Power Objective
• ILP -> ILOG CPLEX -> Optimal Solution


Experiments – 16-bit Uniform Timing


Min-Power Radix-2 Adder (delay = 22, power = 45.5 FO4)
[Figure: prefix graph over bit positions 1 to 16]

Min-Power Radix-2&4 Adder (delay = 18, power = 29.75 FO4)
[Figure: prefix graph over bit positions 1 to 16; legend: Radix-2 Cell, Radix-4 Cell]

Min-Power Mixed-Radix Adder (delay = 20, power = 28.0 FO4)
[Figure: prefix graph over bit positions 1 to 16; legend: Radix-2 Cell, Radix-4 Cell, Radix-3 Cell]

Experiments – 64-bit Hierarchical Structure (Mixed-Radix)
• Handle high bit-width applications
• 16x4 and 8x8
[Figure: two-level hierarchy. Level 1: four ILP blocks over inputs a1b1 to a16b16, a17b17 to a32b32, a33b33 to a48b48, and a49b49 to a64b64, producing GP*[16:2], GP*[32:18], GP*[48:34], GP*[64:50] and GP*[1:1], GP*[17:17], GP*[33:33], GP*[49:49]. Level 2: one ILP block. Outputs H64 to H49, H48 to H33, H32 to H17, H16 to H1.]

FPGA Global Routing Architecture
• Synthesis Flow
• Formulation
• Experimental Results


Synthesis Flow

Formulation
[Figure: Architecture Design Tradeoffs among latency, power, area, and cost]


Energy Model: Wires
• 0.18um tech node, grid length = 0.5mm
• 4 types of wires: RC wires with 1x, 2x, and 4x spacing, and a transmission line (T-line 10x)
[Figure: Pw, per-bit wire energy (pJ/switch, 0 to 6) versus wire length (x grid length, 1 to 8) for RC 1x, RC 2x, RC 4x, and T-line 10x]

Energy and Area Model: Switch Box
• Switch Area Model
  – Fs: Number of switches connected to each wire entering a switch box
  – f: Total flow entering a switch box
  – Ns: Per-bit number of switches inside a switch box
• Energy Model
  – Pu: Energy of a single switch
  – Ps: Per-bit switch energy
• Ns and Ps are given as functions of f, Fs, and Pu (with a factor of 1/2)
[Figure: switch box, labeled W]

Topology Generation
• Candidate topologies are required for MCF interconnection synthesis
  – MCF optimizes flow distribution, but not topology
• A huge number of different topologies exists
  – A row of 10 cells has 2^C(10, 2) = 2^45 different connections
  – A 10x10 FPGA has (2^45)^20 = 2^900 different topologies!
• Our assumptions
  – Each row and column has the same connection
  – Wire lengths are given (e.g. wire length = 1, 2, 4, 8…)
  – A certain wire length repeats itself till the end of the chip
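The topology count can be checked directly. The sketch assumes, as the arithmetic on the slide implies, that every pair of cells in a row may or may not be connected and that a 10x10 array contributes 10 rows plus 10 columns.

```python
import math

cells_per_row = 10
pairs = math.comb(cells_per_row, 2)        # possible point-to-point wires in one row
connections_per_row = 2 ** pairs           # each wire is either present or absent

rows_and_columns = 10 + 10                 # a 10x10 array has 10 rows and 10 columns
topologies = connections_per_row ** rows_and_columns

print(pairs)                               # 45
print(topologies == 2 ** 900)              # True
```

The size of this space is why the flow restricts candidates to regular, repeating segment lengths.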

Representative Netlist Generation
• Properties of Representative Netlist
  – Matches the size of the benchmark netlists
• Geometry Distribution Function
  – The probability of the distance between two pins decreases exponentially when distance increases
  – k: distance between pins
  – p: probability of distance-1 links
  – P(k): probability of distance-k links
  $P(k) = p\,(1 - p)^{k-1}, \quad k = 1, 2, \ldots$
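The distribution P(k) = p(1 − p)^(k−1) is a geometric distribution. A short sketch, with a hypothetical value p = 0.5 chosen only for illustration, shows how quickly long links become rare.

```python
def link_probability(k, p):
    """P(k) = p * (1 - p)**(k - 1): probability of a distance-k link, k = 1, 2, ..."""
    if k < 1:
        raise ValueError("distance k must be at least 1")
    return p * (1.0 - p) ** (k - 1)

p = 0.5  # hypothetical probability of distance-1 links
for k in range(1, 6):
    print(k, link_probability(k, p))

# The probabilities form a geometric series that sums to 1.
total = sum(link_probability(k, p) for k in range(1, 200))
print(total)
```

With p = 0.5, each additional unit of distance halves the probability of a link.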

MCF Interconnection Synthesis
• Integrate multiple wire styles into the MCF formulation
• Notations
  – Wire style parameter: (Pe, Ae), Pe=Pw+Ps
  – Area Ar: Routing area on vertical and horizontal dimensions
  – dj: Communication demand for net j, dj=1
  – Flow f(t): flow amount on a Steiner tree t

MCF Formulation: Energy Optimization
• Obj: Min Energy
• Routability constr.
• Routing Area constr.

Experiment Settings
• Seven of the MCNC benchmark circuits
  – Technology mapped to 4-LUTs, each logic block contains 16 4-LUTs
  – Size of 10x10 to 11x11 switch boxes, 500 ~ 1000 nets
• Candidate topologies
  – Available segment length = 1, 2, 4, 8
  – Total number of candidate topologies: 93

Benchmark   alu4    apex4   diffeq  dsip    ex5p    misex3  tseng
Size        11x11   10x10   11x11   11x11   10x10   11x11   10x10
# of nets   621     798     945     593     745     771     788

Energy Optimization: Optimized FPGA Routing Architectures

Routing Area (um)   Energy (x10^3 pJ)   Energy Impv
1500                6.46                -
2500                5.24                19%
3500                4.74                27%
4500                4.63                28%

[Figure legend: RC 1x, RC 2x, RC 4x, T-Line 10x]
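The reported improvements can be reproduced from the four energy values, assuming each improvement is measured against the smallest routing-area budget (1500) and that the energies and areas pair up in the order they appear on the slide.

```python
# Total energy (x10^3 pJ) at each routing-area budget, as listed on the slide.
energy = {1500: 6.46, 2500: 5.24, 3500: 4.74, 4500: 4.63}

baseline = energy[1500]  # assumed reference point for the improvement figures
improvement = {
    area: round((baseline - e) / baseline * 100)
    for area, e in energy.items()
    if area != 1500
}
print(improvement)  # {2500: 19, 3500: 27, 4500: 28}
```

The gain flattens between the last two budgets, so extra routing area buys little additional energy reduction.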

Energy Optimization: Impact of Routing Area
• Total energy of the 7 benchmarks with optimized FPGA routing architectures
[Figure: energy (x10^3 pJ, 1.2 to 4.7) versus routing area (um, 1500 to 4500) for alu4, apex4, diffeq, dsip, ex5p, misex3, tseng]

Interconnect Architecture
1. Wire Directions (M, Y, X, E)
2. Layout Region (M, D, Y, X)
3. Power Ground and Clock Distributions
4. Layer Assignment
5. Via Arrangement

Comparison
1. Wire Length
2. Throughput
3. Grid vs No-grid

1. Wire Directions and Models
[Figure: 7 by 7 meshes with different interconnect architectures. (a) A 7 by 7 mesh with Y-architecture. (b) A 7 by 7 mesh with Manhattan-architecture. (c) A 7 by 7 mesh with X-architecture.]

2. Layout Regions and Models
[Fig. 10: Meshes with symmetrical structures. (a) A level 2 hexagonal mesh. (b) A level 2 octagonal mesh. (c) A level 2 diamond mesh.]

Length of 2-pin nets to extend an area

Shape        Man.    Y-Arch  X-Arch  Euclidean
M: Diamond   1.250   1.118   1.066   1.016
Y: Hexagon           1.101
X: Octagon                   1.055
E: Circle    1.273   1.103   1.055   1.000
E (worst)    1.414   1.155   1.082   1.000
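The "E: Circle" and "E (worst)" rows match a simple geometric argument. The sketch below assumes wires may run along m evenly spaced orientations (2 for Manhattan, 3 for Y, 4 for X), that a unit vector is decomposed along its two nearest orientations, and that the average is taken over uniformly distributed directions; the closed forms are derived under those assumptions and are not quoted from the slides.

```python
import math

def detour_ratios(num_directions):
    """Average and worst-case wirelength / Euclidean length when wires may use
    `num_directions` evenly spaced orientations (2: Manhattan, 3: Y, 4: X)."""
    theta = math.pi / num_directions                 # angle between adjacent orientations
    average = 2.0 * math.tan(theta / 2.0) / theta    # mean over uniformly random directions
    worst = 1.0 / math.cos(theta / 2.0)              # target halfway between two orientations
    return average, worst

for name, m in (("Manhattan", 2), ("Y-Arch", 3), ("X-Arch", 4)):
    avg, worst = detour_ratios(m)
    print(f"{name:9s} average {avg:.3f}  worst {worst:.3f}")
```

Rounded to three decimals, this gives 1.273/1.414, 1.103/1.155, and 1.055/1.082, in agreement with the two rows of the table.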

Throughput: concurrent flow demand

Shape        Manhattan  Y-Arch  X-Arch*
M: Square    1.000      1.225   1.346
M (Bound)               1.241   1.356
M: Diamond   1.195
Y: Hexagon              1.315
X: Octagon                      1.420
*ratio of 0-90 planes and 45-135 planes is not fixed


Flow congestion map for uniform 90 Degree meshes


Congestion map of square chip using X-architecture (12 by 12 and 13 by 13)

Congestion map of square chip using Y-architecture (12 by 12 and 13 by 13)

Explanation for Throughput Increase
[Figure: (a) 90-degree routing, (b) 45-degree routing; dimension d]
Number of lines across the vertical center cut-line:
• $d/D$ for 90-degree routing
• $\sqrt{2}\,d/D$ for 45-degree routing


Global Grids (Power/Ground Mesh)

(http://www.xinitiative.org/img/062102forum.pdf)

X-Architecture Y-Architecture

3. Clock Tree on Square Mesh
• N-level clock tree:
  – path distance: 21% less than H-tree
  – total wire length: 9% less than H-tree, 3% less than X-tree
• No self-overlapping between parallel wire segments

4. Layer Assignment
[Figure: different routing direction assignments I, II, III, IV across Layer 1 to Layer 4]

Normalized throughput of mixed 45-degree and 90-degree mesh with different routing layer assignments

N    z(I)   z(II)  z(III)  z(IV)
5    1.02   0.83   0.83    1.01
6    0.97   0.73   0.74    0.97
7    0.94   0.71   0.71    0.93
8    0.90   0.69   0.69    0.90

Why Does Interleaving Manhattan Layers and Diagonal Layers Improve Throughput?
• The shortest path between two points on the plane is always a concatenation of a Manhattan line and a diagonal line.
• Example: from (2,0) to (0,3)
  – Manhattan only: wirelength = 5.0
  – Manhattan plus diagonal: wirelength = 3.82
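The two wirelengths in the example can be computed from the coordinate differences. This sketch assumes diagonal wires run at 45 degrees; the helper names are illustrative. The exact mixed-route value is 2√2 + 1 ≈ 3.828, which the slide lists as 3.82.

```python
import math

def manhattan_length(dx, dy):
    """Wirelength using only horizontal and vertical segments."""
    return abs(dx) + abs(dy)

def octilinear_length(dx, dy):
    """Shortest length using one diagonal (45-degree) run plus one Manhattan run."""
    dx, dy = abs(dx), abs(dy)
    diagonal = min(dx, dy)            # distance covered on both axes by the diagonal run
    straight = max(dx, dy) - diagonal
    return diagonal * math.sqrt(2.0) + straight

# Points from the slide: (2,0) and (0,3).
dx, dy = 0 - 2, 3 - 0
print(manhattan_length(dx, dy))              # 5
print(round(octilinear_length(dx, dy), 3))   # 3.828
```

Because the shortest route mixes one diagonal and one Manhattan segment, placing the two layer types next to each other keeps the via stack between them short.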


Observations

• Routing direction assignment strategies can affect the communication throughput.

• Interleaving the Manhattan routing layers and diagonal routing layers can produce better throughput.

5. Via Arrangement: Banks and Tunnels
• Use tunnels to detour around vias
• Use banks of tunnels to maximize the throughput
• Use bottom k layers to perform intra-cell routing
• Use top n-k layers to distribute signals to the banks

Via-Oriented Interconnect Planning
[Figure: tunnel]
Bank of tunnels
• Full bandwidth
• k+2 overhead
• #vias = kL
• Overhead = k+2 vertical tracks
• L: dimension of the bank

Tunnel of Y Arch.
• Blocking 5 tracks on the layer of 60-degree direction

Tunnels of Y Arch.

3.2 Via-Oriented Interconnect Planning
Bank of tunnels
• #vias = c1kL
• Overhead = k+c2 tracks

Conclusion
• Global Interconnect Technologies
  – EM waves + Devices
• Prefix Adder Synthesis
  – Formulation + ILP
• FPGA Interconnect Architecture
  – Formulation + LP
• Interconnect Architecture
  – Lambda Geometry + Vias

Thank you! Q & A