
A Bus Architecture for Crosstalk Elimination in High Performance Processor Design Wen-Wen Hsieh Advisor : Ting Ting Hwang


Outline

Introduction
Motivation and Observation
The Proposed Bus Architecture
Experimental Results
Conclusion

Introduction

Crosstalk is the effect caused by coupling capacitances.

Crosstalk causes additional delay, extra power consumption, and incorrect results in a circuit.

The crosstalk effect becomes much more serious on a long on-chip bus.

Crosstalk Type

Crosstalk is classified into 4 types [Duan2001]: 1C, 2C, 3C, and 4C.

[Figure: four panels (1C, 2C, 3C, 4C), each showing the transitions on wires W_{j-1}, W_j, and W_{j+1} between cycles T_{i-1} and T_i]
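As a rough illustration of the classification, the sketch below assigns a type to wire W_j from its own transition and those of its two neighbours between cycles T_{i-1} and T_i. It assumes the usual effective-capacitance model, in which a switching wire sees |d_j - d_k| coupling units from each neighbour k (d is the transition: -1, 0, or +1). That model is an assumption here, not something stated on the slide, and the function name and the 0/1 list representation are hypothetical.

```python
def crosstalk_class(prev, curr, j):
    """Return the crosstalk type (0..4, meaning 0C..4C) seen by wire j.

    prev and curr are lists of 0/1 wire values at cycles T_{i-1} and T_i.
    Assumption: a missing neighbour (bus edge) contributes no coupling.
    """
    d = curr[j] - prev[j]
    if d == 0:
        return 0  # the wire does not switch, so it has no delay penalty
    total = 0
    for k in (j - 1, j + 1):
        if 0 <= k < len(curr):
            total += abs(d - (curr[k] - prev[k]))
    return total


# Middle wire rises while both neighbours fall: the worst case.
print(crosstalk_class([1, 0, 1], [0, 1, 0], 1))  # 4
# Middle wire rises while both neighbours stay quiet.
print(crosstalk_class([0, 0, 0], [0, 1, 0], 1))  # 2
```

Under this model, 3C and 4C arise only when at least one neighbour switches in the opposite direction to W_j.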

Delay with / without Crosstalk

Delay comparison for a bus length of 10 mm in a 100 nm process [Duan2001].

[Figure: bar chart of delay, Time (ps) on a 0 to 1000 axis, for crosstalk types 0C, 1C, 2C, 3C, and 4C]

Bit Ratio of 3C and 4C

3C and 4C types of crosstalk cause a serious delay penalty but account for only a small portion of the total transmitted data.

benchmark      bits of instruction   bits of 3C and 4C   ratio of 3C and 4C (%)
multiply       180736                6430                3.6
update         576480                20256               3.5
convolution    168192                5914                3.5
dot_product    108256                4070                3.8
fir2dim        195296                7500                3.8
fir            134016                5048                3.8
iir_nsection   301440                10698               3.5
matrix         107424                3983                3.6
lms            2036064               73427               3.6

Fetch Rate and Commit Rate

In a superscalar architecture, the instruction fetch rate is much higher than the instruction commit rate in bus transmission.

[Figure: bar chart of commit rate (0% to 100%) for multiply, dot_product, update, matrix, convolution, iir_nsection, fir, lms, and fir2dim; average 36.03%]

Basic Architecture

[Figure: block diagram with Memory, de-assembler, bus, assembler, Prefetch unit, and Processor; the connections are labeled with widths m, b, and b+n]

Bus Structure

bus width = 128, channel number = 4, channel size = 32

[Figure: the bus between Memory and the Prefetch unit is divided into channel1, channel2, channel3, and channel4, carrying data_{T,1}, data_{T,2}, data_{T,3}, and data_{T,4}]

An Example at Cycle t

bus width = 128, channel number = 4, channel size = 32

The data sent at cycle t-1 (data_{t-1,1} to data_{t-1,4}) are recorded.

[Figure: Memory sends data_{t,1} to data_{t,4} to the Prefetch unit over channel1 to channel4. Each candidate segment is checked for crosstalk against the recorded data on its channel. In the example, the four channels end up carrying NOP, data_{t,1}, NOP, and data_{t,2}.]
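The example can be read as a greedy, in-order policy: for each channel, send the next pending segment unless it would cause 3C or 4C crosstalk against the word recorded on that channel, in which case send a NOP. The sketch below implements that reading. It is an assumption based on the figure, not the authors' stated algorithm; the names `has_3c_or_4c` and `schedule_cycle` are hypothetical, crosstalk is checked inside each channel only (separation bits are ignored), and the NOP is an abstract placeholder rather than a wire pattern.

```python
NOP = None  # placeholder for a NOP segment (its wire encoding is not modelled)


def has_3c_or_4c(prev_word, next_word, width=32):
    """True if any wire of the channel would see 3C or 4C crosstalk."""
    def bit(word, k):
        return (word >> k) & 1

    for j in range(width):
        d = bit(next_word, j) - bit(prev_word, j)
        if d == 0:
            continue  # a quiet wire has no delay penalty
        coupling = 0
        for k in (j - 1, j + 1):
            if 0 <= k < width:
                coupling += abs(d - (bit(next_word, k) - bit(prev_word, k)))
        if coupling >= 3:
            return True
    return False


def schedule_cycle(pending, recorded, width=32):
    """Fill the channels for one cycle.

    pending:  data segments still to send, in order.
    recorded: word last sent on each channel (cycle t-1).
    Returns (channel contents, segments left for later cycles).
    """
    pending = list(pending)
    sent = []
    for prev_word in recorded:
        if pending and not has_3c_or_4c(prev_word, pending[0], width):
            sent.append(pending.pop(0))
        else:
            sent.append(NOP)
    return sent, pending


# 4-bit channels for readability; reproduces the NOP, data1, NOP, data2 pattern.
recorded = [0b0101, 0b0000, 0b1010, 0b0000]
pending = [0b1010, 0b0101, 0b1111, 0b0011]
sent, left = schedule_cycle(pending, recorded, width=4)
print(sent)  # [None, 10, None, 5]
print(left)  # [15, 3]
```

Segments that do not fit in the current cycle stay in `left` and would be offered again in the next cycle.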

Separation Bits

Crosstalk elimination between adjacent data segments.

Distinguish a data segment from a NOP segment.

[Figure: separation bits (X X) placed between data_{t,i} and data_{t,i+1}; annotations ask "data or NOP?" and mark the boundary "NO crosstalk"]

Crosstalk Free Connection

[Figure: the boundary between data_{t,i} and data_{t,i+1}, with the separation bits (X X) and the neighbouring data bits (X, ? ?) shown against the pattern 0 0 0]

Crosstalk Free Connection

[Figure: graph over the eight 3-bit patterns 000, 001, 010, 011, 100, 101, 110, and 111, showing which patterns can be connected without crosstalk]

Crosstalk Free Cyclic

Any pair of patterns in a crosstalk free cyclic incurs no crosstalk.

[Figure: the graph over the eight 3-bit patterns 000, 001, 010, 011, 100, 101, 110, and 111, with the crosstalk free cyclic highlighted]

Data Segment Combination

[Figure: combinations of the boundary bits of data_{t,i} and data_{t,i+1}, drawn for two cases: data_{t,i} is REAL DATA, and data_{t,i} is NOP]

Separation Bits Assignment

data_{t,i} is REAL DATA: separation bits are 1 0
data_{t,i} is NOP: separation bits are 0 0

[Figure: the boundary-bit combinations of data_{t,i} and data_{t,i+1} for both cases, with the assigned separation bits inserted between the two segments]

De-Assembler Architecture

[Figure: de-assembler block diagram with a NOP source, four registers (reg), ten cross-detectors, Sel_logic, four MUX1 and four MUX2 multiplexers, and a separation unit]

Data segments:
data1 [127:96], data2 [95:64], data3 [63:32], data4 [31:0]

Bus wires:
channel1 [134:103], separation bits [102:101]
channel2 [100:69], separation bits [68:67]
channel3 [66:35], separation bits [34:33]
channel4 [32:1], separation bits [0]
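The bit ranges on this slide describe a 135-wire bus word: four 32-bit channels interleaved with 2 + 2 + 2 + 1 separation bits. The sketch below packs and unpacks such a word using exactly those ranges. The helper names (`put`, `get`, `pack`, `unpack`) are hypothetical, and the separation-bit values are passed in by the caller, so the sketch makes no claim about how the separation unit chooses them.

```python
# Bit ranges copied from the slide: (high, low) positions on the 135-wire bus.
CHANNELS = [(134, 103), (100, 69), (66, 35), (32, 1)]
SEPARATORS = [(102, 101), (68, 67), (34, 33), (0, 0)]


def put(bus, high, low, value):
    """Write value into bus[high:low] and return the new bus word."""
    mask = (1 << (high - low + 1)) - 1
    return (bus & ~(mask << low)) | ((value & mask) << low)


def get(bus, high, low):
    """Read the field bus[high:low]."""
    return (bus >> low) & ((1 << (high - low + 1)) - 1)


def pack(segments, separators):
    """Place four 32-bit segments and four separator fields on the bus."""
    bus = 0
    for (hi, lo), seg in zip(CHANNELS, segments):
        bus = put(bus, hi, lo, seg)
    for (hi, lo), sep in zip(SEPARATORS, separators):
        bus = put(bus, hi, lo, sep)
    return bus


def unpack(bus):
    """Recover the segments and separator fields from a bus word."""
    segments = [get(bus, hi, lo) for hi, lo in CHANNELS]
    separators = [get(bus, hi, lo) for hi, lo in SEPARATORS]
    return segments, separators


# Total width: 4 * 32 data wires plus 7 separation wires.
total = sum(hi - lo + 1 for hi, lo in CHANNELS + SEPARATORS)
print(total)  # 135
```

A round trip through `pack` and `unpack` returns the original fields, which is a convenient sanity check that the ranges do not overlap.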


Assembler Architecture

[Figure: assembler block diagram with DSel_logic and four multiplexers (MUX1 to MUX4) feeding the Prefetch unit (buffer queue)]

Bus wires:
channel1 [134:103], separation bits [102:101]
channel2 [100:69], separation bits [68:67]
channel3 [66:35], separation bits [34:33]
channel4 [32:1], separation bits [0]

Performance Improvement

tech                     100nm                       70nm
bus length               10mm     15mm     20mm      10mm     15mm     20mm
0C                       1.00     1.00     1.00      1.00     1.00     1.00
1C                       1.94     1.89     1.73      1.61     1.57     1.74
2C                       5.91     6.08     5.21      4.28     4.49     4.84
3C                       6.64     7.14     6.62      5.11     6.39     7.58
4C                       7.57     8.50     7.66      5.87     8.04     9.86
deassembler              0.51     0.24     0.12      0.26     0.12     0.08
assembler                0.22     0.10     0.05      0.11     0.05     0.03
improvement ratio (%)    12.15    24.40    29.73     20.83    41.98    49.77
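The slides do not spell out how the improvement ratio is derived. One reading that comes close to the reported figures compares the worst case without the scheme (the 4C delay) against the worst case with it (the 2C delay plus the de-assembler and assembler delays). The sketch below evaluates that assumed formula on the table values. The formula and the name `improvement_ratio` are assumptions, and the small gaps to the reported ratios suggest either rounding in the table entries or a slightly different formula in the original work.

```python
# (process, bus length) -> delays and reported ratio, copied from the table.
TABLE = {
    ("100nm", "10mm"): dict(c2=5.91, c4=7.57, deasm=0.51, asm=0.22, reported=12.15),
    ("100nm", "15mm"): dict(c2=6.08, c4=8.50, deasm=0.24, asm=0.10, reported=24.40),
    ("100nm", "20mm"): dict(c2=5.21, c4=7.66, deasm=0.12, asm=0.05, reported=29.73),
    ("70nm", "10mm"): dict(c2=4.28, c4=5.87, deasm=0.26, asm=0.11, reported=20.83),
    ("70nm", "15mm"): dict(c2=4.49, c4=8.04, deasm=0.12, asm=0.05, reported=41.98),
    ("70nm", "20mm"): dict(c2=4.84, c4=9.86, deasm=0.08, asm=0.03, reported=49.77),
}


def improvement_ratio(c2, c4, deasm, asm):
    """Assumed formula: saving of (2C + codec delay) relative to 4C, in percent."""
    return 100.0 * (c4 - (c2 + deasm + asm)) / c4


for key, row in TABLE.items():
    estimate = improvement_ratio(row["c2"], row["c4"], row["deasm"], row["asm"])
    print(key, round(estimate, 2), "reported:", row["reported"])
```

Under this reading the gain grows with bus length because the fixed de-assembler and assembler delays shrink relative to the wire delay.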

Extra Wires Number Comparison

The number of extra wires compared with Victor's work [Victor2001].

             Ours (channel size)          Victor's       Victor's
bus width    4      8      16     32      theoretical    practical
32           15     7      3      1       14             21
64           31     15     7      3       28             45
128          63     31     15     7       59             85
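The "Ours" entries are consistent with 2 * (W / S) - 1 extra wires for bus width W and channel size S: two separation bits after every channel except the last, which gets one, matching the [102:101], [68:67], [34:33], and [0] ranges on the de-assembler slide. The formula is inferred from the table rather than stated in the slides, and `extra_wires` is a hypothetical name; a quick check:

```python
def extra_wires(bus_width, channel_size):
    """Inferred count: 2 separation bits per channel, 1 for the last channel."""
    channels = bus_width // channel_size
    return 2 * channels - 1


# "Ours" rows copied from the table, for channel sizes 4, 8, 16, and 32.
OURS = {32: [15, 7, 3, 1], 64: [31, 15, 7, 3], 128: [63, 31, 15, 7]}

for width, row in OURS.items():
    computed = [extra_wires(width, size) for size in (4, 8, 16, 32)]
    print(width, computed, computed == row)
```

Larger channels need fewer separation bits, which is why the 128-bit bus with channel size 32 needs only 7 extra wires.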

Cycle Count Overhead Ratio

                    Channel size
benchmark           4        8        16       32
complex_multiply    0.17%    0.04%    0.09%    0.26%
complex_update      0.13%    0.04%    0.17%    0.50%
Convolution         0.13%    0.32%    0.06%    0.28%
dot_product         0.08%    0%       0.13%    0.21%
Fir2dim             0.03%    0.08%    0.06%    0.18%
Fir                 0.11%    0.01%    0.02%    0.08%
iir_Nsection        0.14%    0.11%    0.06%    0.37%
iir_1section        0.17%    0.08%    0.13%    0.43%
Lms                 0.09%    0.07%    0.12%    0.15%
Matrix              0.01%    0.01%    0.06%    0.02%
Matrix1x3           0.14%    0.07%    0.11%    0.18%
n_complex_update    0.05%    0%       0.07%    0.21%
n_real_update       0.08%    0.02%    0.08%    0.28%
real_update         0.05%    0.88%    0.18%    0.39%
average             0.10%    0.12%    0.09%    0.25%

Conclusion

A novel bus structure that eliminates 3C and 4C crosstalk.

A 49.77% performance improvement ratio in the best case.

Only 7 extra wires, as compared with 85 in [Victor2001].

Appendix

The area overhead for a 128-bit bus width with channel size 32.

area type                          Ours        Victor's
logic circuit
  deassembler / encoder
    gate count                     9794        885
    area (μm²)                     14792.97    2359.39
    # storage elements (bits)      128         0
  assembler / decoder
    gate count                     879         1402
    area (μm²)                     2053.85     3381.22
# extra wires (bits)               7           85

Appendix

The overall improvement on bus transmission.

[Figure: bar chart of the improvement rate (axis 0 to 2.5) for the 0.1 μm and 0.07 μm processes]