25
Round-robin Arbiter Design and Generation Eung S. Shin Prof. Vincent J. Mooney III Prof. George F. Riley Electrical and Computer Engineering Georgia Institute of Technology

Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

Round-robin Arbiter Design and Generation

Eung S. ShinProf. Vincent J. Mooney III

Prof. George F. RileyElectrical and Computer Engineering

Georgia Institute of Technology

Page 2: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

2

Outline

IntroductionTerminologyRelated WorkBus Arbiter (BA) DesignSwitch Arbiter (SA) DesignRound-robin Arbiter Generator (RAG)Comparison with other Switch ArbitersConclusion

Page 3: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

3

Introduction

As the number of bus masters increases in asingle chip, the importance of fast and powerful arbiters commands more attention.A fast arbiter is one of thedominant factors to achieveterabit switching speeds.To design with high per-formance and fairness inarbitration is a tediousand error-prone task.Our goal is to provide afast and fair arbiter designwith a tool for automaticgeneration.

Network Switch (16x16)

CrossbarSwitch Fabric

(16x16)x16

16 (16x16 arbiter)s

… … …

VOQ(0,16)

VOQ(0,0)

.

.

.input port 0

VOQ(16,0)

.

.

.

VOQ(16,16)

input port 16

.

.

.

.

.

.

output port 16

output port 0

...

.

.

.

...

req(0, 0)

req(16, 16)

grant(0, 0-16) grant(16, 0-16)

Page 4: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

4

Terminology

MxN Switch: M-input by N-output switch.• Example: A 32x32 switch is a 32-input by 32-output switch

with 1024 (322) possible connections between input ports and output ports.

Virtual Output Queues (VOQs): there are VOQs in a switch to remove possible output port contention (Head of Line (HOL) blocking).VOQ (m, n): m is the input port index and n is the output port index.• Example: VOQ (1, 0) is the VOQ of input port 1and queues

packets destined to output port 0.

Page 5: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

5

HOL Blocking Example

01

0

input port 0

input port 1

output port 0

output port 1

Without VOQs

Page 6: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

6

HOL Blocking Example

0

1

0

input port 0

VOQ (0, 0)

VOQ (0, 1)

input port 1

VOQ (1, 0)

VOQ (1, 1)

output port 0

output port 1

With VOQs

Page 7: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

7

Terminology (Continued)

(MxV)xN Switch: • M is the number of

input ports of an MxN switch.

• V is the number of VOQs per input port.

• N is the number of output ports of an MxN switch.

• Typically, V is equal to N.

• The total number of VOQs in an MxN switch is M∗N.

Network Switch (32x32)

CrossbarSwitch Fabric

(32x32)x32

32 (32x32 arbiter)s

… … …

VOQ(0,31)

VOQ(0,0)

.

.

.input port 0

VOQ(31,0)

.

.

.

VOQ(31,31)

input port 31

.

.

.

.

.

.

output port 31

output port 0

...

.

.

.

...

req(0, 0)

req(31, 31)

grant(0, 0-31) grant(31, 0-31)

Page 8: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

8

Terminology (Continued)

(MxV)xN crossbar switch fabric: • There are connections

between (MxV) inputs (from VOQ (0, 0) to VOQ (M-1, V-1)) and N outputs, the number of output ports in the switch fabric.

MxM Switch Arbiter (SA):• An MxM SA controls M

specific transmission gates between M VOQs and a particular output port.

• There are N MxM SAs in an MxN switch.

32 x 32 SA_0

. . .

gran

t (0,

0)

gran

t (1,

0)

gran

t (31

, 0)

VOQ (0, 0)

VOQ (1, 0)

VOQ (31, 0)

.

.

.

.

.

.

32 x 32 SA_31

. . .

gran

t (0,

31)

gran

t (1,

31)

gran

t (31

, 31)

VOQ (0, 31)

VOQ (1, 31)

VOQ (31, 31)

.

.

.

.

.

.

(32x32)x32 Crossbar Switch Fabric

Thirty-two 32x32 SAs

output port 0

output port 31

. . .

.

.

.

.

.

.

Page 9: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

9

Terminology (Continued)

MxM distributed SA (MxM hierarchical SA): plays the same role as an MxM SA.

• Consists of smaller switch arbiter in the form of a hierarchical tree structure.

Bus Arbiter (BA): resolves bus conflicts when multiple bus masters request a bus in the same cycle.

req0[0]req0[1]req0[2]req0[3]

ack0[0]ack0[1]

req1[0]req1[1]req1[2]req1[3]

2x2rootSA

grant0[0]grant0[1]grant0[2]grant0[3]

grant1[0]grant1[1]grant1[2]grant1[3]

req0req1

8x8 hierarchical SAclock

4x4ack-req

SA 0

counterD

4x4ack-req

SA 1

counterD

ack

PriorityLogic 0

req[0]req[1]req[2]req[3]

Ring Counter

toke

n [0

]

toke

n [1

]

toke

n [2

]

toke

n [3

]

PriorityLogic 2

PriorityLogic 3

PriorityLogic 1

EN

EN

EN

EN

grant[0]

grant[1]

grant[2]

grant[3]

4x4 BA

ackreset

output[0]output[1]output[2]output[3]

in[0]in[1]in[2]in[3]

D-FFclock

PriorityLogic 0

req[0]req[1]req[2]req[3]

Ring Counter

toke

n [0

]

toke

n [1

]

toke

n [2

]

toke

n [3

]

toke

n [0

]

toke

n [1

]

toke

n [2

]

toke

n [3

]

PriorityLogic 2

PriorityLogic 3

PriorityLogic 1

EN

EN

EN

EN

grant[0]

grant[1]

grant[2]

grant[3]

4x4 BA

ackreset

output[0]output[1]output[2]output[3]

in[0]in[1]in[2]in[3]

D-FFD-FFclock

Page 10: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

10

Related Work

Centralized Switch Arbiters:• Dual Round-Robin Matching algorithm (DRRM)

– H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a Larger-capacity Optical ATM Switch,” Proceedings of IEEE ATM Workshop, 1998, pp. 11-16.

• Programmable Priority Encoder (PPE) implementing iterative round-robin algorithm (iSLIP)

– P. Gupta and N. Mckeown, “Designing and Implementing a Fast Crossbar Scheduler,” IEEE Micro, 1999, pp. 20-28.

– N. Mckeown, P. Varaiya, and J. Warland, “The iSLIP Scheduling Algorithm for Input-Queued Switch,” IEEE Transaction on Networks, 1999, pp. 188-201.

Distributed Switch Arbiter:• Ping Pong Arbiter (PPA)

– H. J. Chao, C. H. Lam, and X. Guo, “A Fast Arbitration Scheme for Terabit Packet Switches,” Proceedings of IEEE Global Telecommunications Conference, 1999, pp. 1236-1243.

We will show how our generated SA achieves throughput 2.4X higher than PPE and 1.9X higher than PPA (and thus, at least 1.9X higher than DRRM since PPA outperforms DRRM).

Page 11: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

11

Bus Arbiter Design

Implemented based on ring counter for a token and “priority logic”.Priority Logic for 4 inputs:• output[0] = EN•in[0]• output[1] = EN•in[0]'•in[1]• output[2] = EN•in[0]'•in[1]'•in[2]• output[3] = EN•in[0]'•in[1]'•in[2]'•in[3]

000000001

100010001

0100X1001

0010XX101

0001XXX11

0000XXXX0

output [3]output [2]output [1]output [0]in [3]in [2]in [1]in [0]EN

Page 12: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

12

Example: Bus Arbiter

Condition:• Token=4’b0100 → Processor 2 has the highest priority.• Processor 0 and processor 1 request a bus.

Result:• Only Priority Logic 2 is enabled.• Processor 0 is granted because the higher priority parties

(processor 2 and processor 3) do not request a bus.• Token is rotated to 4’b1000 after the ring counter receives ack

signal.

Processor 2

req[0]

Processor 0 Processor 1 Processor 3

Memory

PL 2

token[2]

ring counter

grant[0]

4x4 BA

req[1]output[2]

ack

Page 13: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

13

Example: Bus Arbiter (Continued)

PriorityLogic 0

Ring Counter

toke

n [0

]

toke

n [1

]

toke

n [2

]

toke

n [3

]

PriorityLogic 2

PriorityLogic 3

PriorityLogic 1

grant[2]

grant[3]

4x4 BA

ackreset

output[0]output[1]output[2]output[3]

req[0]req[1]req[2]req[3]

EN

EN

EN

in[0]in[1]in[2]in[3]

D-FFclock

grant[0]

grant[1]

toke

n [2

]

PriorityLogic 2

EN

grant[0]

Page 14: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

14

Switch Arbiter DesignA hierarchical SA consists of small switch arbiter blocks.There are four types of switch arbiter blocks.• 2x2 ack-req SA.• 4x4 ack-req SA.• 2x2 root SA.• 4x4 root SA.

A root SA placed on the top of a hierarchy.

4x4Bus

Arbiter

ack grant0[0]grant0[1]

grant0[2]grant0[3]req0[3]

req0[0]req0[1]req0[2]

req0

4x4 ack-req SAclockreset

2x2Bus

Arbiter

ack

req0[0]req0[1]

grant0[1]

grant0[2]

req0

2x2 ack-req SAclock

reset

4x4BA

without D flip-flop

ack0ack1ack2ack3

req0req1req2req3

4x4 root SAclock ring counterreset

2x2BA

without D flip-flop

ack0

ack1

req0

req1

2x2 root SAclock ring counterreset

Page 15: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

Key Insight

With TSMC .25µ std. cell library from LEDA Systems, 4x4 is the “sweet spot” of high performance →analogous to std. cell design where using 4-input gates in design speeds up over, say only 2-input gates or 8-input gates.

• Use as many 4x4 as possible.• Use 2x2 if needed.

2x2SA

.24 ns

2x2PPE

.45 ns

2x2PPA

.40 ns

4x4SA

.34 ns4x4 PPA

.65 ns

2x2

2x22x2

4x4PPE

.61 ns

8x8 SA

.53 ns

2x2

4x44x4

8x8 PPA

.85 ns

2x2

2x22x2

2x2

2x22x2

2x28x8PPE

1.12 ns

16x16 SA

.76 ns

4x44x4

4x44x4

4x4

16x16 PPA

1.45 ns

2x2

2x22x2

2x2

2x22x2

2x2

2x2

2x22x2

2x2

2x22x2

2x2

2x2

16x16PPE

1.55 ns

PPE

PPA

Our SA from RAG

Page 16: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

Example: 32x32

hierarchical SA

4x4ack-req

SAl0.sa0

req0[0]req0[1]req0[2]req0[3]

ack0[0]

4x4ack-req

SAl0.sa1

ack0[1]

4x4ack-req

SAl0.sa2

ack0[3]

4x4ack-req

SAl0.sa3

4x4ack-req

SAl1.sa0

req1[0]req1[1]req1[2]req1[3]

req2[0]req2[1]req2[2]req2[3]

req3[0]req3[1]req3[2]req3[3]

ack0[2]

4x4ack-req

SAl0.sa4

req4[0]req4[1]req4[2]req4[3]

ack1[0]

4x4ack-req

SAl0.sa5

ack1[1]

4x4ack-req

SAl0.sa6

ack1[3]

4x4ack-req

SAl0.sa7

4x4ack-req

SAl1.sa1

req5[0]req5[1]req5[2]req5[3]

req6[0]req6[1]req6[2]req6[3]

req7[0]req7[1]req7[2]req7[3]

ack1[2]

2x2rootSA

grant0[0]grant0[1]grant0[2]grant0[3]

grant1[0]grant1[1]grant1[2]grant1[3]

grant2[0]grant2[1]grant2[2]grant2[3]

grant3[0]grant3[1]grant3[2]grant3[3]

grant4[0]grant4[1]grant4[2]grant4[3]

grant5[0]grant5[1]grant5[2]grant5[3]

grant6[0]grant6[1]grant6[2]grant6[3]

grant7[0]grant7[1]grant7[2]grant7[3]

up_req[0]up_req[1]

req0

req1

req2 req3

req4req5

req6req7

up_ack0

up_ack1

clock

32 x 32 SA Critical Path:

req5[1]

4x4ack-req

SAl0.sa5

req5

up_req[1]

4x4ack-req

SAl1.sa1

2x2rootSA

up_ack1

ack1[1]

grant5[1]

ack1[1]

grant5[1]

clock

2x2rootSA

up_req[0]

up_req[1]

req5[0]

req5[1]req5[2]req5[3]

4x4BA

counterD

req4req5req6req7

4x4BA

counterD

up_ack0

up_ack1

l1.sa1.output[1]

l0.sa5.output[1]

•Travels through two 4-input OR gates in series

req5[1]

up_req[1]

req5

•Then through a 2x2 root SA

2x2rootSA

•Finally through two 2-input AND gates in series

grant5[1]

up_ack1

•Results in 0.94ns delay using a TSMC 0.25µ standard cell library from LEDA Systems.

ack1[1]D

D

up_ack1

ack signals look like feedback path through the same logic block. In fact there is no input to the same logic gates.

Page 17: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

17

Comparison w/32x32 PPE and PPAPPE Critical Path:• output[31] =

in[0]’•in[1]’•…•in[30]’•in[31] plus output encoding.

• Associates with 8 4-input AND gates and 31 inverters.

• Results in 2.17ns delay using a TSMC 0.25µ std. cell library from LEDA Systems.

PPA Critical Path:• Only use 2x2 arbiters.• 2x2 PPA: 0.4ns while our

2x2 SA:0.24ns

• 5 levels in a binary tree structure.

• Associates with 4 serially connected 2-input OR gates for ORed request.

• Associates with 2 acknowledgements from two higher levels →3 3-input AND gates.

• Results in 1.7ns delay using a TSMC 0.25µ std. cell library from LEDA Systems.

req[31]ack (ANDed)req (ORed)

PPA

Page 18: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

18

Round-robin Arbiter Generator (RAG)RAG is preferable to employ as many 4x4 SAs as possible to reduce the number of levels in a hierarchy.A hierarchical 4x4 SA has longer delay (0.46ns) than a 4x4 ack-req SA (0.34ns) in .25µ std. cell library from LEDA Systems.

2x2ack-req

SA

req0[0]

req0[1]

ack0[0]

2x2ack-req

SA

ack0[1]

req1[0]

req1[1]2x2rootSA

grant0[0]

grant0[1]

grant1[0]

grant1[1]

req0req1

4x4 SAclock

rootleaves

Page 19: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

RAG (Continued)

A user specify an arbiter type either a Bus Arbiter or a Switch Arbiter.A user specify the number of masters (M) to be arbitrated.RAG generates synthesizable Verilog code for a Bus Arbiter or a Switch Arbiter at the RTL level.RAG is most efficient when M is a power of two.

User input:1. Type of the arbiter2. Number of masters

generate M x Mbus arbitergen_arb();

Bus Arbiter Switch Arbiter

integrate M x Mhierarchical switch arbiterinteg_arb();

Library2x2 ack-req SA4x4 ack-req SA2x2 root SA4x4 root SA

Bus Arbiter Switch Arbiter

num_level ← 0dividend←num_mastersremainder← 0

dividend=0?

No

dividend ←(integer) (dividend/4)n ←num_levelnum_4by4_level(n) ←0num_2by2_level(n) ←0num_level ++

dividend=0 and remainder=0 ?

dividend←num_mastersn← 0

Yes

dividend>2?

remainder ← dividend mod 4dividend←(integer) (dividend/4)num_4by4_level(n)← dividend

remainder ← dividend mod 2dividend←(integer) (dividend/2)num_2by2_level(n)← dividend

No Yes

remainder=0 ?

remainder>2 ?

No

num_4by4_level(n)++num_2by2_level(n)++

YesNo

n++

remainder=0 ?

No

Yes

No Yes

dividend←num_4by4_level(n)+num_2by2_level(n)n<num_level?

No

Yes Hierarchical SA

Yes

Page 20: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

RAG (Continued)

num_level ← 0dividend←num_mastersremainder← 0

dividend=0?

No

dividend ←(integer) (dividend/4)n ←num_levelnum_4by4_level(n) ←0num_2by2_level(n) ←0num_level ++

dividend=0 and remainder=0 ?

dividend←num_mastersn← 0

Yes

dividend>2?

remainder ← dividend mod 4dividend←(integer) (dividend/4)num_4by4_level(n)← dividend

remainder ← dividend mod 2dividend←(integer) (dividend/2)num_2by2_level(n)← dividend

No Yes

remainder=0 ?

remainder>2 ?

No

num_4by4_level(n)++num_2by2_level(n)++

YesNo

n++

remainder=0 ?

No

Yes

No Yes

dividend←num_4by4_level(n)+num_2by2_level(n)n<num_level?

No

Yes Hierarchical SA

dividend=32

dividend=32n=0

Yes

dividend=8n=0num_4by4_level(0)=0num_2by2_level(0)=0num_level=1

dividend=2n=1num_4by4_level(1)=0num_2by2_level(1)=0num_level=2

dividend=0n=2num_4by4_level(2)=0num_2by2_level(2)=0num_level=3

remainder=0dividend=8num_4by4(0)=8

4x4

SA

4x4

SA

4x4

SA

4x4

SA

4x4

SA

4x4

SA

4x4

SA

4x4

SAn=1

dividend=8

remainder=0dividend=2num_4by4(1)=2

4x4

SA

4x4

SA

n=2

dividend=2

remainder=0dividend=1num_2by2(2)=1

2x2

root

SA

n=3

dividend=1

Page 21: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

21

User input:1. Type of the arbiter2. Number of masters

generate M x Mbus arbitergen_arb();

Bus Arbiter Switch Arbiter

integrate M x Mhierarchical switch arbiterinteg_arb();

Library2x2 ack-req SA4x4 ack-req SA2x2 root SA4x4 root SA

Bus Arbiter Switch Arbiter

RAG (Continued)

Calculate the number

of levels;

Calculate SA blocks for

each level;

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

2x2

roo t

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

4x4

S A

2x2

roo t

S A

integrate M x Mhierarchical switch arbiterinteg_arb();

4x4ack-req

SAl0.sa0

req0[0]req0[1]req0[2]req0[3]

req0[0]req0[1]req0[2]req0[3]

ack0[0]

4x4ack-req

SAl0.sa1

ack0[1]

4x4ack-req

SAl0.sa2

ack0[3]

4x4ack-req

SAl0.sa3

4x4ack-req

SAl1.sa0

req1[0]req1[1]req1[2]req1[3]

req1[0]req1[1]req1[2]req1[3]

req2[0]req2[1]req2[2]req2[3]

req2[0]req2[1]req2[2]req2[3]

req3[0]req3[1]req3[2]req3[3]

req3[0]req3[1]req3[2]req3[3]

ack0[2]

4x4ack-req

SAl0.sa4

req4[0]req4[1]req4[2]req4[3]

req4[0]req4[1]req4[2]req4[3]

ack1[0]

4x4ack-req

SAl0.sa5

ack1[1]

4x4ack-req

SAl0.sa6

ack1[3]

4x4ack-req

SAl0.sa7

4x4ack-req

SAl1.sa1

req5[0]req5[1]req5[2]req5[3]

req5[0]req5[1]req5[2]req5[3]

req6[0]req6[1]req6[2]req6[3]

req6[0]req6[1]req6[2]req6[3]

req7[0]req7[1]req7[2]req7[3]

req7[0]req7[1]req7[2]req7[3]

ack1[2]

2x2rootSA

grant0[0]grant0[1]grant0[2]grant0[3]

grant0[0]grant0[1]grant0[2]grant0[3]

grant1[0]grant1[1]grant1[2]grant1[3]

grant1[0]grant1[1]grant1[2]grant1[3]

grant2[0]grant2[1]grant2[2]grant2[3]

grant2[0]grant2[1]grant2[2]grant2[3]

grant3[0]grant3[1]grant3[2]grant3[3]

grant3[0]grant3[1]grant3[2]grant3[3]

grant4[0]grant4[1]grant4[2]grant4[3]

grant4[0]grant4[1]grant4[2]grant4[3]

grant5[0]grant5[1]grant5[2]grant5[3]

grant5[0]grant5[1]grant5[2]grant5[3]

grant6[0]grant6[1]grant6[2]grant6[3]

grant6[0]grant6[1]grant6[2]grant6[3]

grant7[0]grant7[1]grant7[2]grant7[3]

grant7[0]grant7[1]grant7[2]grant7[3]

up_req[0]up_req[1]

req0

req1

req2 req3

req4req5

req6req7

up_ack0

up_ack1

clock

Page 22: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

22

Comparisons with PPE and PPA

Using TSMC 0.25µ std. cell library from LEDA Systems

0

1000

2000

3000

4000

5000

6000

0 50 100 150

MxM arbiter

Area

of a

rbite

r in

the

num

ber o

f inv

erte

r eq

uiva

lent

s

SAPPEPPA

0

0.5

1

1.5

2

2.5

3

3.5

0 50 100 150

MxM arbiter

Dela

y in

arb

iter w

ith T

SMC

.25u

m

SAPPEPPA

Page 23: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

23

Comparisons (Continued)The shortest delay results from• Limiting the size of switch arbiter blocks to 2x2 and 4x4 to

reduce the critical path delay due to the expansion of priority logic blocks compared with Programmable Priority Logic Encoder (PPE), a centralized arbiter.

• Reducing the number of levels in a hierarchy by preferring to use more 4x4 switch arbiter blocks compared with Ping-Pong Arbiter (PPA).

Page 24: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

24

Speedup for a Terabit SwitchAssumptions for comparison

• The speed of switching is wholly determined by the arbitration cycles.

Speedup • Our hierarchical 128x128 SA:

6.16Tbps.• 128x128 PPA: 3.18Tbps.• 128x128 PPE: 2.59Tbps.• Our SA achieves throughput

1.9X higher than PPA and 2.4X higher than PPE.

Commercial Switches• Mindspeed claims up to

.45Tbps for 144x144 switch using multiple chips.

• PetaSwitch claims up to 10.24Tbps for 256x256 switch using multiple chips.

• No details about logic design nor process technology used.

output port 127

Network Switch (128x128)

CrossbarSwitch Fabric

(128x128)x128

128 (128x128 arbiter)s

… … …

VOQ(0,127)

VOQ(0,0)

.

.

.input port 0

VOQ(127,0)

.

.

.

VOQ(127,127)

input port 127

.

.

.

.

.

.

output port 0

...

.

.

.

...

req(0, 0)

req(127, 127)

grant(0, 0-127) grant(127, 0-127)

Page 25: Round-robin Arbiter Design and Generation€¦ · • Dual Round-Robin Matching algorithm (DRRM) – H. J. Chao and J. S. Park, “Centralized Contention Resolution Schemes for a

25

Conclusion

BA logicWe showed how 2x2 and 4x4 BAs are applied to 2x2 and 4x4 switch arbiter blocks.We demonstrated how RAG generate synthesizable Verilog codes for a BA and a SA with the example of 32x32 hierarchical SA.We compared areas and delays with other SAs.We demonstrated how our generated 128x128 hierarchical SA could achieve throughput 1.9X higher than PPA and 2.4X higher than PPE.