1
High Performance Switching and Routing, Telecom Center Workshop: Sept 4, 1997.
Architectural Results in the Optical Router Project
Da Chuang, Isaac Keslassy, Nick McKeown
High Performance Networking Group
http://klamath.stanford.edu
2
[Figure: relative performance increase, 2002-2012. Internet traffic grows x2/yr while router capacity grows x2.2/18 months, opening a roughly 5x gap over the decade.]
3
Fast (large) routers: Big POPs need big routers
[Figure: a POP built from many smaller routers vs. a POP built from a few large routers.]
• Interfaces: price > $200k, power > 400W.
• About 50-60% of interfaces are used for interconnection within the POP.
• Industry trend is towards a large, single router per POP.
4
100Tb/s optical router
Objective: to determine the best way to incorporate optics into routers.
• Push technology hard to expose new issues: photonics, electronics, system design.
Motivating example: the design of a 100 Tb/s Internet router
• Challenging but not impossible (~100x current commercial systems)
• Identifies some interesting research problems
5
100Tb/s optical router (100Tb/s = 625 * 160Gb/s)
[Figure: 625 electronic linecards (Linecard #1 ... Linecard #625), each terminating four 40Gb/s ports (160Gb/s per linecard) and performing line termination, IP packet processing, and packet buffering. The linecards connect to a central optical switch over 160-320Gb/s links and exchange Request/Grant messages with an electronic arbitration unit.]
6
Research Problems
• Linecard: memory bottleneck (address lookup and packet buffering).
• Architecture: arbitration (computation complexity).
• Switch fabric: optics (fabric scalability and speed), electronics (switch control and link electronics), packaging (the three-surface problem).
7
Packet Buffering Problem: packet buffers for a 40Gb/s router linecard
[Figure: a Buffer Manager in front of 10Gbits of Buffer Memory. Write rate R: one 40B packet every 8ns. Read rate R: one 40B packet every 8ns.]
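A quick check of these numbers: a minimum-size 40B packet is 320 bits, and 320 bits / 40 Gb/s = 8 ns, so in the worst case one packet must be written and another read every 8 ns. The 10 Gbit capacity matches the usual rule of thumb (not stated on the slide) of buffering about one round-trip time of data at line rate: 0.25 s x 40 Gb/s = 10 Gbits.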
8
Memory Technology
Use SRAM?
+ Fast enough random access time, but
- Too low density to store 10Gbits of data.
Use DRAM?
+ High density means we can store the data, but
- Can't meet the random access time.
9
Can’t we just use lots of DRAMs in parallel?
[Figure: the Buffer Manager stripes data across eight parallel Buffer Memories (DRAMs). Write rate R: one 40B packet every 8ns; read rate R: one 40B packet every 8ns. Packets are grouped into 320B blocks (bytes 0-39, 40-79, ..., 280-319), so the memory array is read/written 320B at a time, once every 32ns.]
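To see why this works: the buffer must sustain one 40B write and one 40B read every 8 ns, a total memory bandwidth of 2R = 80 Gb/s. Aggregating eight 40B packets into one 320B block and accessing the memory one block at a time moves 320 x 8 = 2560 bits per 32 ns, the same 80 Gb/s, but the wide, parallel DRAM array now only needs to start a new access every 32 ns rather than every few nanoseconds.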
10
Works fine if there is only one FIFO
[Figure: with a single FIFO, the Buffer Manager simply aggregates arriving 40B packets into 320B blocks (bytes 0-39 ... 280-319), writes each full block to the buffer memory, and later reads blocks back and splits them into 40B departing packets, at rate R (one 40B packet every 8ns) in each direction.]
11
In practice, buffer holds many FIFOs
[Figure: the buffer now holds Q separate FIFOs (1, 2, ..., Q), each stored as 320B blocks; arriving 40B packets destined for different queues leave blocks only partially filled. Write and read rates are still one 40B packet every 8ns.]
e.g. In an IP router, Q might be 200. In an ATM switch, Q might be 10^6.
How can we write multiple packets into different queues?
12
Hybrid Memory Hierarchy
[Figure: arriving packets (rate R) enter a small tail SRAM cache holding the tails of the Q FIFOs; a large DRAM memory holds the body of each FIFO; a small head SRAM cache holds the FIFO heads, from which departing packets (rate R) are read in response to requests from an arbiter or scheduler. Data moves between SRAM and DRAM in units of b bytes (writing b bytes to DRAM, reading b bytes from DRAM).]
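The hierarchy above can be made concrete with a small model. The following is a minimal sketch, not the authors' implementation: per-queue tail and head caches stand in for the SRAM, a Python deque of blocks stands in for the DRAM, and all names (HybridPacketBuffer, write, read) are hypothetical. A real design also needs a replenishment algorithm that decides which queue's head to refill next so the SRAM stays small; that algorithm is the subject of the paper cited on the next slide.

from collections import deque

B = 320    # bytes moved per DRAM access (one block of b bytes)
CELL = 40  # fixed cell / minimum packet size in bytes

class HybridPacketBuffer:
    # Illustrative model only: SRAM tail/head caches per FIFO, DRAM holds the body.
    def __init__(self, num_queues):
        self.tail_sram = [deque() for _ in range(num_queues)]  # newest cells
        self.head_sram = [deque() for _ in range(num_queues)]  # oldest cells
        self.dram = [deque() for _ in range(num_queues)]       # queues of B-byte blocks

    def write(self, q, cell):
        # An arriving cell goes to the tail cache; once a full block has
        # accumulated, it is written to DRAM in a single wide access.
        self.tail_sram[q].append(cell)
        if len(self.tail_sram[q]) >= B // CELL:
            block = [self.tail_sram[q].popleft() for _ in range(B // CELL)]
            self.dram[q].append(block)

    def read(self, q):
        # A departing cell comes from the head cache; when it runs dry,
        # refill it with one B-byte block from DRAM (or, for a short queue
        # whose body never reached DRAM, directly from the tail cache).
        if not self.head_sram[q]:
            if self.dram[q]:
                self.head_sram[q].extend(self.dram[q].popleft())
            else:
                self.head_sram[q].extend(self.tail_sram[q])
                self.tail_sram[q].clear()
        return self.head_sram[q].popleft() if self.head_sram[q] else None

In hardware the DRAM refills would be issued ahead of time rather than on demand, which is exactly where the SRAM-sizing question on the next slide comes from.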
13
160Gb/s Linecard: Packet Buffering
Solution:
• Hybrid solution uses on-chip SRAM and off-chip DRAM.
• Identified optimal algorithms that minimize the size of the SRAM (12 Mbits).
• Precisely emulates the behavior of a 40 Gbit SRAM.
[Figure: a Queue Manager with on-chip SRAM, connected to several off-chip DRAMs, handling 160 Gb/s in and 160 Gb/s out.]
klamath.stanford.edu/~nickm/papers/ieeehpsr2001.pdf
14
Research Problems
• Linecard: memory bottleneck (address lookup and packet buffering).
• Architecture: arbitration (computation complexity).
• Switch fabric: optics (fabric scalability and speed), electronics (switch control and link electronics), packaging (the three-surface problem).
15
100Tb/s optical router (100Tb/s = 625 * 160Gb/s)
[Figure, repeated: 625 electronic linecards (line termination, IP packet processing, packet buffering; 160Gb/s each, from four 40Gb/s ports) connect to a central optical switch over 160-320Gb/s links and exchange Request/Grant messages with an electronic arbitration unit.]
16
The Arbitration Problem
A packet switch fabric is reconfigured for every packet transfer.
At 160Gb/s, a new IP packet can arrive every 2ns.
The configuration is picked to maximize throughput and not waste capacity.
Known algorithms are too slow.
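For scale, assuming minimum-size 40B packets as elsewhere in this deck: 40 B = 320 bits, and 320 bits / 160 Gb/s = 2 ns, so the fabric may need a new configuration every 2 ns.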
17
Cyclic Shift?
[Figure: an N x N switch (inputs 1...N, outputs 1...N) whose configuration is a cyclic shift, rotated by one position each time slot.]
Uniform Bernoulli iid traffic: 100% throughput
Problem: real traffic is non-uniform
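As an illustration of why a cyclic shift needs no arbitration at all, here is a minimal sketch (the function name and representation are assumptions, not from the slides): the configuration at time slot t is fixed in advance, independent of the queue state.

def cyclic_shift(t, N):
    # Configuration of an N x N switch at time slot t: input i is
    # connected to output (i + t) % N, so each input visits every
    # output exactly once every N slots, giving each pair rate 1/N.
    return {i: (i + t) % N for i in range(N)}

Under uniform Bernoulli i.i.d. traffic each input-output pair needs exactly this 1/N service rate, which is why the scheme achieves 100% throughput there; non-uniform traffic needs more service on some pairs than the fixed rotation provides.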
18
Two-Stage Switch
[Figure: external inputs 1...N feed a load-balancing cyclic shift, which distributes packets to internal inputs 1...N; a second, switching cyclic shift then connects the internal inputs to external outputs 1...N.]
100% throughput for broad range of traffic types (C.S. Chang et al., 2001)
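A minimal sketch of how the two stages compose (illustrative only; the queue structure, arrivals interface, and names such as simulate_two_stage are assumptions, not taken from the slides): the load-balancing shift spreads each input's packets evenly over the internal inputs, where they wait in per-output queues, and the switching shift serves those queues in a fixed rotation.

from collections import deque

def simulate_two_stage(arrivals, N, T):
    # arrivals(t) returns a list of (input_port, output_port) packets for slot t.
    # voq[k][j] buffers packets held at internal input k and destined to output j.
    voq = [[deque() for _ in range(N)] for _ in range(N)]
    delivered = []
    for t in range(T):
        # Stage 1, load-balancing cyclic shift: external input i is connected
        # to internal input (i + t) % N, regardless of the packet's destination.
        for (i, j) in arrivals(t):
            voq[(i + t) % N][j].append((i, j, t))
        # Stage 2, switching cyclic shift: internal input k is connected to
        # external output (k + t) % N; it sends its head-of-line packet, if any.
        for k in range(N):
            j = (k + t) % N
            if voq[k][j]:
                delivered.append((t, voq[k][j].popleft()))
    return delivered

Because consecutive packets of one flow land at different internal inputs, they can leave in a different order than they arrived; that is the mis-sequencing problem addressed on the next slides.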
19
[Figure: the same two-stage switch (external inputs, cyclic shift, internal inputs, cyclic shift, external outputs); packets of the same flow are spread across different internal inputs.]
Problem: mis-sequencing
20
Preventing Mis-sequencing
[Figure: the two-stage switch (two cyclic shifts), with small coordination buffers running the 'FFF' algorithm and large congestion buffers.]
The Full Frames First algorithm:
• Keeps packets ordered, and
• Guarantees a delay bound within the optimum.
Infocom'02: klamath.stanford.edu/~nickm/papers/infocom02_two_stage.pdf
21
Conclusions
Packet Buffering:
• Emulation of SRAM speed with DRAM density.
• A packet buffer for a 160 Gb/s linecard is feasible.
Arbitration:
• Developed the Full Frames First algorithm.
• 100% throughput without scheduling.
22
Two-Stage Switch
[Figure: external inputs 1...N, a load-balancing cyclic shift, internal inputs 1...N, a switching cyclic shift, and external outputs 1...N, annotated with arrivals a(t), first-stage departures b(t), internal queues q(t), and stage rates λ1(t), λ2(t).]
• Traffic rate: Λ
• First cyclic shift: b(t) = π1(t) a(t)
• Taking expectations: E[b(t)] = E[π1(t) a(t)] = E[π1(t)] E[a(t)]
• Long-term service opportunities exceed arrivals:
  lim_{T→∞} (1/T) Σ_{t=1}^{T} [π2(t) − b(t)] > 0