
High Speed Router Design

Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
[email protected]
http://www.ecse.rpi.edu/Homepages/shivkuma

Many slides thanks to Nick McKeown (Stanford). Also based on slides of S. Keshav (Ensim), Douglas Comer (Purdue), Raj Yavatkar (Intel), Cyriel Minkenberg (IBM Zurich), and Sonia Fahmy (Purdue).


Overview

Introduction
Evolution of High-Speed Routers
High Speed Router Components: Lookup Algorithms, Switching, Classification, Scheduling
Multi-Tbps Routers: Challenges & Trends


What do switches/routers look like?

Access routers, e.g. ISDN, ADSL
Core routers, e.g. OC48c POS
Core ATM switches


Dimensions, Power Consumption

Cisco GSR 12416: approx. 6ft (H) x 19" (W) x 2ft (D); capacity 160Gb/s; power 4.2kW
Juniper M160: approx. 3ft (H) x 19" (W) x 2.5ft (D); capacity 80Gb/s; power 2.6kW


Where high performance packet switches are used

The Internet core: carrier-class core routers, ATM switches, Frame Relay switches
Edge routers
Enterprise WAN access & enterprise campus switches


Where are routers? Ans: Points of Presence (POPs)

[Diagram: a backbone of eight POPs (POP1-POP8) interconnected by long-haul links; sites A-F attach at the edges.]


Why the Need for Big/Fast/Large Routers?

A POP can be built from many smaller routers or from a few large ones.
Interfaces: price > $200k, power > 400W. Space, power, and interface-cost economics dominate.
About 50-60% of interfaces are used just for interconnection within the POP.
The industry trend is towards a large, single router per POP.


Job of router architect

For a given set of features: maximize capacity C, subject to power P < 5 kW and volume V < 2 m^3.


Performance metrics

1. Capacity: "maximize C, s.t. volume < 2 m^3 and power < 5 kW"
2. Throughput: maximize usage of expensive long-haul links. Trivial with work-conserving output-queued routers.
3. Controllable delay: some users would like predictable delay. This is feasible with output queueing plus weighted fair queueing (WFQ).
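Since WFQ is only named on the slide, here is a minimal sketch of the idea behind it (in Python, with illustrative class and variable names; this is the simplified self-clocked flavour rather than exact GPS-based WFQ): each packet is stamped with a finish tag proportional to its length divided by its flow's weight, and the link always serves the packet with the smallest tag.

```python
import heapq
from collections import defaultdict

class FairQueueScheduler:
    """Simplified weighted-fair-queueing sketch (self-clocked flavour).

    Each arriving packet gets a finish tag:
        finish = max(virtual_time, last_finish[flow]) + length / weight
    and the scheduler always transmits the packet with the smallest tag.
    Flow IDs, weights and packet lengths are illustrative only.
    """

    def __init__(self, weights):
        self.weights = weights                 # flow_id -> weight
        self.last_finish = defaultdict(float)  # flow_id -> last finish tag
        self.virtual_time = 0.0
        self.queue = []                        # heap of (finish, seq, flow_id, length)
        self.seq = 0                           # tie-breaker for equal tags

    def enqueue(self, flow_id, length):
        start = max(self.virtual_time, self.last_finish[flow_id])
        finish = start + length / self.weights[flow_id]
        self.last_finish[flow_id] = finish
        heapq.heappush(self.queue, (finish, self.seq, flow_id, length))
        self.seq += 1

    def dequeue(self):
        """Return the next (flow_id, length) to transmit, or None if idle."""
        if not self.queue:
            return None
        finish, _, flow_id, length = heapq.heappop(self.queue)
        self.virtual_time = finish             # advance virtual time to served tag
        return flow_id, length

# Example: flow 'a' gets twice the weight of flow 'b'.
sched = FairQueueScheduler({'a': 2.0, 'b': 1.0})
for _ in range(3):
    sched.enqueue('a', 1500)
    sched.enqueue('b', 1500)
print([sched.dequeue()[0] for _ in range(6)])  # 'a' tends to be served ahead of 'b'
```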


Relative performance increase (1996-2002)

DWDM link speed: x2 / 8 months
Router capacity: x2.2 / 18 months
Moore's law: x2 / 18 months
DRAM access rate: x1.1 / 18 months
Internet traffic: x2 / year


Alt: Memory Bandwidth (Commercial DRAM)

Memory speed is not keeping up with Moore's Law.

[Chart: DRAM access time (ns), 1980-2001]
DRAM access time: improves only x1.1 / 18 months
Moore's Law: x2 / 18 months
Router capacity: x2.2 / 18 months
Line capacity: x2 / 7 months
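To make the divergence concrete, a small sketch that compounds the growth rates quoted above over a six-year span (the rates come from the slide; the six-year window is just an example):

```python
# Compound the per-period growth factors quoted above over a 6-year span
# to see how the gap between line rate and DRAM access speed widens.
def growth(factor, period_months, years=6):
    return factor ** (12 * years / period_months)

line_capacity = growth(2.0, 7)      # line capacity: x2 every 7 months
router_capacity = growth(2.2, 18)   # router capacity: x2.2 every 18 months
moore = growth(2.0, 18)             # Moore's law: x2 every 18 months
dram = growth(1.1, 18)              # DRAM access rate: x1.1 every 18 months

print(f"Over 6 years: line x{line_capacity:.0f}, router x{router_capacity:.1f}, "
      f"logic x{moore:.1f}, DRAM x{dram:.2f}")
print(f"Line-rate vs DRAM gap grows by about x{line_capacity / dram:.0f}")
```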


An Example: Packet buffers on a 40Gb/s router linecard

A buffer manager writes to and reads from roughly 10Gbits of buffer memory at the line rate R: one 40B packet every 8ns in each direction.

Use SRAM?
+ Fast enough random access time, but
- Too low density to store 10Gbits of data.

Use DRAM?
+ High density means we can store the data, but
- Can't meet the random access time.
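A quick back-of-the-envelope check of the numbers on this slide (a sketch; the 0.25s round-trip time is an assumption corresponding to the common "buffer = bandwidth x RTT" rule of thumb, and it reproduces the 10Gbits figure):

```python
line_rate = 40e9          # 40 Gb/s linecard
pkt_bits = 40 * 8         # minimum-length 40-byte packet

cell_time = pkt_bits / line_rate          # time between back-to-back packets
print(f"one 40B packet every {cell_time * 1e9:.0f} ns")   # -> 8 ns

# Rule-of-thumb buffer size: bandwidth x round-trip time (assuming RTT ~ 0.25 s)
rtt = 0.25
buffer_bits = line_rate * rtt
print(f"buffer ~ {buffer_bits / 1e9:.0f} Gbits")           # -> 10 Gbits

# The memory must sustain one write and one read per packet time,
# i.e. a random access every 4 ns: fine for SRAM, but beyond the
# random-access cycle time of commodity DRAM.
```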


Eg: Problems with Output Queueing

Output-queued switches are impractical: with N input ports at line rate R, each output's buffer memory must accept up to N arriving cells and supply one departing cell per cell time, i.e. run at roughly (N+1)R, far beyond DRAM bandwidth for large N and high R.

Can't I just use N separate memory devices per output?
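A small worked example (a sketch, reusing the 40Gb/s line rate from the previous slide) of why the output-queued memory bandwidth blows up as the port count grows:

```python
# Memory bandwidth an output buffer must sustain in an N-port
# output-queued switch: N writes (worst case, all inputs send to
# the same output) plus 1 read per cell time.
line_rate = 40e9                      # R = 40 Gb/s per port
for n_ports in (8, 32, 128):
    oq_bandwidth = (n_ports + 1) * line_rate
    print(f"N = {n_ports:3d}: output memory must run at {oq_bandwidth / 1e12:.2f} Tb/s")

# For comparison, a single input-queued (VOQ) port memory only needs
# about 2R = 80 Gb/s (one write and one read per cell time).
```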


Packet processing is getting harder

[Chart, log scale: CPU instructions available per minimum-length packet, 1996-2001, a steadily shrinking budget.]
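The shrinking budget is easy to reproduce (a sketch; the 1GHz, one-instruction-per-cycle CPU is an assumption, while the 40-byte minimum packet and the line rates follow the slides):

```python
# Instructions available per minimum-length (40B) packet, assuming a
# CPU that retires one instruction per cycle at the given clock rate.
clock_hz = 1e9                            # assumed 1 GHz packet-processing CPU
for line_rate in (2.5e9, 10e9, 40e9):     # OC-48, OC-192, OC-768 line rates
    pkt_time = 40 * 8 / line_rate         # arrival interval of 40B packets
    budget = clock_hz * pkt_time
    print(f"{line_rate/1e9:5.1f} Gb/s: {pkt_time*1e9:6.1f} ns per packet, "
          f"~{budget:5.0f} instructions")
```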


Basic Ideas: Part I


First-Generation IP Routers

Most Ethernet switches and cheap packet routers.
The bottleneck can be the CPU, the host adaptor, or the I/O bus.
What is costly? Bus? Memory? Interface? CPU?

[Architecture: a single CPU with buffer memory on a shared backplane (bus); line interfaces do only DMA and MAC functions, so every packet crosses the bus to the CPU for forwarding and back out again.]


First Generation Routers

[Architecture: the route table and packet buffers live in the central CPU's memory; line interfaces (MACs) hang off the shared backplane.]

Packets cross the bus as fixed-length "DMA" blocks or cells and are reassembled on the egress linecard; externally they may be fixed-length cells or variable-length packets.

Typically <0.5Gb/s aggregate capacity.


First Generation Routers Queueing Structure: Shared Memory

A large, single, dynamically allocated memory buffer shared by all inputs (1..N) and outputs (1..N): N writes and N reads per "cell" time. Limited by memory bandwidth.

A large body of work has shown that shared-memory queueing can provide: fairness, delay guarantees, delay-variation control, loss guarantees, and statistical guarantees.


Second-Generation IP Routers

[Architecture: each line card now has its own DMA, MAC, and local buffer memory, still attached to the central CPU and buffer memory over the shared bus.]

Port mapping intelligence moves into the line cards, giving a higher hit rate in the local lookup cache.
What is costly? Bus? Memory? Interface? CPU?


Second Generation Routers

[Architecture: the central CPU holds the full route table and handles the slow path; each line card has a MAC, buffer memory, and a forwarding cache, with a drop policy (or backpressure) on ingress and output-link scheduling on egress.]

Typically <5Gb/s aggregate capacity.


Second Generation Routers: As caching became ineffective

[Architecture: each line card carries a complete forwarding table instead of a cache; the central route-table CPU is relegated to an exception processor.]


Second Generation Routers Queueing Structure: Combined Input and Output Queueing (CIOQ)

Each line card's buffer memory needs only 1 write and 1 read per "cell" time; the rate of writes/reads is determined by the (shared) bus speed.


Third-Generation Switches/Routers

[Architecture: line cards (MAC + local buffer memory) and a CPU card attach as ports of a switched backplane rather than a shared bus.]

A third generation switch provides parallel paths through the backplane (a switch fabric).
What's costly? Bus? Memory? CPU?


Third Generation Routers

[Architecture: each line card holds its own forwarding table; the CPU card keeps the routing table and pushes forwarding tables to the line cards across the switched backplane.]

Typically <50Gb/s aggregate capacity.


Third Generation Routers Queueing Structure

Cells are buffered on the line cards and an arbiter schedules transfers across the switch. Each memory needs only 1 write and 1 read per "cell" time; the rate of writes/reads is determined by the switch fabric speedup.
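For contrast with the output-queued case, a small sketch (the 10Gb/s line rate, 32 ports, and speedup values are assumptions; only the "rate set by fabric speedup" statement comes from the slide) of per-port memory bandwidth in this structure:

```python
# Per-port buffer-memory bandwidth for different queueing structures,
# at line rate R, N ports, and fabric speedup S (fabric runs at S*R).
line_rate = 10e9        # assumed R = 10 Gb/s per port
n_ports = 32            # assumed port count

def show(label, bw):
    print(f"{label:26s}: {bw / 1e9:6.1f} Gb/s per memory")

show("output-queued (N+1)R", (n_ports + 1) * line_rate)
for speedup in (1, 2):
    # ingress side: write at R (arrivals), read at S*R (into the fabric);
    # egress side is symmetric (write at S*R, read at R).
    show(f"per-port CIOQ, speedup {speedup}", (1 + speedup) * line_rate)
```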


Third Generation Routers Queueing Structure: VOQs

Same arbiter-plus-switch structure, but the ingress side keeps per-flow/class or per-output queues (virtual output queues, VOQs) and the egress side keeps per-flow/class or per-input queues, with flow-control backpressure from outputs to inputs. Each memory still does 1 write and 1 read per "cell" time, at a rate set by the switch fabric speedup.
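A minimal sketch of the VOQ idea (hypothetical class and method names): the ingress linecard keeps one FIFO per output, so a cell destined to a congested output never blocks cells headed elsewhere, avoiding head-of-line blocking.

```python
from collections import deque

class IngressLinecard:
    """Virtual output queues: one FIFO per (this input, each output)."""

    def __init__(self, n_outputs):
        self.voq = [deque() for _ in range(n_outputs)]

    def enqueue(self, cell, output_port):
        # Cells are sorted by destination on arrival, after the lookup.
        self.voq[output_port].append(cell)

    def request_vector(self):
        # Tell the central arbiter which outputs we have traffic for.
        return [len(q) > 0 for q in self.voq]

    def dequeue(self, granted_output):
        # The arbiter granted this input a path to `granted_output`;
        # send exactly one cell from that VOQ across the fabric.
        return self.voq[granted_output].popleft()

# Example: cells for a congested output 0 don't block a cell for output 2.
lc = IngressLinecard(n_outputs=4)
lc.enqueue("cell-A", 0)
lc.enqueue("cell-B", 0)
lc.enqueue("cell-C", 2)
print(lc.request_vector())      # [True, False, True, False]
print(lc.dequeue(2))            # "cell-C" goes even if output 0 is busy
```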


Third Generation Routers: limits

• Size-constrained: a single rack, about 7' tall and 19" or 23" wide.
• Power-constrained: roughly <8kW. Supply: 100A/200A maximum at 48V (48V x 200A is about 9.6kW).


Fourth Generation: Clustering/Multi-stage

The switch core and the linecard racks are separate, connected by optical links that can run 100's of feet.


Key: Physically Separating Switch Core and Linecards

Distributes power over multiple racks. Allows all buffering to be placed on the linecard, which reduces power and puts the complex scheduling, buffer management, drop policy etc. on the linecard.


Fourth Generation Routers/Switches

Linecards connect to the switch core over optical links 100's of feet long, using the LCS protocol.


Physical Separation: The LCS Protocol

Between each linecard and its switch port, LCS runs a request/grant protocol:
1: the linecard sends a request (Req) for a cell it wants to transfer;
2: the switch scheduler returns a grant/credit, tracked with per-queue counters at the switch port;
3: the linecard sends the data cell, tagged with a sequence number.
One round-trip time (1 RTT) elapses between request and grant over the long link.
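A minimal sketch of the request/grant/data exchange (the class and message names are illustrative, not the actual LCS message formats; only the three phases, the sequence number, and the per-queue credit counters come from the slide):

```python
from collections import deque

class LinecardLCS:
    """Linecard side of a request/grant/data exchange (illustrative only)."""

    def __init__(self, n_switch_ports):
        self.voq = [deque() for _ in range(n_switch_ports)]
        self.seq = 0

    def arrive(self, cell, dest_port):
        self.voq[dest_port].append(cell)
        return ("REQ", dest_port)                 # 1: request a slot for this queue

    def on_grant(self, dest_port):
        cell = self.voq[dest_port].popleft()
        msg = ("DATA", dest_port, self.seq, cell) # 3: data carries a sequence number
        self.seq += 1
        return msg

class SwitchPortLCS:
    """Switch-core side: per-queue credit counters drive the grants."""

    def __init__(self, n_queues, credits_per_queue):
        self.credits = [credits_per_queue] * n_queues

    def on_request(self, queue):
        if self.credits[queue] > 0:
            self.credits[queue] -= 1
            return ("GRANT", queue)               # 2: grant/credit back to linecard
        return None                               # no buffer space: hold the request

    def on_departure(self, queue):
        self.credits[queue] += 1                  # cell left the fabric: return credit

# One exchange: REQ -> GRANT -> DATA (the real link adds ~1 RTT of latency).
lc, sp = LinecardLCS(4), SwitchPortLCS(4, credits_per_queue=2)
req = lc.arrive("cell-X", dest_port=1)
grant = sp.on_request(req[1])
if grant:
    print(lc.on_grant(grant[1]))                  # ('DATA', 1, 0, 'cell-X')
```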


Physical Separation: Aligning Cells

[Diagram: several linecards, each running LCS over links of different lengths, connect to the switch core's schedulers and fabrics; their cells must be aligned when they reach the core.]


Fourth Generation Routers/Switches Queueing Structure

Each linecard performs lookup and drop policy on ingress, holds virtual output queues, and performs output scheduling on egress; the switch core (switch fabric plus switch arbitration) is bufferless. As before, each memory does 1 write and 1 read per "cell" time, at a rate set by the switch fabric speedup.

Typically <5Tb/s aggregate capacity.