John DeHart and Mike Wilson SPP V2 Router Design

SPP V2 Router Design



Page 1: SPP V2 Router Design

John DeHart and Mike Wilson

SPP V2 Router Design

Page 2: SPP V2 Router Design

2 - Mike Wilson - 04/21/23

Revision History
3 June 2008
»Initial release, presentation
25 June 2008
»Updates on feedback from presentation
27 July 2009
»Current status, changes, Control documentation
24 August 2009
»Updates from debugging, simulation

Page 3: SPP V2 Router Design

Current Status: Summary
Memory Layout
»Done, may need revisiting
Scripts (.ind files) done, missing TCAM initialization
NPUA blocks written, simulates, some GPE-to-NPE problems
NPUB broken, needs some changes
»Needs RxB SRAM Ring fix
»HdrFmt needs internal header fix
»Recent changes to LookupB/Copy not yet added
»Need some changes to TxB for chained buffers
Recent changes:
»Exception, Local Delivery packets omitted in original design
  Necessitates changes to Parse
»Changed ResultTable indexing
  Impacts LookupB/Copy

Page 4: SPP V2 Router Design

SPP Versions
SPP Version 0:
»What we used for SIGCOMM Paper
SPP Version 1:
»Bare minimum we would need to release something to PlanetLab users
SPP Version 2:
»What we would REALLY like to release to PlanetLab users.

Page 5: SPP V2 Router Design

Objectives for SPP-NPE version 2
Deal with constraints imposed by switch
»can send to only one NPU; can receive from only one NPU
»split processing across NPUs
  parsing, lookup on one; queuing on other
Provide more resources for slice-specific processing
Decouple QM schedulers from links
»collection of largely independent schedulers
»may use several to send to the same link
  e.g. separate rate classes (1-10M, 10-100M, 100-1000M)
  optionally adjust scheduler rates dynamically
Provide support for multicast
»requires addition of next-hop IP address after queueing
Enable single slice to operate at 10 Gb/s
Support “slow” code options
»Use separate rate classes to limit rate to slow code options
»LCI QMs for Parse, NPUB QMs for HdrFmt

Page 6: SPP V2 Router Design

SPP Version 2 System Architecture
[Figure: system architecture. GPE blades, the switch blade, the LC 7010 blade (LC Ingress, LC Egress, RTM; 1×10 Gb/s or 10×1 Gb/s) and the NPE 7010 blade (NPUA: Decap, Parse, Lookup, AddShim; NPUB: Copy, QM, HdrFormat), joined by the SPI switch and switch FIC. Highlighted: Default Data Path]

Page 7: SPP V2 Router Design

SPP Version 2 System Architecture
[Figure: system architecture diagram (GPE blades, switch blade, LC 7010 blade, NPE 7010 blade with NPUA and NPUB). Highlighted: Fast-Path Data]

Page 8: SPP V2 Router Design

SPP Version 2 System Architecture
[Figure: system architecture diagram (GPE blades, switch blade, LC 7010 blade, NPE 7010 blade with NPUA and NPUB). Highlighted: Exception Data Path, Local Delivery]

Page 9: SPP V2 Router Design

NPE Version 2 Block Diagram
[Figure: NPE block diagram]
NPUA: RxA (2 ME), Decap (1 ME), Parse (8 ME), LookupA (1 ME), AddShim (1 ME), TxA (2 ME), StatsA (1 ME); attached TCAM and SRAM
NPUB: RxB (2 ME), LookupB&Copy (2 ME), QueueManager (4 MEs), HdrFmt/SubEncap (4 MEs), Scr2NN/Freelist (1 ME), TxB (2 ME), StatsB (1 ME); attached SRAM
Both NPUs connect through the SPI switch to the switch blade and GPE. Memory labels on this slide include SRAM/0 and SRAM/3.

Page 10: SPP V2 Router Design

NPE Version 2 Block Diagram
[Figure: NPE block diagram (NPUA and NPUB microengine blocks), annotated with the inter-block links: NN (next-neighbor) rings, SRAM rings, and scratch rings labelled Scr/256, Scr/512 and Scr/1024. Memory labels include SRAM/0 and SRAM/3.]

Page 11: SPP V2 Router Design

NPE Version 2 Block Diagram
[Figure: NPE block diagram (NPUA and NPUB microengine blocks)]

Page 12: SPP V2 Router Design

PlanetLab NPE Input Frame from LC or GPE
Ethernet Header:
»DstAddr: MAC address of NPE
»SrcAddr: MAC address of LC or GPE
»VLAN: One VLAN per MR (MR == Slice)
  Only use lower 11 bits of VLAN Tag
IP Header:
»Dst Addr: IP address of this node
  How many IP Addresses can a NODE have?
»Src Addr: IP address of previous hop
»Protocol: UDP
UDP Header:
»Dst Port: Identifies input tunnel
»Src Port: with IP Src Addr identifies sending entity

[Figure: frame layout, assuming no IP Options; marks indicate 8-byte boundaries]
Ethernet Header: DstAddr (6B), SrcAddr (6B), Type=802.1Q (2B), VLAN (2B), Type=IP (2B)
IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
UDP Payload (MN Packet)
Ethernet Trailer: PAD (nB), CRC (4B)
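The header sizes listed for the input frame determine where the MN packet starts. The following is a minimal sketch in C, assuming a single 802.1Q tag and a raw byte buffer; the names `spp_frame_info` and `spp_locate_mn_packet` are hypothetical and are not taken from the NPE code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type; not taken from the NPE source. */
typedef struct {
    uint16_t slice_id;   /* lower 11 bits of the VLAN tag */
    uint16_t udp_dport;  /* identifies the input tunnel */
    size_t   mn_offset;  /* byte offset of the UDP payload (MN packet) */
} spp_frame_info;

enum { ETH_VLAN_HDR_LEN = 18, UDP_HDR_LEN = 8 };

/* Returns 0 on success, -1 if the frame is not VLAN-tagged IPv4/UDP. */
static int spp_locate_mn_packet(const uint8_t *f, size_t len, spp_frame_info *out)
{
    assert(f != NULL && out != NULL);
    if (len < ETH_VLAN_HDR_LEN + 20 + UDP_HDR_LEN) return -1;
    if (f[12] != 0x81 || f[13] != 0x00) return -1;   /* Type = 802.1Q */
    if (f[16] != 0x08 || f[17] != 0x00) return -1;   /* Type = IP */
    const uint8_t *ip = f + ETH_VLAN_HDR_LEN;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;         /* 20B when no options */
    if ((ip[0] >> 4) != 4 || ihl < 20 || ip[9] != 17) return -1; /* 17 = UDP */
    if (len < ETH_VLAN_HDR_LEN + ihl + UDP_HDR_LEN) return -1;
    const uint8_t *udp = ip + ihl;
    out->slice_id  = (uint16_t)(((f[14] << 8) | f[15]) & 0x07FF);
    out->udp_dport = (uint16_t)((udp[2] << 8) | udp[3]);
    out->mn_offset = ETH_VLAN_HDR_LEN + ihl + UDP_HDR_LEN;
    return 0;
}
```

With no IP options the MN packet begins at byte 46 (18 + 20 + 8).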

Page 13: SPP V2 Router Design

Local Delivery / Exceptions
GPE has separate tunnels for LD and EX
»Standard filters handle these packets
»No internal packet headers required, although we can still use internal headers for exceptions
Return path from GPE uses same tunnels
»Standard filters handle re-classify cases
»Internal packet headers from GPE to NPE are MNet-specific
  Provides filter key for GPE-routed packets
  Substrate headers unchanged
  MN frames carry code-option-specific details, filter key
  For IPv4, MN frame has IP version 0, payload has 112b lookup key to use. If GPE wants to reclassify, it sends a normal packet.

Page 14: SPP V2 Router Design

NPE Version 2 Block Diagram
[Figure: NPE block diagram (NPUA and NPUB microengine blocks)]
RxA output format, as 32b words:
»Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)
»Eth. Frame Len (16b), Reserved (12b), Port (4b)

Page 15: SPP V2 Router Design

RxA
No change from V1

Page 16: SPP V2 Router Design

NPE Version 2 Block Diagram
[Figure: NPE block diagram (NPUA and NPUB microengine blocks)]
Decap input format (from RxA), as 32b words:
»Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)
»Eth. Frame Len (16b), Reserved (12b), Port (4b)
Decap output format, as 32b words:
»Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)
»MN Frm Offset (16b), MN Frm Length (16b)
»Rx UDP DPort (16b), Slice ID (VLAN) (16b)
»Rx IP SAddr (32b)
»Reserved (12b), Rx UDP SPort (16b), Code (4b)
»Slice Data Ptr (32b)

Page 17: SPP V2 Router Design

Decap
Inputs:
»Packet from RxA
Outputs:
»Meta-frame (handle, offset and length)
»Slice ID (VLAN tag)
  Actually, lower 11b of VLAN tag and lower 4b of Rx DA (for RxID)
»Metainterface (Rx Saddr, Rx Sport, Rx Dport)
»Code Option (4b, only 16 available)
»Slice data pointer
Initialization:
»VLAN table, NPE MAC Address
Functionality:
»Read VLAN tag from DRAM, determine correct code option.
»Validate packet. Drop invalid, unmatched packets.
  IP Options for NPE dropped in LC, should never arrive here!
»Enqueue valid packets to Scratch ring.
»Update stats
Status:
»Works for valid packets, invalid packet handling untested

Page 18: SPP V2 Router Design

VLAN table

VLAN    code_opt  slice_data_ptr  slice_data_size
0       0         0               0
1       0         0               0
…       …         …               …
0x0aa   1
…       …         …               …
0x7ff   0         0               0

[Figure: entry 0x0aa is drawn alongside data areas labelled SD data and P data]

code_option = 0 implies invalid slice
»“on switch” for a slice in the data plane
SD data is currently only counters
56B slice data
Only use lower 11b of VLAN tag (2048 VLANs)
Only changes from V1:
»No longer need all data on NPUA, drop HF data, per-slice buffer limits
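The table lookup described here can be sketched in C as follows. The entry fields are the ones the Control slides give for this table; the function name `vlan_lookup` and the in-memory array are illustrative assumptions, not the microengine implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_TABLE_ENTRIES 2048u      /* lower 11 bits of the VLAN tag */

/* Entry layout as given in the Control slides. */
typedef struct {
    unsigned int code_opt;            /* only 4 lsb used; 0 = invalid slice */
    unsigned int slice_data_ptr;
    unsigned int slice_data_size;
} vlan_entry;

/* Returns the entry for a VLAN tag, or NULL if the slice is switched off. */
static const vlan_entry *vlan_lookup(const vlan_entry *table, uint16_t vlan_tag)
{
    assert(table != NULL);
    const vlan_entry *e = &table[vlan_tag & (VLAN_TABLE_ENTRIES - 1u)];
    return (e->code_opt & 0xFu) != 0 ? e : NULL;
}
```

Because only the low 11 bits index the table, tags 0x0aa and 0xF8aa select the same entry.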

Page 19: SPP V2 Router Design

NPE Version 2 Block Diagram
[Figure: NPE block diagram (NPUA and NPUB microengine blocks)]
Parse input format (from Decap), as 32b words:
»Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)
»MN Frm Offset (16b), MN Frm Length (16b)
»Rx UDP DPort (16b), Slice ID (VLAN) (16b)
»Rx IP SAddr (32b)
»Reserved (12b), Rx UDP SPort (16b), Code (4b)
»Slice Data Ptr (32b)
Parse output format, as 32b words:
»Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)
»MN Frm Length (16b), MN Frm Offset (16b)
»Lookup Key[143-112]: Type (1b) / RxID (4b) / Slice ID (11b) / Rx UDP DPort (16b)
»Lookup Key[111-80]: DA (32b)
»Lookup Key[79-48]: SA (32b)
»Lookup Key[47-16]: Ports (32b)
»Lookup Key[15-0]: Proto/TCP_Flags (16b), Exception Bits (12b), Code (4b)

Page 20: SPP V2 Router Design

Parse
Inputs:
»Meta-frame (handle, offset and length)
»Slice ID (VLAN tag, RxID)
»Tunnel ID (Rx Saddr, Rx Sport, Rx Dport)
»Code Option (4b, only 16 available)
»Slice data pointer
Outputs:
»Meta-frame (handle, offset and length)
»Lookup key (Includes slice ID, Rx UDP dport)
»Code Option (4b, only 16 available)
»Exception bits (MN-specific) – do we still need these? (Probably)
Initialization:
»Slice Data
Functionality:
»Slice-specific processing:
  Parse meta-frame. Extract lookup key. Raise any relevant exceptions.
  Can pass slice data to HdrFmt in bytes 16..30 of packet. (0..15 are reserved for AddShim)
»Substrate processing:
  Add substrate-specific information to lookup key (32b: Lookup type, RxID, Slice ID, Rx UDP dport)
Status:
»Needs internal packet handling from GPE for GPE-specified filter keys
»Needs to use "special" filter key for exception path, 0x0. Substrate processing should still prepend substrate-specific key information (slice, MiID)
»Works for normal (LCI-to-NPE) packets
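The substrate step adds a 32b word (lookup type, RxID, slice ID, Rx UDP dport) in front of the 112b slice-supplied key. A hedged C sketch of that packing follows; the bit placement within the word (type in the top bit, then RxID, slice ID, dport) is an assumption inferred from the field order, and `substrate_key_word` and `make_ipv4_key` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* 144b lookup key held in five 32b words, most significant word first.
 * Only the upper 16 bits of w[4] belong to the key. */
typedef struct { uint32_t w[5]; } lookup_key;

/* Assumed layout: Type(1b) | RxID(4b) | Slice ID(11b) | Rx UDP DPort(16b). */
static uint32_t substrate_key_word(unsigned type, unsigned rx_id,
                                   unsigned slice_id, unsigned udp_dport)
{
    assert(type <= 1u && rx_id <= 0xFu && slice_id <= 0x7FFu && udp_dport <= 0xFFFFu);
    return ((uint32_t)type << 31) | ((uint32_t)rx_id << 27) |
           ((uint32_t)slice_id << 16) | (uint32_t)udp_dport;
}

/* Prepend the substrate word to an IPv4-style 112b slice key
 * (DA, SA, ports, proto/TCP flags). */
static lookup_key make_ipv4_key(uint32_t substrate, uint32_t da, uint32_t sa,
                                uint32_t ports, uint16_t proto_flags)
{
    lookup_key k = {{ substrate, da, sa, ports, (uint32_t)proto_flags << 16 }};
    return k;
}
```

Keeping the substrate word separate from the slice key mirrors the split between substrate processing and slice-specific processing.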

Page 21: SPP V2 Router Design

NPE Version 2 Block Diagram
[Figure: NPE block diagram (NPUA and NPUB microengine blocks)]
LookupA input format (from Parse), as 32b words:
»Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)
»MN Frm Length (16b), MN Frm Offset (16b)
»Lookup Key[143-112]: Type (1b) / RxID (4b) / Slice ID (11b) / Rx UDP DPort (16b)
»Lookup Key[111-80]: DA (32b)
»Lookup Key[79-48]: SA (32b)
»Lookup Key[47-16]: Ports (32b)
»Lookup Key[15-0]: Proto/TCP_Flags (16b), Exception Bits (12b), Code (4b)
LookupA output format, as 32b words:
»Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)
»MN Frm Length (16b), MN Frm Offset (16b)
»Result Index (32b)
»Exception Bits (12b), Slice ID (VLAN) (16b), Code (4b)
»Rsvd (16b), Stats Index (16b)

Page 22: SPP V2 Router Design

LookupA
Inputs:
»Meta-frame (handle, offset and length)
»Lookup key (Includes slice ID, RxID, Rx UDP dport)
»Code Option (4b, only 16 available)
»Exception bits
Outputs:
»Meta-frame (handle, offset and length)
»Lookup Result (Index into SRAM table on NPUB)
  Actual max index is 0x3FFFF (Unicast), with single-bit type flag = 19 bits
»Slice ID (VLAN tag)
»Code Option (4b, only 16 available)
»Exception bits (from Parse)
»Stats Index (from TCAM)
  Can this fit in the 13 bits leftover from the result index? No, result is bigger now.
Initialization:
»Filters set in TCAM by control
Functionality:
»Look up key in TCAM
»On miss, drop the packet
»Local Delivery is now a normal lookup
»Lookup result is now just a 32b index (and stats index)
Status:
»Written; untested.
»Result size currently 48b; would like to reduce to 32b.

Page 23: SPP V2 Router Design

NPE Version 2 Block Diagram
[Figure: NPE block diagram (NPUA and NPUB microengine blocks)]
AddShim input format (from LookupA), as 32b words:
»Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)
»MN Frm Length (16b), MN Frm Offset (16b)
»Result Index (32b)
»Exception Bits (12b), Slice ID (VLAN) (16b), Code (4b)
»Rsvd (16b), Stats Index (16b)
AddShim output format:
»Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)

Page 24: SPP V2 Router Design

AddShim
Inputs:
»Meta-frame (handle, offset and length)
»Lookup Result (Index into SRAM table on NPUB)
»Slice ID (VLAN tag)
»Code Option (4b, only 16 available)
»Exception bits (from Parse)
»Stats Index (from TCAM)
Outputs:
»Shim Packet (buffer handle)
  Buffer descriptor contains updated offset and length, if needed
Initialization:
»None.
Functionality:
»Prepend shim header to preserve packet annotations across NPUs
»Overwrite the existing Ethernet header (up to 18B) with:
  Slice ID (16b)
  Code Option (4b)
  Exception Bits (12b)
  MN Frame Offset (16b)
  MN Frame Length (16b)
  Result Index (32b)
  Stats Index (16b) [This is the same on NPUA, NPUB]
»30B for opaque slice data.
  Proper memory alignment required
  This is written by Parse, not AddShim!
Status:
»Written. Works for properly aligned packets. Needs optimization.

Page 25: SPP V2 Router Design

NPE Version 2 Block Diagram
[Figure: NPE block diagram (NPUA and NPUB microengine blocks)]
TxA input format:
»Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)

Page 26: SPP V2 Router Design

TxA
Sends shim packet to NPUB.
Unmodified 10 Gbps Tx 2×ME.

Page 27: SPP V2 Router Design

SPP Version 2 NPUA to NPUB Frame
SHIM (16B)
»Slice ID (16b)
»Code Option (4b)
»Exception Bits (12b)
»Result Index (32b)
»Stats Index (16b)
»Offset of MN Packet (16b)
»Length of MN Packet (16b)
»Memory Alignment Padding (2B)
IP Header, UDP Header may be overwritten by:
»opaque slice data, written in Parse

[Figure: frame layout, assuming no IP Options; marks indicate 8-byte boundaries]
SHIM (16B), Type=IP (2B)
IP Header: Ver/HLen/Tos/Len (4B), ID/Flags/FragOff (4B), TTL (1B), Protocol = UDP (1B), Hdr Cksum (2B), Src Addr (4B), Dst Addr (4B), IP Options (0-40B)
UDP Header: Src Port (2B), Dst Port (2B), UDP length (2B), UDP checksum (2B)
UDP Payload (MN Packet)
Ethernet Trailer: PAD (nB), CRC (4B)
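The shim fields add up to 14 bytes (16 + 4 + 12 + 32 + 16 + 16 + 16 bits), which the 2B alignment padding rounds to 16B. A minimal C sketch of serializing such a shim follows; the byte order (big-endian) and the field order (taken from the list on this slide) are assumptions, and `shim` and `shim_pack` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SHIM_LEN 16

typedef struct {
    uint16_t slice_id;      /* 16b */
    uint8_t  code_opt;      /* 4b  */
    uint16_t exc_bits;      /* 12b */
    uint32_t result_index;  /* 32b */
    uint16_t stats_index;   /* 16b */
    uint16_t mn_offset;     /* 16b */
    uint16_t mn_length;     /* 16b */
} shim;

/* Serialize in the listed field order, big-endian (both assumed).
 * Bytes 14..15 are the alignment padding and stay zero. */
static void shim_pack(const shim *s, uint8_t out[SHIM_LEN])
{
    assert(s->code_opt <= 0xF && s->exc_bits <= 0xFFF);
    memset(out, 0, SHIM_LEN);
    out[0]  = (uint8_t)(s->slice_id >> 8);
    out[1]  = (uint8_t)s->slice_id;
    out[2]  = (uint8_t)((s->code_opt << 4) | (s->exc_bits >> 8));
    out[3]  = (uint8_t)s->exc_bits;
    out[4]  = (uint8_t)(s->result_index >> 24);
    out[5]  = (uint8_t)(s->result_index >> 16);
    out[6]  = (uint8_t)(s->result_index >> 8);
    out[7]  = (uint8_t)s->result_index;
    out[8]  = (uint8_t)(s->stats_index >> 8);
    out[9]  = (uint8_t)s->stats_index;
    out[10] = (uint8_t)(s->mn_offset >> 8);
    out[11] = (uint8_t)s->mn_offset;
    out[12] = (uint8_t)(s->mn_length >> 8);
    out[13] = (uint8_t)s->mn_length;
}
```

The 16B shim plus the retained 2B Type=IP field occupies exactly the 18B of the original VLAN-tagged Ethernet header.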

Page 28: SPP V2 Router Design

NPE Version 2 Block Diagram
[Figure: NPE block diagram (NPUA and NPUB microengine blocks)]
RxB output format, as 32b words:
»Buffer Handle (24b), Reserved (8b)
»Eth. Frame Len (16b), Reserved (12b), Port (4b)

Page 29: SPP V2 Router Design

RxB
Needs to switch from NN output to Scratch or SRAM
»Comments in code indicate SRAM should work
»Supporting code seems to be only for scratch rings
  Needs further examination
  DZar notes there are some obscure #define's needed for SRAM rings.

Page 30: SPP V2 Router Design

NPE Version 2 Block Diagram
[Figure: NPE block diagram (NPUA and NPUB microengine blocks)]
LookupB&Copy input format (from RxB), as 32b words:
»Buffer Handle (24b), Reserved (8b)
»Eth. Frame Len (16b), Reserved (12b), Port (4b)
LookupB&Copy output format, as 32b words:
»Buffer Handle (24b), Reserved (8b)
»Frame Length (16b), Stats Index (16b)
»Reserved (12b), QM (2b), Sch (3b), PerSchedQID (15b)

Page 31: SPP V2 Router Design

LookupB/Copy
Inputs:
»Shim packet (buffer handle, frame length)
Outputs:
»Packet (buffer handle, frame length)
»QueueID (QM, Scheduler, Queue ID)
»Stats Index
Initialization:
»ResultTable (unicast+multicast)
»local endpoint table
»Ethernet SAddr
»Per-slice Packet Limits
Functionality (Overview)
»Copy shim header into buffer descriptor
»Look up routing information from result index
»If multicast, make the copies
»Enqueue to correct QM (from ResultTable)
Status
»Written, broken.
»Needs changes to handling of ResultTable; result indices are now absolute, not per-slice.

Page 32: SPP V2 Router Design

LookupB/Copy – Code Sketch

if not currently processing mcast packet
    read packet from SRAM ring
    extract shim
    load ResultTable value
    fill buffer descriptor
    if unicast
        if per-slice packet limit permits
            update per-slice packet count
            write to SRAM ring for correct QM. (By qmschedID in result table value).
        else drop buffer
    else
        start mcast processing
        if per-slice packet limit permits
            update per-slice packet count
            fetch first header buffer descriptor
            if payload length ≠ 0
                write ref count into payload descriptor
            else drop payload buffer
        else
            drop buffer
            finish mcast processing
else (Currently processing buffer, have empty header buffer handle)
    fill header buffer descriptor
        only chain if payload buffer is not empty
    if still making copies
        fetch next header buffer descriptor
    else finish mcast processing
    write current header buffer handle to SRAM ring for correct QM. (By qmschedID).
signal next ME

Page 33: SPP V2 Router Design

ResultTable – Unicast
Data needed to enqueue, rewrite packet:
»Fanout: Ignored (Memory padding)
»QID
  QMID, SchedID, QID (20b) (Lookup Result)
»Src MI:
  IP Saddr (32b) (Per SchedID Table)
  UDP Sport (16b) (Lookup Result)
»Tunnel Next Hop
  IP DAddr (32b) (Lookup Result)
  UDP DPort (16b) (Lookup Result)
»Chassis Addressing
  Ethernet Dst MAC (48b) (Per SchedID Table)
»Slice Specific Lookup Result Data (?) (Lookup Result)
Ethernet Src MAC
»Should be constant across all pkts.
Results Entry: Fanout (4b), QID (20b), IP DAddr (32b), UDP DPort (16b), UDP SPort (16b), HFIndex (16b)
Per Sched Entry: IP SAddr (32b), Eth DA (48b)
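The 20b QID combines the QM (2b), the scheduler (3b) and the per-scheduler queue ID (15b). A small C sketch of packing and unpacking it is given here; placing the QM in the top bits and the per-scheduler queue ID in the low bits is an assumption, and the names `qid_fields`, `qid_pack` and `qid_unpack` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { unsigned qm, sched, qid; } qid_fields;

/* Assumed layout of the 20b QID: QM(2b) | Sch(3b) | PerSchedQID(15b). */
static uint32_t qid_pack(qid_fields f)
{
    assert(f.qm <= 0x3u && f.sched <= 0x7u && f.qid <= 0x7FFFu);
    return ((uint32_t)f.qm << 18) | ((uint32_t)f.sched << 15) | f.qid;
}

static qid_fields qid_unpack(uint32_t q)
{
    qid_fields f = { (q >> 18) & 0x3u, (q >> 15) & 0x7u, q & 0x7FFFu };
    return f;
}
```

With this layout, QM 2, scheduler 5 and queue 0x1234 pack to 0xA9234.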

Page 34: SPP V2 Router Design

ResultTable – Multicast
Fanout gives the number of copies (0..15)
Data needed per copy on NPUB:
»QID
  QMID, SchedID, QID (20b) (Lookup Result)
»Src MI:
  IP Saddr (32b) (Per SchedID Table)
  UDP Sport (16b) (Lookup Result)
»Tunnel Next Hop
  IP DAddr (32b) (Lookup Result)
  UDP DPort (16b) (Lookup Result)
»Chassis Addressing
  Ethernet Dst MAC (48b) (Per SchedID Table)
»Slice Specific Lookup Result Data (?) (Lookup Result)
Ethernet Src MAC
»Should be constant across all pkts.
Support Multicast but optimize for Unicast
Results Entry (×16): Fanout (4b), QID (20b), IP DAddr (32b), UDP DPort (16b), UDP SPort (16b), HFIndex (16b)
Per Sched Entry: IP SAddr (32b), Eth DA (48b)

Page 35: SPP V2 Router Design

NPE Version 2 Block Diagram
[Figure: NPE block diagram (NPUA and NPUB microengine blocks)]
QM input format (from LookupB&Copy), as 32b words:
»Buffer Handle (24b), Reserved (8b)
»Frame Length (16b), Stats Index (16b)
»Reserved (12b), QM (2b), Sch (3b), PerSchedQID (15b)
QM output format:
»Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)

Page 36: SPP V2 Router Design

QM
No change from V1
»Incorporates change to limit queues by #pkts
Some changes in how control allocates bandwidth
»Need to ensure that slow HdrFmt blocks can’t tie up the system
»Currently looking at worst-case engineering (everyone runs at slowest block speed)

Page 37: SPP V2 Router Design

NPE Version 2 Block Diagram
[Figure: NPE block diagram (NPUA and NPUB microengine blocks)]
HdrFmt/SubEncap input and output format:
»Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)

Page 38: SPP V2 Router Design

HdrFmt / SubEncap
Inputs:
»Buffer Handle
»Remaining inputs come from Buffer Descriptor:
  Multicast or Unicast (from buffer_next)
  Frame length, offset
  HFIndex (index into HFTable, a slice-specific table)
  ResultIndex (for tunnel headers)
Outputs:
»Packet (buffer handle)
  Buffer descriptor contains updated offset and length
Initialization:
»HFTable, containing slice-specific data. For IPv4, this is unused.
»ResultTable, tunnel header information
Functionality:
»Substrate level:
  read buffer descriptor and pass frame offset, length, HFIndex, mcast/ucast to slice-specific HdrFmt
»Slice level: arbitrary processing.
  For IPv4, this writes any next-hop information. Except for redirects such as exception packets, effectively does nothing.
»Substrate level:
  Encapsulate for output tunnel (from ResultTable)
  Update stats
Status:
»Revisit multicast model
»Needs Internal Header code (Missing!)

Page 39: SPP V2 Router Design

NPE Version 2 Block Diagram
[Figure: NPE block diagram (NPUA and NPUB microengine blocks)]
Scr2NN/Freelist input and output format:
»Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)

Page 40: SPP V2 Router Design

Scr2NN/FreelistMgr
Inputs:
»Buffer Handle (possibly chained)
Outputs:
»Buffer Handle (possibly chained)
Initialization:
»None
Functionality:
»Combines Freelist Manager with Scr2NN glue
»FM: Read from scratch ring. Free buffers, correctly handling chained buffers and reference counts.
»Scr2NN: Read from Scratch, write to NN.
Status:
»Needs to be reworked from scratch; my method of combining was wrong and could (probably would) deadlock.
»Both blocks exist, but combining them is not straightforward.
  Open question: how should we prioritize among these tasks?
  The author should ensure that no deadlock is possible. (TxB writes to FM; if FM ring is full, TxB stalls. If Scr2NN is writing to TxB, it stalls. Gridlock.)
»As of August 2009, we'll use a temporary 4×4 thread split and revisit later.
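The Freelist Manager's job of freeing chained buffers while respecting reference counts can be sketched as below. This is an in-memory model only, under the assumption that each multicast copy owns one header buffer chained to a shared payload buffer; `buf` and `fm_free_chain` are hypothetical names, and real descriptors live in SRAM and are reached through rings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for a buffer descriptor. */
typedef struct buf {
    struct buf *buffer_next;  /* next buffer in the chain, or NULL */
    uint8_t     ref_cnt;      /* outstanding references */
    int         on_freelist;  /* set once the buffer has been returned */
} buf;

/* Drop one reference on every buffer in the chain and release those that
 * reach zero. Returns the number of buffers released. */
static int fm_free_chain(buf *b)
{
    int released = 0;
    while (b != NULL) {
        buf *next = b->buffer_next;   /* read before the buffer is released */
        assert(b->ref_cnt > 0);
        if (--b->ref_cnt == 0) {
            b->on_freelist = 1;
            b->buffer_next = NULL;
            released++;
        }
        b = next;
    }
    return released;
}
```

With a fanout of two, freeing the first copy releases only its header buffer; freeing the second releases both its header and the shared payload.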

Page 41: SPP V2 Router Design

NPE Version 2 Block Diagram
[Figure: NPE block diagram (NPUA and NPUB microengine blocks)]
TxB input format:
»Buffer Handle (24b), Rsv (3b), Intf (4b), V (1b)

Page 42: SPP V2 Router Design

TxB
Must support chained buffers
»Multicast uses header buffers and payload buffers
»Headers are slice-specific; we can’t rely on known, static lengths as we did in ONL.
Sends header from one buffer, payload from chained buffer.
»Can TX do this? Comments in the code seem to imply that chained (non-SOP) buffers must start at offset 0. Our payloads usually won’t.
  This will probably take some TX modification, but there’s no reason why it won’t work. Might have a performance penalty, of course…. [DZar]

Page 43: SPP V2 Router Design

SPP V2 SideB SRAM Buffer Descriptor
[Figure: buffer descriptor layout, eight 32b words LW0 to LW7]
LW0: Buffer_Next (32b)
LW1 to LW6 hold the following 32b groups (the assignment of groups to word numbers is not recoverable from the extracted slide):
»Buffer_Size (16b), Offset (16b)
»Packet_Size (16b), Free_list 0000 (4b), Reserved (4b), Ref_Cnt (8b)
»Stats Index (16b), Reserved (4b), Slice ID (xsid) (12b)
»ResultIndex (32b)
»MR Exception Bits (16b), HFIndex (16b)
»MR Bits (optional) (32b)
LW7: Packet_Next (32b)
Legend (which block writes each field):
»Written by Freelist Mgr
»Written by Rx
»Written by LookupB/Copy
»Written by QM
»Written by Rx or LookupB/Copy
»Ref_Cnt (8b): Written by Rx, added to by Copy, decremented by Freelist Mgr

Page 44: SPP V2 Router Design

SPP V2 SideB SRAM Buffer Descriptor
HFIndex is an index into the HFTable. Unused in IPv4.
»May not be needed in Buffer Descriptor, since SubstrateEncap can fetch it using ResultIndex
ResultIndex is used to get tunnel header info from the ResultTable
[Figure: buffer descriptor layout, eight 32b words LW0 to LW7, with fields Buffer_Next, Buffer_Size, Offset, Packet_Size, Free_list, Reserved, Ref_Cnt, Stats Index, Slice ID (xsid), ResultIndex, MR Exception Bits, HFIndex, MR Bits (optional), Packet_Next]

Page 45: SPP V2 Router Design

SPP v2 Control
New data path adds new Control requirements
Heterogeneous MNet execution times
»Control must select parameters for LCI QMs, NPUB QMs to avoid Parse, HdrFmt execution lag
Slice is now partial VLAN tag
»Must ensure all VLAN tags have distinct low 11b
Filter/Results now split across NPUA, NPUB
»Must coordinate updates to multiple data locations
»Synchronization issues require some care in Control
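The requirement that all VLAN tags have distinct low 11 bits can be checked with a 2048-bit occupancy map. This is a hedged sketch of such a check in C; `vlan_low11_distinct` is a hypothetical name, and how Control actually stores its VLAN assignments is not described in the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if no two VLAN tags share the same low 11 bits. */
static bool vlan_low11_distinct(const uint16_t *tags, size_t n)
{
    uint8_t seen[2048 / 8];           /* one bit per possible slice ID */
    assert(tags != NULL || n == 0);
    memset(seen, 0, sizeof seen);
    for (size_t i = 0; i < n; i++) {
        unsigned id = tags[i] & 0x7FFu;
        if (seen[id / 8] & (1u << (id % 8)))
            return false;             /* collision in the low 11 bits */
        seen[id / 8] |= (uint8_t)(1u << (id % 8));
    }
    return true;
}
```

For example, tags 0x0AA and 0x8AA are different 12-bit VLAN IDs but collide as slice IDs.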

Page 46: SPP V2 Router Design

SPP v2 Control
NPUA Data areas requiring Control setup
NPE MAC address at
»IPV4_SD_MAC_ADDR_HI32
»IPV4_SD_MAC_ADDR_lo16
VLAN Table
»Used by Decap, Parse
»Maps VLANs to code options, data areas
»2048-entry table at PL_SD_VLAN_CODE_OPT_TABLE_BASE

struct {
    unsigned int code_opt;         // only 4 lsb used
    unsigned int slice_data_ptr;
    unsigned int slice_data_size;
}

Page 47: SPP V2 Router Design

SPP v2 Control
Data areas requiring Control setup
VLAN Table -cont'd-
»Pointer to slice-specific SRAM areas
»Slice owners request amount needed (IPv4 code option needs 72B for counters)
»Control must pass along Slice owner initialization data
»Control can allocate in any 4B aligned location within Bank 3 addresses 0x300000..0x7FFFFF (upper 5MB of BANK3)
»Each slice-specific region must be at least SLICE_DATA_ENTRY_SIZE_MINIMUM (56B) in size
»Each code option has different additional size needs
  E.g., for IPv4, 56+72 = 128B total
  E.g., for i3, 56+3200 = 3256B total
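The allocation rules (4B alignment, Bank 3 range 0x300000..0x7FFFFF, at least the 56B minimum plus the code option's own needs) can be expressed as a single check. The C sketch below is illustrative only; `slice_region_ok` and the two range macros are hypothetical names, while SLICE_DATA_ENTRY_SIZE_MINIMUM is the constant named on the slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLICE_DATA_ENTRY_SIZE_MINIMUM 56u   /* from the slide */
#define SLICE_AREA_FIRST 0x300000u          /* Bank 3 range from the slide */
#define SLICE_AREA_LAST  0x7FFFFFu

/* Check a proposed (ptr, size) pair; extra is the code option's own need. */
static bool slice_region_ok(uint32_t ptr, uint32_t size, uint32_t extra)
{
    assert(extra <= UINT32_MAX - SLICE_DATA_ENTRY_SIZE_MINIMUM);
    if (ptr % 4u != 0) return false;                          /* 4B aligned */
    if (size < SLICE_DATA_ENTRY_SIZE_MINIMUM + extra) return false;
    if (ptr < SLICE_AREA_FIRST || ptr > SLICE_AREA_LAST) return false;
    return size - 1u <= SLICE_AREA_LAST - ptr;                /* fits in range */
}
```

For the 128B IPv4 total quoted on the slide, the highest usable start address under these rules is 0x7FFF80.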

Page 48: SPP V2 Router Design

SPP v2 Control
Data areas requiring Control setup
TCAM filters
»Used by LookupA
»Tightly interlinked with tables on NPUB

Page 49: SPP V2 Router Design

SPP v2 Control
NPUB Data areas requiring Control setup
NPE source MAC address (HdrFmt/SubstrateEncap)
»LC_MAC_ADDR_HI32
»LC_MAC_ADDR_LO32
Per-Slice (2048) packet limits table (LookupB/Copy) at LC_PER_SLICE_PACKET_LIMIT_BASE

struct {
    unsigned int current;
    unsigned int maximum;
    unsigned int overLimits;
}

Queue Manager parameters
»Must properly rate limit both bandwidth and slow HdrFmt code options
»No heterogeneous HdrFmt code options yet
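A per-slice packet limit entry of this shape could be used by LookupB/Copy roughly as follows. This C sketch assumes that `current` counts packets presently held for the slice and that `overLimits` counts packets dropped at the limit; that reading of the fields, and the names `slice_admit` and `slice_release`, are assumptions rather than documented behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Entry layout from the slide. */
typedef struct {
    unsigned int current;     /* packets currently held for the slice (assumed) */
    unsigned int maximum;     /* limit set by Control */
    unsigned int overLimits;  /* packets dropped at the limit (assumed) */
} slice_limit;

/* Returns true and counts the packet if the slice is under its limit;
 * otherwise records the drop and returns false. */
static bool slice_admit(slice_limit *s)
{
    assert(s != NULL);
    if (s->current >= s->maximum) {
        s->overLimits++;
        return false;
    }
    s->current++;
    return true;
}

/* Called when a packet leaves the system (assumed to be at buffer free). */
static void slice_release(slice_limit *s)
{
    assert(s != NULL && s->current > 0);
    s->current--;
}
```

On real microengines the read-modify-write would need to be atomic across threads; the sketch ignores concurrency.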

Page 50: SPP V2 Router Design

SPP v2 Control
NPUB Data areas requiring Control setup
Result Table
Used by LookupB/Copy, HdrFmt/SubstrateEncap
»Results corresponding to TCAM lookups
»Links to per-QM scheduler tunnel endpoint values
»Also links to per-slice HdrFmt data areas

Page 51: SPP V2 Router Design

Filters and Results
Slice owner maps filters to results
»Filter is 144b key, first 32b is substrate's Meta-Interface ID
»Slice owner controls remaining 112b
Results have multiple pieces
»Type: unicast / multicast
»Output QID('s) (associated with Meta-Interface)
  Control translates slice representation to substrate's tunnel
»Index into slice data in HFTable for Header Format to use

Page 52: SPP V2 Router Design

Adding (Multicast) Filter
Slice Owner View: x filters, y unicast results, z multicast results
1. Add filter <Meta-Interface In, IP DA, IP SA, DPort, Sport, Proto> with result <Type=Multicast, Result R (in 1..z)>
   Result R = <Fanout, Meta-Interface, Index> [up to 16× entries]
2. Update HFTable (index, length, bytestream)
[Figure: Control maps and copies the slice owner's request into the substrate tables, applying range validation. Tables shown: TCAM (slice/RxID/Dport key, ResultIndex); Multicast ResultTable (fanout; qm, sched, qid; Tunnel: DA/DPort/SPort; up to 16 entries); Local "subnet" table (Tunnel SA, Eth DA; 32 entries); HFTable (HFIndex, opaque Next Hop data)]

Page 53: SPP V2 Router Design

Filters and Results
First, some things to remember:
»This is the NPE: we are supporting protocols that may not be IP!
»Order of filters in a TCAM database defines those filters’ priority
  Lower numbered filter indices are higher priority
»TCAM filter lookup is done on the A-Side.
»TCAM filter result gives us a pointer to a full result which resides in SRAM on the B-Side.
  Thus the A-Side filter and the B-Side result need not be a 1-1 mapping
  We could have many filters using the same B-Side result.
»We are supporting Unicast and Multicast filters and results
  Multicast supports a maximum fanout of 16.

Page 54: SPP V2 Router Design

Filters and Results (continued)
Slice owners allocate N unicast filters and M multicast filters.
»They get:
  N+M Filter ids (0 – (N+M-1))
    Contiguous in the TCAM
    Order in TCAM indicates priority, lower id higher priority
  N Unicast Result indices (0 – (N-1))
    Contiguous in the Unicast portion of the Result table
  M Multicast Result indices (0 – (M-1))
    Contiguous in the Multicast portion of the Result table
»Filter id and Result index (unicast or multicast) are referenced separately.
  Example: Filter id 4 might use unicast result index 12
»Unicast and Multicast filters in TCAM can be mingled.
  Remember: Order in TCAM is important.
  Example: A unicast catch-all (all wildcards) filter should probably be the LAST filter in a slice’s set of filters so it does not override other filters including multicast filters.
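The effect of filter order can be illustrated with a software model of a ternary match in which the lowest matching index wins. The C sketch below simplifies keys to 32 bits (real keys are 144b) and is not how the TCAM hardware is driven; `filter` and `tcam_lookup` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified 32b ternary filter; real keys are 144b. */
typedef struct {
    uint32_t value;
    uint32_t mask;     /* 1 bits must match, 0 bits are wildcards */
    uint32_t result;   /* result index associated with the filter */
} filter;

/* Returns the index of the first (highest-priority) matching filter,
 * or -1 on a miss. */
static int tcam_lookup(const filter *f, size_t n, uint32_t key)
{
    assert(f != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (((key ^ f[i].value) & f[i].mask) == 0)
            return (int)i;
    return -1;
}
```

If an all-wildcard filter is placed at index 0, it matches every key and the more specific filters behind it never fire, which is why the catch-all belongs last.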

Page 55: SPP V2 Router Design

Filters and Results (continued)
Slice owners will have the ability to disable a filter.
»Control removes the filter from the TCAM (LookupA)
»Result is left on NPUB for "in-flight" packets
Slice owners can also remove a filter
»This deletes the results from the B-side

Page 56: SPP V2 Router Design

Filter / Result Operations
TCAM Result (A-side), as 32b words:
»Type (1b), ResultIndex (31b)
»MC Result BitMask (16b), Stats Index (16b)
Result Table (B-side):
»Unicast: N Results, 16B per Result
»Multicast: M Blocks, 16 Results per Block, 16B per Result
Result Entry (B-side), 16B, as 32b words:
»Valid (1b), Pad (11b), QID (20b)
»IP DAddr (32b)
»UDP DPort (16b), UDP SPort (16b)
»HFIndex (16b), Pad (16b)
If we use entire SRAM Bank:
»SRAM Banks are 8MB
»Result size is 16B
»TCAM has 128K 144b entries
»N + M = 128K
»(N + 16*M) * 16 <= 8MB
»N = 104858, M = 26214
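The N and M figures follow from the two constraints: substituting N = 128K - M into (N + 16*M) * 16 <= 8MB gives 15*M <= 512K - 128K, so M = 26214 and N = 104858. A short C sketch of the same arithmetic, with the hypothetical names `result_split` and `size_result_table`:

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t n_unicast, m_multicast; } result_split;

/* Solve N + M = filters and (N + fanout*M) * entry_bytes <= bank_bytes
 * for the largest M. */
static result_split size_result_table(uint32_t filters, uint32_t bank_bytes,
                                      uint32_t entry_bytes, uint32_t fanout)
{
    assert(entry_bytes > 0 && fanout > 1);
    uint32_t slots = bank_bytes / entry_bytes;      /* total result entries */
    assert(slots >= filters);
    /* N = filters - M, so (fanout - 1) * M <= slots - filters. */
    uint32_t m = (slots - filters) / (fanout - 1);
    if (m > filters) m = filters;
    result_split r = { filters - m, m };
    return r;
}
```

Calling it with 128K filters, an 8MB bank, 16B entries and a fanout of 16 reproduces the slide's split.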

Page 57: SPP V2 Router Design

57 - Mike Wilson - 04/21/23

Filter / Result Operations

add_mc_filter(fid, RxMI, Key, Mask, mcResultIndex, statIndex)

update_mc_filter(fid, mcResultIndex, resultMask)

add_mc_result(fid, mcResultIndex, entryIndex, Qinfo, DestInfo)

update_mc_result(fid, mcResultIndex, entryIndex, Qinfo, DestInfo)

remove_mc_filter(fid)

remove_mc_result(mcResultIndex)

add_uc_filter(fid, RxMI, Key, Mask, ucResultIndex, statIndex)

update_uc_filter(fid, ucResultIndex, statIndex)

add_uc_result(fid, ucResultIndex, Qinfo, DestInfo)

update_uc_result(fid, ucResultIndex, Qinfo, DestInfo)

remove_uc_filter(fid)

remove_uc_result(ucResultIndex)

TCAM Result (A-side): Type (1b), ResultIndex (31b), Result Bit Mask (16b), Stats Index (16b)

Result Entry (B-side): Valid (1b), QID (20b), IP DAddr (32b), UDP DPort (16b), UDP SPort (16b), HFIndex (16b)
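To make the 16B Result Entry concrete, here is a hedged sketch that packs and unpacks the listed fields into four 32-bit words. The slides give the field widths only; the bit positions within each word, the big-endian byte order, and all names (ResultEntry, pack_result, unpack_result) are assumptions made for illustration.

```python
import struct
from typing import NamedTuple


class ResultEntry(NamedTuple):
    valid: bool
    qid: int        # 20 bits
    ip_daddr: int   # 32 bits
    udp_dport: int  # 16 bits
    udp_sport: int  # 16 bits
    hf_index: int   # 16 bits


def _check(name, value, bits):
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")


def pack_result(e):
    """Pack an entry into 16 bytes (assumed layout, pad bits written as zero)."""
    _check("qid", e.qid, 20)
    _check("ip_daddr", e.ip_daddr, 32)
    _check("udp_dport", e.udp_dport, 16)
    _check("udp_sport", e.udp_sport, 16)
    _check("hf_index", e.hf_index, 16)
    word0 = (int(e.valid) << 31) | (e.qid << 11)  # Valid(1b) QID(20b) Pad(11b)
    word2 = (e.udp_dport << 16) | e.udp_sport     # DPort(16b) SPort(16b)
    word3 = e.hf_index << 16                      # HFIndex(16b) Pad(16b)
    return struct.pack(">IIII", word0, e.ip_daddr, word2, word3)


def unpack_result(raw):
    """Inverse of pack_result for a 16-byte buffer."""
    w0, daddr, w2, w3 = struct.unpack(">IIII", raw)
    return ResultEntry(bool(w0 >> 31), (w0 >> 11) & 0xFFFFF, daddr,
                       w2 >> 16, w2 & 0xFFFF, w3 >> 16)


entry = ResultEntry(True, 0x12345, 0x0A000001, 4000, 5000, 7)
raw = pack_result(entry)
print(len(raw), unpack_result(raw) == entry)  # 16 True
```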

Page 58: SPP V2 Router Design

Multicast Filter / Result Operations

add_mc_filter(fid, RxMI, Key, mcResultIndex, resultMask, statIndex)

» Adds multicast filter to TCAM

update_mc_filter(fid, mcResultIndex, resultMask, statIndex)

» Updates (re-writes) the TCAM result

add_mc_result(mcResultIndex, entryIndex, Qinfo, DestInfo)

» Writes a MC result entry into Result Table

» Marks result as valid

update_mc_result(mcResultIndex, entryIndex, Qinfo, DestInfo)

» Updates (re-writes) a MC result entry in the Result Table

» Marks result as valid

» Implementation will almost certainly be same as add_mc_result so why have both?

remove_mc_filter(fid)

» Removes the filter from the TCAM, leaves B-side results unchanged.

remove_mc_result(mcResultIndex)

» Invalidates a multicast filter result

Page 59: SPP V2 Router Design

Unicast Filter / Result Operations

add_uc_filter(fid, RxMI, Key, ucResultIndex, statIndex)

» Adds unicast filter to TCAM

update_uc_filter(fid, ucResultIndex, statIndex)

» Updates (re-writes) the TCAM result

add_uc_result(ucResultIndex, Qinfo, DestInfo)

» Writes a UC result entry into the Result table

» Marks result as valid

update_uc_result(ucResultIndex, Qinfo, DestInfo)

» Updates (re-writes) a UC result entry in the Result table

» Marks result as valid

» Implementation will almost certainly be same as add_uc_result so why have both?

remove_uc_filter(fid)

» Removes the filter from the TCAM, leaves the B-side results unchanged

remove_uc_result(ucResultIndex)

» Invalidates a unicast filter result
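The split between removing a filter (A-side) and removing a result (B-side) exists so that packets already past LookupA can still be resolved on NPUB. A toy model of the unicast operations illustrates this; it is a sketch only. The class ToyFilterTables and the lookup_a/lookup_b helpers are hypothetical, dictionaries stand in for the TCAM and the SRAM Result table, and RxMI and Key are omitted so that a "lookup" matches directly on fid.

```python
class ToyFilterTables:
    """Toy model of the A-side TCAM and the B-side Result table (unicast only)."""

    def __init__(self):
        self.tcam = {}     # fid -> (ucResultIndex, statIndex); stands in for the TCAM
        self.results = {}  # ucResultIndex -> [valid, Qinfo, DestInfo]

    def add_uc_result(self, uc_result_index, qinfo, dest_info):
        # Writes the entry and marks it valid.
        self.results[uc_result_index] = [True, qinfo, dest_info]

    # The slides suggest update would almost certainly share the add implementation.
    update_uc_result = add_uc_result

    def add_uc_filter(self, fid, uc_result_index, stat_index):
        self.tcam[fid] = (uc_result_index, stat_index)

    update_uc_filter = add_uc_filter

    def remove_uc_filter(self, fid):
        # Only the A-side changes; B-side results stay for in-flight packets.
        self.tcam.pop(fid, None)

    def remove_uc_result(self, uc_result_index):
        # Invalidate rather than erase, mirroring the Valid bit.
        if uc_result_index in self.results:
            self.results[uc_result_index][0] = False

    def lookup_a(self, fid):
        hit = self.tcam.get(fid)
        return None if hit is None else hit[0]

    def lookup_b(self, uc_result_index):
        entry = self.results.get(uc_result_index)
        if entry is None or not entry[0]:
            return None
        return (entry[1], entry[2])


tables = ToyFilterTables()
tables.add_uc_result(12, qinfo="qid-7", dest_info="10.0.0.1:4000")
tables.add_uc_filter(4, 12, stat_index=0)
in_flight = tables.lookup_a(4)     # a packet passes LookupA and carries index 12
tables.remove_uc_filter(4)         # filter removed from the TCAM
print(tables.lookup_a(4))          # None
print(tables.lookup_b(in_flight))  # ('qid-7', '10.0.0.1:4000')
tables.remove_uc_result(12)        # result invalidated on the B-side
print(tables.lookup_b(in_flight))  # None
```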

Page 60: SPP V2 Router Design

Extra Slides

The rest of the slides are old or for extra information

Page 61: SPP V2 Router Design

Design Questions

Small hole for abuse in HdrFmt

»QM rate limits on payload length

»HdrFmt (after QM) can vastly increase packet length

»Should the LookupB table give the padding size for each entry? Enforced in SubEncap?

»ANSWER: No, we will resort to our control of HdrFmt to force it to behave. (We write all of the code options right now.)

What are the best places to update stats on NPUB?

»ANSWER: Post-Q only

Is there any remaining reason that NPUB would need the source tunnel information?

»ANSWER: No. If a code option needs it, put it into opaque slice data.

Page 62: SPP V2 Router Design

Questions/Issues

4/28/08:

»How many code options? Limit of 16?

»To handle slow Code Options:

– LCI Queues would control traffic to Fast/Slow Parse Code

– Classes of code options defined by how long their Parse code takes. Scheduler assigned to a class of code option.

– NPE Queues would control traffic to Fast/Slow HF Code

– LCE Queues control the output rate to Interfaces.

»Multicast Problems:

– Impact of multicast traffic overloading Lookup/Copy and becoming a bottleneck.

»Rx on SideB, can it use SRAM output ring?

– All our other 10G Rx’s have NN output ring.

»Option for HF to send out additional pkts?

»How to pass MR and substrate hdrs to TxB?

– Through Ring or through Hdr Buffer associated with Hdr Buffer descriptor.

– If the latter then what are the constraints in Tx for buffer chaining?

Page 63: SPP V2 Router Design

Meeting Notes

1/15/08:

»QM: Add Pkt count to Queue Params, change limit from QLen to PktCount

»Add Per Slice Pkt limit to NPUA and NPUB

»Limit Fanout to 16

»MCast: Control will allocate all 16 entries for a multicast result entry. A result entry will be typed as multicast or unicast and will not transition from one to the other.

»What happens to pkts in queues when there is a route change that sends that flow’s pkts to a different interface and queue? Pkt ordering problems?

Page 64: SPP V2 Router Design

NPE Version 2 Block Diagram

[Figure: block diagram of the NPE across NPUA and NPUB, with the SPI Switches, Switch Blade, and GPE.]

NPUA: RxA (2 ME); Decap, Parse, LookupA, AddShim (8 MEs); TxA (2 ME); Stats (1 ME); TCAM; SRAM

NPUB: RxB (2 ME); LookupB&Copy (2 ME); QueueManager (4 MEs); HdrFmt (4 MEs); TxB (2 ME); Stats (1 ME); SRAM

Annotations on the diagram:

»Lookup produces resultIndx, statsIndx

»slice#, resultIndx, etc, passed in shim

»Lookup on <slice#, resultIndx> yields fanout, list of QiDs; copy to queues, adding copy#; (slice#, resultIndx remain in packet buffer)

»use slice# to select slice to format packet; use resultIndx to get next-hop

»for unicast, resultIndx replaced by QiD; allowing output side to skip lookup

»flow control?

Page 65: SPP V2 Router Design

Questions/Issues

Where are exit and entry points for packets sent to and from the GPE for exception processing?

» Parse (NPUA) and LookupA (NPUA) are where most exceptions are generated: IP Options, No Route, Etc.

» HdrFormat (NPUB) is where we do Ethernet header processing

What needs to be in the SHIM going from NPUA to NPUB?

» ResultIndex (32b)

» Exception Bits (12b)

» StatsIndex (16b)

» Slice# (12b)

» ???

Will we support multi-copy in a way similar to the ONL Router?

How big can the fanout be?

» How many QIDs need to be stored with the LookupB Result?

Is there some encoding for the QIDs that can take into account support for multicast and the copy#? For example:

Multicast QID (20b)

– Multicast (1b): 1

– Copy# (4b)

– PerMulticast QID (15b): One PerMulticast QID allocated for each Multicast

Unicast QID (20b)

– Unicast (1b): 0

– QID (19b)

Are there timing/synchronization issues with adding, deleting or changing lookup entries between the two NPUs’ databases?

Do we need flow control between TxA and RxB?
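The example QID encoding proposed on this slide (a 1b multicast flag, then either a 4b copy# plus a 15b per-multicast QID, or a 19b unicast QID) can be sketched as follows. The slide lists the fields but not their bit positions, so placing the flag in the most significant bit, followed by copy# and then the QID, is an assumption, as are the function names.

```python
MC_FLAG = 1 << 19  # assumed: the multicast flag is the top bit of the 20b QID


def encode_uc_qid(qid):
    """Unicast QID(20b): flag 0 followed by a 19b QID."""
    if not 0 <= qid < (1 << 19):
        raise ValueError("unicast QID must fit in 19 bits")
    return qid


def encode_mc_qid(per_mc_qid, copy):
    """Multicast QID(20b): flag 1, Copy# (4b), PerMulticast QID (15b)."""
    if not 0 <= copy < 16:
        raise ValueError("copy# must fit in 4 bits")
    if not 0 <= per_mc_qid < (1 << 15):
        raise ValueError("per-multicast QID must fit in 15 bits")
    return MC_FLAG | (copy << 15) | per_mc_qid


def decode_qid(qid20):
    """Return (kind, copy#, qid); copy# is None for unicast."""
    if qid20 & MC_FLAG:
        return ("multicast", (qid20 >> 15) & 0xF, qid20 & 0x7FFF)
    return ("unicast", None, qid20 & 0x7FFFF)


print(hex(encode_mc_qid(per_mc_qid=5, copy=3)))  # 0x98005
print(decode_qid(0x98005))                       # ('multicast', 3, 5)
print(decode_qid(encode_uc_qid(1234)))           # ('unicast', None, 1234)
```

Under this layout the 4b copy# matches the fanout limit of 16 noted elsewhere in the deck, and all copies of one multicast share the same low 15 bits.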

Page 66: SPP V2 Router Design

NPE Version 2 Block Diagram

[Figure: NPE Version 2 block diagram (NPUA and NPUB blocks, SPI Switches, Switch Blade, GPE), with a "flow control?" annotation.]

NPUA:

»RxA: Same as Version 0

»TxA: New 10Gb/s

»Decap: Same as Version 0

»Parse: Same as Version 0

– New code options?

»LookupA: Results will be different from Version 0

»AddShim: New

Page 67: SPP V2 Router Design

NPE Version 2 Block Diagram

[Figure: NPE Version 2 block diagram (NPUA and NPUB blocks, SPI Switches, Switch Blade, GPE), with a "flow control?" annotation.]

NPUB:

»RxB: Same as Version 0

»TxB: New 10Gb/s

– with L2 Header coming in on input ring?

»LookupB: New

»Copy: New, may be able to use some code from ONL Copy

»QM: New, decoupled from Links

»HF: New, may use some code from Version 0

Page 68: SPP V2 Router Design

NPE Version 2 Block Diagram

[Figure: NPE Version 2 block diagram (NPUA and NPUB blocks, SPI Switches, Switch Blade, GPE), with a "flow control?" annotation.]

Blocks shown: RxA (2 ME); Decap, Parse, LookupA, AddShim (8 MEs); TxA (2 ME); StatsA (1 ME); RxB (2 ME); LookupB&Copy (2 ME); QueueManager (4 MEs); HdrFmt (4 MEs); TxB (2 ME); StatsB (1 ME); TCAM; SRAM

Additional blocks on this version of the diagram: FreeList MgrA (1 ME); FreeList MgrB (1 ME); Scr2NN (1 ME); Sram2NN (1 ME)

NPUB has 17 MEs currently spec’ed

Page 69: SPP V2 Router Design

SPP V2: MR Specific Code

Where does the MR Specific Code reside in V2:

»Parse

»HdrFormat

What about LookupA and LookupB?

»Lookup is a “service” provided to the MRs by the Substrate.

»No MR specific code needed in LookupA or LookupB

What about SideA AddShim?

»The Exception bits that go in the shim are MR Specific but they should be passed to AddShim and it will write them into the Shim.

»No MR Specific code needed in AddShim.

What about SideB Copy?

» Is there anything MR specific about setting up multiple copies of a packet? There shouldn’t be.

– We will have the Copy block allocate a new hdr buffer descriptor and link it to the existing data buffer descriptor and take care of reference counts.

– The actual building of the new header(s) for the copies will be left to HF.

»No MR Specific code needed in Copy.

Page 70: SPP V2 Router Design

SPP V2: Hdr Format

Lots of changes for HF:

» Move behind QM

» More general:

– Support multiple source IP Addresses

– General support for Tunnels

– Eventually different kinds of tunnels (UDP/IP, GRE, …)?

» Support for Multicast

– Dealing with header buffer descriptors

– Reading Fanout table

» Substrate portion of HF will need to do Decap type table lookup

– Slice ID (Code Option, Slice Memory Pointer, Slice Memory Size)

HF gets a buffer descriptor from the QM

» The Substrate portion of HF must determine:

– Code Option (8b)

– Slice ID (12b)

– Location of Next Hop information (20b - 32b)

– LD vs. FWD?

– Stats Index (16b)

– Should HF do this or QM?

» The MR portion of HF must determine:

– Exception bits (16b)

Let’s put all of the above data in the Buf Desc

» LookupB/Copy will need to write it there based on what comes across from SideA in the shim

Page 71: SPP V2 Router Design

SPP V2: Result

We need to be much more general in our support for Tunnels, Interfaces, MetaInterfaces, and Next Hops.

SideB Result:

»Interface

– IP SAddr (32b)

– Eth MAC DAddr (48b) (LC, GPE1, GPE2, …, GPEn)

– SchedulerId (8b): which QM should handle pkt

»TxMI:

– IP SPort (16b)

»TxNextHop:

– IP DAddr (32b)

– IP DPort (16b)

Page 72: SPP V2 Router Design

Data Areas

Where are the tables and what data is transmitted from SideA to SideB?

SideA Tables

Shim between SideA and SideB

SideB Tables

Page 73: SPP V2 Router Design

Pkt Processing Data and Tables

SideA:

»MR/Slice Table:

– Generated by Control

– Used by: Substrate Decap to retrieve a MR/Slice’s parameters

– Indexed by SliceId == VLAN

– Contains: Code option, Slice Memory ptr, Slice Memory size, ???

»TCAM:

– Generated by Control

– Used by: LookupA

– Contains: Key, Result

Page 74: SPP V2 Router Design

Data Areas

Shim between SideA and SideB

»Written to DRAM Buffer to be sent from SideA to SideB

»Contains:

resultIndex (32b):

– Generated by Control

– Result of TCAM lookup on SideA

– Translates into an SRAM Address on SideB

exceptionBits (12b)

– Generated by SideA Parse/Lookup

– Used by: SideB HF

statsIndex (16b)

– Generated by Control

– Result of TCAM lookup on SideA

– Used by: SideA Lookup/AddShim to increment counters

– Used by: SideB Lookup/Copy to increment PreQ Cntrs (or perhaps SideA is the PreQ cntrs)

– Used by: SideB HF or QM to increment PostQ Cntrs

sliceId (12b)

– Generated by Control

– Result of Decap read of Ethernet hdr (VLAN)

– Used by: ???

codeOption (4b)

Slice Memory Ptr (32b)
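The note that resultIndex "translates into an SRAM Address on SideB" can be illustrated with the Result table layout from the Filter / Result slides (N unicast results followed by M multicast blocks of 16 results, 16B each). The sketch below is one possible translation, not the actual LookupB code. It assumes the Type bit is the top bit of the 32-bit word with 1 meaning multicast, that the unicast region starts at the table base, that the multicast region follows it directly, and that bit i of the 16b result bit mask selects entry i of a block.

```python
RESULT_BYTES = 16      # one Result entry
MC_BLOCK_ENTRIES = 16  # results per multicast block


def result_addresses(tcam_word, mc_mask, n_unicast, base=0):
    """Map a TCAM result word to the SRAM addresses of its Result entries.

    tcam_word: Type (1b, assumed 1 = multicast) in bit 31, ResultIndex in bits 30..0
    mc_mask:   16b result bit mask, only used for multicast
    n_unicast: N, the number of unicast results preceding the multicast region
    """
    is_multicast = (tcam_word >> 31) & 1
    index = tcam_word & 0x7FFFFFFF
    if not is_multicast:
        return [base + index * RESULT_BYTES]
    block = base + (n_unicast + index * MC_BLOCK_ENTRIES) * RESULT_BYTES
    return [block + i * RESULT_BYTES
            for i in range(MC_BLOCK_ENTRIES) if (mc_mask >> i) & 1]


# Unicast result index 12 sits 12 entries past the base.
print(result_addresses(12, 0, n_unicast=104858))             # [192]
# Multicast block 2 with entries 0 and 2 selected, using a small N for readability.
print(result_addresses((1 << 31) | 2, 0b101, n_unicast=100))  # [2112, 2144]
```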

Page 75: SPP V2 Router Design

Data Areas

SideB

»Data Buffer Descriptor

»Hdr Buffer Descriptor

– Used for multi-copy packets

– SPP V2 may require Tx to handle multi-buffer packets.

– It is unclear if we can cleanly do the same thing that we do with ONL, where HF passes the Ethernet header to Tx.

– We may also need to have support for MR specific per copy data

»Results Table

– Generated by Control

– Used by: LookupB/Copy, HF (Should HF get its per copy info from here as well?)

– Contains: Fanout (if fanout is > 1 we can overload some of the following fields with a pointer into a Fanout table), QID, InterfaceId, TxMI Id, TX NextHop Id

– TxMI Id: Probably doesn’t help to make it an index into a table for UDP Tunnels since UDP Port is 16 bits. But for tunnels other than UDP tunnels it may help?

– TX NextHop Id: Index into a table of Tunnel Next Hops

Page 76: SPP V2 Router Design

Data Areas (continued)

SideB (continued)

»Fanout Table

– Generated by Control

– Used by: LookupB/Copy, HF

– Contains: QID[Fanout], InterfaceId, TxMI Id, Tx Next Hop ID[Fanout]

– Implementation Choices:

One contiguous block of memory (fixed size or variable sized)

Chained with one set of values per entry

Chained with N (N=4?) sets of values per entry