Freescale DWF13 training: Data Path Acceleration Architecture (source: cache.freescale.com/files/training/doc/dwf/DWF13_AMF_NET_T0355)

October 2013

• Data Path Acceleration Architecture (DPAA)

• Buffer Manager

• Queue Manager

• Frame Manager

• Accelerator Overview

− Security Engine

− Pattern Match Engine

− RapidIO Message Manager

− Decompression/Compression Engine

− Data Center Bridging

3 TM

[Block diagram: QMan (Queue Manager), BMan (Buffer Manager), SEC (Security Engine), PME (Pattern Match Engine), FMan (Frame Manager), RMan (RapidIO Message Manager), the cores, Ethernet, RapidIO messaging, and more.]

A collection of cores, HW accelerators and bridges, tied together by HW Buffer and Queue Managers.

4 TM

Hardware Accelerators — saving CPU cycles for higher-value work:

• FMAN (Frame Manager): 50 Gbps aggregate parse, classify, distribute — identifies traffic and targets a CPU or an accelerator
• BMAN (Buffer Manager): 64 buffer pools
• QMAN (Queue Manager): up to 2^24 queues
• RMAN (RapidIO Manager): seamless mapping of sRIO to the DPAA
• SEC (Security Engine): 40 Gbps IPsec/SSL; public key 25K ops/s at 1024b RSA — protects against internal and external Internet attacks; frees the CPU from draining repetitive RSA, VPN and HTTPS traffic
• PME (Pattern Matching): 10 Gbps aggregate
• DCE (Data Compression): 20 Gbps aggregate — compresses and decompresses traffic across the Internet
• New/enhanced: line-rate 50 Gbps networking; quality of service for FCoE in converged data center networking

5 TM

• Buffer: Unit of contiguous memory, allocated by software

• Frame: Single buffer or list of buffers that hold data, for example, packet payload, header, and other control information

• Frame Descriptor (FD): Proxy structure used to represent frames

• Frame Queue:

− FIFO of (related) Frame Descriptors (e.g. TCP session)

− The basic queuing structure supported by QMan

• Frame Queue Descriptor (FQD): Structure used to manage Frame Queues

[Diagram: an Ethernet frame (preamble, dest addr, src addr, type, data, CRC) stored in one or more buffers; each frame is represented by a Frame Descriptor (FD); a Frame Queue Descriptor (FQD) heads a FIFO of FDs: FQD → FD → FD → FD → FD …]
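The relationships between these objects can be modeled in miniature. This is a sketch of the data model only, not the hardware layout; all class and field names are illustrative:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Buffer:
    """Unit of contiguous memory, allocated by software, tagged by pool."""
    bpid: int              # Buffer Pool ID the buffer was drawn from
    data: bytearray

@dataclass
class FrameDescriptor:
    """Proxy structure representing a frame (one buffer or an S/G list)."""
    buffers: list
    offset: int = 0        # start of frame data within the first buffer
    length: int = 0

@dataclass
class FrameQueue:
    """FIFO of related FDs (e.g. one TCP session), described by an FQD."""
    fqid: int
    fds: deque = field(default_factory=deque)

    def enqueue(self, fd):
        self.fds.append(fd)

    def dequeue(self):
        return self.fds.popleft()

fq = FrameQueue(fqid=0x100)
fq.enqueue(FrameDescriptor([Buffer(bpid=1, data=bytearray(64))], length=60))
fd = fq.dequeue()
```

Note that the FD, not the packet data itself, is what moves through the queues; the buffers stay in place in memory.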

6 TM

Frame Descriptor formats:

• Simple Frame (format = 000): the FD carries DD/LIODN information, a BPID, the buffer Address, an Offset, a Length and a Status/Cmd field, and points directly at a single contiguous buffer holding the data.
• Multi-buffer Frame (format = 100, scatter/gather): the FD points at an S/G list. Each S/G entry carries an Address, Length, BPID and Offset (offset = 0 for continuation entries), and the final entry is marked (01). Together the entries describe one packet spread across several buffers.

[FD layout, four 32-bit words (bits 0–31):
word 0: DD | LIODN offset | BPID | ELIODN offset | — | addr (upper bits)
word 1: addr (cont.)
word 2: Fmt | Offset | Length
word 3: STATUS/CMD]

7 TM

FQD Selected Field Description:

• FQD_LINK: Link to the next FQD in a queue of FQDs, used for Work Queues

• ORPRWS: ORP Restoration Window Size

• OA: ORP Auto Advance NESN Window Size

• ODP_SEQ: ODP Sequence Number

• ORP_NESN: ORP Next Expected Sequence Number.

• ORP_EA_HPTR, ORP_EA_TPTR: ORP Early Arrival Head and Tail Pointer

• PFDR_HPTR, PFDR_TPTR : PFDR Head and Tail Pointer

• CONTEXT_A, CONTEXT_B: Frame Queue Context A and B

• STATE: FQ State

• DEST_WQ: Destination Work Queue

• ICS_SURP: Intra-Class Scheduling Surplus or Deficit.

• IS: identifies whether ICS_SURP holds a surplus or a deficit

• ICS_CRED: Intra-Class Scheduling Credit

• CONG_ID: Congestion Group ID

• RA[1-2]_SFDR_PTR: SFDR Pointer for Recently Arrived frame # 1 and 2

• TD_MANT, TD_EXP: Tail Drop threshold Mantissa and Exponent

• C: FQD in external memory or in cache (Qman 1.1)

• X: XON or XOFF for flow control command (Qman1.1)

[FQD layout, 32-bit rows (bits 0–31):
— | FQD_LINK
ORPRWS | OA | OLWS | ODP_SEQ | ORP_NESN
ORP_EA_HSEQ | ORP_EA_TSEQ
ORP_EA_HPTR | ORP_EA_TPTR
ORP_EA_TPTR (cont.) | PFDR_HPTR
PFDR_HPTR (cont.) | PFDR_TPTR
CONTEXT_A (two words)
CONTEXT_B
FQ_CTRL | FER | STATE | DEST_WQ
ICS_SURP | IS | ICS_CRED
BYTE_CNT
CONG_ID | FRM_CNT
NRA | OAC | C | RA1_SFDR_PTR | X | IT | RA2_SFDR_PTR
NOD | OAL | OD1_SFDR_PTR | OAL | OD2_SFDR_PTR
NPC | OD3_SFDR_PTR | TD_MANT | TD_EXP]
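TD_MANT and TD_EXP encode the tail-drop threshold compactly in a few bits. Assuming the conventional mantissa-times-power-of-two form (field widths here are illustrative, not taken from the reference manual), decoding is one line:

```python
def tail_drop_threshold(td_mant: int, td_exp: int) -> int:
    """Decode a mantissa/exponent-encoded tail-drop threshold.

    Assumes the usual encoding threshold = TD_MANT * 2**TD_EXP.
    A frame is tail-dropped when the queue's byte count exceeds this.
    """
    return td_mant * (1 << td_exp)

# e.g. mantissa 100 with exponent 10 gives a 102400-byte threshold
threshold = tail_drop_threshold(td_mant=100, td_exp=10)
```

The split encoding trades precision for range: large thresholds are representable in few bits at coarse granularity.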

8 TM


• FMan receives packets

− allocates internal buffers

− retrieves data from MAC

• BMI

− acquires a buffer from BMan

− uses DMA to store data in it

• Parse+classify+keygen select a queue and policer profile

• Policer “colors” and optionally discards frame

• QMan applies active queue management and enqueues frame

• Frame is enqueued to one of a pool of cores

• An available core dequeues the FD for processing
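The steps above can be walked through in a toy simulation. Every component below is a stub and every name is illustrative — this is not a real DPAA API:

```python
class StubHW:
    """Stands in for both BMan (buffer supply) and QMan (queueing)."""
    def __init__(self):
        self.queues = {}

    def acquire(self, pool):
        return {"pool": pool}

    def release(self, pool, buf):
        pass

    def enqueue(self, fqid, fd):
        self.queues.setdefault(fqid, []).append(fd)
        return fqid

def parse_classify(pkt):
    """Parser + classifier + KeyGen: pick a frame queue and policer profile."""
    return 0x400 + (sum(pkt) & 0x1F), "default"

def police(pkt, profile):
    """Policer 'colors' the frame (stub: everything is green)."""
    return "green"

def fman_receive(pkt, hw):
    buf = hw.acquire("rx_pool")           # BMI acquires a buffer from BMan
    buf["data"] = pkt                     # DMA stores the frame data in it
    fqid, profile = parse_classify(pkt)
    if police(pkt, profile) == "red":     # policer may discard the frame
        hw.release("rx_pool", buf)
        return None
    return hw.enqueue(fqid, {"buf": buf, "length": len(pkt)})

hw = StubHW()
fqid = fman_receive(b"\x00" * 60, hw)
```

A core would then dequeue the FD from `hw.queues[fqid]` for processing.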

[Diagram: Frame Manager internals — MAC (10GE/GE ports), BMI, Parser, Classifier, Keygen (distribution), Policer, QMI, DMA — with WRED/enqueue/dequeue paths into the Queue Manager's work queues (WQ0–WQ7), which feed Power Architecture™ cores (D-Cache/I-Cache, L2 Cache); the Buffer Manager supplies and reclaims buffer pointers; packet data is DMAed to DDR.]

9 TM

• Maintains pools of buffers that have been provided by software.

− Manages up to 64 buffer pools.
− Pools contain “tokens” that usually are (but need not be) physical addresses of buffers.
− SW fills the pools with any token values it wants, at any time it wants.
− Typical arrangement: a pool of small buffers, a pool of medium buffers, a pool of large buffers.
− An interrupt can be raised if a pool depletes.

• Cores, the Frame Manager, the Security Engine, etc. acquire these buffers by requesting them from BMan.

− The requesting block indicates which pool it wants a buffer from using the Buffer Pool ID (BPID).
− BMan then removes a buffer from that pool and passes it to the requesting module.

• Modules release buffers back to BMan, providing the token (physical address) and the pool (BPID) to which to return it.
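The acquire/release protocol is simple enough to model in a few lines. This is a toy sketch of the semantics, not the portal-based hardware interface; class and method names are illustrative:

```python
class BufferManager:
    """Toy model of BMan: up to 64 pools of software-provided tokens.

    Tokens are usually physical buffer addresses but can be any value.
    """
    NUM_POOLS = 64

    def __init__(self):
        self.pools = {bpid: [] for bpid in range(self.NUM_POOLS)}

    def release(self, bpid, token):
        """SW (or HW) returns a token to the pool named by BPID."""
        self.pools[bpid].append(token)

    def acquire(self, bpid):
        """A requester (core, FMan, SEC, ...) names the pool by BPID."""
        pool = self.pools[bpid]
        return pool.pop() if pool else None   # None signals pool depletion

bman = BufferManager()
for addr in (0x1000, 0x2000, 0x3000):
    bman.release(bpid=7, token=addr)
first = bman.acquire(7)   # LIFO: the most recently released token
```

The LIFO pop mirrors BMan's allocation policy (next slide): the buffer just freed is the first one handed out again.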

10 TM

• Standardized command interface to SW and HW

− Up to 50 Software Portals (T4240)
− Up to 6 Direct Connect Portals (T4240)
− Up to 64 separate pools of free buffers

• BMan keeps a small per-pool stockpile of buffer pointers in internal memory

− Stockpile of 64 buffer pointers per pool; maximum 2G buffer pointers overall
− Absorbs bursts of acquire/release commands without external memory accesses
− Minimizes memory accesses for buffer pool management

• Pools (buffer pointers) overflow into DRAM

• LIFO buffer allocation policy

− A released buffer is immediately reused for receiving new data, reusing cache lines previously allocated
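The stockpile idea — a small on-chip LIFO in front of a DRAM-backed stack — can be sketched as a two-level stack. Sizes and batch counts below are illustrative, not the hardware's:

```python
class PoolWithStockpile:
    """Toy model of a BMan pool with an on-chip stockpile.

    A small stockpile absorbs bursts of acquire/release commands; only
    when it over- or under-fills does the pool touch the external
    (DRAM) backing stack, and then in batches.
    """
    STOCKPILE_MAX = 64
    BATCH = 32

    def __init__(self):
        self.stockpile = []      # on-chip memory
        self.backing = []        # DRAM stack
        self.dram_accesses = 0

    def release(self, token):
        if len(self.stockpile) >= self.STOCKPILE_MAX:
            # flush a batch of pointers out to the external stack
            self.backing.extend(self.stockpile[:self.BATCH])
            del self.stockpile[:self.BATCH]
            self.dram_accesses += 1
        self.stockpile.append(token)

    def acquire(self):
        if not self.stockpile and self.backing:
            # fetch a batch back from the external stack
            self.stockpile = self.backing[-self.BATCH:]
            del self.backing[-self.BATCH:]
            self.dram_accesses += 1
        return self.stockpile.pop() if self.stockpile else None

p = PoolWithStockpile()
for t in range(40):          # a burst of 40 releases...
    p.release(t)
for _ in range(40):          # ...followed by 40 acquires
    p.acquire()
```

Because the burst fits inside the stockpile, the whole exchange completes with zero accesses to the backing store.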

[Diagram: SW portals and DCP portals issue commands to a central service sequencer; per-pool stockpiles (comfort zone 16–55, 63 deep) live in on-chip memory; a fetch/flush sequencer spills to per-pool buffer stacks in system memory; a depletion threshold and free-pool low watermark trigger notifications to cores, FMan, SEC, PME, etc. over CoreNet.]

11 TM

• Queue Manager acts as a central resource in the multicore datapath infrastructure, managing the queueing of data between:

− Cores (including IPC)

− Hardware offload accelerators

− Network interfaces – Frame Manager

• Queue management

− High performance interfaces (“portals”) for enqueue/dequeue

− Internal buffering of queue/frame data to enhance performance

• Congestion avoidance and management

− RED/WRED

− Tail drop for single queues and aggregates of queues

− Congestion notification for “loss-less” flow control

• Load spreading across processing engines (cores, HW accelerators)

• Order restoration, Order preservation/atomicity

• Delivery to cache/HW accelerators of per queue context information with the data (Frames)
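The RED/WRED admit decision QMan applies per enqueue follows the classic scheme; a sketch of that decision (thresholds and probabilities here are illustrative, not QMan register values):

```python
import random

def wred_admit(avg_qlen, min_th, max_th, max_p, rng=random.random):
    """Classic RED admit decision, as used in WRED per drop precedence.

    Below min_th: always admit.  At or above max_th: always drop.
    In between: drop with probability rising linearly toward max_p.
    """
    if avg_qlen < min_th:
        return True
    if avg_qlen >= max_th:
        return False
    p_drop = max_p * (avg_qlen - min_th) / (max_th - min_th)
    return rng() >= p_drop

# short queue: admitted; saturated queue: dropped
ok = wred_admit(10, min_th=100, max_th=400, max_p=0.1)
full = wred_admit(450, min_th=100, max_th=400, max_p=0.1)
```

The "weighted" part of WRED is simply running this with different (min_th, max_th, max_p) triples per traffic class or drop precedence.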

[Diagram: Queue Manager (QMan) with an FQD cache and queuing engines; FD memory; software portals over CoreNet to the cores; direct connect portals to FMan, SEC, PME, DCE, RMan, and more.]

12 TM

• T4xxx has a total of 50 Software Portals (SPs), an increase from the 10 SPs found in the P-series processors.

• Supports Customer Edge Egress Traffic Management (CEETM), which provides hierarchical class-based scheduling and traffic shaping:

− Available as an alternative to the FQ/WQ scheduling mode on the egress side of specific direct connect portals
− Enhanced class-based scheduling supporting 16 class queues per channel
− Token-bucket-based dual-rate shaping representing a Committed Rate (CR) and an Excess Rate (ER)
− Congestion avoidance mechanism equivalent to that provided by FQ congestion groups

• A total of 48 algorithmic sequencers are provided, allowing multiple enqueue/dequeue operations to execute simultaneously.

• Supports up to 295M enqueue/dequeue operations per second.
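Dual-rate shaping with CR and ER token buckets works as sketched below. This models the concept only; rates, burst sizes, and the three-way outcome labels are illustrative, not CEETM's actual configuration interface:

```python
class DualRateShaper:
    """Sketch of dual-rate (CR/ER) token-bucket shaping.

    A frame goes out as committed if the CR bucket covers it, as
    excess if only the ER bucket does, and is held otherwise.
    """
    def __init__(self, cr_bps, er_bps, burst):
        self.cr_rate = cr_bps / 8          # token fill rates in bytes/sec
        self.er_rate = er_bps / 8
        self.burst = burst                 # bucket depth in bytes
        self.cr_tokens = self.er_tokens = burst
        self.last = 0.0

    def _update(self, now):
        dt, self.last = now - self.last, now
        self.cr_tokens = min(self.burst, self.cr_tokens + dt * self.cr_rate)
        self.er_tokens = min(self.burst, self.er_tokens + dt * self.er_rate)

    def classify(self, nbytes, now):
        self._update(now)
        if self.cr_tokens >= nbytes:
            self.cr_tokens -= nbytes
            return "committed"
        if self.er_tokens >= nbytes:
            self.er_tokens -= nbytes
            return "excess"
        return "held"

s = DualRateShaper(cr_bps=8000, er_bps=8000, burst=1500)
```

Three back-to-back 1500-byte frames at the same instant drain first the CR bucket, then the ER bucket, then stall — exactly the committed/excess split the slide describes.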

13 TM

Each Frame Manager (FMan) supports:

• Two 10GE MACs and six 1GE MACs

− Ethernet & HiGig support

− Rmon/ifMIB stats, EEE, 802.3bf, DCB (PFC, ETS)

• Header Parsing

− L2/L3/L4 parse and validate (checksum)

− User defined protocols supported

• Classification

− Exact match classification for selected traffic

• Distribution

− Based on exact match or hash based queue selection for load spreading

− Can store packet to guest specific private memory based on PCD

• Policing

− Color aware dual rate, 3 color

[Diagram: FMan internals — MACs (two 10GE, six 1GE), BMI, shared memory buffer, DMA, scheduler, parser, KeyGen, classifier, policer, IC, QMI — exchanging frames with QMan and buffers with BMan.]

14 TM

• Performs parsing of common L2/L3/L4 headers, including tunneled protocols

• Can be augmented by the user to parse other standard protocols

• Can also parse proprietary, user-defined headers at any layer:

− Self-describing, using standard fields such as proprietary Ethertype, Protocol ID, Next Header, etc.

− Non-Self-Describing through configuration.

• Parse results, including proprietary fields, can be used by the classifier, and/or software.

• Soft parse can modify any field in parse results

15 TM

[Parse Array fields (simplified): Logical Port ID; ShimR, L2R, L3R, L4R result fields; Classification Plan ID; Next Header; Running Sum; Flags and flag offset; Routing Type; RHP; Shim Offset 1–2; IP PID offset; Eth Offset; LLC+SNAP Offset; VLAN TCI offset 1–n; Last EType Offset; PPPoE Offset; MPLS Offset 1–n; IP Offset 1–n; GRE Offset; L4 Offset; Next Header offset.]

The Frame Manager Parser stores all parsing results in the Parse Array.

[Diagram: an incoming frame (Ethernet DA/SA, EthType 0x8100 + VLAN, EthType 0x0800, IPv4 TOS/PID/SIP/DIP, TCP/UDP SPORT/DPORT, or other L3) is parsed into the Parse Result Array; a user-defined (UDF) EthType triggers soft parse: compare the EthType to the UDF EthType, read the UDF length, write the UDF result and offset, and write the parse-continue offset.]

16 TM

• After parsing is complete, the user can identify the proper action to be taken.

• Hash based: Spreading and distribution of frames according to hash performed on selected fields of the frame.

− Performed by the KeyGen module

− Allows a hash key to be built from many different fields in the frame, providing differentiation between flows

− A key can be built exclusively from fields present in the frame based on what the parse function has detected. Alternatively, default values can be used if fields are not present in the frame.

− Hashes selected fields in the frame as part of a spreading mechanism

− The result is a specific frame queue identifier (FQID). To support added control, this FQID can be indexed by values found in the frame, such as TOS or p-bits, or any other desired fields.

• Table Lookup: Action to take on frame is determined by table lookup on selected fields of the frame.
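The hash-based path — build a key from frame fields, hash it, mask the result, and add it to a base FQID — can be sketched as follows. The hash function and field widths are illustrative; the hardware uses its own hash, not SHA-256:

```python
import hashlib

def select_fqid(key_fields: bytes, fq_base: int, mask_bits: int,
                index: int = 0) -> int:
    """Sketch of KeyGen-style hash distribution.

    Hash the key built from selected frame fields, mask it down to a
    spread range of 2**mask_bits queues, and add it (plus an optional
    frame-derived index such as TOS or p-bits) to a base FQID.
    """
    h = int.from_bytes(hashlib.sha256(key_fields).digest()[:3], "big")
    spread = h & ((1 << mask_bits) - 1)
    return fq_base + spread + index

# key built from a 5-tuple: src IP, dst IP, protocol, src port, dst port
key = (b"\xc0\xa8\x00\x01" + b"\xc0\xa8\x00\x02" +
       b"\x11" + b"\x04\xd2\x00\x50")
fqid = select_fqid(key, fq_base=0x400, mask_bits=5)
```

Because the hash is deterministic, every frame of a given flow lands on the same FQID, which is what preserves per-flow ordering while spreading flows across queues.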

17 TM

UDP Flow Definition

For packets with UDP and IPv4 headers, a flow can be defined as the 5-tuple:

• IPv4 protocol, IPv4 source, IPv4 destination, UDP source port, and UDP destination port

[Packet layout: Preamble 7B | Start of Frame 1B | Dest MAC 6B | Src MAC 6B | Ethertype 2B | Version/IHL 0x45 | DiffServ 1B | Length 2B | ID 2B | Frag Offset 2B | TTL 1B | Protocol 0x11 (UDP) | Hdr Checksum 2B | IP Source 4B | IP Dest 4B | UDP SPort 2B | UDP DPort 2B]

TCP Flow Definition

For packets with TCP and IPv4 headers, a flow can be defined as the 5-tuple:

• IPv4 protocol, IPv4 source, IPv4 destination, TCP source port, and TCP destination port

[Packet layout: identical through the IP header, but with Protocol 0x06 (TCP), followed by TCP SPort 2B | TCP DPort 2B]
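The 5-tuple above sits at fixed offsets once the header lengths are known. A minimal extraction sketch, assuming no VLAN tag and a 20-byte IP header (version/IHL byte 0x45):

```python
import struct

ETH_HDR = 14   # dest MAC (6) + src MAC (6) + Ethertype (2); no VLAN tag

def flow_5tuple(frame: bytes):
    """Extract (protocol, src IP, dst IP, src port, dst port)."""
    ihl = (frame[ETH_HDR] & 0x0F) * 4           # IP header length in bytes
    proto = frame[ETH_HDR + 9]
    sip, dip = struct.unpack_from("!4s4s", frame, ETH_HDR + 12)
    sport, dport = struct.unpack_from("!HH", frame, ETH_HDR + ihl)
    return proto, sip, dip, sport, dport

# Minimal synthetic frame: Ethernet + IPv4 (protocol 0x11 = UDP) + UDP
eth = b"\x00" * 12 + b"\x08\x00"
ip = (bytes([0x45, 0]) + struct.pack("!HHHBB", 28, 0, 0, 64, 0x11) +
      b"\x00\x00" + bytes([192, 168, 0, 1]) + bytes([192, 168, 0, 2]))
udp = struct.pack("!HHHH", 1234, 80, 8, 0)
proto, sip, dip, sport, dport = flow_5tuple(eth + ip + udp)
```

The FMan parser performs the same offset bookkeeping in hardware, recording the results (e.g. IP Offset, L4 Offset) in the Parse Array instead of returning a tuple.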

18 TM

• 32 Key Generation schemes

− Direct Method: used for port-based or post-coarse classification
− Indirect Method: based on the parsed protocol stack of the frame and the source port; the presence (or absence) of valid headers can direct the scheme used

• 256 Classification Plans

− Indicate which fields of a parsed packet are of interest for key generation

[Diagram: Parser Result/Frame → Build Key (56-byte key) → Hash Key (8 bytes) → Mask Bits (24-bit field) → logical OR with special fields → addition of FQ base → 24-bit FQID; the Classification Plan/LCV selects the fields; alternatively an exact key is passed to table lookup.]

19 TM

• After parsing is complete, the user can identify the proper action to be taken.

• Hash based: Spreading and distribution of frames according to hash performed on selected fields of the frame.

• Table Lookup: Action to take on frame is determined by table lookup on selected fields of the frame

− Performed by the FMan Controller module

− Looks up certain fields in the frame to determine subsequent action to take, including policing. The FMan contains internal memory that holds small tables for this purpose.

− Classification look-ups are performed based on the combination of user configuration and what fields the Parser actually encountered.

− Look-ups can be chained together such that a successful look-up can provide key information for a subsequent look-up. After all the look-ups are complete, the final classification result provides either a hash key to use for spreading, or a FQID directly.
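Chained look-ups, where one table's result selects the next table and key, reduce to nested dictionary walks. A sketch with entirely illustrative tables (the real tables live in FMan internal memory and are built by configuration):

```python
# Each entry either forwards to another (table, key-builder) pair or
# yields a final result: a FQID directly, or a hash key for spreading.

ethertype_table = {
    0x0800: ("ipv4", lambda pkt: pkt["proto"]),   # IPv4 -> protocol lookup
}
tables = {
    "ipv4": {
        0x11: {"fqid": 0x500},          # UDP -> a fixed frame queue
        0x06: {"hash_key": "5tuple"},   # TCP -> spread by 5-tuple hash
    },
}

def classify(pkt):
    entry = ethertype_table.get(pkt["ethertype"])
    if entry is None:
        return {"fqid": 0x1}            # default queue
    next_table, keyfn = entry           # first lookup feeds the second
    return tables[next_table].get(keyfn(pkt), {"fqid": 0x1})

udp_result = classify({"ethertype": 0x0800, "proto": 0x11})
tcp_result = classify({"ethertype": 0x0800, "proto": 0x06})
```

The final result mirrors the slide's two outcomes: either a FQID directly, or a key handed to the hash-based spreader.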

20 TM

• Key new FMan features for QorIQ T4 processors

− Six 1G/2.5G multirate Ethernet MACs (mEMACs) per Frame Manager
− Two 10G multirate Ethernet MACs (mEMACs) per Frame Manager
− QMan interface: supports priority-based flow control messages passed from the Ethernet MAC to QMan
− Complies with IEEE 802.3az (Energy Efficient Ethernet) and IEEE 802.1Qbb, in addition to IEEE Std 802.3®, 802.3u, 802.3x, 802.3z, 802.3ac, 802.3ab, and IEEE 1588 v2 (clock synchronization over Ethernet)
− Port virtualization: virtual Storage Profile (SPID) selection after classification or distribution function evaluation
− Rx port multicast support
− Egress shaping
− Offline port: able to copy a frame into new buffers and enqueue it back to QMan

21 TM

SEC crypto hardware accelerators (CHAs):

• Public Key Hardware Accelerators (PKHA), ~25K RSA ops/sec (1024b): RSA and Diffie-Hellman (to 4096b), elliptic curve cryptography (1023b)
• Data Encryption Standard Accelerators (DESA), ~15 Gbps: DES, 3DES (2K, 3K); ECB, CBC, OFB modes
• Advanced Encryption Standard Accelerators (AESA), ~40 Gbps: key lengths of 128, 192, and 256 bits; ECB, CBC, CTR, CCM, GCM, CMAC, OFB, CFB, and XTS
• ARC Four Hardware Accelerators (AFHA), ~7.5 Gbps: compatible with the RC4 algorithm
• Message Digest Hardware Accelerators (MDHA), ~40 Gbps: SHA-1 and SHA-2 with 256/384/512-bit digests, MD5 128-bit digest, HMAC with all algorithms
• Kasumi/F8 Hardware Accelerators (KFHA), ~9 Gbps: F8 and F9 as required for 3GPP; A5/3 for GSM and EDGE; GEA-3 for GPRS
• Snow 3G Hardware Accelerators (STHA), ~12 Gbps: implements SNOW 3G
• ZUC Hardware Accelerators (ZHA), ~14 Gbps: implements 128-EEA3 and 128-EIA3
• CRC Unit, ~40 Gbps: standard and user-defined polynomials
• Random Number Generator, random IV generation

Supports protocol processing for: IPsec, 802.1AE (MACsec), SSL/TLS/DTLS, 3GPP RLC, LTE PDCP, SRTP, 802.11i (WiFi), 802.16e (WiMAX)
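For reference, the HMAC computation the MDHA offloads is exactly the standard construction; the software baseline is a few lines of stdlib code (checked here against RFC 4231 test case 1):

```python
import hashlib
import hmac

# RFC 4231 test case 1: HMAC-SHA-256 with a 20-byte key of 0x0b
key = b"\x0b" * 20
msg = b"Hi There"
tag = hmac.new(key, msg, hashlib.sha256).hexdigest()
```

An accelerator earns its keep by running this at line rate over bulk traffic, freeing the cores for per-flow logic.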

[Diagram: SEC internals — job queue controller, descriptor controllers, CHAs, DMA, RTIC, queue interface, job ring interface.]

22 TM

• Regex support plus significant extensions:

− Patterns can be split into 256 sets each of which can contain 16 subsets

− 32K patterns of up to 128B length

− 9.6 Gbps raw performance

• Combined hash/NFA technology

− No “explosion” in number of patterns due to wildcards

− Low system memory utilization

− Fast pattern database compiles and incremental updates

• Matching across “work units”

− Finds patterns in streamed data

• Pipeline of processing

− The PME offers a pipeline of filtering, matching, and a behavior-based engine for a complete pattern matching solution
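Matching across "work units" means a pattern must be found even when it straddles a chunk boundary. The usual software technique — which this sketch uses; the PME's internal mechanism is its own — is to carry a small overlap between scans:

```python
import re

class StreamMatcher:
    """Find patterns in streamed data delivered as separate chunks.

    Carries (max pattern length - 1) bytes of overlap between scans so
    a match straddling a chunk boundary is still found, and reports
    only matches ending in new data to avoid duplicates.
    """
    def __init__(self, pattern: bytes, max_len: int):
        self.rx = re.compile(pattern)
        self.overlap = max_len - 1
        self.tail = b""
        self.base = 0                  # stream offset where self.tail starts

    def scan(self, chunk: bytes):
        prev = len(self.tail)
        buf = self.tail + chunk
        hits = [self.base + m.start()
                for m in self.rx.finditer(buf) if m.end() > prev]
        keep = min(len(buf), self.overlap)
        self.base += len(buf) - keep
        self.tail = buf[-keep:] if keep else b""
        return hits

m = StreamMatcher(b"ATTACK", max_len=6)
hits = m.scan(b"...ATT") + m.scan(b"ACK...")
```

The pattern split across the two work units is reported once, at its true stream offset.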

[Diagram: Pattern Matching Engine components — Pattern Matcher Frame Agent (PMFA), Key Element Scanning Engine (KES) with hash tables, Data Examination Engine (DXE), Stateful Rule Engine (SRE), caches, access to pattern descriptors and state, user-definable reports; connected to CoreNet, BMan, and QMan via the on-chip system bus interface.]

23 TM

• Many queues allow multiple inbound/outbound queues per core

− Hardware queue management via QorIQ Data Path Architecture (DPAA)

• Supports all messaging-style transaction types

− Type 11 Messaging

− Type 10 Doorbells

− Type 9 Data Streaming

• Enables low overhead direct core-to-core communication

[Diagram: two devices (QorIQ or DSP), each with cores and a 10G SRIO link, exchanging Type 9 user PDUs (channelized CPU-to-CPU transport) and Type 11 messages (device-to-device transport).]

24 TM

• Deflate

− As specified in RFC 1951

• GZIP

− As specified in RFC 1952

• Zlib

− As specified in RFC 1950
− Interoperable with the zlib 1.2.5 compression library

• Encoding

− Supports Base64 encoding and decoding (RFC 4648)

• Operates at up to 600 MHz

− 10 Gbps compress
− 10 Gbps decompress
− 20 Gbps aggregate
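The three wrapper formats differ only in header and trailer around the same DEFLATE stream. Python's zlib module (built on the same zlib library the DCE interoperates with) can produce and consume all three via the `wbits` parameter:

```python
import zlib

data = b"DPAA " * 200

# RFC 1950 (zlib): 2-byte header plus Adler-32 trailer
zl = zlib.compress(data)

# RFC 1951 (raw deflate): negative wbits suppresses header and trailer
c = zlib.compressobj(wbits=-15)
raw = c.compress(data) + c.flush()

# RFC 1952 (gzip): wbits = 16 + 15 adds the gzip wrapper
g = zlib.compressobj(wbits=16 + 15)
gz = g.compress(data) + g.flush()
```

Decompression uses the matching `wbits` value, so the hardware's choice of wrapper is transparent to software on the other end.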

[Diagram: DCE internals — frame agent, compressor and decompressor with history buffers (32KB and 4KB), QMan and BMan interfaces and portals, bus interface to CoreNet.]

25 TM

• QMan 1.2 (e.g. QorIQ T42xx) supports Data Center Bridging (DCB).

• DCB refers to a series of inter-related IEEE specifications collectively designed to enhance Ethernet LAN traffic prioritization and congestion management.

• DCB can be used between data center network nodes for:

− LAN/network traffic,
− Storage Area Network (SAN) traffic (e.g. Fibre Channel, which is loss-sensitive), and
− IPC traffic (e.g. InfiniBand, which is latency-sensitive)

• The DPAA is compliant with the following DCB specifications (traffic management related):

− IEEE Std. 802.1Qbb: Priority-based Flow Control (PFC) — to avoid frame loss, PFC pause frames can be sent autonomously by HW.
− IEEE Std. 802.1Qaz: Enhanced Transmission Selection (ETS) — supports weighted bandwidth fairness.

26 TM

ETS: CoS-based Bandwidth Management

• Enables intelligent sharing of bandwidth between traffic classes, with control of bandwidth per class
• IEEE 802.1Qaz

[Chart: offered traffic vs. realized 10GE utilization at t1–t3 — HPC traffic steady at 3G/s, storage traffic at 3G/s (dipping to 2G/s), LAN traffic offered 4–5G/s and realized 4–6G/s, absorbing bandwidth left unused by the other classes.]

Priority Flow Control

• Enables lossless behavior for each class of service
• PAUSE sent per virtual lane when a buffer limit is exceeded
• IEEE 802.1Qbb

[Diagram: eight transmit queues and receive buffers (virtual lanes zero through seven) across an Ethernet link; lane three is paused (STOP/PAUSE) when its receive buffer fills.]
