VPP Host Stack: Transport and Session Layers

Florin Coras, Dave Barach

Motivation

FD.io Mini-Summit at KubeCon North America 2018

[Figure: App 1 calls send() and App 2 calls recv() from user space; in the kernel, each app has its own FIFO, TCP, IP and device stack.]

Motivation

[Figure: App 1 and App 2 still use send()/recv() through their kernel FIFO, TCP, IP and device stacks, but traffic now also crosses VPP in user space (dpdk, L2, L3), reached through tap devices with their own FIFOs.]

Why not this?

[Figure: App 1 and App 2 attach directly to VPP in user space through a FIFO, with TCP running inside VPP on top of dpdk, L2, L3; the kernel stack is bypassed.]

VPP Host Stack

[Figure: the VPP host stack layers: IP/DPDK, TCP, Session and App Interface inside VPP; the App attaches through rx and tx fifos and a message queue (mq).]

VPP Host Stack: Session Layer

§ App-interface sub-layer maintains per-app state and offers support for conveying session events
§ Allocates and manages sessions/segments/fifos
§ Handles segmentation of data into buffers before sending it to transport protocols
§ Binary/native C API for external/builtin applications

§ Session lookup tables (5-tuple) and local/global session rule tables (filters)
§ Support for pluggable transport protocols
§ Can do tx-pacing if transport asks for it
§ Offers API for enqueueing data for the apps
§ Isolates network resources via namespacing
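The 5-tuple session lookup listed above can be pictured with a small sketch. This is not VPP code: the class names, the handle values and the fallback to a listener keyed on (protocol, local address, local port) are assumptions made for illustration only.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class FiveTuple(NamedTuple):
    proto: str
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


class SessionTable:
    """Toy lookup table: exact 5-tuple match first, then a listener match."""

    def __init__(self) -> None:
        self._sessions: Dict[FiveTuple, int] = {}
        self._listeners: Dict[Tuple[str, str, int], int] = {}

    def add_session(self, key: FiveTuple, handle: int) -> None:
        self._sessions[key] = handle

    def add_listener(self, proto: str, ip: str, port: int, handle: int) -> None:
        self._listeners[(proto, ip, port)] = handle

    def lookup(self, key: FiveTuple) -> Optional[int]:
        # Established sessions are keyed on the full 5-tuple.
        handle = self._sessions.get(key)
        if handle is not None:
            return handle
        # Assumed fallback: a listener bound to the local half of the tuple.
        return self._listeners.get((key.proto, key.local_ip, key.local_port))
```

A packet for an established connection resolves to that connection's session handle, while a first packet for a bound port resolves to the listener's handle.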


VPP Host Stack: SVM

§ Shared memory segments:
  § Allocated by the app-interface sub-layer and mapped by applications
  § Preferred without file backing (memfd). Support for segments with file backing (shm) will be deprecated

§ Message queue
  § Allocated in first shared memory segment
  § Has two rings for control and io events from vpp to app
  § Supports condvar or eventfd notifications
§ Fifos
  § Allocated within shared memory segments
  § Fixed position and size
  § Lock-free enqueue/dequeue but atomic size increment
  § Option to dequeue/peek data
  § Support for out-of-order data enqueues
  § Access data as segments
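A minimal sketch of the fifo idea above, assuming one producer and one consumer: the producer only moves the tail, the consumer only moves the head, and the only shared field is the byte count. `ByteFifo` and its method names are hypothetical, out-of-order enqueues are left out, and plain Python integers stand in for the atomic size update a real shared-memory fifo would need.

```python
class ByteFifo:
    """Fixed-size byte ring with enqueue, peek and dequeue."""

    def __init__(self, size: int) -> None:
        self._buf = bytearray(size)
        self._size = size
        self._head = 0  # next byte to dequeue (touched only by the consumer)
        self._tail = 0  # next free byte (touched only by the producer)
        self._used = 0  # shared byte count; would be an atomic in shared memory

    def enqueue(self, data: bytes) -> int:
        # Accept only as many bytes as currently fit; report how many were taken.
        n = min(len(data), self._size - self._used)
        for i in range(n):
            self._buf[(self._tail + i) % self._size] = data[i]
        self._tail = (self._tail + n) % self._size
        self._used += n
        return n

    def peek(self, n: int, offset: int = 0) -> bytes:
        # Read without consuming, starting `offset` bytes past the head.
        n = max(0, min(n, self._used - offset))
        start = self._head + offset
        return bytes(self._buf[(start + i) % self._size] for i in range(n))

    def dequeue(self, n: int) -> bytes:
        out = self.peek(n)
        self._head = (self._head + len(out)) % self._size
        self._used -= len(out)
        return out
```

Peek is what a reliable transport would use on the tx side, since data has to stay in the fifo until it is acknowledged; dequeue is what an app would use on the rx side.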


VPP Host Stack: TCP

§ Clean-slate implementation
§ “Complete” state machine implementation
§ Connection management and flow control (window management)
§ Timers and retransmission, fast retransmit, SACK
§ NewReno and Cubic congestion control, SACK-based fast recovery
§ Tx pacing
§ Checksum offloading
§ Linux compatibility tested with IWL TCP protocol tester


VPP Host Stack: Comms Library (VCL)

§ Apps can directly use the raw session layer APIs but then need to:
  § Manage binary api and message queue interaction with vpp
  § Maintain session state, potentially deal with thread safety
  § Implement async communication mechanisms


§ VPP Comms library (VCL)
  § Manages interaction with session layer
  § Abstracts sessions to integer session handles
  § Exposes epoll/select/poll functions
  § Supports multi-threaded and multi-process applications
  § Can handle mq notifications with both mutex-condvar pair and eventfds


§ LDP library
  § Uses LD_PRELOAD to intercept and redirect syscalls to VCL
  § Manages fd to session handle translation
  § When it works, it requires no changes to applications
  § Do not expect it to always work
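The fd to session handle translation can be sketched as a lookup table consulted by every intercepted call. This is an illustration, not LDP's implementation: `FdTable`, `intercepted_write`, the two write callbacks and the choice of handing out fds from a high range are all hypothetical.

```python
from typing import Callable, Dict, Optional


class FdTable:
    """Tracks which app-visible fds are backed by VCL sessions."""

    def __init__(self, first_fd: int = 1 << 16) -> None:
        # Hypothetical choice: allocate from a high range so these fds are
        # unlikely to collide with fds assigned by the kernel.
        self._next_fd = first_fd
        self._fd_to_session: Dict[int, int] = {}

    def register(self, session_handle: int) -> int:
        fd = self._next_fd
        self._next_fd += 1
        self._fd_to_session[fd] = session_handle
        return fd

    def session_for(self, fd: int) -> Optional[int]:
        return self._fd_to_session.get(fd)


def intercepted_write(
    table: FdTable,
    fd: int,
    data: bytes,
    vcl_write: Callable[[int, bytes], int],
    kernel_write: Callable[[int, bytes], int],
) -> int:
    """Route a write to VCL if the fd maps to a session, else to the kernel."""
    handle = table.session_for(fd)
    if handle is None:
        return kernel_write(fd, data)
    return vcl_write(handle, data)
```

Every call the shim does not intercept, or intercepts with different semantics, is a place where an unmodified application can misbehave, which is one way to read the last bullet above.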


Application Attachment

§ Connect to binary API
§ Request attachment to session layer
§ Session layer allocates shared segment and message queue
§ App mounts segment and starts listening for events on msg queue


Session Establishment

[Figure: a server and a client, each attached through a message queue (mq) to the App Interface of its Session, TCP and IP/DPDK stack. The server issues bind and receives a bound notification; the client issues connect. After the handshake, each side allocates a session and rx/tx fifos; the client receives a connected notification and the server an accepted notification.]

Data Transfer

[Figure: data transfer between a client and a server, each with an mq and rx/tx fifos into the App Interface of its Session, TCP and IP/DPDK stack. Labels on the sending side: enqueue to fifo, tx io event; poll tx events, dequeue to buffer, add tcp header. Labels on TCP: congestion control, reliable transport. Labels on the receiving side: enqueue to fifo, rx io notification.]
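The tx labels above (poll tx events, dequeue to buffer, add tcp header) can be sketched as a loop that drains pending events and cuts the fifo contents into buffers of at most one segment each. The function name, the `make_header` callback and the use of a `bytearray` as the fifo are assumptions for illustration; congestion control, pacing and retransmission are deliberately left out.

```python
from collections import deque
from typing import Callable, Deque, List


def drain_tx_events(
    tx_fifo: bytearray,
    events: Deque[str],
    mss: int,
    make_header: Callable[[int], bytes],
) -> List[bytes]:
    """For each pending tx event, dequeue the fifo in mss-sized chunks and
    prepend a header to every chunk."""
    packets: List[bytes] = []
    while events:
        events.popleft()  # one tx io event posted by the app
        while tx_fifo:
            chunk = bytes(tx_fifo[:mss])  # dequeue to buffer
            del tx_fifo[:mss]
            packets.append(make_header(len(chunk)) + chunk)  # add header
    return packets
```

With 3000 bytes in the fifo and an mss of 1460, a single event yields three buffers carrying 1460, 1460 and 80 payload bytes.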

Some rough numbers on an E2699 w/ XL710: ~36 Gbps/core (1.5k MTU), half-duplex!
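To put the ~36 Gbps/core figure in per-packet terms, here is a back-of-the-envelope calculation. It assumes every packet carries a full 1500 bytes and ignores Ethernet framing and header overhead, so it is only an order-of-magnitude estimate; the slide does not say whether the number is goodput or wire rate.

```python
def packets_per_second(gbps: float, mtu_bytes: int) -> float:
    """Packets per second if every packet carries mtu_bytes at the given rate."""
    return gbps * 1e9 / (mtu_bytes * 8)


pps = packets_per_second(36, 1500)
ns_per_packet = 1e9 / pps

print(f"{pps / 1e6:.1f} Mpps, about {ns_per_packet:.0f} ns per packet")
```

Under these assumptions a core handles roughly 3 million packets per second, which leaves a budget of about a third of a microsecond per packet.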


Redirected Connections (Cut-through)

[Figure: a client and a server attached to the same VPP. The client's connect and the server's bind meet in the App Interface, which delivers connected and accepted notifications over message queues labeled cmq and smq. The two apps then share fifos directly: one side's tx is the other side's rx.]

§ App interface sub-layer:
  § Tracks the sessions
  § Allocates ssvm segment for fifos and message queues for events
  § Asks peers to map segment

Throughput is ~120 Gbps half-duplex if the receiver does not touch the data!

Multi-threading for stream connections

[Figure: one App with an mq and two pairs of rx/tx fifos into the App Interface; Core 0 and Core 1 each run their own Session, TCP and IP layers on top of DPDK.]

§ Connections/sessions ‘pinned’ to a thread
§ Per-thread data structures/state

Apps with multiple workers

[Figure: app workers app:wrk1 and app:wrk2, each with its own mq and rx/tx fifos into the App Interface; Core 0 and Core 1 each run their own Session, TCP and IP layers on top of DPDK.]

§ App interface sub-layer allocates app workers
§ App workers have their own segment managers and message queues
§ Sessions are associated to app-workers
§ Applications can register multiple “app-workers”
§ Each worker gets its own message queue

Apps with multiple workers: VCL

§ VCL has API for apps to register a new worker thread
§ Sessions are pinned to a worker
§ New workers are automatically registered on app fork
§ Forked children automatically “share” the parent’s sessions

Next steps – Get involved

• Get the Code, Build the Code, Run the Code
  • Session layer: src/vnet/session
  • TCP: src/vnet/tcp
  • SVM: src/svm
  • VCL: src/vcl
• Read/Watch the Tutorials
• Read/Watch VPP Tutorials
• Join the Mailing Lists

Thank you!

Florin Coras
email: [email protected]
irc: florinc

Data Transfer: Dgram Transports

[Figure: the same data-transfer path with UDP in place of TCP. Labels on the sending side: enqueue to fifo with dgram header, tx io event; poll tx events, dequeue to buffer, add proto header. Labels on the receiving side: enqueue with dgram hdr to fifo, rx io notification.]
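Because a fifo is a byte stream, datagram boundaries have to be carried in-band, which is what the dgram header in the labels above is for. The sketch below shows the framing idea only; the header layout (payload length, port, IPv4 address) is invented for illustration and is not VPP's actual header.

```python
import socket
import struct
from typing import Optional, Tuple

# Hypothetical header layout: payload length (u32), port (u16), IPv4 address.
DGRAM_HDR = struct.Struct("!IH4s")


def enqueue_dgram(fifo: bytearray, payload: bytes, ip: str, port: int) -> None:
    """Append one datagram, prefixed with its header, to the byte fifo."""
    fifo.extend(DGRAM_HDR.pack(len(payload), port, socket.inet_aton(ip)))
    fifo.extend(payload)


def dequeue_dgram(fifo: bytearray) -> Optional[Tuple[bytes, str, int]]:
    """Pop one whole datagram, or return None if none is fully available."""
    if len(fifo) < DGRAM_HDR.size:
        return None
    length, port, addr = DGRAM_HDR.unpack_from(fifo, 0)
    end = DGRAM_HDR.size + length
    if len(fifo) < end:
        return None
    payload = bytes(fifo[DGRAM_HDR.size:end])
    del fifo[:end]
    return payload, socket.inet_ntoa(addr), port
```

The reader always gets back exactly the datagrams that were written, including empty ones, regardless of how they are packed together in the fifo.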


Features: Namespaces

[Figure: an App attached to VPP through the App Interface; three namespaces ns1, ns2 and ns3, each with its own Session, TCP and IP layers, mapped onto fib tables fib1 and fib2.]

Request access to vpp ns + secret

Namespaces are configured independently and associate applications to network layer resources like interfaces and fib tables
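The namespace plus secret request can be sketched as a lookup followed by a secret check, after which the app is bound to that namespace's network resources. All names here (`Namespace`, `NamespaceRegistry`, the field names, the interface name in the usage) are hypothetical; only the idea of a namespace id, a secret, and associated fib table and interface comes from the slide.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Namespace:
    ns_id: str
    secret: int
    fib_table: str
    interface: str


class NamespaceRegistry:
    """Holds independently configured namespaces and checks attach requests."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Namespace] = {}

    def configure(self, ns: Namespace) -> None:
        self._by_id[ns.ns_id] = ns

    def attach(self, ns_id: str, secret: int) -> Namespace:
        # An app only gets the namespace's resources if it knows the secret.
        ns = self._by_id.get(ns_id)
        if ns is None or ns.secret != secret:
            raise PermissionError(f"access to namespace {ns_id!r} denied")
        return ns
```

For example, after `registry.configure(Namespace("ns1", 1234, "fib1", "eth0"))`, an app calling `registry.attach("ns1", 1234)` would be associated with `fib1`, while a wrong secret raises `PermissionError`.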


Features: Session Tables

[Figure: App1 attached through the App Interface; namespaces ns1 and ns2 each have an NS Local Session Table above TCP, and fib1 has a Global Session Table.]

Request access to global and/or local scope

§ Both tables have a “rules table” that can be used for filtering
§ Local tables are namespace specific and can be used for egress filtering
§ Global tables are fib table specific and can be used for ingress filtering
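A rules table used for filtering can be sketched as a list of (prefix, port, action) entries checked before a session is created. The matching policy here (longest prefix wins, port 0 as a wildcard, allow by default) is an assumption chosen for the example, not a description of how VPP's rule tables resolve conflicts.

```python
import ipaddress
from typing import List, Optional, Tuple


class RulesTable:
    """Toy filter: the matching rule with the longest prefix decides."""

    def __init__(self) -> None:
        self._rules: List[Tuple[ipaddress.IPv4Network, int, str]] = []

    def add_rule(self, prefix: str, port: int, action: str) -> None:
        # port 0 is treated as "any port" in this sketch.
        self._rules.append((ipaddress.ip_network(prefix), port, action))

    def decide(self, ip: str, port: int, default: str = "allow") -> str:
        addr = ipaddress.ip_address(ip)
        best: Optional[Tuple[int, str]] = None
        for net, rule_port, action in self._rules:
            if addr in net and rule_port in (0, port):
                if best is None or net.prefixlen > best[0]:
                    best = (net.prefixlen, action)
        return best[1] if best else default
```

A namespace-local instance of such a table would be consulted for outgoing connects (egress), and a per-fib instance for incoming connections (ingress).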


TLS App

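[Figure: the App exchanges data over rx/tx fifos with an app session owned by the TLS App; the TLS App keeps a TLS context, hands protocol processing to a TLS engine (openssl, mbedtls), and uses a second pair of rx/tx fifos towards TCP.]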

§ TLS App registers as transport at VPP init time

§ TLS protocol implementation handled by plugin “engines”. We support openssl and mbedtls

§ Client app registers key and certificate via api and requests tls as session transport

§ CA certs read at TLS app init time. Defaults to reading /etc/ssl/certs/ca-certificates.crt

§ Ping and Ray from Intel working on accelerating the openssl engine with QAT cards


Some rough OpenSSL numbers on an E2699: ~1 Gbps/core (no hw accel)