
Multicore I/O Processors In Virtual Data Centers


Page 1: Multicore I/O Processors In Virtual Data Centers

5th Annual

Application of Multicore I/O Processors in Virtualized Data Centers

Nabil Damouny
Rolf Neugebauer

ESC – Multicore Expo
San Jose, CA
April 27, 2010

Page 2: Multicore I/O Processors In Virtual Data Centers

Outline

- Networking Market Dynamics
- Cloud Computing & the Virtualized Data Center
- The Need for an Intelligent I/O Coprocessor
- I/O Processing in Virtualized Data Centers:
  1. SW-based (Bridge & vSwitch)
  2. I/O Gateway
  3. Virtual Ethernet Port Aggregation (VEPA)
  4. Server-based
- I/O Coprocessor Requirements
- Meeting the I/O Coprocessor Challenge in Virtualized Data Centers
- Heterogeneous Multicore Architecture
- Netronome's Network Flow Processors and Acceleration Cards
- Summary and Conclusion

Data center virtualization is not complete until the I/O subsystem is also virtualized.

ESC Silicon Valley – April, 2010

Page 3: Multicore I/O Processors In Virtual Data Centers

About Netronome

- Fabless semiconductor company, developing Network Flow Processing solutions for high-performance, programmable, L2-L7 applications
- Network coprocessors for x86 designs:
  - More complex processing per packet than any other architecture
  - Best-in-class performance per watt
  - Unmatched integration with x86 CPUs
- Family of products including processors, acceleration cards, development tools, software libraries and professional services
- Founded in 2003:
  - Solid background in networking, communications, security, voice and video applications, and high-performance computing
  - Comprised of networking and silicon veterans
- Global presence (sales and marketing):
  - Boston, Massachusetts; Santa Clara, California; Pittsburgh, Pennsylvania; Cambridge, United Kingdom; Shenzhen, China; Penang, Malaysia

Intel Agreements Summary:
- IXP28XX Technology License
- SDK Software License
- HDK Hardware License
- QPI Technology License

Page 4: Multicore I/O Processors In Virtual Data Centers

Networking Market Dynamics

"Eventually, every packet from every flow of communications services will be intelligently processed."

Market drivers:
- Application awareness: email, Web, multimedia
- Content inspection: voice, video, data, executables
- Integrated security: VPN, SSL, spam, anti-virus, IDS/IPS, firewall
- Increasing bandwidth: millions of packets and flows at 10GigE and beyond
- Intelligent networking devices: switching, routing, WiMax, 3GPP LTE, security blades & appliances, data center servers
- Virtualization: multicore, multi-OS, multi-app, multi-I/O

Source: Morgan Stanley

Increasing bandwidth, greater security requirements, and the need for application- and content-aware networking are driving the evolution from today's simpler (L2-L3 only) networks to intelligent networking (L2-L7).

Page 5: Multicore I/O Processors In Virtual Data Centers

Unified Computing in Virtualized Data Centers Requires Intelligent Networking

- Unified Computing: the convergence of computing, networking, and storage in a virtualized environment
  - Applies to the enterprise (private or internal) and to service providers
- Environment:
  - Uncorrelated high I/O data rates
  - Web servers, especially virtualized servers
  - Unified Computing – a combination of servers and networking
- Requirements for high-performance intelligent networking:
  - I/O coprocessing for multicore IA/x86 to scale applications
  - Intelligent flow-based switching for inter-VM communications
  - Management of complex, high-performance networking interfaces

The advent of many VMs and the need for IOV creates a new set of requirements that mandates a more intelligent approach to managing I/O.

Page 6: Multicore I/O Processors In Virtual Data Centers

Cloud Computing … Definition & Services

Cloud computing defined: IT-related capabilities are provided "as a service" using Internet technologies to multiple external customers.
- Public clouds
- Private clouds

Types of services available in cloud computing:
- Software-as-a-Service: software applications delivered over the Web
- Infrastructure-as-a-Service: remotely accessible server and storage capacity
- Platform-as-a-Service: a compute-and-software platform that lets developers build and deploy Web applications on a hosted infrastructure

Cloud computing technologies play a crucial role in allowing companies to scale their data center infrastructure to meet performance and TCO requirements.

Page 7: Multicore I/O Processors In Virtual Data Centers

The Need for an I/O Coprocessor in the Virtualized Data Center

- Efficient delivery of data to VMs at high rates (20+ Gb/s) requires an intelligent IOV solution.
- L2+ processing alone is not enough: VLANs, ACLs, etc. only cover the basics, and stateful load balancing requires flow awareness.
- Clouds are hostile environments: stateful firewalls, IPS/IDS, and deep packet inspection capabilities are needed.
- Multicore x86 CPUs show poor packet processing performance: they are unsuitable for handling millions of stateful flows and have high power consumption.

Introduce an intelligent I/O coprocessor to assist multicore x86 CPUs.

Page 8: Multicore I/O Processors In Virtual Data Centers

IDC … on I/O Virtualization

"If I/O is not sufficient, then it could limit all the gains brought about by the virtualization process."

- The I/O subsystem needs to deliver peak throughput and low latency to the VMs and to the applications they host.
- As VM density increases, most customers are scaling I/O capacity by installing more adapters.
- IOV is simply the abstraction of the logical details of I/O from the physical, essentially separating the upper-layer protocols from the physical connection or transport.

Page 9: Multicore I/O Processors In Virtual Data Centers

I/O Coprocessor in a Virtualized Heterogeneous Multicore Architecture

[Diagram: Two multicore x86 CPUs each host VM1…VMn, every VM with its own OS and VNIC. The CPUs connect through the chipset to an I/O coprocessor over PCIe Gen2, with separate control-plane and data-plane paths. The coprocessor provides IOV and two 10GE ports; a high-speed serial interface (Interlaken, future) is also shown.]

Page 10: Multicore I/O Processors In Virtual Data Centers

I/O Coprocessor Requirements in a Heterogeneous Multicore Architecture

Addressing the inter-VM switching and I/O challenge:
- Inter-chip access: demultiplexing and classification
- TCP offload
- Host offload for burdensome I/O, security, and DPI functions
- IOV: zero-copy, big-block transfers to multiple cores, VMs or endpoints; full I/O virtualization with Intel VT-d
- Programmable egress traffic management

Heterogeneous multicore processing solutions deliver >4x the performance of a multicore x86 with a standard NIC.

Page 11: Multicore I/O Processors In Virtual Data Centers

Challenges in Virtualized Data Centers

- 2004: a rack of single-core servers and switches
- 2009: many virtual machines and cores in one server

What was a rack of servers five years ago is now a single server, including the networking (switch, IPS, FW, …). Many cores result in tens of VMs and a network I/O challenge.

Page 12: Multicore I/O Processors In Virtual Data Centers

IEEE 802.1: Addressing Ethernet Virtualization in the Data Center

Current IEEE 802.1Q bridges:
- Do not allow a packet to be sent back to the same port within the same VLAN
- Do not have visibility into the identity of virtual VMs within physical stations
- Extensions to bridge and end-station behaviors are needed to support virtualization

IEEE 802.1Qbg EVB (Edge Virtual Bridging), VEB/VEPA (Virtual Ethernet Bridge / Virtual Ethernet Port Aggregation) & 802.1Qbh Bridge Port Extension (PE):
- Address management issues created by the explosion of VMs in data centers sharing access to the network through an embedded bridge
- Discuss methods to offload policy, security, and management processing from virtual switches on NICs and blade servers to physical Ethernet switches

Managing network I/O and inter-VM switching will require various implementation alternatives.

Page 13: Multicore I/O Processors In Virtual Data Centers

OpenFlow Switching / vSwitch

OpenFlow switching includes:
- Flow tables, used to implement packet processing
- The OpenFlow protocol, used to manipulate the flow entries

Enables acceleration of stateful security functions:
- An application VM with an associated security VM (e.g. FW, IPS, anti-virus)
- Network traffic is classified and transits the security VM before being allowed to reach the application VM
- Once a new flow has been "blessed", its packets pass straight to the application VM
- Flow-based policies for white/black lists (not just L2)

Software-based virtual switches will have difficulty coping with:
- Large numbers of flows per second
- Many packets per second, i.e. high throughput at small packet sizes
- Assuring low latency

The Network Flow Processor architecture fits well with OpenFlow.
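The "blessed flow" fast path described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: `Flow`, `FlowSwitch`, and the VM names are hypothetical illustrations, not an OpenFlow or vendor API.

```python
# Toy model of flow-based security acceleration: new flows are steered
# through a security VM; once approved ("blessed"), their packets take
# the fast path straight to the application VM.
from typing import NamedTuple

class Flow(NamedTuple):
    """Classic 5-tuple used as the flow-table key."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str

class FlowSwitch:
    def __init__(self):
        self.blessed = set()   # flows approved by the security VM

    def bless(self, flow: Flow):
        """Called once the security VM has inspected and approved a flow."""
        self.blessed.add(flow)

    def forward(self, flow: Flow) -> str:
        # Blessed flows bypass inspection; unknown flows transit the
        # security VM first.
        return "app_vm" if flow in self.blessed else "security_vm"

switch = FlowSwitch()
f = Flow("10.0.0.1", "10.0.0.2", 4321, 80, "tcp")
assert switch.forward(f) == "security_vm"  # first packets: inspect
switch.bless(f)                            # security VM approves the flow
assert switch.forward(f) == "app_vm"       # later packets: fast path
```

The point of the design is that per-packet work collapses to one table lookup once a flow is approved, which is exactly what software vSwitches struggle to do at high flow-setup rates.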

Page 14: Multicore I/O Processors In Virtual Data Centers

1A. Software-Based Switching (Bridge) in a Virtual Server

- Software virtual switch: VMware, Xen & Linux bridge (initially with no ACL or VLAN support)
- VMware and Xen put switches as software modules in their VMMs, but they lacked key features and were slow!

Page 15: Multicore I/O Processors In Virtual Data Centers

1B. Enhanced Software-Based Switching (vSwitch) in a Virtual Server

- Cisco Nexus 1000V (ACLs, VLANs, IOS) for VMware; Open vSwitch (flow-based) for XenServer
- But with the added functionality, performance drops hugely. What happens if FW and IPS are added?

A good solution for low-performance systems, but with high latency.

ESC Silicon Valley – April, 2010

Page 16: Multicore I/O Processors In Virtual Data Centers

2. I/O Gateway

Delivers three key functions:
- In-rack server communications switch: replaces the top-of-rack Ethernet switch, with a 10/20Gbps PCIe fabric
- Centralized enclosure for I/O adapters used by servers in the rack: shared (network, storage) or assigned (specialty accelerators)
- Virtualized I/O configuration

Source: Aprius. Note: Xsigo, Next I/O, and Virtensys use similar concepts.

A new approach using PCIe or InfiniBand interconnects, with security functions within the gateway.

Page 17: Multicore I/O Processors In Virtual Data Centers

3. Virtual Ethernet Port Aggregation (VEPA)

- Offloads policy, security and management processing from virtual switches on NICs and blade servers into physical Ethernet switches (e.g. a ToR switch)
- IEEE VEPA is an extension to physical and virtual switching
- VEPA allows VMs to use external switches to access features like ACLs, policies, and VLAN assignments
- All inter-VM traffic has to traverse the physical network infrastructure
- Additional security features, load balancers, etc. are implemented in external appliances

Page 18: Multicore I/O Processors In Virtual Data Centers

4. Moving Switching Into The Server

- The switch moves from the IA/x86 CPU into the Netronome NFP-32xx
- Moving the switching to a Netronome-based coprocessor frees cycles on the IA CPU and increases application performance; adding IPS or FW is no problem!

A server-based NIC or LoM uses the existing wiring, with security processing in the server.

Page 19: Multicore I/O Processors In Virtual Data Centers

Intelligent I/O Sharing Alternatives: Summary
Addressing Inter-VM Switching and the Network I/O Challenge

|             | Software-based switch | I/O Gateway | VEPA | Server-based switch |
|-------------|-----------------------|-------------|------|---------------------|
| Performance | Poor | Very good | Very good – except for inter-VM switching | Very good |
| Power       | Poor – wastes IA cycles | Good | Good | Good |
| Management  | Network or server admin | Unclear – standard if the I/O Gateway implements a switch | Network admin owns | Depends who owns the switch |
| Security    | Software-based; adds latency | Centralized; adding security increases cost and latency | Centralized; adding security increases cost and latency | Centralized + distributed |
| Flexibility | High | Depends on architecture | Medium – standard switch | High |
| Reliability | Low | Good | Good | Good – distributed |
| Cost        | Less costly, but wastes IA cycles | <VEPA: card in server; CNA & ToR switch part of gateway | Low, but higher for intelligent ToR switches | <VEPA – card is the same as a CNA in a ToR, but VEPA is much simpler, cheaper |

Page 20: Multicore I/O Processors In Virtual Data Centers

Performance of SR-IOV NIC, Linux Bridge and a vSwitch

vSwitches require more packet processing and hence drop packets much earlier.

Page 21: Multicore I/O Processors In Virtual Data Centers

Performance of SR-IOV NIC, an Old-Style Bridge and a vSwitch

vSwitches provide more flexibility and functionality, but they drop packets earlier and consume more CPU cycles.

Page 22: Multicore I/O Processors In Virtual Data Centers

Performance & CPU Load of SR-IOV NIC, Linux Bridge and a vSwitch

Combining the flexibility of vSwitches with the performance of SR-IOV NICs requires an intelligent I/O coprocessor.

Page 23: Multicore I/O Processors In Virtual Data Centers

Requirements for an I/O Coprocessor

- Intelligent, stateful, flow-based switching
- Integrated IOV
- Load balancing
- Integrated security
- Glueless interface to the CPU subsystem

Netronome's "Network Flow Processor" is an intelligent I/O coprocessor.

Page 24: Multicore I/O Processors In Virtual Data Centers

Netronome Silicon & PCIe Cards

NFP-3240 based PCIe cards:
- 20Gbps of line-rate packet and flow processing per NFE
- 6x1GigE, 2x10GigE (SFP+), netmod interfaces
- PCIe Gen2 (8 lanes)
- Virtualized Linux drivers via SR-IOV
- Flexible/configurable memory options
- Packet time stamping with nanosecond granularity
- Integrated cryptography
- Packet capture and inline applications
- Hardware-based stateful flow management
- TCAM-based traffic filtering
- Dynamic flow-based load balancing to x86 CPUs

Highly programmable, intelligent, virtualized acceleration cards for network security appliances and virtualized servers.
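Flow-based load balancing, the last feature in the list above, can be illustrated with a short sketch (this is a generic technique, not Netronome's implementation): hashing the 5-tuple ensures every packet of a flow lands on the same x86 core, preserving per-flow state locality.

```python
# Generic flow-to-core load balancing: a stable hash of the 5-tuple
# picks the core, so all packets of one flow share cached flow state.
import hashlib

def flow_core(src_ip, dst_ip, src_port, dst_port, proto, n_cores=8):
    """Map a flow's 5-tuple to a core index in [0, n_cores)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], "big") % n_cores

# Every packet of one flow maps to the same core...
a = flow_core("10.0.0.1", "10.0.0.2", 4321, 80, "tcp")
assert a == flow_core("10.0.0.1", "10.0.0.2", 4321, 80, "tcp")
# ...while the hash keeps the result inside the core range.
assert 0 <= a < 8
```

In hardware this hash is computed at line rate; the sketch only shows the mapping property that makes millions of stateful flows tractable across cores.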

Page 25: Multicore I/O Processors In Virtual Data Centers

Summary and Conclusion

- Inter-VM switching and intelligent I/O device sharing are an integral part of data center virtualization; there are many implementation alternatives.
- A heterogeneous architecture addresses this challenge: an I/O coprocessor complements multicore x86 CPUs with packet processing performance, handles millions of stateful flows, and lowers power consumption.
- Netronome's NFP-32xx processor family integrates inter-VM switching and I/O virtualization capabilities.
- Netronome's PCIe card family integrates intelligent, programmable, flow-based network card functionality with IOV for the data center.

A heterogeneous architecture (Network Flow Processing + multicore x86) addresses the need for inter-VM switching and intelligent I/O sharing.

Page 26: Multicore I/O Processors In Virtual Data Centers

Backup

Page 27: Multicore I/O Processors In Virtual Data Centers

Session Info & Abstract

https://www.cmpevents.com/ESCw10/a.asp?option=C&V=11&SessID=10701
Application of Multicore I/O Processors in Virtualized Data Centers
Speakers: Nabil Damouny (Senior Director, Marketing, Netronome Systems), Rolf Neugebauer (Staff Software Engineer, Netronome Systems)
Date/Time: April 27, 2010, 8:30am — 9:15am
Audience level: Intermediate

Presentation Abstract:
This presentation discusses the application of integrated multicore processors, optimized for networking I/O, in virtualized data centers. Data centers are increasingly being built with multicore virtualized servers. As the number of cores in the server increases, the number of VMs goes up at an even faster pace. These servers need access to high-performance network I/O, resulting in the requirement to implement I/O sharing in a virtualized, intelligent way. In addition, a mechanism for high-performance inter-VM switching will also be needed. Flow-based solutions, such as flow classification, routing and load balancing, supporting in excess of 8M flows, are effective ways to address the above challenges.

Track: Multicore Expo – Networking & Telecom

Page 28: Multicore I/O Processors In Virtual Data Centers

NFP-32xx Integrates Flow-Based L2 Functions for Inter-VM Switching

- Flow classification
- Switching between physical networking ports
- Switching between virtual NICs, without host intervention
- Switching between any physical port and any virtual port
- Stateful flow-based switching

[Diagram: VM1…VMn on the host CPU, each with a core (C1…Cn) and a VNIC, attach to the NFE, which handles Ethernet Rx/Tx and connects to an external Ethernet switch over an interconnection link.]

NFP-32xx supports >8 million flows.
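The L2 functions listed above amount to one mechanism: classify a flow once, then switch every subsequent packet by table lookup to either a physical port or a VM's VNIC. A toy model (the class, port names, and classification rule are hypothetical, not Netronome's API):

```python
# Toy flow-based L2 switch: slow-path classification installs a flow
# table entry; the fast path is a single lookup per packet, so VM-to-VM
# traffic never needs host intervention.
class FlowL2Switch:
    def __init__(self):
        self.flow_table = {}   # 5-tuple -> egress port

    def classify(self, five_tuple) -> str:
        """Slow path: decide egress for a new flow (trivial demo rule)."""
        dst_ip = five_tuple[1]
        # Illustrative rule: local 10.0.0.x destinations go to a VNIC,
        # everything else leaves through a physical port.
        return "vnic1" if dst_ip.startswith("10.0.0.") else "10ge0"

    def switch(self, five_tuple) -> str:
        # Fast path: one table lookup once the flow is installed.
        port = self.flow_table.get(five_tuple)
        if port is None:
            port = self.classify(five_tuple)
            self.flow_table[five_tuple] = port
        return port

sw = FlowL2Switch()
inter_vm = ("10.0.0.7", "10.0.0.9", 5000, 80, "tcp")
assert sw.switch(inter_vm) == "vnic1"   # VM-to-VM stays on the card
assert sw.switch(("10.0.0.7", "8.8.8.8", 5001, 53, "udp")) == "10ge0"
assert len(sw.flow_table) == 2          # one entry per flow
```

The ">8 million flows" figure corresponds to the capacity of the real device's flow table; the dict here just stands in for that structure.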

Page 29: Multicore I/O Processors In Virtual Data Centers

I/O Virtualization (IOV) Requirements

- Support multiple virtual functions (VFs) over PCIe: lower cost, lower power
- Dynamically assign VFs to different VMs
- Support multiple NIC functions: crypto, PCAP, etc.
- Capability to pin an I/O device to a specific CPU core/VM: enables consolidation and isolation
- Flow-based load balancing to x86 multicore CPUs: higher performance at lower power

Intelligent I/O virtualization is required in multicore CPU designs; PCI-SIG introduced the SR-IOV standard for this purpose.
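The first two requirements above — multiple VFs, dynamically assigned to VMs — can be modeled in a few lines. This is a hypothetical sketch of the bookkeeping only (`SriovNic` and the VM names are illustrative; real SR-IOV assignment goes through the hypervisor and PCI config space):

```python
# Toy model of SR-IOV-style VF assignment: one physical device exposes
# several virtual functions (VFs) that VMs can claim and release.
class SriovNic:
    def __init__(self, num_vfs: int):
        self.free_vfs = list(range(num_vfs))   # unassigned VF indices
        self.assigned = {}                     # VF index -> VM name

    def assign_vf(self, vm: str) -> int:
        """Give a VM direct access to its own virtual function."""
        if not self.free_vfs:
            raise RuntimeError("no free VFs on this adapter")
        vf = self.free_vfs.pop(0)
        self.assigned[vf] = vm
        return vf

    def release_vf(self, vf: int):
        """Return a VF to the pool, e.g. when its VM shuts down."""
        del self.assigned[vf]
        self.free_vfs.append(vf)

nic = SriovNic(num_vfs=4)
vf0 = nic.assign_vf("vm-web")
nic.assign_vf("vm-db")
assert nic.assigned == {0: "vm-web", 1: "vm-db"}
nic.release_vf(vf0)            # vm-web shut down; VF 0 is free again
assert 0 in nic.free_vfs
```

The isolation property the slide asks for comes from each VM touching only its own VF, never the physical function or another VM's queue pairs.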

Page 30: Multicore I/O Processors In Virtual Data Centers

The Need for Intelligent I/O Virtualization

- Use commodity multicore hardware
- Virtualization for:
  - Consolidation
  - Moving "legacy" applications & OSs to multicore
  - Isolation
- I/O devices need to be shared:
  - Load balance/direct traffic to VMs
  - Pin VMs to cores
  - Direct traffic to cores/VMs
  - Isolate device access from VMs

A good IOV solution provides all of the above!

Page 31: Multicore I/O Processors In Virtual Data Centers

NFP Security Capabilities

- Internal instruction unit: DMA, bulk crypto/hash, and PKI control, sequenced through cryptography instructions with a multithreaded controller
- Hardware-accelerated bulk cryptography (20+Gbps):
  - AES with 128-, 192-, and 256-bit keys; ECB, CBC, GCM, CTR, OFB, CFB, CM, f8 support
  - 3DES, DES with ECB, CBC support
  - ARC-4
  - SHA-1, SHA-1 HMAC
  - SHA-2, SHA-2 HMAC family: 224/256/384/512-bit support
- PKI modular exponentiation: 20k+ ops, up to 2048 bits, supports CRT

Integrated high-performance modern crypto algorithms, with a PKI engine, in a multi-threaded programmable environment.
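The SHA-2 HMAC modes listed above can be illustrated in software with Python's standard library (this shows the algorithm only; the NFP performs the same computation in hardware at 20+Gbps):

```python
# HMAC-SHA-256 authentication of a message, using only the stdlib.
import hashlib
import hmac

key = b"shared-secret"
msg = b"packet payload"

tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

# Verification recomputes the tag and compares in constant time.
assert hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).hexdigest())
# A tampered message fails verification.
assert not hmac.compare_digest(tag, hmac.new(key, b"tampered", hashlib.sha256).hexdigest())
```

Constant-time comparison (`hmac.compare_digest`) matters here: a naive `==` on tags can leak timing information to an attacker.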