High Performance OpenStack Cloud
Eli Karpilovski – Cloud Advisory Council Chairman
2012 CLOUD ADVISORY COUNCIL 2
Cloud Advisory Council – Our Mission
Develop next-generation cloud architecture
• Provide open specifications for cloud infrastructures
  › Use existing infrastructure to extend the business
  › Publish best practices for optimizing cloud efficiency and utilization
• Enable ease of use with comprehensive cloud management and tools
Publish cloud best practices
• Provide IT and application managers with cloud tools
• Design, architect, use, and develop
• Strengthen the qualification and integration of cloud solutions
• Drive standards across the industry
• Gather cloud providers' feedback on their direction and priorities around cloud standards development
Cloud Advisory Council – Board Members
Eli Karpilovski – Cloud Advisory Council Chairman
Paul Rad – High Performance Cloud Group Chair
Kenny Li – Cloud Advisory Council Group Chair for Cloud Performance
David Fishman – Cloud Advisory Council Co-Chairman for Open Source Ecosystem
Brian Sparks – Cloud Advisory Council Media Relations Director
Exponential Data Growth – Best Interconnect Required
• 2009: 0.8 zettabytes
• 2020: 35 zettabytes (44x growth)
Source: IDC
The Power of Data
• Data-intensive simulations
• Internet of Things
• National security
• Healthcare
• Smart cars
• Congestion-free traffic
• Business intelligence
The Freedom to Choose Your Cloud Stack
[Figure: an open platform built from software of choice (e.g. OpenDaylight), management of choice, and open-source hardware (e.g. the Open Compute Project), contrasted with closed and proprietary stacks]
OpenStack: Open Source Cloud APIs
OpenStack IaaS Interfaces
Service          Project    Example use case
Compute          Nova       Configure VMs
Block Storage    Cinder     Set and assign persistent block-level storage
Network          Neutron    Manage networks and IP addresses
Object Store     Swift      Horizontally scaled object/file storage
Images           Glance     Disk and server image discovery, registration, etc.
Dashboard        Horizon    GUI for admins to control cloud operations
Authentication   Keystone   Common authentication across cloud components
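Keystone's "common auth across components" can be illustrated by the request body every client builds before talking to the other services: a Keystone v3 password-authentication payload (a sketch; the user, project, and domain names below are placeholders).

```python
import json

def keystone_v3_auth_body(user, password, project, domain="Default"):
    """Build a Keystone v3 password-authentication request body.

    This dict is POSTed to /v3/auth/tokens; the token returned in the
    X-Subject-Token header is then sent with every Nova, Cinder,
    Neutron, and Swift request.
    """
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": user,
                        "domain": {"name": domain},
                        "password": password,
                    }
                },
            },
            "scope": {
                "project": {"name": project, "domain": {"name": domain}}
            },
        }
    }

body = keystone_v3_auth_body("demo", "secret", "demo-project")
print(json.dumps(body, indent=2))
```

In practice a client library (e.g. python-keystoneclient) builds this payload for you; the point is that one token serves all of the services in the table above.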
OpenStack IaaS Logical Architecture
OpenStack Open Source Components
• OpenStack
• Linux
• KVM, etc.
• Python
• Ruby
• Puppet
• Cobbler
• mCollective/Salt
• RabbitMQ
• ...
[Cartoon: "Trust me, I'm also a networking guru"]
What does OpenStack run on?
“Standard Hardware”
Data Must Always Be Accessible, in Real Time
[Figure: compute, storage, archive, and sensor-data tiers connected by a smart interconnect]
A smart interconnect is required to unleash the power of data: lower latency, higher bandwidth, RDMA, offloads, NIC/switch routing, overlay networks.
Remote Direct Memory Access (RDMA) Advantages
• Zero-copy remote data transfer
• Low-latency, high-performance data transfers
• Kernel bypass and protocol offload
• InfiniBand: 56Gb/s; RoCE*: 40Gb/s
* RDMA over Converged Ethernet
[Figure: application buffers on two hosts exchanged directly in hardware, bypassing the kernel on both sides]
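The zero-copy idea can be sketched in plain Python (an analogy, not actual RDMA): a `memoryview` references the source buffer directly, the way an RDMA NIC reads a registered application buffer, while slicing `bytes` makes the extra copy a kernel TCP path would.

```python
# Analogy for zero-copy vs. copy semantics (not actual RDMA).
payload = bytearray(b"A" * (1 << 20))  # 1 MiB "application buffer"

copied = bytes(payload[:4096])         # copy path: a new 4 KiB object
window = memoryview(payload)[:4096]    # zero-copy path: same memory

payload[0] = ord("B")                  # application updates its buffer
print(copied[0] == ord("B"))           # False: the copy is stale
print(window[0] == ord("B"))           # True: the view sees the update
```

With real RDMA the remote HCA reads or writes the registered buffer itself, so no intermediate kernel copy ever exists; that is where the latency and CPU savings come from.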
RDMA – How It Works
[Figure: two racks connected over InfiniBand or Ethernet. With TCP/IP, data moves from the application buffer through OS and NIC buffers in each host. With RDMA, the HCAs transfer the application buffers directly between hosts, bypassing the OS in user, kernel, and hardware layers on both sides]
3 Ways to Introduce RDMA in a Virtualized Environment
[Figure: three paths from VM interfaces through the hypervisor to the adapter hardware]
1. Paravirtualization – guest Ethernet/SCSI interfaces mapped through the hypervisor (vSwitch, eIPoIB middleware)
2. SR-IOV (IB/Ethernet) – a VF driver in the guest OS attaches directly to a virtual function of the adapter, alongside the hypervisor's physical function
3. iSER (IB/Ethernet) – RDMA-accelerated storage access through the hypervisor's adapter driver
What is iSER (iSCSI Extensions for RDMA)?
An iSCSI-over-RDMA solution
• Runs over InfiniBand or RoCE
Comprehensive storage networking and management capabilities derived from iSCSI
• Discovery, naming, security, error recovery, booting, etc.
Leverages the wide adoption of iSCSI
• OS code and storage products
• Management tools and standard interfaces
• Standardization, testing, and protocol maturity
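Because iSER reuses the iSCSI management plane, enabling it in Cinder of that era was a small configuration change. The fragment below is a sketch only: option and driver names varied across releases (the Grizzly/Havana tree had an `LVMISERDriver`; later releases folded it into transport options), so check the release's Cinder documentation.

```ini
[DEFAULT]
# Sketch: switch the Cinder LVM/tgt backend from iSCSI/TCP to iSER.
# Driver and option names are release-dependent.
volume_driver = cinder.volume.drivers.lvm.LVMISERDriver
# IP of the RDMA-capable (InfiniBand or RoCE) storage interface;
# address is an example.
iscsi_ip_address = 192.168.10.1
```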
iSCSI Mapping to iSER / RDMA Transport
iSER eliminates the traditional iSCSI/TCP bottlenecks:
• Zero copy using RDMA
• CRC calculated by hardware
• Works with message boundaries instead of streams
• Transport protocol implemented in hardware (minimal CPU cycles per I/O)
[Figure: an iSCSI PDU (BHS, AHS, header digest, data, data digest) maps onto RDMA protocol frames – RC Send for control and RC RDMA Read/Write for data; the digests are computed in hardware]
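The "message boundaries instead of streams" point can be illustrated in Python: over a byte stream, PDUs must be length-prefixed and re-split in software, whereas a message-oriented RDMA transport delivers each PDU with its boundary preserved. The helper names below are hypothetical.

```python
import struct

def frame(pdu: bytes) -> bytes:
    """Length-prefix a PDU so it survives a byte-stream transport."""
    return struct.pack("!I", len(pdu)) + pdu

def deframe(stream: bytes):
    """Recover PDU boundaries from a TCP-like byte stream."""
    pdus, off = [], 0
    while off < len(stream):
        (n,) = struct.unpack_from("!I", stream, off)
        pdus.append(stream[off + 4 : off + 4 + n])
        off += 4 + n
    return pdus

# Over TCP, two PDUs arrive as one undifferentiated byte stream...
stream = frame(b"SCSI-CMD") + frame(b"DATA-OUT")
# ...and must be re-split in software. An RDMA RC Send delivers each
# message intact, so this per-I/O CPU work disappears.
print(deframe(stream))  # [b'SCSI-CMD', b'DATA-OUT']
```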
RDMA Accelerates OpenStack Storage
RDMA Accelerates iSCSI Storage
[Figure: compute servers running KVM VMs with Open-iSCSI and iSER, a switching fabric, and storage servers running an iSCSI/iSER target (tgt) with local disks and an RDMA cache, orchestrated by OpenStack Cinder]
Utilizes OpenStack's built-in components and management (Open-iSCSI, the tgt target, Cinder) to accelerate storage access.
[Chart: Cinder volume storage performance* – iSCSI over TCP: 1.3 Gb/s; iSER: 5.5 Gb/s]
* iSER patches are available on the OpenStack branch: https://github.com/mellanox/openstack
IPoIB – Applications See IP/Ethernet over KVM
[Figure: in each guest OS, TCP/UDP/IP packets leave an Ethernet interface as ordinary Ethernet frames. The hypervisor's eIPoIB driver maps the Ethernet header to an IPoIB header, and the Mellanox IPoIB driver pushes the resulting IPoIB packet onto the wire inside an InfiniBand frame via a ConnectX-3 VPI adapter. A SwitchX VPI switch and gateway bridges standard Ethernet and InfiniBand]
VPI: Standard Ethernet & InfiniBand
Single Root I/O Virtualization (SR-IOV)
A PCIe device presents multiple instances to the OS/hypervisor
Enables Application Direct Access (ADA)
• Reduces CPU overhead and improves application performance
Eliminates the virtualization penalty with RDMA & ADA
• Low-latency applications benefit from the virtual infrastructure
[Figure: VMs run VF device drivers attached directly to virtual functions (VFs) of the adapter; the hypervisor retains the physical function (PF) device driver; a legacy VM keeps a paravirtual vNIC]
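On Linux, exposing virtual functions from an SR-IOV capable adapter is typically a sysfs write. This is a sketch: the interface name and VF count are examples, and it requires a capable adapter, BIOS/firmware support, and the appropriate driver.

```shell
# Enable 4 virtual functions on an SR-IOV capable NIC
# (interface name and count are examples).
echo 4 > /sys/class/net/eth2/device/sriov_numvfs

# Each VF then appears as its own PCIe function,
# assignable to a VM via PCI passthrough:
lspci | grep -i "Virtual Function"
```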
SR-IOV Accelerates RoCE
• Enables native RoCE performance in virtualized environments
SR-IOV Boosts Ethernet Performance
[Chart: RoCE SR-IOV throughput (Gb/s, 10–40 range) across 1, 2, 4, 8, and 16 VMs]
[Chart: RoCE SR-IOV latency (µs, 0–3 range) for 2B, 16B, and 32B messages across 1, 2, 4, and 8 VMs]
No performance compromise in a virtualized environment
Single Root I/O Virtualization – Latency Performance Comparison
[Chart: latency (µs) vs. message size (16B to 16KB) for VM-to-VM traffic – TCP paravirtual and RDMA, each measured on the same machine and across two machines]
20x lower latency than a vNIC
SR-IOV virtualization with bare-metal latency
Network Virtualization – Evolution in Deployment Models
• Pure overlay: tunneling protocols (VXLAN, NVGRE, etc.) run directly on the hypervisor
• Pure OpenFlow: OpenFlow in physical switches everywhere
• Hybrid network virtualization: a combination of OpenFlow on the switch and overlays on the hypervisor
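The overlay model works by wrapping each tenant's L2 frame in an outer header; for VXLAN that is an 8-byte header carrying a 24-bit VXLAN Network Identifier (VNI), inside UDP port 4789. A sketch of the encapsulation header per RFC 7348:

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header:
    1 byte flags, 3 reserved, 3-byte VNI, 1 reserved byte."""
    assert 0 <= vni < (1 << 24), "VNI is 24 bits"
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)

hdr = vxlan_header(5001)      # VNI 5001 identifies one virtual network
print(hdr.hex())              # 0800000000138900
```

Building and parsing this header (plus the outer UDP/IP headers) for every packet is the per-packet cost that the hardware offloads discussed next are designed to remove.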
Hybrid Network Virtualization Approach: Big Challenges
[Chart: performance vs. number of workloads – software-based tunneling falls increasingly short of the expected performance as workloads grow]
Distributed, Accelerated Hybrid Network Virtualization
Neutron
SDN
Applications SDN
Applications
Cloud
Management
OpenStack Manager
SDN Controller
OS VM
Para-
virtual
OS VM
OS VM
OS VM
SR-
IOV to
the VM
10/40GbE or
InfiniBand Ports
Embedded
Switch
OpenFlow
Agent
Neutron
Agent
Create/delete,
configure policy
per VM vNIC
Servers
tap tap
OpenFlow 1.0 support for NIC and Switch
Real-time provisioning via OpenFlow
OpenFlow Counters / Statistics, drop/allow/mirror Ingress ACLs
ML2
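The Neutron side of this architecture plugs in through the ML2 plugin, which selects type and mechanism drivers per network. The fragment below is illustrative only: the driver names (e.g. the out-of-tree Mellanox mechanism driver) and VLAN ranges are examples and release-dependent.

```ini
[ml2]
# Sketch: mechanism driver names are illustrative; the Mellanox
# driver shipped out of tree (networking-mlnx).
type_drivers = vlan,vxlan
tenant_network_types = vlan
mechanism_drivers = openvswitch,mlnx

[ml2_type_vlan]
# Example physical network and VLAN range.
network_vlan_ranges = physnet1:1:1000
```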
Hardware QoS Delivers Significant Performance Improvement
eSwitch & SR-IOV integrated adapter technology
[Chart: performance impact]
Acceleration of Overlay Networks
Overlay network virtualization: isolation, simplicity, scalability
[Figure: physical view – two servers hosting VM1–VM8, connected by Mellanox SDN switches and routers; virtual view – three virtual domains spanning the servers over NVGRE/VXLAN overlay networks, managed through an OpenFlow virtual network management API]
• Virtual overlay networks simplify management and VM migration
• ConnectX-3 Pro overlay accelerators enable bare-metal performance
Cloud Overlay Acceleration Results with NIC Hardware Offload
[Chart: NVGRE initial results – 65% throughput improvement (higher is better) and 79% improvement in CPU transport overhead (lower is better) with hardware offload]
Higher throughput for less CPU transport overhead
HPC OpenStack Cloud Infrastructure Benefits
Storage
• 6x performance improvement switching from iSCSI/TCP to iSER (RDMA)
Compute
• RDMA support for applications over Ethernet or InfiniBand
• Full support for all native Ethernet security and isolation features
• SR-IOV provides a faster path to bare-metal performance
  › 20x performance improvement in VM-to-VM connectivity
• Support for bridging from InfiniBand to Ethernet
Controller
• Neutron plugins for seamless integration with Folsom, Grizzly, and Havana