Interconnect Your Future
Mellanox Technologies, September 2016
The Ever Growing Demand for Higher Performance
[Timeline: Performance Development, 2000-2020, from Terascale through Petascale ("Roadrunner", the first Petascale system) to Exascale; the shift from single-core to many-core and from SMP to clusters]
[Diagram: Co-Design, with Hardware (HW), Software (SW), and Applications (APP) developed together]
The Interconnect is the Enabling Technology
The Intelligent Interconnect to Enable Exascale Performance
CPU-Centric:
• Must wait for the data, creating performance bottlenecks
• Limited to main CPU usage, resulting in performance limitations

Co-Design:
• Works on the data as it moves, enabling performance and scale (see the sketch below)
• Creates synergies that enable higher performance and scale
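As a concrete illustration of "works on the data as it moves", here is a minimal sketch (my addition, standard MPI-3, not from the deck) of overlapping computation with a non-blocking collective; an offloading interconnect progresses the reduction in the network while the host computes:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = (double)rank, sum = 0.0;
    MPI_Request req;

    /* Start the reduction; an offloading fabric can progress it
       without involving the host CPU. */
    MPI_Iallreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* ... overlap: do useful computation here while data moves ... */

    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* reduction complete */
    if (rank == 0)
        printf("sum = %f\n", sum);
    MPI_Finalize();
    return 0;
}
```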
ISC’16: Introducing ConnectX-5, the World’s Smartest Adapter
SHArP Performance Advantage
MiniFE is a finite-element mini-application
• Implements kernels that represent implicit finite-element applications
10X to 25X performance improvement on the AllReduce MPI collective!
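For context, the operation SHArP accelerates is MPI's all-reduce. A minimal sketch of that collective (my illustration, plain MPI; on a SHArP-enabled fabric the same call is offloaded into the switches):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = (double)rank;   /* each rank contributes one value */
    double sum = 0.0;

    /* Global sum across all ranks; this is the collective that
       SHArP executes inside the switch fabric. */
    MPI_Allreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", sum);
    MPI_Finalize();
    return 0;
}
```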
Highest-Performance 100Gb/s Interconnect Solutions
• Adapters: 100Gb/s, 0.6us latency, 200 million messages per second (10 / 25 / 40 / 50 / 56 / 100Gb/s)
• InfiniBand switches: 36 EDR (100Gb/s) ports, <90ns latency, 7.2Tb/s throughput, 7.02 billion msg/sec (195M msg/sec/port)
• Ethernet switches: 32 100GbE ports or 64 25/50GbE ports (10 / 25 / 40 / 50 / 100GbE), 6.4Tb/s throughput
• Transceivers and cables: active optical and copper cables (10 / 25 / 40 / 50 / 56 / 100Gb/s); VCSELs, silicon photonics, and copper
• Software: MPI, SHMEM/PGAS, UPC for commercial and open-source applications; leverages hardware accelerations
The Performance Advantage of EDR 100G InfiniBand (28-80%)
Mellanox Connects the World’s Fastest Supercomputer
93 Petaflop performance, 3X higher versus #2 on the TOP500
40K nodes, 10 million cores, 256 cores per CPU
Mellanox adapter and switch solutions
National Supercomputing Center in Wuxi, China
#1 on the TOP500 Supercomputing List
The TOP500 list has evolved; it now includes HPC and Cloud / Web2.0 hyperscale systems
Mellanox connects 41.2% of overall TOP500 systems
Mellanox connects 70.4% of the TOP500 HPC platforms
Mellanox connects 46 Petascale systems, nearly 50% of all Petascale systems
InfiniBand is the Interconnect of Choice for
HPC Compute and Storage Infrastructures
Machine Learning
GPUDirect Enables Efficient Training Platform for Deep Neural Network
[Diagram: CPU-only training on 1K nodes (16K cores) for 1 week vs. 3 nodes with 3 GPUs for 3 days using Mellanox InfiniBand and GPU-Direct]
Deep Learning - Transforming Data to Intelligence
Smart Interconnect Required to Unleash The Power of Data
[Diagram: GPUs, CPUs, and storage linked by the interconnect; more data yields better models, which demand faster compute]
Enable Real-time Decision Making
TeraSort, Big Sur Machine Learning Platform
[Chart: execution time (in seconds) over the Ethernet network for Intel 10Gb/s, Mellanox 10Gb/s, and Mellanox 40Gb/s; 70% improvement]
Fraud detection workload
[Chart: total transaction time (in ms), existing solution vs. Aerospike with Mellanox + Samsung NVMe; time split between CPU + storage + network and the fraud detection algorithm; 1.8x more time for running the fraud detection algorithm]
GPU Computing
GPUDirect™ RDMA Ecosystem
With GPUDirect™ RDMA
Using PeerDirect™
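As an illustration of what GPUDirect RDMA enables, here is a hedged sketch assuming a CUDA-aware MPI build (e.g., Open MPI with CUDA support) on nodes with GPUDirect RDMA-capable NICs: device buffers are passed straight to MPI, and the NIC moves the data to and from GPU memory without staging through host memory:

```c
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float *dbuf;                                   /* device (GPU) buffer */
    cudaMalloc((void **)&dbuf, 1024 * sizeof(float));

    /* With GPUDirect RDMA, the NIC reads/writes GPU memory directly,
       so the device pointer goes straight into the MPI calls. */
    if (rank == 0)
        MPI_Send(dbuf, 1024, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(dbuf, 1024, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    cudaFree(dbuf);
    MPI_Finalize();
    return 0;
}
```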
Mellanox PeerDirect™ with NVIDIA GPUDirect RDMA
102% performance improvement
[Chart: 2D stencil benchmark, average time per iteration (us) vs. number of nodes/GPUs (2 and 4); RDMA only vs. RDMA+PeerSync, 27% faster on 2 nodes and 23% faster on 4 nodes]
Basic GPU computing
GPU – from Server to Service
rCUDA – remote CUDA
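rCUDA interposes on the CUDA runtime so unmodified applications can use GPUs in remote servers. A minimal sketch (my illustration; what it prints depends entirely on the local or rCUDA configuration):

```c
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess) {
        fprintf(stderr, "no CUDA runtime available\n");
        return 1;
    }
    /* Under rCUDA, these "local" devices can be GPUs in remote
       servers, reached transparently over the interconnect. */
    for (int i = 0; i < n; i++) {
        struct cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        printf("device %d: %s\n", i, p.name);
    }
    return 0;
}
```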
GPU Virtualization – Benefits
• Flexible resource assignment
• Resource consolidation and management
Storage and Data Access
Storage Technology Evolution
[Chart: access time (microseconds, log scale 0.1-1000) by storage media technology: HD, SSD, NVM]
NVMe Technology
NVMe over Fabrics – RDMA-based networked storage
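NVMe over Fabrics carries NVMe commands over RDMA verbs. As a hedged sketch (my addition, not from the slides), a host can check for RDMA-capable devices with libibverbs before bringing up an NVMe-oF initiator:

```c
/* Enumerate RDMA-capable devices with libibverbs, the verbs API
   that NVMe over Fabrics rides on. Build with: gcc probe.c -libverbs */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs) {
        perror("ibv_get_device_list");
        return 1;
    }
    for (int i = 0; i < num; i++)
        printf("RDMA device %d: %s\n", i, ibv_get_device_name(devs[i]));
    ibv_free_device_list(devs);
    return 0;
}
```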
RDMA-based Remote NVMe Access (NVMe over Fabrics)
[Diagram: initiator server with a Mellanox NIC connected over 100GbE to a target server with a Mellanox NIC and Micron flash attached over PCIe]
[Chart: read and write operations/sec, local vs. remote NVMe access]
BlueField System-on-a-Chip (SoC) Solution
Integration of ConnectX-5 + multicore ARM
State-of-the-art capabilities
• 10 / 25 / 40 / 50 / 100G Ethernet & InfiniBand
• PCIe Gen3 / Gen4
• Hardware acceleration offloads: RDMA, RoCE, NVMe-oF, RAID
Family of products
• Range of ARM core counts and I/O ports/speeds
• Price/performance points
NVMe Flash Storage Arrays
Scale-Out Storage (NVMe over Fabric)
Accelerating & Virtualizing VNFs
Open vSwitch (OVS), SDN
Overlay networking offloads
Scale-Out NVMe Storage System with BlueField (rack view)
[Diagram: BlueField storage shelf with 8-16 SSD drives over PCIe Gen3/4, DDR4 memory, a BMC, and two Ethernet/InfiniBand ports at 10/25/40/50/100G into the network fabric]
[Rack view: a TOR switch connecting compute/storage head nodes and multiple storage/NVMf shelves]
Technology Leadership
[Timeline, 2001-2015: RDMA, RoCE, IO & network virtualization, GPU-Direct, remote GPU (rCUDA), RDMA NAS, and intelligent interconnect processing, delivered on a product stack that grew from silicon and boards to systems, cables, software, and optics]
Strategy: Network-based Computing
[Diagram: moving network functions from the CPU into a smart network; offloads add intelligence to the network, free the CPU for application computing, and increase datacenter value]
Computing Evolution
Scale-out
Services
Thank You