Paving the Road to Exascale June 2017 Interconnect Your Future

Interconnect Your Future. Jun 28, 2017. Mellanox In-Network Computing Technology Delivers Highest Performance: 31% lower latency and 97% lower CPU utilization for MPI operations


Paving the Road to Exascale

June 2017

Interconnect Your Future


© 2017 Mellanox Technologies

InfiniBand Delivers Best Return on Investment

30-100% Higher Return on Investment

Up to 50% Savings on Capital and Operating Expenses

Highest Applications Performance, Scalability and Productivity

1.3X Better 2X Better 1.4X Better 1.7X Better 1.3X Better

Molecular Dynamics Chemistry

Automotive Genomics Weather


Exponential Data Growth – The Need for Intelligent and Faster Interconnect

CPU-Centric (Onload) Data-Centric (Offload)

Must Wait for the Data

Creates Performance Bottlenecks Analyze Data as it Moves!

Faster Data Speeds and In-Network Computing Enable Higher Performance and Scale


Data Centric Architecture to Overcome Latency Bottlenecks

HPC / Machine Learning

Communications Latencies of 30-40us

HPC / Machine Learning

Communications Latencies of 3-4us

CPU-Centric (Onload) Data-Centric (Offload)

Network In-Network Computing

Intelligent Interconnect Paves the Road to Exascale Performance
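As a rough arithmetic check on the two latency ranges quoted on this slide, the sketch below computes the improvement factor they imply. The function name and the way range endpoints are paired are illustrative assumptions; only the 30-40us and 3-4us figures come from the slide.

```python
# Latency ranges quoted on the slide, in microseconds.
ONLOAD_US = (30.0, 40.0)    # CPU-centric (onload)
OFFLOAD_US = (3.0, 4.0)     # data-centric (offload)


def improvement_range(slow, fast):
    """Return (worst_case, best_case) ratio of slow latency to fast latency.

    Worst case pairs the lowest slow latency with the highest fast latency;
    best case pairs the highest slow latency with the lowest fast latency.
    """
    worst = slow[0] / fast[1]
    best = slow[1] / fast[0]
    return worst, best


worst, best = improvement_range(ONLOAD_US, OFFLOAD_US)
# Matching endpoints (30 vs 3, 40 vs 4) gives a factor of 10.
like_for_like = ONLOAD_US[0] / OFFLOAD_US[0]
```

Pairing like endpoints gives a factor of 10, which is in line with the "10X" figure used elsewhere in the deck.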


In-Network Computing to Enable Data-Centric Data Center

(Diagram: CPU and GPU nodes attached to the network)

In-Network Computing Key for Highest Return on Investment


In-Network Computing to Enable Data-Centric Data Centers

(Diagram: CPU and GPU nodes attached to the network)

In-Network Computing Key for Highest Return on Investment

CORE-Direct

RDMA

Tag-Matching

SHARP

SHIELD

Programmable (FPGA)

Programmable (ARM)

Security

GPUDirect

Security NVMe Storage

NVMe over Fabrics


In-Network Computing Advantages with SHARP Technology

Critical for High Performance Computing and Machine Learning Applications


SHIELD – Self Healing Interconnect Technology


Consider a Flow From A to B

Data

Server A Server B


The Simple Case: Local Fix

Server A Server B

Data


The Remote Case: Using FRNs (Fault Recovery Notifications)

Server A Server B

Data

FRN

Data
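The two SHIELD cases above (a local fix, and a remote fix triggered by an FRN) can be illustrated with a toy model. This is only a sketch of the idea, not Mellanox's implementation: the topology, the `route` function, and the rule that an FRN makes the upstream switch avoid the reporting neighbor are all assumptions made for illustration.

```python
def route(src, dst, next_hops, failed_links):
    """Toy self-healing forwarding from src to dst.

    next_hops maps each node to an ordered list of candidate next hops.
    failed_links is a set of (node, neighbor) pairs that are down.
    Returns (path, frn_count); path is None if dst is unreachable.
    """
    avoid = set()   # (node, neighbor) pairs ruled out by FRNs
    path = [src]
    frns = 0
    while path[-1] != dst:
        node = path[-1]
        options = [n for n in next_hops.get(node, [])
                   if (node, n) not in failed_links and (node, n) not in avoid]
        if options:
            # Local fix: the node itself picks its first healthy port.
            path.append(options[0])
        elif len(path) > 1:
            # Remote case: no healthy port here, so notify the upstream
            # node (one FRN), which then stops using this neighbor.
            path.pop()
            avoid.add((path[-1], node))
            frns += 1
        else:
            return None, frns
    return path, frns


# Hypothetical topology: Server A -> S1 -> {S2, S3} -> Server B
TOPOLOGY = {"A": ["S1"], "S1": ["S2", "S3"], "S2": ["B"], "S3": ["B"]}

local_fix = route("A", "B", TOPOLOGY, {("S1", "S2")})
remote_fix = route("A", "B", TOPOLOGY, {("S2", "B")})
```

In the first call the failed link is adjacent to a switch that has another healthy port, so no FRN is needed. In the second, S2 has no alternative and the decision moves one hop upstream to S1.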


MPI Tag-Matching Offload Advantages

Lower is better: 31%

Lower is better: 97%

Mellanox In-Network Computing Technology Delivers Highest Performance

31% lower latency and 97% lower CPU utilization for MPI operations

Performance comparisons based on ConnectX-5
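For readers unfamiliar with what is being offloaded, the sketch below shows MPI-style tag matching done in software: each incoming message is compared against a queue of posted receives, and messages with no match are parked in an unexpected-message queue. The class and method names are hypothetical, and this models only the matching semantics, not how ConnectX-5 implements them in hardware.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

ANY = -1  # wildcard, analogous to MPI_ANY_SOURCE / MPI_ANY_TAG


@dataclass
class Envelope:
    source: int
    tag: int
    payload: Optional[bytes] = None


class TagMatcher:
    """Software model of the matching work a tag-matching offload takes over."""

    def __init__(self):
        self.posted = deque()      # receives posted by the application, in order
        self.unexpected = deque()  # messages that arrived before a matching receive

    @staticmethod
    def _matches(recv, msg):
        return recv.source in (ANY, msg.source) and recv.tag in (ANY, msg.tag)

    def post_receive(self, source, tag):
        """Post a receive; return the payload if a message is already waiting."""
        recv = Envelope(source, tag)
        for msg in self.unexpected:
            if self._matches(recv, msg):
                self.unexpected.remove(msg)
                return msg.payload
        self.posted.append(recv)
        return None

    def on_arrival(self, source, tag, payload):
        """Handle an incoming message; return True if it matched a posted receive."""
        msg = Envelope(source, tag, payload)
        for recv in self.posted:
            if self._matches(recv, msg):
                self.posted.remove(recv)
                return True
        self.unexpected.append(msg)
        return False


matcher = TagMatcher()
matcher.post_receive(source=0, tag=7)        # posted before the message arrives
matched = matcher.on_arrival(0, 7, b"halo")  # matches the posted receive
```

Both queue walks run on the host CPU in this model, which is the per-message work that the slide's CPU-utilization figure is about.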


ConnectX-5 / Switch-IB2 Adaptive Routing and Out-of-Order

95% Network Utilization with Adaptive Routing


GPU-GPU Internode MPI Latency

Lower is Better

Performance of MPI with GPUDirect RDMA

88% Lower Latency

GPU-GPU Internode MPI Bandwidth

Higher is Better

10X Increase in Throughput

Source: Prof. DK Panda

9.3X

2.18 usec

10x


2X Acceleration for Baidu

Machine Learning Software from Baidu

• Usage: word prediction, translation, image processing

RDMA (GPUDirect) speeds training

• Lowers latency, increases throughput

• More cores for training

• Even better results with optimized RDMA

~2X Acceleration for Paddle Training with RDMA


The Generation of In-Network Computing – 10X Higher Performance

(Diagram: CPU and GPU nodes attached to the network)

In-Network Computing Key for Highest Return on Investment

SHARP

SHIELD CORE-Direct

RDMA

Tag-Matching

Programmable (FPGA)

Programmable (ARM)

Security

GPUDirect

Security NVMe Storage

NVMe over Fabrics


Highest-Performance 200Gb/s Interconnect Solutions

Transceivers

Active Optical and Copper Cables

(10 / 25 / 40 / 50 / 56 / 100 / 200Gb/s) VCSELs, Silicon Photonics and Copper

40 HDR (200Gb/s) InfiniBand Ports

80 HDR100 InfiniBand Ports

Throughput of 16Tb/s, <90ns Latency
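The port counts and the 16Tb/s figure above are consistent if throughput is counted in both directions. The short check below works through that arithmetic; the bidirectional counting and the 100Gb/s rate for an HDR100 port are assumptions, since the slide does not state them, and the function name is illustrative.

```python
def aggregate_tbps(ports, gbps_per_port, bidirectional=True):
    """Aggregate throughput in Tb/s for a given port count and per-port rate."""
    one_way = ports * gbps_per_port / 1000.0
    return one_way * 2 if bidirectional else one_way


hdr = aggregate_tbps(40, 200)     # 40 HDR ports at 200Gb/s
hdr100 = aggregate_tbps(80, 100)  # 80 HDR100 ports, assumed 100Gb/s each
one_way = aggregate_tbps(40, 200, bidirectional=False)
```

Both port configurations come to 16Tb/s when both directions are counted, and 8Tb/s in a single direction.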

200Gb/s Adapter, 0.6us latency

200 million messages per second

(10 / 25 / 40 / 50 / 56 / 100 / 200Gb/s)

MPI, SHMEM/PGAS, UPC

For Commercial and Open Source Applications

Leverages Hardware Accelerations


Applications Performance Comparison - Examples


InfiniBand Delivers Best Return on Investment

30-100% Higher Return on Investment

Up to 50% Savings on Capital and Operating Expenses

Highest Applications Performance, Scalability and Productivity

1.3X Better 2X Better 1.4X Better 1.7X Better 1.3X Better

Molecular Dynamics Chemistry

Automotive Genomics Weather


Thank You