High Performance Interconnect
Mapping Applications to the Cluster: Understanding the choices for Topology with Scaling and Processor

Darren J. Harkins, Sept 2018
© 2018 Mellanox Technologies | Confidential

Linking boxes

[Figure: six adaptors]

Linking more boxes

[Figure: one switch with twelve attached adaptors]

Linking more boxes – Non-blocking or 1:1

[Figure: three switches and two groups of 18 adaptors]

Linking more boxes – Non-blocking 36 ports

[Figure: three switches and two groups of 18 adaptors]

Linking more boxes – Non-blocking or 2:1

[Figure: four switches with adaptor groups of 18, 18, 12 and 24]

Linking more boxes – 2:1 – Island of 24

[Figure: four switches with adaptor groups of 18, 18, 12 and 24]
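
The 1:1 and 2:1 layouts come down to how a switch's ports are divided between adaptors and uplinks. A minimal sketch in Python, assuming a 36-port switch as on the earlier slide; the helper name split_ports is hypothetical and not part of any Mellanox tool:

```python
def split_ports(radix, ratio):
    """Split a switch's ports into (down, up) for a down:up ratio of ratio:1.

    radix is the total port count; ratio=1 is non-blocking, ratio=2 is 2:1.
    """
    up = radix // (ratio + 1)   # ports reserved as uplinks
    down = radix - up           # ports left for adaptors
    return down, up


for ratio in (1, 2):
    down, up = split_ports(36, ratio)
    print(f"{ratio}:1 -> {down} ports to adaptors, {up} uplinks")
```

Under these assumptions a 2:1 split leaves 24 ports for adaptors on each switch, which matches the "Island of 24" in the slide title.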

Linking more boxes – Non-blocking 36 ports

[Figure: four switches with adaptor groups of 18, 18, 12 and 24]

Variety of Topologies

Torus, Dragonfly, Fat Tree, Hypercube

Linking lots of boxes – 3D-Torus

[Figure: 3D torus of 27 switches]
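
In a 3D torus every switch links to its two neighbours along each axis, with the links wrapping round at the edges. A minimal sketch in Python, assuming a 3 x 3 x 3 arrangement; the function name and the coordinate scheme are illustrative assumptions, not taken from the slides:

```python
def torus_neighbours(coord, dims):
    """Return the six neighbours of coord in a 3D torus of size dims."""
    neighbours = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            # The modulo gives the wrap-around link at the edge of the torus.
            n[axis] = (n[axis] + step) % dims[axis]
            neighbours.append(tuple(n))
    return neighbours


print(torus_neighbours((0, 0, 0), (3, 3, 3)))
```

Each switch needs only six inter-switch links however large the torus grows, at the cost of more hops between distant switches.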

Linking lots of boxes – Hyper Cube

[Figure: switches arranged as a hypercube]
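
In a hypercube, switches can be numbered so that two switches are linked exactly when their numbers differ in one bit. A minimal sketch in Python under that standard numbering; the function names are illustrative assumptions:

```python
def hypercube_neighbours(node, dim):
    """Neighbours of node in a dim-dimensional hypercube: flip one bit each."""
    return [node ^ (1 << bit) for bit in range(dim)]


def hops(a, b):
    """Minimum hop count between two switches: number of differing bits."""
    return bin(a ^ b).count("1")


print(hypercube_neighbours(0b000, 3))
print(hops(0b000, 0b111))
```

The worst-case hop count equals the dimension, so it grows with the logarithm of the number of switches.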

Linking lots of boxes – Fat Tree

[Figure: fat tree with tiers of 2, 4, 8 and 16 switches]
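
How many boxes a fat tree can link depends on the switch port count and the number of tiers. A minimal sketch in Python, assuming a non-blocking tree built entirely from identical switches, with half of each switch's ports facing down at every tier except the top; the function name is hypothetical:

```python
def fat_tree_hosts(radix, levels):
    """Maximum hosts in a non-blocking fat tree of identical switches.

    radix is the port count per switch, levels the number of switch tiers.
    Uses the standard 2 * (radix / 2) ** levels result.
    """
    return 2 * (radix // 2) ** levels


for levels in (1, 2, 3):
    print(levels, "tier(s):", fat_tree_hosts(36, levels), "hosts")
```

With 36-port switches this gives 36 hosts on a single switch, 648 with two tiers and 11,664 with three.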

Linking lots of boxes – DragonFly+

[Figure: DragonFly+ arrangement of interconnected switch groups]

Linking different boxes

[Figure: two rows of six processor types, each with its own adaptor, around one switch]

  • X86: AMD, Other
  • ARM (Advanced RISC Machine): Lots of People
  • FPGA (Field Programmable Gate Array): Xilinx, Altera
  • GPU: AMD, NVIDIA
  • TPU: Google
  • POWER: IBM, OpenPOWER

Transporting data between boxes (TCP/IP)

[Figure: two hosts, each labelled with the TCP/IP layers Application, Transport, Internet and Link, plus Processor, Memory, Adaptor and Switch]

Transporting data between boxes (RDMA)

RDMA over InfiniBand or Ethernet

[Figure: Application 1 in RACK 1 and Application 2 in RACK 2, drawn across USER, KERNEL and HARDWARE layers. Labels: OS, NIC, HCA, TCP/IP, and Buffer 1 repeated at each stage]

Middleware – keeping programming simple

[Figure: two hosts, each labelled Application, MPI, RDMA, InfiniBand, Processor, Memory, Adaptor and Switch]

Middleware – MPI for AI

[Figure: software stack. Labels: Framework (Torch, TensorFlow, Caffe, CNTK, Paddle, ...); MPI; TCP/IP; RDMA; CUDA; rCUDA; Interconnect]

Scalability – too long on the Interconnect

[Figure: tree of switches above a row of nodes]

Scalability – Simplifying one to many

[Figure: tree of switches above a row of nodes]

Scalability – Compute in the interconnect

[Figure: tree of switches above a row of nodes]

A collective tree is created down to the nodes.

Nodes send data up the tree to the leaf switches, where the collective operation is run.

Leaf switches send data up the tree to the spine, where the collective operation is run.

The result is sent to the egress leaf.

The result arrives at the requester node, where no further operation is needed.
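
The flow above can be imitated in a few lines of Python. This is only a software simulation of the idea, assuming a two-tier tree (leaf switches under one spine) and a sum as the collective operation; the function name and data layout are hypothetical, and nothing here reflects how the switch hardware actually implements it:

```python
from functools import reduce
import operator


def reduce_in_network(leaves, op=operator.add):
    """Simulate a reduction carried out in the switches.

    leaves holds one list per leaf switch, with one value per attached node.
    Returns the partial result of each leaf switch and the final result.
    """
    # Step 1: each leaf switch combines the values from its own nodes.
    leaf_partials = [reduce(op, node_values) for node_values in leaves]
    # Step 2: the spine combines one partial result per leaf switch.
    result = reduce(op, leaf_partials)
    return leaf_partials, result


leaves = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
partials, total = reduce_in_network(leaves)
print(partials, total)
```

In this sketch the spine handles three partial results instead of twelve node values, and the requester receives a finished answer.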

Storage : More than just xPUs on the Interconnect

[Figure: stacked latency bars per storage configuration, with segments labelled Software, Disk and Network; individual segment labels range from 1 usec to 6000 usec]

Configuration                            Latency        Synchronous (back to back)
Mechanical Disks                         ~6 msec        180 IOPs
With SSDs                                ~0.5 msec      3000 IOPs
With Fast Network                        ~0.2 msec      4300 IOPs
With RDMA                                ~0.05 msec     20,000 IOPs
With Full OS Bypass & NV-Dimm/Cache      ~0.007 msec    >100,000 IOPs
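
For synchronous, back-to-back requests the IOPs figure is roughly the reciprocal of the per-request latency. A minimal sketch in Python using the rounded latencies quoted on the slide; the function name is hypothetical, and because the slide's latencies are approximate the results only roughly track the IOPs figures it quotes:

```python
def sync_iops(latency_msec):
    """Requests per second when each request waits for the previous one."""
    return 1000.0 / latency_msec


latencies_msec = {
    "Mechanical disks": 6.0,
    "With SSDs": 0.5,
    "With fast network": 0.2,
    "With RDMA": 0.05,
    "With full OS bypass and NV-DIMM/cache": 0.007,
}

for name, latency in latencies_msec.items():
    print(f"{name}: about {sync_iops(latency):,.0f} IOPs")
```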

Questions?

Darren J. Harkins

Staff Systems Engineer

+44 7944 786208

Thank You
