NVIDIA DGX-1 WEBINAR

Piero Altoè, Carlo Nardone | 30 November 2016

E4 Computer Engineering S.p.A.



2

GPU Computing

NVIDIA: Computing for the Most Demanding Users

Computing Human Imagination

Computing Human Intelligence

3

DEEP LEARNING: A NEW COMPUTING MODEL

“Software that writes software”

“little girl is eating piece of cake”

LEARNING ALGORITHM

“millions of trillions of FLOPS”

4

“SUPERHUMAN” RESULTS SPARK HYPERSCALE ADOPTION

[Chart: ImageNet accuracy (%) by year. 2010: 72%, 2011: 74%, 2012: 84%, 2013: 88%, 2014: 93%, 2015: 96%. Series labels: Hand-coded CV, Deep Learning. Reference level: Human. Further unlabeled values: 74%, 76%.]

Cloud Services with AI Powered by NVIDIA

Alibaba/Aliyun, Amazon, Baidu, eBay, Facebook, Flickr, Google, iFLYTEK, iQIYI, JD.com, Orange, Periscope, Pinterest, Qihoo 360, Shazam, Skype, Sogou, Twitter, Yahoo Supermarket, Yandex, Yelp

5

NVIDIA’S GPU EDUCATORS PROGRAM

The Flagship Offering: GPU Teaching Kits. Breaking the barriers of GPU education in academia:

• Lecture slides
• Lecture videos
• Hands-on labs/solutions
• Larger coding projects/solutions
• Quiz/exam questions/solutions
• Text and e-books

Different kits for different courses:

• Accelerated/Parallel Computing (available now!)
• Robotics (available now!)
• Machine/Deep Learning (coming soon!)
• Computer Vision, Computer Architecture, Computational Domain Sciences, Mathematics, etc. (future)

Get started today! developer.nvidia.com/educators

Advancing STEM Education with Accelerated Computing

6

TESLA ACCELERATED COMPUTING PLATFORM
Focused on Co-Design for Accelerated Data Center

Productive Programming Model & Tools

Expert Co-Design

Accessibility

APPLICATION

MIDDLEWARE

SYS SW

LARGE SYSTEMS

PROCESSOR

Fast GPU: Engineered for High Throughput

[Chart: peak TFLOPS (axis 0.0 to 5.5) by year, 2008 to 2016, NVIDIA GPU vs. x86 CPU. GPU data points: M1060, M2090, K20, K80, P100.]

Fast GPU + Strong CPU

7

NVIDIA DEEP LEARNING SDK
High Performance GPU-Acceleration for Deep Learning

APPLICATIONS

• COMPUTER VISION: Object Detection, Image Classification
• SPEECH AND AUDIO: Voice Recognition, Translation
• BEHAVIOR: Recommendation Engines, Sentiment Analysis

FRAMEWORKS

• Mocha.jl

DEEP LEARNING SDK

• DEEP LEARNING: cuDNN
• MATH LIBRARIES: cuBLAS, cuSPARSE, cuFFT
• MULTI-GPU: NCCL

8

“Horus can process and identify obstacles 48 times faster than would be possible with CPUs.”

- Saverio Murgia, Horus CEO and co-founder

9

NVIDIA DGX-1
AI Supercomputer-in-a-Box

170 TFLOPS | 8x Tesla P100 16GB | NVLink Hybrid Cube Mesh

2x Xeon | 8 TB RAID 0 | Quad IB 100Gbps, Dual 10GbE | 3U — 3200W

10

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

INTRODUCING TESLA P100
New GPU Architecture to Enable the World’s Fastest Compute Node

• Pascal Architecture: Highest Compute Performance
• NVLink: GPU Interconnect for Maximum Scalability
• CoWoS HBM2: Unifying Compute & Memory in Single Package
• Page Migration Engine: Simple Parallel Programming with Virtually Unlimited Memory Space

[Diagram: Unified Memory shared between CPU and Tesla P100.]

11

TESLA P100 ACCELERATOR

• Compute: 5.3 TF DP ∙ 10.6 TF SP ∙ 21.2 TF HP
• Memory: HBM2, 720 GB/s ∙ 16 GB
• Interconnect: NVLink (up to 8 way) + PCIe Gen3
• Programmability: Page Migration Engine, Unified Memory
• Availability: DGX-1, Order Now
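The per-GPU figures can be cross-checked against the 170 TFLOPS quoted for the DGX-1. The sketch below is illustrative only: it assumes the system figure is the half-precision (HP) peak of the eight P100s added together, which the slides do not state explicitly. The names `aggregate_peak` and `full_memory_sweep_ms` are made up for this example.

```python
# Per-GPU figures copied from the Tesla P100 slide.
P100_PEAK_TFLOPS = {"DP": 5.3, "SP": 10.6, "HP": 21.2}
P100_HBM2_GB = 16
P100_HBM2_BANDWIDTH_GBS = 720
GPUS_PER_DGX1 = 8  # "8x Tesla P100 16GB" on the DGX-1 slide


def aggregate_peak(per_gpu_tflops, n_gpus):
    """Sum the theoretical peak over n_gpus identical GPUs (assumption: peaks simply add)."""
    return {precision: round(tflops * n_gpus, 1)
            for precision, tflops in per_gpu_tflops.items()}


dgx1_peak = aggregate_peak(P100_PEAK_TFLOPS, GPUS_PER_DGX1)
total_hbm2_gb = P100_HBM2_GB * GPUS_PER_DGX1

# Lower bound on the time to read one GPU's whole HBM2 once at peak bandwidth.
full_memory_sweep_ms = P100_HBM2_GB / P100_HBM2_BANDWIDTH_GBS * 1000

if __name__ == "__main__":
    for precision, tflops in dgx1_peak.items():
        print(f"{precision}: {tflops} TFLOPS across {GPUS_PER_DGX1} GPUs")
    print(f"Total HBM2: {total_hbm2_gb} GB")
    print(f"One full HBM2 sweep per GPU: {full_memory_sweep_ms:.1f} ms")
```

Under that assumption the HP sum is 169.6 TFLOPS, which rounds to the advertised 170. These are theoretical peaks, not measured application throughput.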

12

8 GPU CUBE MESH

[Diagram: eight GPUs, numbered 0 to 7, linked in a cube mesh. Other labels: two PCIe switches, two CPUs.]
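The cube-mesh diagram can be modelled as a small graph to see how many hops separate any two GPUs. The slide does not spell out the exact wiring, so the sketch below assumes one commonly described layout: two fully connected groups of four GPUs (0-3 and 4-7), plus one link from each GPU to its counterpart in the other group. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def hybrid_cube_mesh():
    """Return the assumed NVLink links as a set of (gpu_a, gpu_b) pairs."""
    links = set()
    for quad in ((0, 1, 2, 3), (4, 5, 6, 7)):
        links.update(combinations(quad, 2))       # all-to-all inside each quad
    links.update((g, g + 4) for g in range(4))     # one link across the quads
    return links


def neighbours(links, n_gpus=8):
    """Build an adjacency map from the undirected link list."""
    adj = {g: set() for g in range(n_gpus)}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def max_hops(adj):
    """Largest shortest-path length between any two GPUs (breadth-first search)."""
    worst = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    adj = neighbours(hybrid_cube_mesh())
    print("links per GPU:", {g: len(n) for g, n in adj.items()})
    print("worst-case hops:", max_hops(adj))
```

With this assumed wiring every GPU uses four links and any GPU reaches any other in at most two hops. A different wiring would change both numbers.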

13

NVLINK ENABLES LINEAR MULTI-GPU SCALING

[Charts: three panels (AlexnetOWT, Incep-v3, ResNet-50). X axis: 1, 2, 4, 8 GPUs. Y axis: speedup, 1.0x to 8.0x. Series: DGX-1 and P100 PCIE. Callout values: 2.3x, 1.3x, 1.5x.]

Deepmark test with NVCaffe. AlexnetOWT use batch 128, Incep-v3/ResNet-50 use batch 32, weak scaling, P100 and DGX-1 are measured, FP32 training, software optimization in progress, CUDA8/cuDNN5.1, Ubuntu 14.04
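The speedup axis in these charts is throughput on N GPUs divided by throughput on one GPU. Dividing that by N gives the scaling efficiency, where 1.0 means perfectly linear scaling. The sketch below shows the arithmetic. The throughput numbers are hypothetical placeholders, not values read from the slides, and the function names are invented for the example.

```python
def speedup(throughput_n, throughput_1):
    """Speedup of an N-GPU run over the single-GPU run (same per-GPU batch, weak scaling)."""
    return throughput_n / throughput_1


def scaling_efficiency(throughput_n, throughput_1, n_gpus):
    """Fraction of ideal linear scaling achieved; 1.0 means perfectly linear."""
    return speedup(throughput_n, throughput_1) / n_gpus


# Hypothetical training throughput in images/s, keyed by GPU count.
measured = {1: 100.0, 2: 195.0, 4: 380.0, 8: 740.0}

if __name__ == "__main__":
    base = measured[1]
    for n_gpus, images_per_s in sorted(measured.items()):
        s = speedup(images_per_s, base)
        e = scaling_efficiency(images_per_s, base, n_gpus)
        print(f"{n_gpus} GPU: speedup {s:.2f}x, efficiency {e:.1%}")
```

The same two functions can be applied to measured DGX-1 and P100 PCIE throughput to compare how close each configuration comes to the linear ideal.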

14

DGX-1 STACK
Fully integrated Deep Learning platform

• Instant productivity: plug-and-play, supporting every AI framework
• Performance optimized across the entire stack
• Always up-to-date via the cloud
• Mixed framework environments: virtualized and containerized
• Direct access to NVIDIA experts

15

NVIDIA DGX-1 SOFTWARE
Optimized for Deep Learning Performance

• Accelerated Deep Learning: cuDNN, NCCL, cuSPARSE, cuBLAS, cuFFT
• Container Based Applications: Digits, DL Frameworks, GPU Apps
• NVIDIA Cloud Management

Research & Develop | Package & Test | Deploy & Manage

16

DGX-1 IN THE WORKFLOW
A complete GPU-accelerated deep learning workflow

MANAGE | TRAIN | DEPLOY

• DIGITS: MANAGE / AUGMENT, TRAIN, TEST
• MODEL ZOO
• TENSOR RT (GIE)
• Deployment targets: DATA CENTER, AUTOMOTIVE, EMBEDDED

17

DGX-1: 6 STEPS TO DEEP LEARNING

1. Login
2. Monitoring Portal
3. Launch Container
4. Interacting with Jobs
5. Create Training
6. Training Run

Step 1: LOGIN

18

DGX-1: 6 STEPS TO DEEP LEARNING

Step 2: MONITORING PORTAL

19

DGX-1: 6 STEPS TO DEEP LEARNING

Step 3: LAUNCH CONTAINER

20

DGX-1: 6 STEPS TO DEEP LEARNING

Step 4: INTERACTING WITH JOBS

21

DGX-1: 6 STEPS TO DEEP LEARNING

Step 5: CREATE TRAINING

22

DGX-1: 6 STEPS TO DEEP LEARNING

Step 6: TRAINING RUN

23

NVIDIA EXPERTISE AT EVERY STEP

Solution Architects
• 1:1 support
• Network training setup
• Network optimization

Deep Learning Institute
• Certified expert instructors
• Worldwide workshops
• Online courses

GTC Conferences
• Epicenter of industry leaders
• Onsite training
• Global reach

Global Network of Partners
• NVIDIA Partner Network
• OEMs
• Startups

24

DEEP LEARNING EVERYWHERE


NVIDIA DEEP LEARNING PLATFORM

DEEP LEARNING SDK

NVIDIA SOFTWARE

GTX - DEVELOPMENT (Workstation)
Purpose: Deep Learning test, development, benchmarks, small neural networks

TESLA - DEPLOYMENT (Mid-Range Server)
Purpose: Deep Learning application for medium data set analysis and medium neural networks

DGX1 - ENTERPRISE (High-End Server)
Purpose: NVIDIA DGX1 appliance with Artificial Intelligence software, large neural networks

27

Third-Party Development (VISION / SPEECH / BEHAVIOUR / FINANCE / IoT)

Object Detection, Image Classification, Language Translation, Recommendation Engines, Sentiment Analysis, Voice Recognition

DEEP LEARNING FRAMEWORKS

Mocha.jl

NVIDIA SOFTWARE

Workstation BOXX
• Purpose: Deep Learning test, development, benchmarks, small neural networks
• Price 1 GPU: < 10K€

Mid-Range Server
• Purpose: Deep Learning application for medium data set analysis and medium neural networks
• Price 2 GPU: < 40K€
• Price 4 GPU: < 60K€

High-End Server
• Purpose: NVIDIA DGX1 appliance with Artificial Intelligence software, large neural networks
• Price 8 GPU: < 150K€

Until 31/01/17

Until 31/01/17, an EDU promotion is available on the following NVIDIA products:

• K80
• P100 (12GB)
• P100 (16GB)
• DGX1

The products will also be available on MEPA. For information, write to [email protected]