THE EXPANDING UNIVERSE OF HPC
JENSEN HUANG | SC19
Forward-Looking Statements
Except for the historical information contained herein, certain matters in this presentation including, but not limited to, statements as to: our strategies, growth, position, opportunities, and continued expansion; the performance, benefits and impact of our products and technologies, including the NVIDIA EGX Edge Supercomputing Platform, NVIDIA HPC for ARM, NVIDIA MAGNUM IO, NVIDIA RAPIDS Data Science, and NVIDIA DGX-2; other predictions and estimates; the expanding universe of HPC; future eras of AI training; and NVIDIA’s collaboration with Microsoft on intelligent edge computing are forward-looking statements within the meaning of the Private Securities Litigation Reform Act of 1995. These forward-looking statements and any other forward-looking statements that go beyond historical facts that are made in this presentation are subject to risks and uncertainties that may cause actual results to differ materially. Important factors that could cause actual results to differ materially include: global economic conditions; our reliance on third parties to manufacture, assemble, package and test our products; the impact of technological development and competition; development of new products and technologies or enhancements to our existing product and technologies; market acceptance of our products or our partners’ products; design, manufacturing or software defects; changes in consumer preferences and demands; changes in industry standards and interfaces; unexpected loss of performance of our products or technologies when integrated into systems and other factors. For a complete discussion of factors that could materially affect our financial results and operations, please refer to the reports we file from time to time with the SEC, including our Form 10-K for the annual period ended January 27, 2019 and our Form 10-Q for the quarterly period ended October 27, 2019. 
Copies of reports we file with the SEC are posted on our website and are available from NVIDIA without charge. These forward-looking statements are not guarantees of future performance and speak only as of November 18, 2019, based on information currently available to us. Except as required by law, NVIDIA disclaims any obligation to update these forward-looking statements to reflect future events or circumstances.
SAFE HARBOR
AT THE INTERSECTION OF GRAPHICS, SIMULATION, AI
COMPUTING FOR THE DA VINCIS OF OUR TIME
FIRST AI SUPERCOMPUTERS | FIRST EXASCALE SCIENCE | 42 NEW TOP 500 SYSTEMS
ABCI | SUMMIT
CLIMATE: LBNL | NVIDIA
GENOMICS: ORNL
NUCLEAR WASTE REMEDIATION: LBNL | PNNL | Brown U. | NVIDIA
CANCER DETECTION: ORNL | Stony Brook U.
FULL STACK SPEED-UP
CUDA-X
CUDA
AI | DRIVE | METRO | ISAAC | CLARA | RAPIDS | AERIAL | CG
CUDA 10.2
cuTENSOR 1.0
cuSOLVER 10.3
cuBLAS 10.2
cuDNN 7.6
TensorRT 6.0
DALI 0.15
NCCL 2.5
IndeX 2.1
OptiX 7.0
RAPIDS 0.10
Spark XGBoost
3x in 2 Years: Time to Solution reduced from 27 hours (2017) to 20 hours (2018) to 10 hours (2019) across Amber, Chroma, GROMACS, GTC, LAMMPS, MILC, NAMD, QE, SPECFEM3D, TensorFlow, and VASP
Benchmark Application: Amber [PME-Cellulose_NVE], Chroma [szscl21_24_128], GROMACS [ADH Dodec: Dev Prototype], GTC [moi#proc.in], LAMMPS [LJ 2.5], MILC [Apex Medium], NAMD [stmv_nve_cuda], Quantum Espresso [AUSURF112-jR], SPECFEM3D [four_material_simple_model], TensorFlow [ResNet-50], VASP [Si-Huge]; GPU node: dual-socket CPUs with 4x V100 GPUs.
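Among the CUDA-X libraries listed above, NCCL provides the multi-GPU collectives (such as all-reduce) that distributed training and HPC codes rely on. As an illustration of the underlying idea only, not NCCL's actual implementation, here is a minimal pure-Python sketch of the ring all-reduce algorithm:

```python
# Pure-Python sketch of ring all-reduce, the collective pattern NCCL
# implements across GPUs. Each of n ranks starts with its own vector;
# afterwards every rank holds the elementwise sum. Illustration only.

def ring_allreduce(buffers):
    """Sum vectors across n ranks in 2*(n-1) ring steps."""
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "vector length must split into n equal chunks"
    c = size // n                       # chunk length
    data = [list(b) for b in buffers]   # simulate per-rank memory

    # Phase 1: reduce-scatter. After n-1 steps, rank r holds the
    # complete sum for chunk (r + 1) % n.
    for step in range(n - 1):
        for r in range(n):
            k = (r - step) % n          # chunk rank r passes along
            dst = (r + 1) % n           # its ring neighbor
            for i in range(k * c, (k + 1) * c):
                data[dst][i] += data[r][i]

    # Phase 2: all-gather. Completed chunks circulate around the ring,
    # overwriting stale copies.
    for step in range(n - 1):
        for r in range(n):
            k = (r + 1 - step) % n      # rank r's freshest complete chunk
            dst = (r + 1) % n
            for i in range(k * c, (k + 1) * c):
                data[dst][i] = data[r][i]
    return data

print(ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40]]))
# → [[11, 22, 33, 44], [11, 22, 33, 44]]
```

The ring shape is why the collective scales: each rank only ever talks to one neighbor, so link bandwidth, not rank count, bounds the transfer.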
THE EXPANDING UNIVERSE OF HPC
SIMULATION | AI | NETWORK | EDGE | ANALYTICS | EXTREME IO
Edge | Cloud | Arm | Data Analytics | Extreme IO
INCREDIBLE ADVANCES IN AI
2012, ALEXNET CNN: CLASSIFICATION, OBJECT RECOGNITION, SEGMENTATION, DENOISING, 3D POSE, IMAGE GENERATION
2019, BERT TRANSFORMER: CLASSIFICATION, Q&A, SUMMARIZATION, TRANSLATION, DIALOG, WRITING
GPU COMPUTING POWERS AI ADVANCES
#1 MLPERF — AI TRAINING + AI INFERENCE | HPC COMPUTING CHALLENGE
Two Distinct Eras of AI Training: compute doubling every 2 years, then every 3.4 months
Super Moore's Law: Time to Train (ResNet-50) from 600 hours (K80 server) to 2 hours (DGX) in 5 years
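The two growth rates on this slide can be checked with quick arithmetic: a classic 2-year doubling yields 2x over 24 months, while a 3.4-month doubling compounds to well over 100x in the same window, and the 600-to-2-hour drop in ResNet-50 training time is a 300x speedup. A small sketch, using only the figures stated on the slide:

```python
# Quick arithmetic behind the slide's two "eras" of AI training compute.
months = 24
slow_doubling = 2 ** (months / 24)    # ~2-year doubling: 2x per 24 months
fast_doubling = 2 ** (months / 3.4)   # 3.4-month doubling era: >100x

print(f"2-year doubling over 24 months:    {slow_doubling:.0f}x")
print(f"3.4-month doubling over 24 months: {fast_doubling:.0f}x")

# ResNet-50 time-to-train: 600 hours (K80 server) -> 2 hours (DGX)
speedup = 600 / 2
print(f"Time-to-train speedup over 5 years: {speedup:.0f}x")  # 300x
```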
NVIDIA AI END-TO-END PLATFORM
TRAINING | CLOUD | EDGE AI | AUTONOMOUS MACHINES
DGX | HGX | EGX | AGX
AI FOR SCIENCE
EXPERIMENTATION DATA + SIMULATION DATA → NEURAL ESTIMATION
Real-time Steering | Fast Approximation
Design Space Exploration: ICF + MERLIN (Fusion)
Inverse Problems: LIGO (Gravitational Waves)
Faster Prediction: ANI + MD (Chemistry)
Real-time Steering: ITER (Fusion Energy)
STREAMING AI
100x Data Collected, 1x Data Transfer
SOFTWARE-DEFINED SENSORS → STREAMING AI PROCESSING → BUILD MODELS
ECMWF: 287 TB/day | LSST: 20 TB/day | SKA: 16 TB/sec
NVIDIA EGX STACK
NGC
Kubernetes | Networking | Storage | Security
CUDA-X
Third-Party ISVs
METROPOLIS
IMAGE PROCESSING
DECODE → DNN → GRAPHICS → ENCODE
DEEPSTREAM
Powered by NVIDIA CUDA Tensor Core GPU | Secured Boot Root of Trust
Cryptographic Acceleration for IPsec and TLS | NVMe-oF over TCP and RDMA
Industrial-strength Cloud Native and AI Stack
NVIDIA EGX EDGE SUPERCOMPUTING PLATFORM
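The DECODE → DNN → GRAPHICS → ENCODE flow in the DeepStream layer above is a classic streaming pipeline: each stage consumes frames from the previous one. As a hedged illustration of that shape only (the stage names and frame format below are invented for this sketch and are not the DeepStream API), Python generators make the pattern concrete:

```python
# Toy streaming pipeline in the DECODE -> DNN -> GRAPHICS -> ENCODE shape.
# Illustrative only: stage and field names are invented for this sketch.

def decode(raw_frames):
    for raw in raw_frames:
        yield {"pixels": raw}                    # pretend bitstream decode

def dnn(frames):
    for f in frames:
        f["label"] = "object" if f["pixels"] % 2 else "background"
        yield f                                  # pretend inference

def graphics(frames):
    for f in frames:
        f["overlay"] = f"box:{f['label']}"       # pretend overlay drawing
        yield f

def encode(frames):
    for f in frames:
        yield (f["pixels"], f["overlay"])        # pretend re-encode

def run_pipeline(raw_frames):
    return list(encode(graphics(dnn(decode(raw_frames)))))

print(run_pipeline(range(4)))
```

Because every stage is a generator, frames flow through one at a time rather than being materialized per stage, which is the property a real streaming stack exploits to keep latency bounded.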
VERTICAL INDUSTRY FRAMEWORKS
Clara | Metropolis | Isaac | Omniverse | Aerial | DRIVE
WORLD’S LARGEST DELIVERY SERVICE ADOPTS NVIDIA AI
PUTTING AI TO WORK
NVIDIA EGX Edge Supercomputing Platform
SUPERCOMPUTING CLOUD
CPU Instance 48 Hours, $152
Amber, Chroma, GROMACS, GTC, LAMMPS, MILC, NAMD, QE, SPECFEM3D, TensorFlow, VASP
SUPERCOMPUTING IS HARD — CLOUD HPC IS EXPENSIVE
SUPERCOMPUTING CLOUD
8x GPU Instance
1x GPU Instance
CPU Instance 48 Hours, $152
Amber, Chroma, GROMACS, GTC, LAMMPS, MILC, NAMD, QE, SPECFEM3D, TensorFlow, VASP
SUPERCOMPUTING IS HARD — GPU CLOUD 1/7TH THE COST OF CPU CLOUD
48x Faster, 1/7th the Cost
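The comparison implies a short piece of arithmetic: only the CPU figures (48 hours, $152) and the two ratios (48x, 1/7th) come from the slide, so the GPU instance's runtime and cost below are derived, not quoted prices:

```python
# Implied arithmetic for the CPU-vs-GPU cloud comparison.
# Only cpu_hours, cpu_cost, and the two ratios are from the slide;
# the GPU-side numbers are derived from them.
cpu_hours, cpu_cost = 48, 152.0          # CPU instance: 48 hours, $152
speedup, cost_ratio = 48, 7              # "48x Faster, 1/7th the Cost"

gpu_hours = cpu_hours / speedup          # 1 hour
gpu_cost = cpu_cost / cost_ratio         # ~$21.71 total
print(f"GPU run: {gpu_hours:.0f} h, ${gpu_cost:.2f} total "
      f"(~${gpu_cost / gpu_hours:.2f}/h vs ${cpu_cost / cpu_hours:.2f}/h CPU)")
```

The derived per-hour rates show why the total still falls: the GPU instance costs more per hour, but finishes so much sooner that the bill drops by 7x.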
ICECUBE OBSERVATORY DETECTING NEUTRINOS
50K NVIDIA GPUs IN THE CLOUD
350 PF OF SIMULATION FOR 2 HOURS
PRODUCED 5% OF ANNUAL SIMULATION DATA
AWS, MICROSOFT AZURE, GOOGLE CLOUD PLATFORM
DISTRIBUTED ACROSS U.S., EUROPE, APAC
Frank Würthwein, Ph.D.
Executive Director, Open Science Grid
Igor Sfiligoi
Lead Developer and Researcher
MULTIPLE GENERATIONS, ONE APPLICATION
Events Processed Per GPU Type: V100, P100, P40, T4, M60, K80
THE LARGEST CLOUD SIMULATION IN HISTORY
Up to 800 V100 GPUs Connected via Mellanox InfiniBand
ANNOUNCING WORLD’S LARGEST ON-DEMAND SUPERCOMPUTER
DIVERSE ARM ARCHITECTURES
AMPERE COMPUTING eMAG: Hyperscale and Storage
AMAZON GRAVITON: Hyperscale and SmartNIC
MARVELL THUNDERX2: Hyperscale, Storage and HPC
FUJITSU A64FX: Supercomputing
HUAWEI KUNPENG 920: Big Data and Edge
NVIDIA CUDA ON ARM AT OAK RIDGE NATIONAL LAB
Benchmark Application [Dataset]: GROMACS [ADH Dodec: Dev Prototype], LAMMPS [LJ 2.5], MILC [Apex Small], NAMD [apoa1_npt_cuda], Quantum Espresso [AUSURF112-jR], Relion [Plasmodium Ribosome], SPECFEM3D [four_material_simple_model], TensorFlow [ResNet50: Batch 256]; CPU node: 2x ThunderX2 9975; GPU node: same CPU node + 2x V100 32GB PCIe; *1x V100 for GROMACS, MILC, and TensorFlow
ANNOUNCING NVIDIA HPC FOR ARM
HPC Server Reference Platform | 8 V100 Tensor Core GPUs with NVLink
4x 100 Gbps Mellanox InfiniBand | Systems Ranging from Supercomputer to Hyperscale to Edge
CUDA on Arm Beta Available Now
[Diagram: two CPU sockets, each with a PCIe switch and NIC, attached to the GPUs]
ANNOUNCING NVIDIA HPC FOR ARM
APPLICATIONS: COMET, DCA++, GAMERA, GROMACS, INDEX, LAMMPS, LSMS, MATLAB, MILC, NAMD, OPTIX, PARAVIEW, QUANTUM ESPRESSO, RELION, TENSORFLOW, VMD
PROGRAMMING MODELS: C++, CUDA, FORTRAN, OPENACC, PYTHON
SDKS AND TOOLS: ARM ALLINEA STUDIO, BRIGHT COMPUTING, CMAKE, CUDA-GDB, CUPTI, GCC, LLVM, NVCC, PAPI, PERFORCE TOTALVIEW, PGI, SCORE-P, SINGULARITY, SLURM, TAU
EXTREME COMPUTE NEEDS EXTREME IO
TRADITIONAL RDMA: NODE A to NODE B at 50 GB/s, with data staged through the CPU and system memory on both nodes
[Diagram per node: NIC, PCIe switch, CPU, system memory, GPU]
GPUDIRECT RDMA: NODE A to NODE B at 100 GB/s, with the NIC moving data directly to and from GPU memory
TRADITIONAL STORAGE: 50 GB/s, with data staged through the CPU and system memory, alongside GPUDIRECT RDMA between NODE A and NODE B at 100 GB/s
GPUDIRECT STORAGE: storage to GPU memory directly at 100 GB/s, alongside GPUDIRECT RDMA between NODE A and NODE B at 100 GB/s
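The bandwidth doubling in these diagrams (50 GB/s traditional vs 100 GB/s GPUDirect) follows from removing the system-memory bounce: the traditional path crosses the bottleneck link twice per byte, the direct path once. A toy model of that reasoning, with an illustrative (not measured) link figure:

```python
# Toy model of why GPUDirect paths roughly double effective throughput:
# skipping the system-memory bounce halves traffic on the bottleneck
# link. LINK_GBPS is an illustrative assumption, not a measured number.

LINK_GBPS = 100  # assumed bottleneck link (e.g. PCIe toward the GPU)

def effective_bandwidth(crossings):
    """Each extra crossing of the bottleneck link divides useful bandwidth."""
    return LINK_GBPS / crossings

traditional = effective_bandwidth(2)  # storage/NIC -> system memory -> GPU
gpudirect = effective_bandwidth(1)    # storage/NIC -> GPU directly
print(f"traditional: {traditional:.0f} GB/s, GPUDirect: {gpudirect:.0f} GB/s")
```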
ANNOUNCING NVIDIA MAGNUM IO
Acceleration Libraries for Large-scale HPC and IO
High-bandwidth, Low-latency, Massive Storage Access with Lower CPU Utilization
GPUDIRECT STORAGE and GPUDIRECT RDMA data paths at 100 GB/s
Stack: PYTHON and DASK on top | RAPIDS libraries CUDF (PANDAS counterpart), CUML (SCIKIT-LEARN / XGBOOST counterpart), CUGRAPH | APACHE ARROW memory format | DEEP LEARNING FRAMEWORKS via CUDNN | all built on CUDA
NVIDIA RAPIDS DATA SCIENCE
Open Source | Multi-GPU and Multi-Node | Up to 100x Speed-Up | 150K Downloads in 1 Year
Data Load and Processing Times from Hours to Minutes | Used by NERSC, ORNL, NASA, SDSC
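The "hours to minutes" claim is about classic load-and-aggregate jobs. As a baseline illustration, here is that shape of work in plain Python, the kind of per-row CPU loop that RAPIDS cuDF replaces with a GPU-accelerated, pandas-style API (the data and column names below are invented for this sketch):

```python
# Toy "load CSV, group, aggregate" job: the shape of work RAPIDS cuDF
# and Dask accelerate on GPUs. Data and column names are invented.
import csv
import io
from collections import defaultdict

RAW = """sensor,reading
a,1.0
b,2.5
a,3.0
b,0.5
"""

def mean_by_sensor(csv_text):
    """Group readings by sensor and average them, one row at a time."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        sums[row["sensor"]] += float(row["reading"])
        counts[row["sensor"]] += 1
    return {k: sums[k] / counts[k] for k in sums}

print(mean_by_sensor(RAW))  # {'a': 2.0, 'b': 1.5}
```

In the RAPIDS model the same logic is expressed as a dataframe groupby-mean, and the per-row loop becomes a parallel GPU kernel over Arrow-format columns, which is where the claimed speed-up comes from.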
NVIDIA MAGNUM IO BOOSTS RAPIDS DATA ANALYTICS
20x ON TPC-H | STRUCTURAL BIOLOGY: 3x VMD | NEW PANGEO XARRAY ZARR READER FOR CLIMATE
[Chart: Q4 TPC-H Benchmark Work Breakdown with repeated query; latency (msec, 0 to 1,200,000) with and without GDS, broken into CUDA Startup, GPU and CPU Allocation, Data Preload, Warmup Query, Repeat Query, Clean Up, Driver Close]
ANNOUNCING WORLD’S LARGEST INTERACTIVE VOLUME VISUALIZATION
Simulating Mars Lander with FUN3D | Interactively Visualizing a 150 TB Unstructured Mesh
4 NVIDIA DGX-2 Streaming 400 GB/s | NVIDIA Magnum IO | NVIDIA IndeX
ANNOUNCING NVIDIA DGX-2 AS SUPERCOMPUTING ANALYTICS INSTRUMENT
16 V100 GPUs - 2 PF Tensor Core | 512 GB HBM2 - 16 TB/s | 8 MLNX CX5 - 800 Gbps
30 TB NVMe - 53 GB/s with Magnum IO | Fabric Storage - 100 GB/s with Magnum IO
2.3x Faster Than Current IO500 10-node Leader
Powered by NVIDIA Magnum IO
EXTREME WEATHER AI INFERENCE: NVIDIA TENSORRT
3D VOLUME ANALYTICS: PANGEO XARRAY
VMD COMPUTATIONAL MICROSCOPE: NVIDIA OPTIX
3D INTERACTIVE VOLUME RENDERING: NVIDIA INDEX
TPC-H RECORD, 10 TB JOIN: NVIDIA RAPIDS
THE EXPANDING UNIVERSE OF HPC
SIMULATION | NETWORK | EDGE | ANALYTICS | EXTREME IO
NVIDIA HPC for ARM
NVIDIA EGX Edge Supercomputing Platform
NVIDIA DGX-2 Supercomputing Analytics Instrument
NVIDIA Magnum IO
NGC
Azure