THE EXPANDING UNIVERSE OF HPC
JENSEN HUANG | SC19
Forward-Looking Statements
Except for the historical information contained herein, certain matters in this presentation including, but not limited to, statements as to: our strategies, growth, position, opportunities, and continued expansion; the performance, benefits and impact of our products and technologies, including the NVIDIA EGX Edge Supercomputing Platform, NVIDIA HPC for ARM, NVIDIA MAGNUM IO, NVIDIA RAPIDS Data Science, and NVIDIA DGX-2; other predictions and estimates; the expanding universe of HPC; future eras of AI training; and NVIDIA’s collaboration with Microsoft on intelligent edge computing are forward-looking statements within the meaning of the Private Securities Litigation Reform Act of 1995. These forward-looking statements and any other forward-looking statements that go beyond historical facts that are made in this presentation are subject to risks and uncertainties that may cause actual results to differ materially. Important factors that could cause actual results to differ materially include: global economic conditions; our reliance on third parties to manufacture, assemble, package and test our products; the impact of technological development and competition; development of new products and technologies or enhancements to our existing product and technologies; market acceptance of our products or our partners’ products; design, manufacturing or software defects; changes in consumer preferences and demands; changes in industry standards and interfaces; unexpected loss of performance of our products or technologies when integrated into systems and other factors. For a complete discussion of factors that could materially affect our financial results and operations, please refer to the reports we file from time to time with the SEC, including our Form 10-K for the annual period ended January 27, 2019 and our Form 10-Q for the quarterly period ended October 27, 2019. 
Copies of reports we file with the SEC are posted on our website and are available from NVIDIA without charge. These forward-looking statements are not guarantees of future performance and speak only as of November 18, 2019, based on information currently available to us. Except as required by law, NVIDIA disclaims any obligation to update these forward-looking statements to reflect future events or circumstances.
SAFE HARBOR
AT THE INTERSECTION OF GRAPHICS, SIMULATION, AI
COMPUTING FOR THE DA VINCIS OF OUR TIME
FIRST AI SUPERCOMPUTERS | FIRST EXASCALE SCIENCE | 42 NEW TOP 500 SYSTEMS
ABCI | SUMMIT
CLIMATE: LBNL | NVIDIA
GENOMICS: ORNL
NUCLEAR WASTE REMEDIATION: LBNL | PNNL | Brown U. | NVIDIA
CANCER DETECTION: ORNL | Stony Brook U.
FULL STACK SPEED-UP
CUDA-X
CUDA
AI | DRIVE | METRO | ISAAC | CLARA | RAPIDS | AERIAL | CG
CUDA 10.2
cuTENSOR 1.0
cuSOLVER 10.3
cuBLAS 10.2
cuDNN 7.6
TensorRT 6.0
DALI 0.15
NCCL 2.5
IndeX 2.1
OptiX 7.0
RAPIDS 0.10
Spark XGBoost
3x in 2 Years: Time to Solution reduced from 27 hours (2017) to 20 hours (2018) to 10 hours (2019) across Amber, Chroma, GROMACS, GTC, LAMMPS, MILC, NAMD, QE, SPECFEM3D, TensorFlow, and VASP
Benchmark Application: Amber [PME-Cellulose_NVE], Chroma [szscl21_24_128], GROMACS [ADH Dodec: Dev Prototype], GTC [moi#proc.in], LAMMPS [LJ 2.5], MILC [Apex Medium], NAMD [stmv_nve_cuda], Quantum Espresso [AUSURF112-jR], SPECFEM3D [four_material_simple_model], TensorFlow [ResNet-50], VASP [Si-Huge]; GPU node: dual-socket CPUs with 4x V100 GPUs.
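Among the CUDA-X libraries listed above, NCCL provides the multi-GPU collectives (such as all-reduce) that distributed training and HPC codes rely on. As an illustration of the underlying idea only, not NCCL's actual implementation, here is a minimal pure-Python sketch of the ring all-reduce algorithm:

```python
# Pure-Python sketch of ring all-reduce, the collective pattern NCCL
# implements across GPUs. Each of n ranks starts with its own vector;
# afterwards every rank holds the elementwise sum. Illustration only.

def ring_allreduce(buffers):
    """Sum vectors across n ranks in 2*(n-1) ring steps."""
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "vector length must split into n equal chunks"
    c = size // n                       # chunk length
    data = [list(b) for b in buffers]   # simulate per-rank memory

    # Phase 1: reduce-scatter. After n-1 steps, rank r holds the
    # complete sum for chunk (r + 1) % n.
    for step in range(n - 1):
        for r in range(n):
            k = (r - step) % n          # chunk rank r passes along
            dst = (r + 1) % n           # its ring neighbor
            for i in range(k * c, (k + 1) * c):
                data[dst][i] += data[r][i]

    # Phase 2: all-gather. Completed chunks circulate around the ring,
    # overwriting stale copies.
    for step in range(n - 1):
        for r in range(n):
            k = (r + 1 - step) % n      # rank r's freshest complete chunk
            dst = (r + 1) % n
            for i in range(k * c, (k + 1) * c):
                data[dst][i] = data[r][i]
    return data

print(ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40]]))
# → [[11, 22, 33, 44], [11, 22, 33, 44]]
```

The ring shape is why the collective scales: each rank only ever talks to one neighbor, so link bandwidth, not rank count, bounds the transfer.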
THE EXPANDING UNIVERSE OF HPC
SIMULATION | AI | NETWORK | EDGE | ANALYTICS | EXTREME IO
Edge | Cloud | Arm | Data Analytics | Extreme IO
INCREDIBLE ADVANCES IN AI
2012, ALEXNET CNN: CLASSIFICATION, OBJECT RECOGNITION, SEGMENTATION, DENOISING, 3D POSE, IMAGE GENERATION
2019, BERT TRANSFORMER: CLASSIFICATION, Q&A, SUMMARIZATION, TRANSLATION, DIALOG, WRITING
GPU COMPUTING POWERS AI ADVANCES
#1 MLPERF — AI TRAINING + AI INFERENCE | HPC COMPUTING CHALLENGE
Two Distinct Eras of AI Training: compute doubling every 2 years, then every 3.4 months
Super Moore's Law: Time to Train (ResNet-50) from 600 hours (K80 server) to 2 hours (DGX) in 5 years
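The two growth rates on this slide can be checked with quick arithmetic: a classic 2-year doubling yields 2x over 24 months, while a 3.4-month doubling compounds to well over 100x in the same window, and the 600-to-2-hour drop in ResNet-50 training time is a 300x speedup. A small sketch, using only the figures stated on the slide:

```python
# Quick arithmetic behind the slide's two "eras" of AI training compute.
months = 24
slow_doubling = 2 ** (months / 24)    # ~2-year doubling: 2x per 24 months
fast_doubling = 2 ** (months / 3.4)   # 3.4-month doubling era: >100x

print(f"2-year doubling over 24 months:    {slow_doubling:.0f}x")
print(f"3.4-month doubling over 24 months: {fast_doubling:.0f}x")

# ResNet-50 time-to-train: 600 hours (K80 server) -> 2 hours (DGX)
speedup = 600 / 2
print(f"Time-to-train speedup over 5 years: {speedup:.0f}x")  # 300x
```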
NVIDIA AI END-TO-END PLATFORM
TRAINING | CLOUD | EDGE AI | AUTONOMOUS MACHINES
DGX | HGX | EGX | AGX
AI FOR SCIENCE
EXPERIMENTATION DATA + SIMULATION DATA → NEURAL ESTIMATION
Real-time Steering | Fast Approximation
Design Space Exploration: ICF + MERLIN (Fusion)
Inverse Problems: LIGO (Gravitational Waves)
Faster Prediction: ANI + MD (Chemistry)
Real-time Steering: ITER (Fusion Energy)
STREAMING AI
100x Data Collected, 1x Data Transfer
SOFTWARE-DEFINED SENSORS → STREAMING AI PROCESSING → BUILD MODELS
ECMWF: 287 TB/day | LSST: 20 TB/day | SKA: 16 TB/sec
NVIDIA EGX STACK
NGC
Kubernetes | Networking | Storage | Security
CUDA-X
Third-Party ISVs
METROPOLIS
IMAGE PROCESSING
DECODE → DNN → GRAPHICS → ENCODE
DEEPSTREAM
Powered by NVIDIA CUDA Tensor Core GPU | Secured Boot Root of Trust
Cryptographic Acceleration for IPsec and TLS | NVMe-oF over TCP and RDMA
Industrial-strength Cloud Native and AI Stack
NVIDIA EGX EDGE SUPERCOMPUTING PLATFORM
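The DECODE → DNN → GRAPHICS → ENCODE flow in the DeepStream layer above is a classic streaming pipeline: each stage consumes frames from the previous one. As a hedged illustration of that shape only (the stage names and frame format below are invented for this sketch and are not the DeepStream API), Python generators make the pattern concrete:

```python
# Toy streaming pipeline in the DECODE -> DNN -> GRAPHICS -> ENCODE shape.
# Illustrative only: stage and field names are invented for this sketch.

def decode(raw_frames):
    for raw in raw_frames:
        yield {"pixels": raw}                    # pretend bitstream decode

def dnn(frames):
    for f in frames:
        f["label"] = "object" if f["pixels"] % 2 else "background"
        yield f                                  # pretend inference

def graphics(frames):
    for f in frames:
        f["overlay"] = f"box:{f['label']}"       # pretend overlay drawing
        yield f

def encode(frames):
    for f in frames:
        yield (f["pixels"], f["overlay"])        # pretend re-encode

def run_pipeline(raw_frames):
    return list(encode(graphics(dnn(decode(raw_frames)))))

print(run_pipeline(range(4)))
```

Because every stage is a generator, frames flow through one at a time rather than being materialized per stage, which is the property a real streaming stack exploits to keep latency bounded.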
VERTICAL INDUSTRY FRAMEWORKS
Clara | Metropolis | Isaac | Omniverse | Aerial | DRIVE
WORLD’S LARGEST DELIVERY SERVICE ADOPTS NVIDIA AI
PUTTING AI TO WORK
NVIDIA EGX Edge Supercomputing Platform
SUPERCOMPUTING CLOUD
CPU Instance 48 Hours, $152
Amber, Chroma, GROMACS, GTC, LAMMPS, MILC, NAMD, QE, SPECFEM3D, TensorFlow, VASP
SUPERCOMPUTING IS HARD — CLOUD HPC IS EXPENSIVE
SUPERCOMPUTING CLOUD
8x GPU Instance
1x GPU Instance
CPU Instance 48 Hours, $152
Amber, Chroma, GROMACS, GTC, LAMMPS, MILC, NAMD, QE, SPECFEM3D, TensorFlow, VASP
SUPERCOMPUTING IS HARD — GPU CLOUD 1/7TH THE COST OF CPU CLOUD
48x Faster, 1/7th the Cost
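The comparison implies a short piece of arithmetic: only the CPU figures (48 hours, $152) and the two ratios (48x, 1/7th) come from the slide, so the GPU instance's runtime and cost below are derived, not quoted prices:

```python
# Implied arithmetic for the CPU-vs-GPU cloud comparison.
# Only cpu_hours, cpu_cost, and the two ratios are from the slide;
# the GPU-side numbers are derived from them.
cpu_hours, cpu_cost = 48, 152.0          # CPU instance: 48 hours, $152
speedup, cost_ratio = 48, 7              # "48x Faster, 1/7th the Cost"

gpu_hours = cpu_hours / speedup          # 1 hour
gpu_cost = cpu_cost / cost_ratio         # ~$21.71 total
print(f"GPU run: {gpu_hours:.0f} h, ${gpu_cost:.2f} total "
      f"(~${gpu_cost / gpu_hours:.2f}/h vs ${cpu_cost / cpu_hours:.2f}/h CPU)")
```

The derived per-hour rates show why the total still falls: the GPU instance costs more per hour, but finishes so much sooner that the bill drops by 7x.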
ICECUBE OBSERVATORY DETECTING NEUTRINOS
50K NVIDIA GPUs IN THE CLOUD
350 PF OF SIMULATION FOR 2 HOURS
PRODUCED 5% OF ANNUAL SIMULATION DATA
AWS, MICROSOFT AZURE, GOOGLE CLOUD PLATFORM
DISTRIBUTED ACROSS U.S., EUROPE, APAC
Frank Würthwein, Ph.D.
Executive Director, Open Science Grid
Igor Sfiligoi
Lead Developer and Researcher
MULTIPLE GENERATIONS, ONE APPLICATION
Events Processed Per GPU Type: V100, P100, P40, T4, M60, K80
THE LARGEST CLOUD SIMULATION IN HISTORY
Up to 800 V100 GPUs Connected via Mellanox InfiniBand
ANNOUNCING WORLD’S LARGEST ON-DEMAND SUPERCOMPUTER
DIVERSE ARM ARCHITECTURES
AMPERE COMPUTING eMAG: Hyperscale and Storage
AMAZON GRAVITON: Hyperscale and SmartNIC
MARVELL THUNDERX2: Hyperscale, Storage and HPC
FUJITSU A64FX: Supercomputing
HUAWEI KUNPENG 920: Big Data and Edge
NVIDIA CUDA ON ARM AT OAK RIDGE NATIONAL LAB
Benchmark Application [Dataset]: GROMACS [ADH Dodec: Dev Prototype], LAMMPS [LJ 2.5], MILC [Apex Small], NAMD [apoa1_npt_cuda], Quantum Espresso [AUSURF112-jR], Relion [Plasmodium Ribosome], SPECFEM3D [four_material_simple_model], TensorFlow [ResNet50: Batch 256]; CPU node: 2x ThunderX2 9975; GPU node: same CPU node + 2x V100 32GB PCIe; *1x V100 for GROMACS, MILC, and TensorFlow
ANNOUNCING NVIDIA HPC FOR ARM
HPC Server Reference Platform | 8 V100 Tensor Core GPUs with NVLink
4x 100 Gbps Mellanox InfiniBand | Systems Ranging from Supercomputer to Hyperscale to Edge
CUDA on Arm Beta Available Now
[Diagram: two CPU sockets, each with a PCIe switch and NIC, attached to the GPUs]
ANNOUNCING NVIDIA HPC FOR ARM
APPLICATIONS: COMET, DCA++, GAMERA, GROMACS, INDEX, LAMMPS, LSMS, MATLAB, MILC, NAMD, OPTIX, PARAVIEW, QUANTUM ESPRESSO, RELION, TENSORFLOW, VMD
PROGRAMMING MODELS: C++, CUDA, FORTRAN, OPENACC, PYTHON
SDKS AND TOOLS: ARM ALLINEA STUDIO, BRIGHT COMPUTING, CMAKE, CUDA-GDB, CUPTI, GCC, LLVM, NVCC, PAPI, PERFORCE TOTALVIEW, PGI, SCORE-P, SINGULARITY, SLURM, TAU
EXTREME COMPUTE NEEDS EXTREME IO
TRADITIONAL RDMA: NODE A to NODE B at 50 GB/s, with data staged through the CPU and system memory on both nodes
[Diagram per node: NIC, PCIe switch, CPU, system memory, GPU]
GPUDIRECT RDMA: NODE A to NODE B at 100 GB/s, with the NIC moving data directly to and from GPU memory
TRADITIONAL STORAGE: 50 GB/s, with data staged through the CPU and system memory, alongside GPUDIRECT RDMA between NODE A and NODE B at 100 GB/s
GPUDIRECT STORAGE: storage to GPU memory directly at 100 GB/s, alongside GPUDIRECT RDMA between NODE A and NODE B at 100 GB/s
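The bandwidth doubling in these diagrams (50 GB/s traditional vs 100 GB/s GPUDirect) follows from removing the system-memory bounce: the traditional path crosses the bottleneck link twice per byte, the direct path once. A toy model of that reasoning, with an illustrative (not measured) link figure:

```python
# Toy model of why GPUDirect paths roughly double effective throughput:
# skipping the system-memory bounce halves traffic on the bottleneck
# link. LINK_GBPS is an illustrative assumption, not a measured number.

LINK_GBPS = 100  # assumed bottleneck link (e.g. PCIe toward the GPU)

def effective_bandwidth(crossings):
    """Each extra crossing of the bottleneck link divides useful bandwidth."""
    return LINK_GBPS / crossings

traditional = effective_bandwidth(2)  # storage/NIC -> system memory -> GPU
gpudirect = effective_bandwidth(1)    # storage/NIC -> GPU directly
print(f"traditional: {traditional:.0f} GB/s, GPUDirect: {gpudirect:.0f} GB/s")
```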
ANNOUNCING NVIDIA MAGNUM IO
Acceleration Libraries for Large-scale HPC and IO
High-bandwidth, Low-latency, Massive Storage Access with Lower CPU Utilization
GPUDIRECT STORAGE and GPUDIRECT RDMA data paths at 100 GB/s
Stack: PYTHON and DASK on top | RAPIDS libraries CUDF (PANDAS counterpart), CUML (SCIKIT-LEARN / XGBOOST counterpart), CUGRAPH | APACHE ARROW memory format | DEEP LEARNING FRAMEWORKS via CUDNN | all built on CUDA
NVIDIA RAPIDS DATA SCIENCE
Open Source | Multi-GPU and Multi-Node | Up to 100x Speed-Up | 150K Downloads in 1 Year
Data Load and Processing Times from Hours to Minutes | Used by NERSC, ORNL, NASA, SDSC
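The "hours to minutes" claim is about classic load-and-aggregate jobs. As a baseline illustration, here is that shape of work in plain Python, the kind of per-row CPU loop that RAPIDS cuDF replaces with a GPU-accelerated, pandas-style API (the data and column names below are invented for this sketch):

```python
# Toy "load CSV, group, aggregate" job: the shape of work RAPIDS cuDF
# and Dask accelerate on GPUs. Data and column names are invented.
import csv
import io
from collections import defaultdict

RAW = """sensor,reading
a,1.0
b,2.5
a,3.0
b,0.5
"""

def mean_by_sensor(csv_text):
    """Group readings by sensor and average them, one row at a time."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        sums[row["sensor"]] += float(row["reading"])
        counts[row["sensor"]] += 1
    return {k: sums[k] / counts[k] for k in sums}

print(mean_by_sensor(RAW))  # {'a': 2.0, 'b': 1.5}
```

In the RAPIDS model the same logic is expressed as a dataframe groupby-mean, and the per-row loop becomes a parallel GPU kernel over Arrow-format columns, which is where the claimed speed-up comes from.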
NVIDIA MAGNUM IO BOOSTS RAPIDS DATA ANALYTICS
20x ON TPC-H | STRUCTURAL BIOLOGY: 3x VMD | NEW PANGEO XARRAY ZARR READER FOR CLIMATE
[Chart: Q4 TPC-H Benchmark Work Breakdown with repeated query; latency (msec, 0 to 1,200,000) with and without GDS, broken into CUDA Startup, GPU and CPU Allocation, Data Preload, Warmup Query, Repeat Query, Clean Up, Driver Close]
ANNOUNCING WORLD’S LARGEST INTERACTIVE VOLUME VISUALIZATION
Simulating Mars Lander with FUN3D | Interactively Visualizing a 150 TB Unstructured Mesh
4 NVIDIA DGX-2 Streaming 400 GB/s | NVIDIA Magnum IO | NVIDIA IndeX
ANNOUNCING NVIDIA DGX-2 AS SUPERCOMPUTING ANALYTICS INSTRUMENT
16 V100 GPUs - 2 PF Tensor Core | 512 GB HBM2 - 16 TB/s | 8 MLNX CX5 - 800 Gbps
30 TB NVMe - 53 GB/s with Magnum IO | Fabric Storage - 100 GB/s with Magnum IO
2.3x Faster Than Current IO500 10-node Leader
Powered by NVIDIA Magnum IO
EXTREME WEATHER AI INFERENCE: NVIDIA TENSORRT
3D VOLUME ANALYTICS: PANGEO XARRAY
VMD COMPUTATIONAL MICROSCOPE: NVIDIA OPTIX
3D INTERACTIVE VOLUME RENDERING: NVIDIA INDEX
TPC-H RECORD, 10 TB JOIN: NVIDIA RAPIDS
THE EXPANDING UNIVERSE OF HPC
SIMULATION | NETWORK | EDGE | ANALYTICS | EXTREME IO
NVIDIA HPC for ARM
NVIDIA EGX Edge Supercomputing Platform
NVIDIA DGX-2 Supercomputing Analytics Instrument
NVIDIA Magnum IO
NGC
Azure