S8901 – Quadro for AI, VR and Simulation - NVIDIA · April 11, 2018

S8901 – Quadro for AI, VR and Simulation

Carl Flygare, PNY, Quadro Product Marketing Manager

Allen Bourgoyne, NVIDIA, Senior Product Marketing Manager

“The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.” Edsger Dijkstra

Intelligence Abounds in Nature
A very small sampling

Technological Intelligence
Homo sapiens’ essential differentiator

Thalamocortical brain network

3 million neurons, 476 million synapses

Full human brain

106 billion neurons, 1,000 trillion synapses

Artificial Intelligence: Where We Stand Today
Google’s IQ is slightly below a six-year-old human’s

Google 47.28 | 78.42% increase since 2014

Baidu 32.92 | 40.08% increase since 2014

Microsoft Bing 31.98

Apple Siri 23.90

Source: http://www.zdnet.com/article/google-ai-vs-siri-vs-bing-iq-tests-show-one-is-smartest-by-a-mile/

AI IQs are significantly lower than an 18-year-old’s average score of 97

In 2014, two of the three researchers found that Google had an IQ of 26.5, compared to Baidu’s 23.5
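The percentage gains quoted on this slide can be reproduced from the 2014 and current scores it lists. The sketch below is only an arithmetic check; the function name `percent_increase` is illustrative and not from the source.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


# Scores as quoted on the slide: (2014 score, current score).
scores = {"Google": (26.5, 47.28), "Baidu": (23.5, 32.92)}

for name, (score_2014, score_now) in scores.items():
    gain = percent_increase(score_2014, score_now)
    print(f"{name}: {gain:.2f}% increase since 2014")
```

Both results agree with the slide's 78.42% and 40.08% to within rounding in the second decimal place.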

NVIDIA Quadro
Every segment benefits from AI, VR and simulation

Manufacturing | CAE | Media and Entertainment | Automotive

AEC | Energy (Oil and Gas) | Scientific and Technical | Healthcare

Entry

NVIDIA Quadro | AI, VR and Simulation Open New Possibilities

Small and Simple CAD Models, Entry PLM

Medium Size and Complexity CAD Models, PLM, Basic DCC, Medical Imaging

Professional VR, Complex CAD Models, CAE, Photorealistic Rendering, Complex DCC and VFX, Medical Imaging

P4000 8 GB

Professional VR, Very Complex CAD Models, CAE, Photorealistic Rendering, Advanced DCC and VFX, 3D Medical Imaging

P5000 16 GB

P6000 24 GB: Collaborative VR, Extremely Complex CAD Models, CAE, Photorealistic Rendering, DCC and VFX, Seismic Exploration, 3D Medical Imaging

P620 2 GB

P400 2 GB

P2000 5 GB

P1000 4 GB

GP100 16 GB: AI (Deep Learning) Development, Collaborative VR, CAE Simulations, Ultimate CAD Models, Photorealistic Rendering and GPGPU Compute

Basic | Mid Range | Upper Range | High End | Ultra High End

GV100 32 GB

NVIDIA Quadro GV100 | Reinventing the Workstation for AI

NVIDIA Quadro GV100 x 2 | NVLink Scalable Workstation AI

NVIDIA Quadro GV100 and NVLink
Scaling performance and memory*

*Application support for NVLink required. Maximum of two GV100 boards can be connected with NVLink.

High speed GPU and memory connection for GV100

▪ NVLink combines two GV100s for twice the compute power and 64 GB of memory

▪ Up to 200 GB/sec bidirectional bandwidth, 25% improvement

▪ Used in pairs, two dedicated NVLink connectors on GV100 boards

▪ Provides SLI functionality for GV100 boards

NVIDIA Quadro GV100
Technical specifications

GPU Architecture: Volta

CUDA and Tensor Cores: 2560 (FP64), 5120 (FP32), 640 (Tensor)

Memory Capacity: 32 GB HBM2

Peak Memory Bandwidth: 870 GB/sec

FP64 (Double Precision): 7.4 TFLOPS | 42% improvement

FP32 (Single Precision): 14.8 TFLOPS | 44% improvement

FP16 (Half Precision): 118.5 TFLOPS (Matrix Multiply with FP16 or FP32 Accumulate)

INT8 (Integer): 59.3 TOPS | 26% improvement

System Interface: PCI Express Gen 3 x16

NVLink: 200 GB/sec Bidirectional | 25% improvement

Display Connectors: 4x DisplayPort 1.4 with HDCP 2.2

4K Display Support: 4x 4096 x 2160 at 120 Hz with HDR

VR Ready and Stereo: Yes, Stereo via 3-pin mini-DIN Connector Bracket
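The peak throughput figures above are mutually consistent, which can be checked with a little arithmetic. The sketch below derives the clock rate implied by the quoted FP32 peak and then recomputes the other peaks from the quoted core counts. The per-clock factors are assumptions that do not appear on the slide (two operations per fused multiply-add, 64 FMAs per Tensor Core per clock, four INT8 multiply-accumulates per FP32 lane), and all function and constant names are illustrative.

```python
# Figures quoted on the specification slide.
FP32_CORES = 5120
FP64_CORES = 2560
TENSOR_CORES = 640
FP32_TFLOPS = 14.8

# Assumptions, not stated on the slide:
FLOPS_PER_FMA = 2           # a fused multiply-add counts as two operations
FMA_PER_TENSOR_CORE = 64    # FMAs per Tensor Core per clock
INT8_PER_FP32_LANE = 4      # 4-way INT8 dot product per FP32 lane per clock


def implied_clock_ghz() -> float:
    """Clock rate (GHz) that reproduces the quoted FP32 peak."""
    return FP32_TFLOPS * 1e3 / (FP32_CORES * FLOPS_PER_FMA)


def peak_tera_ops(units: int, fma_per_unit: int, clock_ghz: float) -> float:
    """Peak throughput in tera-operations per second."""
    return units * fma_per_unit * FLOPS_PER_FMA * clock_ghz / 1e3


clock = implied_clock_ghz()
print(f"Implied clock: {clock:.3f} GHz")
print(f"FP64:   {peak_tera_ops(FP64_CORES, 1, clock):.1f} TFLOPS")
print(f"Tensor: {peak_tera_ops(TENSOR_CORES, FMA_PER_TENSOR_CORE, clock):.1f} TFLOPS")
print(f"INT8:   {peak_tera_ops(FP32_CORES, INT8_PER_FP32_LANE, clock):.1f} TOPS")
```

Under these assumptions the recomputed FP64 peak matches the quoted 7.4 TFLOPS, and the Tensor and INT8 results land within about 0.1 of the quoted 118.5 TFLOPS and 59.3 TOPS. The small gap is what one would expect from the FP32 figure having been rounded to one decimal place.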

NVIDIA Quadro GV100
Unmatched compute capabilities

FP64: 7.4 TFLOPS | FP32: 14.8 TFLOPS | FP16: 118.5 TFLOPS | INT8: 59.3 TOPS

NVIDIA Quadro GV100
Features and benefits relative to GP100

Feature | GP100 | GV100 | Benefit

GPU Architecture | Pascal | Volta | Most powerful, efficient and AI optimized GPU

CUDA Cores | 3584 | 5120 | Significantly greater compute and rendering performance

FP64 Performance | 5.2 TFLOPS | 7.4 TFLOPS | 1.4x greater FP64 compute performance

Memory Size | 16 GB HBM2 | 32 GB HBM2 | 2.0x memory capacity

Memory Bus Width | 4096-bit | 4096-bit | Radically advanced memory bus implementation

Peak Memory Bandwidth | 717 GB/sec | 870 GB/sec | Move data to and from GPU 1.2x faster

Display Support | 4x DP 1.4 + 1x DVI-D DL | 4x DP 1.4 and HDCP 2.2 | Supports four 4K, 5K or 8K displays, latest HDCP

HDR Image Support | Yes | Yes | More lifelike images

Advanced Display | Quadro Sync II | Quadro Sync II | Synchronize up to 8 GPUs per system

VR Ready | Yes | Yes | GV100 implements full suite of hardware optimizations

NVLink | NVLink (First Generation) | NVLink (Second Generation) | Higher performance means lower latency

Board Power | 235 W | 250 W | Better performance per Watt

Auxiliary Power Connector | 8-pin PCIe | 8-pin PCIe | Simplified power supply connectivity

Form Factor | 4.4” H x 10.5” L Dual Slot | 4.4” H x 10.5” L Dual Slot | No significant mechanical or thermal changes
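The multipliers in the Benefit column can be recomputed from the GP100 and GV100 figures in the comparison. The sketch below is a hedged illustration: the dictionary keys and helper names are made up for this example, and the per-pin data rate is simply the quoted bandwidth divided across the quoted 4096-bit bus, not a figure from the slide.

```python
# Values taken from the GP100 / GV100 comparison.
gp100 = {"cuda_cores": 3584, "memory_gb": 16, "bandwidth_gb_s": 717, "board_power_w": 235}
gv100 = {"cuda_cores": 5120, "memory_gb": 32, "bandwidth_gb_s": 870, "board_power_w": 250}
BUS_WIDTH_BITS = 4096  # same bus width on both boards


def ratio(key: str) -> float:
    """GV100 value divided by the GP100 value for one feature."""
    return gv100[key] / gp100[key]


def per_pin_gbit_s(bandwidth_gb_s: float, bus_width_bits: int = BUS_WIDTH_BITS) -> float:
    """Data rate per bus pin (Gbit/s) implied by a peak bandwidth in GB/s."""
    return bandwidth_gb_s * 8 / bus_width_bits


for key in gp100:
    print(f"{key}: {ratio(key):.2f}x")

for name, board in (("GP100", gp100), ("GV100", gv100)):
    rate = per_pin_gbit_s(board["bandwidth_gb_s"])
    print(f"{name} implied per-pin rate: {rate:.2f} Gbit/s")
```

The ratios round to the 2.0x memory capacity and 1.2x bandwidth stated in the comparison. Because the bus width is unchanged, the bandwidth gain comes entirely from a higher per-pin rate, while board power rises by only about 6%.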

NVIDIA Quadro GV100
Redefines state of the art across essential solutions

Artificial Intelligence

Tensor processor cores

NVIDIA GPU deep learning stack

ISV DL and ML framework optimization

Iterate and innovate faster

Reduce training time

RTX Rendering

Unrivaled FP32 performance

Largest models in GPU memory

AI accelerated photorealistic rendering

Neural network character animation

Apply AI to simultaneous video streams

Compute

Industry leading HPC capabilities

Work with largest datasets

Integrate simulation into design process

Utilize generative design algorithms

Fastest FEA, CFD, CEM available

Immersive Visualization (VR)

Includes VR hardware optimizations

Full NVIDIA VRWORKS support

Create new AI-augmented technologies

Visualize the largest datasets

Collaborative VR environments (Holodeck)

Connect two GV100 boards with NVLink to provide 64 GB of memory and twice the GPU processing power in standard workstation enclosures

NVIDIA Quadro GV100
RTX rendering lets you dream and create at the speed of thought

Architectural Design

Visualize cities or urban street scenes in every photorealistic detail

Product Design

Design with physically based lights and materials in realtime

Media and Entertainment

Perfect every shot with GPU accelerated and AI enhanced rendering

Work at full fidelity, utilizing massive datasets with 2x larger memory capacity

Master rendering projects interactively with AI (Deep Neural Network) technology

NVIDIA Quadro
RTX supercharges rendering with AI accelerated denoising

[Images: Denoising On, 20 Frames | Denoising Off, 20 Frames | Denoising Off, 290 Frames]

High quality results with fluid visual interactivity throughout the design process
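The frame counts in the comparison above hint at why denoising matters. As a rough sketch, assume the renderer averages independent Monte Carlo samples per frame (the slide does not say this), so residual noise falls with the square root of the number of accumulated frames. The function name `relative_noise` is illustrative.

```python
import math


def relative_noise(frames: int) -> float:
    """Standard error of an N-frame average, relative to a single frame.

    Assumes each frame contributes independent samples (an assumption,
    not something stated in the slides).
    """
    return 1.0 / math.sqrt(frames)


few, many = 20, 290  # frame counts quoted on the slide

extra_work = many / few
noise_drop = relative_noise(few) / relative_noise(many)

print(f"{extra_work:.1f}x more frames accumulated")
print(f"{noise_drop:.2f}x lower noise without denoising")
```

Under that assumption, accumulating 14.5 times as many frames buys less than a 4x reduction in noise, which is why a denoiser applied to a 20-frame image can be an attractive shortcut for interactive work.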

NVIDIA Quadro
Companies working with NVIDIA’s OptiX AI denoiser technology

Image courtesy of Isotropix, rendered with Clarisse and denoised with NVIDIA OptiX.

NVIDIA Quadro
CAD and CAE workflow elements

Design (CAD) | Pre-Processing | Simulation (CAE) | Post-Processing

NVIDIA Quadro GV100
Benefit from the ultimate immersive experiences

RTX Rendered Graphics | Interactive Physics | GPU-Accelerated AI | Realtime Collaboration

2x larger memory capacity lets you work with high fidelity, massive datasets (vs. GP100)

Benefit from unconstrained Holodeck experiences with full-featured VR performance and capabilities

NVIDIA Quadro GV100
Realize new opportunities with AI

32 GB or 64 GB capacity (NVLink) trains neural networks with massive datasets

Develop with NVIDIA optimized Deep Learning frameworks and deploy with NGC interoperability and scalability

Accelerate AI training and inferencing on workstations with Tensor cores and NVLink

NGC

Retail store inferencing with Quadro by DeepBlue Technology, China

Development | Aggregation | Inferencing At-The-Edge

NVIDIA Quadro GV100 AI Training Performance
Up to 2x improvement in Deep Learning training performance*

[Chart: TensorFlow ResNet-50 Training IPS. Series: GP100 Batch Size 256, GV100 Batch Size 256]

[Chart: Caffe ResNet-50 Training IPS. Series: GV100 Batch Size 256]

*Based on TensorFlow ResNet-50 Training. Tests run on dual Intel Xeon E5 2690 v4 at 2.6 GHz, NVIDIA driver version 390.19, ResNet-50 Training.

NVIDIA Quadro GV100 Deep Learning Training Performance
Over 2x improvement in Deep Learning training and inference performance*

[Chart: TensorRT ResNet-50 Inference. Horizontal axis: Batch Size 1, 2, 4, 8]

[Chart: TensorFlow ResNet-50 Training. Series: GV100 Batch Size 512]

*Based on TensorFlow ResNet-50 Training, TensorRT ResNet-50 Inference tests. Tests run on dual Intel Xeon E5 2690 v4 at 2.6 GHz, NVIDIA driver version 390.19, ResNet-50 Training.

NVIDIA Quadro GV100 Scientific Compute Performance
More than 2x improvement over the previous generation*

[Chart: LAMMPS Atomic Fluid Benchmark. Series: GP100, GV100]

[Chart: CUDA Basic Linear Algebra Solver Benchmark, cuBLAS 2560 x 2048 x 8192. Categories: FP16, FP32, FP64]

*Based on LAMMPS molecular modeling benchmark. Tests run on dual Intel Xeon E5 2690 v4 at 2.6 GHz, NVIDIA driver version 390.19.

NVIDIA Quadro GV100 CAE Example
Significant ANSYS Mechanical 19 Acceleration*

[Chart: Power Supply Module (V19cg-1). Horizontal axis: 0 to 4]

Configurations: 4 CPU Cores; 3 CPU Cores + GV100; 8 CPU Cores; 8 CPU Cores + GV100; 16 GPU Cores

Data labels: Base License | 1.0; Base License | 2.65; Base + 4 HPC Licenses | 1.71

*Power Supply Module (V19cg-1). 2x Xeon E5-2699 v4 at 2.2 GHz, 22 cores, HT off, NVIDIA driver 390.40 TCC, 256GB DRAM, CentOS 7.2.1511 64-bit. ANSYS Mechanical 19 benchmark model. Steady state thermal analysis of a power supply module, 5.3 Mdofs, JCG, real-value symmetric.

NVIDIA Quadro GV100 CAE Example
Standout ANSYS Fluent 19 Acceleration*

[Chart: Pipes Model, 9.6 Million Cells. Horizontal axis: 0 to 6]

Configurations: 4 CPU Cores; 3 CPU Cores + GV100; 8 CPU Cores; 16 CPU Cores + 2x GV100; 32 CPU Cores

Data labels: Base License | 1.0; Base License | 1.53; Base + 2 HPC Packs | 3.29

8 CPU Cores + GV100, Base + 5 HPC Licenses | 2.67

16 CPU Cores, Base + 12 HPC Licenses | 2.74

32 CPU Cores + 2x GV100, Base + 2 HPC Packs | 5.55

*Power Supply Module (V19cg-1). 2x Xeon E5-2699 v4 at 2.2 GHz, 22 cores, HT off, NVIDIA driver 390.40 TCC, 256GB DRAM, CentOS 7.2.1511 64-bit. ANSYS Mechanical 19 benchmark model. Steady state thermal analysis of a power supply module, 5.3 Mdofs, JCG, real-value symmetric.

NVIDIA Quadro GV100 Rendering Performance
SOLIDWORKS Visualize scales to over 29x faster than CPU*

[Chart: rendering speedup relative to CPU. Horizontal axis: 0 to 30. Bars: CPU, P2000, P4000, P5000, P6000, GP100, GV100, 2x GP100, 2x GV100]

*Based on 2x GV100, Xeon E5-2697 v3, 14 cores at 2.6 GHz, 32 GB DRAM, Win 10 Pro 64-bit Fall Creator’s Update and NVIDIA driver version 390.77. Tests run at 4K UHD (3840 x 2160) resolution.

NVIDIA Quadro GV100 Graphics Performance
Up to 1.3x better than previous generation*

[Chart: Quadro GP100 vs. Quadro GV100. Vertical axis: 0.2 to 1.4. Categories: geomean, 3dsmax, catia, creo, energy, maya, medical, showcase, snx, sw]

*Based on SPECviewperf 12.2.2 results.
